# Introductory tutorial

[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/DOV-Vlaanderen/pydov/master?filepath=docs%2Fnotebooks%2FIntroductory_tutorial.ipynb)

pydov provides machine access to the data that can be visualized with the [DOV viewer](https://www.dov.vlaanderen.be/portaal/?module=verkenner).

All the pydov functionalities rely on the existing DOV webservices. An in-depth overview of the available services and endpoints is provided on the [accessing DOV data page](https://pydov.readthedocs.io/en/latest/endpoints.html#endpoints). To retrieve data, pydov uses a combination of the available WFS services and the XML representation of the core DOV data.  

As pydov relies on the XML data returned by the existing DOV webservices, downloading DOV data with pydov is governed by the same [disclaimer](https://www.dov.vlaanderen.be/page/disclaimer) that applies to the other DOV services. Be sure to consult it when using DOV data with pydov!

pydov interfaces with a database hosted by the Flemish government. Therefore, some of the API syntax, as well as the descriptions provided by the backend, are in Dutch.

## Use case: gather data for a hydrogeological model

In [84]:
%matplotlib inline
import inspect, sys

In [85]:
# check pydov path
import pydov

### pydov: general info

To get started with pydov you should first determine which information you want to search for. DOV provides a lot of different datasets about soil, subsoil and groundwater of Flanders, some of which can be queried using pydov. See https://pydov.readthedocs.io/en/latest/quickstart.html for the supported datasets.

In this case, to start with a hydrogeological model, we are interested in the hydrostratigraphic interpretation of the borehole data and the groundwater level. These datasets can be found with the following search objects:
- Hydrostratigraphic interpretation: https://pydov.readthedocs.io/en/latest/reference.html#pydov.search.interpretaties.HydrogeologischeStratigrafieSearch
- Groundwater level: https://pydov.readthedocs.io/en/latest/reference.html#pydov.search.grondwaterfilter.GrondwaterFilterSearch

Indeed, each of the datasets can be queried using a search object for the specific dataset. While the search objects are different, the workflow is the same for each dataset. Relevant classes can be imported from the pydov.search package, for example if we’d like to query the dataset with hydrogeological interpretations of borehole data:

In [86]:
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
hs = HydrogeologischeStratigrafieSearch()

If you would like some more information, you can query the search object. Since pydov interfaces with a database from Flemish government agencies, the descriptions are in Dutch:

In [87]:
hs.get_description()

'De hydrostratigrafie geeft, op basis van de (gecodeerde) lithologie, een indeling weer naar de al dan niet watervoerende eigenschappen van een bepaald beschreven diepte-interval. Deze interpretatie respecteert de lithostratigrafie van het Tertiair, maar deelt deze anders in. De hiervoor gebruikte standaard is de Hydrogeologische Codering van de Ondergrond van Vlaanderen (HCOV). Deze kan beschouwd worden als de officiele hydrogeologische codering voor het Vlaams Gewest.'

The different fields that are available for objects of the 'Hydrogeologische Stratigrafie' datatype can be requested with the get_fields() method:

In [88]:
fields = hs.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])

pkey_interpretatie
Type_proef
Proefnummer
pkey_boring
x
y
Z_mTAW
diepte_tot_m
gemeente
Auteurs
Datum
Opdrachten
betrouwbaarheid_interpretatie
Geldig_van
Geldig_tot
diepte_laag_van
diepte_laag_tot
aquifer


You can get more information about a field by requesting it from the fields dictionary. Each entry holds the following keys:

- name: name of the field
- definition: definition of this field
- cost: currently this is either 1 or 10, depending on the datasource of the field. It is an indication of the expected time it will take to retrieve this field in the output dataframe.
- notnull: whether the field is mandatory or not
- type: datatype of the values of this field
- query: whether the field can be used in an attribute query

In [89]:
fields['pkey_interpretatie']

{'name': 'pkey_interpretatie',
 'definition': "URL die verwijst naar de gegevens van deze hydrogeologische stratigrafie op de website. Voeg '.xml' toe om een XML voorstelling van deze gegevens te verkrijgen.",
 'type': 'string',
 'notnull': False,
 'query': True,
 'cost': 1}
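
Because `cost` hints at where a field comes from, it can be handy to split the field names by cost before composing a query. The following is a minimal sketch, assuming a dictionary shaped like the output of `get_fields()`. The `sample_fields` dictionary is a hand-written stand-in holding three fields with the costs reported in this tutorial, not a live response, and `group_by_cost` is a hypothetical helper:

```python
from collections import defaultdict

# Hand-written stand-in for hs.get_fields(); only 'name' and 'cost' are kept.
sample_fields = {
    'pkey_interpretatie': {'name': 'pkey_interpretatie', 'cost': 1},
    'diepte_tot_m': {'name': 'diepte_tot_m', 'cost': 1},
    'diepte_laag_tot': {'name': 'diepte_laag_tot', 'cost': 10},
}


def group_by_cost(fields):
    """Return a dict mapping each cost value to a sorted list of field names."""
    groups = defaultdict(list)
    for field in fields.values():
        groups[field['cost']].append(field['name'])
    return {cost: sorted(names) for cost, names in groups.items()}


fields_by_cost = group_by_cost(sample_fields)
print(fields_by_cost)
```

With a real search object you would pass `hs.get_fields()` instead of `sample_fields`.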

The fields `pkey_interpretatie` and `pkey_boring` are important identifiers. In this case `pkey_interpretatie` is the unique identifier of the interpretation. It is also the **permanent URL** where the data can be viewed (of the form https://www.dov.vlaanderen.be/data/interpretatie/...), or retrieved as XML if you append `.xml` to it.

The `pkey_boring` is the identifier of the borehole from which this interpretation was made. Likewise, it is also the **permanent URL** of that borehole (of the form https://www.dov.vlaanderen.be/data/boring/...).
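
Turning a permanent URL into its XML counterpart is a matter of appending `.xml`. A small sketch follows. The helper name `to_xml_url` is hypothetical, and the example key is a placeholder rather than a real DOV record:

```python
def to_xml_url(pkey):
    """Append '.xml' to a DOV permanent URL, unless it is already there."""
    pkey = pkey.rstrip('/')
    return pkey if pkey.endswith('.xml') else pkey + '.xml'


# Placeholder key, for illustration only (not an existing borehole).
example_pkey = 'https://www.dov.vlaanderen.be/data/boring/EXAMPLE-ID'
print(to_xml_url(example_pkey))
```

Remember that the disclaimer mentioned at the top of this tutorial also applies to data fetched this way.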

Optionally, if the values of a field are restricted to a specific domain, the possible values are listed as *values*:

In [90]:
fields['aquifer']['values']

{'0000': 'Onbekend',
 '0100': 'Quartaire aquifersystemen',
 '0110': 'Ophogingen',
 '0120': 'Duinen',
 '0130': 'Polderafzettingen',
 '0131': 'Kleiige polderafzettingen van de kustvlakte',
 '0132': 'Kleiige polderafzettingen van het Meetjesland',
 '0133': 'Kleiige polderafzettingen van Waasland-Antwerpen',
 '0134': 'Zandige kreekruggen',
 '0135': 'Veen-kleiige poelgronden',
 '0140': 'Alluviale deklagen',
 '0150': 'Deklagen',
 '0151': 'Zandige deklagen',
 '0152': 'Zand-lemige deklagen',
 '0153': 'Lemige deklagen',
 '0154': 'Kleiige deklagen',
 '0160': 'Pleistocene afzettingen',
 '0161': 'Pleistoceen van de kustvlakte',
 '0162': 'Pleistoceen van de Vlaamse Vallei',
 '0163': 'Pleistoceen van de riviervalleien',
 '0170': 'Maas- en Rijnafzettingen',
 '0171': 'Afzettingen Hoofdterras',
 '0172': 'Afzettingen Tussenterassen',
 '0173': 'Afzettingen Maasvlakte',
 '0200': 'Kempens Aquifersysteem',
 '0210': 'Kiezeloolietformatie ten noorden van Feldbiss',
 '0211': 'Zandige eenheid boven de Brunssum 

### Query the data with pydov

#### Attributes

The data can be queried on **attributes**, **location** or both. To query on attributes, the OGC filter functions from OWSLib are used:

In [91]:
# list available query methods
methods = [i for i,j in inspect.getmembers(sys.modules['owslib.fes'], 
                                           inspect.isclass) 
           if 'Property' in i]
print(*methods, sep = "\n") 

PropertyIsBetween
PropertyIsEqualTo
PropertyIsGreaterThan
PropertyIsGreaterThanOrEqualTo
PropertyIsLessThan
PropertyIsLessThanOrEqualTo
PropertyIsLike
PropertyIsNotEqualTo
PropertyIsNull
SortProperty


For example, if you are interested in all the hydrostratigraphic interpretations in the city of Leuven (Louvain), compose the query as below (mind that the values are in Dutch):

In [92]:
from owslib.fes import PropertyIsEqualTo
query = PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Leuven')
dfhs = hs.search(query=query)
dfhs.head()

[000/038] cccccccccccccccccccccccccccccccccccccc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1889... | goed | 169506.0 | 173442.0 | 0.0 | 5.5 | 163 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1889... | goed | 169506.0 | 173442.0 | 5.5 | 22.8 | 450 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1934... | goed | 177360.0 | 175051.0 | 0.0 | 21.0 | 0 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1934... | goed | 177360.0 | 175051.0 | 21.0 | 25.0 | 450 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1934... | goed | 177360.0 | 175051.0 | 25.0 | 33.0 | 612 |


This yielded 38 interpretations from 38 or fewer boreholes. There can be fewer boreholes than interpretations because a single borehole can have multiple interpretations.
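
To see how many distinct boreholes are behind a result, you can count the unique keys. Note that the DataFrame holds one row per layer, so the number of rows is larger than the number of interpretations. This is a minimal sketch on a hand-made frame. The shortened keys are hypothetical placeholders, and with a real result the same calls would be applied to `dfhs`:

```python
import pandas as pd

# Hand-made stand-in for dfhs: three layers, two interpretations, one borehole.
sample = pd.DataFrame({
    'pkey_interpretatie': ['interpretatie/A', 'interpretatie/A', 'interpretatie/B'],
    'pkey_boring': ['boring/X', 'boring/X', 'boring/X'],
    'diepte_laag_van': [0.0, 5.5, 0.0],
})

n_layers = len(sample)
n_interpretations = sample['pkey_interpretatie'].nunique()
n_boreholes = sample['pkey_boring'].nunique()
print(n_layers, n_interpretations, n_boreholes)
```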

If you would like to narrow the search down, for example to boreholes deeper than 200 meters, you can combine filters in the search using the **logical operators And, Or** provided by OWSLib:

In [93]:
from owslib.fes import And
from owslib.fes import PropertyIsGreaterThan
query = And([
    PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Leuven'),
    PropertyIsGreaterThan(
            propertyname='diepte_tot_m',
            literal='200')
    ])
dfhs = hs.search(query=query)
dfhs.head()

[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 0.0 | 4.5 | 162 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 4.5 | 59.0 | 620 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 59.0 | 90.7 | 900 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 90.7 | 110.0 | 1013 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 110.0 | 130.0 | 1014 |


Mind the difference between the attribute `diepte_tot_m` and the *diepte_laag_...* attributes. The former is defined in the WFS service and can be used as an attribute in the query. The latter are defined in the linked XML document, so their values only become available after they have been retrieved from the DOV webservice. Attributes with a *cost* of 10 are therefore not available in the initial query and should instead be used in a subsequent filtering of the Pandas DataFrame.

In [94]:
print('Cost of search\n', 
      'WFS attribute (diepte_tot_m): ' + str(fields['diepte_tot_m']['cost']) + '\n', 
      'XML attribute (diepte_laag_tot): '+ str(fields['diepte_laag_tot']['cost']))

Cost of search
 WFS attribute (diepte_tot_m): 1
 XML attribute (diepte_laag_tot): 10
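
Filtering on a cost-10 attribute therefore happens after the search, on the DataFrame itself. Below is a minimal sketch that uses a hand-copied subset of the depth columns shown above instead of a live result. The 50 m threshold is an arbitrary value chosen for illustration:

```python
import pandas as pd

# Depth columns copied from the deep-borehole result shown above.
layers = pd.DataFrame({
    'diepte_laag_van': [0.0, 4.5, 59.0, 90.7, 110.0],
    'diepte_laag_tot': [4.5, 59.0, 90.7, 110.0, 130.0],
})

# Keep only the layers whose base lies deeper than 50 m.
deep_layers = layers.loc[layers['diepte_laag_tot'] > 50]
print(deep_layers)
```

With a real result you would apply the same boolean mask to `dfhs`.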


More information on querying attribute properties is given in the [docs](https://pydov.readthedocs.io/en/latest/query_attribute.html). Worth mentioning is that pydov extends the default OGC filter expressions with a new expression, **PropertyInList**, which allows you to use lists (of strings) in search queries.
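
A list query of this kind is understood here to behave like an `Or` combination of `PropertyIsEqualTo` expressions, one per list value. The sketch below only illustrates that expansion as OGC filter XML, built with the standard library. The helper `property_in_list_xml` is hypothetical and is not part of pydov or OWSLib, and the exact XML that pydov sends may differ. In practice you would pass a `PropertyInList` from `pydov.util.query` to `search()`:

```python
import xml.etree.ElementTree as ET

OGC_NS = 'http://www.opengis.net/ogc'
ET.register_namespace('ogc', OGC_NS)


def property_in_list_xml(propertyname, values):
    """Build an ogc:Or of ogc:PropertyIsEqualTo elements, one per value."""
    if len(values) < 2:
        raise ValueError('an Or expression needs at least two values')
    or_element = ET.Element('{%s}Or' % OGC_NS)
    for value in values:
        equal = ET.SubElement(or_element, '{%s}PropertyIsEqualTo' % OGC_NS)
        ET.SubElement(equal, '{%s}PropertyName' % OGC_NS).text = propertyname
        ET.SubElement(equal, '{%s}Literal' % OGC_NS).text = value
    return or_element


# Both municipality names appear in the results of this tutorial.
filter_xml = property_in_list_xml('gemeente', ['Leuven', 'Bertem'])
print(ET.tostring(filter_xml, encoding='unicode'))
```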

One last goodie is the possibility to join searches on common attributes, for example the `pkey_boring` field, which denotes the borehole. This way you can take the boreholes for which a hydrostratigraphic interpretation is available and also query the lithological descriptions of those boreholes:

In [95]:
import pandas as pd
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch

ls = LithologischeBeschrijvingenSearch()
# search the lithological descriptions of the boreholes found in dfhs
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
# attach the description to each layer with the same borehole and depth interval
merge_keys = ['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot']
df_joined = pd.merge(dfhs, dfls.loc[:, merge_keys + ['beschrijving']],
                     how='left', on=merge_keys)
df_joined.head()

[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer | beschrijving |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 0.0 | 4.5 | 162 | |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 4.5 | 59.0 | 620 | sable gris quartzeux, avec grès gris quartzeux... |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 59.0 | 90.7 | 900 | argile grise finement sableuse |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 90.7 | 110.0 | 1013 | |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 110.0 | 130.0 | 1014 | sable argileux gris, avec petits débris broyés... |
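
The empty `beschrijving` cells follow from the left merge: every hydrostratigraphic layer is kept, but a description is only attached when the borehole and both depth limits match exactly. Here is a minimal sketch with hand-made frames. The key `boring/X` is a hypothetical placeholder, and the frames are a simplified subset in which only one description is present:

```python
import pandas as pd

keys = ['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot']

# Three hydrostratigraphic layers of one (placeholder) borehole.
hydro = pd.DataFrame({
    'pkey_boring': ['boring/X', 'boring/X', 'boring/X'],
    'diepte_laag_van': [0.0, 4.5, 59.0],
    'diepte_laag_tot': [4.5, 59.0, 90.7],
})
# One lithological description, for the interval 59.0 to 90.7 m only.
litho = pd.DataFrame({
    'pkey_boring': ['boring/X'],
    'diepte_laag_van': [59.0],
    'diepte_laag_tot': [90.7],
    'beschrijving': ['argile grise finement sableuse'],
})

# Left merge: all rows of hydro survive, unmatched rows get NaN.
merged = pd.merge(hydro, litho, how='left', on=keys)
print(merged)
```

If the two datasets use different depth intervals for the same borehole, an exact-match merge like this leaves those layers without a description.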


#### Location

One can also query on **location**, using the location objects and spatial filters from the pydov.util.location module. For example, to request all stratigraphic interpretations in a given bounding **box**:

In [96]:
from pydov.util.location import Within, Box
location = Within(Box(170000, 171000, 172000, 173000))
df = hs.search(location=location)
df.head()

[000/005] ccccc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170085.97 | 171027.67 | 0.0 | 6.0 | 140 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170085.97 | 171027.67 | 6.0 | 10.0 | 160 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170085.97 | 171027.67 | 10.0 | 30.0 | 920 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170085.97 | 171027.67 | 30.0 | 39.0 | 1010 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1987... | goed | 170640.0 | 172567.0 | 0.0 | 6.2 | 163 |
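
The four numbers passed to `Box` are read here as minimum x, minimum y, maximum x and maximum y, in the same coordinates as the `x` and `y` columns (assumed to be Belgian Lambert 72, in meters). The following is a quick sanity check in plain Python, with a hypothetical helper `in_box` and two coordinate pairs copied from the output above:

```python
def in_box(x, y, minx, miny, maxx, maxy):
    """Return True when the point (x, y) lies inside or on the edge of the box."""
    return minx <= x <= maxx and miny <= y <= maxy


box = (170000, 171000, 172000, 173000)
points = [(170085.97, 171027.67), (170640.0, 172567.0)]

inside = [in_box(x, y, *box) for x, y in points]
print(inside)
```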


Alternatively, you can define a **Point** or a **GML document** for the spatial query as is described in the [docs](https://pydov.readthedocs.io/en/latest/query_location.html). For example, if you are interested in a site you can define the point with a search radius of for example 500 meters like this:

In [97]:
from pydov.util.location import WithinDistance, Point
location = WithinDistance(
            Point(171500, 172500), 
            500, 
            distance_unit='meter'
            )
df = hs.search(location=location)
df.head()

[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.64 | 172680.85 | 0.0 | 0.6 | 110 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.64 | 172680.85 | 0.6 | 14.4 | 100 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.64 | 172680.85 | 14.4 | 95.1 | 0 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.64 | 172680.85 | 95.1 | 118.9 | 1100 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.64 | 172680.85 | 118.9 | 124.5 | 1300 |
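
As a quick check of the search radius, you can compute the straight-line distance between the query point and the borehole that was returned. This sketch assumes that both coordinate pairs are in the same projected system with meters as unit, so that a plain Euclidean distance applies:

```python
import math

site = (171500, 172500)            # the Point used in the query
borehole = (171548.64, 172680.85)  # x, y copied from the output above

# Euclidean distance in meters between the site and the borehole.
distance_m = math.hypot(borehole[0] - site[0], borehole[1] - site[1])
within_radius = distance_m <= 500
print(round(distance_m, 1), within_radius)
```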


#### Groundwater head data

Querying the groundwater head data follows the same workflow as mentioned above for the interpretation of borehole data with the definition of a search object and the subsequent query with selection on attribute or location properties. 

In [98]:
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
gws = GrondwaterFilterSearch()
fields = gws.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])

gw_id
pkey_grondwaterlocatie
filternummer
pkey_filter
namen
filtergrafiek
putgrafiek
aquifer
diepte_onderkant_filter
lengte_filter
putsoort
filtertype
meetnet
x
y
start_grondwaterlocatie_mtaw
gemeente
grondwaterlichaam
regime
datum_in_filter
datum_uit_filter
stijghoogterapport
analyserapport
boornummer
boringfiche
peilmetingen_van
peilmetingen_tot
kwaliteitsmetingen_van
kwaliteitsmetingen_tot
recentste_exploitant
beheerder
mv_mtaw
meetnet_code
aquifer_code
grondwaterlichaam_code
datum
tijdstip
peil_mtaw
betrouwbaarheid
methode
filterstatus
filtertoestand


For example, query all data in a bounding box from filters with a phreatic groundwater regime (`regime` equal to `freatisch`):

In [99]:
query = PropertyIsEqualTo(
            propertyname='regime',
            literal='freatisch')
location = Within(Box(170000, 171000, 173000, 174000))
df = gws.search(
                query=query,
                location=location)
df.head()

[000/013] ccccccccccccc


| | pkey_filter | pkey_grondwaterlocatie | gw_id | filternummer | filtertype | x | y | start_grondwaterlocatie_mtaw | mv_mtaw | gemeente | ... | regime | diepte_onderkant_filter | lengte_filter | datum | tijdstip | peil_mtaw | betrouwbaarheid | methode | filterstatus | filtertoestand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/filter/2016... | https://www.dov.vlaanderen.be/data/put/2019-01... | 3008-066 | 1 | peilfilter | 170085.97 | 171027.67 | 23.86 | 23.86 | Bertem | ... | freatisch | 3.5 | 1.0 | | | | | | | |
| 1 | https://www.dov.vlaanderen.be/data/filter/2016... | https://www.dov.vlaanderen.be/data/put/2019-01... | 3008-066 | 2 | peilfilter | 170085.97 | 171027.67 | 23.86 | 23.86 | Bertem | ... | freatisch | 9.3 | 1.0 | | | | | | | |
| 2 | https://www.dov.vlaanderen.be/data/filter/1948... | https://www.dov.vlaanderen.be/data/put/2019-03... | 3008-018 | 0 | pompfilter | 170202.0 | 171179.0 | 23.12 | 23.12 | Bertem | ... | freatisch | 11.23 | 2.5 | | | | | | | |
| 3 | https://www.dov.vlaanderen.be/data/filter/1975... | https://www.dov.vlaanderen.be/data/put/2019-04... | 3008-019 | 0 | pompfilter | 170131.0 | 171105.0 | 23.39 | 23.39 | Bertem | ... | freatisch | 8.83 | 2.0 | | | | | | | |
| 4 | https://www.dov.vlaanderen.be/data/filter/2017... | https://www.dov.vlaanderen.be/data/put/2019-01... | 2-103155 | 1 | pompfilter | 170270.0 | 173561.0 | -1.0 | -1.0 | Leuven | ... | freatisch | | | | | | | | | |


One important difference is the presence of time-related data, more specifically the attributes `datum` and `tijdstip`. These can be combined into a single datetime column that can be used in the subsequent manipulation of the Pandas DataFrame. Make sure to **remove** the records without a valid `datum` and **fill** the empty `tijdstip` fields with a default timestamp.

In [100]:
import pandas as pd
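df.reset_index(inplace=True)
# keep only records with a date; copy so the new columns are not set on a slice
df = df.loc[~df.datum.isna()].copy()
# records without a time of day default to midnight
df['tijdstip'] = df.tijdstip.fillna('00:00:00')
df['tijd'] = pd.to_datetime(df.datum.astype(str) + ' ' + df.tijdstip.astype(str))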
df.tijd.head()

12   2012-11-30 00:00:00
13   2012-12-06 00:00:00
14   2013-01-09 12:00:00
15   2013-01-22 00:00:00
16   2013-02-09 12:00:00
Name: tijd, dtype: datetime64[ns]
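
The same rule (drop records without a date, default a missing time to midnight) can be written for a single record with the standard library. This is a minimal sketch, assuming `datum` holds a `datetime.date` and `tijdstip` holds a `datetime.time` or is missing. The helper `to_timestamp` is hypothetical, and the two dates are taken from the output above:

```python
from datetime import date, datetime, time


def to_timestamp(datum, tijdstip=None):
    """Combine a date with an optional time; a missing time becomes midnight."""
    if datum is None:
        return None  # records without a date cannot be placed on a time axis
    if tijdstip is None:
        tijdstip = time(0, 0, 0)
    return datetime.combine(datum, tijdstip)


print(to_timestamp(date(2012, 11, 30)))
print(to_timestamp(date(2013, 1, 9), time(12, 0, 0)))
```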

More examples of time series processing and analysis are available in the pydov notebooks.

### Data cache

Did you notice the `c` characters in the progress bar while the data was loading? They indicate that the data was loaded from your local cache instead of being downloaded, as it was already part of an earlier data request. See the [caching documentation](https://pydov.readthedocs.io/en/latest/caching.html#caching) for more in-depth information about the default directory, how to change or clean it, and even how to create a custom cache format.

## Putting it all together

In [102]:
# imports
import pandas as pd
import pydov
from pydov.util.location import Within, Box, WithinDistance, Point
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
from owslib.fes import PropertyIsEqualTo

# define search objects
hs = HydrogeologischeStratigrafieSearch()
ls = LithologischeBeschrijvingenSearch()
gws = GrondwaterFilterSearch()

# search hydrostratigraphic interpretations based on location
location = WithinDistance(
    Point(171500, 172500), 
    500,
    distance_unit='meter'
    )
dfhs = hs.search(location=location)

# join the lithological descriptions on borehole and depth interval
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
merge_keys = ['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot']
df_joined = pd.merge(dfhs, dfls.loc[:, merge_keys + ['beschrijving']],
                     how='left', on=merge_keys)

# search the groundwater head data of the phreatic aquifers in the neighbourhood
query = PropertyIsEqualTo(
            propertyname='regime',
            literal='freatisch')
location = Within(Box(170000, 171000, 173000, 174000))
dfgw = gws.search(query=query,
                location=location)

# create datetime values for further processing
dfgw.reset_index(inplace=True)
# keep only records with a date; copy so the new columns are not set on a slice
dfgw = dfgw.loc[~dfgw.datum.isna()].copy()
dfgw['tijdstip'] = dfgw.tijdstip.fillna('00:00:00')
dfgw['tijd'] = pd.to_datetime(dfgw.datum.astype(str) + ' ' + dfgw.tijdstip.astype(str))

[000/001] c
[000/002] cc
[000/013] ccccccccccccc
