# Introductory tutorial

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/DOV-Vlaanderen/pydov/master?filepath=docs%2Fnotebooks%2Fintroductory_tutorial.ipynb)

pydov provides machine access to the data that can be visualized with the [DOV viewer](https://www.dov.vlaanderen.be/portaal/?module=verkenner).

All the pydov functionalities rely on the existing DOV webservices. An in-depth overview of the available services and endpoints is provided on the [accessing DOV data page](https://pydov.readthedocs.io/en/latest/endpoints.html#endpoints). To retrieve data, pydov uses a combination of the available WFS services and the XML representation of the core DOV data.  

As pydov relies on the XML data returned by the existing DOV webservices, downloading DOV data with pydov is governed by the same [disclaimer](https://www.dov.vlaanderen.be/page/disclaimer) that applies to the other DOV services. Be sure to consult it when using pydov!

pydov interfaces with data and services hosted by the Flemish government. Therefore, some syntax of the API as well as the descriptions provided by the backend are in Dutch.

## Use case: gather data for a hydrogeological model

In [1]:
%matplotlib inline
import inspect, sys

In [2]:
import pydov
import pandas as pd

### pydov: general info

To get started with pydov you should first determine which information you want to search for. DOV provides a lot of different datasets about soil, subsoil and groundwater of Flanders, some of which can be queried using pydov. Supported datasets are listed in the [quickstart](https://pydov.readthedocs.io/en/stable/quickstart.html).

In this case, to start with a hydrogeological model, we are interested in the hydrostratigraphic interpretation of the borehole data and the groundwater level. These datasets can be found with the following search objects:
- [Hydrostratigraphic interpretation](https://pydov.readthedocs.io/en/stable/reference.html#pydov.search.interpretaties.HydrogeologischeStratigrafieSearch)
- [Groundwater level](https://pydov.readthedocs.io/en/stable/reference.html#pydov.search.grondwaterfilter.GrondwaterFilterSearch)

Indeed, each of the datasets can be queried using a search object for the specific dataset. While the search objects are different, the workflow is the same for each dataset. Relevant classes can be imported from the pydov.search package, for example if we’d like to query the dataset with hydrostratigraphic interpretations of borehole data:

In [3]:
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
hs = HydrogeologischeStratigrafieSearch()

If you would like some more information or metadata about the data you can retrieve, you can query the search object. Since pydov interfaces services and metadata from Flemish government agencies, the descriptions are in Dutch:

In [4]:
hs.get_description()

'De hydrostratigrafie geeft, op basis van de (gecodeerde) lithologie, een indeling weer naar de al dan niet watervoerende eigenschappen van een bepaald beschreven diepte-interval. Deze interpretatie respecteert de lithostratigrafie van het Tertiair, maar deelt deze anders in. De hiervoor gebruikte standaard is de Hydrogeologische Codering van de Ondergrond van Vlaanderen (HCOV). Deze kan beschouwd worden als de officiele hydrogeologische codering voor het Vlaams Gewest.'

The different fields that are available for objects of the 'Hydrogeologische Stratigrafie' datatype can be requested with the get_fields() method:

In [5]:
fields = hs.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])

pkey_interpretatie
Type_proef
Proefnummer
pkey_boring
x
y
start_interpretatie_mtaw
diepte_tot_m
gemeente
Auteurs
Datum
Opdrachten
betrouwbaarheid_interpretatie
Geldig_van
Geldig_tot
eerste_invoer
geom
diepte_laag_van
diepte_laag_tot
aquifer


You can get more information about a field by requesting it from the fields dictionary. Each field is described by the following keys:

- name: name of the field
- definition: definition of this field
- cost: currently this is either 1 or 10, depending on the datasource of the field. It is an indication of the expected time it will take to retrieve this field in the output dataframe.
- notnull: whether the field is mandatory or not
- type: datatype of the values of this field
- query: whether you can use this field in an attribute query

Alternatively, you can list all fields either by consulting the output of the `get_fields()` method or by displaying the search instance itself:

In [6]:
hs

In [7]:
fields['pkey_interpretatie']
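
The `query` and `cost` keys described above can also be used to select fields programmatically. The following is a minimal sketch, not part of pydov itself: `sample_fields` is a hand-written stand-in with the same shape as the output of `hs.get_fields()`, and its values are assumptions chosen for illustration. Substitute the real dictionary when working against DOV.

```python
# Hand-written stand-in for hs.get_fields(); the values are illustrative
# assumptions, not output copied from the DOV services.
sample_fields = {
    'pkey_interpretatie': {'name': 'pkey_interpretatie', 'cost': 1, 'query': True},
    'diepte_tot_m': {'name': 'diepte_tot_m', 'cost': 1, 'query': True},
    'diepte_laag_van': {'name': 'diepte_laag_van', 'cost': 10, 'query': False},
    'aquifer': {'name': 'aquifer', 'cost': 10, 'query': False},
}


def queryable_fields(fields):
    """Return the sorted names of fields usable in an attribute query."""
    return sorted(name for name, info in fields.items() if info['query'])


def expensive_fields(fields, threshold=10):
    """Return the sorted names of fields whose cost is at least `threshold`."""
    return sorted(name for name, info in fields.items()
                  if info['cost'] >= threshold)


print(queryable_fields(sample_fields))
print(expensive_fields(sample_fields))
```

Checking these keys before composing a query helps avoid filtering on a field that is only available after the XML documents have been retrieved.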

The fields `pkey_interpretatie` and `pkey_boring` are important identifiers. In this case `pkey_interpretatie` is the unique identifier of this interpretation and is also the **permanent url** where the data can be consulted (~https://www.dov.vlaanderen.be/data/interpretatie/...). You can retrieve an XML representation by appending '.xml' to the URL, or a JSON equivalent by appending '.json'.

The `pkey_boring` is the identifier of the borehole from which this interpretation was made. Like `pkey_interpretatie`, it is also the **permanent url** (~https://www.dov.vlaanderen.be/data/boring/...).
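
As a small illustration of the permanent URLs, the sketch below builds the address of the XML or JSON representation from a `pkey_...` value. The helper name `representation_url` and the identifier `0000-000000` are hypothetical placeholders; use a real value taken from a search result instead.

```python
def representation_url(pkey, fmt):
    """Build the URL of the XML or JSON representation of a permanent URL."""
    if fmt not in ('xml', 'json'):
        raise ValueError("fmt must be 'xml' or 'json'")
    # Append the extension to the permanent URL, e.g. '<pkey>.xml'.
    return pkey.rstrip('/') + '.' + fmt


# Placeholder identifier for illustration only, not a real borehole.
example_pkey = 'https://www.dov.vlaanderen.be/data/boring/0000-000000'

print(representation_url(example_pkey, 'xml'))
print(representation_url(example_pkey, 'json'))
```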

Optionally, if the field has an associated codelist, this is listed as *codelist*:

In [8]:
fields['Type_proef']

### Query the data with pydov

#### Attributes

The data can be queried on **attributes**, **location** or both. To query on attributes, the OGC filter functions from OWSLib are used:

In [9]:
# list available query methods
methods = [i for i,j in inspect.getmembers(sys.modules['owslib.fes2'], 
                                           inspect.isclass) 
           if 'Property' in i]
print(*methods, sep = "\n") 

PropertyIsBetween
PropertyIsEqualTo
PropertyIsGreaterThan
PropertyIsGreaterThanOrEqualTo
PropertyIsLessThan
PropertyIsLessThanOrEqualTo
PropertyIsLike
PropertyIsNotEqualTo
PropertyIsNull
SortProperty


If you are for example interested in all the hydrostratigraphic interpretations in the city of Leuven, you compose the query like below (mind that the values are in Dutch):

In [10]:
from owslib.fes2 import PropertyIsEqualTo
query = PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Leuven')
dfhs = hs.search(query=query)
dfhs.head()

[000/001] .
[000/036] cccccccccccccccccccccccccccccccccccc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 0.0 | 4.5 | 162 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 4.5 | 59.0 | 620 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 59.0 | 90.7 | 900 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 90.7 | 110.0 | 1013 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 110.0 | 130.0 | 1014 |


This yielded 36 interpretations (one per `c` in the progress output above) from 36, or fewer, boreholes. There can be fewer boreholes than interpretations because multiple interpretations can be made of a single borehole.

If you would like to narrow the search down to for example interpretations deeper than 200 meters, you can combine features in the search using the **logical operators And, Or** provided by OWSLib:

In [11]:
from owslib.fes2 import And
from owslib.fes2 import PropertyIsGreaterThan
query = And([
    PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Leuven'),
    PropertyIsGreaterThan(
            propertyname='diepte_tot_m',
            literal='200')
    ])
dfhs = hs.search(query=query)
dfhs.head()

[000/001] .
[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 0.0 | 4.5 | 162 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 4.5 | 59.0 | 620 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 59.0 | 90.7 | 900 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 90.7 | 110.0 | 1013 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 110.0 | 130.0 | 1014 |


Mind the difference between the attributes `diepte_tot_m` and `diepte_laag_...`. The former is defined in the WFS service and can be used as an attribute in the query. The latter attributes are defined in the linked XML document, so their values are only available after the document has been retrieved from the DOV webservice. Attributes of this kind (those whose field description says they cannot be used in an attribute query) cannot be used in the initial query and should instead be used in a subsequent filtering of the pandas DataFrame.
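
Such a subsequent filter could look like the sketch below. It does not call DOV: `dfhs_sample` is a small stand-in for the DataFrame returned by `hs.search()`, filled with the depth and aquifer values of the five rows shown above, and the 50 m threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Stand-in for the result of hs.search(); values taken from the rows above.
dfhs_sample = pd.DataFrame({
    'diepte_laag_van': [0.0, 4.5, 59.0, 90.7, 110.0],
    'diepte_laag_tot': [4.5, 59.0, 90.7, 110.0, 130.0],
    'aquifer': [162, 620, 900, 1013, 1014],
})

# Keep only the layers that start deeper than 50 m below the reference level.
deep_layers = dfhs_sample[dfhs_sample['diepte_laag_van'] > 50]

print(deep_layers)
```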

More information on querying attribute properties is given in the [docs](https://pydov.readthedocs.io/en/stable/query_attribute.html). Worth mentioning is that pydov extends the default OGC filter expressions with a new expression, **PropertyInList**, which allows you to use lists (of strings) in search queries.

One last goodie is the possibility to join searches on common attributes, for example the `pkey_boring` field, which denotes the borehole. This way you can take the boreholes for which a hydrostratigraphic interpretation is available and also query the lithological descriptions of those boreholes:

In [12]:
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch

ls = LithologischeBeschrijvingenSearch()
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
df_joined = pd.merge(dfhs, dfls.loc[:, ['pkey_boring','diepte_laag_van', 'diepte_laag_tot', 'beschrijving']],  
                     how='left', 
                     left_on=['pkey_boring','diepte_laag_van', 'diepte_laag_tot'], 
                     right_on = ['pkey_boring','diepte_laag_van', 'diepte_laag_tot']
                    )
df_joined.head()

[000/001] .
[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer | beschrijving |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 0.0 | 4.5 | 162 | |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 4.5 | 59.0 | 620 | sable gris quartzeux, avec grès gris quartzeux... |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 59.0 | 90.7 | 900 | argile grise finement sableuse |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 90.7 | 110.0 | 1013 | |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1932... | goed | 173252.0 | 179257.0 | 17.0 | 110.0 | 130.0 | 1014 | sable argileux gris, avec petits débris broyés... |
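
Some rows above have an empty `beschrijving`. One likely reason is that a left merge on `pkey_boring`, `diepte_laag_van` and `diepte_laag_tot` keeps every hydrostratigraphic layer but only finds a description when a lithological layer has exactly the same borehole and depth interval. The plain-Python sketch below mimics that behaviour. The records, identifiers and descriptions are made up for illustration, and unlike `pd.merge` this simplified `left_join` helper keeps a single match per key.

```python
# Hypothetical records; identifiers and descriptions are made up.
hydro_layers = [
    {'pkey_boring': 'boring-A', 'diepte_laag_van': 0.0,
     'diepte_laag_tot': 4.5, 'aquifer': 162},
    {'pkey_boring': 'boring-A', 'diepte_laag_van': 4.5,
     'diepte_laag_tot': 59.0, 'aquifer': 620},
]
litho_layers = [
    # Same borehole, but a different depth interval: no match expected.
    {'pkey_boring': 'boring-A', 'diepte_laag_van': 0.0,
     'diepte_laag_tot': 2.0, 'beschrijving': 'leem'},
    # Identical borehole and depth interval: this one matches.
    {'pkey_boring': 'boring-A', 'diepte_laag_van': 4.5,
     'diepte_laag_tot': 59.0, 'beschrijving': 'zand'},
]

KEYS = ('pkey_boring', 'diepte_laag_van', 'diepte_laag_tot')


def left_join(left, right, keys, column):
    """Add `column` from `right` to each row of `left`, or None if no match."""
    lookup = {tuple(row[k] for k in keys): row[column] for row in right}
    return [dict(row, **{column: lookup.get(tuple(row[k] for k in keys))})
            for row in left]


joined = left_join(hydro_layers, litho_layers, KEYS, 'beschrijving')
for row in joined:
    print(row['diepte_laag_van'], row['diepte_laag_tot'], row['beschrijving'])
```

If the layer boundaries of the two datasets rarely coincide, merging on `pkey_boring` alone and comparing depth intervals afterwards may be a more useful starting point.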


#### Location

One can also query on **location**, using the location objects and spatial filters from the pydov.util.location module. For example, to request all hydrostratigraphic interpretations in a given bounding **box**:

In [13]:
from pydov.util.location import Within, Box
location = Within(Box(170000, 171000, 172000, 173000))
df = hs.search(location=location)
df.head()

[000/001] .
[000/005] ccccc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1946... | goed | 170355.0 | 171118.0 | 24.4 | 0.0 | 9.0 | 163 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1946... | goed | 170355.0 | 171118.0 | 24.4 | 9.0 | 10.5 | 920 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170853.0 | 172888.0 | 44.0 | 0.0 | 5.0 | 153 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170853.0 | 172888.0 | 44.0 | 5.0 | 42.0 | 620 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/2016... | goed | 170853.0 | 172888.0 | 44.0 | 42.0 | 43.0 | 922 |


Alternatively, you can define a **Point** or a **GML document** for the spatial query as is described in the [docs](https://pydov.readthedocs.io/en/stable/query_location.html). For example, if you are interested in a site you can define the point with a search radius of for example 500 meters like this:

In [14]:
from pydov.util.location import WithinDistance, Point
location = WithinDistance(
            Point(171500, 172500), 
            500, 
            distance_unit='meter'
            )
df = hs.search(location=location)
df.head()

[000/001] .
[000/001] c


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 0.0 | 0.6 | 110 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 0.6 | 14.4 | 100 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 14.4 | 95.1 | 0 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 95.1 | 118.9 | 1100 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 118.9 | 124.5 | 1300 |
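
To make the two spatial filters more concrete, the sketch below re-checks locally what `Within(Box(...))` and `WithinDistance(Point(...), 500)` select, using the coordinates returned in the outputs above. It is a planar approximation written for illustration, not how the DOV services evaluate the filter. It assumes that the `Box` arguments are ordered as minx, miny, maxx, maxy and that all coordinates are in meters (pydov's default coordinate system is assumed here to be Belgian Lambert 72, EPSG:31370). The helpers `in_box` and `within_distance` are hypothetical names.

```python
import math


def in_box(x, y, minx, miny, maxx, maxy):
    """Return True if the point (x, y) lies inside the bounding box."""
    return minx <= x <= maxx and miny <= y <= maxy


def within_distance(x, y, cx, cy, radius):
    """Return True if (x, y) lies within `radius` of the centre (cx, cy)."""
    return math.hypot(x - cx, y - cy) <= radius


# First borehole returned by the bounding box query.
print(in_box(170355.0, 171118.0, 170000, 171000, 172000, 173000))

# Borehole returned by the distance query around Point(171500, 172500).
borehole_x, borehole_y = 171548.77, 172680.92
distance = math.hypot(borehole_x - 171500, borehole_y - 172500)
print(round(distance, 1), 'm from the search point')
print(within_distance(borehole_x, borehole_y, 171500, 172500, 500))
```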


#### Groundwater head data

Querying the groundwater head data follows the same workflow as mentioned above for the interpretation of borehole data with the instantiation of a search object and the subsequent query with selection on attribute or location properties. 

In [15]:
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
gws = GrondwaterFilterSearch()
gws

For example, query all data in a bounding box:

In [16]:
location = Within(Box(170000, 171000, 173000, 174000))
df = gws.search(location=location)
df.head()

[000/001] .
[000/055] cccccccccccccccccccccccccccccccccccccccccccccccccc
[050/055] ccccc


| | pkey_filter | pkey_grondwaterlocatie | gw_id | filternummer | filtertype | x | y | start_grondwaterlocatie_mtaw | mv_mtaw | gemeente | ... | regime | diepte_onderkant_filter | lengte_filter | datum | tijdstip | peil_mtaw | betrouwbaarheid | methode | filterstatus | filtertoestand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/filter/2000... | https://www.dov.vlaanderen.be/data/put/2018-00... | DYLP162 | 1 | peilfilter | 170716.0 | 172051.0 | 24.48 | 24.48 | Leuven | ... | onbekend | 2.18 | 0.5 | 2000-09-01 | | 23.48 | onbekend | peillint | onbekend | 1.0 |
| 1 | https://www.dov.vlaanderen.be/data/filter/2000... | https://www.dov.vlaanderen.be/data/put/2018-00... | DYLP162 | 1 | peilfilter | 170716.0 | 172051.0 | 24.48 | 24.48 | Leuven | ... | onbekend | 2.18 | 0.5 | 2000-09-15 | | 23.65 | onbekend | peillint | onbekend | 1.0 |
| 2 | https://www.dov.vlaanderen.be/data/filter/2000... | https://www.dov.vlaanderen.be/data/put/2018-00... | DYLP162 | 1 | peilfilter | 170716.0 | 172051.0 | 24.48 | 24.48 | Leuven | ... | onbekend | 2.18 | 0.5 | 2000-09-27 | | 23.67 | onbekend | peillint | onbekend | 1.0 |
| 3 | https://www.dov.vlaanderen.be/data/filter/2000... | https://www.dov.vlaanderen.be/data/put/2018-00... | DYLP162 | 1 | peilfilter | 170716.0 | 172051.0 | 24.48 | 24.48 | Leuven | ... | onbekend | 2.18 | 0.5 | 2000-10-09 | | 23.73 | onbekend | peillint | onbekend | 1.0 |
| 4 | https://www.dov.vlaanderen.be/data/filter/2000... | https://www.dov.vlaanderen.be/data/put/2018-00... | DYLP162 | 1 | peilfilter | 170716.0 | 172051.0 | 24.48 | 24.48 | Leuven | ... | onbekend | 2.18 | 0.5 | 2000-10-18 | | 23.77 | onbekend | peillint | onbekend | 1.0 |


One important difference is the presence of time-related data, more specifically the attributes `datum` and `tijdstip`. These can be combined into a `datetime` object that can be used in the subsequent manipulation of the pandas DataFrame. Make sure to **remove** the records without a valid `datum` and to **fill** the empty `tijdstip` fields with a default timestamp.

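In [17]:
import pandas as pd
df.reset_index(inplace=True)
# keep only records with a valid date; copy to avoid chained-assignment warnings
df = df.loc[~df.datum.isna()].copy()
# fill missing times with a default timestamp
df['tijdstip'] = df.tijdstip.fillna('00:00:00')
# combine date and time into a single datetime column
df['tijd'] = pd.to_datetime(df.datum.astype(str) + ' ' + df.tijdstip.astype(str))
df.tijd.head()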

0   2000-09-01
1   2000-09-15
2   2000-09-27
3   2000-10-09
4   2000-10-18
Name: tijd, dtype: datetime64[ns]
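
The rule applied above (drop records without a date, default the time to midnight) can also be written for single values with the standard library. This is only a sketch of the logic: the helper `to_timestamp` is a hypothetical name, and it assumes the inputs are plain strings or `None` rather than pandas missing values.

```python
from datetime import datetime


def to_timestamp(datum, tijdstip=None, default_time='00:00:00'):
    """Combine a date string and an optional time string into a datetime.

    Returns None when the date is missing, which corresponds to dropping
    the record. A missing time is replaced by `default_time`.
    """
    if not datum:
        return None
    time_part = tijdstip or default_time
    return datetime.strptime(datum + ' ' + time_part, '%Y-%m-%d %H:%M:%S')


print(to_timestamp('2000-09-01'))
print(to_timestamp('2000-09-01', '13:30:00'))
print(to_timestamp(None, '13:30:00'))
```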

More examples of timeseries processing and analysis are available in the pydov notebooks.

### Data cache

Notice the `c` characters in the progress bar while the data was loading? They mean the data was loaded from your local cache instead of being downloaded, as it was already part of an earlier data request. See the [caching documentation](https://pydov.readthedocs.io/en/stable/caching.html#caching) for more in-depth information about the default directory, how to change and/or clean it, and even how to create a custom cache format.
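
The idea behind the progress markers can be sketched in a few lines. This is a conceptual illustration only, not pydov's implementation: it uses an in-memory dictionary and a fake download function, and it assumes that `.` marks a fresh download while `c` marks a cache hit. The names `fetch` and `fake_download` are hypothetical.

```python
def fetch(url, cache, download):
    """Return (data, marker): 'c' for a cache hit, '.' for a fresh download."""
    if url in cache:
        return cache[url], 'c'
    data = download(url)
    cache[url] = data  # remember the result for later requests
    return data, '.'


def fake_download(url):
    """Stand-in for an HTTP request; returns a dummy payload."""
    return '<xml for %s>' % url


cache = {}
urls = ['doc-1', 'doc-2', 'doc-1']  # the third request repeats the first
markers = ''.join(fetch(url, cache, fake_download)[1] for url in urls)
print(markers)  # prints: ..c
```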

## Putting it all together

In [18]:
# imports
import pandas as pd
import pydov
from pydov.util.location import WithinDistance, Point
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
from owslib.fes2 import PropertyIsEqualTo

# define search objects
hs = HydrogeologischeStratigrafieSearch()
ls = LithologischeBeschrijvingenSearch()
gws = GrondwaterFilterSearch()

# search hydrostratigraphic interpretations based on location
location = WithinDistance(
    Point(171500, 172500), 
    500,
    distance_unit='meter'
    )
dfhs = hs.search(location=location)

# join the lithological descriptions
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
df_joined = pd.merge(dfhs, dfls.loc[:, ['pkey_boring','diepte_laag_van', 'diepte_laag_tot', 'beschrijving']],  
                     how='left', 
                     left_on=['pkey_boring','diepte_laag_van', 'diepte_laag_tot'], 
                     right_on = ['pkey_boring','diepte_laag_van', 'diepte_laag_tot']
                    )

# search the groundwater head data in the neighbourhood
dfgw = gws.search(location=location)

# create datetime objects for further processing
dfgw.reset_index(inplace=True)
# keep only records with a valid date; copy to avoid chained-assignment warnings
dfgw = dfgw.loc[~dfgw.datum.isna()].copy()
dfgw['tijdstip'] = dfgw.tijdstip.fillna('00:00:00')
dfgw['tijd'] = pd.to_datetime(dfgw.datum.astype(str) + ' ' + dfgw.tijdstip.astype(str))

[000/001] .
[000/001] c
[000/001] .
[000/002] cc
[000/001] .
[000/010] cccccccccc


In [19]:
df_joined.head()

| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | start_interpretatie_mtaw | diepte_laag_van | diepte_laag_tot | aquifer | beschrijving |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 0.0 | 0.6 | 110 | aangevulde grond |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 0.6 | 14.4 | 100 | |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 14.4 | 95.1 | 0 | Brusseliaan - Ieperiaan en Landeniaan |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 95.1 | 118.9 | 1100 | Krijt |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | goed | 171548.77 | 172680.92 | 26.39 | 118.9 | 124.5 | 1300 | Primair |


In [20]:
dfgw.head()

| | index | pkey_filter | pkey_grondwaterlocatie | gw_id | filternummer | filtertype | x | y | start_grondwaterlocatie_mtaw | mv_mtaw | ... | diepte_onderkant_filter | lengte_filter | datum | tijdstip | peil_mtaw | betrouwbaarheid | methode | filterstatus | filtertoestand | tijd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | https://www.dov.vlaanderen.be/data/filter/1974... | https://www.dov.vlaanderen.be/data/put/2017-00... | 2-0005 | 1 | peilfilter | 171548.77 | 172680.92 | 26.39 | 26.39 | ... | 118.95 | 31.5 | 1984-01-26 | 00:00:00 | 12.53 | goed | peillint | in rust | 1.0 | 1984-01-26 |
| 2 | 2 | https://www.dov.vlaanderen.be/data/filter/1974... | https://www.dov.vlaanderen.be/data/put/2017-00... | 2-0005 | 1 | peilfilter | 171548.77 | 172680.92 | 26.39 | 26.39 | ... | 118.95 | 31.5 | 1984-09-30 | 00:00:00 | 12.53 | goed | peillint | in rust | 1.0 | 1984-09-30 |
| 3 | 3 | https://www.dov.vlaanderen.be/data/filter/1974... | https://www.dov.vlaanderen.be/data/put/2017-00... | 2-0005 | 1 | peilfilter | 171548.77 | 172680.92 | 26.39 | 26.39 | ... | 118.95 | 31.5 | 1985-11-01 | 00:00:00 | 11.57 | goed | peillint | in rust | 1.0 | 1985-11-01 |
| 4 | 4 | https://www.dov.vlaanderen.be/data/filter/1974... | https://www.dov.vlaanderen.be/data/put/2017-00... | 2-0005 | 1 | peilfilter | 171548.77 | 172680.92 | 26.39 | 26.39 | ... | 118.95 | 31.5 | 1986-04-04 | 00:00:00 | 11.28 | goed | peillint | in rust | 1.0 | 1986-04-04 |
| 5 | 5 | https://www.dov.vlaanderen.be/data/filter/1974... | https://www.dov.vlaanderen.be/data/put/2017-00... | 2-0005 | 1 | peilfilter | 171548.77 | 172680.92 | 26.39 | 26.39 | ... | 118.95 | 31.5 | 1986-10-01 | 00:00:00 | 10.86 | goed | peillint | in rust | 1.0 | 1986-10-01 |
