# Introductory tutorial

[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/DOV-Vlaanderen/pydov/master?filepath=docs%2Fnotebooks%2FIntroductory_tutorial.ipynb)

pydov provides machine access to the data that can be visualized with the [DOV viewer](https://www.dov.vlaanderen.be/portaal/?module=verkenner).

All pydov functionality relies on the existing DOV webservices. An in-depth overview of the available services and endpoints is provided on the [accessing DOV data page](https://pydov.readthedocs.io/en/latest/endpoints.html#endpoints). To retrieve data, pydov uses a combination of the available WFS services and the XML representation of the core DOV data.

As pydov relies on the XML data returned by the existing DOV webservices, downloading DOV data with pydov is governed by the same [disclaimer](https://www.dov.vlaanderen.be/page/disclaimer) that applies to the other DOV services. Be sure to consult it when using DOV data with pydov!

pydov interfaces with a database hosted by the Flemish government. Therefore, some of the API syntax as well as the descriptions provided by the backend are in Dutch.

## Use case: gather data for a hydrogeological model

In [2]:
%matplotlib inline
import inspect, sys

In [3]:
# check pydov path
import pydov

### pydov: general info

To get started with pydov you should first determine which information you want to search for. DOV provides a lot of different datasets about soil, subsoil and groundwater of Flanders, some of which can be queried using pydov. See https://pydov.readthedocs.io/en/latest/quickstart.html for the supported datasets.

In this case, to start with a hydrogeological model, we are interested in the hydrostratigraphic interpretation of the borehole data and the groundwater level. These datasets can be found with the following search objects:
- Hydrostratigraphic interpretation: https://pydov.readthedocs.io/en/latest/reference.html#pydov.search.interpretaties.HydrogeologischeStratigrafieSearch
- Groundwater level: https://pydov.readthedocs.io/en/latest/reference.html#pydov.search.grondwaterfilter.GrondwaterFilterSearch

Each dataset can be queried using its own search object. While the search objects differ, the workflow is the same for each dataset. The relevant classes can be imported from the pydov.search package. For example, to query the dataset with hydrogeological interpretations of borehole data:

In [4]:
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
hydrosearch = HydrogeologischeStratigrafieSearch()

If you would like some more information, you can query the search object. Since pydov interfaces with a database from Flemish government agencies, the descriptions are in Dutch:

In [5]:
hydrosearch.get_description()

'De hydrostratigrafie geeft, op basis van de (gecodeerde) lithologie, een indeling weer naar de al dan niet watervoerende eigenschappen van een bepaald beschreven diepte-interval. Deze interpretatie respecteert de lithostratigrafie van het Tertiair, maar deelt deze anders in. De hiervoor gebruikte standaard is de Hydrogeologische Codering van de Ondergrond van Vlaanderen (HCOV). Deze kan beschouwd worden als de officiele hydrogeologische codering voor het Vlaams Gewest.'

The different fields that are available for objects of the 'Hydrogeologische Stratigrafie' datatype can be requested with the get_fields() method:

In [6]:
fields = hydrosearch.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])

pkey_interpretatie
Type_proef
Proefnummer
pkey_boring
x
y
Z_mTAW
diepte_tot_m
gemeente
Auteurs
Datum
Opdrachten
betrouwbaarheid_interpretatie
Geldig_van
Geldig_tot
diepte_laag_van
diepte_laag_tot
aquifer


You can get more information about a field by requesting it from the fields dictionary. Each field is described by the following keys:

- name: name of the field
- definition: definition of this field
- cost: currently this is either 1 or 10, depending on the datasource of the field. It is an indication of the expected time it will take to retrieve this field in the output dataframe.
- notnull: whether the field is mandatory or not
- query: whether the field can be used in an attribute query
- type: datatype of the values of this field

In [7]:
fields['pkey_interpretatie']

{'name': 'pkey_interpretatie',
 'definition': "URL die verwijst naar de gegevens van deze hydrogeologische stratigrafie op de website. Voeg '.xml' toe om een XML voorstelling van deze gegevens te verkrijgen.",
 'type': 'string',
 'notnull': False,
 'query': True,
 'cost': 1}
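
As a minimal sketch of how the `cost` key could be put to use, the snippet below groups field names by their cost. The `fields` dictionary here is a hand-written stand-in that only contains three entries with the costs reported in this tutorial, not the real output of `get_fields()`, and `split_by_cost` is a hypothetical helper name.

```python
# Hand-written stand-in for the dictionary returned by get_fields();
# only the keys needed for this illustration are included.
fields = {
    'pkey_interpretatie': {'name': 'pkey_interpretatie', 'cost': 1},
    'diepte_tot_m': {'name': 'diepte_tot_m', 'cost': 1},
    'diepte_laag_tot': {'name': 'diepte_laag_tot', 'cost': 10},
}


def split_by_cost(fields):
    """Group field names by their 'cost' value (hypothetical helper)."""
    groups = {}
    for field in fields.values():
        groups.setdefault(field['cost'], []).append(field['name'])
    return groups


groups = split_by_cost(fields)
print('fields with cost 1:', groups[1])
print('fields with cost 10:', groups[10])
```

Applied to the full dictionary of a search object, the same grouping would give a quick overview of which fields are likely to slow down a search.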

The fields *pkey_interpretatie* and *pkey_boring* are important identifiers. In this case, *pkey_interpretatie* is the unique identifier of the interpretation and is also the **permanent URL** where the data can be viewed (of the form https://www.dov.vlaanderen.be/data/interpretatie/...). Append *.xml* to that URL to retrieve the XML representation of the data.

The *pkey_boring* is the identifier of the borehole from which this interpretation was made. Like *pkey_interpretatie*, it is also a **permanent URL** (of the form https://www.dov.vlaanderen.be/data/boring/...).
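
Turning a permanent URL into the address of its XML representation can be sketched as follows. The helper `xml_url` is hypothetical and the identifier is a placeholder, not a real DOV record.

```python
def xml_url(pkey):
    """Return the XML address for a DOV permanent URL (hypothetical helper).

    Appends '.xml' unless the URL already ends with it.
    """
    return pkey if pkey.endswith('.xml') else pkey + '.xml'


# Placeholder identifier, not a real DOV record.
pkey = 'https://www.dov.vlaanderen.be/data/interpretatie/EXAMPLE-ID'
print(xml_url(pkey))
```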

Optionally, if the values of a field are restricted to a specific domain, the possible values are listed under the *values* key:

In [8]:
fields['aquifer']['values']

{'0000': 'Onbekend',
 '0100': 'Quartaire aquifersystemen',
 '0110': 'Ophogingen',
 '0120': 'Duinen',
 '0130': 'Polderafzettingen',
 '0131': 'Kleiige polderafzettingen van de kustvlakte',
 '0132': 'Kleiige polderafzettingen van het Meetjesland',
 '0133': 'Kleiige polderafzettingen van Waasland-Antwerpen',
 '0134': 'Zandige kreekruggen',
 '0135': 'Veen-kleiige poelgronden',
 '0140': 'Alluviale deklagen',
 '0150': 'Deklagen',
 '0151': 'Zandige deklagen',
 '0152': 'Zand-lemige deklagen',
 '0153': 'Lemige deklagen',
 '0154': 'Kleiige deklagen',
 '0160': 'Pleistocene afzettingen',
 '0161': 'Pleistoceen van de kustvlakte',
 '0162': 'Pleistoceen van de Vlaamse Vallei',
 '0163': 'Pleistoceen van de riviervalleien',
 '0170': 'Maas- en Rijnafzettingen',
 '0171': 'Afzettingen Hoofdterras',
 '0172': 'Afzettingen Tussenterassen',
 '0173': 'Afzettingen Maasvlakte',
 '0200': 'Kempens Aquifersysteem',
 '0210': 'Kiezeloolietformatie ten noorden van Feldbiss',
 '0211': 'Zandige eenheid boven de Brunssum 
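
The codes in this dictionary can be used to translate the *aquifer* column of a search result into a readable description. The sketch below assumes that numbers such as 140 or 140.0 in the result tables of this tutorial are the four-digit codes with their leading zeros dropped; that is an assumption about the display, not something pydov documents here. Only a subset of the codes listed above is included, and `aquifer_label` is a hypothetical helper.

```python
# Subset of the codes listed above.
aquifer_names = {
    '0000': 'Onbekend',
    '0100': 'Quartaire aquifersystemen',
    '0110': 'Ophogingen',
    '0133': 'Kleiige polderafzettingen van Waasland-Antwerpen',
    '0140': 'Alluviale deklagen',
}


def aquifer_label(code):
    """Look up the description of an aquifer code (hypothetical helper).

    Accepts '0140', 140 or 140.0. Missing values (None or NaN) and
    codes that are not in the dictionary yield None.
    """
    if code is None or code != code:  # NaN is the only value unequal to itself
        return None
    key = str(int(float(code))).zfill(4)  # restore the leading zeros
    return aquifer_names.get(key)


for code in ('0140', 110, 133.0, 0.0):
    print(code, '->', aquifer_label(code))
```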

### Query the data with pydov

#### Attributes

The data can be queried on **attributes**, **location** or both. To query on attributes, the OGC filter functions from OWSLib are used:

In [9]:
# list available query methods
import owslib.fes
methods = [name for name, _ in inspect.getmembers(owslib.fes, inspect.isclass)
           if 'Property' in name]
print(*methods, sep="\n")

PropertyIsBetween
PropertyIsEqualTo
PropertyIsGreaterThan
PropertyIsGreaterThanOrEqualTo
PropertyIsLessThan
PropertyIsLessThanOrEqualTo
PropertyIsLike
PropertyIsNotEqualTo
PropertyIsNull
SortProperty


If you are, for example, interested in all the hydrostratigraphic interpretations in the city of Antwerp, you can compose the query as follows (note that the values are in Dutch):

In [10]:
from owslib.fes import PropertyIsEqualTo
query = PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Antwerpen')
df = hydrosearch.search(query=query)
df.head()

[000/191] ..................................................
[050/191] ..................................................
[100/191] ..................................................
[150/191] .........................................


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1956... | goed | 154869.0 | 217060.0 | 0.0 | 1.0 | 140 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1956... | goed | 154869.0 | 217060.0 | 1.0 | 8.0 | 233 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1956... | goed | 154869.0 | 217060.0 | 8.0 | 20.0 | 241 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1979... | goed | 144134.0 | 226051.0 | 0.0 | 7.5 | 110 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1979... | goed | 144134.0 | 226051.0 | 7.5 | 11.0 | 133 |


This yielded 191 interpretations from 191 or fewer boreholes. There can be fewer than 191 boreholes because multiple interpretations can be made of a single borehole.
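
To check how many distinct interpretations and boreholes a result contains, you could count the unique identifiers in the dataframe. The sketch below uses a toy dataframe with placeholder identifiers instead of a real search result. Each row stands for one layer of an interpretation, as in the output above.

```python
import pandas as pd

# Toy stand-in for a search result; the identifiers are placeholders,
# not real DOV records. Each row is one layer of an interpretation.
df = pd.DataFrame({
    'pkey_interpretatie': ['interpretatie/a', 'interpretatie/a',
                           'interpretatie/b', 'interpretatie/c'],
    'pkey_boring': ['boring/1', 'boring/1', 'boring/1', 'boring/2'],
})

n_interpretations = df['pkey_interpretatie'].nunique()
n_boreholes = df['pkey_boring'].nunique()
print(n_interpretations, 'interpretations from', n_boreholes, 'boreholes')
```

In this toy example, interpretations a and b were both made of the same borehole, so there are fewer boreholes than interpretations.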

If you would like to narrow the search down, for example to interpretations that reach a depth of more than 200 meters, you can combine filters in the search using the **logical operators And, Or** provided by OWSLib:

In [15]:
from owslib.fes import And
from owslib.fes import PropertyIsGreaterThan
query = And([
    PropertyIsEqualTo(
            propertyname='gemeente',
            literal='Antwerpen'),
    PropertyIsGreaterThan(
            propertyname='diepte_tot_m',
            literal='200')
    ])
df = hydrosearch.search(query=query)
df.head()

[000/003] ccc


(                                  pkey_interpretatie  \
 0  https://www.dov.vlaanderen.be/data/interpretat...   
 1  https://www.dov.vlaanderen.be/data/interpretat...   
 2  https://www.dov.vlaanderen.be/data/interpretat...   
 3  https://www.dov.vlaanderen.be/data/interpretat...   
 4  https://www.dov.vlaanderen.be/data/interpretat...   
 
                                          pkey_boring  \
 0  https://www.dov.vlaanderen.be/data/boring/1994...   
 1  https://www.dov.vlaanderen.be/data/boring/1994...   
 2  https://www.dov.vlaanderen.be/data/boring/1994...   
 3  https://www.dov.vlaanderen.be/data/boring/1994...   
 4  https://www.dov.vlaanderen.be/data/boring/1994...   
 
   betrouwbaarheid_interpretatie          x             y  diepte_laag_van  \
 0                          goed  155567.25  212059.53125              0.0   
 1                          goed  155567.25  212059.53125              2.0   
 2                          goed  155567.25  212059.53125             35.0   


Mind the difference between the attributes *diepte_tot_m* and *diepte_laag_...*. The former is defined in the WFS service and can be used as an attribute in the query. The latter attributes are defined in the linked XML document, so their values are only available after the document has been retrieved from the DOV webservice. Attributes with a *cost* of 10 are not available in the initial query and can only be used in a subsequent filtering of the pandas DataFrame.

In [20]:
print('Cost of search\n', 
      'WFS attribute (diepte_tot_m): ' + str(fields['diepte_tot_m']['cost']) + '\n', 
      'XML attribute (diepte_laag_tot): '+ str(fields['diepte_laag_tot']['cost']))

Cost of search
 WFS attribute (diepte_tot_m): 1
 XML attribute (diepte_laag_tot): 10
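
Filtering on an XML attribute after the search could look like the sketch below. It rebuilds the depth columns of the first five rows of the Antwerp result shown earlier by hand, so it runs without contacting the DOV webservice. The threshold of 10 meters is an arbitrary example value.

```python
import pandas as pd

# Depth columns of the first five rows of the Antwerp result shown earlier.
df = pd.DataFrame({
    'diepte_laag_van': [0.0, 1.0, 8.0, 0.0, 7.5],
    'diepte_laag_tot': [1.0, 8.0, 20.0, 7.5, 11.0],
})

# diepte_laag_tot is an XML attribute (cost 10), so it cannot be part of
# the WFS query; filter the resulting dataframe instead.
deep_layers = df[df['diepte_laag_tot'] > 10]
print(deep_layers)
```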


More information on querying attribute properties is given in the [docs](https://pydov.readthedocs.io/en/latest/query_attribute.html). Worth mentioning is querying with lists: pydov extends the default OGC filter expressions with a new expression, **PropertyInList**, that allows you to use lists (of strings) in search queries.
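
Conceptually, a list query can be thought of as an Or-combination of PropertyIsEqualTo expressions, one per value. The sketch below builds such a filter as OGC filter XML using only the standard library, to show what the combined expression looks like. It is an illustration under that assumption, not pydov's implementation. The helpers `equals` and `in_list` are hypothetical, and 'Gent' is just an example value.

```python
import xml.etree.ElementTree as ET

OGC = 'http://www.opengis.net/ogc'
ET.register_namespace('ogc', OGC)


def equals(propertyname, literal):
    """Build an ogc:PropertyIsEqualTo element (hypothetical helper)."""
    node = ET.Element('{%s}PropertyIsEqualTo' % OGC)
    ET.SubElement(node, '{%s}PropertyName' % OGC).text = propertyname
    ET.SubElement(node, '{%s}Literal' % OGC).text = str(literal)
    return node


def in_list(propertyname, values):
    """Combine one equality test per value with ogc:Or (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError('at least one value is required')
    if len(values) == 1:
        # A single value needs no Or around it.
        return equals(propertyname, values[0])
    node = ET.Element('{%s}Or' % OGC)
    for value in values:
        node.append(equals(propertyname, value))
    return node


expr = in_list('gemeente', ['Antwerpen', 'Gent'])
print(ET.tostring(expr, encoding='unicode'))
```

With OWSLib, the same combination could be written by passing a list of PropertyIsEqualTo expressions to Or, in the same way And is used above.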

#### Location

One can also query on **location**, using the location objects and spatial filters from the pydov.util.location module. For example, to request all stratigraphic interpretations in a given bounding **box**:

In [22]:
from pydov.util.location import Within, Box
location = Within(Box(152000, 211000, 155000, 214000))
df = hydrosearch.search(location=location)
df.head()

[000/012] cccccccccccc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1979... | goed | 154720.0 | 212262.0 | 0.0 | 4.5 | 100 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1979... | goed | 154720.0 | 212262.0 | 4.5 | 34.0 | 254 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1979... | goed | 154720.0 | 212262.0 | 34.0 | 38.0 | 300 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1931... | goed | 152346.0 | 212957.0 | 0.0 | 5.0 | 133 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1931... | goed | 152346.0 | 212957.0 | 5.0 | 11.0 | 230 |
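
As a plausibility check, you can verify by hand that the returned coordinates fall inside the requested box. The sketch below assumes the four Box arguments are given in the order minx, miny, maxx, maxy and that the *x* and *y* columns use the same coordinate system as the box. `in_box` is a hypothetical helper, and whether points exactly on the edge match is decided by the webservice, not by this sketch.

```python
def in_box(x, y, minx, miny, maxx, maxy):
    """True if (x, y) lies inside or on the edge of the box (illustrative)."""
    return minx <= x <= maxx and miny <= y <= maxy


box = (152000, 211000, 155000, 214000)

# Coordinates of the two boreholes in the result rows above.
points = [(154720.0, 212262.0), (152346.0, 212957.0)]
for x, y in points:
    print((x, y), in_box(x, y, *box))

# A borehole from the earlier Antwerp result lies north of the box.
print((154869.0, 217060.0), in_box(154869.0, 217060.0, *box))
```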


Alternatively, you can define a **Point** or a **GML document** for the spatial query, as described in the [docs](https://pydov.readthedocs.io/en/latest/query_location.html). For example, if you are interested in a specific site, you can define a point with a search radius of, say, 500 meters like this:

In [26]:
from pydov.util.location import WithinDistance, Point
location = WithinDistance(Point(154000, 212000), 500, distance_unit='meter')
df = hydrosearch.search(location=location)
df.head()

[000/002] cc


| | pkey_interpretatie | pkey_boring | betrouwbaarheid_interpretatie | x | y | diepte_laag_van | diepte_laag_tot | aquifer |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1997... | goed | 153697.0 | 211805.0 | 0.0 | 1.5 | 0.0 |
| 1 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1997... | goed | 153697.0 | 211805.0 | 1.5 | 5.5 | 140.0 |
| 2 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1997... | goed | 153697.0 | 211805.0 | 5.5 | 30.0 | 254.0 |
| 3 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1997... | goed | 153697.0 | 211805.0 | 30.0 | 40.0 | 300.0 |
| 4 | https://www.dov.vlaanderen.be/data/interpretat... | https://www.dov.vlaanderen.be/data/boring/1974... | onbekend | 153563.0 | 211945.0 | | | |
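
To see how far the returned boreholes are from the search point, you can compute the straight-line distance from the *x* and *y* columns. This sketch assumes the coordinates are in a projected coordinate system with meters as unit, so that a plain Euclidean distance is meaningful.

```python
import math

center = (154000, 212000)
radius = 500  # meters, as in the query above

# Coordinates of the two boreholes in the result rows above.
boreholes = [(153697.0, 211805.0), (153563.0, 211945.0)]

distances = [math.hypot(x - center[0], y - center[1]) for x, y in boreholes]
for point, distance in zip(boreholes, distances):
    print(point, round(distance, 1), distance <= radius)
```

Both boreholes lie within the 500 meter radius, which is consistent with the result of the search.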
