# Welcome to the Datenguide Python Package

This notebook explains the functionality of the package and demonstrates it with examples.

### Topics

- Import
- Get region IDs
- Get statistic IDs
- Get the data
    - for single regions
    - for multiple regions

## 1. Import

**Import the helper functions 'get_all_regions' and 'get_statistics'**

**Import the module Query for the main functionality**

In [None]:
# ONLY FOR TESTING LOCAL PACKAGE
# %cd ..

from datenguidepy.query_helper import get_all_regions, get_statistics
from datenguidepy import Query

**Import pandas and matplotlib for the usual display of data as tables and graphs**

In [None]:
import pandas as pd
import matplotlib
%matplotlib inline

pd.set_option('display.max_colwidth', 150)

## 2. Get Region IDs
### How to get the ID of the region I want to query

Regionalstatistik - the database behind Datenguide - provides data for Germany at several levels of regional granularity.

nuts:

        1 – Bundesländer
        2 – Regierungsbezirke / statistische Regionen
        3 – Kreise / kreisfreie Städte

lau:

        1 – Verwaltungsgemeinschaften
        2 – Gemeinden

The function `get_all_regions()` returns the IDs of all regions on all levels.

In [None]:
# get_all_regions returns all ids
get_all_regions()

To find a specific ID, filter the result with the common pandas function `query()`:


In [None]:
# e.g. get all Bundesländer
get_all_regions().query("level == 'nuts1'")

In [None]:
# e.g. get the ID of Havelland
get_all_regions().query("name == 'Havelland'")
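
The same `query()` pattern also works with combined conditions and with Python variables. The following is a minimal sketch on a small hand-made table, so it runs without network access. The table layout (IDs as index, columns `name` and `level`) is an assumption modelled on the queries above, not the guaranteed output of `get_all_regions()`.

```python
import pandas as pd

# Illustrative stand-in for get_all_regions(); the layout is an assumption.
regions = pd.DataFrame(
    {
        "name": ["Berlin", "Brandenburg", "Havelland"],
        "level": ["nuts1", "nuts1", "nuts3"],
    },
    index=pd.Index(["11", "12", "12063"], name="id"),
)

# Combine two conditions in a single query string
havelland = regions.query("level == 'nuts3' and name == 'Havelland'")

# Reference a Python variable inside the query with the @ prefix
wanted = ["Berlin", "Havelland"]
selection = regions.query("name in @wanted")

# The region IDs are the index labels of the matching rows
region_ids = selection.index.tolist()
print(region_ids)
```

The resulting list of IDs can then be reused wherever a list of region IDs is expected.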

## 3. Get statistic IDs
### How to find statistics

In [None]:
# get all statistics
get_statistics()

If you already know the statistic ID you are looking for - perfect.

Otherwise you can use the pandas `query()` function to search e.g. for specific terms.

In [None]:
# find out the name of the desired statistic about birth
get_statistics().query('long_description.str.contains("Statistik der Geburten")', engine='python')
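
As an alternative to a query string, the same text search can be written as a boolean mask, which also makes it easy to ignore case and to skip missing descriptions. This is a sketch on made-up rows: only the column name `long_description` and the ID `BEV001` come from this notebook, while the description texts and the second entry are hypothetical.

```python
import pandas as pd

# Illustrative stand-in for get_statistics(); texts are placeholders,
# and "XYZ001" is a hypothetical entry without a description.
stats = pd.DataFrame(
    {
        "long_description": [
            "Placeholder text mentioning the Statistik der Geburten",
            None,
        ],
    },
    index=["BEV001", "XYZ001"],
)

# case=False ignores upper/lower case, na=False treats missing text as "no match"
mask = stats["long_description"].str.contains("geburten", case=False, na=False)
matches = stats[mask]
print(matches.index.tolist())
```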

## 4. Get the data

The top-level element is the `Query`. To each query you add fields (usually statistics / measures) that you want to get information on.

A query can be run either on a single region or on multiple regions (e.g. all Bundesländer).

### Single Region

For example, to get the number of births in Berlin for the past years:

In [None]:
# create a query for the region 11
query = Query.region('11')

In [None]:
# add a field (the statistic) to the query
field_births = query.add_field('BEV001')

In [None]:
# get the data of this query
query.results().head()

To get the short description in the result data frame instead of the cryptic ID (e.g. "Lebend Geborene" instead of BEV001), pass the argument `verbose_statistics=True` to `results()`:

In [None]:
query.results(verbose_statistics=True).head()

So far the result only contains the number of births per year and the source of the data (year, value and source are default fields).
The statistic holds more information than that, though.

Let's look at the metadata of the statistic:

In [None]:
# get information on the field
field_births.get_info()

The arguments tell us what we can use for filtering (e.g. only data on baby girls (female)).

The fields tell us which additional information can be displayed in our results.

In [None]:
# add filter
field_births.add_args({'GES': 'GESW'})

In [None]:
# only about half as many births are returned now, because only female babies are queried
query.results().head()

In [None]:
# add the field NAT (nationality) to the results
field_births.add_field('NAT')

**CAREFUL**: By default, the values of an added field (e.g. nationality) are returned only as a total across all of its categories. If the field "NAT" is added without a matching argument "NAT", the column therefore contains only "None".

To get a separate row for each possible value, add the argument with the value "ALL":
(the rows with value "None" then hold the aggregate over all options)

In [None]:
field_births.add_args({'NAT': 'ALL'})

In [None]:
query.results().head()
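
Because total rows and category rows end up in the same data frame, it is often useful to separate them before further analysis. The sketch below does this on a hand-made frame. The numbers are invented, the column names are assumed to match the result above, and the enum value `NATD` is assumed as the counterpart of `NATA`.

```python
import pandas as pd

# Invented example data shaped like a result with the NAT field and NAT='ALL'
results = pd.DataFrame(
    {
        "year": [2017, 2017, 2017, 2018, 2018, 2018],
        "NAT": [None, "NATA", "NATD", None, "NATA", "NATD"],
        "BEV001": [100, 30, 70, 110, 35, 75],
    }
)

# Rows without a NAT value hold the aggregate over all nationalities
totals = results[results["NAT"].isna()]

# Rows with a NAT value hold the breakdown per nationality
breakdown = results[results["NAT"].notna()]

# In this invented data the categories add up to the yearly totals
summed = breakdown.groupby("year")["BEV001"].sum()
print(summed.to_dict())
```

Keeping the two groups apart avoids double counting when values are summed or plotted.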

To display the short description of the enum values instead of the cryptic IDs (e.g. "Ausländer(innen)" instead of NATA), pass the argument `verbose_enums=True` to `results()`:

In [None]:
query.results(verbose_enums=True).head()

### Multiple Regions

To display data for several individual regions, pass a list of region IDs:

In [None]:
query_multiple = Query.region(['01', '02'])
query_multiple.add_field('BEV001')
query_multiple.results().sort_values('year').head()

To display data for e.g. all 'Bundesländer' or for all regions within a Bundesland, use the function `all_regions()`. The selection can be narrowed down in three ways:

- specify the nuts level
- specify the lau level
- specify a parent ID (Careful: this returns the regions on all lower levels, not only on the next lower one. E.g. for a parent on nuts level 1, the "children" on nuts 2 are returned together with the "grandchildren" on nuts 3, lau 1 and lau 2)
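
The effect of combining a parent ID with a level can be pictured with a small stand-alone sketch. It does not call the package. It assumes that region IDs follow the official regional key, where the ID of a region starts with the ID of its parent. The `descendants` helper and the level label `lau2` are hypothetical, and the listed IDs should be checked against `get_all_regions()`.

```python
# Illustrative subset of regions as (id, name, level); an assumption, not package output
REGIONS = [
    ("11", "Berlin", "nuts1"),
    ("12", "Brandenburg", "nuts1"),
    ("12054", "Potsdam", "nuts3"),
    ("12063", "Havelland", "nuts3"),
    ("12063080", "Falkensee", "lau2"),
]


def descendants(parent_id, level=None):
    """Return the IDs of all regions below parent_id, optionally limited to one level."""
    return [
        region_id
        for region_id, _name, region_level in REGIONS
        if region_id.startswith(parent_id)
        and region_id != parent_id
        and (level is None or region_level == level)
    ]


# Parent only: regions on every lower level are included
all_below = descendants("12")

# Parent plus level: only the regions on that level remain
only_nuts3 = descendants("12", level="nuts3")

print(all_below)
print(only_nuts3)
```

Specifying a level in addition to the parent is therefore the way to avoid mixing, for example, Kreise and Gemeinden in one result.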

In [None]:
# get data for all Bundesländer
query_all = Query.all_regions(nuts=1)
query_all.add_field('BEV001')
query_all.results().sort_values('year').head(12)

In [None]:
# get data for all regions within Brandenburg
query_all = Query.all_regions(parent='12')
query_all.add_field('BEV001')
query_all.results().head()

In [None]:
# get data for all nuts 3 regions within Brandenburg
query_all = Query.all_regions(parent='12', nuts=3)
query_all.add_field('BEV001')
query_all.results().sort_values('year').head()