# Nordrhein-Westfalen

Every federal state is represented by its own input directory and is processed into a NUTS level 2 directory containing a sub-folder for each discharge location. These folder names are derived from NUTS and reflect the CAMELS id. The NUTS level 2 code for Nordrhein-Westfalen is `DEA`.

To pre-process the data, you need to write (at least) two functions. One should extract all metadata and condense it into a single `pandas.DataFrame`. This is used to build the folder structure and derive the ids.
The second function has to take an id, as provided by the state authorities, called `provider_id` and return a `pandas.DataFrame` with the transformed data. The dataframe needs the three columns `['date', 'q' | 'w', 'flag']`.
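
As a minimal sketch of the contract for the second function, the following hypothetical helper `check_timeseries` (not part of `camelsp`) verifies that a dataframe carries the three expected columns. The sample values are invented for illustration.

```python
import pandas as pd


def check_timeseries(df: pd.DataFrame) -> str:
    """Return the variable name ('q' or 'w') if df has the expected columns.

    Hypothetical helper for illustration; not part of camelsp.
    """
    variables = [c for c in df.columns if c in ('q', 'w')]
    if len(variables) != 1:
        raise ValueError("expected exactly one of the columns 'q' or 'w'")
    expected = {'date', variables[0], 'flag'}
    if set(df.columns) != expected:
        raise ValueError(f"expected columns {sorted(expected)}, got {list(df.columns)}")
    return variables[0]


# invented sample data: two days of discharge without quality flags
sample = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02']),
    'q': [1.25, 1.31],
    'flag': [float('nan'), float('nan')],
})
variable = check_timeseries(sample)
print(variable)
```

A check like this could run right before the data is handed to the context, so that layout problems surface early.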

For easier and unified output handling, the `camelsp` package contains a context object called `Bundesland`. It takes a number of names and abbreviations to identify the correct federal state and returns an object that holds helper and save functions.

The context saves files as needed and can easily be changed to save files with different strategies, i.e. fill missing data with NaN, merge data into a single file, create files for each variable, or pack everything together into a NetCDF file.

In [4]:
import pandas as pd
import geopandas as gpd
import numpy as np
from pandas.errors import ParserError
import os
from pprint import pprint
from tqdm import tqdm
from typing import Union, Dict
import patoolib
from glob import glob
from datetime import datetime as dt
from dateparser import parse
import warnings
from io import StringIO

from camelsp import Bundesland, Station


The context can also be instantiated like any regular Python class, e.g. to load only the default input data path, which we will use later.

In [2]:
# the context also makes the input path available, if camelsp was installed locally
BASE = Bundesland('NRW').input_path
BASE

'/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen'

In [3]:
# remove everything from the BASE folder except the .rar archive, so that the data is extracted freshly and is up to date
for f in glob(f"{BASE}/*"):
    if os.path.isfile(f) and not f.endswith('.rar'):
        os.remove(f)

# extract rar archive, make sure that unrar is installed
patoolib.extract_archive(f"{BASE}/Q&W.rar", outdir=BASE)

patool: Extracting /home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen/Q&W.rar ...
patool: running /usr/bin/unrar x -- /home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen/Q&W.rar
patool:     with cwd='/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen'
patool: ... /home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen/Q&W.rar extracted to `/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen'.


'/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen'
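
A note on excluding files by extension with `glob`: a bracket expression such as `[!*.rar]` is a character class, not a negated suffix. It matches one character that is none of `*`, `.`, `r`, `a`, so the outcome depends on the last character of each name only. The sketch below uses `fnmatch`, whose matching rules `glob` follows, on invented file names to show the difference from an explicit suffix test.

```python
from fnmatch import fnmatchcase

# invented file names for illustration
names = ['Q&W.rar', 'Stammdaten_CAMELS.xlsx', 'data.csv', 'notes.tar', 'extra']

# '[!*.rar]' is a character class: one character that is none of * . r a
by_pattern = [n for n in names if fnmatchcase(n, '*[!*.rar]')]

# explicit suffix test: keep everything that is not a .rar archive
by_suffix = [n for n in names if not n.endswith('.rar')]

print(by_pattern)
print(by_suffix)
```

The pattern happens to work for names ending in `.csv` or `.xlsx`, but it silently skips any file whose name ends in `r` or `a`.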

In [4]:
# the file Datenanfrage_CAMELS_2581119000100_20221107_140905.csv exists twice, remove (2)
for f in glob(f"{BASE}/Datenanfrage_CAMELS_2581119000100_20221107_140905*"):
    print(f)

os.remove(f"{BASE}/Datenanfrage_CAMELS_2581119000100_20221107_140905 (2).csv")

/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen/Datenanfrage_CAMELS_2581119000100_20221107_140905.csv
/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/NRW_Nordrhein-Westfalen/Datenanfrage_CAMELS_2581119000100_20221107_140905 (2).csv


In [5]:
with Bundesland('NRW') as bl:
    metadata = pd.read_excel(os.path.join(bl.input_path, 'Stammdaten_CAMELS.xlsx'))

metadata

Unnamed: 0,NAME,ORT,NULLPUNKT,STATIONIER,GEBFLAECHE,UTMZONE,KOORDX,KOORDY,KOMMENTAR,GEBIETSKEN,Gewässerkennzahl,Gewässer
0,Ahmsen,4639000000100,64.285,27.158,593.00,32U,479549.677600,5.771202e+06,Grundmessstelle des Landes (GL),4639.0,46,Werre
1,Albersloh,3259000000100,48.678,27.470,321.58,32U,412463.350631,5.748891e+06,,3259.0,32,Werse
2,Altena,2766930000100,154.225,29.700,1190.00,32U,407683.711900,5.682847e+06,Talsperrenbeeinflussung ab 1968 (Biggetalsperre),276693.0,2766,Lenne
3,Altenbeken 2,2781610000200,215.958,11.990,20.50,32U,494359.426900,5.734473e+06,Grundmeßstelle des Landes (GL),278161.0,27816,Beke
4,Altenburg 1,2823900000200,82.651,62.440,958.76,32U,315309.586100,5.641695e+06,Grundmessstelle,28239.0,282,Rur
...,...,...,...,...,...,...,...,...,...,...,...,...
214,Westtuennen,2786700000100,57.572,3.970,414.90,32U,421634.000000,5.724518e+06,Grundmessstelle des Landes (GL),,2786,Ahse
215,Wetter_Wengern_1,2769169000100,84.805,0.480,17.58,32U,384656.787100,5.695710e+06,Grundmeßstelle des Landes (GL),2769169.0,276916,Elbsche
216,Wettringen B70,9286291000300,41.504,6.330,175.07,32U,385480.262390,5.786270e+06,,928629.0,92862,Steinfurter Aa
217,Wt-Kluserbrücke,2736510000100,142.226,49.240,337.82,32U,371494.000000,5.679856e+06,Durch mehrere Talsperren beeinflusst. Seit 01....,273651.0,2736,Wupper


In [6]:
metadata[metadata['ORT'] == 2581119000100]

Unnamed: 0,NAME,ORT,NULLPUNKT,STATIONIER,GEBFLAECHE,UTMZONE,KOORDX,KOORDY,KOMMENTAR,GEBIETSKEN,Gewässerkennzahl,Gewässer
50,Feudingen,2581119000100,388.738,222.32,25.4,32U,452579.3848,5643232.0,,2581119.0,258,Lahn


In [7]:
# use a name that does not shadow the built-in map()
for mapping in bl.nuts_mapping:
    if mapping['provider_id'] == '2581119000100':
        print(mapping)

### Metadata reader

Define the function that reads, extracts, and eventually merges all metadata for this federal state. You can develop the function here without the `Bundesland` context and later use the context to pass on the extracted metadata. The context has a function for saving *raw* metadata, which takes a `pandas.DataFrame` and requires you to name the id column.
Here, *raw* refers to provider metadata that has not yet been transformed into the CAMELS-de metadata schema.

In [8]:
# the id column will be ORT
id_column = 'ORT'

## File extract and parse

The `ORT` id is contained in each filename, so the filenames have to be processed as well. The metadata header appears to **always** extend to line 32, with `YTYP;` marking the end of the header. Verify this:

In [9]:
for fname in glob(os.path.join(BASE, 'Datenanfrage_CAMELS_*')):
    df = pd.read_csv(fname, encoding='latin1', sep=';', usecols=[0,1], nrows=32, header=None)
    if df.iloc[31, 0] != 'YTYP':
        print(fname)
        

That will make our lives much easier. Now process all files:

In [13]:
# get all file names
filelist = glob(os.path.join(BASE, 'Datenanfrage_CAMELS_*'))

# container for meta-header and dataframes
meta = []
data = []

# go for each file
for fname in tqdm(filelist):
    # open
    with open(fname, 'rb') as f:
        txt = f.read().decode('latin1')
    
    # split header
    header = txt.splitlines()[:32]
    
    # build the meta by hand
    tups = [l.split(';') for l in header[:-1]]
    meta_dict = {t[0]: t[1] for t in tups}

    # check the parameter
    if meta_dict['Parameter'] == 'Wasserstand':
        variable = 'w'
    elif meta_dict['Parameter'] == 'Abfluss':
        variable = 'q'
    else:
        raise RuntimeError(f"Unknown Parameter: {meta_dict['Parameter']}")

    meta.append(meta_dict)
    
    # now get the body
    body = txt.splitlines()[32:]

    # some files repeat the header block further down; find where it starts
    second_header = [i for i, l in enumerate(body) if l.startswith('Station')]
    if len(second_header) > 0:
        # keep only the rows before the second header
        body = body[:second_header[0]]
    
    # write to buffer
    buffer = StringIO('\n'.join(body))
    buffer.seek(0)
    
    # read from memory; the 32 header lines were already cut off above, so no rows are skipped here
    df_data = pd.read_csv(buffer, sep=';', usecols=[0,1], decimal=',', header=None, na_values='LUECKE', 
                          parse_dates=[0], date_format='%d.%m.%Y %H:%M:%S')
    
    df_data.columns = ['date', variable]
    df_data['flag'] = np.nan
    
    # append
    data.append(df_data)
    
    
print(f"Parsed {len(meta)} metadata headers and {len(data)} data files")

100%|██████████| 436/436 [00:15<00:00, 27.75it/s]

Parsed 436 metadata headers and 436 data files
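
The splitting logic can be tried out on a miniature file. The following sketch uses only the standard library instead of `pandas.read_csv`, and the file content is invented: it has 3 header lines instead of 32 and a repeated header block at the end. The date format and the `LUECKE` gap marker are taken from the parsing call in the notebook.

```python
import math
from datetime import datetime

# invented miniature file: 3 header lines instead of 32, then data,
# then a repeated header block that has to be cut off
HEADER_LINES = 3
txt = "\n".join([
    "Station;Beispiel",
    "Parameter;Abfluss",
    "YTYP;",
    "01.01.2020 00:00:00;1,25",
    "02.01.2020 00:00:00;LUECKE",
    "03.01.2020 00:00:00;1,31",
    "Station;Beispiel",
    "Parameter;Abfluss",
])

lines = txt.splitlines()
header, body = lines[:HEADER_LINES], lines[HEADER_LINES:]

# key;value pairs of the header, excluding the closing YTYP line
meta = dict(line.split(';')[:2] for line in header[:-1])

# cut the body at the first repeated header line, if there is one
for i, line in enumerate(body):
    if line.startswith('Station'):
        body = body[:i]
        break


def parse_row(line):
    """Turn one 'date;value' line into a (datetime, float) tuple."""
    date_str, value = line.split(';')[:2]
    date = datetime.strptime(date_str, '%d.%m.%Y %H:%M:%S')
    # 'LUECKE' marks a gap; decimal commas become decimal points
    number = math.nan if value == 'LUECKE' else float(value.replace(',', '.'))
    return date, number


rows = [parse_row(line) for line in body]
print(meta)
print(len(rows))
```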





That was tedious. Now check the metadata parsed from the data files:

In [16]:
# create dataframe from meta, drop empty columns
extra = pd.DataFrame(meta).drop(['Gewässer', 'GUELTBIS', 'GUELTVON', 'KOMMENTAR', 'PARMERKMAL', 'XEINHEIT'], axis=1)
extra

Unnamed: 0,Station,Stationsnummer,Unterbezeichnung,Einzugsgebiet,Pegelnullpunkt,Parameter,Einheit,Aussage,Lebenslauf,Zeitangabe,...,MESSGENAU,NWGRENZE,PUBLIZIERT,QUELLE,REIHENART,VERSION,X,XDISTANZ,XFAKTOR,Y
0,Mulartshütte,2824450000100,,"45,60 km²","298,968 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2824450000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,302948,T,1,5619157
1,Pottenhausen,4619100000100,,"166,00 km²","82,148 mDHHN2016 (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('4619100000100.qk1',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,482865,T,1,5762925
2,Weckhoven,2748900000100,,"95,39 km²","38,202 mNHN (aktuell)",Wasserstand,cm,Mittelwert,"MITTEL('2748900000100.wk2',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,338387,T,1,5669500
3,Kapellen,2866500000200,,"75,86 km²","20,296 mNHN (aktuell)",Wasserstand,cm,Mittelwert,"MITTEL('2866500000200.wk2',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,317027,T,1,5716519
4,Boisheim,2862100000100,,"34,78 km²","43,954 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2862100000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,309743,T,1,5682935
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
431,Lohmar,2728930000200,,"785,00 km²","57,953 mNHN (aktuell)",Wasserstand,cm,Mittelwert,"MITTEL('2728930000200.wk6',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,373367,T,1,5633750
432,Erkrath,2739229000100,,"72,67 km²","50,966 mNHN (aktuell)",Wasserstand,cm,Mittelwert,"MITTEL('2739229000100.wk1',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,354556,T,1,5677032
433,Rustenhof,4526900000100,,"77,06 km²","138,920 mNHN (aktuell)",Wasserstand,cm,Mittelwert,"MITTEL('4526900000100.wk4',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,509298,T,1,5728402
434,Burg Veynau,2741870000100,,"62,46 km²","199,408 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2741870000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,0.0000,0.0000,True,P,Z,0,338657,T,1,5612058


Metadata `extra` has different entries for Q and W at the same station. We have to merge these two metadata entries, so that we have one metadata entry for each `Stationsnummer`.

In [17]:
#extra[extra['Stationsnummer']=='2721459000100']

print(f"Two entries for ID '2721459000100':\n{extra[extra['Stationsnummer'] == '2721459000100']}")

# all IDs have two entries -> Q and W
print(f"\nNumber of duplicated IDs: {extra['Stationsnummer'].duplicated().sum()}")

Two entries for ID '2721459000100':
      Station Stationsnummer Unterbezeichnung Einzugsgebiet  \
110  Kreuztal  2721459000100                      63,40 km²   
126  Kreuztal  2721459000100                      63,40 km²   

              Pegelnullpunkt    Parameter Einheit     Aussage  \
110  273,894 mNHN (aktuell)      Abfluss    m³/s  Mittelwert   
126  273,894 mNHN (aktuell)  Wasserstand      cm  Mittelwert   

                         Lebenslauf  \
110   MITTEL('2721459000100.qk0',d)   
126   MITTEL('2721459000100.wk3',d)   

                                           Zeitangabe  ... MESSGENAU NWGRENZE  \
110  Linke Seite des Zeitintervalls mit Intervallwert  ...    0.0000   0.0000   
126  Linke Seite des Zeitintervalls mit Intervallwert  ...    0.0000   0.0000   

    PUBLIZIERT QUELLE REIHENART VERSION       X XDISTANZ XFAKTOR        Y  
110       True      P         Z       0  429430        T       1  5645719  
126       True      P         Z       0  429430        T       1

In [18]:
# get column names where Q and W data differ (at least once)
all_ids = extra['Stationsnummer'].unique()

all_different_cols = []

for id in all_ids:
    dat = extra[extra['Stationsnummer'] == id]
    if len(dat) < 2:
        continue
    different_columns = dat.iloc[0] != dat.iloc[1]
    different_colnames = different_columns[different_columns].index.tolist()
    
    for colname in different_colnames:
        if colname not in all_different_cols:
            all_different_cols.append(colname)

all_different_cols

['Parameter',
 'Einheit',
 'Lebenslauf',
 'FTOLERANZ',
 'PUBLIZIERT',
 'QUELLE',
 'Pegelnullpunkt',
 'HOEHE',
 'Unterbezeichnung']
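
One caveat of an element-wise `!=` comparison is that missing values never compare equal, so a column that is `NaN` in both rows is reported as different. Whether this affects the NRW files is not checked here. The sketch below shows the effect and a NaN-aware variant on an invented two-row frame.

```python
import numpy as np
import pandas as pd

# invented two-row example: same station, once for Q and once for W
pair = pd.DataFrame({
    'Stationsnummer': ['1', '1'],
    'Parameter': ['Abfluss', 'Wasserstand'],
    'Unterbezeichnung': [np.nan, np.nan],
})

first, second = pair.iloc[0], pair.iloc[1]

# plain inequality: NaN != NaN is True, so all-NaN columns are reported
naive = first != second
naive_cols = naive[naive].index.tolist()

# NaN-aware variant: ignore positions where both values are missing
both_missing = first.isna() & second.isna()
strict = (first != second) & ~both_missing
strict_cols = strict[strict].index.tolist()

print(naive_cols)
print(strict_cols)
```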

Merge the Q and W metadata by adding `Q_` and `W_` prefixes.

In [19]:
# the following columns have different values in Q and W metadata at least once and therefore get a prefix
all_different_cols = ['Parameter', 'Einheit', 'Lebenslauf', 'FTOLERANZ', 'Pegelnullpunkt', 'QUELLE', 'PUBLIZIERT', 'Unterbezeichnung', 'HOEHE']

# get all unique ids
all_ids = extra['Stationsnummer'].unique()

# container
qw_metadata = []

# go for each id
for id in all_ids:
    if id in ['9284590000100', '4761500000100']:
        # for these two stations we only have w data
        w_meta = extra[(extra['Stationsnummer'] == id) & (extra['Parameter'] == 'Wasserstand')].to_dict(orient='records')[0]
        q_meta = {}
    else:
        # get Q metadata as dict
        q_meta = extra[(extra['Stationsnummer'] == id) & (extra['Parameter'] == 'Abfluss')].to_dict(orient='records')[0]
        # get W metadata as dict
        w_meta = extra[(extra['Stationsnummer'] == id) & (extra['Parameter'] == 'Wasserstand')].to_dict(orient='records')[0]

    q_updated_meta = {}
    w_updated_meta = {}

    # add prefix to Q keys
    for key in q_meta.keys():
        if key in all_different_cols:
            q_updated_meta[f"Q_{key}"] = q_meta[key]
        else:
            q_updated_meta[key] = q_meta[key]
    
    # add prefix to W keys
    for key in w_meta.keys():
        if key in all_different_cols:
            w_updated_meta[f"W_{key}"] = w_meta[key]
        else:
            w_updated_meta[key] = w_meta[key]

    # merge Q and W metadata
    qw_metadata.append({**q_updated_meta, **w_updated_meta})

qw_metadata = pd.DataFrame(qw_metadata)
qw_metadata
    

Unnamed: 0,Station,Stationsnummer,Q_Unterbezeichnung,Einzugsgebiet,Q_Pegelnullpunkt,Q_Parameter,Q_Einheit,Aussage,Q_Lebenslauf,Zeitangabe,...,Y,W_Unterbezeichnung,W_Pegelnullpunkt,W_Parameter,W_Einheit,W_Lebenslauf,W_FTOLERANZ,W_HOEHE,W_PUBLIZIERT,W_QUELLE
0,Mulartshütte,2824450000100,,"45,60 km²","298,968 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2824450000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5619157,,"298,968 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2824450000100.wk3',d)",0.0000,29897,True,P
1,Pottenhausen,4619100000100,,"166,00 km²","82,148 mDHHN2016 (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('4619100000100.qk1',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5762925,,"82,148 mDHHN2016 (aktuell)",Wasserstand,cm,"MITTEL('4619100000100.wk4',d)",0.0000,8215,True,P
2,Weckhoven,2748900000100,,"95,39 km²","38,202 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2748900000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5669500,,"38,202 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2748900000100.wk2',d)",0.0100,3820,True,P
3,Kapellen,2866500000200,,"75,86 km²","20,296 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2866500000200.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5716519,,"20,296 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2866500000200.wk2',d)",0.0100,2030,True,P
4,Boisheim,2862100000100,,"34,78 km²","43,954 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2862100000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5682935,,"43,954 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2862100000100.wk1',d)",0.0100,4395,True,P
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
214,Haltern,2789100000100,,"4273,16 km²","30,856 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2789100000100.qk3',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5732685,,"30,856 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2789100000100.wk7',d)",0.0000,3086,True,P
215,Isselburg,9281700000200,,"258,16 km²","15,335 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('9281700000200.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5745437,,"15,335 mNHN (aktuell)",Wasserstand,cm,"MITTEL('9281700000200.wk0',d)",0.0000,1534,True,F
216,Kaarst,2751270000100,,"105,98 km²","34,817 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2751270000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,5676573,,"34,817 mNHN (aktuell)",Wasserstand,cm,"MITTEL('2751270000100.wk3',d)",0.0100,3482,True,P
217,Fiestel,4761500000100,,"102,24 km²",,,,Mittelwert,,Linke Seite des Zeitintervalls mit Intervallwert,...,5800420,,"43,889 mNHN (aktuell)",Wasserstand,cm,"MITTEL('4761500000100.wk4',d)",0.0000,4389,True,P
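
The prefixing step can also be expressed with a small helper. This is a sketch on invented records, and `prefix_keys` is a hypothetical name, not a function from `camelsp`. Keys that are not listed keep their name, so shared columns such as `Stationsnummer` collapse into one entry when the two dictionaries are merged.

```python
def prefix_keys(record: dict, prefix: str, columns: list) -> dict:
    """Prefix only those keys of record that are listed in columns."""
    return {
        (f"{prefix}{key}" if key in columns else key): value
        for key, value in record.items()
    }


# invented records for one station
differing = ['Parameter', 'Einheit']
q_meta = {'Stationsnummer': '1', 'Parameter': 'Abfluss', 'Einheit': 'm3/s'}
w_meta = {'Stationsnummer': '1', 'Parameter': 'Wasserstand', 'Einheit': 'cm'}

# an empty dict for a missing variable simply contributes no keys
merged = {**prefix_keys(q_meta, 'Q_', differing),
          **prefix_keys(w_meta, 'W_', differing)}
print(merged)
```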


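Now we left-join the provider master data onto the merged Q/W metadata, matching `Stationsnummer` against `ORT`, which is cast to string first because it was read as an integer. Note that in the time series files each `Stationsnummer` still occurs twice, so only the combination of `Stationsnummer` and variable identifies a data file uniquely.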

In [20]:
metadata = qw_metadata.join(metadata.set_index(metadata.ORT.astype(str)), on='Stationsnummer', how='left')
metadata

Unnamed: 0,Station,Stationsnummer,Q_Unterbezeichnung,Einzugsgebiet,Q_Pegelnullpunkt,Q_Parameter,Q_Einheit,Aussage,Q_Lebenslauf,Zeitangabe,...,NULLPUNKT,STATIONIER,GEBFLAECHE,UTMZONE,KOORDX,KOORDY,KOMMENTAR,GEBIETSKEN,Gewässerkennzahl,Gewässer
0,Mulartshütte,2824450000100,,"45,60 km²","298,968 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2824450000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,298.968,16.290,45.60,32U,302948.3180,5.619157e+06,Meßschwelle,282445.0,28244,Vichtbach
1,Pottenhausen,4619100000100,,"166,00 km²","82,148 mDHHN2016 (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('4619100000100.qk1',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,82.148,39.238,166.00,32U,482865.4340,5.762925e+06,Sonderpegel,46191.0,46,Werre
2,Weckhoven,2748900000100,,"95,39 km²","38,202 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2748900000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,38.202,0.790,95.39,32U,338387.0000,5.669500e+06,,27489.0,2748,Gillbach
3,Kapellen,2866500000200,,"75,86 km²","20,296 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2866500000200.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,20.296,8.590,75.86,32U,317027.0000,5.716519e+06,,,2866,Issumer Fleuth
4,Boisheim,2862100000100,,"34,78 km²","43,954 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2862100000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,43.954,24.030,34.78,32U,309743.0000,5.682935e+06,,28621.0,2862,Nette
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
214,Haltern,2789100000100,,"4273,16 km²","30,856 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2789100000100.qk3',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,30.856,53.300,4273.16,32U,374768.6620,5.732685e+06,Grundmessstelle des Landes (GL),,278,Lippe
215,Isselburg,9281700000200,,"258,16 km²","15,335 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('9281700000200.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,15.335,130.850,258.16,32U,325840.0508,5.745437e+06,Beeinflussung durch oberhalb gelegene Wehrsteu...,,928,Issel
216,Kaarst,2751270000100,,"105,98 km²","34,817 mNHN (aktuell)",Abfluss,m³/s,Mittelwert,"MITTEL('2751270000100.qk0',d)",Linke Seite des Zeitintervalls mit Intervallwert,...,34.817,5.660,105.98,32U,335061.0000,5.676573e+06,,,275122,Nordkanal
217,Fiestel,4761500000100,,"102,24 km²",,,,Mittelwert,,Linke Seite des Zeitintervalls mit Intervallwert,...,43.889,72.752,102.24,32U,469940.7786,5.800420e+06,Grundmessstelle des Landes (GL),47615.0,476,Große Aue
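
A left join on ids only works if both key columns share a dtype. The sketch below, on invented frames, casts integer ids to strings and uses the `indicator` option of `DataFrame.merge` to list rows that found no partner. It uses `merge` rather than `join`, purely for the indicator column.

```python
import pandas as pd

# invented frames: ids are strings on the left and integers on the right
left = pd.DataFrame({'Stationsnummer': ['100', '200', '300']})
right = pd.DataFrame({'ORT': [100, 200], 'GEBFLAECHE': [45.6, 166.0]})

# cast the integer ids to str so that both key columns share one dtype
right = right.assign(ORT=right['ORT'].astype(str))

joined = left.merge(right, left_on='Stationsnummer', right_on='ORT',
                    how='left', indicator=True)

# rows flagged 'left_only' found no partner in the master data
unmatched = joined.loc[joined['_merge'] == 'left_only', 'Stationsnummer'].tolist()
print(unmatched)
```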


In [21]:
id_column = 'Stationsnummer'

### Finally run

Now the Q and W data can be written out. The context handles all the id creation, data creation, merging, and the mapping from our ids to the original ids and files. This is helpful, as it makes mistakes less likely.

In [22]:
with Bundesland('NRW') as bl:
    # save the metadata
    bl.save_raw_metadata(metadata, 'Stationsnummer', overwrite=True)

    # for reference, call the nuts-mapping as table
    nuts_map = bl.nuts_table
    print(nuts_map.head())

    
    with warnings.catch_warnings(record=True) as warns:
        for m, df in tqdm(zip(meta, data), total=len(meta)):
            # check the meta
            provider_id = str(m['Stationsnummer'])
            bl.save_timeseries(df, provider_id)

        # check if there were warnings (there are warnings)
        if len(warns) > 0:
            log_path = bl.save_warnings(warns)
            print(f"There were warnings during the processing. The log can be found at: {log_path}")


    nuts_id    provider_id                              path
0  DEA10000  2824450000100  ./DEA/DEA10000/DEA10000_data.csv
1  DEA10010  4619100000100  ./DEA/DEA10010/DEA10010_data.csv
2  DEA10020  2748900000100  ./DEA/DEA10020/DEA10020_data.csv
3  DEA10030  2866500000200  ./DEA/DEA10030/DEA10030_data.csv
4  DEA10040  2862100000100  ./DEA/DEA10040/DEA10040_data.csv


100%|██████████| 436/436 [00:13<00:00, 32.60it/s]
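
The warning collection used above relies on `warnings.catch_warnings(record=True)` from the standard library. A minimal sketch of the mechanism follows; `noisy_save` is a hypothetical stand-in for a save call that may warn. The sketch additionally sets the filter to `'always'`, which the notebook cell does not do, so that repeated identical warnings are all recorded.

```python
import warnings


def noisy_save(provider_id: str) -> None:
    """Hypothetical stand-in for a save call that may emit a warning."""
    if provider_id.endswith('9'):
        warnings.warn(f"suspicious id {provider_id}", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    # make sure repeated warnings are not suppressed by the default filter
    warnings.simplefilter('always')
    for pid in ['101', '109', '119']:
        noisy_save(pid)

# each entry is a WarningMessage with message, category, filename and lineno
messages = [str(w.message) for w in caught]
print(messages)
```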


## Add catchments (EZG) from the provider to all stations where available


In [13]:
gdf_ezg = gpd.read_file(os.path.join(BASE, '../Shapes/NRW_Shapes/whm_nrw_pegeleinzugsgebiete_dissolved/pegeleinzugsgebiete_dissolved.shp'), encoding='utf-8')

# save errors
errors = []

for id in gdf_ezg['pegelid'].values:
    # init station via PKZ, ignore warnings as we use provider_id instead of camels_id
    try:
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            s = Station(id)
            
    except ValueError as e:
        errors.append(e)
        continue

    # get catchment geometry for id
    catchment = gdf_ezg[gdf_ezg['pegelid'] == id].iloc[[0]]

    # save catchment geometry
    s.save_catchment_geometry(catchment, datasource='federal_agency_ezg')

# print the collected errors, if any
if len(errors) > 0:
    print("Errors:")
    for e in errors:
        print(e)



Errors:
2718180000100 is neither a provider_id nor a CAMELS-DE NUTSID
2718193000100 is neither a provider_id nor a CAMELS-DE NUTSID
27194900001010 is neither a provider_id nor a CAMELS-DE NUTSID
2721344000100 is neither a provider_id nor a CAMELS-DE NUTSID
2721811000100 is neither a provider_id nor a CAMELS-DE NUTSID
2721830000100 is neither a provider_id nor a CAMELS-DE NUTSID
2722590000100 is neither a provider_id nor a CAMELS-DE NUTSID
2726613000100 is neither a provider_id nor a CAMELS-DE NUTSID
2729900000100 is neither a provider_id nor a CAMELS-DE NUTSID
2731450500099 is neither a provider_id nor a CAMELS-DE NUTSID
2737323400100 is neither a provider_id nor a CAMELS-DE NUTSID
2745400000100 is neither a provider_id nor a CAMELS-DE NUTSID
2746790000100 is neither a provider_id nor a CAMELS-DE NUTSID
2749490000100 is neither a provider_id nor a CAMELS-DE NUTSID
2751270000200 is neither a provider_id nor a CAMELS-DE NUTSID
2761490000100 is neither a provider_id nor a CAMELS-DE NUTSID
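
The failed ids may fall into two groups: well-formed ids that are simply not part of the processed dataset, and ids that might be mistyped in the shapefile. A quick way to separate them is a length check. This is a sketch under the assumption that NRW provider ids have 13 digits, as the ids in the metadata tables do. The list is a subset copied from the error output.

```python
# subset of the ids from the error output
failed_ids = ['2718180000100', '2718193000100', '27194900001010', '2721344000100']

# assumption: provider ids have 13 digits, like the ids in the metadata tables
EXPECTED_LENGTH = 13

malformed = [i for i in failed_ids if len(i) != EXPECTED_LENGTH]
not_in_dataset = [i for i in failed_ids if len(i) == EXPECTED_LENGTH]

print(malformed)
print(not_in_dataset)
```

Ids in the first group are worth checking against the provider's shapefile by hand before concluding that no matching station exists.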