# Bayern

Every federal state is represented by its own input directory and is processed into a NUTS level 2 directory containing a sub-folder for each discharge location. These folder names are derived from NUTS and reflect the CAMELS id. The NUTS level 2 code for Bayern is `DE2`.

To pre-process the data, you need to write (at least) two functions. One should extract all metadata and condense it into a single `pandas.DataFrame`. This is used to build the folder structure and derive the ids.
The second function has to take an id, as provided by the state authorities, called `provider_id` and return a `pandas.DataFrame` with the transformed data. The dataframe needs the three columns `['date', 'q' | 'w', 'flag']`.

For easier and unified output handling, the `camelsp` package contains a context object called `Bundesland`. It takes a number of names and abbreviations to identify the correct federal state and returns an object that holds helper and save functions.

The context saves files as needed and can easily be changed to save files with different strategies, ie. fill missing data with NaN, merge data into a single file, create files for each variable or pack everything together into a netcdf.

In [1]:
import pandas as pd
import geopandas as gpd
import numpy as np
from pandas.errors import ParserError
import os
from pprint import pprint
from tqdm import tqdm
from typing import Union, Dict
import zipfile
from datetime import datetime as dt
from io import StringIO
import warnings

from camelsp import Bundesland, Station

The context can also be instantiated as any regular Python class, ie. to load only the default input data path, that we will user later.

In [2]:
# the context also makes the input path available, if camelsp was install locally
BASE = Bundesland('Bayern').input_path
BASE

'/home/alexd/Projekte/CAMELS/Github/camelsp/input_data/BY_Bayern/Abfluss_Pegel/Datenanfrage_CAMELS-DE'

### Metadata reader

Define the function that extracts / reads and eventually merges all metadata for this federal state. You can develop the function here, without using the Bundesland context and then later use the context to pass extracted metadata. The Context has a function for saving *raw* metadata, that takes a `pandas.DataFrame` and needs you to identify the id column.
Here, *raw* refers to provider metadata, that has not yet been transformed into the CAMELS-de Metadata schema.

In [3]:
# define the function 
def read_meta(base_path) -> pd.DataFrame:
    path = os.path.join(base_path, 'Stammdaten_Bayern.xlsx')
    meta = pd.read_excel(path)
    return meta

# test it here
metadata = read_meta(BASE)

metadata

Unnamed: 0,Stationsname,Stationsnummer,Stationsausprägung,Messortname,Gewässer (Name|Nummer),Gewässertyp,Ordnung,EZG km²,Flusskilometer km,Lage am Gewässer,...,Ostwert,Nordwert,Rechtswert (lokal),Hochwert (lokal),Status,Gültig ab (Datum),PNP,Höhensystem,PNP ab (Datum),PNP ab (Datum/Zeit)
0,Neu-Ulm Bad Held_Donau,10026301,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,7616.6,2586.7,Rechts,...,573050.446343,5.360046e+06,4351004.0,5.363236e+06,Aktiv,01.11.1981,464.88,Höhensystem DHHN12,01.11.1981,00:00:00
1,Günzburg uh Günz,10032009,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,B - Pegel,9395,2561.098,Links,...,594887.794726,5.368836e+06,4373181.0,5.371169e+06,Aktiv,01.11.1811,438.64,Höhensystem DHHN12,01.11.1989,00:00:00
2,Dillingen,10035801,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,11378.7,2538.3,Links,...,610562.698611,5.380538e+06,4389310.0,5.382252e+06,Aktiv,01.11.1979,415.01,Höhensystem DHHN12,01.11.1979,00:00:00
3,Donauwörth,10039802,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,15131,2508.134,Links,...,632525.732272,5.396883e+06,4411906.0,5.397727e+06,Aktiv,01.11.1979,394.78,Höhensystem DHHN12,01.11.1979,00:00:00
4,Neuburg,10043708,1121-Pegel Fluss,WWA Ingolstadt,Donau,Fluss,B - Pegel,19923.8,2477.495,Links,...,660129.475678,5.400744e+06,4439646.0,5.400499e+06,Aktiv,01.11.1974,375.44,Höhensystem DHHN2016,09.08.2019,00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
535,Förmitz Speicherzufluss,56113404,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,8.2,2.43,Links,...,706987.215677,5.563097e+06,4492926.0,5.560860e+06,Aktiv,01.11.1975,529.38,Höhensystem DHHN12,01.11.1980,00:00:00
536,Förmitz Speicherabfluss,56114000,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,14.1,0.1,Links,...,707310.945329,5.565051e+06,4493328.0,5.562799e+06,Aktiv,01.11.1963,498.30,Höhensystem DHHN12,01.11.1980,00:00:00
537,Rehau_Schwesnitz,56122008,1121-Pegel Fluss,WWA Hof,Schwesnitz,Fluss,A - Pegel,84.3,9.218,Links,...,715286.971149,5.570284e+06,4501507.0,5.567706e+06,Aktiv,01.11.1958,511.63,Höhensystem DHHN12,01.11.1980,00:00:00
538,Kautendorf,56143008,1121-Pegel Fluss,WWA Hof,Südliche Regnitz,Fluss,A - Pegel,92.4,6.44,Rechts,...,712333.082238,5.574656e+06,4498732.0,5.572193e+06,Aktiv,01.11.1956,487.05,Höhensystem DHHN12,01.11.1980,00:00:00


In [4]:
# the id column will be Stationsnummer
id_column = 'Stationsnummer'

## file extract and parse

I'll keep the files in the zip, just because. In baWü these zips are nicely flat-packed and there is actually no need to extract the zip. Later, we might want to extract and change the code below.

bayern is really nasty as they change the format inside the files and they have negative water levels, which are most likely a sensor fault code or something. I build a dirty workaround for this by handling parser errors. If one occurs, the file content is written into a file-like-object in memory and splitted into a list of rows. Each row, that has a negative value on the second column is marked as faulty and skipped. If there were faulty columns, a warning containing the indices at which this error occured. The indices are all shifted by 8, as the first 8 rows contain metadata and are skipped anyway.
Checked this procedure for one file.

In [5]:
# helper to map ids to filenames
def get_filename_mapping(zippath: str) -> Dict[str, str]:
    with zipfile.ZipFile(zippath) as z:
        m = dict()
        for f in z.filelist:
            id_only = os.path.basename(f.filename).split('.')[0]
            m[str(id_only)] = f.filename
        return m

def get_file_from_zip(nr: Union[int, str], zippath: str, not_exists = 'raise'):
    # get filename mapping
    fmap = get_filename_mapping(zippath)
    
    # always use string
    fname = str(nr)

    # search the file 
    if fname in fmap.values():
        fname = fname
    elif fname in fmap.keys():
        fname = fmap[fname]
    else:
        FileNotFoundError(f"nr {nr} is nothing we would expect. Use a Stationsnummer or filename in the zip")
    
    # go for the file
    with zipfile.ZipFile(zippath) as z:
        if fname not in [f.filename for f in z.filelist]:
            # TODO: here, might want to warn and return an df filled with NAN
            if not_exists == 'raise':
                raise FileNotFoundError(f"{fname} is not in {zippath}")
            else:
                return None

        # return the file content
        return z.open(fname)
        

def extract_file(nr: Union[int, str], variable: str, zippath: str, not_exists = 'raise') -> pd.DataFrame:
    # get the content
    enc_content = get_file_from_zip(nr=nr, zippath=zippath, not_exists=not_exists)
    if enc_content is None:
        return pd.DataFrame(columns=['date', variable.lower(), 'flag'])

    # raw content
    try:
        raw = pd.read_csv(enc_content, encoding='latin1', skiprows=8, sep=' ', decimal=',', header=None)
    except ParserError:
        enc_content.seek(0)
        raw = enc_content.read().decode('latin1').splitlines()

        try:
            # create in-memory buffer and read the CSV from memory
            buffer = StringIO('\n'.join(raw))
        
            buffer.seek(0)

            raw = pd.read_csv(buffer, sep=' ', decimal=',', header=None)
        except ParserError:
            # if another ParserError is raised, column releaselevel is suddenly appearing -> add it when it's missing
            raw = [r.split(' ') for r in raw[8:]]

            # add None to the end of rows with only 3 columns
            for row in raw:
                if len(row) == 3:
                    row.append(None)
            
            # make a df
            raw = pd.DataFrame(raw)
    finally:
        enc_content.close()
    
    # rename the headers
    # Bayern has more surprises: sometimes they skip column releaselevel, as releaselevel is the quality flag, we set it to None in this case (ungeprüfte Rohdaten)
    if len(raw.columns) == 3:
        raw.columns = ['timestamp', 'value', 'status']
        
        # parse data
        df = pd.DataFrame({
            'date': [dt.strptime(str(t)[:8], '%Y%m%d') for t in raw.timestamp],
            variable.lower(): raw.value.values.astype(float),
            'flag': [None for _ in raw.value],
        })
    else:
        raw.columns = ['timestamp', 'value' ,'status', 'releaselevel']

        # map the flags (inforation in Hinweis_zu_WISKI-Daten.pdf)
        true_flags = ['released_historical', 'released', 'checked']
        false_flags = ['edit_mode', 'migration']

        # set flag to False if releaselevel is 'edit_mode', 'migration' or None
        flag_list = [
            True if value in true_flags else (False if value in false_flags else False)
            for value in raw['releaselevel']
        ]

        # parse data
        df = pd.DataFrame({
            'date': [dt.strptime(str(t)[:8], '%Y%m%d') for t in raw.timestamp],
            variable.lower(): raw.value.values.astype(float),
            'flag': flag_list,
        })
    
    # replace -777 with NAN
    df.loc[df[variable.lower()] == -777, variable.lower()] = np.nan

    # there are some negative values (that are not -777) for q and w, replace them with NAN
    df.loc[(df[variable.lower()] < 0) & (df[variable.lower()] != -777), variable.lower()] = np.nan

    return df

# test 
#key = 14106504
key = 11418250

df = extract_file(key, 'q', os.path.join(BASE, 'Abflüsse.zip'))

df

Unnamed: 0,date,q,flag
0,2008-07-01,,False
1,2008-07-02,,False
2,2008-07-03,,False
3,2008-07-04,,False
4,2008-07-05,,False
...,...,...,...
4927,2021-12-27,6.21,False
4928,2021-12-28,7.97,False
4929,2021-12-29,25.20,False
4930,2021-12-30,69.80,False


There is potentially interesting metadata in the header. Let's extract timezone and unit information and re-write the metadata extraction function for this

In [6]:
# define the function 
def read_meta(base_path, scan_files: bool = True) -> pd.DataFrame:
    # get the Stammdaten
    path = os.path.join(base_path, 'Stammdaten_Bayern.xlsx')
    meta = pd.read_excel(path)
    
    # now check for each file, if there is Stuff in the files
    # list of q and w with tz, and unit array each
    if not scan_files:
        return meta
    extras = [[[], []], [[], []]]
    for nr in tqdm(metadata.Stationsnummer):
        for i, _zip in enumerate(('Abflüsse.zip', 'Wasserstände.zip')):
            f = get_file_from_zip(nr, os.path.join(base_path, 'Abflüsse.zip'), 'return_none')
            if f is None:
                tup = (None, None,)
            else:
                tup = f.read().decode('latin1').splitlines()[5:7]
                f.close()
            
            # append
            extras[i][0].append(tup[0])
            extras[i][1].append(tup[1])

    # now append the arrays to meta
    meta['timezone_q'] = extras[0][0]
    meta['unit_q'] = extras[0][1]
    meta['timezone_w'] = extras[1][0]
    meta['unit_w'] = extras[1][1]
    
    return meta

# test it here
metadata = read_meta(BASE)

metadata

100%|██████████| 540/540 [00:03<00:00, 145.23it/s]


Unnamed: 0,Stationsname,Stationsnummer,Stationsausprägung,Messortname,Gewässer (Name|Nummer),Gewässertyp,Ordnung,EZG km²,Flusskilometer km,Lage am Gewässer,...,Status,Gültig ab (Datum),PNP,Höhensystem,PNP ab (Datum),PNP ab (Datum/Zeit),timezone_q,unit_q,timezone_w,unit_w
0,Neu-Ulm Bad Held_Donau,10026301,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,7616.6,2586.7,Rechts,...,Aktiv,01.11.1981,464.88,Höhensystem DHHN12,01.11.1981,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
1,Günzburg uh Günz,10032009,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,B - Pegel,9395,2561.098,Links,...,Aktiv,01.11.1811,438.64,Höhensystem DHHN12,01.11.1989,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
2,Dillingen,10035801,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,11378.7,2538.3,Links,...,Aktiv,01.11.1979,415.01,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
3,Donauwörth,10039802,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,15131,2508.134,Links,...,Aktiv,01.11.1979,394.78,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
4,Neuburg,10043708,1121-Pegel Fluss,WWA Ingolstadt,Donau,Fluss,B - Pegel,19923.8,2477.495,Links,...,Aktiv,01.11.1974,375.44,Höhensystem DHHN2016,09.08.2019,00:00:00,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
535,Förmitz Speicherzufluss,56113404,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,8.2,2.43,Links,...,Aktiv,01.11.1975,529.38,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
536,Förmitz Speicherabfluss,56114000,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,14.1,0.1,Links,...,Aktiv,01.11.1963,498.30,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
537,Rehau_Schwesnitz,56122008,1121-Pegel Fluss,WWA Hof,Schwesnitz,Fluss,A - Pegel,84.3,9.218,Links,...,Aktiv,01.11.1958,511.63,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
538,Kautendorf,56143008,1121-Pegel Fluss,WWA Hof,Südliche Regnitz,Fluss,A - Pegel,92.4,6.44,Rechts,...,Aktiv,01.11.1956,487.05,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|


# Stations without data
There are some stations in the metadata for which we do not have datafiles.  
We delete them for now from the metadata!

I tried to download the data for the stations by hand, but either there was no station or downloading data for that station was not possible. Only for station Wolnzach (13327855) download would be possible, but they only have 3-4 years of data. So I decided to just remove the stations for now.

In [7]:
ids_without_data = [10043710, 13327855, 15248555, 24011608, 24210706]

# drop the ids without data from metadata
metadata = metadata[~metadata['Stationsnummer'].isin(ids_without_data)].reset_index(drop=True)
metadata

Unnamed: 0,Stationsname,Stationsnummer,Stationsausprägung,Messortname,Gewässer (Name|Nummer),Gewässertyp,Ordnung,EZG km²,Flusskilometer km,Lage am Gewässer,...,Status,Gültig ab (Datum),PNP,Höhensystem,PNP ab (Datum),PNP ab (Datum/Zeit),timezone_q,unit_q,timezone_w,unit_w
0,Neu-Ulm Bad Held_Donau,10026301,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,7616.6,2586.7,Rechts,...,Aktiv,01.11.1981,464.88,Höhensystem DHHN12,01.11.1981,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
1,Günzburg uh Günz,10032009,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,B - Pegel,9395,2561.098,Links,...,Aktiv,01.11.1811,438.64,Höhensystem DHHN12,01.11.1989,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
2,Dillingen,10035801,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,11378.7,2538.3,Links,...,Aktiv,01.11.1979,415.01,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
3,Donauwörth,10039802,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,15131,2508.134,Links,...,Aktiv,01.11.1979,394.78,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
4,Neuburg,10043708,1121-Pegel Fluss,WWA Ingolstadt,Donau,Fluss,B - Pegel,19923.8,2477.495,Links,...,Aktiv,01.11.1974,375.44,Höhensystem DHHN2016,09.08.2019,00:00:00,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
530,Förmitz Speicherzufluss,56113404,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,8.2,2.43,Links,...,Aktiv,01.11.1975,529.38,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
531,Förmitz Speicherabfluss,56114000,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,14.1,0.1,Links,...,Aktiv,01.11.1963,498.30,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
532,Rehau_Schwesnitz,56122008,1121-Pegel Fluss,WWA Hof,Schwesnitz,Fluss,A - Pegel,84.3,9.218,Links,...,Aktiv,01.11.1958,511.63,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
533,Kautendorf,56143008,1121-Pegel Fluss,WWA Hof,Südliche Regnitz,Fluss,A - Pegel,92.4,6.44,Rechts,...,Aktiv,01.11.1956,487.05,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|


### Finally run

Now, the Q and W data can be extracted along with the metadata. The cool thing is, that all the id creation, data creation, merging and the mapping from our ids to the original ids and files is done by the context. This is helpful, as we less likely screw something up.

In [8]:
with Bundesland('Bayern') as bl:
    # save the metadata
    bl.save_raw_metadata(metadata, id_column, overwrite=True)

    # for reference, call the nuts-mapping as table
    nuts_map = bl.nuts_table    
    
    # join the path for two zips
    q_zip_path = os.path.join(bl.input_path, 'Abflüsse.zip')
    w_zip_path = os.path.join(bl.input_path, 'Wasserstände.zip')
    
    with warnings.catch_warnings(record=True) as warns:
        # go for all ids
        for provider_id in tqdm(nuts_map.provider_id):
            # extract the file for this provider
            try:
                q_df = extract_file(provider_id, 'q', q_zip_path, not_exists='fill_nan')
                w_df = extract_file(provider_id, 'w', w_zip_path, not_exists='fill_nan')
            except Exception:
                print(provider_id)
                break

            # save
            bl.save_timeseries(q_df, provider_id)
            bl.save_timeseries(w_df, provider_id)

        # check if there were warnings (there are warnings) -> not anymore, all warnings fixed by AD!
        if len(warns) > 0:
            log_path = bl.save_warnings(warns)
            print(f"There were warnings during the processing. The log can be found at: {log_path}")


100%|██████████| 535/535 [02:10<00:00,  4.11it/s]


## Add EZG from provider to all Stations where available

Bayern hat uns Basis EZGs gegeben, die noch zusammengefügt werden müssen (?) -> bekomme ich nicht hin.  
Gibt auch noch ein Shapefile für die Pegel, es könnte eine Verknüpfung zwischen 'einzugsgeb' von Pegeln zu Spalte 'GEBKZ_K' von EZGs geben, das haut aber für die meisten Stationen nicht hin, einzugsgeb-Nummer gibt es dann nicht in EZG-Shapefile...

In [40]:
# Spalte 'stationsnu' und 'einzugsgeb'
gdf_pegel = gpd.read_file(os.path.join(BASE, '../../../Shapes/Bayern_Shapes/pegel_bayern_epsg4258_shp/pegel_epsg4258.shp'))

# Spalte 'GEBKZ_K'
gdf_ezg = gpd.read_file(os.path.join(BASE, '../../../Shapes/Bayern_Shapes/EZG/ezg25_15_2016_by.shp'))


In [95]:
station_id = 21643004

# get einzugeb number from gdf_pegel
einzugsgeb = gdf_pegel[gdf_pegel['stationsnu']==station_id]['einzugsgeb'].values[0]

# get catchment from gdf_ezg (?) -> this does not work for most station_ids
gdf_ezg[gdf_ezg['GEBKZ_K']==einzugsgeb]

Unnamed: 0,GEBKZ_K,GEBKZ_15,GEBKZ_S,GEWKZ_K,VOLLST,KM2_BY,KM2,EZG_AUSL,KM2_SUM,KM2_NBY,Shape_Leng,Shape_Area,geometry


In [94]:
Station('21643004')



<camelsp.output.Station at 0x7fc4ba4ab210>

In [83]:
gdf_ezg[gdf_ezg['GEBKZ_K']=='12944']

Unnamed: 0,GEBKZ_K,GEBKZ_15,GEBKZ_S,GEWKZ_K,VOLLST,KM2_BY,KM2,EZG_AUSL,KM2_SUM,KM2_NBY,Shape_Leng,Shape_Area,geometry
45648,12944,129440000000000,5,12944,Geometrie vollständig,2.109,2.109,nein,2.109,0.0,7912.153863,2109466.0,"POLYGON ((639151.555 5384061.558, 639227.108 5..."


In [15]:
gdf_pegel = gpd.read_file(os.path.join(BASE, '../../../Shapes/Bayern_Shapes/pegel_bayern_epsg4258_shp/pegel_epsg4258.shp'))

Unnamed: 0,gewaesser,stationsnu,stationsna,link_link,linkalias_,betreiber,wert,hoehensyst,flusskilom,lage_am_ge,einzugsgeb,ordnung,wasserstan,ostwert,nordwert,geometry
0,Main,24028000,Astheim UP,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WSA Schweinfurt,189.510,Höhensystem DHHN12,31122,Links,12944,---,Ja,587521,5523599,POINT (10.21765 49.85840)
1,Main-Donau-Kanal,24209300,Bamberg,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WSA Donau MDK,228.500,Höhensystem DHHN12,731,Links,7523,---,Ja,636987,5527252,POINT (10.90684 49.88197)
2,Großer Brombachsee,24214456,Brombachsee,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WWA Ansbach,0.000,---,3425,Seepegel,579,---,Ja,638212,5444451,POINT (10.89489 49.13733)
3,Main,24060002,Faulbach,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WSA Aschaffenburg,128.257,Höhensystem DHHN2016,146633,Rechts,20730,---,Ja,531572,5514812,POINT (9.43859 49.78494)
4,Main,24070006,Obernau,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WSA Aschaffenburg,107.780,Höhensystem DHHN2016,92385,Rechts,22300,---,Ja,509263,5531278,POINT (9.12907 49.93380)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
586,Aalbach,24585006,Wüstenzell,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WWA Aschaffenburg,186.107,Höhensystem DHHN2016,76,Links,1201,B - Pegel,Ja,547050,5513792,POINT (9.65346 49.77476)
587,Flossach,11649004,Zaisertshofen,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WWA Kempten,553.990,Höhensystem DHHN12,106,Rechts,2032,B - Pegel,Ja,612355,5332186,POINT (10.51009 48.13301)
588,Lehstenbach,53211010,Zigeunermühle,https://www.gkd.bayern.de/de/fluesse/abfluss/e...,Link zum Gewässerkundlichen Dienst Bayern,WWA Hof,692.920,Höhensystem DHHN12,85,Rechts,41,B - Pegel,Ja,705928,5557121,POINT (11.88142 50.13054)
589,Großer Regen,15214003,Zwiesel_Großer Regen,https://www.gkd.bayern.de/de/fluesse/wassersta...,Link zum Gewässerkundlichen Dienst Bayern,WWA Deggendorf,558.150,Höhensystem DHHN12,07,Links,176,B - Pegel,Ja,809249,5438136,POINT (13.23014 49.01848)


In [14]:
pd.read_csv('../output_data/raw_metadata/DE2_raw_metadata.csv')

Unnamed: 0,Stationsname,Stationsnummer,Stationsausprägung,Messortname,Gewässer (Name|Nummer),Gewässertyp,Ordnung,EZG km²,Flusskilometer km,Lage am Gewässer,geogr. Breite,geogr. Länge,Ostwert,Nordwert,Rechtswert (lokal),Hochwert (lokal),Status,Gültig ab (Datum),PNP,Höhensystem,PNP ab (Datum),PNP ab (Datum/Zeit),timezone_q,unit_q,timezone_w,unit_w
0,Neu-Ulm Bad Held_Donau,10026301,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,7616.6,2586.7,Rechts,48.389342,9.986741,573050.446343,5.360046e+06,4351004.0,5.363236e+06,Aktiv,01.11.1981,464.88,Höhensystem DHHN12,01.11.1981,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
1,Günzburg uh Günz,10032009,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,B - Pegel,9395,2561.098,Links,48.465499,10.283634,594887.794726,5.368836e+06,4373181.0,5.371169e+06,Aktiv,01.11.1811,438.64,Höhensystem DHHN12,01.11.1989,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
2,Dillingen,10035801,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,11378.7,2538.3,Links,48.568177,10.498714,610562.698611,5.380538e+06,4389310.0,5.382252e+06,Aktiv,01.11.1979,415.01,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
3,Donauwörth,10039802,1121-Pegel Fluss,WWA Donauwörth,Donau,Fluss,A - Pegel,15131,2508.134,Links,48.710890,10.801518,632525.732272,5.396883e+06,4411906.0,5.397727e+06,Aktiv,01.11.1979,394.78,Höhensystem DHHN12,01.11.1979,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
4,Neuburg,10043708,1121-Pegel Fluss,WWA Ingolstadt,Donau,Fluss,B - Pegel,19923.8,2477.495,Links,48.739118,11.177996,660129.475678,5.400744e+06,4439646.0,5.400499e+06,Aktiv,01.11.1974,375.44,Höhensystem DHHN2016,09.08.2019,00:00:00,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
530,Förmitz Speicherzufluss,56113404,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,8.2,2.43,Links,50.183855,11.899471,706987.215677,5.563097e+06,4492926.0,5.560860e+06,Aktiv,01.11.1975,529.38,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
531,Förmitz Speicherabfluss,56114000,1121-Pegel Fluss,WWA Hof,Förmitz,Fluss,B - Pegel,14.1,0.1,Links,50.201292,11.905065,707310.945329,5.565051e+06,4493328.0,5.562799e+06,Aktiv,01.11.1963,498.30,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
532,Rehau_Schwesnitz,56122008,1121-Pegel Fluss,WWA Hof,Schwesnitz,Fluss,A - Pegel,84.3,9.218,Links,50.245443,12.019640,715286.971149,5.570284e+06,4501507.0,5.567706e+06,Aktiv,01.11.1958,511.63,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|
533,Kautendorf,56143008,1121-Pegel Fluss,WWA Hof,Südliche Regnitz,Fluss,A - Pegel,92.4,6.44,Rechts,50.285782,11.980721,712333.082238,5.574656e+06,4498732.0,5.572193e+06,Aktiv,01.11.1956,487.05,Höhensystem DHHN12,01.11.1980,00:00:00,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|,#TZUTC+1|*|RINVAL-777|*|,#CUNITm³/s|*|


In [None]:
gdf_meta = gpd.read_file(os.path.join(BASE, '../../../Shapes/Bayern_Shapes/pegel_bayern_epsg4258_shp/pegel_epsg4258.shp'))

# make column MESSTELLEN int
gdf_meta['MESSTELLEN'] = gdf_meta['MESSTELLEN'].astype(int)

# save errors
errors = []

for id in gdf_meta['MESSTELLEN'].values:
    # init station via PKZ, ignore warnings as we use provider_id instead of camels_id
    try:
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            s = Station(id)
    except ValueError as e:
        errors.append(e)
        continue

    # get catchment geometry for id
    catchment = gdf_meta[gdf_meta['MESSTELLEN'] == id].iloc[[0]]

    # save catchment geometry
    s.save_catchment_geometry(catchment, datasource='federal_agency')

# print results and number of errors
if len(errors) > 0:
    print("Errors:")
    for e in errors:
        print(e)


Errors:
103 is neither a provider_id nor a CAMELS-DE NUTSID
183 is neither a provider_id nor a CAMELS-DE NUTSID
184 is neither a provider_id nor a CAMELS-DE NUTSID
185 is neither a provider_id nor a CAMELS-DE NUTSID
386 is neither a provider_id nor a CAMELS-DE NUTSID
399 is neither a provider_id nor a CAMELS-DE NUTSID
1110 is neither a provider_id nor a CAMELS-DE NUTSID
1112 is neither a provider_id nor a CAMELS-DE NUTSID
1113 is neither a provider_id nor a CAMELS-DE NUTSID
1114 is neither a provider_id nor a CAMELS-DE NUTSID
1115 is neither a provider_id nor a CAMELS-DE NUTSID
1148 is neither a provider_id nor a CAMELS-DE NUTSID
1304 is neither a provider_id nor a CAMELS-DE NUTSID
1304 is neither a provider_id nor a CAMELS-DE NUTSID
1311 is neither a provider_id nor a CAMELS-DE NUTSID
1452 is neither a provider_id nor a CAMELS-DE NUTSID
2366 is neither a provider_id nor a CAMELS-DE NUTSID
4401 is neither a provider_id nor a CAMELS-DE NUTSID
4402 is neither a provider_id nor a CAMELS-D