# Merge Metadata

This script can be used to build the final Metadata file.

A few important notes:

* Some of the location files were processed externally; that workflow still needs to be documented here.
* All CRS transformations that were applied should be noted down here for reference.

In [1]:
import os
import pandas as pd
from tqdm import tqdm

from camelsp import Bundesland, get_metadata

As an example: the `Bundesland` context manager loads the metadata for the given Bundesland only, taken from the full metadata table. If this table does not exist yet, it is created from the NUTSID mapping table. For Saarland (`DEC`), this looks as follows:

In [2]:
with Bundesland('DEC') as bl:
    dec_meta = bl.metadata

dec_meta.head()

Unnamed: 0,provider_id,camels_id,camels_path,nuts_lvl2,federal_state,area,x,y
534,1271120,DEC10000,./DEC/DEC10000/DEC10000_data.csv,DEC,Saarland,,,
535,1122120,DEC10010,./DEC/DEC10010/DEC10010_data.csv,DEC,Saarland,,,
536,1482120,DEC10020,./DEC/DEC10020/DEC10020_data.csv,DEC,Saarland,,,
537,1251120,DEC10030,./DEC/DEC10030/DEC10030_data.csv,DEC,Saarland,,,
538,1071120,DEC10040,./DEC/DEC10040/DEC10040_data.csv,DEC,Saarland,,,


## Generate basic metadata

This step produces one metadata file covering all processed data, which can be used as a NUTS lookup and as a basis for adding more specific metadata.
It also loads the location files, where they exist, and merges them into the metadata.

In [3]:
# load the nuts lvl 2 names
from camelsp.util import _NUTS_LVL2_NAMES

for NUTS in tqdm(_NUTS_LVL2_NAMES.keys()):
    with Bundesland(NUTS) as bl:
        try:
            p = os.path.join(bl.base_path, 'locations', f'{bl.NUTS}_Locations.csv')
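            # read the location file of this Bundesland
            df = pd.read_csv(p)
            # rename by position: assumes the file has exactly these four columns in this order
            df.columns = ['provider_id', 'area', 'x', 'y']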
        except FileNotFoundError:
            continue
        
        # update by simply setting the new metadata to the property setter
        # in this case, the joining column needs to be 'camels_id' or 'provider_id'
        bl.metadata = df

        # or use the function if you prefer
        #bl.update_metadata(df, id_column='provider_id')

metadata = get_metadata()
metadata

100%|██████████| 16/16 [00:00<00:00, 59.13it/s]


Unnamed: 0,provider_id,camels_id,camels_path,nuts_lvl2,federal_state,area,x,y
0,444205,DEE10000,./DEE/DEE10000/DEE10000_data.csv,DEE,Sachsen-Anhalt,,,
1,575870,DEE10010,./DEE/DEE10010/DEE10010_data.csv,DEE,Sachsen-Anhalt,,,
2,579782,DEE10020,./DEE/DEE10020/DEE10020_data.csv,DEE,Sachsen-Anhalt,,,
3,597008,DEE10030,./DEE/DEE10030/DEE10030_data.csv,DEE,Sachsen-Anhalt,,,
4,570611,DEE10040,./DEE/DEE10040/DEE10040_data.csv,DEE,Sachsen-Anhalt,,,
...,...,...,...,...,...,...,...,...
1763,5865501,DE413770,./DE4/DE413770/DE413770_data.csv,DE4,Brandenburg,,,
1764,5865600,DE413780,./DE4/DE413780/DE413780_data.csv,DE4,Brandenburg,,,
1765,5865502,DE413790,./DE4/DE413790/DE413790_data.csv,DE4,Brandenburg,,,
1766,5865900,DE413800,./DE4/DE413800/DE413800_data.csv,DE4,Brandenburg,,,
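
The update step above joins each location table onto the metadata using `provider_id`. The following is a minimal sketch of that kind of join in plain pandas. It is not the `camelsp` implementation: the tables, IDs and values are made up, and `DataFrame.update` is only one possible way to do it. Stations without a location file keep their empty `area`, `x` and `y` cells, which matches the `NaN` rows in the output above.

```python
import pandas as pd

# Toy stand-in for the full metadata table (all values are made up)
meta = pd.DataFrame({
    'provider_id': [1001, 1002, 1003],
    'camels_id': ['DEX10000', 'DEX10010', 'DEX10020'],
    'area': [float('nan')] * 3,
    'x': [float('nan')] * 3,
    'y': [float('nan')] * 3,
})

# Toy stand-in for a location file that covers only two of the three stations
locations = pd.DataFrame({
    'provider_id': [1001, 1003],
    'area': [12.5, 80.0],
    'x': [4100000.0, 4200000.0],
    'y': [3000000.0, 3100000.0],
})

# Index both tables by the joining column, then overwrite only the matching cells
updated = meta.set_index('provider_id')
updated.update(locations.set_index('provider_id'))
updated = updated.reset_index()

print(updated)
```

Station `1002` has no entry in the toy location table, so its `area`, `x` and `y` stay empty while the other two rows are filled.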


In [4]:
metadata.dropna(axis=0, how='any')


Unnamed: 0,provider_id,camels_id,camels_path,nuts_lvl2,federal_state,area,x,y
590,2828300000200,DEA10000,./DEA/DEA10000/DEA10000_data.csv,DEA,Nordrhein-Westfalen,96.29,4.046224e+06,3.091228e+06
591,2762510000100,DEA10010,./DEA/DEA10010/DEA10010_data.csv,DEA,Nordrhein-Westfalen,251.60,4.204795e+06,3.154429e+06
592,9282570000100,DEA10020,./DEA/DEA10020/DEA10020_data.csv,DEA,Nordrhein-Westfalen,242.04,4.097180e+06,3.195812e+06
593,2824450000100,DEA10030,./DEA/DEA10030/DEA10030_data.csv,DEA,Nordrhein-Westfalen,45.60,4.053318e+06,3.071278e+06
594,2825330000100,DEA10040,./DEA/DEA10040/DEA10040_data.csv,DEA,Nordrhein-Westfalen,1471.75,4.059525e+06,3.103258e+06
...,...,...,...,...,...,...,...,...
1354,24810600,DE710920,./DE7/DE710920/DE710920_data.csv,DE7,Hessen,124.00,4.251688e+06,3.036450e+06
1355,42882806,DE710930,./DE7/DE710930/DE710930_data.csv,DE7,Hessen,984.98,4.273917e+06,3.106929e+06
1356,24782800,DE710940,./DE7/DE710940/DE710940_data.csv,DE7,Hessen,111.90,4.271860e+06,3.018585e+06
1357,24861407,DE710950,./DE7/DE710950/DE710950_data.csv,DE7,Hessen,392.60,4.240648e+06,3.012890e+06
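
`dropna(how='any')` keeps only the stations that already have `area`, `x` and `y`. To see which federal states still lack location data, a per-state summary can help. The sketch below uses a made-up table with the same column names as the metadata; the column names `complete`, `total` and `missing` are chosen here for illustration only.

```python
import pandas as pd

# Toy stand-in for the merged metadata table (all values are made up)
metadata = pd.DataFrame({
    'camels_id': ['DEX10000', 'DEX10010', 'DEY10000', 'DEY10010'],
    'federal_state': ['State X', 'State X', 'State Y', 'State Y'],
    'area': [96.3, 251.6, None, None],
    'x': [4046224.0, 4204795.0, None, None],
    'y': [3091228.0, 3154429.0, None, None],
})

# A station counts as located only if area, x and y are all present
has_location = metadata[['area', 'x', 'y']].notna().all(axis=1)

# Count located and total stations per federal state
coverage = (
    metadata.assign(has_location=has_location)
    .groupby('federal_state')['has_location']
    .agg(complete='sum', total='count')
)
coverage['missing'] = coverage['total'] - coverage['complete']

print(coverage)
```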


In [7]:
with Bundesland('DEF') as bl:
    m = bl.nuts_table

In [21]:
provider_id = '110005'

m.where(m.provider_id == provider_id).dropna()

Unnamed: 0,nuts_id,provider_id,path
772,DEF17720,110005,./DEF/DEF17720/DEF17720_data.csv
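
A caveat on the lookup pattern above: `where(...).dropna()` also discards a matching row if any of its other columns is empty. A boolean mask returns the match regardless. The sketch below shows the difference on a made-up table; casting both sides to `str` is an optional safeguard in case the ID column was read as integers.

```python
import pandas as pd

# Toy stand-in for a NUTS mapping table (all values are made up);
# the second station has no path yet
m = pd.DataFrame({
    'nuts_id': ['DEX10000', 'DEX10010', 'DEX10020'],
    'provider_id': ['110005', '110006', '110007'],
    'path': ['./DEX/DEX10000/DEX10000_data.csv', None, './DEX/DEX10020/DEX10020_data.csv'],
})

provider_id = '110006'

# where() blanks out all non-matching rows; dropna() then also drops the
# matching row, because its 'path' cell is empty
via_where = m.where(m.provider_id == provider_id).dropna()

# A boolean mask keeps the matching row even with missing values in other columns
via_mask = m[m['provider_id'].astype(str) == str(provider_id)]

print(len(via_where), len(via_mask))
```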
