In [1]:
%matplotlib inline
import pandas as pd
import geopandas as gpd
import nivapy3 as nivapy
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Matching catchment and waterbody IDs

Atle needs to find Vann-nett IDs (= WFD water body IDs) corresponding to 826 catchments across Norway. The spreadsheet

    ./Match_Waterbodies_Catchments/sjoorretvassdrag_hin_tidied.xlsx
    
provides NVE vassdragsnummers for each catchment of interest. The aim of this notebook is to try to identify matching WFD IDs. As a guide, Atle has already made a start manually in the `'Vann-nett_ID'` column.

Unfortunately, a definitive look-up table between vassdragsnummers and WFD IDs is not yet available (see the e-mail from Lars Stalsberg at NVE, received 11.02.2019 at 14.02, for details). This means that some data processing will be required, and that it is probably not possible to uniquely identify Vann-nett IDs for all catchments without some manual checking.

Atle has obtained a dataset of WFD waterbodies (vannforekomster) from Miljødirektoratet (see e-mail received 04.03.2019 at 09.13). These data are provided as an ESRI File Geodatabase, with separate feature classes for groundwater, river, lake and coastal waterbodies. In the data that Atle has processed manually, he has focused on Vann-nett IDs ending in `'R'`, which denote rivers. I have therefore only considered the rivers dataset in this notebook.

## 1. Read river dataset

In addition to providing Vann-nett IDs, the rivers dataset also has a `'River_CD'` field that (sometimes) identifies associated vassdragsnummers. If we're lucky, this will provide the information we need, without much data processing.

In [2]:
# Read waterbody data (just rivers here)
wfd_shp = r'../gis/vector/Elvesegmenter20190211.shp'
wfd_gdf = gpd.read_file(wfd_shp)

# Convert to df to reduce memory/increase speed
wfd_df = pd.DataFrame(wfd_gdf.drop(columns='geometry'))
del wfd_gdf

# Get cols of interest
wfd_df = wfd_df[['Name', 'WaterBodyI', 'River_CD', 'CatchmentI']]
wfd_df.columns = ['name', 'wb_id', 'riv_id', 'cat_id']
wfd_df.head()

Unnamed: 0,name,wb_id,riv_id,cat_id
0,Bekkefelt til Ulviksjøen,001-91-R,001.Z,001.M
1,Bekkefelt til Ulviksjøen,001-91-R,,001.M
2,Bekkefelt til Ulviksjøen,001-91-R,001.Z,001.M
3,Bekkefelt til Ulviksjøen,001-91-R,,001.M
4,Bekkefelt til Ulviksjøen,001-91-R,,001.M
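
The preview above shows several rows per waterbody, some without a `riv_id`. As a rough sketch only, the snippet below uses made-up rows shaped like `wfd_df` (the names `toy_wfd`, `frac_missing` and `pairs` are hypothetical) to measure how often `riv_id` is missing and to collapse the table to unique `(wb_id, riv_id)` pairs. Whether collapsing like this is appropriate for the real dataset is an assumption that would need checking.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the structure of wfd_df (values are illustrative only)
toy_wfd = pd.DataFrame({
    'name':   ['Bekk A', 'Bekk A', 'Bekk A', 'Bekk B', 'Bekk B'],
    'wb_id':  ['001-91-R', '001-91-R', '001-91-R', '001-2-R', '001-2-R'],
    'riv_id': ['001.Z', np.nan, '001.Z', '001.221Z', '001.221Z'],
    'cat_id': ['001.M', '001.M', '001.M', '001.221', '001.221'],
})

# Share of records with no riv_id value
frac_missing = toy_wfd['riv_id'].isna().mean()

# Keep one row per (wb_id, riv_id) pair, dropping records without a riv_id
pairs = (toy_wfd.dropna(subset=['riv_id'])
                .drop_duplicates(subset=['wb_id', 'riv_id'])
                .reset_index(drop=True))

print(f'{frac_missing:.0%} of records lack a riv_id')
print(pairs[['wb_id', 'riv_id']])
```

Working from unique pairs would make later look-ups smaller, but it is optional: the matching in Section 3 already calls `.unique()` on the matched IDs.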


## 2. Get vassdragsnummers of interest

In [3]:
# Read trout dataset
sjo_xlsx = r'../sjoorretvassdrag_hin_tidied.xlsx'
df = pd.read_excel(sjo_xlsx, sheet_name='Resterende')

df.rename({'Vassdragsnr':'vass_id',
           'Name':'name',
           'Vann-nett_ID':'atle_wb_id'},
          inplace=True,
          axis='columns')
df.head()

Unnamed: 0,vass_id,name,atle_wb_id,Sjøørret_2013,Økologisk,Kjemisk,Forsuring,Bergverk,Industri,P,Landbruk,Avløp,Urban,Industri.1,Veg,Laks_kategorisering_samlet,Laks_kategorisering_uten_rømt
0,001.221Z,Soverk,001-2-R,Tapt bestand,god,ukjent,,,,,liten,middels,,,,,
1,001.222Z,Skottene,001-58-R,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,god,,,god,liten,middels,,,,,
2,001.223Z,Ystehedebekken,001-3-R,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,svært god,,,moderat,middels,,,,,,
3,001.22Z,Kirkebekken/Idd,001-52-R,5b Hensynskrevende-naturlig liten bestand,moderat,ukjent,svært god,,,moderat,stor,middels,middels,,,,
4,001.2Z,Folkåa,001-53-R,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,moderat,,,god,liten,middels,,,,,


## 3. Search for matches

The code below iterates over the vassdragsnummers in Atle's dataset and searches for matches in the river waterbody dataset. If more than one match is found, all possibilities are appended to the results.

**Note:** Many of the vassdragsnummers in Atle's dataset end in e.g. `'X1'` or `'X2'`, and these cannot be matched in the rivers dataset. Atle's manual classifications suggest these could all be coastal waterbodies, but I can't find these codes in the coastal WFD data either. **Ask Atle about the `'X'` codes**.
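
To get a list of the affected codes to send to Atle, something like the following might help. It is a minimal sketch on illustrative codes; in the notebook the series would be `df['vass_id']`. The pattern `X\d+$` (an `'X'` followed by digits at the end of the code) is my assumption about what the `'X'` codes look like, based only on the `'X1'` and `'X2'` examples.

```python
import pandas as pd

# Illustrative codes only; in the notebook these would come from df['vass_id']
vass_ids = pd.Series(['001.221Z', '001.Z', '002.1110X1', '002.1110X2', '002.3Z'])

# Flag codes ending in 'X' followed by one or more digits (assumed pattern)
is_x_code = vass_ids.str.contains(r'X\d+$', regex=True)

x_codes = vass_ids[is_x_code].tolist()
print(x_codes)
print(f'{is_x_code.sum()} of {len(vass_ids)} codes have an X suffix')
```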

In [4]:
# Containers for results
poss_matches = []
n_matches = []

# Loop over trout dataset
for idx, row in df.iterrows():
    # Get vassdrag code   
    vass_id = row['vass_id']
       
    # Query WB dataset
    match_df = wfd_df.query('riv_id == @vass_id').dropna(subset=['wb_id'])
    
    # Get list of associated WBs
    matches = list(match_df['wb_id'].unique())
    
    # WB codes ending in 'R' are rivers; 'L' for lakes
    # We only want rivers
    matches = [i for i in matches if i[-1] == 'R']
    
    # Add number of matches
    n_matches.append(len(matches))
    
    # If just one match, extract from list; if no matches, use NaN
    # instead of an empty list; otherwise keep the list of options
    if len(matches) == 1:
        matches = matches[0]
    elif len(matches) == 0:
        matches = np.nan
        
    # Add to results
    poss_matches.append(matches)
    
# Add to df
df['poss_matches'] = poss_matches
df['n_matches'] = n_matches

# Reorder cols (exclude the new cols from end_cols to avoid duplicating them)
st_cols = ['vass_id', 'name', 'atle_wb_id']
new_cols = ['poss_matches', 'n_matches']
end_cols = [i for i in df.columns if i not in st_cols + new_cols]
df = df[st_cols + new_cols + end_cols]

# Save
csv_path = r'../match_vassdrags_wbs.csv'
df.to_csv(csv_path, encoding='utf-8', index=False)

df.head(10)

Unnamed: 0,vass_id,name,atle_wb_id,poss_matches,n_matches,Sjøørret_2013,Økologisk,Kjemisk,Forsuring,Bergverk,...,P,Landbruk,Avløp,Urban,Industri.1,Veg,Laks_kategorisering_samlet,Laks_kategorisering_uten_rømt
0,001.221Z,Soverk,001-2-R,001-2-R,1,Tapt bestand,god,ukjent,,,...,,liten,middels,,,,,
1,001.222Z,Skottene,001-58-R,001-58-R,1,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,god,,...,god,liten,middels,,,,,
2,001.223Z,Ystehedebekken,001-3-R,001-3-R,1,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,svært god,,...,moderat,middels,,,,,,
3,001.22Z,Kirkebekken/Idd,001-52-R,001-52-R,1,5b Hensynskrevende-naturlig liten bestand,moderat,ukjent,svært god,,...,moderat,stor,middels,middels,,,,
4,001.2Z,Folkåa,001-53-R,001-53-R,1,3a Sårbar bestand (bestand er nær truet),moderat,ukjent,moderat,,...,god,liten,middels,,,,,
5,001.31Z,Remmenbekken,001-4-R,001-4-R,1,Tapt bestand,dårlig,ukjent,svært god,,...,svært dårlig,stor,stor,middels,,,,
6,001.32Z,Lundestadbekken,001-49-R,001-49-R,1,Tapt bestand,dårlig,ukjent,svært god,,...,svært dårlig,stor,middels,stor,,,,
7,001.3Z,Unnebergsbekken (Halden),001-49-R,001-49-R,1,2 Truet bestand,dårlig,ukjent,svært god,,...,svært dårlig,stor,middels,stor,,,,
8,001.Z,Tista (del av Haldenvassdraget),001-113-R,"[001-91-R, 001-184-R, 001-136-R, 001-113-R, 00...",8,X Usikker kategoriplassering,dårlig,god,svært god,,...,god,,middels,stor,middels,,,X Usikker kategoriplassering
9,002.1110X1,Svalerødbekken (Halden),kystfelt,,0,3a Sårbar bestand (bestand er nær truet),,,,,...,,,,,,,,
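
As a possible cross-check on the loop above, the same matching can be sketched with a `groupby` and a left `merge`, plus a flag showing whether Atle's manual ID is among the candidates. This is an illustrative sketch on toy data, not the method used to produce the saved CSV. The frames `toy_wfd` and `toy_df` and the column `atle_in_matches` are hypothetical. Unlike the loop, it leaves single matches as one-element lists.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for wfd_df and df (values are illustrative only)
toy_wfd = pd.DataFrame({
    'wb_id':  ['001-2-R', '001-91-R', '001-91-R', '001-113-R', '001-7-L'],
    'riv_id': ['001.221Z', '001.Z', '001.Z', '001.Z', '001.Z'],
})
toy_df = pd.DataFrame({
    'vass_id': ['001.221Z', '001.Z', '002.1110X1'],
    'atle_wb_id': ['001-2-R', '001-113-R', 'kystfelt'],
})

# Unique river waterbodies (IDs ending in 'R') for each riv_id
rivers = toy_wfd.dropna(subset=['wb_id', 'riv_id'])
rivers = rivers[rivers['wb_id'].str.endswith('R')]
lookup = (rivers.groupby('riv_id')['wb_id']
                .unique()
                .apply(list)
                .rename('poss_matches')
                .reset_index())

# Left join keeps every catchment, matched or not
res = toy_df.merge(lookup, how='left', left_on='vass_id', right_on='riv_id')
res = res.drop(columns='riv_id')
res['n_matches'] = res['poss_matches'].apply(
    lambda m: len(m) if isinstance(m, list) else 0)

# True where Atle's manual ID is among the candidate matches
res['atle_in_matches'] = [
    isinstance(m, list) and a in m
    for a, m in zip(res['atle_wb_id'], res['poss_matches'])
]
print(res)
```

Rows where `n_matches` is greater than 1, or where `atle_in_matches` is `False`, would be natural candidates for manual checking.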
