In [1]:
%matplotlib inline
import pandas as pd
import numpy as np

# Norwegian Pollutant Export model

I would like to develop a new Python-based export coefficient model for estimating pollutant loads from rivers in Norway. The model will not attempt to replicate TEOTIL exactly, but it will be based on similar principles and will make use of some of the core TEOTIL input files (e.g. the representation of the catchment network).

This notebook restructures datasets previously used for TEOTIL into something that can be used effectively with the new model. All the relevant data has been added to the Excel file here:

[...]Elveovervakingsprogrammet\Recoding_TEOTIL\Linking_TEOTIL_Input_Files\simplified_inputs_2015.xlsx

The sheets in this workbook are essentially tidied versions of the datasets underpinning TEOTIL, but note that I've done some further processing to remove duplicates and fill missing data gaps, so it is important to use the Excel versions rather than the originals from now on.

In [2]:
# Path to input data
in_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Recoding_TEOTIL\Linking_TEOTIL_Input_Files\simplified_inputs_2015.xlsx')

## 1. Land use

### 1.1. Land areas

Areas have been estimated from several different datasets, so they are not necessarily compatible. The first step is therefore to rescale the land cover areas so that, within each regine, they never sum to more than the total regine area.
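
As a rough illustration of this adjustment, the sketch below applies the same kind of correction factor to a toy table. The two regine units and their areas are hypothetical, and only two land cover classes are used to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two regine units with two land cover classes each
toy = pd.DataFrame({
    'regine': ['A', 'B'],
    'a_reg_km2': [10.0, 10.0],
    'a_wood_km2': [8.0, 3.0],
    'a_lake_km2': [4.0, 2.0],
})

class_cols = ['a_wood_km2', 'a_lake_km2']
toy['a_sum'] = toy[class_cols].sum(axis=1)

# Scale down only where the classes add up to more than the regine area
fac = np.where(toy['a_sum'] > toy['a_reg_km2'],
               toy['a_reg_km2'] / toy['a_sum'],
               1.0)
for col in class_cols + ['a_sum']:
    toy[col] = toy[col] * fac

# Whatever is left over is unclassified ('other') land
toy['a_other_km2'] = toy['a_reg_km2'] - toy['a_sum']
```

In regine A the classes sum to 12 km2, so both are multiplied by 10/12 and no 'other' area remains. Regine B is left unchanged and has 5 km2 of 'other' land.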

In [3]:
# Read data
reg_df = pd.read_excel(in_xlsx, sheet_name='regine', index_col=0)
lc_df = pd.read_excel(in_xlsx, sheet_name='land_cover', index_col=0)
la_df = pd.read_excel(in_xlsx, sheet_name='lake_areas', index_col=0)

# Join
area_df = pd.concat([reg_df, lc_df, la_df], axis=1)
area_df.index.name = 'regine'
area_df.reset_index(inplace=True)

# Fill NaN
area_df.fillna(value=0, inplace=True)

# Get total area of categories
area_df['a_sum'] = (area_df['a_wood_km2'] + area_df['a_agri_km2'] +
                    area_df['a_upland_km2'] + area_df['a_glacier_km2'] + 
                    area_df['a_urban_km2'] + area_df['a_sea_km2'] + 
                    area_df['a_lake_km2'])

# If total exceeds overall area, calc correction factor
area_df['a_cor_fac'] = np.where(area_df['a_sum'] > area_df['a_reg_km2'], 
                                area_df['a_reg_km2'] / area_df['a_sum'], 
                                1)

# Apply correction factor
area_cols = ['a_wood_km2', 'a_agri_km2', 'a_upland_km2', 'a_glacier_km2', 
             'a_urban_km2', 'a_sea_km2', 'a_lake_km2', 'a_sum']
    
for col in area_cols:    
    area_df[col] = area_df[col] * area_df['a_cor_fac']

# Calc 'other' column
area_df['a_other_km2'] = area_df['a_reg_km2'] - area_df['a_sum']

# Combine 'glacier' and 'upland' as 'upland'
area_df['a_upland_km2'] = area_df['a_upland_km2'] + area_df['a_glacier_km2']

# Add 'land area' column
area_df['a_land_km2'] = area_df['a_reg_km2'] - area_df['a_sea_km2']

# Tidy
del area_df['a_glacier_km2'], area_df['a_sum'], area_df['a_cor_fac']

### 1.2. Background land use coefficients

Join in data for background land use inputs.

In [4]:
# Read back coeffs
back_df = pd.read_excel(in_xlsx, sheet_name='back_coeffs')

# Join
area_df = pd.merge(area_df, back_df, how='left', on='regine')

### 1.3. Agricultural land use coefficients

Join in data for agricultural land use inputs.

In [5]:
# Read agri coeffs and the regine to fylke-sone lookup
agri_df = pd.read_excel(in_xlsx, sheet_name='agri_coeffs')
fy_df = pd.read_excel(in_xlsx, sheet_name='regine_fysone')

# Join
area_df = pd.merge(area_df, fy_df, how='left', on='regine')
area_df = pd.merge(area_df, agri_df, how='left', on='fylke_sone')

## 2. Discharge

`Regine.txt` gives the long-term, specific average annual flow for each of the regine catchments (i.e. the annual runoff, but expressed in $m^3/s/km^2$). NVE only provide modelled data for the main vassdragsområder, so we estimate regine-specific flow for the year of interest by (i) summing all the long-term regine average runoff volumes to the level of the vassdragsområder, (ii) calculating the ratio of vassdragsområder annual flow for the year of interest to the TEOTIL long-term averages, and (iii) applying this correction factor to the regine-specific discharges. This should ensure that the total accumulated discharge estimated by the model equals the actual discharge (from the NVE modelled data) for the year of interest.
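
The three steps can be sketched on a toy example. The regine codes, vassdragsområde numbers and flows below are hypothetical; the column names follow those used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: long-term average flow per regine (m3/s)
reg = pd.DataFrame({
    'regine': ['001.A', '001.B', '002.A'],
    'vassom': [1, 1, 2],
    'q_reg_m3/s': [2.0, 6.0, 5.0],
})

# Hypothetical modelled flow per vassdragsområde for the year of interest
yr = pd.DataFrame({'vassom': [1, 2], 'q_yr_m3/s': [10.0, 4.0]})

# (i) Sum long-term averages to vassdragsområde level
lta = (reg.groupby('vassom')['q_reg_m3/s'].sum()
          .rename('q_lta_m3/s')
          .reset_index())

# (ii) Ratio of this year's flow to the long-term average
fac = pd.merge(lta, yr, how='left', on='vassom')
fac['q_fac'] = fac['q_yr_m3/s'] / fac['q_lta_m3/s']

# (iii) Apply the factor to each regine
reg = pd.merge(reg, fac[['vassom', 'q_fac']], how='left', on='vassom')
reg['q_reg_m3/s'] = reg['q_reg_m3/s'] * reg['q_fac']

# Check: corrected regine flows sum back to the yearly totals
totals = reg.groupby('vassom')['q_reg_m3/s'].sum()
```

Here vassdragsområde 1 has a long-term total of 8 m3/s against 10 m3/s for the year, so its regine flows are scaled by 1.25. After scaling, the regine flows in each vassdragsområde sum to the yearly value.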

In [6]:
# Read NVE flow data for year of interest
q_df = pd.read_excel(in_xlsx, sheet_name='flow')

# Sum LTA to vassom level
lta_df = area_df[['vassom', 'q_reg_m3/s']].groupby('vassom').sum().reset_index()
lta_df.columns = ['vassom', 'q_lta_m3/s']

# Join
q_df = pd.merge(lta_df, q_df, how='left', on='vassom')

# Calculate corr fac
q_df['q_fac'] = q_df['q_yr_m3/s'] / q_df['q_lta_m3/s']

# Join and reset index
df = pd.merge(area_df, q_df, how='left', on='vassom')
df.index = df['regine']
del df['regine']

# Calculate regine-specific flow for this year
for col in ['q_sp_m3/s/km2', 'runoff_mm/yr', 'q_reg_m3/s']:
    # Scale by the correction factor and fill NaN (no flow data) with zero
    df[col] = (df[col] * df['q_fac']).fillna(0)

# Tidy
del df['q_fac'], df['q_yr_m3/s'], df['q_lta_m3/s']   

## 3. Point sources

### 3.1. Aquaculture, Renseanlegg and Industri

We first group values from each site/outlet according to regine, then join to the main dataframe.

In [7]:
# Read data
aqu_df = pd.read_excel(in_xlsx, sheet_name='aqua')
ren_df = pd.read_excel(in_xlsx, sheet_name='rense')
ind_df = pd.read_excel(in_xlsx, sheet_name='industry')

# Ignore ID cols
del aqu_df['id'], ren_df['id'], ind_df['id']

# Sum by regine (groupby leaves 'regine' as the index)
# Aqua
aqu_df = aqu_df.groupby('regine').sum()

# Rense
ren_df = ren_df.groupby('regine').sum()

# Industry
ind_df = ind_df.groupby('regine').sum()

# Join
df = pd.concat([df, aqu_df, ren_df, ind_df], axis=1)
df.index.name = 'regine'
df.reset_index(inplace=True)

# Fill NaN
for typ in ['aqu', 'ren', 'ind']:
    for par in ['n', 'p']:
        col = '%s_%s_tonnes' % (typ, par)
        df[col] = df[col].fillna(0)

### 3.2. Spredt

Spredt inputs are reported at kommune level. TEOTIL distributes these loads according to area ratios in the following way:

 1. If cultivated land exists in the kommune, assume most of the population lives on and around the agricultural land, i.e. spredt is allocated to each regine according to the ratio (cultivated_land_regine / cultivated_land_kommune) <br><br>
 
 2. If there is no cultivated land in the kommune, spredt is allocated according to the ratio of overall areas (regine_area / kommune_area)
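
The two allocation rules can be sketched as follows. All values are hypothetical: kommune 1 contains cultivated land and kommune 2 does not. The column `spr_n_reg_tonnes` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: kommune 1 has cultivated land, kommune 2 has none
toy = pd.DataFrame({
    'regine': ['A', 'B', 'C', 'D'],
    'komnr': [1, 1, 2, 2],
    'a_land_km2': [10.0, 30.0, 5.0, 15.0],
    'a_agri_km2': [3.0, 1.0, 0.0, 0.0],
})
spr = pd.DataFrame({'komnr': [1, 2], 'spr_n_tonnes': [8.0, 4.0]})

# Kommune-level totals of land area and cultivated area
kom = (toy.groupby('komnr')[['a_land_km2', 'a_agri_km2']].sum()
          .rename(columns={'a_land_km2': 'a_kom_km2',
                           'a_agri_km2': 'a_agri_kom_km2'})
          .reset_index())
toy = toy.merge(kom, on='komnr').merge(spr, on='komnr')

# Share of the kommune load assigned to each regine
agri_share = toy['a_agri_km2'] / toy['a_agri_kom_km2'].replace(0, np.nan)
area_share = toy['a_land_km2'] / toy['a_kom_km2']

# Rule 1 where the kommune has cultivated land, otherwise rule 2
share = agri_share.where(toy['a_agri_kom_km2'] > 0, area_share)
toy['spr_n_reg_tonnes'] = toy['spr_n_tonnes'] * share
```

In kommune 1 the 8 tonnes are split 3:1 by cultivated area (6 and 2 tonnes). In kommune 2 the 4 tonnes are split 1:3 by land area (1 and 3 tonnes). In both cases the regine loads sum back to the kommune total.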

In [8]:
# Read spredt data
spr_df = pd.read_excel(in_xlsx, sheet_name='spredt')

# Get total land area and area of cultivated land in each kommune
kom_df = df[['komnr', 'a_land_km2', 'a_agri_km2']]
kom_df = kom_df.groupby('komnr').sum()
kom_df.reset_index(inplace=True)
kom_df.columns = ['komnr', 'a_kom_km2', 'a_agri_kom_km2']

# Join 'spredt' to kommune areas
kom_df = pd.merge(kom_df, spr_df,
                  how='left', on='komnr')

# Join back to main df
df = pd.merge(df, kom_df,
              how='left', on='komnr')

# Distribute loads
for par in ['n', 'p']:
    # Over agri
    df['spr_agri'] = df['spr_%s_tonnes' % par] * df['a_agri_km2'] / df['a_agri_kom_km2']
    
    # Over all area
    df['spr_all'] = df['spr_%s_tonnes' % par] * df['a_land_km2'] / df['a_kom_km2']
    
    # Use agri if > 0, else all
    df['spr_%s_tonnes' % par] = np.where(df['a_agri_kom_km2'] > 0, 
                                         df['spr_agri'], 
                                         df['spr_all'])

# Delete intermediate cols
del df['spr_agri'], df['spr_all']

# Fill NaN
for col in ['a_kom_km2', 'a_agri_kom_km2', 'spr_n_tonnes', 'spr_p_tonnes']:
    df[col] = df[col].fillna(0)

## 4. Diffuse sources

The land areas, land coefficients and flow data can now be used to estimate diffuse loads.
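
For the concentration-based classes (woodland and upland), the load is area times specific runoff times concentration, converted to tonnes per year. The sketch below spells out where the factor `0.0864*365` comes from. The function name and the example values are hypothetical.

```python
# Convert area (km2) x specific runoff (m3/s/km2) x concentration (mg/l)
# to an annual load in tonnes.
SECONDS_PER_DAY = 86400
GRAMS_PER_TONNE = 1e6
DAYS_PER_YEAR = 365  # leap years are ignored, as in the factor 0.0864*365


def annual_load_tonnes(area_km2, q_sp_m3_s_km2, conc_mg_l):
    """Annual load in tonnes, using the fact that 1 mg/l equals 1 g/m3."""
    flow_m3_s = area_km2 * q_sp_m3_s_km2   # m3/s
    flux_g_s = flow_m3_s * conc_mg_l       # g/s
    # 86400 s/day divided by 1e6 g/tonne gives the factor 0.0864
    tonnes_per_day = flux_g_s * SECONDS_PER_DAY / GRAMS_PER_TONNE
    return tonnes_per_day * DAYS_PER_YEAR


# Hypothetical values: 10 km2 of woodland, 0.02 m3/s/km2, 0.5 mg/l
load = annual_load_tonnes(10.0, 0.02, 0.5)
```

With these values the flux is 0.1 g/s, which corresponds to roughly 3.15 tonnes per year. The area-based classes (lake, urban and agriculture) instead use coefficients in kg/km2, so they only need dividing by 1000.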

In [9]:
# Loop over pars
for par in ['n', 'p']:
    # Background inputs
    # Woodland
    df['wood_%s_tonnes' % par] = (df['a_wood_km2'] * df['q_sp_m3/s/km2'] * 
                                  df['c_wood_mg/l_%s' % par] * 0.0864*365)
    
    # Upland
    df['upland_%s_tonnes' % par] = (df['a_upland_km2'] * df['q_sp_m3/s/km2'] * 
                                    df['c_upland_mg/l_%s' % par] * 0.0864*365) 
    
    # Lake
    df['lake_%s_tonnes' % par] = (df['a_lake_km2'] * 
                                  df['c_lake_kg/km2_%s' % par] /
                                  1000)
    
    # Urban
    df['urban_%s_tonnes' % par] = (df['a_urban_km2'] * 
                                  df['c_urban_kg/km2_%s' % par] /
                                  1000)

    # Agri from Bioforsk
    # Background
    df['agri_back_%s_tonnes' % par] = (df['a_agri_km2'] * 
                                       df['agri_back_%s_kg/km2' % par] /
                                       1000)
    
    # Point
    df['agri_pt_%s_tonnes' % par] = (df['a_agri_km2'] * 
                                     df['agri_point_%s_kg/km2' % par] /
                                     1000)

    # Diffuse
    df['agri_diff_%s_tonnes' % par] = (df['a_agri_km2'] * 
                                       df['agri_diff_%s_kg/km2' % par] /
                                       1000)

## 5. Retention and transmission

Retention factors are estimated based on the numbers and volumes of lakes within each regine. See section 2.4 of the [TEOTIL report](https://brage.bibsys.no/xmlui/bitstream/handle/11250/214825/5914-2010_72dpi.pdf?sequence=1&isAllowed=y) for details. Where retention factors are missing, they are assumed to be zero.
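
A minimal sketch of how missing retention is handled, using hypothetical values: regine 'B' has no entry in the retention table, so it ends up with a transmission factor of 1.

```python
import pandas as pd

# Hypothetical toy data: regine 'B' has no entry in the retention table
toy = pd.DataFrame({'regine': ['A', 'B', 'C']})
ret = pd.DataFrame({'regine': ['A', 'C'],
                    'ret_n': [0.13, 0.0],
                    'ret_p': [0.64, 0.2]})

# A left join keeps every regine; unmatched rows get NaN retention
toy = pd.merge(toy, ret, how='left', on='regine')

for par in ['n', 'p']:
    # Missing retention is treated as zero, i.e. full transmission
    toy['ret_%s' % par] = toy['ret_%s' % par].fillna(0)
    toy['trans_%s' % par] = 1 - toy['ret_%s' % par]
```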

In [10]:
# Read retention
ret_df = pd.read_excel(in_xlsx, sheet_name='retention')

# Join
df = pd.merge(df, ret_df, how='left', on='regine')

for par in ['n', 'p']:
    # Fill NaN (missing retention is assumed to be zero)
    df['ret_%s' % par] = df['ret_%s' % par].fillna(0)
    
    # Calculate transmission
    df['trans_%s' % par] = 1 - df['ret_%s' % par]

## 6. Aggregate values

The inputs can be summarised in various useful ways.

In [11]:
# Loop over pars
for par in ['n', 'p']:
    # Subtract natural background from agri_diffuse
    df['agri_diff_%s_tonnes' % par] = (df['agri_diff_%s_tonnes' % par] - 
                                       df['agri_back_%s_tonnes' % par])
    
    # All point sources
    df['all_point_%s_tonnes' % par] = (df['spr_%s_tonnes' % par] + 
                                       df['aqu_%s_tonnes' % par] +
                                       df['ren_%s_tonnes' % par] +
                                       df['ind_%s_tonnes' % par] +
                                       df['agri_pt_%s_tonnes' % par])
    
    # Natural diffuse sources
    df['nat_diff_%s_tonnes' % par] = (df['wood_%s_tonnes' % par] +
                                      df['upland_%s_tonnes' % par] +
                                      df['lake_%s_tonnes' % par] + 
                                      df['agri_back_%s_tonnes' % par])
    
    # Anthropogenic diffuse sources
    df['anth_diff_%s_tonnes' % par] = (df['urban_%s_tonnes' % par] +
                                       df['agri_diff_%s_tonnes' % par])
    
    # All sources
    df['all_sources_%s_tonnes' % par] = (df['all_point_%s_tonnes' % par] +
                                         df['nat_diff_%s_tonnes' % par] + 
                                         df['anth_diff_%s_tonnes' % par])

# Get cols of interest
# Basic_cols
col_list = ['regine', 'regine_ned', 'a_reg_km2', 'runoff_mm/yr', 'q_reg_m3/s']

# Param specific cols
par_cols = ['trans_%s', 'aqu_%s_tonnes', 'ind_%s_tonnes', 'ren_%s_tonnes', 
            'spr_%s_tonnes', 'all_point_%s_tonnes', 'nat_diff_%s_tonnes',
            'anth_diff_%s_tonnes', 'all_sources_%s_tonnes']

# Build col list
for name in par_cols:
    for par in ['n', 'p']:
        col_list.append(name % par)
        
# Get cols (copy to avoid modifying a view of the original)
df = df[col_list].copy()

# Remove rows where regine_ned is null
df = df.dropna(subset=['regine_ned'])

# Fill NaN
df = df.fillna(value=0)

## 7. Write tidied model input file

In [12]:
# Write regine-level data
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Recoding_TEOTIL\Linking_TEOTIL_Input_Files\nope_input_data.csv')
df.to_csv(out_csv, encoding='utf-8', index=False)

df.head()

,regine,regine_ned,a_reg_km2,runoff_mm/yr,q_reg_m3/s,trans_n,trans_p,aqu_n_tonnes,aqu_p_tonnes,ind_n_tonnes,...,spr_n_tonnes,spr_p_tonnes,all_point_n_tonnes,all_point_p_tonnes,nat_diff_n_tonnes,nat_diff_p_tonnes,anth_diff_n_tonnes,anth_diff_p_tonnes,all_sources_n_tonnes,all_sources_p_tonnes
0,001.,1_2,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,001.10,001.,1.41,506.397097,0.022641,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.170181,0.001742,0.0,0.0,0.170181,0.001742
2,001.1A1,001.10,1.16,506.397097,0.018627,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.291669,0.002922,0.0,0.0,0.291669,0.002922
3,001.1A20,001.1A1,0.35,361.712212,0.004014,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.062187,0.000628,0.0,0.0,0.062187,0.000628
4,001.1A2A,001.1A20,17.4,434.054655,0.23949,0.87,0.36,0.0,0.0,0.0,...,0.401484,0.030155,0.674877,0.052254,4.186221,0.05502,3.392855,0.214071,8.253954,0.321346


This input file can now be used to drive the new model, as described in the [next notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/nope_model.ipynb).
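
One thing to watch when reading the CSV back in: regine codes such as `001.` and `001.10` look numeric, so pandas may infer a float type for any extract where every code has that form. The sketch below uses a hypothetical two-row extract (not the real file) to show one way of guarding against this, by reading the code columns as strings.

```python
import io
import pandas as pd

# Hypothetical two-row extract in which every regine code looks numeric
csv_text = ("regine,regine_ned,a_reg_km2\n"
            "001.10,001.,1.41\n"
            "001.20,001.10,1.16\n")

# Default type inference turns the codes into floats (001.10 becomes 1.1)
inferred = pd.read_csv(io.StringIO(csv_text))

# Reading the code columns as strings keeps them intact
as_text = pd.read_csv(io.StringIO(csv_text),
                      dtype={'regine': str, 'regine_ned': str})
```

In the full dataset, codes containing letters (e.g. `001.1A1`) would normally keep the whole column as text anyway, so this is a precaution rather than a required step.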