In [1]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
import imp

sn.set_context('notebook')

# Estimating loads in unmonitored regions - 2016

The output from the new model can be used to estimate loads in unmonitored areas. We know the regine ID for each of the 155 stations where water chemistry is measured, and we also know which OSPAR region each monitoring site drains to. We want to use observed data to estimate loads upstream of each monitoring point, and modelled data elsewhere.

**Note:** In the code below, I'm assuming that we want to use the observed data for all 155 sites. In reality, values for the `RID_108` stations are estimated using linear interpolation, so we may prefer to use the modelled output for these anyway. Furthermore, in the OSPAR template we only report observed values for the 11 main rivers. The choice of which sites to consider "observed" can easily be controlled by reading different sheet(s) into `stn_df` in the code below.
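
A minimal sketch of how that choice might be made, assuming the workbook holds one sheet per station group. The sheet names `RID_11` and `RID_36`, the station IDs and the helper `select_observed` are all hypothetical; only the `RID_All` sheet is actually read in this notebook.

```python
import pandas as pd

def select_observed(sheets, keep):
    """Stack the station sheets that should be treated as "observed".

    sheets: dict mapping sheet name -> DataFrame, e.g. as returned by
            pd.read_excel(path, sheet_name=None) in recent pandas.
    keep:   list of sheet names to include.
    """
    missing = [name for name in keep if name not in sheets]
    if missing:
        raise KeyError('Sheets not found: %s' % missing)
    return pd.concat([sheets[name] for name in keep], ignore_index=True)

# Toy stand-in for the workbook (made-up stations, hypothetical sheet names)
sheets = {
    'RID_11': pd.DataFrame({'station_id': [1, 2],
                            'ospar_region': ['SKAGERAK', 'NORTH SEA']}),
    'RID_36': pd.DataFrame({'station_id': [3, 4],
                            'ospar_region': ['SKAGERAK', 'NORWEGIAN SEA2']}),
    'RID_108': pd.DataFrame({'station_id': [5],
                             'ospar_region': ['NORTH SEA']}),
}

# Treat only the first two groups as "observed"
stn_demo = select_observed(sheets, ['RID_11', 'RID_36'])
```

Stations left out of `keep` would then fall into the "unmonitored" component and be covered by the modelled output instead.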

In [2]:
# Import model
nope_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Python\rid\notebooks\nope.py')
nope = imp.load_source('nope', nope_path)

# Connect to db
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')
resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)
engine, conn = resa2_basic.connect_to_resa2()

## 1. Generate model input file

In [3]:
# Year of interest
year = 2016

# Parameters of interest
par_list = ['Tot-N', 'Tot-P']

# Folder containing NOPE data
nope_fold = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\NOPE\NOPE_Core_Input_Data')

# Output path for model file
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\NOPE_Annual_Inputs\nope_input_data_2016.csv')

In [4]:
# Make input file
df = nope.make_rid_input_file(year, engine, nope_fold, out_csv,
                              par_list=par_list)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  downcast=downcast, **kwargs)


## 2. Run model

In [5]:
%%time
# Input file
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\NOPE\NOPE_Annual_Inputs\nope_input_data_2016.csv')

# Run model
g = nope.run_nope(in_csv, par_list)

Wall time: 4.2 s


## 3. Save results

In [6]:
# Save results as csv
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\nope_results_2016.csv')
df = nope.model_to_dataframe(g, out_path=out_csv)

df.head()

,regine,regine_ned,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,accum_aqu_tot-n_tonnes,accum_aqu_tot-p_tonnes,...,local_nat_diff_tot-n_tonnes,local_nat_diff_tot-p_tonnes,local_q_reg_m3/s,local_ren_tot-n_tonnes,local_ren_tot-p_tonnes,local_runoff_mm/yr,local_spr_tot-n_tonnes,local_spr_tot-p_tonnes,local_trans_tot-n,local_trans_tot-p
0,001.222Z,001.2220,0.109003,0.007512,2.014422,0.07885,0.930145,0.058948,0.0,0.0,...,0.975274,0.012389,0.052966,0.0,0.0,268.974615,0.097939,0.006618,1.0,1.0
1,002.DGBZ,002.DGB0,0.0,0.0,5.07757,0.108752,0.0,0.0,0.0,0.0,...,5.234608,0.131027,0.780749,0.0,0.0,286.099189,0.0,0.0,0.97,0.83
2,123.A1Z,123.A12,0.484875,0.023375,16.146657,0.311826,11.213173,0.249968,0.0,0.0,...,5.492109,0.128275,0.624774,0.0,0.0,494.426247,0.444984,0.064271,0.81,0.3
3,212.FAC,212.FAB0,0.021639,0.000779,7.536223,0.076501,0.0,0.0,0.0,0.0,...,7.910089,0.099634,0.561737,0.0,0.0,325.881795,0.022777,0.001025,0.95,0.76
4,135.1AC,135.1AB,0.0,0.0,4.721225,0.04925,0.0,0.0,0.0,0.0,...,5.554382,0.117261,0.880246,0.0,0.0,1975.761038,0.0,0.0,0.85,0.42


## 4. Explore results

In [7]:
# Read model output
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\NOPE\nope_results_2016.csv')
nope_df = pd.read_csv(in_csv)

# Read station data
in_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Data\RID_Sites_List.xlsx')
stn_df = pd.read_excel(in_xlsx, sheetname='RID_All')

# Join accumulated outputs to stns
mon_df = pd.merge(stn_df, nope_df, how='left',
                  left_on='nve_vassdrag_nr',
                  right_on='regine')

# Groupby OSPAR region
mon_df = mon_df.groupby('ospar_region').sum()

# Get just accum cols
cols = [i for i in mon_df.columns if i.split('_')[0]=='accum']
mon_df = mon_df[cols]

mon_df.head()

ospar_region,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,accum_aqu_tot-n_tonnes,accum_aqu_tot-p_tonnes,accum_ind_tot-n_tonnes,accum_ind_tot-p_tonnes,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2
LOFOTEN-BARENTS SEA,77.940831,5.69331,4291.21753,61.556413,139.951511,5.259556,0.0,0.0,0.0,0.0,4073.325188,50.603547,897.263854,24.647655,0.757696,50.753621,4.717238,63555.61
NORTH SEA,427.423604,40.71757,13738.091623,204.622091,2656.367698,70.932435,0.0,0.0,25.603844,3.641174,10654.300321,92.972086,1668.65421,232.564816,24.008028,133.403434,10.804017,25223.53
NORWEGIAN SEA2,628.328294,56.047081,12422.700767,260.046138,2945.94904,82.03963,0.0,0.0,46.04183,10.4435,8848.423433,121.959428,1656.251067,327.090573,19.86883,209.966091,22.408039,49150.49
SKAGERAK,3798.995861,108.78737,30191.714572,487.862844,11150.240482,245.101403,0.0,0.0,182.085191,26.602975,15242.47823,133.97407,2021.68098,2641.158801,24.549411,873.645037,52.085883,94172.25


This table gives the **modelled** inputs to each OSPAR region from catchments for which we have observed data. We want to subtract these values from the overall modelled inputs to each region and substitute the observed data instead.
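
For each region and parameter, the substitution described above amounts to `combined = modelled total - modelled upstream of monitoring + observed`. The sketch below illustrates the arithmetic with made-up numbers. The names `model_total`, `model_monitored` and `observed` are hypothetical, and the observed loads themselves are not read anywhere in this notebook.

```python
import pandas as pd

regions = ['REGION_A', 'REGION_B']  # placeholder region names

# Modelled Tot-N input to each region (tonnes), made-up values
model_total = pd.Series([100.0, 250.0], index=regions)

# Modelled Tot-N upstream of the monitoring stations (tonnes), made-up values
model_monitored = pd.Series([40.0, 150.0], index=regions)

# Observed Tot-N at the monitoring stations (tonnes), made-up values
observed = pd.Series([55.0, 120.0], index=regions)

# Modelled load from areas with no monitoring
unmonitored = model_total - model_monitored

# Observed where we have data, modelled elsewhere
combined = unmonitored + observed
```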

The trickiest part is that the OSPAR regions in the TEOTIL catchment network (and therefore in the network for my new model too) don't exactly match the new OSPAR definitions. The OSPAR boundaries were updated relatively recently, so instead of simply selecting the desired OSPAR region in the model output, I need to aggregate by vassdragsnummer (the three-digit number at the start of each regine ID).

In [8]:
# Define "new" OSPAR regions
os_dict = {'SKAGERAK':(1, 23),
           'NORTH SEA':(24, 90),
           'NORWEGIAN SEA2':(91, 170),
           'LOFOTEN-BARENTS SEA':(171, 247)}

# Container for results
df_list = []

# Loop over model output
for reg in os_dict.keys():
    min_id, max_id = os_dict[reg]
    
    regs = ['%03d.' % i for i in range(min_id, max_id+1)]
    
    # Get data for this region
    df2 = nope_df[nope_df['regine'].isin(regs)]
    
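    # Get just accum cols (copy, so the column added below is set
    # on a new frame rather than on a slice of nope_df)
    cols = [i for i in df2.columns if i.split('_')[0]=='accum']
    df2 = df2[cols].copy()
    
    # Add region
    df2['ospar_region'] = reg
    
    # Add rows for this region to output
    df_list.append(df2)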

# Build df
os_df = pd.concat(df_list, axis=0)

# Aggregate
os_df = os_df.groupby('ospar_region').sum()

os_df.head()

ospar_region,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,accum_aqu_tot-n_tonnes,accum_aqu_tot-p_tonnes,accum_ind_tot-n_tonnes,accum_ind_tot-p_tonnes,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2
LOFOTEN-BARENTS SEA,15536.465083,2624.071894,24729.268966,2782.102006,595.348458,29.007032,14140.779244,2435.102449,83.27,6.071,8597.455425,129.02308,2107.379728,972.323096,141.231208,329.52763,40.781418,138090.89
NORTH SEA,24154.765441,3943.73499,51360.258258,4337.303801,6227.905182,191.999327,19560.984224,3370.099669,437.681065,93.562709,20977.587634,201.569483,3326.954266,3379.521592,392.782224,696.254634,81.460852,59314.38
NORWEGIAN SEA2,27414.085643,4530.286265,50838.86574,5002.845199,7367.288126,236.64394,22963.896011,3971.30847,1029.58823,119.6302,16057.491971,235.914994,3230.153201,2602.465008,343.750027,716.427779,87.515367,113934.05
SKAGERAK,10555.596638,274.680733,39811.418034,741.044808,12869.205614,320.197798,25.00677,4.222934,1057.199191,67.486975,16386.615782,146.166276,2127.911199,8324.041575,127.213095,1036.363378,69.405388,102574.69
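
The loop above selects IDs of the form `'001.'`, `'002.'` and so on, then labels them by region. The same ranges could also be used to label any regine ID by its numeric prefix. This is a small sketch only: `ospar_region_for` is a hypothetical helper, not part of `nope.py`, and it assumes every regine ID starts with the vassdragsnummer followed by a dot.

```python
# Same ranges as os_dict in the cell above
OS_DICT = {'SKAGERAK': (1, 23),
           'NORTH SEA': (24, 90),
           'NORWEGIAN SEA2': (91, 170),
           'LOFOTEN-BARENTS SEA': (171, 247)}

def ospar_region_for(regine):
    """Return the OSPAR region for a regine ID such as '123.A1Z'.

    Returns None if the prefix is not an integer or is outside all ranges.
    """
    try:
        vassdrag_nr = int(regine.split('.')[0])
    except ValueError:
        return None
    for region, (min_id, max_id) in OS_DICT.items():
        if min_id <= vassdrag_nr <= max_id:
            return region
    return None

# Regine IDs taken from the model output shown earlier
demo = {reg: ospar_region_for(reg)
        for reg in ['001.222Z', '123.A1Z', '212.FAC']}
```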


We can now calculate the unmonitored component by simply subtracting the values modelled upstream of monitoring stations from the overall modelled inputs to each OSPAR region.
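
Subtracting one dataframe from another in pandas aligns on row and column labels, and any label present in only one frame produces NaN rather than an error. The sketch below adds an explicit check before subtracting. `subtract_aligned` is a hypothetical helper, and the demo uses only the `accum_q_m3/s` values for two regions from the tables above.

```python
import pandas as pd

def subtract_aligned(total, monitored):
    """Return total - monitored after checking both frames share labels."""
    bad_rows = total.index.symmetric_difference(monitored.index)
    bad_cols = total.columns.symmetric_difference(monitored.columns)
    if len(bad_rows) or len(bad_cols):
        raise ValueError('Label mismatch: rows %s, columns %s'
                         % (list(bad_rows), list(bad_cols)))
    return total - monitored

regions = ['LOFOTEN-BARENTS SEA', 'SKAGERAK']

# Regional totals (from the os_df output above)
os_demo = pd.DataFrame({'accum_q_m3/s': [2107.379728, 2127.911199]},
                       index=regions)

# Totals upstream of monitoring stations (from the mon_df output above)
mon_demo = pd.DataFrame({'accum_q_m3/s': [897.263854, 2021.68098]},
                        index=regions)

unmon_demo = subtract_aligned(os_demo, mon_demo)
```

If a region name were spelled differently in the station list and in `os_dict`, this check would raise instead of silently writing NaN to the output file.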

In [9]:
# Calc unmonitored loads
unmon_df = os_df - mon_df

# Write output
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\unmonitored_loads_2016.csv')
unmon_df.to_csv(out_csv, encoding='utf-8', index_label='ospar_region')

unmon_df.T

ospar_region,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
accum_all_point_tot-n_tonnes,15458.524253,23727.341837,26785.757349,6756.600777
accum_all_point_tot-p_tonnes,2618.378584,3903.01742,4474.239185,165.893363
accum_all_sources_tot-n_tonnes,20438.051437,37622.166635,38416.164974,9619.703462
accum_all_sources_tot-p_tonnes,2720.545593,4132.68171,4742.799061,253.181964
accum_anth_diff_tot-n_tonnes,455.396947,3571.537484,4421.339087,1718.965133
accum_anth_diff_tot-p_tonnes,23.747476,121.066893,154.60431,75.096395
accum_aqu_tot-n_tonnes,14140.779244,19560.984224,22963.896011,25.00677
accum_aqu_tot-p_tonnes,2435.102449,3370.099669,3971.30847,4.222934
accum_ind_tot-n_tonnes,83.27,412.077221,983.5464,875.114
accum_ind_tot-p_tonnes,6.071,89.921536,109.1867,40.884


For comparison with the previous methodology, I have extracted the loads from Table 3 of the 2015 report and added them to Excel. The code below aggregates the dataframe above into approximately the same categories used by Tore in the report. The bar charts illustrate the results obtained from the two methods. **Note the log scale on the y-axis**.

In [10]:
# Aggregate to match report
unmon_df['flow'] = unmon_df['accum_q_m3/s']*60*60*24/1000. # 1000s m3/day

unmon_df['sew_n'] = unmon_df['accum_ren_tot-n_tonnes'] + unmon_df['accum_spr_tot-n_tonnes']
unmon_df['sew_p'] = unmon_df['accum_ren_tot-p_tonnes'] + unmon_df['accum_spr_tot-p_tonnes']

unmon_df['ind_n'] = unmon_df['accum_ind_tot-n_tonnes']
unmon_df['ind_p'] = unmon_df['accum_ind_tot-p_tonnes']

unmon_df['fish_n'] = unmon_df['accum_aqu_tot-n_tonnes']
unmon_df['fish_p'] = unmon_df['accum_aqu_tot-p_tonnes']

unmon_df['diff_n'] = unmon_df['accum_anth_diff_tot-n_tonnes'] + unmon_df['accum_nat_diff_tot-n_tonnes']
unmon_df['diff_p'] = unmon_df['accum_anth_diff_tot-p_tonnes'] + unmon_df['accum_nat_diff_tot-p_tonnes']

# Copy, so the 'NORWAY' row is added to a new frame rather than a slice of unmon_df
new_df = unmon_df[['flow', 'sew_n', 'sew_p', 
                   'ind_n', 'ind_p', 'fish_n', 
                   'fish_p', 'diff_n', 'diff_p']].copy()

# Total for Norway
new_df.loc['NORWAY'] = new_df.sum(axis=0)

# Reorder rows
new_df = new_df.reindex(['NORWAY', 'LOFOTEN-BARENTS SEA', 'NORTH SEA', 
                         'NORWEGIAN SEA2', 'SKAGERAK'])

new_df.astype(int).head()

ospar_region,flow,sew_n,sew_p,ind_n,ind_p,fish_n,fish_p,diff_n,diff_p
NORWAY,392994,13563,1124,2354,246,56690,9780,33367,687
LOFOTEN-BARENTS SEA,104554,1226,176,83,6,14140,2435,4979,102
NORTH SEA,143277,3709,439,412,89,19560,3370,13894,229
NORWEGIAN SEA2,135985,2781,388,983,109,22963,3971,11630,268
SKAGERAK,9178,5845,119,875,40,25,4,2863,87
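
As a check on the flow units, the conversion from m3/s to thousands of m3 per day can be reproduced by hand for one region. This is a sketch only: the function name is hypothetical, and the two discharge values are the `accum_q_m3/s` entries for LOFOTEN-BARENTS SEA from the regional and monitored tables above.

```python
def m3s_to_1000m3_per_day(q_m3s):
    """Convert a mean discharge in m3/s to thousands of m3 per day."""
    seconds_per_day = 60 * 60 * 24
    return q_m3s * seconds_per_day / 1000.0

# Regional total minus the part upstream of monitoring stations (m3/s)
q_unmon = 2107.379728 - 897.263854

flow = m3s_to_1000m3_per_day(q_unmon)
```

Truncated to an integer, `flow` is 104554, which matches the `flow` entry for LOFOTEN-BARENTS SEA in the table above.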


For comparison, here are the values for 2015 derived using Tore's approach and published in the 2015 report. The two dataframes will not be identical, as they are for different years, but the numbers should be broadly comparable.

In [11]:
# Read data from report
in_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\unmonitored_loads_2015_report.xlsx')
old_df = pd.read_excel(in_xlsx, sheetname='2015_report', index_col=0)

old_df.head()

ospar_region,flow,sew_n,sew_p,ind_n,ind_p,fish_n,fish_p,diff_n,diff_p
NORWAY,478573,13527,1015,2324,191,57142,9670,45664,830
LOFOTEN-BARENTS SEA,118195,1204,163,57,4,14020,2366,7517,144
NORTH SEA,179818,3899,350,394,79,19005,3210,19379,288
NORWEGIAN SEA2,171629,2818,368,924,77,24074,4087,15556,319
SKAGERAK,8931,5606,134,949,30,43,7,3212,79
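
One way to judge "broadly comparable" is the percentage difference between the two methods. The sketch below retypes the `flow` and `diff_n` columns from the 2016 (new model) and 2015 (report) tables above instead of reusing `new_df` and `old_df`, so it runs on its own. The names `new_sub`, `old_sub` and `pct_diff` are introduced here for illustration.

```python
import pandas as pd

regions = ['NORWAY', 'LOFOTEN-BARENTS SEA', 'NORTH SEA',
           'NORWEGIAN SEA2', 'SKAGERAK']

# 2016 values from the new model (table above)
new_sub = pd.DataFrame({'flow': [392994, 104554, 143277, 135985, 9178],
                        'diff_n': [33367, 4979, 13894, 11630, 2863]},
                       index=regions)

# 2015 values from the report (table above)
old_sub = pd.DataFrame({'flow': [478573, 118195, 179818, 171629, 8931],
                        'diff_n': [45664, 7517, 19379, 15556, 3212]},
                       index=regions)

# Percentage difference of the new estimates relative to the report
pct_diff = 100.0 * (new_sub - old_sub) / old_sub

print(pct_diff.round(1))
```

With these inputs, the 2016 national flow is about 18% lower than the 2015 report value. Part of any such difference reflects the change of year rather than the change of method.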
