In [1]:
%matplotlib inline
import nope
import nivapy3 as nivapy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

sn.set_context('notebook')

# Estimating loads in unmonitored regions - 2019

The new model can be used to estimate loads in unmonitored areas. We know the regine ID for each of the 155 stations where water chemistry is measured, and we also know which OSPAR region each monitoring site drains to. We want to use observed data to estimate loads upstream of each monitoring point, and modelled data elsewhere. This can be achieved using the output from the new model.

This notebook is based on the one [here](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/loads_unmonitored_regions.ipynb). It first runs the NOPE model for 2019 and then extracts data for unmonitored regions.

In [2]:
# Connect to db
engine = nivapy.da.connect()

Username:  ···
Password:  ········


Connection successful.


## 1. Generate model input file

In [3]:
# Year of interest
year = 2019

# Parameters of interest
par_list = ['Tot-N', 'Tot-P']

# Folder containing NOPE data
nope_fold = r'../../../NOPE/NOPE_Core_Input_Data'

# Output path for model file
out_csv = r'../../../NOPE/NOPE_Annual_Inputs/nope_input_data_%s.csv' % year

In [4]:
# Make input file
df = nope.make_rid_input_file(year, engine, nope_fold, out_csv,
                              par_list=par_list)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().fillna(


## 2. Run model

In [5]:
%%time
# Input file
in_csv = r'../../../NOPE/NOPE_Annual_Inputs/nope_input_data_%s.csv' % year

# Run model
g = nope.run_nope(in_csv, par_list)

CPU times: user 7.38 s, sys: 94.3 ms, total: 7.48 s
Wall time: 7.53 s


## 3. Save results

In [6]:
# Save results as csv
out_csv = r'../../../NOPE/nope_results_%s.csv' % year
df = nope.model_to_dataframe(g, out_path=out_csv)

df.head()

,regine,regine_ned,accum_agri_diff_tot-n_tonnes,accum_agri_diff_tot-p_tonnes,accum_agri_pt_tot-n_tonnes,accum_agri_pt_tot-p_tonnes,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,...,local_q_reg_m3/s,local_ren_tot-n_tonnes,local_ren_tot-p_tonnes,local_runoff_mm/yr,local_spr_tot-n_tonnes,local_spr_tot-p_tonnes,local_trans_tot-n,local_trans_tot-p,local_urban_tot-n_tonnes,local_urban_tot-p_tonnes
0,315.0,315.,0.0,0.0,0.0,0.0,0.0,0.0,0.432277,0.010186,...,0.024685,0.0,0.0,131.943783,0.0,0.0,1.0,1.0,0.0,0.0
1,315.,300_315,0.0,0.0,0.0,0.0,0.0,0.0,0.432277,0.010186,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
2,314.C,314.B,2.337607,0.025746,0.011128,0.000403,0.095908,0.002186,15.610029,0.106909,...,0.955845,0.0,0.0,633.00166,0.102145,0.005095,0.83,0.35,0.0,0.0
3,314.B,314.A,11.201296,0.080308,0.053322,0.001257,2.308088,0.033658,52.57747,0.338109,...,1.591411,1.74545,0.01088,553.876452,0.822435,0.111956,0.85,0.26,0.0,0.0
4,314.A,314.5,11.201296,0.080308,0.053322,0.001257,2.308088,0.033658,54.05632,0.355855,...,0.11504,0.0,0.0,553.876452,0.0,0.0,1.0,1.0,0.0,0.0


In [7]:
# Save version with main catchments only
main_list = ["%03d." % i for i in range(1, 316)]

# Sort into a new frame rather than modifying a slice of df in place
df2 = df.query('regine in @main_list').sort_values('regine')

# Save
out_csv = r'../../../NOPE/nope_results_%s_main_catchs.csv' % year
df2.to_csv(out_csv, index=False, encoding='utf-8')


## 4. Explore results

### 4.1. Total N and P

####  4.1.1. Identify areas with monitoring data

Where observations are available, we want to use them in preference to the model output. This means identifying all the catchments with observed data and subtracting the model results for these locations. This is more complicated than it appears, because a small number of observed catchments are upstream of others, so subtracting the loads for all 155 monitored catchments would involve "double counting", which we want to avoid. The first step is therefore to identify the downstream-most nodes for the monitored areas, i.e. where one monitored catchment is upstream of another, we keep just the downstream node.
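
The idea of keeping only the downstream-most monitored nodes can be illustrated on a toy network. This is a minimal sketch in plain Python, not the implementation used by `nope.build_calib_network`; the catchment IDs, the `downstream` mapping and the `downstream_most` helper are all made up for illustration.

```python
# Toy network: each catchment maps to the catchment immediately downstream
# (None marks an outlet to the sea). All IDs here are hypothetical.
downstream = {"A": "B", "B": "C", "C": None, "D": "C", "E": None}

# Hypothetical set of catchments with monitoring data
monitored = {"A", "B", "E"}


def downstream_most(monitored, downstream):
    """Return monitored nodes that have no other monitored node below them."""
    keep = set()
    for nd in monitored:
        # Walk downstream until we reach the sea or another monitored node
        nxt = downstream.get(nd)
        while nxt is not None and nxt not in monitored:
            nxt = downstream.get(nxt)
        # Reaching the sea means nothing monitored lies further downstream
        if nxt is None:
            keep.add(nd)
    return keep


ds_nds = downstream_most(monitored, downstream)
```

Here `A` drains to the monitored catchment `B`, so only `B` and `E` are kept and the load from `A` is not subtracted twice.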

In [8]:
# Read station data
in_xlsx = r'../../../Data/RID_Sites_List.xlsx'
stn_df = pd.read_excel(in_xlsx, sheet_name='RID_All')

# Get just cols of interest and drop duplicates 
# (some sites are in the same regine)
stn_df = stn_df[['ospar_region', 'nve_vassdrag_nr']].drop_duplicates()

# Get catch IDs with calib data
calib_nds = set(stn_df['nve_vassdrag_nr'].values)

# Build network
in_path = r'../../../NOPE/NOPE_Annual_Inputs/nope_input_data_1990.csv'
g, nd_list = nope.build_calib_network(in_path, calib_nds)

# Get list of downstream nodes
ds_nds = []
for nd in g:
    # If no downstream nodes
    if g.out_degree(nd) == 0:
        # Node is of interest
        ds_nds.append(nd)

# Get just the downstream catchments
stn_df = stn_df[stn_df['nve_vassdrag_nr'].isin(ds_nds)]

#### 4.1.2. Sum model results for monitored locations

In [9]:
# Read model output
in_csv = r'../../../NOPE/nope_results_%s.csv' % year
nope_df = pd.read_csv(in_csv)

# Join accumulated outputs to stns of interest
mon_df = pd.merge(stn_df, nope_df, how='left',
                  left_on='nve_vassdrag_nr',
                  right_on='regine')

# Groupby OSPAR region
mon_df = mon_df.groupby('ospar_region').sum()

# Get just accum cols
cols = [i for i in mon_df.columns if i.split('_')[0]=='accum']
mon_df = mon_df[cols]

mon_df.head()

ospar_region,accum_agri_diff_tot-n_tonnes,accum_agri_diff_tot-p_tonnes,accum_agri_pt_tot-n_tonnes,accum_agri_pt_tot-p_tonnes,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,...,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2,accum_urban_tot-n_tonnes,accum_urban_tot-p_tonnes
LOFOTEN-BARENTS SEA,160.620164,4.375391,2.539554,0.218376,85.541986,5.22556,4677.722792,64.771122,160.620164,4.375391,...,4431.560642,55.170171,1011.695437,34.078052,1.434333,48.92438,3.572851,63555.61,0.0,0.0
NORTH SEA,3018.014035,62.725145,34.862152,2.17647,406.127765,36.342394,12929.889969,180.994616,3023.986605,63.268621,...,9499.7756,81.3836,1447.940859,200.604925,18.015515,134.836139,10.712791,23353.19,5.97257,0.543476
NORWEGIAN SEA2,3134.435213,81.356793,40.763819,3.061545,615.642046,50.816648,13153.64732,265.592214,3150.90219,83.485944,...,9387.103084,131.289623,1782.359148,296.332749,16.294309,197.893308,19.835594,45896.63,16.466978,2.129151
SKAGERAK,11511.521904,242.936487,102.079627,5.546824,4121.260046,114.515619,33801.324549,528.723736,11673.457143,260.414249,...,18006.60736,153.793868,2513.978203,2992.023173,31.323745,823.801797,45.746593,93945.27,161.935239,17.477763


This table gives the **modelled** inputs to each OSPAR region from catchments for which we have observed data. We want to subtract these values from the overall modelled inputs to each region and substitute the observed data instead.

The trickiest part of this is that the OSPAR regions in the TEOTIL catchment network (and therefore the network for my new model too) don't exactly match the new OSPAR definitions. The OSPAR boundaries were updated relatively recently, so instead of simply selecting the desired OSPAR region in the model output, I need to aggregate based on vassdragsnummers.

**Note:** Eventually, it would be a good idea to update the network information in `regine.csv` to reflect the current OSPAR regions.

#### 4.1.3. Group model output according to "new" OSPAR regions

In [10]:
# Define "new" OSPAR regions
os_dict = {'SKAGERAK':(1, 23),
           'NORTH SEA':(24, 90),
           'NORWEGIAN SEA2':(91, 170),
           'LOFOTEN-BARENTS SEA':(171, 247)}

# Container for results
df_list = []

# Loop over the "new" OSPAR regions
for reg, (min_id, max_id) in os_dict.items():
    # Main catchment IDs for this region
    regs = ['%03d.' % i for i in range(min_id, max_id+1)]
    
    # Get just the accum cols for this region. Copy so that the new
    # column below is added to an independent frame, not a slice of nope_df
    cols = [i for i in nope_df.columns if i.split('_')[0]=='accum']
    df2 = nope_df.loc[nope_df['regine'].isin(regs), cols].copy()
    
    # Add region
    df2['ospar_region'] = reg
    
    # Add rows for this region to output
    df_list.append(df2)

# Build df
os_df = pd.concat(df_list, axis=0)

# Aggregate
os_df = os_df.groupby('ospar_region').sum()

os_df.head()

ospar_region,accum_agri_diff_tot-n_tonnes,accum_agri_diff_tot-p_tonnes,accum_agri_pt_tot-n_tonnes,accum_agri_pt_tot-p_tonnes,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,...,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2,accum_urban_tot-n_tonnes,accum_urban_tot-p_tonnes
LOFOTEN-BARENTS SEA,674.549435,21.85691,10.565114,0.885819,17327.727102,2899.374949,27872.61281,3069.835044,684.521745,23.26075,...,9860.363963,147.199344,2483.868223,1288.737479,145.388775,336.990827,39.285379,138090.89,9.97231,1.403841
NORTH SEA,7241.820074,171.313342,80.323926,5.829535,24816.994381,4064.748618,51513.473465,4426.162564,7281.637556,176.382012,...,19414.841528,185.031934,3001.571844,3174.270639,389.283542,695.146177,80.504799,59314.38,39.817482,5.06867
NORWEGIAN SEA2,8509.040424,257.171609,101.708615,8.082202,31806.673259,5230.627063,57766.658656,5747.199485,8541.199214,261.412052,...,17418.786183,255.160371,3529.803346,2558.483966,316.444122,731.230357,86.52069,113934.05,32.158791,4.240442
SKAGERAK,13231.750574,309.065049,112.985724,6.352341,10553.101334,282.408709,43465.326119,792.236739,13505.485043,341.797016,...,19406.739743,168.031014,2663.147872,8275.05401,131.737573,978.709426,61.527044,102574.69,273.734469,32.731966


We can now calculate the unmonitored component by simply subtracting the values modelled upstream of monitoring stations from the overall modelled inputs to each OSPAR region.
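
As a quick check of the arithmetic, the sketch below repeats the subtraction for a single column (`accum_all_sources_tot-n_tonnes`) using the values printed in the two tables above. The dictionary names are only for illustration; the notebook itself performs the subtraction on whole dataframes.

```python
# Total modelled Tot-N (tonnes) to each OSPAR region, from the os_df table
os_tot_n = {
    "LOFOTEN-BARENTS SEA": 27872.61281,
    "NORTH SEA": 51513.473465,
    "NORWEGIAN SEA2": 57766.658656,
    "SKAGERAK": 43465.326119,
}

# Modelled Tot-N (tonnes) upstream of monitoring stations, from the mon_df table
mon_tot_n = {
    "LOFOTEN-BARENTS SEA": 4677.722792,
    "NORTH SEA": 12929.889969,
    "NORWEGIAN SEA2": 13153.64732,
    "SKAGERAK": 33801.324549,
}

# Unmonitored component = total modelled input - modelled input from monitored areas
unmon_tot_n = {reg: os_tot_n[reg] - mon_tot_n[reg] for reg in os_tot_n}

# Round to whole tonnes for comparison with the summary output
unmon_tot_n_rounded = {reg: round(val) for reg, val in unmon_tot_n.items()}
print(unmon_tot_n_rounded)
```

The rounded results (23195, 38584, 44613 and 9664 tonnes) agree with the `accum_all_sources_tot-n_tonnes` row reported in section 4.1.4.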

#### 4.1.4. Estimate loads in unmonitored areas

In [11]:
# Calc unmonitored loads
unmon_df = os_df - mon_df

# Write output
out_csv = r'../../../NOPE/unmonitored_loads_%s.csv' % year
unmon_df.to_csv(out_csv, encoding='utf-8', index_label='ospar_region')

unmon_df.round(0).astype(int).T

ospar_region,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
accum_agri_diff_tot-n_tonnes,514,4224,5375,1720
accum_agri_diff_tot-p_tonnes,17,109,176,66
accum_agri_pt_tot-n_tonnes,8,45,61,11
accum_agri_pt_tot-p_tonnes,1,4,5,1
accum_all_point_tot-n_tonnes,17242,24411,31191,6432
accum_all_point_tot-p_tonnes,2894,4028,5180,168
accum_all_sources_tot-n_tonnes,23195,38584,44613,9664
accum_all_sources_tot-p_tonnes,3005,4245,5482,264
accum_anth_diff_tot-n_tonnes,524,4258,5390,1832
accum_anth_diff_tot-p_tonnes,19,113,178,81


#### 4.1.5. Aggregate values to required quantities

In [12]:
# Aggregate to match report
# Convert flow from m3/s to 1000s of m3/day
unmon_df['flow'] = unmon_df['accum_q_m3/s']*60*60*24/1000.

unmon_df['sew_n'] = unmon_df['accum_ren_tot-n_tonnes'] + unmon_df['accum_spr_tot-n_tonnes']
unmon_df['sew_p'] = unmon_df['accum_ren_tot-p_tonnes'] + unmon_df['accum_spr_tot-p_tonnes']

unmon_df['ind_n'] = unmon_df['accum_ind_tot-n_tonnes']
unmon_df['ind_p'] = unmon_df['accum_ind_tot-p_tonnes']

unmon_df['fish_n'] = unmon_df['accum_aqu_tot-n_tonnes']
unmon_df['fish_p'] = unmon_df['accum_aqu_tot-p_tonnes']

unmon_df['diff_n'] = unmon_df['accum_anth_diff_tot-n_tonnes'] + unmon_df['accum_nat_diff_tot-n_tonnes']
unmon_df['diff_p'] = unmon_df['accum_anth_diff_tot-p_tonnes'] + unmon_df['accum_nat_diff_tot-p_tonnes']

# Copy so that adding the 'NORWAY' row does not modify a slice of unmon_df
new_df = unmon_df[['flow', 'sew_n', 'sew_p', 
                   'ind_n', 'ind_p', 'fish_n', 
                   'fish_p', 'diff_n', 'diff_p']].copy()

# Total for Norway
new_df.loc['NORWAY'] = new_df.sum(axis=0)

# Reorder rows
new_df = new_df.reindex(['NORWAY', 'LOFOTEN-BARENTS SEA', 'NORTH SEA', 
                         'NORWEGIAN SEA2', 'SKAGERAK'])

new_df.round().astype(int)


ospar_region,flow,sew_n,sew_p,ind_n,ind_p,fish_n,fish_p,diff_n,diff_p
NORWAY,425297,13310,1104,2476,231,63365,10925,36780,725
LOFOTEN-BARENTS SEA,127196,1543,180,27,2,15664,2712,5953,111
NORTH SEA,134234,3534,441,470,83,20361,3501,14173,217
NORWEGIAN SEA2,150979,2795,367,1051,105,27283,4703,13422,302
SKAGERAK,12888,5438,116,927,41,56,10,3232,96
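
The `flow` column can be cross-checked by hand. The sketch below takes the `accum_q_m3/s` values printed in the `os_df` and `mon_df` tables in section 4.1, subtracts them, and converts from m3/s to 1000s of m3/day. The helper name `m3s_to_thousand_m3_per_day` is hypothetical and used only here.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def m3s_to_thousand_m3_per_day(q_m3s):
    """Convert a discharge in m3/s to 1000s of m3/day."""
    return q_m3s * SECONDS_PER_DAY / 1000.0


# accum_q_m3/s for all modelled inputs to each region (os_df table)
os_q = {
    "LOFOTEN-BARENTS SEA": 2483.868223,
    "NORTH SEA": 3001.571844,
    "NORWEGIAN SEA2": 3529.803346,
    "SKAGERAK": 2663.147872,
}

# accum_q_m3/s upstream of monitoring stations (mon_df table)
mon_q = {
    "LOFOTEN-BARENTS SEA": 1011.695437,
    "NORTH SEA": 1447.940859,
    "NORWEGIAN SEA2": 1782.359148,
    "SKAGERAK": 2513.978203,
}

# Unmonitored flow per region, then the national total
flow = {reg: m3s_to_thousand_m3_per_day(os_q[reg] - mon_q[reg]) for reg in os_q}
flow["NORWAY"] = sum(flow.values())

flow_rounded = {reg: round(val) for reg, val in flow.items()}
print(flow_rounded)
```

The rounded values reproduce the `flow` column in the table above, including the `NORWAY` total, which is simply the sum of the four regions.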


## 5. Other N and P species

Tore's procedure `RESA2.FIXTEOTILPN` defines simple correction factors for estimating PO4, NO3 and NH4 from total P and N. The table below lists the factors used.

|   Source    | Phosphate | Nitrate | Ammonium |
|:-----------:|:---------:|:-------:|:--------:|
|    Sewage   |     0.600 |   0.050 |    0.750 |
|   Industry  |     0.600 |   0.050 |    0.750 |
| Aquaculture |     0.690 |   0.110 |    0.800 |
|   Diffuse   |     0.246 |   0.625 |    0.055 |
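
To make the use of these factors concrete, here is a minimal sketch that applies one row of the table to a pair of total N and total P loads. The `FACTORS` dictionary and the `speciate` helper are hypothetical names for illustration, and the example inputs are the rounded `sew_n` and `sew_p` totals for Norway from section 4.1.5, so results may differ slightly from values computed on unrounded data.

```python
# Correction factors from the table above: PO4 is estimated from total P,
# NO3 and NH4 from total N
FACTORS = {
    "sewage": {"po4": 0.600, "no3": 0.050, "nh4": 0.750},
    "industry": {"po4": 0.600, "no3": 0.050, "nh4": 0.750},
    "aquaculture": {"po4": 0.690, "no3": 0.110, "nh4": 0.800},
    "diffuse": {"po4": 0.246, "no3": 0.625, "nh4": 0.055},
}


def speciate(source, tot_n, tot_p):
    """Estimate PO4, NO3 and NH4 loads (tonnes) from total N and P (tonnes)."""
    fac = FACTORS[source]
    return {
        "po4": fac["po4"] * tot_p,
        "no3": fac["no3"] * tot_n,
        "nh4": fac["nh4"] * tot_n,
    }


# Rounded sewage totals for Norway (tonnes)
sewage_norway = speciate("sewage", tot_n=13310, tot_p=1104)
print({spc: round(val, 1) for spc, val in sewage_norway.items()})
```

For sewage, 0.6 x 1104 gives about 662 tonnes of PO4, which is consistent with the `sew_po4` value reported for Norway in the output below.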


In [13]:
# Dict of conversion factors
con_dict = {('sew', 'po4'):('p', 0.6),
            ('ind', 'po4'):('p', 0.6),
            ('fish', 'po4'):('p', 0.69),
            ('diff', 'po4'):('p', 0.246),
            ('sew', 'no3'):('n', 0.05),
            ('ind', 'no3'):('n', 0.05),
            ('fish', 'no3'):('n', 0.11),
            ('diff', 'no3'):('n', 0.625),
            ('sew', 'nh4'):('n', 0.75),
            ('ind', 'nh4'):('n', 0.75),
            ('fish', 'nh4'):('n', 0.8),
            ('diff', 'nh4'):('n', 0.055)}

# Apply factors
for src in ['sew', 'ind', 'fish', 'diff']:
    for spc in ['po4', 'no3', 'nh4']:
        el, fac = con_dict[(src, spc)]
        new_df[src+'_'+spc] = fac * new_df[src+'_'+el]
        
new_df.round().astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,425297,127196,134234,150979,12888
sew_n,13310,1543,3534,2795,5438
sew_p,1104,180,441,367,116
ind_n,2476,27,470,1051,927
ind_p,231,2,83,105,41
fish_n,63365,15664,20361,27283,56
fish_p,10925,2712,3501,4703,10
diff_n,36780,5953,14173,13422,3232
diff_p,725,111,217,302,96
sew_po4,662,108,265,220,70


## 6. Other quantities

The model currently only considers N and P, but the project focuses on a wider range of parameters. For now, we simply assume that all measured inputs (`renseanlegg`, `industri` and `akvakultur`) for regines outside of catchments with measured data make it to the sea.

We only want data for catchments that are not monitored i.e. for regine IDs **not** in the graph created above.

In [14]:
# The sql below uses a horrible (and slow!) hack to get around Oracle's
# 1000 item limit on IN clauses. See here for details:
# https://stackoverflow.com/a/9084247/505698
nd_list_hack = [(1, i) for i in nd_list]

sql = ("SELECT SUBSTR(a.regine, 1, 3) AS vassdrag, "
       "  a.type, "
       "  b.name, "
       "  b.unit, "
       "  SUM(c.value * d.factor) as value "
       "FROM RESA2.RID_PUNKTKILDER a, "
       "RESA2.RID_PUNKTKILDER_OUTPAR_DEF b, "
       "RESA2.RID_PUNKTKILDER_INPAR_VALUES c, "
       "RESA2.RID_PUNKTKILDER_INP_OUTP d "
       "WHERE a.anlegg_nr = c.anlegg_nr "
       "AND (1, a.regine) NOT IN %s "
       "AND d.in_pid = c.inp_par_id "
       "AND d.out_pid = b.out_pid "
       "AND c.year = %s "
       "GROUP BY SUBSTR(a.regine, 1, 3), a.type, b.name, b.unit "
       "ORDER BY SUBSTR(a.regine, 1, 3), a.type" % (tuple(nd_list_hack), year))

df = pd.read_sql(sql, engine)

# Tidy
df['par'] = df['type'] + '_' + df['name'] + '_' + df['unit']
del df['name'], df['unit'], df['type']

# Pivot
df = df.pivot(index='vassdrag', columns='par', values='value')
df.reset_index(inplace=True)
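
The query above relies on Python's text representation of a tuple of tuples to produce the `(1, a.regine) NOT IN (...)` clause. One edge case worth being aware of is that a one-element tuple is printed with a trailing comma, which would not be valid SQL. The sketch below builds the clause explicitly instead. The `not_in_clause` helper is hypothetical, it assumes the IDs are trusted strings (it only doubles embedded single quotes), and it has not been run against the RESA2 database.

```python
def not_in_clause(column, values):
    """Build a multi-column NOT IN clause for a list of string IDs.

    Pairing each value with a constant (1, value) is the workaround for
    Oracle's 1000-item limit on simple IN lists.
    """
    if not values:
        raise ValueError("values must not be empty")

    # Double any embedded single quotes so that each ID is a valid SQL literal
    pairs = ", ".join("(1, '%s')" % v.replace("'", "''") for v in values)

    return "(1, %s) NOT IN (%s)" % (column, pairs)


# Example with two made-up regine IDs
clause = not_in_clause("a.regine", ["001.A", "002.B"])
print(clause)

# A single ID also gives a well-formed clause (no trailing comma)
single = not_in_clause("a.regine", ["001.A"])
print(single)
```

The resulting string could be interpolated into the query in place of `%s` after `AND`. For untrusted input, bind parameters would be the safer choice.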

In [15]:
def f(x):
    """Convert a vassdrag code to an integer, or -999 if it is not numeric."""
    try:
        return int(x)
    except (ValueError, TypeError):
        return -999

# Convert vassdrag to numbers
df['vass'] = df['vassdrag'].apply(f)

# Get just the main catchments
df = df.query('vass != -999')

df.head()

par,vassdrag,INDUSTRI_As_tonn,INDUSTRI_Cd_tonn,INDUSTRI_Cr_tonn,INDUSTRI_Cu_tonn,INDUSTRI_Hg_tonn,INDUSTRI_NH3_tonn,INDUSTRI_NH4-N_tonn,INDUSTRI_Ni_tonn,INDUSTRI_PCB_tonn,...,RENSEANLEGG_Hg_tonn,RENSEANLEGG_Ni_tonn,RENSEANLEGG_PAH_tonn,RENSEANLEGG_PCB_tonn,RENSEANLEGG_Pb_tonn,RENSEANLEGG_S.P.M._tonn,RENSEANLEGG_Tot-N_tonn,RENSEANLEGG_Tot-P_tonn,RENSEANLEGG_Zn_tonn,vass
0,1,0.02489,0.01256,0.01999,0.09112,0.0,,,0.06937,,...,6.7e-05,0.015437,0.000265,0.0,0.001801,124.06087,111.53113,1.570941,0.111269,1
1,2,0.011349,0.006162,0.576105,4.304512,0.003698,,,0.457225,,...,0.000339,0.277778,0.002688,4e-07,0.0168,662.47537,686.32228,10.06519,1.402939,2
2,3,0.032239,0.000321,0.007566,0.092549,9.4e-05,0.0002,18.2438,0.04932,,...,2e-05,0.039006,,,0.001098,171.20241,288.60416,2.90909,0.161106,3
3,4,,,,,,,,,,...,9e-06,0.006504,,,0.001988,,247.62317,1.23823,0.076826,4
4,5,,,,,,,,,,...,,,,,,,44.49473,0.27295,,5


In [16]:
def f2(x):   
    if x in range(1, 24):
        return 'SKAGERAK'
    elif x in range(24, 91):
        return 'NORTH SEA'
    elif x in range(91, 171):
        return 'NORWEGIAN SEA2'
    elif x in range(171, 248):
        return 'LOFOTEN-BARENTS SEA'
    else:
        return np.nan

# Assign main catchments to OSPAR regions
df['osp_reg'] = df['vass'].apply(f2)

df.head()

par,vassdrag,INDUSTRI_As_tonn,INDUSTRI_Cd_tonn,INDUSTRI_Cr_tonn,INDUSTRI_Cu_tonn,INDUSTRI_Hg_tonn,INDUSTRI_NH3_tonn,INDUSTRI_NH4-N_tonn,INDUSTRI_Ni_tonn,INDUSTRI_PCB_tonn,...,RENSEANLEGG_Ni_tonn,RENSEANLEGG_PAH_tonn,RENSEANLEGG_PCB_tonn,RENSEANLEGG_Pb_tonn,RENSEANLEGG_S.P.M._tonn,RENSEANLEGG_Tot-N_tonn,RENSEANLEGG_Tot-P_tonn,RENSEANLEGG_Zn_tonn,vass,osp_reg
0,1,0.02489,0.01256,0.01999,0.09112,0.0,,,0.06937,,...,0.015437,0.000265,0.0,0.001801,124.06087,111.53113,1.570941,0.111269,1,SKAGERAK
1,2,0.011349,0.006162,0.576105,4.304512,0.003698,,,0.457225,,...,0.277778,0.002688,4e-07,0.0168,662.47537,686.32228,10.06519,1.402939,2,SKAGERAK
2,3,0.032239,0.000321,0.007566,0.092549,9.4e-05,0.0002,18.2438,0.04932,,...,0.039006,,,0.001098,171.20241,288.60416,2.90909,0.161106,3,SKAGERAK
3,4,,,,,,,,,,...,0.006504,,,0.001988,,247.62317,1.23823,0.076826,4,SKAGERAK
4,5,,,,,,,,,,...,,,,,,44.49473,0.27295,,5,SKAGERAK


In [17]:
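# Group by OSPAR region. After fillna(0), catchments without an OSPAR
# region are grouped under the label 0, which is then dropped
df.fillna(0, inplace=True)
df = df.groupby('osp_reg').sum()
df.drop(0, inplace=True)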

# Total for Norway
df.loc['NORWAY'] = df.sum(axis=0)

# Join to model results 
df = new_df.join(df)

# Get cols of interest
umod_cols = ['S.P.M.', 'TOC', 'As', 'Pb', 'Cd', 'Cu', 'Zn', 'Ni', 'Cr', 'Hg']
umod_cols = ['%s_%s_tonn' % (i, j) for i in ['INDUSTRI', 'RENSEANLEGG'] for j in umod_cols]
cols = list(new_df.columns) + umod_cols
cols.remove('RENSEANLEGG_TOC_tonn')
df = df[cols]

df.round(0).astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,425297,127196,134234,150979,12888
sew_n,13310,1543,3534,2795,5438
sew_p,1104,180,441,367,116
ind_n,2476,27,470,1051,927
ind_p,231,2,83,105,41
fish_n,63365,15664,20361,27283,56
fish_p,10925,2712,3501,4703,10
diff_n,36780,5953,14173,13422,3232
diff_p,725,111,217,302,96
sew_po4,662,108,265,220,70


## 7. Fish farm copper

Finally, we need to add in the Cu totals from fish farms. The method is similar to that used above, but simpler because we're only interested in one parameter.

In [18]:
# The sql below uses a horrible (and slow!) hack to get around Oracle's
# 1000 item limit on IN clauses. See here for details:
# https://stackoverflow.com/a/9084247/505698
nd_list_hack = [(1, i) for i in nd_list]

sql = ("SELECT SUBSTR(a.regine, 1, 3) AS vassdrag, "
       "  SUM(b.value) as value "
       "FROM RESA2.RID_KILDER_AQUAKULTUR a, "
       "RESA2.RID_KILDER_AQKULT_VALUES b "
       "WHERE a.nr = b.anlegg_nr "
       "AND (1, a.regine) NOT IN %s "
       "AND b.inp_par_id = 41 "
       "AND b.ar = %s "
       "GROUP BY SUBSTR(a.regine, 1, 3), b.inp_par_id "
       "ORDER BY SUBSTR(a.regine, 1, 3), b.inp_par_id" % (tuple(nd_list_hack), year))

aq_df = pd.read_sql(sql, engine)

# Get vassdrag
aq_df['vass'] = aq_df['vassdrag'].apply(f)
aq_df = aq_df.query('vass != -999')

# Calc OSPAR region and group
aq_df['osp_reg'] = aq_df['vass'].apply(f2)
aq_df.fillna(0, inplace=True)
aq_df = aq_df.groupby('osp_reg').sum()
del aq_df['vass']

# Total for Norway
aq_df.loc['NORWAY'] = aq_df.sum(axis=0)

# Rename
aq_df.columns = ['AQUAKULTUR_Cu_tonn',]

# Join model results 
df = df.join(aq_df)

df.round(0).astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,425297,127196,134234,150979,12888
sew_n,13310,1543,3534,2795,5438
sew_p,1104,180,441,367,116
ind_n,2476,27,470,1051,927
ind_p,231,2,83,105,41
fish_n,63365,15664,20361,27283,56
fish_p,10925,2712,3501,4703,10
diff_n,36780,5953,14173,13422,3232
diff_p,725,111,217,302,96
sew_po4,662,108,265,220,70


In [19]:
# Write output
out_csv = r'../../../Results/Unmon_loads/unmon_loads_%s.csv' % year
df.to_csv(out_csv)

This data can then be used to create Table 3 in the report - see [this notebook](https://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/summary_table_2017.ipynb) for details.