In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import imp

sn.set_context('notebook')

# Estimating loads in unmonitored regions - 2016

The new model can be used to estimate loads in unmonitored areas. We know the regine ID for each of the 155 stations where water chemistry is measured, and we also know which OSPAR region each monitoring site drains to. We want to use observed data to estimate loads upstream of each monitoring point, and modelled data elsewhere. This can be achieved using the output from the new model.

This notebook is based on the one [here](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/loads_unmonitored_regions.ipynb). It first runs the NOPE model for 2016 and then extracts data for unmonitored regions.

In [2]:
# Import model
nope_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Python\rid\notebooks\nope.py')
nope = imp.load_source('nope', nope_path)

# Connect to db
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')
resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)
engine, conn = resa2_basic.connect_to_resa2()
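
The `imp` module used above is deprecated and was removed in Python 3.12. On newer interpreters, something along the following lines could stand in for `imp.load_source`. This is only a sketch: `load_source` is a hypothetical helper (it is not part of `nope.py`), it does not register the module in `sys.modules`, and the throwaway module is written purely to make the example runnable.

```python
import importlib.util
import os
import tempfile


def load_source(name, path):
    """Load a Python file as a module (hypothetical stand-in for imp.load_source)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstrate with a throwaway module written to a temporary folder
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo_mod.py')
with open(demo_path, 'w') as fobj:
    fobj.write('ANSWER = 42\n')

demo = load_source('demo_mod', demo_path)
print(demo.ANSWER)
```

With a helper like this, the calls above would become `nope = load_source('nope', nope_path)` and similar.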

## 1. Generate model input file

In [3]:
# Year of interest
year = 2016

# Parameters of interest
par_list = ['Tot-N', 'Tot-P']

# Folder containing NOPE data
nope_fold = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\NOPE\NOPE_Core_Input_Data')

# Output path for model file
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\NOPE_Annual_Inputs\nope_input_data_2016.csv')

In [4]:
# Make input file
df = nope.make_rid_input_file(year, engine, nope_fold, out_csv,
                              par_list=par_list)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  downcast=downcast, **kwargs)


## 2. Run model

In [5]:
%%time
# Input file
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\NOPE\NOPE_Annual_Inputs\nope_input_data_2016.csv')

# Run model
g = nope.run_nope(in_csv, par_list)

Wall time: 8.34 s


## 3. Save results

In [6]:
# Save results as csv
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\nope_results_2016.csv')
df = nope.model_to_dataframe(g, out_path=out_csv)

df.head()

,regine,regine_ned,accum_agri_diff_tot-n_tonnes,accum_agri_diff_tot-p_tonnes,accum_agri_pt_tot-n_tonnes,accum_agri_pt_tot-p_tonnes,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,...,local_q_reg_m3/s,local_ren_tot-n_tonnes,local_ren_tot-p_tonnes,local_runoff_mm/yr,local_spr_tot-n_tonnes,local_spr_tot-p_tonnes,local_trans_tot-n,local_trans_tot-p,local_urban_tot-n_tonnes,local_urban_tot-p_tonnes
0,001.222Z,001.2220,0.930145,0.058948,0.011063,0.000894,0.109003,0.007512,2.014422,0.07885,...,0.052966,0.0,0.0,268.974615,0.097939,0.006618,1.0,1.0,0.0,0.0
1,002.DGBZ,002.DGB0,0.0,0.0,0.0,0.0,0.0,0.0,5.07757,0.108752,...,0.780749,0.0,0.0,286.099189,0.0,0.0,0.97,0.83,0.0,0.0
2,123.A1Z,123.A12,8.893221,0.127219,0.124438,0.004094,0.484875,0.023375,16.146657,0.311826,...,0.624774,0.0,0.0,494.426247,0.444984,0.064271,0.81,0.3,2.864138,0.409163
3,212.FAC,212.FAB0,0.0,0.0,0.0,0.0,0.021639,0.000779,7.536223,0.076501,...,0.561737,0.0,0.0,325.881795,0.022777,0.001025,0.95,0.76,0.0,0.0
4,135.1AC,135.1AB,0.0,0.0,0.0,0.0,0.0,0.0,4.721225,0.04925,...,0.880246,0.0,0.0,1975.761038,0.0,0.0,0.85,0.42,0.0,0.0


In [7]:
# Save version with main catchments only
main_list = ["%03d." % i for i in range(1, 316)]
# Sort without inplace=True so that we do not modify a slice of df
df2 = df.query('regine in @main_list').sort_values('regine')

# Save
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\nope_results_2016_main_catchs.csv')
df2.to_csv(out_csv, index=False, encoding='utf-8')

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


## 4. Explore results

### 4.1. Total N and P

#### 4.1.1. Identify areas with monitoring data

Where observations are available, we want to use them in preference to the model output. This means identifying all the catchments with observed data and subtracting the model results for these locations. This is more complicated than it appears, because a small number of monitored catchments are upstream of others, so subtracting the loads for all 155 monitored catchments would involve "double counting", which we want to avoid. The first step is therefore to identify the downstream-most nodes for the monitored areas, i.e. where one monitored catchment is upstream of another, we keep only the downstream node.

In [7]:
# Read station data
in_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Data\RID_Sites_List.xlsx')
stn_df = pd.read_excel(in_xlsx, sheet_name='RID_All')

# Get just cols of interest and drop duplicates 
# (some sites are in the same regine)
stn_df = stn_df[['ospar_region', 'nve_vassdrag_nr']].drop_duplicates()

# Get catch IDs with calib data
calib_nds = set(stn_df['nve_vassdrag_nr'].values)

# Build network
in_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\NOPE_Annual_Inputs\nope_input_data_1990.csv')
g, nd_list = nope.build_calib_network(in_path, calib_nds)

# Get list of downstream nodes
ds_nds = []
for nd in g:
    # If no downstream nodes
    if g.out_degree(nd) == 0:
        # Node is of interest
        ds_nds.append(nd)

# Get just the downstream catchments
stn_df = stn_df[stn_df['nve_vassdrag_nr'].isin(ds_nds)]
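
The idea of keeping only the downstream-most monitored nodes can be illustrated on a toy network. The sketch below is not how `nope.build_calib_network` works internally (that is not shown here). It assumes a hypothetical `flows_to` dictionary mapping each catchment to the one it drains into, and all the catchment names are invented.

```python
def downstream_most(flows_to, monitored):
    """Return monitored nodes that have no other monitored node downstream."""
    keep = set()
    for node in monitored:
        current = flows_to.get(node)
        has_monitored_below = False
        # Walk down the flow path until we reach the outlet (None)
        while current is not None:
            if current in monitored:
                has_monitored_below = True
                break
            current = flows_to.get(current)
        if not has_monitored_below:
            keep.add(node)
    return keep


# Toy network: A -> B -> C (outlet), D -> C, and E is its own outlet
flows_to = {'A': 'B', 'B': 'C', 'C': None, 'D': 'C', 'E': None}
monitored = {'A', 'B', 'E'}

ds = downstream_most(flows_to, monitored)
print(sorted(ds))
```

Here `A` is dropped because the monitored catchment `B` lies downstream of it, so the accumulated load at `B` already includes the load from `A`.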

#### 4.1.2. Sum model results for monitored locations

In [8]:
# Read model output
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\NOPE\nope_results_2016.csv')
nope_df = pd.read_csv(in_csv)

# Join accumulated outputs to stns of interest
mon_df = pd.merge(stn_df, nope_df, how='left',
                  left_on='nve_vassdrag_nr',
                  right_on='regine')

# Groupby OSPAR region
mon_df = mon_df.groupby('ospar_region').sum()

# Get just accum cols
cols = [i for i in mon_df.columns if i.split('_')[0]=='accum']
mon_df = mon_df[cols]

mon_df.head()

ospar_region,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,accum_aqu_tot-n_tonnes,accum_aqu_tot-p_tonnes,accum_ind_tot-n_tonnes,accum_ind_tot-p_tonnes,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2
LOFOTEN-BARENTS SEA,77.940831,5.69331,4291.21753,61.556413,139.951511,5.259556,0.0,0.0,0.0,0.0,4073.325188,50.603547,897.263854,24.647655,0.757696,50.753621,4.717238,63555.61
NORTH SEA,422.469276,40.151492,13123.134151,195.078003,2576.970591,67.880396,0.0,0.0,25.603844,3.641174,10123.694284,87.046115,1579.015629,232.564816,24.008028,129.438464,10.325821,23353.19
NORWEGIAN SEA2,577.559392,52.788635,11620.652324,243.908353,2699.516851,74.559747,0.0,0.0,46.04183,10.4435,8343.576081,116.559971,1566.29048,297.559847,18.818656,193.193896,20.464933,45896.63
SKAGERAK,3797.981666,108.728319,30139.023148,486.673445,11134.261381,244.336637,0.0,0.0,182.085191,26.602975,15206.7801,133.608488,2018.09444,2641.135033,24.549043,872.681816,52.029478,93945.27


This table gives the **modelled** inputs to each OSPAR region from catchments for which we have observed data. We want to subtract these values from the overall modelled inputs to each region and substitute the observed data instead.

The trickiest part of this is that the OSPAR regions in the TEOTIL catchment network (and therefore the network for my new model too) don't exactly match the new OSPAR definitions. The OSPAR boundaries were updated relatively recently, so instead of simply selecting the desired OSPAR region in the model output, I need to aggregate based on vassdragsnummers.

**Note:** Eventually, it would be a good idea to update the network information in `regine.csv` to reflect the current OSPAR regions.

#### 4.1.3. Group model output according to "new" OSPAR regions

In [9]:
# Define "new" OSPAR regions
os_dict = {'SKAGERAK':(1, 23),
           'NORTH SEA':(24, 90),
           'NORWEGIAN SEA2':(91, 170),
           'LOFOTEN-BARENTS SEA':(171, 247)}

# Container for results
df_list = []

# Loop over model output
for reg in os_dict.keys():
    min_id, max_id = os_dict[reg]
    
    regs = ['%03d.' % i for i in range(min_id, max_id+1)]
    
    # Get data for this region
    df2 = nope_df[nope_df['regine'].isin(regs)]
    
    # Get just accum cols (copy so we do not modify a slice of nope_df)
    cols = [i for i in df2.columns if i.split('_')[0]=='accum']
    df2 = df2[cols].copy()
    
    # Add region
    df2['ospar_region'] = reg
    
    # Add to output
    df_list.append(df2)

# Build df
os_df = pd.concat(df_list, axis=0)

# Aggregate
os_df = os_df.groupby('ospar_region').sum()

os_df.head()

ospar_region,accum_all_point_tot-n_tonnes,accum_all_point_tot-p_tonnes,accum_all_sources_tot-n_tonnes,accum_all_sources_tot-p_tonnes,accum_anth_diff_tot-n_tonnes,accum_anth_diff_tot-p_tonnes,accum_aqu_tot-n_tonnes,accum_aqu_tot-p_tonnes,accum_ind_tot-n_tonnes,accum_ind_tot-p_tonnes,accum_nat_diff_tot-n_tonnes,accum_nat_diff_tot-p_tonnes,accum_q_m3/s,accum_ren_tot-n_tonnes,accum_ren_tot-p_tonnes,accum_spr_tot-n_tonnes,accum_spr_tot-p_tonnes,accum_upstr_area_km2
LOFOTEN-BARENTS SEA,15536.465083,2624.071894,24729.268966,2782.102006,595.348458,29.007032,14140.779244,2435.102449,83.27,6.071,8597.455425,129.02308,2107.379728,972.323096,141.231208,329.52763,40.781418,138090.89
NORTH SEA,24154.765441,3943.73499,51360.258258,4337.303801,6227.905182,191.999327,19560.984224,3370.099669,437.681065,93.562709,20977.587634,201.569483,3326.954266,3379.521592,392.782224,696.254634,81.460852,59314.38
NORWEGIAN SEA2,27414.085643,4530.286265,50838.86574,5002.845199,7367.288126,236.64394,22963.896011,3971.30847,1029.58823,119.6302,16057.491971,235.914994,3230.153201,2602.465008,343.750027,716.427779,87.515367,113934.05
SKAGERAK,10555.596638,274.680733,39811.418034,741.044808,12869.205614,320.197798,25.00677,4.222934,1057.199191,67.486975,16386.615782,146.166276,2127.911199,8324.041575,127.213095,1036.363378,69.405388,102574.69


We can now calculate the unmonitored component by simply subtracting the values modelled upstream of monitoring stations from the overall modelled inputs to each OSPAR region.
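
As a worked check of this subtraction, the sketch below uses the `accum_all_sources_tot-n_tonnes` values copied from the two tables above. It uses plain dictionaries rather than DataFrames, so it only illustrates the arithmetic (in pandas, subtracting two DataFrames aligns them on index and column labels).

```python
# Modelled Tot-N from all sources (tonnes), copied from the tables above
total_n = {'LOFOTEN-BARENTS SEA': 24729.268966,
           'NORTH SEA': 51360.258258,
           'NORWEGIAN SEA2': 50838.86574,
           'SKAGERAK': 39811.418034}

# Modelled Tot-N upstream of the downstream-most monitoring stations
monitored_n = {'LOFOTEN-BARENTS SEA': 4291.21753,
               'NORTH SEA': 13123.134151,
               'NORWEGIAN SEA2': 11620.652324,
               'SKAGERAK': 30139.023148}

# Unmonitored component = regional total - monitored part
unmonitored_n = {reg: total_n[reg] - monitored_n[reg] for reg in total_n}

for reg, val in sorted(unmonitored_n.items()):
    print('%-20s %6d' % (reg, round(val)))
```

The rounded results should agree with the `accum_all_sources_tot-n_tonnes` row in the output of the next section.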

#### 4.1.4. Estimate loads in unmonitored areas

In [10]:
# Calc unmonitored loads
unmon_df = os_df - mon_df

# Write output
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\unmonitored_loads_2016.csv')
unmon_df.to_csv(out_csv, encoding='utf-8', index_label='ospar_region')

unmon_df.round(0).astype(int).T

ospar_region,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
accum_all_point_tot-n_tonnes,15459,23732,26837,6758
accum_all_point_tot-p_tonnes,2618,3904,4477,166
accum_all_sources_tot-n_tonnes,20438,38237,39218,9672
accum_all_sources_tot-p_tonnes,2721,4142,4759,254
accum_anth_diff_tot-n_tonnes,455,3651,4668,1735
accum_anth_diff_tot-p_tonnes,24,124,162,76
accum_aqu_tot-n_tonnes,14141,19561,22964,25
accum_aqu_tot-p_tonnes,2435,3370,3971,4
accum_ind_tot-n_tonnes,83,412,984,875
accum_ind_tot-p_tonnes,6,90,109,41


#### 4.1.5. Aggregate values to required quantities

In [11]:
# Aggregate to match report
unmon_df['flow'] = unmon_df['accum_q_m3/s']*60*60*24/1000. # 1000s m3/day

unmon_df['sew_n'] = unmon_df['accum_ren_tot-n_tonnes'] + unmon_df['accum_spr_tot-n_tonnes']
unmon_df['sew_p'] = unmon_df['accum_ren_tot-p_tonnes'] + unmon_df['accum_spr_tot-p_tonnes']

unmon_df['ind_n'] = unmon_df['accum_ind_tot-n_tonnes']
unmon_df['ind_p'] = unmon_df['accum_ind_tot-p_tonnes']

unmon_df['fish_n'] = unmon_df['accum_aqu_tot-n_tonnes']
unmon_df['fish_p'] = unmon_df['accum_aqu_tot-p_tonnes']

unmon_df['diff_n'] = unmon_df['accum_anth_diff_tot-n_tonnes'] + unmon_df['accum_nat_diff_tot-n_tonnes']
unmon_df['diff_p'] = unmon_df['accum_anth_diff_tot-p_tonnes'] + unmon_df['accum_nat_diff_tot-p_tonnes']

# Copy so that adding rows/columns below does not act on a slice of unmon_df
new_df = unmon_df[['flow', 'sew_n', 'sew_p', 
                   'ind_n', 'ind_p', 'fish_n', 
                   'fish_p', 'diff_n', 'diff_p']].copy()

# Total for Norway
new_df.loc['NORWAY'] = new_df.sum(axis=0)

# Reorder rows
new_df = new_df.reindex(['NORWAY', 'LOFOTEN-BARENTS SEA', 'NORTH SEA', 
                         'NORWEGIAN SEA2', 'SKAGERAK'])

new_df.round().astype(int)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


ospar_region,flow,sew_n,sew_p,ind_n,ind_p,fish_n,fish_p,diff_n,diff_p
NORWAY,408822,13615,1128,2354,246,56691,9781,34781,711
LOFOTEN-BARENTS SEA,104554,1226,177,83,6,14141,2435,4980,102
NORTH SEA,151022,3714,440,412,90,19561,3370,14505,239
NORWEGIAN SEA2,143758,2828,392,984,109,22964,3971,12382,281
SKAGERAK,9488,5847,120,875,41,25,4,2915,88
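
The `flow` column is a unit conversion from m3/s to thousands of m3 per day, i.e. a factor of 86.4. The sketch below checks this for SKAGERAK using the `accum_q_m3/s` values from the tables in sections 4.1.2 and 4.1.3. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def m3s_to_thousand_m3_per_day(q_m3s):
    """Convert a discharge in m3/s to thousands of m3 per day."""
    return q_m3s * SECONDS_PER_DAY / 1000.


# SKAGERAK: modelled total and monitored discharge (m3/s) from the tables above
q_total = 2127.911199
q_monitored = 2018.09444

flow = m3s_to_thousand_m3_per_day(q_total - q_monitored)
print(round(flow))
```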


## 5. Other N and P species

Tore's procedure `RESA2.FIXTEOTILPN` defines simple correction factors for estimating PO4, NO3 and NH4 from total P and N. The table below lists the factors used.

|             | Phosphate | Nitrate | Ammonium |
|:-----------:|:---------:|:-------:|:--------:|
|    Sewage   |     0.600 |   0.050 |    0.750 |
|   Industry  |     0.600 |   0.050 |    0.750 |
| Aquaculture |     0.690 |   0.110 |    0.800 |
|   Diffuse   |     0.246 |   0.625 |    0.055 |

In [12]:
# Dict of conversion factors
con_dict = {('sew', 'po4'):('p', 0.6),
            ('ind', 'po4'):('p', 0.6),
            ('fish', 'po4'):('p', 0.69),
            ('diff', 'po4'):('p', 0.246),
            ('sew', 'no3'):('n', 0.05),
            ('ind', 'no3'):('n', 0.05),
            ('fish', 'no3'):('n', 0.11),
            ('diff', 'no3'):('n', 0.625),
            ('sew', 'nh4'):('n', 0.75),
            ('ind', 'nh4'):('n', 0.75),
            ('fish', 'nh4'):('n', 0.8),
            ('diff', 'nh4'):('n', 0.055)}

# Apply factors
for src in ['sew', 'ind', 'fish', 'diff']:
    for spc in ['po4', 'no3', 'nh4']:
        el, fac = con_dict[(src, spc)]
        new_df[src+'_'+spc] = fac * new_df[src+'_'+el]
        
new_df.round().astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,408822,104554,151022,143758,9488
sew_n,13615,1226,3714,2828,5847
sew_p,1128,177,440,392,120
ind_n,2354,83,412,984,875
ind_p,246,6,90,109,41
fish_n,56691,14141,19561,22964,25
fish_p,9781,2435,3370,3971,4
diff_n,34781,4980,14505,12382,2915
diff_p,711,102,239,281,88
sew_po4,677,106,264,235,72
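
As a quick check of the `sew_po4` row, the sewage phosphate factor (0.6) can be applied directly to the `sew_p` row. Note that this sketch starts from the rounded values printed above, not the unrounded values held in `new_df`, so in general it could differ from the notebook by one unit.

```python
# Rounded sewage Tot-P loads (tonnes) from the table above
sew_p = {'NORWAY': 1128,
         'LOFOTEN-BARENTS SEA': 177,
         'NORTH SEA': 440,
         'NORWEGIAN SEA2': 392,
         'SKAGERAK': 120}

# Phosphate fraction of Tot-P for sewage, from the factor table
PO4_FACTOR_SEWAGE = 0.6

sew_po4 = {reg: round(PO4_FACTOR_SEWAGE * p) for reg, p in sew_p.items()}
print(sew_po4)
```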


## 6. Other quantities

The model currently only considers N and P, but the project focuses on a wider range of parameters. For now, we simply assume that all measured inputs (`renseanlegg`, `industri` and `akvakultur`) for regines outside of catchments with measured data make it to the sea.

We only want data for catchments that are not monitored i.e. for regine IDs **not** in the graph created above.

In [13]:
year = 2016

# The sql below uses a horrible (and slow!) hack to get around Oracle's
# 1000 item limit on IN clauses. See here for details:
# https://stackoverflow.com/a/9084247/505698
nd_list_hack = [(1, i) for i in nd_list]

sql = ("SELECT SUBSTR(a.regine, 1, 3) AS vassdrag, "
       "  a.type, "
       "  b.name, "
       "  b.unit, "
       "  SUM(c.value * d.factor) as value "
       "FROM RESA2.RID_PUNKTKILDER a, "
       "RESA2.RID_PUNKTKILDER_OUTPAR_DEF b, "
       "RESA2.RID_PUNKTKILDER_INPAR_VALUES c, "
       "RESA2.RID_PUNKTKILDER_INP_OUTP d "
       "WHERE a.anlegg_nr = c.anlegg_nr "
       "AND (1, a.regine) NOT IN %s "
       "AND d.in_pid = c.inp_par_id "
       "AND d.out_pid = b.out_pid "
       "AND c.year = %s "
       "GROUP BY SUBSTR(a.regine, 1, 3), a.type, b.name, b.unit "
       "ORDER BY SUBSTR(a.regine, 1, 3), a.type" % (tuple(nd_list_hack), year))

df = pd.read_sql(sql, engine)

# Tidy
df['par'] = df['type'] + '_' + df['name'] + '_' + df['unit']
del df['name'], df['unit'], df['type']

# Pivot
df = df.pivot(index='vassdrag', columns='par', values='value')
df.reset_index(inplace=True)
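
To see what the `IN` clause workaround actually sends to the database, the sketch below renders the same string formatting for a short list. The regine IDs are invented. The approach relies on Python's `repr` of a tuple of tuples happening to be valid SQL, which breaks for a single-item list because of the trailing comma.

```python
# Hypothetical regine IDs standing in for nd_list
nd_list = ['001.A1', '002.B2']

# Pair each ID with a dummy constant so the IN list contains tuples
nd_list_hack = [(1, i) for i in nd_list]

clause = "AND (1, a.regine) NOT IN %s" % (tuple(nd_list_hack),)
print(clause)

# With one item, the tuple repr gains a trailing comma, which is not valid SQL
single = "AND (1, a.regine) NOT IN %s" % (tuple([(1, '001.A1')]),)
print(single)
```

Because the values are interpolated directly into the SQL text, this pattern is only appropriate for trusted, internally generated IDs such as those in `nd_list`.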

In [14]:
def f(x):
    """Convert a vassdrag code to an integer, or -999 if it is not numeric."""
    try:
        return int(x)
    except (TypeError, ValueError):
        return -999

# Convert vassdrag to numbers
df['vass'] = df['vassdrag'].apply(f)

# Get just the main catchments
df = df.query('vass != -999')

df.head()

par,vassdrag,INDUSTRI_As_tonn,INDUSTRI_Cd_tonn,INDUSTRI_Cr_tonn,INDUSTRI_Cu_tonn,INDUSTRI_Hg_tonn,INDUSTRI_NH3_tonn,INDUSTRI_NH4-N_tonn,INDUSTRI_Ni_tonn,INDUSTRI_PCB_tonn,...,RENSEANLEGG_Hg_tonn,RENSEANLEGG_Ni_tonn,RENSEANLEGG_PAH_tonn,RENSEANLEGG_PCB_tonn,RENSEANLEGG_Pb_tonn,RENSEANLEGG_S.P.M._tonn,RENSEANLEGG_Tot-N_tonn,RENSEANLEGG_Tot-P_tonn,RENSEANLEGG_Zn_tonn,vass
0,1,0.04895,0.01379,0.00721,0.11309,2e-05,,,0.05255,,...,,,,,,,139.92692,14.966,,1
1,2,0.016039,0.005805,0.798983,4.320172,0.0042199,,,0.460639,,...,0.002027,0.234566,0.004062,0.000152,0.022118,0.013,723.88409,12.33553,1.032589,2
2,3,2.7e-05,1.4e-05,0.00012,0.001874,2e-07,,,0.000321,,...,2.7e-05,0.0216,0.0,0.0,0.0017,,289.3,2.798,0.1048,3
3,4,,,,,,,,,,...,,,,,,,153.114,1.16,,4
4,5,,,,,,,,,,...,,,,,,,31.27013,0.17567,,5


In [15]:
def f2(x):   
    if x in range(1, 24):
        return 'SKAGERAK'
    elif x in range(24, 91):
        return 'NORTH SEA'
    elif x in range(91, 171):
        return 'NORWEGIAN SEA2'
    elif x in range(171, 248):
        return 'LOFOTEN-BARENTS SEA'
    else:
        return np.nan

# Assign main catchments to OSPAR regions
df['osp_reg'] = df['vass'].apply(f2)

df.head()

par,vassdrag,INDUSTRI_As_tonn,INDUSTRI_Cd_tonn,INDUSTRI_Cr_tonn,INDUSTRI_Cu_tonn,INDUSTRI_Hg_tonn,INDUSTRI_NH3_tonn,INDUSTRI_NH4-N_tonn,INDUSTRI_Ni_tonn,INDUSTRI_PCB_tonn,...,RENSEANLEGG_Ni_tonn,RENSEANLEGG_PAH_tonn,RENSEANLEGG_PCB_tonn,RENSEANLEGG_Pb_tonn,RENSEANLEGG_S.P.M._tonn,RENSEANLEGG_Tot-N_tonn,RENSEANLEGG_Tot-P_tonn,RENSEANLEGG_Zn_tonn,vass,osp_reg
0,1,0.04895,0.01379,0.00721,0.11309,2e-05,,,0.05255,,...,,,,,,139.92692,14.966,,1,SKAGERAK
1,2,0.016039,0.005805,0.798983,4.320172,0.0042199,,,0.460639,,...,0.234566,0.004062,0.000152,0.022118,0.013,723.88409,12.33553,1.032589,2,SKAGERAK
2,3,2.7e-05,1.4e-05,0.00012,0.001874,2e-07,,,0.000321,,...,0.0216,0.0,0.0,0.0017,,289.3,2.798,0.1048,3,SKAGERAK
3,4,,,,,,,,,,...,,,,,,153.114,1.16,,4,SKAGERAK
4,5,,,,,,,,,,...,,,,,,31.27013,0.17567,,5,SKAGERAK
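
The same range lookup can also be written as a table of lower bounds, which keeps the region boundaries in one place. This is an alternative sketch, not the notebook's method. `region_for` is a hypothetical name, and the bounds are taken from `f2` above.

```python
import bisect

# Lower bound of each region's vassdrag range, plus the first value past the end
BOUNDS = [1, 24, 91, 171, 248]
REGIONS = ['SKAGERAK', 'NORTH SEA', 'NORWEGIAN SEA2', 'LOFOTEN-BARENTS SEA']


def region_for(vass):
    """Return the OSPAR region for a vassdrag number, or None if out of range."""
    idx = bisect.bisect_right(BOUNDS, vass) - 1
    if 0 <= idx < len(REGIONS):
        return REGIONS[idx]
    return None


for v in [1, 23, 24, 170, 171, 247, 248]:
    print(v, region_for(v))
```

Unlike `f2`, this returns `None` and not `np.nan` for values outside the ranges, so it would need adapting before being used with `fillna`.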


In [16]:
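# Group by OSPAR region
# (catchments with no OSPAR region have osp_reg = NaN, which fillna sets to 0)
df.fillna(0, inplace=True)
df = df.groupby('osp_reg').sum()

# Drop the group of catchments without an OSPAR region
df.drop(0, inplace=True)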

# Total for Norway
df.loc['NORWAY'] = df.sum(axis=0)

# Join to model results 
df = new_df.join(df)

# Get cols of interest
umod_cols = ['S.P.M.', 'TOC', 'As', 'Pb', 'Cd', 'Cu', 'Zn', 'Ni', 'Cr', 'Hg']
umod_cols = ['%s_%s_tonn' % (i, j) for i in ['INDUSTRI', 'RENSEANLEGG'] for j in umod_cols]
cols = list(new_df.columns) + umod_cols
cols.remove('RENSEANLEGG_TOC_tonn')
df = df[cols]

df.round(0).astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,408822,104554,151022,143758,9488
sew_n,13615,1226,3714,2828,5847
sew_p,1128,177,440,392,120
ind_n,2354,83,412,984,875
ind_p,246,6,90,109,41
fish_n,56691,14141,19561,22964,25
fish_p,9781,2435,3370,3971,4
diff_n,34781,4980,14505,12382,2915
diff_p,711,102,239,281,88
sew_po4,677,106,264,235,72


## 7. Fish farm copper

Finally, we need to add in the Cu totals from fish farms. The method is similar to that used above, but simpler because we're only interested in one parameter.

In [17]:
year = 2016

# The sql below uses a horrible (and slow!) hack to get around Oracle's
# 1000 item limit on IN clauses. See here for details:
# https://stackoverflow.com/a/9084247/505698
nd_list_hack = [(1, i) for i in nd_list]

sql = ("SELECT SUBSTR(a.regine, 1, 3) AS vassdrag, "
       "  SUM(b.value) as value "
       "FROM RESA2.RID_KILDER_AQUAKULTUR a, "
       "RESA2.RID_KILDER_AQKULT_VALUES b "
       "WHERE a.nr = b.anlegg_nr "
       "AND (1, a.regine) NOT IN %s "
       "AND b.inp_par_id = 41 "
       "AND b.ar = %s "
       "GROUP BY SUBSTR(a.regine, 1, 3), b.inp_par_id "
       "ORDER BY SUBSTR(a.regine, 1, 3), b.inp_par_id" % (tuple(nd_list_hack), year))

aq_df = pd.read_sql(sql, engine)

# Get vassdrag
aq_df['vass'] = aq_df['vassdrag'].apply(f)
aq_df = aq_df.query('vass != -999')

# Calc OSPAR region and group
aq_df['osp_reg'] = aq_df['vass'].apply(f2)
aq_df.fillna(0, inplace=True)
aq_df = aq_df.groupby('osp_reg').sum()
del aq_df['vass']

# Total for Norway
aq_df.loc['NORWAY'] = aq_df.sum(axis=0)

# Rename
aq_df.columns = ['AQUAKULTUR_Cu_tonn',]

# Join model results 
df = df.join(aq_df)

df.round(0).astype(int).T

ospar_region,NORWAY,LOFOTEN-BARENTS SEA,NORTH SEA,NORWEGIAN SEA2,SKAGERAK
flow,408822,104554,151022,143758,9488
sew_n,13615,1226,3714,2828,5847
sew_p,1128,177,440,392,120
ind_n,2354,83,412,984,875
ind_p,246,6,90,109,41
fish_n,56691,14141,19561,22964,25
fish_p,9781,2435,3370,3971,4
diff_n,34781,4980,14505,12382,2915
diff_p,711,102,239,281,88
sew_po4,677,106,264,235,72


In [18]:
# Write output
out_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Unmon_loads\unmon_loads_2016.csv')
df.to_csv(out_csv)

These data can then be used to create Table 3 in the report. See [this notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/word_data_tables.ipynb) for details.