In [None]:
import imp
import pandas as pd

# Generate RID data tables for Microsoft Word

This notebook includes code for creating Word tables in the output format required by the RID project.

## 1. Get site data

### 1.1. Establish database connection and import RID functions

In [None]:
# Connect to db
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')

resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)

engine, conn = resa2_basic.connect_to_resa2()

# Import custom RID functions
rid_func_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
                 r'\Python\rid\notebooks\useful_rid_code.py')

rid = imp.load_source('useful_rid_code', rid_func_path)
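
The `imp` module used in this cell was long deprecated and has been removed in Python 3.12, so `imp.load_source` will fail on a current interpreter. A minimal sketch of an equivalent built on `importlib.util` is below; `load_source` is a hypothetical helper name chosen to mirror the old call, not a standard-library function.

```python
import importlib.util
import sys


def load_source(module_name, path):
    """Load a module from an arbitrary .py file (stand-in for imp.load_source)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as a normal import would
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

With this helper, `rid = load_source('useful_rid_code', rid_func_path)` should behave like the original call.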

### 1.2. Basic site metadata for stations of interest

In [None]:
# Read site data
in_xlsx = r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Data\RID_Sites_List.xlsx'

rid_11_df = pd.read_excel(in_xlsx, sheet_name='RID_11')
rid_36_df = pd.read_excel(in_xlsx, sheet_name='RID_36')
rid_108_df = pd.read_excel(in_xlsx, sheet_name='RID_108')

# Drop the 37th site (with no NVE code) from RID_36
rid_36_df.dropna(how='any', inplace=True)
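
`dropna(how='any')` removes every row that has a missing value in *any* column, so it would also discard a valid site that merely has a blank cell elsewhere in the sheet. A minimal sketch, assuming the NVE code lives in the `nve_vassdrag_nr` column (the name used in section 4.1) and using made-up rows plus hypothetical `station_code` and `comment` columns, shows how `subset=` restricts the check to the NVE code:

```python
import pandas as pd

# Toy version of the RID_36 sheet (all values are illustrative only)
rid_36_demo = pd.DataFrame({
    'station_code': ['A', 'B', 'C'],
    'nve_vassdrag_nr': ['001', None, '002'],
    'comment': [None, 'no NVE code', None],
})

# Every row has a gap somewhere, so how='any' drops all of them
print(len(rid_36_demo.dropna(how='any')))  # → 0

# Only the row without an NVE code is dropped
rid_36_clean = rid_36_demo.dropna(subset=['nve_vassdrag_nr'])
print(list(rid_36_clean['station_code']))  # → ['A', 'C']
```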

## 2. Tables of raw water chemistry data

The first set of tables displays raw water chemistry values for the 11 main rivers and the 36 tributaries. I've created a template in Word based on Tore's previous output that includes 47 blank tables:

`C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Results\Word_Tables\Table_Templates\rid_water_chem_tables_template.docx`

**Do not modify this document**. Instead, create a copy of it and use the code below to modify the copy with the desired data.
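
Copying the template can be scripted so the original is never opened for editing. A minimal sketch using only the standard library; `copy_template` is a hypothetical helper, and refusing to overwrite an existing file is a design choice made here to protect tables that have already been filled in.

```python
import shutil
from pathlib import Path


def copy_template(template_path, out_path):
    """Copy a Word template to a new file and return the new path."""
    template_path, out_path = Path(template_path), Path(out_path)
    if out_path.exists():
        # Do not clobber a previously filled-in copy
        raise FileExistsError(f'{out_path} already exists')
    out_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template_path, out_path)
    return out_path
```

For example, calling it with the template path above and the `in_docx` path from the next cell would create the working copy that `rid.write_word_water_chem_tables` then edits.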

The code below fills in the Word template with data from 2015. The output is available in PDF format [here](https://github.com/JamesSample/rid/blob/master/pdf/TABLE1_2015_JES.pdf) and can be compared to the results in *Table 1a* of the 2015 report (page 148 onwards). 

**Note:** I haven't yet worried about number formatting (number of decimal places etc.) or conditional formatting of cells (i.e. colour codes). These features can be easily added later.

In [None]:
# Concatenate data for RID_11 and RID_36 sites
stn_df = pd.concat([rid_11_df, rid_36_df], axis=0)

# Path to *COPIED* template for editing
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016Analysis_2015Data\TABLE1_2015_JES.docx')

# Write tables for 2015
rid.write_word_water_chem_tables(stn_df, 2015, in_docx, engine)

## 3. Tables of loads at each site

The next set of tables shows annual pollutant loads for each of the 155 main sites. A Word template based on Tore's previous output is here:

`C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Results\Word_Tables\Table_Templates\rid_loads_by_river_template.docx`

**Do not modify this document**. Instead, create a copy of it and use the code below to modify the copy with the desired data.

The code below reads the output produced by the [loads notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/estimate_loads.ipynb) and fills in the template with data from 2015. The finished table is available in PDF format [here](https://github.com/JamesSample/rid/blob/master/pdf/TABLE2_2015_JES.pdf) and can be compared to *Table 2a* of the 2015 report (page 186 onwards).

**Note:** I have made the following changes to the original table from the report (see section 2.2 of [this notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/estimate_loads.ipynb) for details):

 * The original table included duplicated sites/rows, which have now been removed. <br><br>
 
 * The original table did not distinguish between two sites named Børselva and two named Oselva. For these four locations, I have added the site code in brackets after the site name to avoid confusion, e.g. `Børselva (FINEBØR)` versus `Børselva (STREBØR)`.
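
The renaming described in the second bullet can be sketched in pandas as below. This is an illustration, not the code used to build the loads table: the column names `station_name` and `station_code` are assumptions, and only the Børselva codes come from the text above.

```python
import pandas as pd


def disambiguate_names(df, name_col='station_name', code_col='station_code'):
    """Append the site code in brackets to every name that occurs more than once."""
    dup = df[name_col].duplicated(keep=False)  # True for *all* copies of a repeated name
    out = df.copy()
    out.loc[dup, name_col] = out.loc[dup, name_col] + ' (' + out.loc[dup, code_col] + ')'
    return out


sites = pd.DataFrame({
    'station_name': ['Børselva', 'Børselva', 'Site C'],
    'station_code': ['FINEBØR', 'STREBØR', 'C'],
})
print(disambiguate_names(sites)['station_name'].tolist())
# → ['Børselva (FINEBØR)', 'Børselva (STREBØR)', 'Site C']
```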

In [None]:
# Concatenate data for RID_11, RID_36 and RID_108 sites
stn_df = pd.concat([rid_11_df, rid_36_df, rid_108_df], axis=0)

# Path to *COPIED* template for editing
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016Analysis_2015Data\TABLE2_2015_JES.docx')

# Read loads data (from "loads notebook")
loads_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Results\Loads_CSVs\loads_all_sites_2015.csv')

# Write tables for 2015
rid.write_word_loads_table(stn_df, loads_csv, in_docx, engine)

## 4. Annual summary of monitored and modelled loads

The third data table combines all the data from the entire project. Before running the code below, it is necessary to have processed all the monitoring data (creating a file like *loads_and_flows_all_sites_2016.csv*; see [this notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/rid_working_2016-17.ipynb)) and to have completed the modelling for unmonitored locations (creating a file like *unmon_loads_2015.csv*; see [this notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/loads_unmonitored_regions.ipynb)). The code below restructures these files and writes the final output to Word.
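
Because this section depends on files produced by two other notebooks, a quick existence check can fail fast with a clear message. A minimal sketch; `missing_inputs` and `require_inputs` are hypothetical helpers.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths (as strings) that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


def require_inputs(paths):
    """Raise FileNotFoundError listing every prerequisite file that is missing."""
    missing = missing_inputs(paths)
    if missing:
        raise FileNotFoundError('Missing input files: ' + ', '.join(missing))
```

Calling `require_inputs` with the two CSV paths read in sections 4.1 and 4.2, before running those cells, would raise a `FileNotFoundError` naming whichever file has not been created yet.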

A Word template modified from Tore's previous output is here:

`C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Results\Word_Tables\Table_Templates\rid_loads_overall_summary_template.docx`

**Do not modify this document**. Instead, create a copy of it and use the code below to modify the copy with the desired data.

### 4.1. Summarise monitoring data

The code below adds up the monitored loads. Note that because some of the tributary rivers are upstream of others, it is necessary to build the catchment network using NOPE to identify the downstream-most catchments.

In [None]:
# Import NOPE model
nope_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Python\rid\notebooks\nope.py')
nope = imp.load_source('nope', nope_path)

In [None]:
# Read station data
in_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Data\RID_Sites_List.xlsx')
stn_df = pd.read_excel(in_xlsx, sheet_name='RID_All')

# Drop duplicate catchments
# (some sites are in the same regine)
stn_df = stn_df.drop_duplicates(subset=['ospar_region', 'nve_vassdrag_nr'])

# Get catch IDs with calib data
calib_nds = set(stn_df['nve_vassdrag_nr'].values)

# Build network
in_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\NOPE\NOPE_Annual_Inputs\nope_input_data_1990.csv')
g, nd_list = nope.build_calib_network(in_path, calib_nds)

# Get list of downstream nodes
ds_nds = []
for nd in g:
    # If no downstream nodes
    if g.out_degree(nd) == 0:
        # Node is of interest
        ds_nds.append(nd)

# Get just the downstream catchments
stn_df = stn_df[stn_df['nve_vassdrag_nr'].isin(ds_nds)]
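
The downstream-node loop relies on a simple rule: a monitored catchment with no downstream (outgoing) edge is the outlet of its sub-network, and its load already includes everything monitored upstream of it. A dependency-free sketch of the same rule on a toy network (the catchment IDs `A`, `B` and `C` are made up):

```python
def downstream_most(edges, nodes):
    """Return the nodes that have no outgoing (downstream) edge.

    edges: iterable of (upstream, downstream) pairs between monitored catchments.
    """
    has_downstream = {up for up, _ in edges}
    return sorted(n for n in nodes if n not in has_downstream)


# Toy network: catchment A drains into B; C is independent
print(downstream_most([('A', 'B')], ['A', 'B', 'C']))  # → ['B', 'C']
```

Summing loads only over `B` and `C` avoids counting `A`'s load twice. If `g` is a `networkx.DiGraph` (an assumption; the notebook does not say what `nope.build_calib_network` returns), the loop in the cell is equivalent to `[nd for nd in g if g.out_degree(nd) == 0]`.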

In [None]:
# Read data
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\Results\Loads_CSVs\loads_and_flows_all_sites_2015.csv')
mon_df = pd.read_csv(in_csv)

# Get just the downstream catchments
#mon_df = mon_df[mon_df['station_id'].isin(stn_df['station_id'].values)]

# Group by OSPAR region
mon_df1 = mon_df.groupby(['ospar_region', 'rid_group']).sum()

# Totals for Norway
mon_df2 = mon_df.groupby('rid_group').sum().reset_index()
mon_df2['ospar_region'] = 'NORWAY'
mon_df2.set_index(['ospar_region', 'rid_group'], inplace=True)

# Combine
mon_df = pd.concat([mon_df1, mon_df2], axis=0)

# Cols of interest
cols = [i for i in mon_df.columns if i.split('_')[1] != 'Est']
mon_df = mon_df[cols]
del mon_df['station_id']

# Rename cols to match template
mon_df['Flow rate_1000m3/day'] = mon_df['mean_q_1000m3/day']
del mon_df['mean_q_1000m3/day']

# Units already match the template, so strip them from the column names
mon_df.columns = [i.split('_')[0] for i in mon_df.columns]

mon_df.round(0)
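
The aggregation in this cell (regional totals plus a `NORWAY` row built from a second `groupby`) can be reproduced on toy data, as sketched below. The region names, `rid_group` values and the `TOTP_tonnes` column are illustrative assumptions. Note that pandas 2.0 changed the default of `numeric_only` in `groupby().sum()` to `False`, so text columns are no longer dropped silently; passing `numeric_only=True` keeps the totals numeric.

```python
import pandas as pd

# Toy monitored loads (names and values are illustrative only)
demo = pd.DataFrame({
    'ospar_region': ['NORTH SEA', 'NORTH SEA', 'SKAGERAK'],
    'rid_group': ['rid_11', 'rid_108', 'rid_11'],
    'TOTP_tonnes': [10.0, 2.0, 5.0],
})

# Totals per region and RID group
by_region = demo.groupby(['ospar_region', 'rid_group']).sum(numeric_only=True)

# National totals per RID group, indexed the same way as the regional table
national = demo.groupby('rid_group').sum(numeric_only=True).reset_index()
national['ospar_region'] = 'NORWAY'
national = national.set_index(['ospar_region', 'rid_group'])

summary = pd.concat([by_region, national], axis=0)

# Drop the unit suffix from the column names
summary.columns = [c.split('_')[0] for c in summary.columns]
print(summary)
```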

### 4.2. Data for unmonitored areas

In [None]:
# Read data
in_csv = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
          r'\Results\Unmon_loads\unmon_loads_2015.csv')
umon_df = pd.read_csv(in_csv, index_col=0)

# Rename cols
umon_df.columns = [i.replace('RENSEANLEGG', 'sew') for i in umon_df.columns]
umon_df.columns = [i.replace('INDUSTRI', 'ind') for i in umon_df.columns]
umon_df.columns = [i.replace('_tonn', '') for i in umon_df.columns]
umon_df.columns = [i.replace('AQUAKULTUR', 'fish') for i in umon_df.columns]

# Convert Hg from tonnes to kg
umon_df['sew_Hg'] = umon_df['sew_Hg']*1000
umon_df['ind_Hg'] = umon_df['ind_Hg']*1000

umon_df.round(0)
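
The four chained `replace` calls and the two Hg conversions follow one pattern: substring substitutions on the column names, then a tonnes-to-kg scaling (1 t = 1000 kg). A minimal sketch that keeps the substitutions in a single mapping; the raw column names in the toy table are assumptions about the CSV layout, and converting *every* column ending in `_Hg` goes slightly beyond the cell above, which only converts `sew_Hg` and `ind_Hg`.

```python
import pandas as pd

# Substring substitutions, applied in order to each column name
RENAMES = {'RENSEANLEGG': 'sew', 'INDUSTRI': 'ind', 'AQUAKULTUR': 'fish', '_tonn': ''}


def tidy_columns(cols, renames=RENAMES):
    """Apply every substring replacement to every column name."""
    out = []
    for col in cols:
        for old, new in renames.items():
            col = col.replace(old, new)
        out.append(col)
    return out


# Toy unmonitored-loads table in tonnes (column names and values are illustrative)
demo = pd.DataFrame({'RENSEANLEGG_Hg_tonn': [0.002], 'INDUSTRI_TOTP_tonn': [1.5]})
demo.columns = tidy_columns(demo.columns)

# Scale Hg columns from tonnes to kg
hg_cols = [c for c in demo.columns if c.endswith('_Hg')]
demo[hg_cols] = demo[hg_cols] * 1000
```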

### 4.3. Process template

In [None]:
# Create table 3
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016Analysis_2015Data\TABLE3_2015_JES.docx')

rid.write_word_overall_table(mon_df, umon_df, in_docx)

The values in [this table](https://github.com/JamesSample/rid/blob/master/pdf/TABLE3_2015_JES.pdf) are very similar to those in Table 3 of the 2015 report (page 202 onwards). Minor differences between my output and that reported in 2015 can be explained as follows:

 * There are very minor discrepancies between the regine catchments treated as "monitored" in my workflow and those in Tore's. I think Tore's code may include some monitored discharges twice, as some monitored catchments appear to be upstream of one another. Overall, these differences are negligible. <br><br>
 
 * My handling of LOD values is different to Tore's. Furthermore, my method for estimating historic concentrations for the RID 108 stations is substantially different - see [this notebook](http://nbviewer.jupyter.org/github/JamesSample/rid/blob/master/notebooks/estimate_loads.ipynb) for details.<br><br>
 
 * For unmonitored locations, my new model uses a different (and, in my opinion, improved) method for estimating annual runoff for each regine catchment. The total flows estimated by the new model are therefore slightly different. <br><br>
 
 * The new model assumes that "local" inputs are added at the upstream boundary of the catchment (and are therefore subject to retention), whereas I think TEOTIL assumes local inputs are added at the catchment outflow (so the retention factor is not applied). Both seem reasonable, but one consequence is that the new model predicts slightly lower loads than the old one.
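
The effect described in the last bullet can be made concrete with a small worked example. This sketches the two assumptions as stated in the text, with retention simplified to a single fraction `R` for the catchment; it is not code from either model. Adding local inputs at the upstream boundary gives `(upstream + local) * (1 - R)`, while adding them at the outflow gives `upstream * (1 - R) + local`, so the first is lower by exactly `R * local` whenever `R > 0`.

```python
def load_local_upstream(upstream, local, retention):
    """Local inputs enter at the upstream boundary, so retention applies to them."""
    return (upstream + local) * (1 - retention)


def load_local_outflow(upstream, local, retention):
    """Local inputs enter at the outflow, so they bypass retention."""
    return upstream * (1 - retention) + local


# Illustrative numbers: 100 t from upstream, 20 t local, 25 % retention
print(load_local_upstream(100, 20, 0.25))  # → 90.0
print(load_local_outflow(100, 20, 0.25))   # → 95.0
```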
 
Overall, I'm pretty happy that these results are comparable.