In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
import imp
import numpy as np
import os
from sqlalchemy import create_engine

sn.set_context('notebook')

# Programme changes 2017-18

The RID project has applied a fairly consistent methodology for many years, and Tore's original workflow has been used (with minor modifications) for every year up to and including 2015. In 2016, I updated and streamlined the code required to repeat this workflow, as documented [here](https://github.com/JamesSample/rid). However, the programme has been substantially restructured for 2017-2020, which means the workflow I created last year will need to be adapted. This notebook describes the changes and provides some initial modified code. 

## 1. What has changed?

### 1.1. "Core" programme ("grunnprogrammet")

The following is a (slightly modified) extract from an e-mail sent to Hans Fredrik (16.08.2018 at 11.54) when I first discovered changes to the core programme compared to previous years:

>In 2016 (and for several years prior to that) we had the following: 
>
> * Monthly sampling for the 11 main rivers <br><br>
>
> * Quarterly sampling for the 36 bielver <br><br>
>
> * Estimated values based on long-term measurements (collected prior to 2004) for 108 other rivers
>
>Each of these categories had its own project in the database, and my code basically applies a different analysis methodology for each one. Comparing the table in the 2017 tilbud PDF to the data in the database, I can see that we now have the following (for 2017-20):
>
> * We are still sampling the 11 main rivers, as before <br><br>
>
> * We have stated that we will sample only 8 of the 36 bielver <br><br>
>
> * We will begin sampling again at one of the old RID-108 rivers (Storelva/Vegårdselva), but this time with monthly resolution (prior to 2017, this site hadn’t been sampled since 2003, and then it had only 1 sample per year)
>
>In addition, there is at least some data from 2017 in the database for another 7 of the bielver (NTRENAM, AAGENID, TELETOK, VAGEKVI, VAGELYN, VAGEMAN, VAGETOV). This has been collected either as part of the flood sampling or under "Option 3" (see below). We need to decide whether to use this data in the loads and trends analyses.
>
>Some key questions:
>
> * What do we do with the 28 bielver that are no longer being monitored this year? Are we going to estimate values using the same methodology as we have used previously for the RID 108 rivers, but using all data from 1990 to 2016 (instead of 1990 to 2003)? <br><br>
>
> * Do we want to incorporate data from the other 7 bielver? They are not part of the core programme, but we do have at least some data in the database for them (monthly in some cases; only one sample per year in others) <br><br>
>
> * How do we want to estimate trends for Storelva/Vegårdselva? In terms of data, this location has one sample per year for 1990 to 2003 inclusive, then 12 samples from 2017, so it’s a bit unusual
>
>In terms of the new analysis, we now have different site groupings compared to previous years. Something like this:
>
> * 11 RID hovedelver <br><br>
>
> * 8 RID bielver <br><br>
>
> * 107 “other” rivers not monitored since 2003 <br><br>
>
> * 28 unmonitored RID bielver, where values now need to be estimated based on long-term data up to 2016. In reality, however, it’s more like 21 sites than 28, because we have 2017 data for 7 of these stations from other components of this project <br><br>
>
> * 1 “other” river with data prior to 2004 and after 2016, but nothing in between
>
>The first three of these categories can be treated the same as before, but the last two will require a modified approach.

In addition, some new parameters have been included in the analyses for 2017-20: filtered metals, POC, particulate N, etc. **These should not be included in the standard loads or trends calculations**, but a separate analysis of these data may nevertheless be required for the report.

**The 20 stations monitored as part of the new core programme are listed in RESA2 under the project `'Elveovervåkningsprogrammet (O 16384)'`**.
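
As a quick consistency check on the site groupings described in the e-mail above, the counts can be tallied as follows. This is only a minimal sketch: the dictionary keys are illustrative labels made up for this example, not project or station names from RESA2.

```python
# Site counts for 2017-20, taken from the groupings listed in the e-mail above.
# The keys are illustrative labels only (assumed for this sketch).
site_groups = {
    "hovedelver": 11,              # RID main rivers, still monitored
    "bielver_monitored": 8,        # bielver retained in the core programme
    "other_not_since_2003": 107,   # "other" rivers not monitored since 2003
    "bielver_unmonitored": 28,     # bielver dropped from the core programme
    "other_resumed_2017": 1,       # Storelva/Vegårdselva
}

# Stations sampled under the new core programme
n_core = (
    site_groups["hovedelver"]
    + site_groups["bielver_monitored"]
    + site_groups["other_resumed_2017"]
)

# Stations where values must be estimated rather than measured
n_unmonitored = (
    site_groups["other_not_since_2003"] + site_groups["bielver_unmonitored"]
)

n_total = sum(site_groups.values())

print(f"Core: {n_core}, unmonitored: {n_unmonitored}, total: {n_total}")
```

The core total agrees with the 20 stations listed under the new project in RESA2, and the remaining groups account for all the other stations from the old programme.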

### 1.2. Option 3 ("opsjon3")

On top of the basic monitoring, there are some additional stations (several per river, arranged along the stream course) that are monitored as part of this project. A different subset of the 20 main rivers is monitored under "opsjon3" each year: roughly 5 rivers per year, for each of the four years in the project.

**The new Option 3 sites should not be included in the standard loads or trends work**. For the most part, the Option 3 stations do not overlap with the 155 stations used for estimating loads so, for the analyses of interest here, Option 3 can be largely ignored. However, there are two exceptions: during 2017 Option 3 included SFJESTR (one of the RID-108 sites) and NTRENAM (part of RID-36). These stations were added part way through the sampling programme, so there are no measurements from the first half of the year, but from July to December we have one sample per month for each location. Following discussion with Øyvind, we have decided that **these samples should be excluded from the loads estimation work** (see e-mail received 17.08.2018 at 14.03 for details).
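
One possible way to apply this exclusion is sketched below. It assumes the water samples are held in a pandas dataframe with columns named `station_code` and `sample_date`; these column names and the example rows are hypothetical and do not reflect the actual RESA2 schema or data.

```python
import pandas as pd

# Hypothetical sample table (made-up rows, for illustration only)
samples = pd.DataFrame(
    {
        "station_code": ["SFJESTR", "NTRENAM", "NTRENAM", "VAGEOTR"],
        "sample_date": pd.to_datetime(
            ["2017-07-10", "2017-09-05", "2016-05-02", "2017-08-15"]
        ),
    }
)

# Stations sampled under Option 3 from July to December 2017
opt3_stations = ["SFJESTR", "NTRENAM"]
start = pd.Timestamp("2017-07-01")
end = pd.Timestamp("2018-01-01")  # exclusive upper bound

# Flag samples from the Option 3 stations within the Option 3 period
is_opt3 = (
    samples["station_code"].isin(opt3_stations)
    & (samples["sample_date"] >= start)
    & (samples["sample_date"] < end)
)

# Keep everything else for the loads estimation
loads_samples = samples[~is_opt3].reset_index(drop=True)
print(loads_samples)
```

Filtering on both station and date, rather than on station alone, leaves any earlier samples from these two stations available for the long-term estimates.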

A further issue is that the Option 3 data are currently not well organised in the database. To help Liv Bente, the following cleaning is required:

 * The 2017 "opsjon3" data are currently associated with project `'Elveoverv opsj3 2017'` in RESA2. Following discussion on 16.08.2018, we have decided to rename this `'Elveoverv opsj3'` and use it for all Option 3 data for all four years <br><br>
 
 * Station co-ordinates for all the 2017 Option 3 rivers need adding to RESA (see e-mail from Liv Bente received 15.08.2018 at 14.33) <br><br>
 
 * New stations (plus co-ordinates) for the 2018 Option 3 rivers need adding to the updated `'Elveoverv opsj3'` project (see e-mail from Liv Bente received 15.08.2018 at 14.33)
 
### 1.3. Flood sampling ("flomprøver")

During October 2017, flood sampling was carried out at 10 locations linked to the Elveovervåkningsprogrammet:

| Station code |     Datetime     |
|:------------:|:----------------:|
| AAGENID      | 25.10.2017 14:50 |
| AAGEVEG      | 25.10.2017 14:00 |
| TELESKI      | 25.10.2017 11:00 |
| TELETOK      | 25.10.2017 12:45 |
| VAGEKVI      | 26.10.2017 13:10 |
| VAGELYN      | 26.10.2017 17:20 |
| VAGEMAN      | 26.10.2017 18:15 |
| VAGEOTR      | 26.10.2017 19:10 |
| VAGESIR2     | 26.10.2017 14:10 |
| VAGETOV      | 25.10.2017 15:30 |

Three of these stations (AAGEVEG, TELESKI and VAGEOTR) have also been monitored monthly as part of the core programme in 2017. Note also that VAGESIR2 is one of the Option 3 stations for 2017, and the remaining 6 stations have all been previously monitored for RID, either as part of the bielver or as part of RID-108.

Because these samples specifically target flood peaks, it is not a good idea to use them in the loads calculations: to do so would bias the results. **These samples should therefore be omitted from the analysis here**.
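
A minimal sketch of one way to drop the flood samples is given below, using the station codes and datetimes from the table above. The `samples` dataframe, its column names and its values are hypothetical and used only for illustration. The approach is an "anti-join" via `merge` with `indicator=True`, and it assumes the sample datetimes stored in the database match those in the table exactly.

```python
import pandas as pd

# Flood samples from October 2017 (station code, datetime), from the table above
flood = pd.DataFrame(
    [
        ("AAGENID", "25.10.2017 14:50"),
        ("AAGEVEG", "25.10.2017 14:00"),
        ("TELESKI", "25.10.2017 11:00"),
        ("TELETOK", "25.10.2017 12:45"),
        ("VAGEKVI", "26.10.2017 13:10"),
        ("VAGELYN", "26.10.2017 17:20"),
        ("VAGEMAN", "26.10.2017 18:15"),
        ("VAGEOTR", "26.10.2017 19:10"),
        ("VAGESIR2", "26.10.2017 14:10"),
        ("VAGETOV", "25.10.2017 15:30"),
    ],
    columns=["station_code", "sample_date"],
)
flood["sample_date"] = pd.to_datetime(flood["sample_date"], format="%d.%m.%Y %H:%M")

# Hypothetical water chemistry samples (made-up values, for illustration only)
samples = pd.DataFrame(
    {
        "station_code": ["AAGEVEG", "AAGEVEG", "VAGEOTR"],
        "sample_date": pd.to_datetime(
            ["2017-10-25 14:00", "2017-10-09 10:30", "2017-10-26 19:10"]
        ),
        "value": [1.0, 2.0, 3.0],
    }
)

# Left join with an indicator column, then keep rows with no flood match
merged = samples.merge(
    flood, on=["station_code", "sample_date"], how="left", indicator=True
)
no_flood = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(no_flood)
```

Matching on both station and datetime matters here because three of the flood stations are also sampled monthly under the core programme, so their routine samples must be kept.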

Liv Bente has created a new project in RESA for the flood sampling called `'Elveoverv_Flomprøver'` and, ideally, the flood samples should be linked to this. However, there is no straightforward way to do this in RESA: water samples are assigned to stations and stations are assigned to projects, so it is difficult to link only some of the water samples from a particular station to a specific project. As far as I can see, this has been achieved previously in an *ad hoc* way: the tables `'RESA2.SAMPLE_SELECTION_DEFINITIONS'` and `'RESA2.SAMPLE_SELECTIONS'` provide most of the necessary database infrastructure, but neither has been updated since 2011, so they clearly aren't used consistently.

**Need to decide whether to attempt to use this old structure, or to create a temporary solution of my own for the analysis this year**.

### 1.4. Summary

 * For the new core programme, we now have 20 "main" rivers that are monitored monthly (except Glomma and Drammenselva, which have 16 samples per year). These 20 sites comprise the original 11 "main" rivers, plus 8 of the old "bielver", plus one of the old "RID-108" rivers. **Loads and trends for these sites should be calculated in the same way as previously for the RID-11 rivers** <br><br>

 * All the other sites from the old programme (135 in total) are now considered as "other"/"unmonitored". **Loads for these should be calculated using the method previously applied for RID-108 sites** <br><br>
 
 * Additional sampling has been carried out under Option 3. For the analysis presented here these data can be largely ignored, but the **data collected at stations SFJESTR and NTRENAM should be excluded from the loads estimation procedure** <br><br>

 * Flood sampling has been carried out at 10 locations. **These samples should be removed from the loads and trends analysis to avoid biasing the results**
 
The table below attempts to summarise the main features of the 2017-20 Elveovervåkningsprogrammet relevant to the loads and trends analysis:

<img src="../png/change_summary_2017_2020.png" alt="Change summary" width="400">