In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn
import mpld3
import pandas as pd
import imp
import folium
from sqlalchemy import create_engine
sn.set_context('notebook')

# RID project: initial data exploration

Tore coded his previous analyses for the RID project using PL/SQL within the RESA2 database. Some changes are required this year, so I either need to figure out how to update and run Tore's old code, or I need to re-code a new workflow that will be compatible with what has been done previously. This notebook documents my initial exploration of the data and the tables within the NIVA database.

## 1. Sites of interest

A copy of the 2015 report (produced during 2016) is here:

C:\Data\James_Work\Staff\Oyvind_K\Elveovervakingsprogrammet\Report\RID REPORT 2015_1 DEC 2016_final.pdf

According to this, the RID programme includes 11 sites that are monitored at monthly frequency or higher, 36 (actually 37 - see below) sites monitored quarterly, and 108 sites that are no longer monitored at all, but which have data prior to 2004. After a bit of exploration in RESA2, it looks as though the key projects are as follows:

• The RID_11 sites are under project **`RID (O 25800 03)`**

• The RID_36 sites (actually 37 of them) are under project **`RID - Bielver (O 25800 04)`**

• The RID_108 sites are under project **`RID - 109`**

I've extracted basic metadata for the sites in these three projects here:

C:\Data\James_Work\Staff\Oyvind_K\Elveovervakingsprogrammet\Data\RID_Sites_List.xlsx

Note that one of the 37 RID_36 sites is missing an NVE "vassdragnummer", which might explain why it is not usually included?

In the raw station data, there are 7 sites missing geographic (lat/lon) co-ordinates and one missing projected (UTM) co-ordinates. As a first step, I've therefore made some basic co-ordinate conversions and added this metadata to the stations table in RESA2.

The code below reads the metadata and creates a simple interactive map. **Use the "layer control" top-right to turn groups of sites on and off, and click on a site to see the station name and code**.

In [2]:
# Read site data
in_xlsx = r'C:\Data\James_Work\Staff\Oyvind_K\Elveovervakingsprogrammet\Data\RID_Sites_List.xlsx'

# Note: the keyword is 'sheet_name' in current pandas ('sheetname' was removed)
rid_11_df = pd.read_excel(in_xlsx, sheet_name='RID_11')
rid_36_df = pd.read_excel(in_xlsx, sheet_name='RID_36')
rid_108_df = pd.read_excel(in_xlsx, sheet_name='RID_108')

In [3]:
# Setup map
map1 = folium.Map(location=[65, 10.8],
                  zoom_start=4,
                  tiles='Stamen Terrain')

# Create feature groups
fg_11 = folium.FeatureGroup(name='RID 11')
fg_36 = folium.FeatureGroup(name='RID 36')
fg_108 = folium.FeatureGroup(name='RID 108')
fgs = [fg_11, fg_36, fg_108]

# Define colours
cols = ['red', 'green', 'blue']

# Add clickable markers for sites
for df_idx, df in enumerate([rid_11_df, rid_36_df, rid_108_df]):
    for idx, row in df.iterrows():  
        folium.Marker([row['lat'], row['lon']], 
                      popup='%s (%s)' % (row['station_name'], 
                                         row['station_code']),
                      icon=folium.Icon(color=cols[df_idx])).add_to(fgs[df_idx])

    # Add feature group to map
    map1.add_child(fgs[df_idx])

# Turn on layer control
map1.add_child(folium.map.LayerControl())

map1

## 2. Site data

For each of the stations identified above, we need to access two types of data: water chemistry and discharge. The NVE discharge datasets are stored in separate tables within RESA2 (`RESA2.DISCHARGE_STATIONS` and `RESA2.DISCHARGE_VALUES`), and the table `RESA2.DEFAULT_DIS_STATIONS` links the site IDs for the water chemistry stations to those for the discharge.
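The link between the chemistry and discharge tables amounts to a join. The sketch below uses an in-memory SQLite database as a stand-in for RESA2. The table names follow the text above, but every column name (`station_id`, `dis_station_id`, `flow_date`, `flow`), the function `get_discharge` and all the values are hypothetical placeholders, not the real schema.

```python
import sqlite3

# In-memory stand-in for RESA2. Column names and values are hypothetical
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE default_dis_stations (station_id INTEGER, dis_station_id INTEGER);
    CREATE TABLE discharge_values (dis_station_id INTEGER, flow_date TEXT, flow REAL);
    INSERT INTO default_dis_stations VALUES (1, 100), (2, 200);
    INSERT INTO discharge_values VALUES
        (100, '2010-01-01', 10.0),
        (100, '2010-01-02', 12.0),
        (200, '2010-01-01', 99.0);
""")

def get_discharge(con, stn_id, st_dt, end_dt):
    """Return (date, flow) rows for the discharge station linked to a
    water chemistry station, restricted to the period of interest."""
    sql = """
        SELECT v.flow_date, v.flow
        FROM discharge_values v
        JOIN default_dis_stations d
          ON v.dis_station_id = d.dis_station_id
        WHERE d.station_id = ?
          AND v.flow_date BETWEEN ? AND ?
        ORDER BY v.flow_date
    """
    return con.execute(sql, (stn_id, st_dt, end_dt)).fetchall()

# Flows for (hypothetical) chemistry station 1 during 2010
rows = get_discharge(con, 1, '2010-01-01', '2010-12-31')
```

Only the rows for the linked discharge station come back, so the caller never needs to know the NVE station ID directly.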

The workflow for estimating loads based on the observed data should therefore look something like this:

 1. Write a function to extract water chemistry time series for a specified site, parameter(s) and time period <br><br>
 
 2. Write a function to identify the NVE discharge station associated with a specified water chemistry station and extract the discharge values for the desired time period <br><br>
 
 3. Calculate loads using the methodology defined by OSPAR
 
These steps are considered in turn below but, first of all, I need to establish a connection to the database.

In [4]:
# Connect to db
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')

resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)

engine, conn = resa2_basic.connect_to_resa2()

### 2.1. Extract water chemistry data

For the ICP Waters project, I have previously written code to extract data from RESA2. For the RID project, I have modified these functions and transferred them to a new file, which can be found [here]()

In [5]:
# Import custom RID functions
rid_func_path = (r'C:\Data\James_Work\Staff\Oyvind_K\Elveovervakingsprogrammet'
                 r'\Python\rid\useful_rid_code.py')

rid = imp.load_source('useful_rid_code', rid_func_path)

The function `extract_water_chem` provides a reasonably flexible method for extracting and plotting water chemistry data. The example below produces an interactive plot showing time series from 2010 to 2015 for five parameters at Vesenum, which is one of the sites within the RID programme. **Hovering the mouse over the plot should display pan and zoom tools towards the bottom-left corner - use these to explore the data**.

In [6]:
# Station of interest
stn_id = 29617


# Pars of interest
par_list = ['pH', 'KOND', 'TOC', 'TOTN', 'TOTP']

# Get data between 2010 and 2015
wc_df, fig = rid.extract_water_chem(stn_id, par_list, 
                                    '2010-01-01', '2015-12-31',
                                    engine, plot=True)

# Plot
mpld3.display(fig)

The function also returns a dataframe of the extracted data, in the format illustrated below.

In [7]:
wc_df.head(10)

| sample_date | KOND_mS/m | TOC_mg C/l | TOTN_µg/l N | TOTP_µg/l P | pH_None |
|---|---|---|---|---|---|
| 2010-01-04 | 4.57 | 3.5 | 535.0 | 5.0 | 6.98 |
| 2010-02-08 | 4.97 | 3.1 | 585.0 | 6.0 | 7.06 |
| 2010-03-08 | 4.79 | 3.0 | 560.0 | 5.0 | 7.23 |
| 2010-04-06 | 6.25 | 4.6 | 900.0 | 31.0 | 7.04 |
| 2010-05-10 | 4.73 | 6.0 | 610.0 | 17.0 | 7.16 |
| 2010-05-18 | 4.47 | 5.8 | 540.0 | 6.0 | 7.13 |
| 2010-05-25 | 3.81 | 5.9 | 510.0 | 26.0 | 6.99 |
| 2010-05-31 | 3.35 | 4.5 | 430.0 | 15.0 | 7.05 |
| 2010-06-07 | 3.97 | 3.7 | 395.0 | 10.0 | 7.28 |
| 2010-06-18 | 4.18 | 3.8 | 550.0 | 13.0 | 7.14 |
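The wide layout above, with one column per parameter labelled `PAR_unit`, can be produced from a long-format extract with a pivot. This is only a sketch of the idea: the internals of `extract_water_chem` are not shown here, and the input frame `long_df` and its column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format extract: one row per sample and parameter
long_df = pd.DataFrame({
    'sample_date': pd.to_datetime(['2010-01-04', '2010-01-04',
                                   '2010-02-08', '2010-02-08']),
    'par': ['pH', 'TOC', 'pH', 'TOC'],
    'unit': [None, 'mg C/l', None, 'mg C/l'],
    'value': [6.98, 3.5, 7.06, 3.1],
})

# Build 'PAR_unit' labels. Missing units become the text 'None',
# which is consistent with the 'pH_None' header in the output above
long_df['par'] = long_df['par'] + '_' + long_df['unit'].fillna('None')

# One row per date, one column per parameter
wide_df = long_df.pivot(index='sample_date', columns='par', values='value')
```

Note that `pivot` raises an error if a date and parameter combination appears more than once, so duplicate samples would need to be aggregated first.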


### 2.2. Extract discharge data

Similarly, the function `extract_discharge` provides options for extracting and plotting NVE flow data for a specified water chemistry site. The function automatically identifies the correct NVE station for the RID site specified, and discharge values are scaled by the ratio of catchment areas.
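The area scaling can be sketched as follows. The exact calculation inside `extract_discharge` is not shown here, so this assumes the simplest form, where flow at the chemistry site equals flow at the NVE station multiplied by the ratio of the two catchment areas. The function name `scale_discharge` and the example areas are hypothetical.

```python
def scale_discharge(q_nve, area_chem_km2, area_nve_km2):
    """Scale flows measured at an NVE station to a water chemistry site
    using the ratio of catchment areas (an assumed, simple form).

    q_nve:         list of flows at the NVE station (m3/s)
    area_chem_km2: catchment area upstream of the chemistry site
    area_nve_km2:  catchment area upstream of the NVE station
    """
    if area_nve_km2 <= 0:
        raise ValueError('NVE catchment area must be positive')

    factor = area_chem_km2 / area_nve_km2

    return [q * factor for q in q_nve]

# Hypothetical example: the chemistry site drains 1.5 times the gauged area
scaled = scale_discharge([10.0, 20.0], area_chem_km2=150.0, area_nve_km2=100.0)
```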

As above, **hovering the mouse over the plot should display pan and zoom tools towards the bottom-left corner - use these to explore the data**.

In [8]:
# Get data between 2010 and 2015
q_df, fig = rid.extract_discharge(stn_id, 
                                  '2010-01-01', '2015-12-31',
                                  engine, plot=True)

# Plot
mpld3.display(fig)

In [9]:
q_df.head()

| date | flow_m3/s |
|---|---|
| 2010-01-01 | 628.441710 |
| 2010-01-02 | 627.399518 |
| 2010-01-03 | 627.399518 |
| 2010-01-04 | 627.399518 |
| 2010-01-05 | 627.399518 |
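With both water chemistry and discharge available, step 3 of the workflow (load estimation) can be sketched. The version below assumes the commonly cited form of the OSPAR estimate, in which the flow-weighted mean concentration from the sampling days is multiplied by the total water volume for the year. This should be checked against the OSPAR RID documentation before use. The function name `ospar_annual_load` and all the numbers are hypothetical.

```python
def ospar_annual_load(samples, daily_flows):
    """Estimate an annual load in tonnes (assumed form of the OSPAR method).

    samples:     list of (concentration in mg/l, flow in m3/s) pairs,
                 one per sampling day
    daily_flows: list of daily mean flows (m3/s) covering the whole year
    """
    sum_cq = sum(c * q for c, q in samples)
    sum_q = sum(q for _, q in samples)

    if sum_q == 0:
        raise ValueError('Flows on sampling days sum to zero')

    # Flow-weighted mean concentration. Note that 1 mg/l equals 1 g/m3
    fw_conc = sum_cq / sum_q

    # Total annual volume in m3 (86400 seconds per day)
    volume_m3 = sum(daily_flows) * 86400

    # Grams to tonnes
    return fw_conc * volume_m3 / 1e6

# Hypothetical example: two samples and a constant flow of 20 m3/s
load = ospar_annual_load([(2.0, 10.0), (4.0, 30.0)], [20.0] * 365)
```

Weighting by flow means that samples taken during high flows contribute more to the mean concentration than samples taken at low flows.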
