In [1]:
%matplotlib inline
import pandas as pd
import imp
import glob
import os
from sqlalchemy import create_engine

# Update RID flow datasets

Each year, updated flow datasets (both modelled and observed) are obtained from NVE and added to RESA2. Tore has a number of Access files here:

K:\Avdeling\Vass\316_Miljøinformatikk\Prosjekter\RID\Vannføring

which handle the update process. The code in this notebook replaces them.

In [2]:
# Connect to db
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')

resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)

engine, conn = resa2_basic.connect_to_resa2()
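
`imp.load_source` works on the Python versions this notebook was written for, but the `imp` module has been deprecated since Python 3.4 and was removed in Python 3.12. Below is a minimal sketch of an equivalent that uses `importlib`. `load_module_from_path` is a hypothetical helper name, not part of the RESA2 code, and the demo module is a throwaway file written to a temporary folder:

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module (hypothetical stand-in for imp.load_source)."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError('Cannot load %s from %s' % (module_name, file_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Usage example with a throwaway module written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_helpers.py')
    with open(path, 'w') as f:
        f.write('def greet():\n    return "hello"\n')
    demo = load_module_from_path('demo_helpers', path)
    print(demo.greet())  # prints: hello
```

In the cell above, this would become `resa2_basic = load_module_from_path('useful_resa2_code', resa2_basic_path)`.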

## 1. Observed discharge

Observed time series are used **only** for the 11 main rivers (project `RID (O 25800 03)`); all other calculations are based on modelled flows (from HBV). Discharge for these 11 sites can be obtained from the [Hydra II database](http://www4.nve.no/xhydra/). Note that more than 11 NVE series are involved, because at some chemistry sampling locations the flow is the sum of several NVE discharge series. The NVE sites associated with each chemistry sampling location are recorded in RESA2.
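
Where a chemistry station's flow is the sum of several NVE series, the daily values must be aligned on date before they are added. The following is a minimal pandas sketch. It assumes each series is a daily `pd.Series` in m³/s indexed by date. The station labels and values are made up for illustration, and `sum_nve_series` is a hypothetical helper. Setting `min_count` to the number of series means a day with any missing input comes out as NaN instead of as an underestimated total:

```python
import numpy as np
import pandas as pd


def sum_nve_series(series_dict):
    """Sum daily discharge series (m3/s) from several NVE stations.

    series_dict maps a station label to a pd.Series indexed by date.
    Days where any station is missing come out as NaN.
    """
    df = pd.DataFrame(series_dict)  # aligns the series on their date index
    return df.sum(axis=1, min_count=df.shape[1])


# Illustrative data only (hypothetical stations and values)
dates = pd.date_range('2016-01-01', periods=3, freq='D')
a = pd.Series([10.0, 12.0, 11.0], index=dates)
b = pd.Series([5.0, np.nan, 4.0], index=dates)
total = sum_nve_series({'A': a, 'B': b})
print(total.tolist())  # [15.0, nan, 15.0]
```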

The NVE sites associated with the 11 water chemistry sampling locations are listed in the table below, together with **an indication of whether 2016 data were available as of 19/06/2017**. I have also e-mailed Trine at NVE to ask when the remaining data series will be available. 

| NVE ID | 2016 data available in Hydra II? |         Comment         |
|:------:|:--------------------------------:|:-----------------------:|
| 12.285 |                No                |                         |
| 16.133 |                No                |                         |
| 16.153 |                No                | Can use 16.133 instead? |
|  2.605 |                No                |                         |
| 212.11 |                No                |                         |
|  151.5 |                No                |                         |
|  6.78  |                No                |                         |
|  21.11 |                Yes               |                         |
|  21.71 |                Yes               |  Can use 21.11 instead? |
|  15.61 |                Yes               |                         |
| 121.22 |                Yes               |                         |
|  28.7  |                Yes               |                         |
|  62.5  |                Yes               |                         |
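
Rather than judging by eye whether a year's data are available, you can check this mechanically once a series has been downloaded. This is a minimal sketch. It assumes the series is a daily `pd.Series` indexed by date. `year_is_complete` is a hypothetical helper, and it treats a year as complete only if every calendar day has a non-missing value:

```python
import pandas as pd


def year_is_complete(series, year):
    """Return True if `series` has a non-missing value for every day of `year`."""
    all_days = pd.date_range('%s-01-01' % year, '%s-12-31' % year, freq='D')
    vals = series.reindex(all_days)  # days absent from the series become NaN
    return bool(vals.notna().all())


# Illustrative data: a full 2016 series, and one that stops at the end of June
full = pd.Series(1.0, index=pd.date_range('2016-01-01', '2016-12-31', freq='D'))
partial = full.loc[:'2016-06-30']
print(year_is_complete(full, 2016))     # True
print(year_is_complete(partial, 2016))  # False
```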


Following discussions with Tore on 19/06/2017, note the following:

 * Chemistry station 29613 should ideally use several NVE series, but some of these are usually hard to obtain in time. In the past, Tore has used NVE station 16.133 and then simply **added 10 $m^3/s$** to represent the missing flow volumes. **Check this is OK with Øyvind**.

 * Discharge data for chemistry stations 29617 (NVE ID 2.605) and 36225 (NVE ID 6.78) are often delayed. Contact Trine at NVE early to avoid problems later.
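
The current workaround for station 29613 amounts to adding a constant offset to the NVE 16.133 series. This is a minimal sketch. It assumes the 16.133 data are a daily `pd.Series` in m³/s. The 10 m³/s offset comes from the note above and still needs confirming with Øyvind. The function name and example values are hypothetical:

```python
import pandas as pd

# Offset (m3/s) added to NVE 16.133 to stand in for the missing series (to be confirmed)
OFFSET_29613 = 10.0


def estimate_flow_29613(q_16133, offset=OFFSET_29613):
    """Approximate flow at chemistry station 29613 as NVE 16.133 plus a fixed offset.

    Missing (NaN) days in the input stay missing in the output.
    """
    return q_16133 + offset


q = pd.Series([100.0, 95.5], index=pd.date_range('2016-01-01', periods=2, freq='D'))
print(estimate_flow_29613(q).tolist())  # [110.0, 105.5]
```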

In [3]:
# Add code for updating observed datasets here

## 2. Modelled discharge

Stein has supplied modelled data from HBV for the period from 1990 to 2016 (see e-mail received 13/06/2017 at 12.17). These data are stored locally here:

C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Data\hbv_modelled\RID_2016

The flow files are named e.g. `hbv_00000001.var`, where the number corresponds to the NVE "vassdragsområde" (drainage basin). These are listed in *vassomr.pdf* in the above folder, and they are also included in RESA2's `DISCHARGE_STATIONS` table, with the vassdragsområde numbers stored in the `NVE_SERINUMMER` field.
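
Before running the full update, you can check the naming and file-format conventions on a small example. The sketch below assumes each `.var` file has two whitespace-separated columns: a `YYYYMMDD/1200` date stamp and a flow value. This is the layout that the parsing code further down expects. The sample text is invented, and both helper names are hypothetical:

```python
import io
import os

import pandas as pd


def vassomr_from_filename(file_path):
    """Extract the vassdragsområde number from a name like 'hbv_00000001.var'."""
    stem = os.path.splitext(os.path.basename(file_path))[0]  # 'hbv_00000001'
    return int(stem.split('_')[1])


def read_var(handle):
    """Read an HBV .var file into a DataFrame with XDATE and XVALUE columns."""
    df = pd.read_csv(handle, sep=r'\s+', header=None, names=['XDATE', 'XVALUE'])
    df['XDATE'] = pd.to_datetime(df['XDATE'], format='%Y%m%d/1200')
    return df


sample = io.StringIO('19900101/1200 12.5\n19900102/1200 11.8\n')
print(vassomr_from_filename('hbv_00000001.var'))  # 1
print(read_var(sample))
```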

Tore has an Access database in, for example,

K:\Avdeling\Vass\316_Miljøinformatikk\Prosjekter\RID\Vannføring\Modellert\NVE_MODELLERT2016\vannføring

that first deletes the modelled NVE values for each station from 1990 onwards and then adds the new data, which cover the whole period from 1990 up to and including the new year. The code below does the same and also performs some basic checks on the data.
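
The delete-then-insert pattern is safest when both steps run in a single transaction. That way, a failure partway through cannot leave a station with its post-1990 values deleted but not replaced. The sketch below illustrates this with Python's built-in `sqlite3` and an in-memory table loosely modelled on `DISCHARGE_VALUES`. It is not the RESA2 (Oracle) setup, and the table layout and helper name are assumptions:

```python
import sqlite3


def replace_post_1990(con, station_id, rows):
    """Delete a station's values from 1990 onwards and insert `rows` atomically.

    rows is a list of (iso_date, value) tuples. If anything fails, the
    transaction is rolled back and the old values are kept.
    """
    with con:  # commits on success, rolls back on exception
        con.execute(
            "DELETE FROM discharge_values "
            "WHERE dis_station_id = ? AND xdate >= '1990-01-01'",
            (station_id,))
        con.executemany(
            "INSERT INTO discharge_values (dis_station_id, xdate, xvalue) "
            "VALUES (?, ?, ?)",
            [(station_id, d, v) for d, v in rows])


con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE discharge_values "
            "(dis_station_id INTEGER, xdate TEXT, xvalue REAL)")
con.executemany("INSERT INTO discharge_values VALUES (?, ?, ?)",
                [(1, '1989-12-31', 5.0), (1, '1990-01-01', 6.0)])
con.commit()

replace_post_1990(con, 1, [('1990-01-01', 7.0), ('1990-01-02', 8.0)])
print(con.execute("SELECT xdate, xvalue FROM discharge_values "
                  "ORDER BY xdate").fetchall())
# [('1989-12-31', 5.0), ('1990-01-01', 7.0), ('1990-01-02', 8.0)]
```

In the notebook itself, you might get a comparable effect by running both the `DELETE` and the `to_sql` call on the connection yielded by SQLAlchemy's `engine.begin()` context manager.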

In [4]:
# Year of interest
year = 2016

# Folder containing modelled data
data_fold = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Data\hbv_modelled\RID_2016')

In [None]:
# Get a list of files to process (only interested in flow here)
search_path = os.path.join(data_fold, 'hbv_*.var')
file_list = glob.glob(search_path)

# Get number of days between 1990 and year of interest
days_new = len(pd.date_range(start='1990-01-01', 
                             end='%s-12-31' % year,
                             freq='D'))

# Get number of days between 1990 and year before
days_old = len(pd.date_range(start='1990-01-01', 
                             end='%s-12-31' % (year-1),
                             freq='D'))

# Loop over files
for file_path in file_list:
    # Get name and reg. nr.
    name = os.path.split(file_path)[1]
    reg_nr = int(name.split('_')[1][:-4])
    
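    # Get RESA2 station ID (expect exactly one match)
    sql = ("SELECT dis_station_id FROM resa2.discharge_stations "
           "WHERE nve_serienummer = '%s'" % reg_nr)
    dis_df = pd.read_sql_query(sql, engine)
    assert len(dis_df) == 1, 'Expected one RESA2 station for reg. nr. %s.' % reg_nr
    dis_id = dis_df.iloc[0, 0]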

    # Check number of post-1990 records already in db
    # (should equal days_old)
    sql = ("SELECT COUNT(*) FROM resa2.discharge_values "
           "WHERE dis_station_id = %s "
           "AND xdate >= DATE '1990-01-01'" % dis_id)    
    cnt_old = pd.read_sql_query(sql, engine).iloc[0,0]    
    assert cnt_old == days_old, 'Unexpected number of records already in database.'
    
    # Read new data (whitespace-separated; delim_whitespace is deprecated in recent pandas)
    df = pd.read_csv(file_path, sep=r'\s+', 
                     header=None, names=['XDATE', 'XVALUE'])
    
    # Convert dates
    df['XDATE'] = pd.to_datetime(df['XDATE'], format='%Y%m%d/1200')

    # Check start date, end date and length
    assert df['XDATE'].iloc[0] == pd.Timestamp('1990-01-01'), 'New series does not start on 01/01/1990.'
    assert df['XDATE'].iloc[-1] == pd.Timestamp('%s-12-31' % year), 'New series does not end on 31/12/%s.' % year
    assert len(df) == days_new, 'Unexpected length for new series.'
    
    # Add station ID to df
    df['DIS_STATION_ID'] = dis_id
    
    # Drop existing rows post-1990 for this site
    sql = ("DELETE FROM resa2.discharge_values "
           "WHERE dis_station_id = %s "
           "AND xdate >= DATE '1990-01-01'" % dis_id)
    res = conn.execute(sql)
    
    # Add new rows
    df.to_sql('discharge_values', con=engine, schema='resa2', 
              if_exists='append', index=False)    
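
The two record counts that the assertions depend on (`days_old` and `days_new`) can be cross-checked without pandas. Here is a small sketch that uses only `datetime`. `days_since_1990` is a hypothetical helper name, and the variables are named differently from the notebook's so they do not overwrite them:

```python
from datetime import date


def days_since_1990(last_year):
    """Number of daily records from 1990-01-01 to 31 December of last_year, inclusive."""
    return (date(last_year, 12, 31) - date(1990, 1, 1)).days + 1


n_new = days_since_1990(2016)
n_old = days_since_1990(2015)
print(n_new, n_old, n_new - n_old)  # 9862 9496 366
```

The difference of 366 days reflects 2016 being a leap year.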