In [1]:
%matplotlib inline
import nivapy3 as nivapy
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [2]:
# Connect to NIVABASE
eng = nivapy.da.connect()

Connection successful.


# Monthly fluxes from Elveovervåkingsprogrammet

Helene would like to calculate monthly fluxes during 2017 for three stations in Elveovervåkingsprogrammet: ØSTEGLO, SFJENAU and TROEMÅL. ØSTEGLO has 16 samples from 2017 (one per month, except in May and June, when three samples were taken each month), whereas SFJENAU and TROEMÅL have 12 samples each (one per month). We also have daily discharge for each location, estimated from modelled values supplied by Stein Beldring at NVE.

The parameters of interest are:

 * NO3
 * NH4
 * PO4
 * SiO2
 * TotN
 * TotP
 * TOC
 * DOC
 * SPM
 
**Note:** The difference between DOC and TOC is often negligible in Norwegian rivers. Results from NIVALAB during 2017 for the three stations listed above sometimes show DOC > TOC, which indicates that the true difference between TOC and DOC is smaller than the analytical measurement error.
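
As a rough illustration of the note above, the sketch below counts the samples where DOC exceeds TOC. The values are the 2017 ØSTEGLO results from January to June that report both parameters (mg C/l). The variable names are illustrative and not part of the original analysis.

```python
# (date, TOC, DOC) in mg C/l for ØSTEGLO samples with both values reported
samples = [
    ('2017-01-22', 3.4, 3.4),
    ('2017-04-05', 3.9, 3.7),
    ('2017-05-04', 4.7, 4.8),
    ('2017-05-10', 4.4, 4.3),
    ('2017-05-21', 5.3, 4.6),
    ('2017-06-06', 3.3, 3.4),
    ('2017-06-12', 3.7, 3.5),
    ('2017-06-21', 3.5, 3.4),
]

# DOC is a fraction of TOC, so DOC > TOC points to measurement error
doc_gt_toc = [date for date, toc, doc in samples if doc > toc]

print('%d of %d samples have DOC > TOC: %s'
      % (len(doc_gt_toc), len(samples), doc_gt_toc))
```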

## 1. Water chemistry

The basic datasets required for this analysis have already been compiled for Elveovervåkingsprogrammet, but that work does not distinguish between DOC and TOC, and it focuses on annual rather than monthly fluxes. Liv Bente has therefore extracted a more complete dataset from RESA for this analysis and placed it on the network here:

    K:\Prosjekter\Ferskvann\16384 Elveovervåkingsprogrammet\2019\6. Annen rapportering\Data til Helene jan2019\RESA-data_Glo Nau Mål Mål2_21jan19.xlsx
    
I have created a simplified/tidied version of this file containing just the 2017 data, which is named `'resa2_glo_nau_mal_2017_jes.xlsx'`.

**Note:** The code below sets any value reported as below the limit of detection (LOD) equal to the LOD itself. This is very rough, and will lead to overestimates of fluxes for parameters such as NH4, where many of the measurements are at the LOD.

In [3]:
# Read chem data
xl_path = r'../resa2_glo_nau_mal_2017_jes.xlsx'
chem_df = pd.read_excel(xl_path, sheet_name='Data')

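# Values reported as below the limit of detection (e.g. '<2') are set equal
# to the LOD by stripping the '<' prefix before converting to float
id_cols = ['RESA_ID', 'Code', 'Name', 'Date']
cols = [col for col in chem_df.columns if col not in id_cols]
for col in cols:
    chem_df[col] = chem_df[col].astype(str).str.strip('<').astype(float)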
    
chem_df

Unnamed: 0,RESA_ID,Code,Name,Date,SPM_mg/l,TOC_mgC/l,DOC_mgC/l,TOTN_ugN/l,NH4_ugN/l,NO3_ugN/l,TOTP_ugP/l,PO4P_ugP/l,SiO2_mgSiO2/l,CDOM_Abs_250-400
0,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-01-22 13:45:00,7.18,3.4,3.4,580.0,29.0,410.0,13.0,8.0,4.436,2218.941
1,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-02-14 15:00:00,1.44,3.0,,520.0,28.0,350.0,5.0,2.0,3.514,1858.315
2,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-03-06 14:15:00,9.05,3.7,,715.0,6.0,520.0,16.0,7.0,4.307,2446.328
3,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-04-05 11:05:00,12.0,3.9,3.7,750.0,13.0,510.0,20.0,12.0,4.393,2554.683
4,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-05-04 10:40:00,4.09,4.7,4.8,590.0,13.0,450.0,11.0,5.0,3.9,3098.491
5,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-05-10 15:00:00,9.2,4.4,4.3,575.0,10.0,360.0,16.0,9.0,3.75,2865.786
6,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-05-21 17:00:00,5.74,5.3,4.6,610.0,2.0,350.0,14.0,9.0,3.793,3780.659
7,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-06-06 20:50:00,5.02,3.3,3.4,400.0,7.0,220.0,12.0,2.0,3.043,2369.208
8,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-06-12 12:20:00,4.61,3.7,3.5,490.0,3.0,300.0,10.0,13.0,3.043,2525.245
9,29617,ØSTEGLO,Glomma ved Sarpsfoss,2017-06-21 12:20:00,5.99,3.5,3.4,480.0,16.0,270.0,6.0,8.0,3.214,2440.111
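
To give a feel for how much the LOD substitution can matter, here is a minimal sketch that compares the mean concentration when below-LOD values are replaced by the LOD, by half the LOD, and by zero. The helper `substitute_lod` and the sample values are hypothetical and only illustrate the idea. The analysis in this notebook uses substitution at the LOD.

```python
def substitute_lod(raw_values, lod_factor=1.0):
    """Convert raw lab strings to floats, replacing '<LOD' entries
    with lod_factor * LOD."""
    values = []
    for raw in raw_values:
        raw = str(raw).strip()
        if raw.startswith('<'):
            values.append(lod_factor * float(raw[1:]))
        else:
            values.append(float(raw))
    return values

# Hypothetical NH4 results (ug N/l), three of five below an assumed LOD of 2
nh4_raw = ['<2', '<2', '6', '<2', '10']

for fac in [1.0, 0.5, 0.0]:
    vals = substitute_lod(nh4_raw, lod_factor=fac)
    print('LOD factor %.1f: mean = %.2f ug N/l' % (fac, sum(vals) / len(vals)))
```

A factor of 1.0 reproduces the approach used here and gives an upper bound on the mean, while 0.0 gives a lower bound. The spread between the two indicates how sensitive a parameter is to the choice.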


## 2. Discharge

The code below extracts daily discharge for each station during 2017.

In [4]:
# Period of interest
st_dt = '2017-01-01'
end_dt = '2017-12-31'

# Dict to store results
q_dict = {}

# Loop over stations
for stn_id in chem_df['RESA_ID'].unique():
    stn_id = int(stn_id) # Convert from np 64-bit int
    q_df = nivapy.da.extract_resa_discharge(stn_id,
                                            st_dt,
                                            end_dt,
                                            eng)
    q_dict[stn_id] = q_df

# Print last df as example
q_df.head()

date,flow_m3/s
2017-01-01,71.077032
2017-01-02,64.928201
2017-01-03,60.723947
2017-01-04,57.141451
2017-01-05,54.039096


## 3. Estimating loads

The simplest load estimators are "averaging estimators". One of the simplest of these calculates (i) the average concentration in each month and (ii) the average discharge in each month, then multiplies the two together and scales by the length of the month. Such approaches have been shown to be **precise, but inaccurate**, with a **tendency to underestimate the true load**. These properties make them most suitable for e.g. trend tests, where the direction of change is more important than the absolute estimate. Nevertheless, because of its simplicity, this is a common starting point.
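
The arithmetic behind this estimator can be sketched for a single month as follows. The function name `monthly_load_kg` and the input values are hypothetical. The unit handling (mg or ug per litre, converted to kg) follows the convention used for the column names in this notebook.

```python
SECONDS_PER_DAY = 60 * 60 * 24

def monthly_load_kg(mean_conc, unit_prefix, mean_flow_m3s, days_in_month):
    """Averaging estimator: mean concentration times monthly flow volume.

    mean_conc is per litre, in mg ('m') or ug ('u'), matching unit_prefix.
    """
    # Number of concentration mass units per kg
    per_kg = {'m': 1.0e6, 'u': 1.0e9}[unit_prefix]

    # Total volume discharged during the month (m3)
    volume_m3 = mean_flow_m3s * SECONDS_PER_DAY * days_in_month

    # 1000 l per m3 converts the volume to litres
    return mean_conc * volume_m3 * 1000 / per_kg

# Hypothetical month: 580 ug N/l, mean flow of 500 m3/s, 31 days
load = monthly_load_kg(580.0, 'u', 500.0, 31)
print('%.0f kg/month' % load)
```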

In [5]:
# Dict linking ID to site names
names_dict = {29617:'Glomma',
              29848:'Målselv',
              29842:'Nausta'}

# Container for output
df_list = []

# Loop over stations
for stn_id in chem_df['RESA_ID'].unique():
    # Get chem and q
    df = chem_df.query("RESA_ID == @stn_id")
    df.index = df['Date']
    q_df = q_dict[stn_id]
    
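    # Monthly averages (numeric columns only; text columns such as 'Code'
    # and 'Name' cannot be averaged and are dropped)
    df = df.resample('M').mean(numeric_only=True)
    q_df = q_df.resample('M').mean()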
    
    # Join
    df = df.join(q_df)
    
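    # Monthly flow volume. After resampling, the index holds month-end
    # dates, so days_in_month gives the number of days in each month
    df['flow_m3/month'] = df['flow_m3/s']*60*60*24*df.index.days_in_month
    
    # Calc loads
    cols = list(set(df.columns) - set(['RESA_ID', 'flow_m3/s', 
                                       'flow_m3/month', 'CDOM_Abs_250-400']))
    for col in cols:
        # Get par and unit (e.g. 'TOTN_ugN/l' => 'TOTN', 'ugN/l')
        par, unit = col.split('_')
        
        # Get conversion factor from mg or ug to kg
        if unit[0] == 'm':
            fac = 1.E6
        elif unit[0] == 'u':
            fac = 1.E9
        else:
            raise ValueError('Unit prefix cannot be identified.')
        
        # Load: conc per litre * 1000 l/m3 * volume in m3, converted to kg
        df['%s_kg/month' % par] = df[col]*df['flow_m3/month']*1000 / fac 
        
        # Tidy
        del df[col]
    del df['flow_m3/s']    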
    df.index = df.index.to_period('M')
    
    # Add to output
    df_list.append(df)
    
    # Plot
    df2 = df.copy()
    del df2['RESA_ID'], df2['CDOM_Abs_250-400']
    df2.plot(layout=(5,2), figsize=(10,10), subplots=True, marker='o') 
    plt.savefig('%s_monthly_loads.png' % names_dict[stn_id], dpi=300)
    plt.close()

# Combine results
df = pd.concat(df_list, axis=0, sort=False)
df.reset_index(inplace=True)

# Save
out_csv = r'glo_nau_mal_2017_monthly_loads.csv'
df.to_csv(out_csv, index=False, encoding='utf-8')

df.head(12)

Unnamed: 0,Date,RESA_ID,CDOM_Abs_250-400,flow_m3/month,TOTN_kg/month,PO4P_kg/month,TOTP_kg/month,SiO2_kg/month,SPM_kg/month,TOC_kg/month,NO3_kg/month,DOC_kg/month,NH4_kg/month
0,2017-01,29617,2218.941,1405013000.0,814907.6,11240.105365,18265.171218,6232638.0,10087990.0,4777045.0,576055.4,4777045.0,40745.381947
1,2017-02,29617,1858.315,1046472000.0,544165.6,2092.944561,5232.361401,3677304.0,1506920.0,3139417.0,366265.3,,29301.223848
2,2017-03,29617,2446.328,1029572000.0,736143.7,7207.001641,16473.146609,4434365.0,9317624.0,3809415.0,535377.3,,6177.429978
3,2017-04,29617,2554.683,1341542000.0,1006156.0,16098.503804,26830.839673,5893394.0,16098500.0,5232014.0,684186.4,4963705.0,17440.045787
4,2017-05,29617,3248.312,2586074000.0,1530094.0,19826.564097,35343.005565,9864147.0,16404330.0,12413150.0,999948.5,11809740.0,21550.613149
5,2017-06,29617,2444.854667,3702071000.0,1690612.0,28382.542769,34552.660763,11476420.0,19275450.0,12957250.0,974878.6,12710440.0,32084.613565
6,2017-07,29617,1775.918,1976157000.0,770701.1,7904.626416,13833.096229,5590547.0,6995594.0,5533238.0,434754.5,5335623.0,5928.469812
7,2017-08,29617,1623.922,2752791000.0,1252520.0,19269.535577,38539.071155,7374727.0,8726347.0,7157256.0,660669.8,6881977.0,44044.652748
8,2017-09,29617,3608.692,2519175000.0,1007670.0,5038.350528,22672.577375,7882499.0,6726198.0,13099710.0,503835.1,12847790.0,27710.927902
9,2017-10,29617,3567.611,2508828000.0,1480209.0,17561.797045,35123.59409,8923902.0,21726450.0,12293260.0,1154061.0,11791490.0,5017.656299
