In [1]:
import pandas as pd

# Langtjern flux calculations

Espen needs some simple fluxes calculating for Langtjern. I've copied his Excel files for chemistry and discharge into separate sheets in a single file called `'langtjern_tidied.xlsx'`.

**Note:** I'm assuming all mercury values are in ng/l.

In [2]:
# Read data
in_xlsx = 'langtjern_tidied.xlsx'
chem_df = pd.read_excel(in_xlsx, sheet_name='chem')
flow_df = pd.read_excel(in_xlsx, sheet_name='flow')

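# Pivot flows from long to wide format (one column per station)
flow_df.set_index(['stn_code', 'date'], inplace=True)
flow_df = flow_df.unstack(level='stn_code')
flow_df.reset_index(inplace=True)

# Flatten the column names. This assumes the unstacked station
# columns are in the order LAE01, LAE02, LAE03
flow_df.columns = ['date', 'Q_LAE01_m3/s', 'Q_LAE02_m3/s', 'Q_LAE03_m3/s']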

# Check for nulls
print(pd.isnull(flow_df).sum())

flow_df.head()

date              0
Q_LAE01_m3/s      0
Q_LAE02_m3/s    366
Q_LAE03_m3/s    366
dtype: int64


,date,Q_LAE01_m3/s,Q_LAE02_m3/s,Q_LAE03_m3/s
0,2008-01-01,0.022432,0.010192,0.004525
1,2008-01-02,0.023341,0.008469,0.00376
2,2008-01-03,0.024266,0.01044,0.004635
3,2008-01-04,0.025758,0.012918,0.005735
4,2008-01-05,0.02899,0.016682,0.007406


In [3]:
chem_df.head()

,stn_code,date,toc_mg/l,hg_ng/l,mehg_ng/l
0,LAE01,2008-05-19,6.5,2.7435,0.09198
1,LAE01,2008-06-22,6.2,2.78775,0.09603
2,LAE01,2008-07-21,8.9,3.894,0.09603
3,LAE01,2008-08-25,12.7,3.894,0.08536
4,LAE01,2008-09-15,12.8,3.0975,0.11737


Flow data are available for three stations, all starting 01/01/2008. Sites LAE02 and LAE03 both have 366 missing values in the table above, which suggests these data series are exactly one year shorter than the series for LAE01 (2016 was a leap year, hence 366 days rather than 365). A quick check in the Excel file confirms this: the series for LAE01 ends on 31/12/2016, whereas the other two series stop at the end of 2015.
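
The one-year difference can also be sanity-checked without opening Excel. The sketch below rebuilds the two date ranges described above (it does not read the actual workbook) and counts the days present only in the longer series. The variable names are illustrative.

```python
import pandas as pd

# Daily date ranges matching the series lengths described above
# (reconstructed for illustration, not read from the workbook)
lae01_dates = pd.date_range(start='2008-01-01', end='2016-12-31', freq='D')
short_dates = pd.date_range(start='2008-01-01', end='2015-12-31', freq='D')

# Days covered by the LAE01 range but not by the shorter range
missing_dates = lae01_dates.difference(short_dates)

# The count is 366 because 2016 is a leap year
print(len(missing_dates))
print(missing_dates.min().date(), missing_dates.max().date())
```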

In [4]:
# Unique sites in chem data
all_stns = chem_df['stn_code'].unique()

all_stns

array(['LAE01', 'LAE02', 'LAE03', 'LAE11'], dtype=object)

As shown above, the chemistry file has data for four sites, whereas flow data are only available for LAE01, LAE02 and LAE03. The workflow is as follows:

 1. For all sites:
 
   * Create a time series with daily resolution from 2008 to 2017 inclusive
   * Match water chemistry sample dates to this series and linearly interpolate over data gaps (days before the first sample and after the last sample are filled with the nearest sample value)
   
 2. For LAE01, LAE02 and LAE03:
 
   * Match the daily concentrations to daily discharges
   * Calculate daily fluxes as $(flow \times concentration)$, multiplying by 1000 to convert flow from $m^3/s$ to $l/s$
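
Step 1 can be sketched on a toy series. The dates and TOC values below are invented for illustration; only the merge-then-interpolate pattern mirrors the workflow. Note that `limit_direction='both'` also fills the days before the first sample and after the last one with the nearest sample value, rather than leaving them empty.

```python
import pandas as pd

# One week of daily dates (illustrative, not the real 2008-2017 range)
dates = pd.DataFrame({'date': pd.date_range(start='2008-01-01',
                                            end='2008-01-07',
                                            freq='D')})

# Two invented water chemistry samples
samples = pd.DataFrame({'date': pd.to_datetime(['2008-01-02', '2008-01-06']),
                        'toc_mg/l': [6.0, 8.0]})

# Left join so that every day is kept; days without a sample become NaN
daily = pd.merge(dates, samples, how='left', on='date')

# Linear interpolation between samples; edges take the nearest sample value
daily['toc_mg/l'] = daily['toc_mg/l'].interpolate(method='linear',
                                                  limit_direction='both')

print(daily)
```

In this toy case the values rise in steps of 0.5 mg/l between the two samples, and the first and last days repeat the nearest sample value.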
   
## 1. Daily concentrations

In [5]:
# Prepare to write output file
writer = pd.ExcelWriter('langtjern_fluxes.xlsx')

# Dataframe of dates from 2008 to 2017
date_df = pd.DataFrame({'date':pd.date_range(start='2008-01-01', 
                                             end='2017-12-31', 
                                             freq='D')})

# Loop over stns
for stn in all_stns:
    # Get chem data for this stn
    stn_chem_df = chem_df.query("stn_code == @stn").drop(columns='stn_code')

    # Join to dates
    stn_chem_df = pd.merge(date_df, stn_chem_df, 
                           how='left', on='date')
    
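    # Interpolate linearly between samples. limit_direction='both' also fills
    # days before the first and after the last sample with the nearest value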
    stn_chem_df.interpolate(method='linear', inplace=True, limit_direction='both')
    
    # Add stn_code
    stn_chem_df['stn_code'] = stn
    
    # Reorder cols
    stn_chem_df = stn_chem_df[['stn_code', 'date', 'toc_mg/l', 
                               'hg_ng/l', 'mehg_ng/l']]
    
    stn_chem_df.to_excel(writer, 
                         sheet_name='%s_intep_chem' % stn, 
                         index=False)
    
# Write the workbook to disk once all sheets have been added
writer.save()

## 2. Fluxes

In [6]:
# Loop over stations with Q data
for stn in ['LAE01', 'LAE02', 'LAE03']:
    # Get flow data
    stn_q_df = flow_df[['date', 'Q_%s_m3/s' % stn]]
    
    # Get chem df
    stn_chem_df = pd.read_excel('langtjern_fluxes.xlsx',
                                sheet_name='%s_intep_chem' % stn)
    
    # Join
    df = pd.merge(stn_q_df, stn_chem_df, 
                  how='left', on='date')
    
    # Rename flow column
    df.rename({'Q_%s_m3/s' % stn:'Q_m3/s'},
              axis='columns',
              inplace=True)
    
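    # Calculate fluxes. The factor of 1000 converts m3/s to l/s,
    # so (l/s) * (mg/l) gives mg/s and (l/s) * (ng/l) gives ng/s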
    df['F_toc_mg/s'] = 1000*df['Q_m3/s']*df['toc_mg/l']
    df['F_hg_ng/s'] = 1000*df['Q_m3/s']*df['hg_ng/l']
    df['F_mehg_ng/s'] = 1000*df['Q_m3/s']*df['mehg_ng/l']
    
    df.to_excel(writer, 
                sheet_name='%s_fluxes' % stn, 
                index=False)
    
# Write the updated workbook to disk once all flux sheets have been added
writer.save()
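
The factor of 1000 in the flux calculations converts discharge from m3/s to l/s, so that multiplying by a concentration per litre gives a mass per second. A minimal worked example follows, using made-up values and a hypothetical helper `flux_per_second` that is not part of the notebook above.

```python
def flux_per_second(q_m3_s, conc_per_l):
    """Flux in (mass unit)/s from flow in m3/s and concentration in (mass unit)/l."""
    litres_per_m3 = 1000
    return litres_per_m3 * q_m3_s * conc_per_l


# Illustrative values only (not taken from a matched sample date)
q = 0.02     # discharge, m3/s
toc = 6.5    # TOC concentration, mg/l
hg = 2.7     # mercury concentration, ng/l

toc_flux = flux_per_second(q, toc)   # mg/s
hg_flux = flux_per_second(q, hg)     # ng/s

print(round(toc_flux, 3), 'mg/s')
print(round(hg_flux, 3), 'ng/s')
```

Here 0.02 m3/s is 20 l/s, so the TOC flux is about 130 mg/s and the mercury flux about 54 ng/s.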