In [1]:
%matplotlib inline
import nivapy3 as nivapy
import pandas as pd
import matplotlib.pyplot as plt
import glob
import os

plt.style.use('ggplot')

# Long-term trends in annual temperature

As part of reporting to Miljødirektoratet, Øyvind would like to estimate long-term trends in annual water temperature for the main rivers in Elveovervåkingsprogrammet. Looking in NVE's Hydra-II database, the following temperature records are available:

| St.ID | Station Code | Station name         | VANNTEMPERATURDATA                                 | Start     | End  |
|-------|--------------|----------------------|----------------------------------------------------|-----------|------|
| 29617 | ØSTEGLO      | Glomma ved Sarpsfoss | 2.1087.0.1003.1 Glomma ovf. Sarpsfossen            | Sept-2007 | 2017 |
| 36225 | OSLEALN      | Alna                 |                                                    |           |      |
| 29612 | BUSEDRA      | Drammenselva         | 12.298.0.1003.4 Drammenselva v/Døvikfoss           | Dec-1986  | 2017 |
| 29615 | VESENUM      | Numedalslågen        | 15.115.0.1003.1 Numedalslågen v/Brufoss            | Nov-1984  | 2017 |
| 29613 | TELESKI      | Skienselva           | 16.207.0.1003.2 Skienselva ndf. Norsjø             | Nov-1989  | 2017 |
| 30019 | AAGEVEG      | Vegårdselva          |                                                    |           |      |
| 29614 | VAGEOTR      | Otra                 | 21.79.0.1003.1 Otra v/Mosby                        | Jan-1986  | 2017 |
| 29832 | ROGEBJE      | Bjerkreimselva       | 27.29.0.1003.1 Bjerkreimselvi v/Bjerkreim          | Apr-1986  | 2017 |
| 29783 | ROGEORR      | Orreelva             |                                                    |           |      |
| 29837 | ROGEVIK      | Vikedalselva         | 38.2.0.1003.1 Vikedalselva utløp                   | Oct-1985  | 2017 |
| 29821 | HOREVOS      | Vosso(Bolstadelvi)   | 62.30.0.1003.3 Vosso ovf. Evangervatnet            | Jun-1987  | 2017 |
| 29842 | SFJENAU      | Nausta               | 84.23.0.1003.3 Nausta v/Hovefossen                 | Dec-1989  | 2017 |
| 29822 | MROEDRI      | Driva                | 109.44.0.1003.2 Driva ndf. Grøa                    | Jul-2000  | 2015 |
| 29778 | STREORK      | Orkla                | 121.62.0 Orkla v/Merk Bru                          | Mar-1989  | 2017 |
| 29844 | STRENID      | Nidelva(Tr.heim)     |                                                    |           |      |
| 29782 | NOREVEF      | Vefsna               | 151.32.0.1003.3 Vefsna v/Laksfors                  | Sept-1993 | 2017 |
| 29848 | TROEMÅL      | Målselv              | 196.35.0.1003.1 Malangsfoss                        | May-1997  | 1997 |
| 29779 | FINEALT      | Altaelva             | 212.68.0.1003.1 Alta v/Gargia                      | Sept-1980 | 2016 |
| 29820 | FINETAN      | Tanaelva             | 234.19.0.1003.1 Tana ovf. Polmakelva               | Jul-1990  | 2016 |
| 29819 | FINEPAS      | Pasvikelva           | 246.11.0.1003.1 Pasvikelva v/Skogfoss kraftstasjon | Mar-1991  | 2017 |

Øyvind has suggested we consider those stations with records beginning before 1995 (see e-mail received 29.10.2018 at 08.41 for details), so I have downloaded daily temperature data for these 13 stations from Hydra.

**Note:** It is possible to aggregate values to annual resolution *before* exporting from Hydra, but in that case any missing value within a year causes the whole year to be assigned "no data" (e.g. a temperature record with 364 data values and only 1 day missing will become `'NaN'` in the output). This seems excessive, but we do need to take care when calculating temperature averages from years with partial data, because the seasonal variation in temperature is very strong. In the code below, I have created a user-defined parameter called `'prop'`, which represents the proportion of the year that must have data for that year to be included in the analysis. I've set this to 0.75 as a starting point, i.e. there must be at least 274 non-null temperature measurements in a year for it to be included (since $0.75 \times 365 = 273.75$). **Check that Øyvind is happy with this**.
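To make the threshold concrete, here is a minimal sketch of the filtering idea on synthetic data. The series, the constant temperature value and the 120-day gap are invented for illustration. It groups by calendar year with `groupby` rather than the `resample` call used in the main loop, but the intent is the same.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only): two full calendar years
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"temp_C": 5.0}, index=idx)

# Blank out 120 days in 2001 so that year falls below the threshold
df.loc["2001-01-01":"2001-04-30", "temp_C"] = np.nan

prop = 0.75
min_days = prop * 365  # 273.75, so at least 274 daily values are needed

# Annual mean and number of non-null values ('count' ignores NaN)
ann_df = df.groupby(df.index.year)["temp_C"].agg(["mean", "count"])

# Keep only years with enough data
ann_df = ann_df[ann_df["count"] > min_days]
print(ann_df)
```

In this sketch, 2001 retains only 245 daily values and is dropped, while 2000 is kept.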

I have also recently implemented some basic non-parametric statistical tests in NivaPy (`'nivapy.stats'`), so this seems like a good opportunity to test my new code.

## 1. User input

In [2]:
# Minimum proportion of the year that must have daily
# values for the year to be used in the analysis
prop = 0.75

# Data
data_fold = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
             r'\Data\temperature_data\long_term_trends\hydra-ii_daily')

# Output
png_fold = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Data\temperature_data\long_term_trends\png')

# Output Excel file
out_xlsx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Data\temperature_data\long_term_trends\long_term_temp_trends.xlsx')

## 2. Loop over data

The code below loops over each daily resolution temperature file from Hydra-II and performs the following calculations:

 1. Resample to annual resolution by taking the mean, and count the number of non-null measurements in each year.
 
 2. Filter out years with fewer than $(prop \times 365)$ data points
 
 3. Write the annual results to a new worksheet in an Excel file
 
 4. Perform the Mann-Kendall and Sen's Slope tests on the annual data and print summary results
 
 5. Plot the fitted Sen's slope against the raw data values and save the plot as a PNG.
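For readers unfamiliar with the statistics in step 4, the following is a rough sketch of the textbook Mann-Kendall test and Sen's slope estimator. It is not the NivaPy implementation: the function names `mk_z`, `mk_stats_sketch` and `sens_slope_sketch` are hypothetical, the example series is synthetic, and the sketch assumes no tied values and does not compute confidence bounds on the slope. For a series of 21 values with no ties, the variance formula gives $21 \times 20 \times 47 / 18 \approx 1096.67$.

```python
import itertools
import math

import numpy as np


def mk_z(s, var_s):
    """Normalised M-K statistic with the usual continuity correction."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def mk_stats_sketch(values):
    """Return (s, var_s, z, p) for a series, assuming no tied values."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # S is the sum of signs of all pairwise differences (later - earlier)
    pairs = itertools.combinations(range(n), 2)
    s = int(sum(np.sign(x[j] - x[i]) for i, j in pairs))
    # Variance of S when there are no ties
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = mk_z(s, var_s)
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, p


def sens_slope_sketch(years, values):
    """Median of all pairwise slopes (Sen's slope estimate)."""
    t = np.asarray(years, dtype=float)
    x = np.asarray(values, dtype=float)
    pairs = itertools.combinations(range(len(x)), 2)
    slopes = [(x[j] - x[i]) / (t[j] - t[i]) for i, j in pairs]
    return float(np.median(slopes))


# Synthetic example: a steady warming of 0.02 degrees per year
years = np.arange(1990, 2000)
temps = 0.02 * (years - 1990) + 6.0
print(mk_stats_sketch(temps))
print(sens_slope_sketch(years, temps))
```

A tie-corrected variance and bounds on the slope would be needed for a full analysis; this sketch only shows where the main quantities come from.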

In [3]:
# Prepare to write Excel workbook
writer = pd.ExcelWriter(out_xlsx)

# List of files to process
search_path = os.path.join(data_fold, '*.csv')
file_list = glob.glob(search_path)

# Min count based on prop
min_days = prop*365

# Loop over files
for fpath in file_list:
    # Get site code
    fname = os.path.split(fpath)[1]
    code = fname.split('_')[1][:-4]
    # Read daily data
    df = pd.read_csv(fpath, 
                     skiprows=2,
                     na_values=['-9999'],
                     names=['date', 'temp_C'])
    
    # Parse dates to index
    df['date'] = pd.to_datetime(df['date'], format='%Y.%m.%d %H:%M')
    df.set_index('date', inplace=True)
    
    # Resample (counts and avgs)
    cnt_df = df.resample('A').count()
    avg_df = df.resample('A').mean()
    
    # Join
    df = avg_df.join(cnt_df, lsuffix='', rsuffix='_count')
    
    # Filter years with insufficient data
    df = df.query('temp_C_count > @min_days')
    
    # Index to years
    df.index = df.index.year
    
    # Save to Excel
    df.to_excel(writer, sheet_name=code)

    # Run stats
    print('###################################################################')
    print('Station:', code)
    print('Results based on %s years with data' % len(df))
    print('###################################################################')
    
    # Mann-Kendall
    print('M-K test:')
    mk_df = nivapy.stats.mk_test(df, 'temp_C')
    print(mk_df)
    print('')

    # Sen's slope
    print("Sen's slope:")
    res_df, sen_df = nivapy.stats.sens_slope(df, 
                                             value_col='temp_C',
                                             index_col=df.index)
    print(res_df)
    print('###################################################################')
    print('')
    
    # Plot
    nivapy.plotting.plot_sens_slope(res_df, sen_df,
                                    ylabel='Avg. temp. (C)',
                                    title='Station %s' % code)
    out_png = os.path.join(png_fold, 'sens_slp_%s.png' % code)
    plt.savefig(out_png, dpi=300)
    plt.close()
writer.save()

###################################################################
Station: 12-298
Results based on 21 years with data
###################################################################
M-K test:
                            description      value
var_s        Variance of test statistic    1096.67
s                    M-K test statistic         60
z             Normalised test statistic    1.78162
p      p-value of the significance test  0.0748115
trend        Type of trend (if present)   no trend

Sen's slope:
                                            description       value
sslp                              Median slope estimate   0.0201424
icpt                                Estimated intercept    -32.7721
lb     Lower bound on slope estimate at specified alpha -0.00256373
ub     Upper bound on slope estimate at specified alpha   0.0507106
trend                        Type of trend (if present)    no trend
###################################################################

#####