<a href="https://colab.research.google.com/github/atedstone/unil_envi_ggl_hydrology_practicals/blob/main/supraglacial_q.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Practical: Interpreting supraglacial melt and discharge

In [None]:
# Install ipympl so that we can use interactive plots
!pip install ipympl

**Important!** If this is the first time you are running the Notebook, restart your session **now** using the menu 'Runtime... Restart session'. You don't need to run the above cell again afterwards; just continue with the rest of the Notebook.

In [None]:
# Activate ipympl in Colab
from google.colab import output
output.enable_custom_widget_manager()

In [None]:
# Packages for data download
import requests
import os

In [None]:
# Packages for data analysis
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib widget

## Download the data

In [None]:
files = {'Moulin_L41A_ice_ablation.tab': 'https://doi.pangaea.de/10.1594/PANGAEA.926842?format=textfile',
         'Moulin_L41A_Q.tab': 'https://doi.pangaea.de/10.1594/PANGAEA.926844?format=textfile',
         'KAN_M_hour_seb_2012.csv': 'https://unils-my.sharepoint.com/:x:/g/personal/andrew_tedstone_unil_ch/EX_-Q31D461Fq9iKzeswDGwBKm8wQMshDDSODn2pAAviYQ?download=1'
}
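for f in files:
    if os.path.exists(f):
        print(f'{f}: already downloaded.')
    else:
        print(f'Downloading {f}...')
        response = requests.get(files[f])
        # Stop here if the server returned an error, rather than saving an error page
        response.raise_for_status()
        with open(f, 'wb') as fh:
            fh.write(response.content)
print('Done.')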

## Open the datasets using Pandas

In [None]:
ablation = pd.read_csv('Moulin_L41A_ice_ablation.tab', sep=r'\t', skiprows=19, parse_dates=True, index_col=0, engine='python')

In [None]:
discharge = pd.read_csv('Moulin_L41A_Q.tab', sep=r'\t', skiprows=20, parse_dates=True, index_col=0, engine='python')

In [None]:
seb = pd.read_csv('KAN_M_hour_seb_2012.csv', parse_dates=True, index_col='time')

## Initial look at dataset structure

In [None]:
# In a Jupyter Notebook, entering the name of a Pandas DataFrame displays its first and last rows
ablation

In [None]:
# We can also get some basic statistics calculated over the whole DataFrame
ablation.describe()

In [None]:
discharge

Notice that there are three columns of data in the discharge dataset. The columns with 'min' and 'max' in the titles are actually the - and + error bounds corresponding to the 95% confidence intervals. They are *not* the absolute min and max values. See the plotting code below for an example of how to use these columns.

In [None]:
seb

Here we have measurements acquired by a nearby automatic weather station, the PROMICE KAN_M site. The columns used in this practical are as follows:

- `swnet` : Net shortwave radiation (i.e., $SW_{down} - SW_{up}$)
- `dlhf_u` : Latent heat flux (+ve = energy supply to surface)
- `dshf_u` : Sensible heat flux (+ve = energy supply to surface)
- `alb` : Albedo (dimensionless, 0-1)

All energy flux units are $W m^{-2}$.

## Plot the data

In [None]:
# Set up the figure and subplots
plt.figure(figsize=(7,7))
ax_ablat = plt.subplot(411)
ax_seb = plt.subplot(412, sharex=ax_ablat)
ax_alb = plt.subplot(413, sharex=ax_ablat)
ax_disch = plt.subplot(414, sharex=ax_ablat)

# Ablation data
ax_ablat.errorbar(ablation.index+pd.Timedelta(hours=12), ablation['Ablation [mm]'], yerr=ablation['Ablation [±]'], 
                  drawstyle='steps-mid', elinewidth=0.7, ecolor='gray')
ax_ablat.grid()
ax_ablat.set_ylabel('Ablation (mm)')

# Surface energy balance data
ax_seb.plot(seb.index, seb.dlhf_u, label='Latent')
ax_seb.plot(seb.index, seb.dshf_u, label='Sensible')
ax_seb.plot(seb.index, seb.swnet, label='SWnet')
# Show the legend so that the three fluxes can be told apart
ax_seb.legend()
ax_seb.grid()
ax_seb.set_ylabel(r'W m$^{-2}$')

# Albedo
# First mask invalid values (outside the range 0-1), which occur at night
seb['alb'] = seb['alb'].where((seb['alb'] > 0) & (seb['alb'] < 1))
ax_alb.plot(seb.index, seb.alb, label='Albedo', color='lightgrey')
ax_alb.plot(seb.index, seb.alb.rolling('24h').mean(), label='Albedo smoothed 24h', linewidth=2)
ax_alb.set_ylim(0, 1)
ax_alb.grid()
ax_alb.set_ylabel('Albedo (0-1)')

# Discharge at the confidence bounds
# The 'min' and 'max' columns are the - and + error bounds, so subtract 'min' and add 'max'
lower_q = discharge['Q hour mean [m**3/s]'] - discharge['Q hour min [m**3/s]']
upper_q = discharge['Q hour mean [m**3/s]'] + discharge['Q hour max [m**3/s]']

# Plot as a filled area
ax_disch.fill_between(lower_q.index, lower_q, upper_q, alpha=0.5)
discharge['Q hour mean [m**3/s]'].plot(ax=ax_disch)
ax_disch.grid()
ax_disch.set_ylabel(r'Q ($m^3s^{-1}$)')

# Plot vertical lines
#ax_disch.axvline('2012-07-08', color='red')

# Or plot shaded areas
#ax_disch.axvspan('2012-07-09', '2012-07-11', alpha=0.3, color='red')

## Questions / Exercises

### Based only on what you see in these data, do you think that you are looking at snow melt or ice melt?

### How much water flows into the moulin per day, expressed as Olympic swimming pools?

In [None]:
# Resample the hourly discharge data to daily totals, matching the sampling of the ablation data.
# min_count is set to the number of hours in a day, so that we only
# keep days with a complete record (incomplete days become NaN).
# Each hourly mean (m3 s-1) multiplied by 60 secs * 60 mins is the volume for that hour (m3),
# so the sum over the day is in m3 day-1.
q_daily = discharge['Q hour mean [m**3/s]'].resample('1D').sum(min_count=24) * 60 * 60
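
The unit conversion and the effect of `min_count` can be checked on a small synthetic series. The sketch below is only an illustration: the constant discharge of 2 m3 s-1 and the name `q_daily_demo` are invented for the example and are not taken from the moulin record.

```python
import numpy as np
import pandas as pd

# Synthetic hourly mean discharge (m3/s): two complete days followed by a
# 12-hour partial day. Values are invented for illustration only.
index = pd.date_range('2012-07-01', periods=60, freq=pd.Timedelta(hours=1))
q_hourly = pd.Series(2.0, index=index)

# Each hourly mean (m3/s) times 3600 s is the volume for that hour (m3).
# Summing the 24 hourly volumes gives the daily volume (m3/day).
# min_count=24 turns any day with fewer than 24 hourly values into NaN.
q_daily_demo = q_hourly.resample('1D').sum(min_count=24) * 60 * 60

print(q_daily_demo)
```

With a constant 2 m3 s-1, each complete day should total 2 × 24 × 3600 m3, and the final partial day should be reported as missing rather than as a misleadingly small total.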

In [None]:
vol_pool_m3 = None  # replace None with the volume of an Olympic swimming pool in m3, go find it.....!
# now calculate the mean daily discharge and divide it by this volume.

### Is there a significant correlation between daily ablation and daily discharge?

*To answer this question you will find the code below helpful.*

In [None]:
import statsmodels.api as sm
# Do an 'Ordinary Least Squares' regression
# See https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html for more information

# Cut ablation data to same time period as discharge (discharge is shorter)
X = ablation.loc[q_daily.index[0]:q_daily.index[-1], 'Ablation [mm]']

# Get the discharge mean series
Y = q_daily

# Some days are missing data; we 'drop' (remove) those days.
model = sm.OLS(Y, sm.add_constant(X), missing='drop')

fit = model.fit()
fit.summary()
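
Before interpreting the regression, it can help to see which days actually enter it once the two records are cut to a common period and days with missing values are dropped. The sketch below shows one way to inspect this with pandas alone. It uses invented values, and the names `ablation_demo`, `discharge_demo` and `paired` are hypothetical; it is not the method used inside `statsmodels`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series, invented for illustration only
days = pd.date_range('2012-07-01', periods=6, freq=pd.Timedelta(days=1))
ablation_demo = pd.Series([30.0, 35.0, np.nan, 40.0, 45.0, 50.0],
                          index=days, name='ablation_mm')

# The discharge record is shorter and has one missing day
discharge_demo = pd.Series([1.0e5, 1.2e5, 1.1e5, np.nan],
                           index=days[1:5], name='q_m3_per_day')

# Keep only dates present in both series, then drop days where either value is missing
paired = pd.concat([ablation_demo, discharge_demo], axis=1, join='inner').dropna()

print(paired)
print(f'{len(paired)} days available for the regression')
```

Counting the rows of such a paired table gives the sample size, which is worth keeping in mind when judging the significance of the fit.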

Put your answer to this question below. Remember to consider the significance of the test result.

### Is this value what you expected, and why?

### At what time of day does discharge peak? Why might this be?

To answer this question and the next one, there are at least a couple of approaches that you could use:

- Manual visual interpretation of diurnal values
- Time series resampling: see the `q_daily` example above. You can replace the `sum()` operation that we used there with other aggregation functions, such as `max()` or `min()`.
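
As a rough illustration of the second approach, the sketch below builds a synthetic hourly series with an artificial diurnal cycle and extracts the timing of its daily peak in two ways. The series `q_demo`, its 17:00 peak and the grouping choices are assumptions made for the example; they say nothing about the real moulin record.

```python
import numpy as np
import pandas as pd

# Synthetic hourly discharge with a diurnal cycle that peaks at 17:00
# (invented values, for illustration only)
index = pd.date_range('2012-07-01', periods=72, freq=pd.Timedelta(hours=1))
hours = index.hour.to_numpy()
q_demo = pd.Series(2.0 + np.cos(2 * np.pi * (hours - 17) / 24), index=index)

# Option 1: the timestamp of the maximum value within each calendar day
peak_times = q_demo.groupby(q_demo.index.date).idxmax()

# Option 2: the mean discharge for each hour of the day, across all days
mean_by_hour = q_demo.groupby(q_demo.index.hour).mean()

print(peak_times)
print('Hour of day with highest mean discharge:', mean_by_hour.idxmax())
```

Swapping `idxmax()` for `idxmin()` gives the timing of the daily minimum in the same way.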

### At what time of day is discharge at its base (minimum)? Why might this be?

### You are following a range of other courses related to data analysis during your studies. What other technique(s) might you use to extend this analysis?