# Examining ice sheet data in python with `Pandas`
## Part 3: Implementing the FöhnDA method on AWS18 data in python with `Pandas`

### Overview

In this notebook, we'll:
1. Read hourly AWS data using `pandas.read_csv` and have pandas index the DataFrame based on a date/time field and interpret the dates.
2. Build on skills developed in parts 1 and 2 of the Pandas introduction.
3. Implement the FöhnDA method.

## Data/methods references

AWS18 data were produced by Jakobs et al. (2020):
> Jakobs, C. L., Reijmer, C. H., Smeets, C. J. P. P., Trusel, L. D., van de Berg, W. J., van den Broeke, M. R., & van Wessem, J. M. (2020). A benchmark dataset of in situ Antarctic surface melt rates and energy balance. Journal of Glaciology, 66(256), 291–302. https://doi.org/10.1017/jog.2020.6

* The local data file is: `data/IMAU_aws18_high-res_meteo_hourly.csv`

* These (and other AWS) data are freely available here: https://doi.pangaea.de/10.1594/PANGAEA.910480
    * Note: The data in this notebook just have an extra decimal date column and are in CSV rather than tabular format. If using the raw data from Pangaea, there shouldn't need to be any modifications to the code other than changing the file name.

The FöhnDA method for detecting fohn-induced melting was introduced by Laffin et al. (2021):
> Laffin, M. K., Zender, C. S., Singh, S., Van Wessem, J. M., Smeets, C. J. P. P., & Reijmer, C. H. (2021). Climatology and Evolution of the Antarctic Peninsula Föhn Wind‐Induced Melt Regime From 1979–2018. Journal of Geophysical Research: Atmospheres, 126(4). https://doi.org/10.1029/2020JD033682

* Here we use a slightly modified version of their method as described later on in this notebook. 

To start, let's import some python packages:

In [None]:
# for data reading
import pandas as pd

# for plotting
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter # we'll use this in the last step for our fancy plot

# seaborn adds some extra visual appeal to our plots
import seaborn as sns

# set some universal plot settings here
plt.rcParams["figure.dpi"] = 200 # default plot dpi
sns.set_style('darkgrid') # see: https://seaborn.pydata.org/tutorial/aesthetics.html
sns.set_context("notebook", font_scale=0.65) 
%config InlineBackend.figure_format = 'retina' # make high res plots for high res displays
pd.set_option('display.max_columns', None) # show all columns
pd.options.mode.chained_assignment = None  # turn off a specific type of warning

## What does the AWS18 data look like?

In [None]:
# First, tell Jupyter where to find the csv data:
aws18_datafile = './data/IMAU_aws18_high-res_meteo_hourly.csv'

We can run command line utilities from within a Jupyter notebook by prefixing them with `!`, like:

In [None]:
# use the command line utility 'head' to look at the csv file:
!head $aws18_datafile

## Let's use pandas to explore AWS18 data and classify fohn vs non-fohn melting

### First, import the data using `pandas.read_csv`, specifying `parse_dates` and an `index_col`

Try to implement the code yourself first, but if you need help, expand the hint below.


<details>
  <summary>Expand here for help and to see the code.</summary>
    We want to tell <code>pandas</code> how to index the data. In this case, we can use the <code>Date/Time</code> column in the data. <br>
    We can also tell it to interpret those dates by specifying <code>parse_dates</code>. <br>
    <code>aws18_df = pd.read_csv(aws18_datafile, parse_dates=['Date/Time'], index_col=['Date/Time'])</code>
</details>
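
To see what `parse_dates` and `index_col` do without touching the real file, here is a minimal sketch on a tiny in-memory CSV. The three rows and the `temp_c` column are made up for illustration and are not part of the AWS18 data.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real AWS18 file
sample_csv = io.StringIO(
    "Date/Time,temp_c\n"
    "2015-01-01 00:00:00,-1.5\n"
    "2015-01-01 01:00:00,-0.8\n"
    "2015-01-01 02:00:00,0.2\n"
)

# parse_dates converts the strings to timestamps;
# index_col makes that column the row index
sample_df = pd.read_csv(sample_csv, parse_dates=['Date/Time'], index_col='Date/Time')

# The index is now a DatetimeIndex, so rows can be selected by date/time
print(type(sample_df.index))
print(sample_df.loc['2015-01-01 01:00:00'])
```

With a date/time index in place, time-based selection and resampling become available later in the notebook.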

In [1]:
# Read the aws18 csv file (enter your code here) 
aws18_df = 

## The Fohn effect

We know from the literature that this region, and this AWS specifically, is subject to warm, dry, and windy conditions when the fohn effect is active.

Laffin and coauthors (https://doi.org/10.1029/2020JD033682), for example, developed a method termed `FohnDA` to classify when a fohn melt event is likely occurring in AWS data:
1. T2m > 0°C
2. Relative humidity < 30th percentile
3. Wind speed > 60th percentile

We could implement this in Excel and assess fohn vs. non-fohn melting. In fact, we *have* done this! 

But wouldn't it be easier in python and pandas given how easy it is to select and summarize the data? Yes!

## FohnDA in `Pandas`:

Fundamentally, we just need to select locations (i.e., rows) where the three detection criteria are met. 

And since this assignment is most interested in melt, let's use `Surf temp [°C] (modelled)` = 0 instead of T2m > 0°C.

First, let's get the threshold values for humidity and wind speed.
* To do this, apply the `quantile` method to each column (a single column is a `Series`, so this is `Series.quantile`).

<details>
  <summary>Expand here for help and to see the code.</summary>
    <code>rh_30th = aws18_df['RH [%] (at 2m height)'].quantile(0.3)</code> <br>
    <code>ws_60th = aws18_df['ff [m/s] (at 10m height)'].quantile(0.6)</code>
</details>
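
As a quick illustration of how `quantile` turns a percentile into a threshold, here is a small sketch on made-up humidity values. The numbers and the names `rh_demo`, `rh_30th_demo`, and `is_dry` are hypothetical and not taken from AWS18.

```python
import pandas as pd

# Hypothetical relative humidity values (%), evenly spaced from 50 to 100
rh_demo = pd.Series([50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100])

# quantile takes a fraction between 0 and 1, so the 30th percentile is 0.3
rh_30th_demo = rh_demo.quantile(0.3)
print(rh_30th_demo)

# Comparing the column to the threshold gives a True/False value for every row
is_dry = rh_demo < rh_30th_demo
print(is_dry.sum(), "of", len(rh_demo), "values fall below the 30th percentile")
```

The True/False result is exactly the kind of mask we will combine across variables in the next step.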


In [None]:
# relative humidity 30th percentile (enter your code here)
rh_30th = 

In [None]:
# wind speed 60th percentile value: (enter your code here)
ws_60th = 

In [None]:
# after this, print out the values for each. Do they match what you found in Excel?
print(rh_30th)
print(ws_60th)

### Now we just need to find where all three criteria are met:

### Create a column for fohn-induced melt

We want to compare and readily query fohn and non-fohn melt conditions, so it'll be helpful to first create a new column in `aws18_df` that flags fohn melt.

In Part 2 of this tutorial, recall that we created new DataFrame fields to track summer vs non-summer melt. We can use that same logic here to track fohn and non-fohn melt conditions.

Here, let's create two new boolean [True/False] columns in our `DataFrame` named fohn_melt and non_fohn_melt. 

<details>
  <summary>Expand here for help and to see the code for classifying fohn melt.</summary>
    <pre><code>aws18_df['fohn_melt'] = (aws18_df['Surf temp [°C] (modelled)']==0) \
                        &amp; (aws18_df['RH [%] (at 2m height)'] &lt; rh_30th) \
                        &amp; (aws18_df['ff [m/s] (at 10m height)'] &gt; ws_60th)
        </code></pre>
</details>

 
                            
                            
<details>
  <summary>Expand here for help and to see the code for classifying non-fohn melt.</summary>
    <pre><code>aws18_df['non_fohn_melt'] = (aws18_df['fohn_melt']==False) \
                            & (aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0)
        </code></pre>
</details>
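
The same pattern can be tried on a toy table first. This is a minimal sketch with made-up values, short hypothetical column names (`surf_temp`, `rh`, `wind`, `melt`), and hypothetical thresholds, meant only to show how `&` and `~` combine conditions row by row.

```python
import pandas as pd

# Hypothetical four-hour record; values are invented for illustration
demo = pd.DataFrame({
    'surf_temp': [0.0, 0.0, -2.0, 0.0],
    'rh':        [40.0, 80.0, 35.0, 90.0],
    'wind':      [12.0, 3.0, 15.0, 2.0],
    'melt':      [0.5, 0.2, 0.0, 0.0],
})
rh_limit, wind_limit = 50.0, 10.0  # hypothetical thresholds

# Each comparison needs its own parentheses; & means "and" row by row
demo['fohn_melt'] = (
    (demo['surf_temp'] == 0)
    & (demo['rh'] < rh_limit)
    & (demo['wind'] > wind_limit)
)

# ~ means "not": melt that was not flagged as fohn melt
demo['non_fohn_melt'] = ~demo['fohn_melt'] & (demo['melt'] > 0)
print(demo)
```

Only the first row meets all three criteria, and only the second row has melt without being flagged as fohn melt.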

In [None]:
# classify fohn melt here (enter your code)


In [None]:
# classify non-fohn melt here (enter your code)


In [None]:
# afterwards, show the data table to see that it has the two new columns added
aws18_df

### Great! We've implemented FohnDA via pandas!
#### Now we can find the locations (rows/hours) of fohn and non-fohn melt conditions and describe them.

Recall that in part 2 of this tutorial, we selected specific data using `DataFrame.loc` and some criteria, like:

> ```aws18_df.loc[aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0].describe()```

Here, let's select conditions where fohn_melt == True, and then describe.

After that do the same for when non_fohn_melt == True.

*How do these values compare to what you found in Excel?*

<details>
  <summary>Expand here for help and to see code.</summary>
    <pre><code>aws18_df.loc[aws18_df['fohn_melt']==True].describe()
        </code></pre>
</details>

In [None]:
# enter code here to create a descriptive table of fohn melt events 


In [None]:
# enter code here to create a descriptive table of non-fohn melt events 


### And we can define and query only the columns we're most interested in rather than looking at all data descriptions:

This creates a string array with the column names that we can then use to select:
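
One way this could look is sketched below on a toy table. The choice of columns in `cols_of_interest` is a hypothetical example (the names are borrowed from columns used elsewhere in this notebook), and the values are invented.

```python
import pandas as pd

# Hypothetical four-hour record using column names from this notebook
toy_df = pd.DataFrame({
    'RH [%] (at 2m height)': [40.0, 80.0, 35.0, 90.0],
    'ff [m/s] (at 10m height)': [12.0, 3.0, 15.0, 2.0],
    'Melt rate [mm w.e.] (surface melt, within dt)': [0.5, 0.2, 0.7, 0.0],
    'fohn_melt': [True, False, True, False],
})

# A list of strings holding the column names we care about (assumed selection)
cols_of_interest = ['RH [%] (at 2m height)', 'ff [m/s] (at 10m height)']

# .loc takes the row selection first and the column selection second
fohn_summary = toy_df.loc[toy_df['fohn_melt'], cols_of_interest].describe()
print(fohn_summary)
```

Passing the list as the second argument to `.loc` keeps the descriptive table limited to those columns.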

## Now let's create variables that track consecutive fohn and non-fohn melt hours

* We need to implement this using some if/then type logic.
* The basics are:
    * Create a new column that will hold a counter value for consecutive hours of melt
    * If a current hour is melting set the counter to 1 + what was in the previous counter value
        * Thus, a single hour of melt = 1
        * If this is followed by another melt hour we get 1 + 1

In [None]:
# Track consecutive fohn hours
# pull the True/False flags out as a plain array so we can look them up by row number
fohn_flags = aws18_df['fohn_melt'].to_numpy()

# first, create a blank counter with one zero per row
fohn_counter = [0] * len(aws18_df)

# for every row in the range of the length of the dataframe
# len gives the length (i.e., total number of data rows)
# range iterates through the rows (i.e., 0, 1, 2, 3 ... n-1)
for i in range(len(aws18_df)):
    # check if this row holds a fohn melt event
    if fohn_flags[i]:
        # if true, set the counter for this row (hour) to 1 plus the value from the previous row (hour)
        # the very first row has no previous row, so it starts from 0
        fohn_counter[i] = 1 + (fohn_counter[i - 1] if i > 0 else 0)
    # otherwise the value in this row stays at zero (i.e., there's no melt)

# store the counter as a new column
aws18_df['fohn_consec_hours'] = fohn_counter

In [None]:
# Track consecutive non-fohn hours
# pull the True/False flags out as a plain array so we can look them up by row number
non_fohn_flags = aws18_df['non_fohn_melt'].to_numpy()

# first, create a blank counter with one zero per row
non_fohn_counter = [0] * len(aws18_df)

for i in range(len(aws18_df)):
    # check if this row holds a non-fohn melt event
    if non_fohn_flags[i]:
        # if true, set the counter for this row (hour) to 1 plus the value from the previous row (hour)
        # the very first row has no previous row, so it starts from 0
        non_fohn_counter[i] = 1 + (non_fohn_counter[i - 1] if i > 0 else 0)
    # otherwise the value in this row stays at zero (i.e., there's no melt)

# store the counter as a new column
aws18_df['non_fohn_consec_hours'] = non_fohn_counter
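
The same counter can also be built without an explicit loop. The sketch below is an alternative, not the approach used in this notebook; it runs on a made-up flag series, and the names `flags`, `group_id`, and `consec` are hypothetical.

```python
import pandas as pd

# Hypothetical True/False melt flags for eight consecutive hours
flags = pd.Series([False, True, True, False, True, True, True, False])

# Every False starts a new group, so a running total of the
# inverted flags gives each stretch of hours its own label
group_id = (~flags).cumsum()

# Within each group, a running sum of the flags (True counts as 1)
# gives the number of consecutive melt hours so far
consec = flags.astype(int).groupby(group_id).cumsum()
print(consec.tolist())
```

For a long hourly record this tends to run much faster than a Python loop, at the cost of being harder to read the first time through.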

## Now we can get info about the maximum duration of consecutive fohn and non-fohn melt events:

Looking at the max values in the descriptive table will work:

In [None]:
aws18_df.describe()

Or by just querying the max value:

In [None]:
max_fohn_melt_hrs = aws18_df['fohn_consec_hours'].max()
max_nonfohn_melt_hrs = aws18_df['non_fohn_consec_hours'].max()

print("Maximum consecutive hours of fohn-induced melt: " + str(max_fohn_melt_hrs))
print("Maximum consecutive hours of non-fohn-induced melt: " + str(max_nonfohn_melt_hrs))

### But when did the longest fohn melt event occur? 

We can use `idxmax` which gives the index value of the maximum value.

* See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.idxmax.html

>Return index of first occurrence of maximum over requested axis.

And we have Date/Time as our index, so it'll give us that:


In [None]:
max_fohn_time = aws18_df['fohn_consec_hours'].idxmax()
max_fohn_time

In [None]:
# Show us everything in that row
aws18_df.loc[max_fohn_time]

### But what about when it started?

This is definitely a bit trickier. It requires thinking programmatically and understanding/Googling some of the finer details of `pandas` indexing. 

In [None]:
# get the numerical index value (i.e., row number) at the time of maximum fohn melt duration
max_index = aws18_df.index.get_loc(max_fohn_time)
max_index

This tells us the row number where this max fohn duration value occurs.

Now if we want to find when the event started, we just need to step back from this row number by the number of fohn melt hours minus 1.

Then we can see what's in this row (especially the index value, which we set to the Date/Time column)

In [None]:
# length of event minus 1 hour will tell us when it started
# recall aws18_df['fohn_consec_hours'].max() tells us the max duration (50 hours, here)
n_hrs_previous = int(aws18_df['fohn_consec_hours'].max() - 1)

# now give the index (date/time) of when this was
aws18_df.index[max_index - n_hrs_previous]
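
To check the row arithmetic by hand, here is a small sketch on an eight-hour made-up counter. The series `consec_demo` and its timestamps are hypothetical; the steps mirror the `idxmax`, `get_loc`, and index lookup used above.

```python
import pandas as pd

# Hypothetical hourly timestamps and a consecutive-hours counter
hours = pd.date_range('2016-01-01 00:00', periods=8, freq=pd.Timedelta(hours=1))
consec_demo = pd.Series([0, 1, 2, 0, 1, 2, 3, 0], index=hours)

# idxmax returns the index label (a timestamp) of the largest value,
# which is the LAST hour of the longest event
peak_time = consec_demo.idxmax()

# get_loc converts that timestamp into a row number
peak_row = consec_demo.index.get_loc(peak_time)

# step back (duration - 1) rows to reach the FIRST hour of the event
start_time = consec_demo.index[peak_row - (int(consec_demo.max()) - 1)]
print(peak_time, start_time)
```

Here the longest event lasts 3 hours and peaks at row 6, so it started 2 rows earlier, at row 4.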

Now that we know when it started, we can also see what else was happening then, and make sure it looks like we've selected the correct start time.

In [None]:
aws18_df.loc['2016-05-25 09:00:00']

## Let's make a plot of all melt, and highlight fohn melt

### First, let's convert hourly data to daily max values using the `pandas.DataFrame.resample` function:
This type of resampling is incredibly easy! In fact, this is what I used when we were exploring daily A2 data -- I just resampled from hourly to daily. This sort of operation would be much, much more difficult and time consuming to implement in Excel, and much more prone to introducing errors.

Documentation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
> Resample time-series data.

> Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

In [None]:
# convert to daily maxes
aws18_daily = aws18_df.resample('D').max()
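
To see what the daily maximum does to each kind of column, here is a minimal sketch on two days of made-up hourly data. The frame `hourly_demo` and its values are hypothetical.

```python
import pandas as pd

# Two hypothetical days of hourly data: a rising melt value and a
# fohn flag that is True for only the very last hour
hours = pd.date_range('2016-01-01', periods=48, freq=pd.Timedelta(hours=1))
hourly_demo = pd.DataFrame(
    {'melt': list(range(48)), 'fohn_melt': [False] * 47 + [True]},
    index=hours,
)

# 'D' groups the rows by calendar day; max keeps the largest value per day
daily_demo = hourly_demo.resample('D').max()
print(daily_demo)
```

Note that the daily maximum of a True/False column is True if any hour in that day was True, so a day is flagged as a fohn melt day even if only one hour met the criteria.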

## Now let's move on to making a nice-looking plot to explore the timing of fohn-induced melt and its relation to melt rates.

Below, I've written some code that plots daily max melt data and shows the fohn-induced melt as vertical blue bars.

I've set the plotting code up in a `for` loop to produce one plot for every year defined in a list of strings.

The plot has two axes: one for the melt data and one for the Boolean True/False `fohn_melt` flag, which is converted to an integer for plotting (1 = True). As such, I limit the fohn melt axis to between 0 and 1.

Both data series are plotted using the [`matplotlib.pyplot.fill_between`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.fill_between.html) method with `step="mid"`, which gives a stepped appearance as opposed to sloping lines. This works especially well here as we want to shade the areas of fohn-induced melt.

In [None]:
# manually define an array of years as string values (i.e., in single quotes)
years = ['2015','2016','2017','2018']

for i in range(len(years)):
    
    # for the current iteration, i, reference the value from the 'years' list we defined above
    year = years[i]
    
    # print out what year we're currently working with
    print(year)
    
    # Create figure and plot space
    fig, ax = plt.subplots(figsize=(15, 7.5))

    # subset in time based on the current year
    aws18_daily_sub = aws18_daily.loc[year+'-01':year+'-12']

    # shade areas where there's fohn-induced melt
    bg_steps = ax.fill_between(aws18_daily_sub.index.values,
                           aws18_daily_sub['fohn_melt'].astype(int),
                           label="FohnDA melt",
                           facecolor='steelblue',
                           step="mid",
                           linewidth=0,
                           alpha=0.6,
                           zorder=-1)

    # create a second y-axis to plot the melt data on
    ax2 = ax.twinx()
    melt_steps = ax2.fill_between(aws18_daily_sub.index.values,
                    aws18_daily_sub['Melt rate [mm w.e.] (surface melt, within dt)'],
                    label="Melt rate",
                    color="indianred",
                    step="mid",
                    zorder=1)

    # set some chart and axis properties
    ax.set_ylim((-0.05, 1))
    ax.yaxis.grid(False)
    ax.yaxis.set_ticklabels([])
    ax.yaxis.set_ticks([])
    ax2.yaxis.grid(False)
    ax2.yaxis.set_label_position("left")
    ax2.yaxis.tick_left()

    # Set title and labels for axes
    ax2.set(ylabel="Melt rate [mm w.e.]")

    # Define the date format
    date_form = DateFormatter("%b-%y")
    ax.xaxis.set_major_formatter(date_form)

    plt.show()