# Examining ice sheet data in Python with `pandas`
## Part 2: Exploring AWS18 Data

### Overview

In this notebook, we'll:
1. Explore reading a csv file to create a `pandas.DataFrame`
2. Have `pandas` index the DataFrame based on a date/time field and interpret the dates
3. Use `pandas` functions to summarize, plot, and subset data.
4. Create new fields in our `DataFrame` by assessing whether particular criteria are met in a given row. 


Data source: https://doi.pangaea.de/10.1594/PANGAEA.910480

Local file: `data/IMAU_aws18_high-res_meteo_daily.csv`

These data are daily means generated from higher-resolution (2-hourly) AWS observations and SEB model results from AWS18 on Larsen Ice Shelf. They were previously used this semester in Assignment 2.

#### To start, import some Python packages and specify some options:

In [None]:
# for data reading
import pandas as pd

# for plotting
import matplotlib.pyplot as plt

# seaborn adds some extra visual appeal to our plots
import seaborn as sns

# set some universal plot settings here
plt.rcParams["figure.dpi"] = 200 # default plot dpi
sns.set_style('darkgrid') # see: https://seaborn.pydata.org/tutorial/aesthetics.html
sns.set_context("notebook", font_scale=0.65) 
%config InlineBackend.figure_format = 'retina' # make high res plots for high res displays

#### Let's check out the data before we read it:


In [None]:
# First, tell Jupyter where to find the csv data:
aws18_datafile = 'data/IMAU_aws18_high-res_meteo_daily.csv'

# and let's explore quickly before reading with pandas
!head $aws18_datafile

#### We can see that the data are separated by commas and have a Date/Time column, so let's read the file and use that as an index column. 

Let's also tell pandas to read the Date/Time column as actual dates, not just a dumb index, using `parse_dates=['Date/Time']`:

In [None]:
aws18_df = pd.read_csv(aws18_datafile, index_col=['Date/Time'], parse_dates=['Date/Time'])
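To see what `parse_dates` buys us, here is a minimal sketch on a made-up three-row CSV held in memory. The values and the `demo_df` name are hypothetical stand-ins, not taken from the AWS18 file.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the AWS18 file (values are made up).
csv_text = """Date/Time,TTT [°C] (at 2m height)
2016-01-01,-1.5
2016-01-02,0.3
2016-01-03,-0.8
"""

demo_df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=['Date/Time'],
    parse_dates=['Date/Time'],
)

# With parse_dates the index holds real timestamps, so date attributes
# such as .month and .year are available on it.
print(type(demo_df.index).__name__)
print(demo_df.index.month.tolist())
```

Without `parse_dates`, the index would hold plain strings, and the date-based selections later in this notebook would not work as written.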

In [None]:
# what does the top of this dataframe object look like?
aws18_df.head()

In [None]:
# What about the end?
aws18_df.tail()

#### We can also use the `.info()` method to get some, uh, info about the data

In [None]:
aws18_df.info()

#### Easy! Let's generate some basic summary information for the data:

In [None]:
aws18_df.describe()

#### *What do you think about the range of temperatures at AWS18?*

**

#### But what about those "..." columns?

For extra wide datasets, we need to tell pandas to show us all the data, if that's what we want:

In [None]:
pd.set_option('display.max_columns', None)
aws18_df.describe()

From here on out, we'll get all columns!
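If we only wanted the wide view for a single output, one option is `pd.option_context`, which applies a display setting temporarily. This is a sketch on a made-up wide DataFrame (`wide_df` is hypothetical), not something the rest of the notebook depends on.

```python
import pandas as pd

# Hypothetical wide frame with 30 columns (values are made up).
wide_df = pd.DataFrame({f"col_{i}": [0.0, 1.0] for i in range(30)})

before = pd.get_option('display.max_columns')

# Inside the with-block every column is shown; the earlier setting
# is restored automatically when the block ends.
with pd.option_context('display.max_columns', None):
    inside = pd.get_option('display.max_columns')
    print(wide_df.describe())

after = pd.get_option('display.max_columns')
```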

#### Let's make a quick plot of 2 meter temperature and surface temperature to see how these relate:

In [None]:
aws18_df['TTT [°C] (at 2m height)'].plot(ylabel='Temperature [°C]')
aws18_df['Surf temp [°C] (modelled)'].plot()

#### *This shows us how the skin (surface) temperature is limited to 0°C. Why?*


As a result, in summer there is typically a near-surface **temperature inversion**: temperature increases with height above the surface.
    
It's also interesting to see that air temperatures > 0°C are an imperfect indicator of surface melt. 
* Note the times that T2m > 0°C, yet the skin temperature is < 0 °C (i.e., not melting despite "warm" air).



***

## What if we wanted to quickly assess what the conditions are when melt is happening?

We can select only the data when melt is occurring according to the SEB model, and show the descriptive statistics for these days quite easily.

#### To do this, we want to use the `pandas.DataFrame.loc` indexer:

* Docs here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
* Good examples here: https://www.earthdatascience.org/courses/use-data-open-source-python/use-time-series-data-in-python/date-time-types-in-pandas-python/subset-time-series-data-python/

In [None]:
aws18_df.loc[aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0].describe()
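The one-liner above does two things at once. Here is a sketch that splits them apart on a tiny made-up DataFrame; the `melt` and `t2m` column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical daily values; 'melt' and 't2m' are stand-in column names.
toy_df = pd.DataFrame(
    {"melt": [0.0, 2.5, 0.0, 1.0], "t2m": [-5.0, 1.2, -0.5, 0.4]},
    index=pd.date_range("2016-01-01", periods=4, freq="D"),
)

# Step 1: the comparison returns a boolean Series, one True/False per row.
is_melting = toy_df["melt"] > 0

# Step 2: .loc keeps only the rows where the mask is True.
melt_days = toy_df.loc[is_melting]
print(melt_days)
```

Storing the mask in its own variable can make longer selections easier to read and reuse.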

#### *How many days of melt occurred out of the total number?*

#### *Take a look at 2m temperature now. How does this look?*

#### *What about the mean radiative and turbulent fluxes?*

**


### What if we were interested in subsetting by time instead? 

#### Indexing based on our Date/Time index field is easy using `pandas.DataFrame.loc`

For example, what if we just wanted data for summer 2017/18 (December 2017, January 2018, February 2018)?    

In [None]:
aws18_df.loc['2017-12':'2018-02']
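Note that slicing a date index with strings includes both endpoints, and a `'YYYY-MM'` string covers the whole month. A quick sketch on a made-up daily index (the `value` column is hypothetical):

```python
import pandas as pd

# Hypothetical daily series spanning two years (values are just a counter).
idx = pd.date_range("2017-01-01", "2018-12-31", freq="D")
toy_df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Slicing by 'YYYY-MM' strings includes every day of both endpoint months.
summer = toy_df.loc['2017-12':'2018-02']
print(summer.index.min(), summer.index.max(), len(summer))
```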

### Let's get the descriptive stats now for that interval:

In [None]:
aws18_df.loc['2017-12':'2018-02'].describe()

## How about selecting all Decembers? 

Because we have a date/time index and told pandas to parse the dates when we read the csv file, the index exposes date attributes such as `.month` and `.year`, and we can query for index values based on them:

In [None]:
aws18_df.loc[aws18_df.index.month==12]

### What if we wanted to select multiple months, like all DJFs?

One way is to select data where the month index is equal to 12 or 1 or 2.

In Python, `|` is the bitwise OR operator; applied to boolean arrays like the ones here, it works element by element:
* https://www.w3schools.com/python/python_operators.asp

In [None]:
# this creates a new dataframe containing a subset of aws18_df where the index month is in D,J, or F.
aws18_djf_df = aws18_df[(aws18_df.index.month==12) | (aws18_df.index.month==1) | (aws18_df.index.month==2)]
aws18_djf_df
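Chaining `|` gets long when there are many months. An alternative sketch uses `.isin()` on the month values, shown here on a made-up daily index for 2016 (the `toy_df` frame is hypothetical):

```python
import pandas as pd

# Hypothetical daily frame covering 2016 (values are just a counter).
idx = pd.date_range("2016-01-01", "2016-12-31", freq="D")
toy_df = pd.DataFrame({"value": range(len(idx))}, index=idx)

month = toy_df.index.month

# Two equivalent masks: chained ORs, and a membership test.
mask_or = (month == 12) | (month == 1) | (month == 2)
mask_isin = month.isin([12, 1, 2])

djf_df = toy_df[mask_isin]
print(len(djf_df))
```

Both masks select the same rows; `.isin()` is simply shorter to write and to change.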

In [None]:
# what's it look like?
aws18_djf_df.describe()

### How's summer compare to winter?

Let's create a JJA subset and then difference the descriptive statistics:

In [None]:
aws18_jja_df = aws18_df[(aws18_df.index.month==6) | (aws18_df.index.month==7) | (aws18_df.index.month==8)]
aws18_jja_df.describe()

In [None]:
aws18_jja_df.describe() - aws18_djf_df.describe()

What if we didn't want all of those stats, but we just wanted to know what the mean difference was between those two seasons?

In [None]:
aws18_jja_df.mean() - aws18_djf_df.mean()

#### *So what have we learned from looking at AWS18 data so far?*

It's colder in winter: on average 17.5°C colder. 

But the maximum (daily mean) temperature in JJA is quite high! +5.56°C (42°F) in the dark of the polar night! Only 0.74°C colder than the maximum mean daily temperature during summer. 

This is the foehn effect, which we'll explore in Part 3 of our introduction to Pandas.

***

## Create new fields in our DataFrame to track and compare summer vs non-summer melt events.

How much melt occurs in winter? And what are the conditions like when winter melt occurs?
Let's create a new boolean (True/False) field named non_summer_melt to track this.

It should be True if: 
1. Melt > 0
2. Month > 2 and < 12 (Month is between March and November)

In [None]:
aws18_df['non_summer_melt'] = (aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0) \
                        & (aws18_df.index.month>2) \
                        & (aws18_df.index.month<12)

In [None]:
# the column is already boolean, so it can be used directly as a mask
aws18_df.loc[aws18_df['non_summer_melt']].describe()

What if we just wanted to know the number of non-summer melt days? For that, we can call the `.value_counts()` method:

In [None]:
aws18_df['non_summer_melt'].value_counts()

Now let's keep track of summer melt by creating a summer_melt field in our DataFrame.

This should be True if:
1. Melt > 0
2. Month < 3 or > 11 (Month is between December and February)

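Note that we'll want to place the two 'or' criteria within parentheses so that they are combined with each other first, before being combined with the melt criterion. Without them, `&` would be evaluated before `|`.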

In [None]:
aws18_df['summer_melt'] = (aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0) \
                        & ((aws18_df.index.month<3) | (aws18_df.index.month>11))
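To see why the parentheses matter, here is a sketch on four made-up days (the `melt` column and its values are hypothetical). The December day has no melt, so it should not count as summer melt.

```python
import pandas as pd

# Hypothetical days: Jan (no melt), Jan (melt), Jul (melt), Dec (no melt).
idx = pd.to_datetime(["2016-01-15", "2016-01-16", "2016-07-15", "2016-12-15"])
toy_df = pd.DataFrame({"melt": [0.0, 1.0, 1.0, 0.0]}, index=idx)

melting = toy_df["melt"] > 0
month = toy_df.index.month

# Intended: melting AND (month is Dec, Jan, or Feb).
with_parens = melting & ((month < 3) | (month > 11))

# Without the outer parentheses, & binds tighter than |, so this reads as
# (melting AND month < 3) OR (month > 11).
without_parens = melting & (month < 3) | (month > 11)

print(with_parens.tolist())
print(without_parens.tolist())
```

The second version wrongly flags the melt-free December day, because every December row passes the `month > 11` test on its own.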

In [None]:
aws18_df['summer_melt'].value_counts()

So we can see that 263 melt days fall in DJF compared to 144 in all other months, meaning about 65% of melt days (263 of 407) occur in summer. That's surprisingly low!

For a sanity check, we can double-check that we've accounted for all melt days (should be == 144 + 263) as follows:

In [None]:
(aws18_df['Melt rate [mm w.e.] (surface melt, within dt)']>0).value_counts()

OK great, our summer and non-summer melt trackers together account for the same number of melt days as when we query the whole dataset for any days with melt.
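The same check can be done in code rather than by eye. A sketch on four made-up days (all names and values are hypothetical): since `True` counts as 1, calling `.sum()` on a boolean column gives the number of `True` rows.

```python
import pandas as pd

# Hypothetical days: Jan, Apr, Jul, Dec, with melt on all but the July day.
idx = pd.to_datetime(["2016-01-10", "2016-04-10", "2016-07-10", "2016-12-10"])
toy_df = pd.DataFrame({"melt": [1.0, 0.5, 0.0, 2.0]}, index=idx)

melting = toy_df["melt"] > 0
is_summer = toy_df.index.month.isin([12, 1, 2])

toy_df["summer_melt"] = melting & is_summer
toy_df["non_summer_melt"] = melting & ~is_summer

# Count the True rows in each tracker.
n_summer = toy_df["summer_melt"].sum()
n_non_summer = toy_df["non_summer_melt"].sum()

# The two categories are mutually exclusive, so they should add up
# to the total number of melt days.
assert n_summer + n_non_summer == melting.sum()
print(n_summer, n_non_summer)
```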

***

## Let's take a step back, and think about the radiative effects of clouds.

1. Let's subset the data to a single year: 2016
2. Smooth the data using a 10-day rolling mean to make them easier to interpret.
3. Make a single plot with SWnet, LWnet, and Rnet on one axis, with Cloud coverage on a second axis.

In [None]:
# subset and then smooth
aws18_df_subset = aws18_df.loc['2016-01-01':'2016-12-31']
aws18_df_subset = aws18_df_subset.rolling(10, center=True).mean()

# Create figure and plot space
fig, ax = plt.subplots(figsize=(10, 5))

# Plot data (here we're also assigning variables to our plots so that we can reference them later on)
swnet = ax.plot(aws18_df_subset['Net SW [W/m**2]'], label="SWnet", color="darkorange")

lwnet = ax.plot(aws18_df_subset['Net LW [W/m**2]'], label="LWnet", color="indianred")

rnet = ax.plot(aws18_df_subset['NET [W/m**2]'], label="Rnet", color="dimgrey")

# Add additional data to a second y axis
ax2 = ax.twinx() # make second axis
ax2.grid(False) # turn its grid lines off

clouds = ax2.plot(aws18_df_subset['Cloud cov [%]'], label="Cloud cover", color="steelblue")

# Set title and labels for axes
ax.set(ylabel = 'Energy flux [W/m**2]', title='AWS18')
ax2.set(ylabel = 'Cloud Cover [%]')

# create a combined legend (needed because we have 2 y-axes)
lines = swnet+lwnet+rnet+clouds # create list of all lines
labs = [l.get_label() for l in lines] # loop through the list of lines and get their labels
legend = ax.legend(lines, labs, loc='upper right', ncol=2) # plot the legend

plt.show()
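One detail of the smoothing step above: with `center=True`, each value is the mean of a window centered on that day, so days near the start and end of the subset have incomplete windows and come out as NaN. A sketch with a 3-day window on seven made-up values:

```python
import pandas as pd

# Hypothetical series with values 1.0 through 7.0 on consecutive days.
s = pd.Series(
    range(1, 8),
    index=pd.date_range("2016-01-01", periods=7, freq="D"),
    dtype=float,
)

# Centered 3-day rolling mean: the first and last days lack a full window.
smoothed = s.rolling(3, center=True).mean()
print(smoothed)
```

If the gaps at the ends of the year matter, one option is to smooth the full record first and subset afterwards.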

*** 

# Practice sessions

#### Question 1: Are the winter peaks in clouds associated with warm air advection?

Assess this by plotting 10-day rolling means of T2m and cloud cover for this year.

Recall:
* That we can subset a dataset that is indexed based on a parsed date field using `.loc['':'']` as shown above.
* Also, that in part 1 of this intro to Pandas, we called the `DataFrame.rolling` method and applied the `.mean()` function to calculate rolling means of data.
* Lastly, that we can create a second y axis on which to plot data using `ax2 = ax.twinx()`

In [None]:
# Add your code here 

#### Question 2: What's different when it's extremely cloudy vs when it's mostly clear?
1. Calculate the 5th and 95th percentile values for clouds. (hint: use the `DataFrame.quantile` method)
2. Subset the data based on these values
3. Calculate the difference in the descriptive statistics.
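As a hint for the quantile step, here is a sketch of how `quantile` behaves on a made-up series (the `cloud` name and values are hypothetical, not AWS18 data). Note that it takes fractions between 0 and 1, not percentages.

```python
import pandas as pd

# Hypothetical cloud-cover values from 0 to 100 in steps of 10.
toy = pd.Series(
    [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    name="cloud",
)

# Passing a list of fractions returns one value per fraction.
low, high = toy.quantile([0.05, 0.95])

# Boolean masks then pick out the clearest and cloudiest values.
clearest = toy[toy <= low]
cloudiest = toy[toy >= high]
print(low, high)
```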


In [None]:
# Add your code here 

#### Question 3: Does the effect of clouds on the surface meteorology vary by season?
1. Create two subsets of data: DJF of all years and JJA of all years. Then do the same as above:
    1. Calculate the 5th and 95th percentile values for clouds. (hint: use the `DataFrame.quantile` method)
    2. Subset the data based on these values
    3. Calculate the difference in the descriptive statistics in summer vs winter.


In [None]:
# Add your code here 