# Interactive Charts

This tutorial is the third tutorial in a series of tutorials intended to walk you through downloading, accessing, processing, and visualizing the National Snow and Ice Data Center's (NSIDC) Snow Today's (https://nsidc.org/reports/snow-today) snow cover and albedo datasets. If you haven't already downloaded and explored Snow Today's snow fraction and albedo datasets, please visit the first tutorial, ST_01_Downloading_and_Exploring_Snow_Data. 
Included in this tutorial are steps to: 

1. Calculate snow cover extent for each day of each year in the snow cover datasets; 

2. Calculate the average albedo percentages for day of each year of the albedo datasets;

3. Create interactive graphs comparing annual snow cover extent; and  

4. Create interactive graphs comparing annual albedo percentages.  



## Table of Contents

#### [Snow Cover](#bullet1)
* [Packages](#package1)
* [Process Snow Data](#bullet2)
* [Calculate Snow Cover Extent](#bullet3)
* [Preprocessed Snow Data](#bullet4)
* [Statistical Analysis](#Bullet5)
* [Prepare Visualization](#Bullet6)
* [Create Interactive Figure](#bullet7)

#### [Albedo](#bullet8)
* [Packages](#package2)
* [Process Albedo Data](#bullet9)
* [Calculate Albedo Percentage](#bullet10)
* [Preprocessed Albedo Data](#bullet11)
* [Statistical Analysis](#Bullet12)
* [Prepare Visualization](#Bullet13)
* [Create Interactive Figure](#bullet14)


# Snow Cover <a class="anchor" id="bullet1"></a>


### Packages<a class="anchor" id="package1"></a> 

In [18]:
import glob
import os
from osgeo import gdal
import numpy as np
import pandas as pd
import plotly.graph_objects as go

### Glob the Datasets

In [19]:
# Glob will string together a list that matches the unique keys provided.
# In this case, we're creating a string of all of our snow cover datasets. 
snow_ds = glob.glob('snow_cover/*.h5')
snow_ds

['snow_cover\\Sierra2001.h5',
 'snow_cover\\Sierra2002.h5',
 'snow_cover\\Sierra2003.h5',
 'snow_cover\\Sierra2004.h5',
 'snow_cover\\Sierra2005.h5',
 'snow_cover\\Sierra2006.h5',
 'snow_cover\\Sierra2007.h5',
 'snow_cover\\Sierra2008.h5',
 'snow_cover\\Sierra2009.h5',
 'snow_cover\\Sierra2010.h5',
 'snow_cover\\Sierra2011.h5',
 'snow_cover\\Sierra2012.h5',
 'snow_cover\\Sierra2013.h5',
 'snow_cover\\Sierra2014.h5',
 'snow_cover\\Sierra2015.h5',
 'snow_cover\\Sierra2016.h5',
 'snow_cover\\Sierra2017.h5',
 'snow_cover\\Sierra2018.h5',
 'snow_cover\\Sierra2019.h5']

## Process Snow Data<a class="anchor" id="bullet2"></a>

Before we're able to calculate snow cover extent, there are few data wrangling steps we need to take. First, we need to pull the divisor (100) from the metadata. Next, we need to convert all NA values (255) to 0s so that we don't include those values in our calculations. Last, we need to calculate the size of each cell in dataset. Recall in Tutorial_01 that each cell is 500 meters by 500 meters in size. This information is stored in the data's referencing matrix. Let's pull this information out for later use.

In [53]:
# Get sub datasets of first snow fraction dataset.
datasets = gdal.Open(snow_ds[0], gdal.GA_ReadOnly).GetSubDatasets()

snow_divisor = int(gdal.Open(snow_ds[0], gdal.GA_ReadOnly).GetMetadata()['Grid_MODIS_GRID_500m_snow_fraction_divisor'])

# (datasets[3] is to choose the 4th dataset in the subdirectory (i.e., snow fraction). 
# The second bracket [0] is needed to open the dataset.
snow_data = gdal.Open(datasets[3][0])

# Changes the selected dataset into an array.
snow_data_array = snow_data.ReadAsArray()

# Converts the variables to 'float' to allow us to convert the NA values (255) to 0
snow_data_float=snow_data_array.astype('float')
snow_data_float[snow_data_float == 255] = 0

# Divide snow data by the divisor(100)
snow_data_float = snow_data_float / snow_divisor


# Pull referencing matrix from the h5 file.

ref_matrix_meta = snow_data.GetMetadata()['Grid_MODIS_GRID_500m_ReferencingMatrix'].split()
referencing_matrix = [int(ref_matrix_meta[2]), int(ref_matrix_meta[1]), int(ref_matrix_meta[0]), int(ref_matrix_meta[5]), int(ref_matrix_meta[4]), int(ref_matrix_meta[3])]


### 

## Calculate Snow Cover Extent<a class="anchor" id="bullet3"></a>

Now we're ready to calculate snow cover extent. But before going through everyday of every year, let's start with a single day. Notice that we're multiplying one of the cell sizes by -1. This is because we want to flip the sign from -, as it is in the referencing matrix, to positive, to get the absolute area of snow cover. Last, we're going to divide our results by 10^6 then by 1000 to get our results in x per 1000 km^2. 


In [25]:
# Taking the mean average of a single day in the dataset. 
# Multiply by transform_data[1] and -transform_data[5] (since pixel size is [500m] * [-500m], but we are only interested in absolute area) to get the mean area * percentage snow fraction area covered in m^2
# Divide by 10**6 to get the area in km^2
# Divide by 1000 to make the values consistent with output of Snow Today website (They report values in thousands of km^2)   
test = snow_data_float[200,:,:] * referencing_matrix[1] * -referencing_matrix[5] / 10**6 /1000
np.sum(test)

39.62034250000001

Here we see that for day 200 of the 2001 water year, there was approximately 39.62 *1000 km^2 of snow present in the Sierras. 

## Calculate Snow Cover Extent Per Year

### **Note** 
The next step is memory intensive and may take several minutes to run. Therefore, we've provide the results in a .csv document that can be called in the code chunks below. If you'd like to run the code yourself, unhash the code. 

In [28]:
# # Create an empty list to populate yearly for loop values with
# total_snow_cover_list = []

# # Choose the range of the data, which is dictated by how many files you globbed together.
# for i in range(len(snow_ds)): 

#     # Cycle through each year in the dataset
#     datasets = gdal.Open(snow_ds[i], gdal.GA_ReadOnly).GetSubDatasets()

#     # (datasets[3] is to choose the 4th dataset in the subdirectory (i.e., snow fraction). 
#     # The second bracket [0] is needed to open the dataset.
#     snow_data = gdal.Open(datasets[3][0])

#     # Converts the data into an array.
#     snow_data_array = snow_data.ReadAsArray()

#     # Converts the array to 'float' to allow us to convert the NA values (255) to 0
#     snow_data_float=snow_data_array.astype('float')
#     snow_data_float[snow_data_float == 255] = 0

#     # Divide by the divisor
#     snow_data_float = snow_data_float / snow_divisor

#     #Create an empty list to populate daily mean values per year with
#     new_list = []
    
#     #Takes the sum of the average snow fraction area per cell per given day in a year
#     for j in range(len(snow_data_float)):
        
#         total_snow_cover = snow_data_float[j,:,:] * referencing_matrix[1] * -referencing_matrix[5] / 10**6 /1000
#         total_snow_cover_year = np.sum(total_snow_cover)

#         #Appends daily values to list
#         new_list.append(total_snow_cover_year)

#     #Appends yearly values to list. 
#     total_snow_cover_list.append(new_list)

### Convert For Loop to DataFrame/Add "Names" for Rows and Columns

Now that we've processed our data, we can convert our data into a dataframe and add labels for days and years. 

In [29]:
# # Create empty lists for row names and col names
# row_names = []
# col_names = []

# # Create names for each day
# for i in range(366) :


#     col_names.append(str(i + 1))

# # Create names for year 
# for j in range(len(snow_ds)):

#     row_names.append(str(j + 2001))

# #Create dataframe and append with names. 
# snow_cover_df = pd.DataFrame(total_snow_cover_list, columns= col_names, index = row_names)

# # Transpose dataframe so years are columns and days are rows.
# snow_cover_df = pd.DataFrame.transpose(snow_cover_df)

## Preprocessed Snow Data<a class="anchor" id="bullet4"></a>


### **Note**


If you're skipping the above code chunks, run code from here down.

In [32]:
# snow_cover_df.to_csv('snow_cover_df.csv')
snow_cover_df = pd.read_csv(r'snow_cover_df.csv')

#Lets see what the dataset looks like
snow_cover_df

Unnamed: 0.1,Unnamed: 0,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,1,0.130207,0.024198,0.025737,0.025777,1.472098,0.119457,0.105140,0.093743,0.863630,0.097943,2.278673,1.131415,0.042470,0.041330,0.058185,0.489297,0.306940,0.027043,0.024198
1,2,0.107577,0.024215,0.027310,0.027137,1.304238,0.123240,0.097897,0.105487,1.057692,0.296538,2.563883,2.279045,0.041445,0.040458,0.053297,0.632573,0.287330,0.028850,0.024198
2,3,0.105295,0.024215,0.040040,0.032185,1.152065,0.130738,0.093450,0.137890,1.214685,0.524555,2.856710,3.975513,0.046667,0.040135,0.048095,0.844257,0.260620,0.031552,0.024198
3,4,0.152932,0.024217,0.055735,0.042260,0.977498,0.135002,0.101925,0.176535,1.310170,0.692208,3.091442,5.586023,0.057805,0.044590,0.043798,1.022547,0.227678,0.034243,0.024198
4,5,0.320527,0.024217,0.073467,0.052835,0.770540,0.138337,0.137552,0.211662,1.327885,0.780418,3.226008,7.094418,0.081010,0.090687,0.040350,1.125727,0.187615,0.036480,0.024198
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
361,362,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198
362,363,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198
363,364,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198
364,365,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198,0.024198


## Statistical Analysis <a class="anchor" id="bullet5"></a>

Now that we have a dataframe of total snow cover area per day per year, let's perform some basic statistical analyses that we can display on our interactive chart. 


First, we'll need to convert NA Values with 0s so that all days are valid in our calculations 

In [34]:
#Now need to convert NA Values for non-leap years
snow_cover_df = snow_cover_df.fillna(0)

### Calculate Interquartile Ranges and Average Snow Cover 

Next, we'll calculate the 25th, 50th (median), and 75th interquartile range. We'll use these three values to create a selectable legend icon on a our chart. Luckily, numpy has an easy-to-use function 'np.percentile' that can calculate these values for us. We'll also calculate average snow cover by taking the mean of row. Again, this will be a selectable visualization for the interactive chart. 

In [38]:
# Create empty list to input with for loop
IQR_25 = []
IQR_75 = []
IQR_50 = []
days = []
for i in range(len(snow_cover_df)): 
    #Takes the IQR of each day (25, 50, 75)
    Q1 = np.percentile(snow_cover_df.iloc[i], 25)
    Q2 = np.percentile(snow_cover_df.iloc[i], 50)
    Q3 = np.percentile(snow_cover_df.iloc[i], 75)
    #appends list with IQR outputs
    IQR_25.append(Q1)
    IQR_50.append(Q2)
    IQR_75.append(Q3)
    #Creates day list to append dataset with
    days.append(i + 1)
    
# Next, need to create a single column of mean values. 
snow_cover_df['Average Snow Cover'] = snow_cover_df.mean(axis = 1)

In [39]:
#Appends list for loop lists
snow_cover_df['IQR_25'] = IQR_25
snow_cover_df['IQR_75'] = IQR_75
snow_cover_df['IQR_50'] = IQR_50
snow_cover_df['days'] = days

## Prepare the Visualization <a class="anchor" id="bullet6"></a>

Now that we've calculated the interquartile ranges and averages, we need to create tick marks to put each month of the water year on. 

In [40]:
month_day = [31, 30, 31, 31, 28, 31, 30, 31, 30, 31, 31, 30]
new_list = []

j = 0 
for i in range(0,len(month_day)):
    j+=month_day[i]
    new_list.append(j)
     
print(new_list)

[31, 61, 92, 123, 151, 182, 212, 243, 273, 304, 335, 365]


Next, we need to create a list of years to graph. The for loop below creates a variable name for each year. The 'legendrank' dictates where on the legend each variable will be found. Since we want to display the most recent year first (2019), we'll have that be legendrank one. 

In [36]:
# Create a list of years to graph. legend rank allows lets you order where the lines are located on the chart. 
for i in range(len(snow_ds)):
    print("""go.Scatter("""
        """name = '""" + str(i + 2001) + """', """
        """y = snow_cover_df['"""+ str(i + 2001) + """'], x = snow_cover_df['days'], """
        """mode = 'lines', legendrank = """ + str(19-i) + """),"""
    )

go.Scatter(name = '2001', y = snow_cover_df['2001'], x = snow_cover_df['days'], mode = 'lines', legendrank = 19),
go.Scatter(name = '2002', y = snow_cover_df['2002'], x = snow_cover_df['days'], mode = 'lines', legendrank = 18),
go.Scatter(name = '2003', y = snow_cover_df['2003'], x = snow_cover_df['days'], mode = 'lines', legendrank = 17),
go.Scatter(name = '2004', y = snow_cover_df['2004'], x = snow_cover_df['days'], mode = 'lines', legendrank = 16),
go.Scatter(name = '2005', y = snow_cover_df['2005'], x = snow_cover_df['days'], mode = 'lines', legendrank = 15),
go.Scatter(name = '2006', y = snow_cover_df['2006'], x = snow_cover_df['days'], mode = 'lines', legendrank = 14),
go.Scatter(name = '2007', y = snow_cover_df['2007'], x = snow_cover_df['days'], mode = 'lines', legendrank = 13),
go.Scatter(name = '2008', y = snow_cover_df['2008'], x = snow_cover_df['days'], mode = 'lines', legendrank = 12),
go.Scatter(name = '2009', y = snow_cover_df['2009'], x = snow_cover_df['days'], mode = '

We'll copy and paste these in the code chunk below. The rest we'll have to be done manually. With a little tweaking, we'll be left an interactive graph display snow cover! 

## Create the Interactive Figure<a class="anchor" id="bullet7"></a>

In [41]:
#Plot the figure. 
fig = go.Figure([

#create median line
go.Scatter(
    #Name that appears on legend
    name = 'Median',
    # y-dim
    y = snow_cover_df['IQR_50'],
    # x-dim
    x = snow_cover_df['days'],
    # type of plot
    mode = 'lines',
    # Include to select/deselect multiple variables at once
    legendgroup = 'IQR',
    # Name of legend group on legend
    legendgrouptitle_text="<b>Interquartile Range</b>",
    # Legend position
    legendrank = 20,
    # Line color
    line=dict(color='rgb(31, 119, 180)'),
),
#Create IQR 75 line
go.Scatter(
        name = 'IQR 75',
        y = snow_cover_df['IQR_75'],
        x = snow_cover_df['days'],
        mode='lines',
        marker=dict(color="#444"),
        line=dict(width=0),
        legendgroup = 'IQR',
        # Here we 'hide' the name from appearing on the legend since it's lumped in with the legendgroup 'IQR'
        showlegend = False
    ),
    #Create IQR 25 fill color
    go.Scatter(
        name='IQR 25',
        y = snow_cover_df['IQR_25'],
        x = snow_cover_df['days'],
        marker=dict(color="#444"),
        line=dict(width=0),
        mode='lines',
        fillcolor='rgba(68, 68, 68, 0.3)',
        fill='tonexty',
        legendgroup = 'IQR',
        showlegend = False
    ),
    #Create mean line
    go.Scatter(
        name = 'Average Snow Cover',
        y = snow_cover_df['Average Snow Cover'],
        x = snow_cover_df['days'],
        mode = 'lines',
        legendgroup = 'Average',
        legendgrouptitle_text = '<b>Average</b>',
        legendrank = 21
    ),

#Create lines for each respective year
go.Scatter(name = '2001', y = snow_cover_df['2001'], x = snow_cover_df['days'], mode = 'lines', legendrank = 19),
go.Scatter(name = '2002', y = snow_cover_df['2002'], x = snow_cover_df['days'], mode = 'lines', legendrank = 18),
go.Scatter(name = '2003', y = snow_cover_df['2003'], x = snow_cover_df['days'], mode = 'lines', legendrank = 17),
go.Scatter(name = '2004', y = snow_cover_df['2004'], x = snow_cover_df['days'], mode = 'lines', legendrank = 16),
go.Scatter(name = '2005', y = snow_cover_df['2005'], x = snow_cover_df['days'], mode = 'lines', legendrank = 15),
go.Scatter(name = '2006', y = snow_cover_df['2006'], x = snow_cover_df['days'], mode = 'lines', legendrank = 14),
go.Scatter(name = '2007', y = snow_cover_df['2007'], x = snow_cover_df['days'], mode = 'lines', legendrank = 13),
go.Scatter(name = '2008', y = snow_cover_df['2008'], x = snow_cover_df['days'], mode = 'lines', legendrank = 12),
go.Scatter(name = '2009', y = snow_cover_df['2009'], x = snow_cover_df['days'], mode = 'lines', legendrank = 11),
go.Scatter(name = '2010', y = snow_cover_df['2010'], x = snow_cover_df['days'], mode = 'lines', legendrank = 10),
go.Scatter(name = '2011', y = snow_cover_df['2011'], x = snow_cover_df['days'], mode = 'lines', legendrank = 9),
go.Scatter(name = '2012', y = snow_cover_df['2012'], x = snow_cover_df['days'], mode = 'lines', legendrank = 8),
go.Scatter(name = '2013', y = snow_cover_df['2013'], x = snow_cover_df['days'], mode = 'lines', legendrank = 7),
go.Scatter(name = '2014', y = snow_cover_df['2014'], x = snow_cover_df['days'], mode = 'lines', legendrank = 6),
go.Scatter(name = '2015', y = snow_cover_df['2015'], x = snow_cover_df['days'], mode = 'lines', legendrank = 5),
go.Scatter(name = '2016', y = snow_cover_df['2016'], x = snow_cover_df['days'], mode = 'lines', legendrank = 4),
go.Scatter(name = '2017', y = snow_cover_df['2017'], x = snow_cover_df['days'], mode = 'lines', legendrank = 3),
go.Scatter(name = '2018', y = snow_cover_df['2018'], x = snow_cover_df['days'], mode = 'lines', legendrank = 2),
go.Scatter(name = '2019', y = snow_cover_df['2019'], x = snow_cover_df['days'], mode = 'lines', legendrank = 1)

])


# Can change default "off" variables. Right now, the only variable visible is year_2019 and IQR
variables_to_hide = [
'2001',
'2002',
'2003',
'2004',
'2005',
'2006',
'2007',
'2008',
'2009',
'2010',
'2011',
'2012',
'2013',
'2014',
'2015',
'2016',
'2017',
'2018',
'Average Snow Cover']
fig.for_each_trace(lambda trace: trace.update(visible="legendonly") 
                   if trace.name in variables_to_hide else ())

fig.update_layout(
    title = "<b> Annual Snow Cover Area: Sierra Nevada Region </b> <br> <sup>2001-2019</sup></br>",
    legend_title="<b>Year</b>",
    autosize=False,
    width=1200,
    height=700,
    template = 'none',
    font=dict(
        size=16),
xaxis = dict(
        tickmode = 'array',
        tickvals = [1, 31, 61, 92, 123, 151, 182, 212, 243, 273, 304, 335, 365],
        ticktext = ['<b>October</b>', '<b>November</b>', '<b>December</b>', '<b>January</b>', '<b>February</b>', '<b>March</b>', '<b>April</b>', '<b>May</b>', 
        '<b>June</b>', '<b>July', '<b>August</b>', "<b>September</b>", "<b>October</b>"],
        tickfont = dict(size=12))
)

fig.update_xaxes(title_text = "", gridcolor = 'lightgrey', gridwidth = 0.1)
fig.update_yaxes(title_text = "<b> Area (Thousands of Square Kilometers) </b>", 
    title_font = {"size": 15}, gridcolor = 'lightgrey', gridwidth = 0.1)

fig.show()

If you'd like to save this chart as an html widget, unhash the code chunk below. 

In [17]:
# fig.write_html('snow_cover_figure.html')

# Albedo<a class="anchor" id="bullet8"></a>

The process for creating an interactive visualization of albedo is very similar to that of snow cover. Once again, due to memory and time constraints, we'll have the preprocessed data in a .csv file that you can read in. 


## Packages <a class="anchor" id="package2"></a>

In [1]:
import glob
import os
from osgeo import gdal
import numpy as np
import pandas as pd
import plotly.graph_objects as go

### Glob the Datasets

In [14]:
# Glob together all of the albedo datasets.
albedo = glob.glob('albedo/*.h5')
albedo

['albedo\\SierraAlbedo2001.h5',
 'albedo\\SierraAlbedo2002.h5',
 'albedo\\SierraAlbedo2003.h5',
 'albedo\\SierraAlbedo2004.h5',
 'albedo\\SierraAlbedo2005.h5',
 'albedo\\SierraAlbedo2006.h5',
 'albedo\\SierraAlbedo2007.h5',
 'albedo\\SierraAlbedo2008.h5',
 'albedo\\SierraAlbedo2009.h5',
 'albedo\\SierraAlbedo2010.h5',
 'albedo\\SierraAlbedo2011.h5',
 'albedo\\SierraAlbedo2012.h5',
 'albedo\\SierraAlbedo2013.h5',
 'albedo\\SierraAlbedo2014.h5',
 'albedo\\SierraAlbedo2015.h5',
 'albedo\\SierraAlbedo2016.h5',
 'albedo\\SierraAlbedo2017.h5',
 'albedo\\SierraAlbedo2018.h5',
 'albedo\\SierraAlbedo2019.h5']

## Process Albedo Data<a class="anchor" id="bullet9"></a>

Similar to snow cover, we'll need to process the albedo data before calculating our daily averages. To start, we'll subset out all NA values. Next, we'll divide by the divisor. Since the divisor is not easily accessible in the data's metadata, we'll hard code the variable in.  We're interested in calculating the average albedo of snow only when snow is present. As such, we'll only be calculating the daily non-zero average of albedo. 

Before we do each day, let's try with a single day from a single year.

In [42]:
# Opens the first dataset in the albedo glob list. 
dataset = gdal.Open(albedo[0], gdal.GA_ReadOnly)

# Albedo divisor 
albedo_divisor = 10000

# Changes the selected dataset into an array.
albedo_array = dataset.ReadAsArray()

# Converts data to float
albedo_float = albedo_array.astype('float')

# Converts na values to nans 
albedo_float[albedo_float == 65535] = np.nan

# Divide by the albedo divisor
albedo_float = albedo_float / albedo_divisor

# Select the 20th day in the dataset
albedo_year_2001_day_20 = albedo_float[20:,:]

np.nanmean(albedo_year_2001_day_20)

0.7535675744916424

Here we see that for day 20 of the 2001 water year, there was approximately 75% albedo reflectance for present snow in the Sierras. 

## Calculate Albedo Percentages <a class="anchor" id="bullet10"></a>

### **Note** 

The next step is memory intensive and may take several minutes to hours to run, depending on the specifications of your system. Therefore, we've provide the results in a .csv document that can be called in the code chunks below. If you'd like to run the code yourself, unhash the code. 

In [None]:
# # Create an empty list to populate yearly for loop values with
# mean_albedo = []
# albedo_divisor = 10000

# for i in range(len(albedo)):
#     dataset = gdal.Open(albedo[i], gdal.GA_ReadOnly)

#     # Changes the selected dataset into an array.
#     albedo_array = dataset.ReadAsArray()

#     # Converts data to 'float'
#     albedo_float = albedo_array.astype('float')
    
#     # Converts na values to nans
#     albedo_float[albedo_float == 65535] = np.nan

#     # Divide data by the divisor 
#     albedo_float = albedo_float/albedo_divisor 
    
#     #Create an empty list to populate daily mean values per year with
#     new_list = []
    
#     #Takes the sum of the average snow fraction area per cell per given day in a year
#     for j in range(len(albedo_float)):

#         # Selects the first day in the dataset    
#         albedo_day = albedo_divisor[j:,:]
        
#         # Takes the non-zero average of albedo for the selected day
#         albedo_day_mean = np.nanmean(albedo_day) 
        
#         #Appends daily values to list
#         new_list.append(albedo_day_mean)

#     #Appends yearly values to list. 
#     mean_albedo.append(new_list)

In [None]:
# # Create empty lists for row names and col names
# row_names = []
# col_names = []
# # Create names for each day
# for i in range(366) :
#     col_names.append('Day ' + str(i + 1))
# # Create names for year 
# for j in range(len(albedo)):
#     row_names.append(str(j + 2001))
# #Create dataframe and append with names. 
# albedo_df = pd.DataFrame(mean_albedo, index = row_names)
# #Transpose dataframe so years are columns and days are rows.
# albedo_df = pd.DataFrame.transpose(albedo_df)

## Preprocessed Albedo Data <a class="anchor" id="bullet11"></a>


### **Note**


If you're skipping the above code chunks, run code from here down.

In [44]:
#albedo_df.to_csv('albedo_rates.csv')
albedo_df = pd.read_csv(r'albedo_df.csv')

## Adding Statistical Analysis to the DataFrame<a class="anchor" id="bullet12"></a>

Now that we have a dataframe of the daily mean of albedo, let's perform some basic statistical analyses that we can display on our interactive chart. 


First, we'll need to convert NA Values with 0s so that all days are valid in our calculations 

In [45]:
#Now need to convert NA Values for non-leap years
albedo_df = albedo_df.fillna(0)
albedo_df


Unnamed: 0,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,0.753892,0.771315,0.745700,0.772998,0.761415,0.762492,0.762376,0.763493,0.737907,0.775153,0.747911,0.759543,0.769590,0.778952,0.751131,0.781757,0.773072,0.767671,0.778525
1,0.753895,0.771315,0.745699,0.772998,0.761417,0.762490,0.762374,0.763492,0.737901,0.775152,0.747900,0.759521,0.769590,0.778953,0.751130,0.781753,0.773071,0.767671,0.778525
2,0.753898,0.771315,0.745699,0.772997,0.761420,0.762489,0.762373,0.763491,0.737896,0.775148,0.747885,0.759471,0.769590,0.778954,0.751130,0.781749,0.773069,0.767670,0.778525
3,0.753901,0.771315,0.745698,0.772997,0.761422,0.762487,0.762371,0.763490,0.737891,0.775144,0.747869,0.759406,0.769591,0.778955,0.751129,0.781744,0.773068,0.767670,0.778524
4,0.753903,0.771315,0.745697,0.772996,0.761425,0.762486,0.762370,0.763488,0.737888,0.775141,0.747853,0.759344,0.769591,0.778957,0.751129,0.781740,0.773067,0.767670,0.778524
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
361,0.638983,0.632695,0.596917,0.628008,0.611036,0.633663,0.548457,0.623059,0.637681,0.582645,0.625674,0.599029,0.573096,0.652264,0.472690,0.559326,0.621385,0.653106,0.656505
362,0.639221,0.632930,0.597191,0.628249,0.611283,0.633900,0.548762,0.623285,0.637916,0.582924,0.625918,0.599296,0.573377,0.652471,0.473042,0.559621,0.621631,0.653305,0.656730
363,0.639462,0.633166,0.597468,0.628493,0.611532,0.634138,0.549068,0.623513,0.638153,0.583205,0.626163,0.599565,0.573660,0.652680,0.473396,0.559918,0.621880,0.653505,0.656956
364,0.639704,0.633407,0.597746,0.628736,0.611784,0.634379,0.549377,0.623740,0.638391,0.583487,0.626410,0.599836,0.573949,0.652889,0.473752,0.560216,0.622130,0.653706,0.657184


### Calculate Interquartile Ranges and Average Albedo

Next, we'll calculate the 25th, 50th (median), and 75th interquartile range. We'll use these three values to create a selectable legend icon on a our chart. Luckily, numpy has an easy-to-use function 'np.percentile' that can calculate these values for us. We'll also calculate average average albedo for each day by taking the mean of row. Again, this will be a selectable visualization for the interactive chart. 

In [46]:
# Create empty list to input with for loop
IQR_25 = []
IQR_50 = []
IQR_75 = []
days = []
for i in range(len(albedo_df)): 
    #Takes the IQR of each day (25, 50, 75)
    Q1 = np.percentile(albedo_df.iloc[i], 25)
    Q2 = np.percentile(albedo_df.iloc[i], 50)
    Q3 = np.percentile(albedo_df.iloc[i], 75)
    #appends list with IQR outputs
    IQR_25.append(Q1)
    IQR_50.append(Q2)
    IQR_75.append(Q3)
    #Creates day list to append dataset with
    days.append(i + 1)

# Next, need to create a single column of mean values. 
albedo_df['Average Albedo'] = albedo_df.mean(axis = 1)

In [47]:
#Appends list for loop lists
albedo_df['IQR_25'] = IQR_25
albedo_df['IQR_75'] = IQR_75
albedo_df['IQR_50'] = IQR_50
albedo_df['days'] = days

## Prepare the Visualization<a class="anchor" id="bullet13"></a>


As we did with snow cover, we need to create tick marks to put each month of the water year on. 

In [48]:
month_day = [31, 30, 31, 31, 28, 31, 30, 31, 30, 31, 31, 30]
new_list = []

j = 0 
for i in range(0,len(month_day)):
    j+=month_day[i]
    new_list.append(j)
     
print(new_list)

[31, 61, 92, 123, 151, 182, 212, 243, 273, 304, 335, 365]


Again, as with snow cover, we need to create a list of years to graph. The for loop below creates a variable name for each year. The 'legendrank' dictates where on the legend each variable will be found. Since we want to display the most recent year first (2019), we'll have that be legendrank one. 

In [49]:
# Create a list of years to graph. legend rank allows lets you order where the lines are located on the chart. 
for i in range(len(albedo)):
    print("""go.Scatter("""
        """name = '""" + str(i + 2001) + """', """
        """y = albedo_df['"""+ str(i + 2001) + """'], x = albedo_df['days'], """
        """mode = 'lines', legendrank = """ + str(19-i) + """),"""
    )

go.Scatter(name = '2001', y = albedo_df['2001'], x = albedo_df['days'], mode = 'lines', legendrank = 19),
go.Scatter(name = '2002', y = albedo_df['2002'], x = albedo_df['days'], mode = 'lines', legendrank = 18),
go.Scatter(name = '2003', y = albedo_df['2003'], x = albedo_df['days'], mode = 'lines', legendrank = 17),
go.Scatter(name = '2004', y = albedo_df['2004'], x = albedo_df['days'], mode = 'lines', legendrank = 16),
go.Scatter(name = '2005', y = albedo_df['2005'], x = albedo_df['days'], mode = 'lines', legendrank = 15),
go.Scatter(name = '2006', y = albedo_df['2006'], x = albedo_df['days'], mode = 'lines', legendrank = 14),
go.Scatter(name = '2007', y = albedo_df['2007'], x = albedo_df['days'], mode = 'lines', legendrank = 13),
go.Scatter(name = '2008', y = albedo_df['2008'], x = albedo_df['days'], mode = 'lines', legendrank = 12),
go.Scatter(name = '2009', y = albedo_df['2009'], x = albedo_df['days'], mode = 'lines', legendrank = 11),
go.Scatter(name = '2010', y = albedo_df['2010'

We'll copy and paste these in the code chunk below. Like snow cover, the rest we'll have to be done manually. With a little tweaking, we'll be left an interactive graph display albedo! 

## Create the Interactive Figure<a class="anchor" id="bullet14"></a>

In [56]:
#Plot the figure. 
fig2 = go.Figure([

#create median line
go.Scatter(
    name = 'Median',
    y = albedo_df['IQR_50'],
    x = albedo_df['days'],
    mode = 'lines',
    legendgroup = 'IQR',
    legendgrouptitle_text="<b>Interquartile Range</b>",
    legendrank = 20,
    line=dict(color='rgb(31, 119, 180)'),
),
#Create IQR 75 line
go.Scatter(
        name = 'IQR 75',
        y = albedo_df['IQR_75'],
        x = albedo_df['days'],
        mode='lines',
        marker=dict(color="#444"),
        line=dict(width=0),
        legendgroup = 'IQR',
        showlegend = False
    ),
    #Create IQR 25 fill 
    go.Scatter(
        name='IQR 25',
        y = albedo_df['IQR_25'],
        x = albedo_df['days'],
        marker=dict(color="#444"),
        line=dict(width=0),
        mode='lines',
        fillcolor='rgba(68, 68, 68, 0.3)',
        fill='tonexty',
        legendgroup = 'IQR',
        showlegend = False
    ),
    go.Scatter(
        name = 'Average Albedo',
        y = albedo_df['Average Albedo'],
        x = albedo_df['days'],
        mode = 'lines',
        legendgroup = 'Average',
        legendgrouptitle_text = '<b>Average</b>',
        legendrank = 21
    ),

#Create lines for each respective line 
go.Scatter(name = '2001', y = albedo_df['2001'], x = albedo_df['days'], mode = 'lines', legendrank = 19),
go.Scatter(name = '2002', y = albedo_df['2002'], x = albedo_df['days'], mode = 'lines', legendrank = 18),
go.Scatter(name = '2003', y = albedo_df['2003'], x = albedo_df['days'], mode = 'lines', legendrank = 17),
go.Scatter(name = '2004', y = albedo_df['2004'], x = albedo_df['days'], mode = 'lines', legendrank = 16),
go.Scatter(name = '2005', y = albedo_df['2005'], x = albedo_df['days'], mode = 'lines', legendrank = 15),
go.Scatter(name = '2006', y = albedo_df['2006'], x = albedo_df['days'], mode = 'lines', legendrank = 14),
go.Scatter(name = '2007', y = albedo_df['2007'], x = albedo_df['days'], mode = 'lines', legendrank = 13),
go.Scatter(name = '2008', y = albedo_df['2008'], x = albedo_df['days'], mode = 'lines', legendrank = 12),
go.Scatter(name = '2009', y = albedo_df['2009'], x = albedo_df['days'], mode = 'lines', legendrank = 11),
go.Scatter(name = '2010', y = albedo_df['2010'], x = albedo_df['days'], mode = 'lines', legendrank = 10),
go.Scatter(name = '2011', y = albedo_df['2011'], x = albedo_df['days'], mode = 'lines', legendrank = 9),
go.Scatter(name = '2012', y = albedo_df['2012'], x = albedo_df['days'], mode = 'lines', legendrank = 8),
go.Scatter(name = '2013', y = albedo_df['2013'], x = albedo_df['days'], mode = 'lines', legendrank = 7),
go.Scatter(name = '2014', y = albedo_df['2014'], x = albedo_df['days'], mode = 'lines', legendrank = 6),
go.Scatter(name = '2015', y = albedo_df['2015'], x = albedo_df['days'], mode = 'lines', legendrank = 5),
go.Scatter(name = '2016', y = albedo_df['2016'], x = albedo_df['days'], mode = 'lines', legendrank = 4),
go.Scatter(name = '2017', y = albedo_df['2017'], x = albedo_df['days'], mode = 'lines', legendrank = 3),
go.Scatter(name = '2018', y = albedo_df['2018'], x = albedo_df['days'], mode = 'lines', legendrank = 2),
go.Scatter(name = '2019', y = albedo_df['2019'], x = albedo_df['days'], mode = 'lines', legendrank = 1)

])


# Can change default "off" variables. Right now, the only variable visible is year_2019 and IQRs
variables_to_hide = [
'2001',
'2002',
'2003',
'2004',
'2005',
'2006',
'2007',
'2008',
'2009',
'2010',
'2011',
'2012',
'2013',
'2014',
'2015',
'2016',
'2017',
'2018',
'Average Albedo']
fig2.for_each_trace(lambda trace: trace.update(visible="legendonly") 
                   if trace.name in variables_to_hide else ())

fig2.update_layout(
    title = "<b> Annual Snow Albedo Percentages: Sierra Nevada Region </b> <br> <sup>2001-2019</sup></br>",
    legend_title="<b>Year</b>",
    autosize=False,
    width=1200,
    height=700,
    template = 'none',
    font=dict(
        size=16),
    
    # Since most albedo falls between .55 and .8, we'll constrain our y axis to reflect that. 
    yaxis_range =[0.56, .8],
    xaxis_range = [1, 365],
xaxis = dict(
        tickmode = 'array',
        tickvals = [1, 31, 61, 92, 123, 151, 182, 212, 243, 273, 304, 335, 365],
        ticktext = ['<b>October</b>', '<b>November</b>', '<b>December</b>', '<b>January</b>', '<b>February</b>', '<b>March</b>', '<b>April</b>', '<b>May</b>', 
        '<b>June</b>', '<b>July', '<b>August</b>', "<b>September</b>", "<b>October</b>"],
        tickfont = dict(size=12))
)

fig2.update_xaxes(title_text = "", gridcolor = 'lightgrey', gridwidth = 0.1)
fig2.update_yaxes(title_text = "<b> Percentage </b>", 
    title_font = {"size": 15}, gridcolor = 'lightgrey', gridwidth = 0.1)

fig2.show()

If you'd like to save this chart as an html widget, unhash the code chunk below. 

In [None]:
# fig2.write_html('albedo_figure.html')