# Avalanche Forecast: Snow Cleaning

In this notebook I load the snow data, clean it as best I can, and then combine it with the weather data.

### Import tools

In [39]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# matplotlib 3.6 renamed this style to 'seaborn-v0_8'; use that name on newer versions
plt.style.use('seaborn')

plt.rcParams['figure.figsize'] = (12, 9)

### Import the data 

In [40]:
snow = pd.read_csv("data/MWACSnowData.csv")
snow.head()

Unnamed: 0,DATE,SEASON,AVY_DANGER,AVY_CHARACTER,WET_DANGER,DRY_DANGER,WET_LOOSE,WET_SLAB,WIND_SLAB,STORM_SLAB,...,HST (CM),HS_CM,SETTLEMENT/MELT,Surf Temp (C),T-10 (C),T-20 (C),CURRENT TEMP,X24_HR_MAX,24HRMAX_SWING,24 HR MIN
0,12/18/10,2010/2011,2.0,DRY,0,2,0.0,0.0,2.0,0.0,...,,40.0,1.0,-6.9,-5.0,-3.7,-9.0,-9.0,,-14.0
1,12/19/10,2010/2011,1.0,DRY,0,1,0.0,0.0,1.0,0.0,...,3.5,42.0,0.0,-8.3,-4.8,-3.2,-14.0,-6.0,3.0,-14.0
2,12/20/10,2010/2011,1.0,DRY,0,1,0.0,0.0,1.0,0.0,...,,40.0,2.0,-10.3,-6.2,-4.6,-13.0,-7.0,-1.0,-13.0
3,12/21/10,2010/2011,3.0,DRY,0,3,0.0,0.0,3.0,0.0,...,,43.0,0.0,-4.4,-4.2,-2.9,-5.0,-5.0,2.0,-14.0
4,12/22/10,2010/2011,2.0,DRY,0,2,0.0,0.0,2.0,0.0,...,,45.0,0.0,-5.3,-2.4,-1.7,-7.0,-4.0,1.0,-7.0


### Take a quick peek at the data

In [41]:
snow.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 55 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   DATE                 1258 non-null   object 
 1   SEASON               1258 non-null   object 
 2   AVY_DANGER           1155 non-null   float64
 3   AVY_CHARACTER        1155 non-null   object 
 4   WET_DANGER           1258 non-null   int64  
 5   DRY_DANGER           1258 non-null   int64  
 6   WET_LOOSE            1155 non-null   float64
 7   WET_SLAB             1155 non-null   float64
 8   WIND_SLAB            1155 non-null   float64
 9   STORM_SLAB           1155 non-null   float64
 10  CORNICE_FALL         1155 non-null   float64
 11  PERSISTENT_SLAB      1155 non-null   float64
 12  DEEP_SLAB            1155 non-null   float64
 13  DRY_LOOSE            1155 non-null   float64
 14  GLIDE_AVALANCHE      1155 non-null   float64
 15  LONG_SLIDING_FALL    1155 non-null   o

We need to decide how to handle the null values. For example, AVY_DANGER is populated in only 1155 of the 1258 rows, and HST (CM) in only 293.
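Before deciding how to treat the gaps, it may help to rank the columns by how much is missing. The sketch below does this on a small made-up frame (`sample` and its values are stand-ins, not the real `snow` data); the same two expressions should work on `snow` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `snow`; the real frame comes from data/MWACSnowData.csv.
sample = pd.DataFrame({
    "AVY_DANGER": [2.0, 1.0, np.nan, 3.0],
    "HST (CM)": [np.nan, 3.5, np.nan, np.nan],
    "HS_CM": [40.0, 42.0, 40.0, 43.0],
})

# Count missing values per column and express them as a share of all rows.
null_summary = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "pct_missing": sample.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)

print(null_summary)
```

Columns near the top of this summary are candidates for dropping or imputing; which one is appropriate depends on how each measurement is used later.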

In [42]:
snow.describe()

Unnamed: 0,AVY_DANGER,WET_DANGER,DRY_DANGER,WET_LOOSE,WET_SLAB,WIND_SLAB,STORM_SLAB,CORNICE_FALL,PERSISTENT_SLAB,DEEP_SLAB,...,HST (CM),HS_CM,SETTLEMENT/MELT,Surf Temp (C),T-10 (C),T-20 (C),CURRENT TEMP,X24_HR_MAX,24HRMAX_SWING,24 HR MIN
count,1155.0,1258.0,1258.0,1155.0,1155.0,1155.0,1155.0,1155.0,1155.0,1155.0,...,293.0,835.0,794.0,832.0,880.0,877.0,882.0,877.0,869.0,879.0
mean,2.105628,0.418919,1.654213,0.290909,0.341126,1.763636,0.212121,0.001732,0.158442,0.003463,...,12.041127,109.429621,20.532746,-6.708954,-6.976591,-5.78301,-8.126757,-3.624344,0.098389,-10.651877
std,0.897287,0.863025,1.245064,0.694279,0.859041,1.209607,0.82268,0.058849,0.620285,0.093024,...,23.869419,54.320875,45.089376,6.573803,5.397727,4.585766,7.2773,8.322385,6.574261,7.875199
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,-15.0,-25.9,-30.5,-24.5,-30.0,-31.5,-23.0,-54.0
25%,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,1.0,70.0,0.0,-11.0,-10.0,-8.7,-13.0,-9.0,-3.0,-16.0
50%,2.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,...,4.0,115.0,2.0,-6.5,-6.25,-5.0,-8.0,-4.0,0.0,-10.0
75%,3.0,0.0,3.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,...,11.5,153.0,6.875,-1.0,-2.6,-2.0,-3.0,2.0,4.0,-5.0
max,5.0,4.0,5.0,4.0,4.0,5.0,5.0,2.0,4.0,3.0,...,196.0,238.0,204.0,16.0,8.0,8.1,14.5,22.0,34.0,14.0


In [43]:

# Convert the DATE strings (e.g. "12/18/10") to a datetime column
snow["date"] = pd.to_datetime(snow["DATE"])

# Drop the original string column
snow = snow.drop(columns=["DATE"])

# Create the year, month, and day columns
snow["year"] = snow["date"].dt.year
snow["month"] = snow["date"].dt.month
snow["day"] = snow["date"].dt.day
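The dates in this file use a month/day/two-digit-year layout, so one option is to state the format explicitly rather than rely on inference. This is a minimal sketch on three values copied from the `head()` output; the `errors="coerce"` choice is my assumption, and it would turn any malformed entry into `NaT` instead of raising.

```python
import pandas as pd

# Sample values copied from the DATE column shown in the head() output.
raw_dates = pd.Series(["12/18/10", "12/19/10", "12/20/10"])

# %m/%d/%y means month/day/two-digit year.
# errors="coerce" (an assumption, not the notebook's setting) yields NaT for bad rows.
parsed = pd.to_datetime(raw_dates, format="%m/%d/%y", errors="coerce")

# Derive the calendar parts with the .dt accessor.
parts = pd.DataFrame({
    "date": parsed,
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "day": parsed.dt.day,
})

print(parts)
```

Checking `parsed.isna().sum()` afterwards shows how many rows failed to parse, if any.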

### Finding the start and end date of each season's snow data

I looked through the data manually to find these dates:

2010/2011 season start: 2010-12-18 end: 2011-05-08

2011/2012 season start: 2012-01-05 end: 2012-05-05

2012/2013 season start: 2012-12-26 end: 2013-05-05

2013/2014 season start: 2013-12-15 end: 2014-04-28

2014/2015 season start: 2014-12-13 end: 2015-05-11

2015/2016 season start: 2016-01-18 end: 2016-04-17

2016/2017 season start: 2016-12-09 end: 2017-04-23

2017/2018 season start: 2017-12-12 end: 2018-04-24

2018/2019 season start: 2018-11-27 end: 2019-04-25

2019/2020 season start: 2019-01-04 end: 2020-03-29
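The same bounds can probably be derived from the data, which gives a way to cross-check the hand-collected dates above. The sketch below groups a small made-up frame by `SEASON` and takes the earliest and latest `date`; the mid-season row is invented for illustration, and on the real data `sample` would be replaced by `snow`.

```python
import pandas as pd

# Hypothetical rows; only the first and last date of each season match the list above.
sample = pd.DataFrame({
    "SEASON": ["2010/2011", "2010/2011", "2010/2011", "2012/2013", "2012/2013"],
    "date": pd.to_datetime(
        ["2010-12-18", "2011-02-01", "2011-05-08", "2012-12-26", "2013-05-05"]
    ),
})

# Earliest and latest observation per season.
season_bounds = sample.groupby("SEASON")["date"].agg(start="min", end="max")

print(season_bounds)
```

Any season whose computed start or end differs from the manual list is worth a second look.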

### Write the cleaned data to a CSV file

In [44]:
snow.to_csv('data/snow_clean.csv')
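One thing to keep in mind when reading the file back: by default `to_csv` also writes the row index, which `read_csv` then shows as an `Unnamed: 0` column, and datetime columns come back as plain strings unless `parse_dates` is given. The sketch below shows a round trip on a tiny made-up frame, using an in-memory buffer in place of the real file; passing `index=False` is an option here, not what the cell above does.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned `snow` data.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2010-12-18", "2010-12-19"]),
    "HS_CM": [40.0, 42.0],
})

# Write without the index so no extra unnamed column appears on reload.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype for the date column.
reloaded = pd.read_csv(buffer, parse_dates=["date"])

print(reloaded.dtypes)
```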