
## Milestone Capstone:  
# Utilizing Meteorological Data with Supervised Learning to Predict Snowfall Amounts at a Ski Resort
**By Dustin Rapp**  

--  
--  

## Introduction
***
Complex terrain in mountainous areas often makes snowfall difficult to predict with prognostic weather models, especially on specific slopes or mountainsides where extremely localized air flows complicate forecasts.  With accurate snow forecasts, ski resorts can optimize their snowmaking, grooming, and snow removal operations. An accurate short-term snowfall forecast, even for a small segment of the mountain, would likely assist a ski resort's operations.  The goal of this study is to get a glimpse into the potential of using supervised learning techniques with freely available surface meteorological data to predict snowfall on a slope at Copper Mountain Ski Resort in Colorado.  Copper Mountain Ski Resort may be especially interested in such predictive models because of its unique access to government-funded meteorological data recorded at or near the resort.  

The purpose of this report is to discuss the data to be utilized and to make a general assessment of how well a supervised learning model might perform. 


## Data
The Copper Mountain ski resort is unique in that there is an official Natural Resources Conservation Service (NRCS) SNOTEL monitoring station on the north slope of Copper Mountain, where many popular ski runs are located. SNOTEL (SNOwpack TELemetry) is an automated system of snowpack and related climate sensors in the Western United States. In addition to reporting hourly snow depth, the station also records temperature.  The Copper Mountain ski resort also has a Colorado Department of Transportation Automated Weather Observing System (AWOS), which monitors a suite of hourly variables near the top of Copper Mountain.  Additionally, a National Weather Service Automated Surface Observing System (ASOS) station is located in Leadville, CO, approximately 30 km to the southwest of Copper Mountain.  In this report, the SNOTEL site is referred to as SNOTEL, the AWOS as KCCU, and the ASOS as KLXV.

These three stations provide a comprehensive dataset of surface meteorological variables in the vicinity of the Copper Mountain Resort.   **Table 1** lists the surface-level meteorological variables and the data source for each station. Hourly data for each station were downloaded for the years 2005-2017 from online sources.  A map showing the Copper Mountain SNOTEL site and the meteorological sites used in this assessment is shown in **Figure 1**.  
  

<div style="text-align: center"> **Table 1 - Meteorological Variables by Station**  </div> 

|**Station ID/Num** |**Station Type**   |**Elevation (ft)** |**Variables**     | **Data Source**   |
|:-----------------:|:-----------------:|:----------------:|:--------------:|:-----------------:|
| **SNOTEL 415**    | SNOTEL    | 10550     | Temperature <br> Snow Depth       |   Natural Resources <br> Conservation Service <br> (www.NRCS.gov) |
| **KLXV**          | ASOS      | 12075     | Wind Speed <br> Wind Direction <br> Cloud Cover| National Climatic <br> Data Center <br> ISHD Lite format <br> (www.NCDC.gov) |
| **KCCU**          | AWOS      | 10550     | Dewpoint <br> Wind Speed <br> Wind Direction <br> Cloud Cover| National Climatic <br> Data Center <br> ISHD Lite format <br> (www.NCDC.gov) |

  

***  



  ***
**Figure 1 - Map of SNOTEL and KCCU Station Locations at Copper Mountain Ski Resort**  

 
KLXV site relative to Copper Mountain Ski Resort            |  Relative locations of the SNOTEL and KCCU sites at the Copper Mountain Ski Resort |
:---------------------------------------------------------- |:----------------------------------------------------------------------------------:|
![](figs/KLXV.png)                                          |  ![](figs/KCCU_and_SNOTEL_map.png)                                                   


  

  






## Data Wrangling and Cleaning

### Data Organization
Hourly surface data from each station were downloaded, organized, and combined into a single timeseries dataframe with UTC timestamps.  

The following cleanup steps were performed on this dataset:

 - While the KCCU and KLXV datasets were already in UTC, the NRCS dataset was in local time and required conversion to UTC.   
 -  The KCCU and KLXV datasets are in the Integrated Surface Hourly Data (ISHD) format and required some manipulation (e.g. division by 10) to get values into typical units. 
 - Missing values (e.g. 9999 values) were translated to NaN values.
 -  Missing data for all variables were linearly interpolated for periods where 3 hours or less of data were missing. 
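
The cleanup steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the code used in the study: the function names, the fixed UTC-7 offset for the local-time data, and the inclusion of -9999 as a second missing-value code are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def local_to_utc(index, utc_offset_hours=-7):
    """Shift a naive local-time DatetimeIndex to UTC.

    Assumes a fixed offset (here Mountain Standard Time, UTC-7); the
    offset that applies to the NRCS data is not stated in the report.
    """
    return (index - pd.Timedelta(hours=utc_offset_hours)).tz_localize("UTC")


def clean_ishd(df, scale=10.0, missing_codes=(9999, -9999)):
    """Turn missing-value codes into NaN, then rescale to physical units."""
    return df.replace(list(missing_codes), np.nan) / scale


def fill_short_gaps(series, max_gap=3):
    """Linearly interpolate only gaps of max_gap consecutive NaNs or fewer."""
    is_nan = series.isna()
    # Give every run of consecutive NaN / non-NaN values its own label.
    run_id = (is_nan != is_nan.shift()).cumsum()
    # Length of the NaN run each value belongs to (0 for observed values).
    gap_len = is_nan.astype(int).groupby(run_id).transform("sum")
    filled = series.interpolate(method="linear", limit_area="inside")
    # Keep interpolated values only where the gap was short enough.
    return filled.where(~is_nan | (gap_len <= max_gap))
```

Masking by gap length is a deliberate choice here: calling `interpolate(limit=3)` on its own would still fill the first three hours of a longer gap, which is not what the step describes.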

The data were plotted to see if there were any extreme values warranting removal. It was noted that some of the KCCU data (especially temperature) did not demonstrate as much diurnal variation as the KLXV station.  These data are considered suspect but were not removed from the dataset.  A more robust quality control of this dataset is outside the scope of this preliminary study, but should be considered for future studies.

A small amount of anomalous data was observed in the SNOTEL snow depth data and was removed.  This included physically unrealistic readings (e.g. spikes in the snow depth data, or snow depth increases reported when temperatures did not support snowfall) as well as extreme negative values. 
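
One simple way to flag readings of this kind is to mask negative depths and values that stray far from a short rolling median. The sketch below is only an illustration of that idea; the function name and the `max_dev` and `window` values are hypothetical and are not the criteria used in the study.

```python
import pandas as pd


def mask_depth_outliers(depth, max_dev=10.0, window=5):
    """Set negative readings and isolated spikes in snow depth to NaN.

    max_dev (inches) and window (hours) are hypothetical tuning values.
    """
    # Negative snow depth is physically impossible, so drop it first.
    cleaned = depth.where(depth >= 0)
    # A centered rolling median follows the snowpack but ignores brief spikes.
    baseline = cleaned.rolling(window, center=True, min_periods=1).median()
    spike = (cleaned - baseline).abs() > max_dev
    return cleaned.mask(spike)
```

A temperature-based check (for example, discarding depth increases while the air temperature is well above freezing) could be layered on top in the same way.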



### Additional Calculations

**Pressure**  
Because changes in pressure are often a predictive indicator of weather conditions (e.g. pressure drops often accompany strong storm systems), a twelve-hour pressure change variable was added to the dataset.  This was calculated by subtracting the 00:00 observation from the upcoming 12:00 observation.
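
As a rough sketch, a 12-hour pressure change can be computed from an hourly, time-indexed pressure series as shown here. The function name is hypothetical, and stamping the change at the later of the two observations is an assumption; the report does not say which timestamp the value was assigned to.

```python
import pandas as pd


def pressure_change_12h(pressure):
    """Pressure at each timestamp minus the pressure 12 hours earlier.

    Assumes a Series with a unique DatetimeIndex. The result is stamped
    at the later observation, which is an assumption of this sketch.
    """
    # Shifting the index forward 12 h lines each value up with the
    # observation taken 12 hours after it.
    earlier = pressure.shift(freq=pd.Timedelta(hours=12))
    return (pressure - earlier).reindex(pressure.index)
```

Shifting by time rather than by a fixed number of rows keeps the difference correct even when some hourly rows are absent from the index.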

**Snowfall**    
As the SNOTEL data only includes snow depth data instead of snowfall data, snowfall was calculated based on changes in snow depth. Due to the sensitivity of the SNOTEL snow depth measurement sensors to external forces (e.g. debris, air pressure), snow depth data from the SNOTEL site appeared noisy for smaller snowstorms (i.e. less than 3 inches). To minimize the small-scale perturbations found in the data, 12-hour snowfall totals were estimated at 00:00 UTC and 12:00 UTC, and only 12-hr periods with snowfall greater than or equal to 3 inches were considered a snowfall event.  The snowfall data was then added to the meteorological dataframe.   
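
The snowfall estimate described above can be illustrated with a short pandas sketch. The function name is hypothetical, and the code assumes an hourly snow depth series in inches with a UTC `DatetimeIndex` and observations on the hour; it is not the study's own implementation.

```python
import pandas as pd


def snowfall_events(depth, min_snowfall=3.0):
    """12-hr snowfall (inches) at 00:00/12:00 UTC from hourly snow depth.

    Periods with less than min_snowfall inches of depth increase
    (including settling or melt) are returned as NaN.
    """
    # Keep only the 00:00 and 12:00 observations.
    synoptic = depth[depth.index.hour.isin([0, 12])]
    # Depth change over the preceding 12 hours.
    earlier = synoptic.shift(freq=pd.Timedelta(hours=12))
    change = (synoptic - earlier).reindex(synoptic.index)
    # Only increases of at least min_snowfall count as snowfall events.
    return change.where(change >= min_snowfall)
```

In this sketch each event is stamped at the end of its 12-hour window, so pairing it with predictors observed at the start of the window requires shifting one of the two series.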

Because only 00:00 and 12:00 snowfall observations were utilized in the analysis, all variables in the meteorological dataframe were reduced from hourly observations to twelve hour observations.  A new dataframe was created utilizing only 00:00 and 12:00 observations.

A table showing the total number of snowfall events, along with the mean, median, max, and standard deviation of snowfall for each year, is found in **Table 2**.  A timeseries plot showing the snow depth, along with these snowfall events, is found in **Figure 2**.
  

***


**Table 2 - Annual Statistics of 12-hr Snowfall Events (>=3")**  

|   Year    |  Number 12hr Snowfall Events >=3  |  Mean  |  Median  |  Max  |  Std Deviation  |  %Missing SnowDepth  |
|  :----:   | :-------------------------------: | :----: | :------: | :---: | :-------------: | :------------------: |
|   2006    |                26                 |  4.8   |    4     |  11   |      1.87       |         0.69         |
|   2007    |                29                 |  3.9   |   3.3    |  6.5  |      1.17       |         0.69         |
|   2008    |                27                 |  4.5   |   3.7    |   8   |      1.85       |         0.69         |
|   2009    |                27                 |  4.3   |    4     |  13   |      1.92       |         0.69         |
|   2010    |                30                 |  4.6   |    4     |   9   |      1.75       |         0.69         |
|   2011    |                32                 |  4.3   |    4     |   7   |      1.38       |         0.69         |
|   2012    |                14                 |  5.1   |    4     |  10   |      2.29       |         0.69         |
|   2013    |                32                 |  4.3   |    4     |  12   |      1.78       |         0.69         |
|   2015    |                23                 |  4.2   |    4     |   8   |      1.24       |         0.68         |
|   2016    |                32                 |  4.9   |    4     |  16   |      2.98       |         0.69         |
|   2017    |                29                 |  4.6   |    3     |  16   |      2.81       |         0.69         |
| 2013-2017 |                338                |  4.6   |    4     |  16   |      2.12       |         0.69         |


***

**Figure 2 - Timeseries of snow depth and snowfall events**  
 


  
<p float="left">
  <img src="figs/snowdepth_snowfall.png" width="1500" />
</p>
  



  
## Linear Regression Analysis  

To assess snowfall prediction potential with an Ordinary Least Squares (OLS) model, a linear regression analysis was performed on each feature in the dataset.  For each candidate variable, the data were plotted against the snowfall amount that occurred over the following 12 hours.    Slope, standard error, correlation coefficient (R value), and p value were calculated for each variable. 
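
A per-feature regression of this kind might look like the sketch below, which uses `scipy.stats.linregress`. The function and column names are hypothetical, and the code assumes that the features and the snowfall series share one regular 12-hourly index with snowfall stamped at the end of each 12-hour window, so that shifting by one row gives the snowfall over the following 12 hours.

```python
import pandas as pd
from scipy import stats


def univariate_ols(features, snowfall):
    """Regress next-12-hr snowfall on each feature separately."""
    # Row t of target holds the snowfall that fell between t and t + 12 h.
    target = snowfall.shift(-1)
    rows = {}
    for name in features.columns:
        pair = pd.concat([features[name], target], axis=1, keys=["x", "y"]).dropna()
        if len(pair) < 3 or pair["x"].nunique() < 2:
            continue  # not enough data to fit a line
        fit = stats.linregress(pair["x"], pair["y"])
        rows[name] = {
            "slope": fit.slope,
            "std_err": fit.stderr,
            "r_value": fit.rvalue,
            "p_value": fit.pvalue,
            # Share of snowfall events for which this feature is missing.
            "pct_missing": 100 * features.loc[target.notna(), name].isna().mean(),
        }
    table = pd.DataFrame.from_dict(rows, orient="index")
    # Strongest relationships first, regardless of sign.
    return table.sort_values("r_value", key=lambda s: s.abs(), ascending=False)
```

Because each feature is fit on its own, the p values are not adjusted for testing many variables at once, which is worth keeping in mind when reading a results table of this kind.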

Results from this analysis are shown in **Table 3**.  The data are sorted by the magnitude of the R value. Note that the variables with the best predictive capabilities are dewpoint, KCCU wind speed, and 12-hr pressure change. Though cloud cover also has relatively high R values, its p values and the amount of missing data are also very high. While the R values are not notably high (all are less than 0.2), the p values for dewpoint and 12-hr pressure change are less than 0.05, indicating that there may be some predictive skill with an OLS model.  It is also important to note that cloud cover is a categorical variable (integer values from 0 to 8) and wind direction is a circular variable (values range from 0 to 360 degrees), so neither lends itself well to linear regression.  These two variables should be treated cautiously in a linear regression analysis, but will be considered in an OLS analysis as some predictive properties may be present.
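
To illustrate why a circular variable is awkward in a linear model: directions of 359 and 1 degrees are nearly identical physically but are 358 units apart numerically. One common workaround, shown here only as a hedged sketch and not as something done in this study, is to replace the angle with its sine and cosine components. The function name is hypothetical.

```python
import numpy as np


def wind_components(direction_deg):
    """Encode wind direction (degrees) as sine and cosine components."""
    rad = np.deg2rad(np.asarray(direction_deg, dtype=float))
    return np.sin(rad), np.cos(rad)


# 359 and 1 degrees end up close together in the encoded space.
sin_part, cos_part = wind_components([359.0, 1.0])
separation = np.hypot(sin_part[0] - sin_part[1], cos_part[0] - cos_part[1])
```

Each component can then be treated as an ordinary numeric feature, at the cost of turning one variable into two.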


***

**Table 3 - Output statistics from Linear Regression Analysis [1]**  

| Variable                                       |       Max |       Min |          Mean |   Slope |   Std Error |   R Value |   P-value |   % Missing |
|:------------------------------------------------|-----------|-----------|---------------|---------|-------------|-----------|-----------|-------------|
| KCCU Dewpoint (deg C)                          |      0    |    -27    |     -9.7  |   0.085 |       0.03  |     0.171 |     0.005 |       21.9 |
| KCCU Cloud Cover (oktas)                       |      8    |      0    |      7.4  |   0.145 |       0.085 |     0.142 |     0.089 |       57.1 |
| KLXV Dewpoint (deg C)                          |      2.8  |    -22.8  |     -8.1  |   0.074 |       0.031 |     0.134 |     0.019 |       9.5 |
| KLXV Cloud Cover (oktas)                       |      8    |      0    |      6.8  |   0.088 |       0.059 |     0.122 |     0.136 |       55.0 |
| KLXV 12-hr Pressure Difference (hPa)           |     13.35 |    -20.2  |     -3.0  |  -0.051 |       0.024 |    -0.122 |     0.033 |       10.4 |
| KCCU Wind Speed (m/s)                          |     20.1  |      0    |      7.7  |   0.061 |       0.034 |     0.112 |     0.073 |       24.3 |
| KCCU Temperature (deg C)                       |      7    |    -21    |     -4.7  |   0.035 |       0.031 |     0.069 |     0.261 |       21.3 |
| SNOTEL Temperature (deg C)                     |      7.6  |    -18.7  |     -3.6  |   0.036 |       0.03  |     0.066 |     0.226 |       0    |
| KLXV Temperature (deg C)                       |     13.3  |    -17.2  |     -3.0  |   0.03  |       0.026 |     0.065 |     0.257 |       9.5 |
| KLXV Wind Direction (deg)                      |    360    |      0.0  |    184.1  |  -0.001 |       0.001 |    -0.055 |     0.336 |       9.5 |
| KCCU Wind Direction (deg)                      |    360    |      0.0  |    236.5  |   0.002 |       0.002 |     0.044 |     0.48  |       24.3 |
| KLXV Pressure (hPa)                            |   1028.5  |    983.3  |   1005.5  |  -0.011 |       0.015 |    -0.043 |     0.457 |       10.1 |
| KLXV Wind Speed (m/s)                          |     13.4  |      0.0  |      3.7  |   0.005 |       0.054 |     0.005 |     0.926 |       9.5 |

[1] Feature statistics are calculated using only the 12-hr values that have a matching 12-hr snowfall value.

## Conclusion
While not large, there are some statistically significant relationships between some meteorological variables and snowfall amount when snowfall does occur.  It is anticipated that a very simple Ordinary Least Squares model using only meteorological measurements, especially dewpoint and 12-hr pressure change, may have some ability to predict 12-hr snowfall. Snowfall is recognized as a very complex variable to forecast, and a simple OLS model may have limitations. Snowfall amounts can depend on a variety of factors including snow water equivalent, temperature during crystal formation in the upper atmosphere, and melting/freezing activity as snowflakes fall to the surface.  Upper air data may be helpful in overcoming some of these complexities and limitations, and may be integrated into this analysis.  Despite the complexities of snowfall prediction, an OLS model is a good starting point for understanding how data science techniques could be utilized. 

