# Capstone Project: Predicting Snow Depth with Machine Learning Models

**Subtitle:** A comprehensive analysis using Random Forest and Bayesian Optimization  
**Author:** Audrey Malloy 

**Date:** April 8th, 2025

## Executive Summary

- **Problem Statement:** Accurately predicting snow depth is critical for environmental monitoring, watershed resource management, and decision-making.
- **Objective:** Develop a machine learning model that predicts snow depth with high accuracy.
- **Key Findings:**
  - Best model achieved an R² score of 0.99 and an RMSE of ~3.00.
  - Feature importance analysis identified month, precipitation accumulation, and air temperature as critical contributors.
- **Recommendations:** 
  1. Implement real-time snow depth predictions.
  2. Use scenario analysis for drought and climate change preparedness.  
  3. Collect additional data for improved model accuracy.

## Introduction

**Problem Statement:**  
Snow depth prediction is essential for sectors like agriculture, hydrology, skiing, and urban planning.

**Why It Matters:**  
Accurate snow depth predictions help in disaster preparedness, resource management, and improving operational efficiency in snow-sensitive industries.

## Methodology

### Data Collection:
- Meteorological data (precipitation, temperature, soil temperature)
- Geographical data (elevation, station names)
---
### Data Summary:
#### General Information
- **Date**: Represents the observation date.
- **Station Name**: Name of the weather station.
- **Elevation**: Elevation of the station above sea level.
- **Latitude**: Latitude coordinate of the station.
- **Longitude**: Longitude coordinate of the station.

#### Snow and Precipitation Metrics
- **Snow Depth**: Measurement of snow depth.
- **Precipitation Accumulation**: Total precipitation measured over a time period.
- **Precipitation Increment**: Incremental change in precipitation.

#### Air Temperature Metrics
- **Air Temperature Average**: Average air temperature over the observation period.
- **Air Temperature Max**: Maximum air temperature recorded.
- **Air Temperature Min**: Minimum air temperature recorded.
- **Air Temperature Observations**: Total air temperature observations recorded.

#### Soil Temperature and Moisture Metrics
- **Soil Temperature Observations**: Observations of soil temperature at the station.
- **Soil Moisture Average**: Average soil moisture over the observation period.
- **Soil Moisture Max**: Maximum soil moisture observed.
- **Soil Moisture Min**: Minimum soil moisture observed.
- **Soil Temperature Average**: Average soil temperature over the observation period.
- **Soil Temperature Max**: Maximum soil temperature recorded.
- **Soil Temperature Min**: Minimum soil temperature recorded.

#### Short-Term Metrics (7-Day Observations)
- **7-Day Air Temperature Average**  
- **7-Day Precipitation Average**  
- **7-Day Snow Depth Average**  
- **7-Day Soil Temperature Average**  
- **7-Day Standard Deviations**:
  - Air Temperature, Precipitation, Snow Depth, and Soil Temperature.
- **7-Day Variances**:
  - Metrics for air temperature, precipitation, snow depth, and soil temperature.
- **7-Day Sums**:
  - Sum metrics for air temperature, precipitation, snow depth, and soil temperature.
- **7-Day Medians**:
  - Median metrics for air temperature, precipitation, snow depth, and soil temperature.
- **7-Day Min and Max**:
  - Minimum and maximum values for air temperature, precipitation, snow depth, and soil temperature.

#### Long-Term Metrics (30-Day Observations)
- **30-Day Air Temperature Average**  
- **30-Day Precipitation Average**  
- **30-Day Snow Depth Average**  
- **30-Day Soil Temperature Average**  
- **30-Day Standard Deviations**:
  - Air Temperature, Precipitation, Snow Depth, and Soil Temperature.
- **30-Day Variances**:
  - Metrics for air temperature, precipitation, snow depth, and soil temperature.
- **30-Day Sums**:
  - Sum metrics for air temperature, precipitation, snow depth, and soil temperature.
- **30-Day Medians**:
  - Median metrics for air temperature, precipitation, snow depth, and soil temperature.
- **30-Day Min and Max**:
  - Minimum and maximum values for air temperature, precipitation, snow depth, and soil temperature.
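The 7-day and 30-day features listed above can be produced with pandas rolling windows grouped by station. The sketch below is a minimal illustration, not the project's actual code: the raw column names (`station`, `date`, `airtemp_avg`, `precip_increment`, `snow_depth`, `soiltemp_avg`) and the `7d_snowdepth_avg`-style naming pattern are assumptions, and row-based windows assume one row per station per day with no gaps.

```python
import pandas as pd

# Hypothetical mapping from feature prefix to raw column name.
SOURCE_COLS = {
    "airtemp": "airtemp_avg",
    "precip": "precip_increment",
    "snowdepth": "snow_depth",
    "soiltemp": "soiltemp_avg",
}
# Feature suffix -> pandas Rolling method name.
STATS = {
    "avg": "mean", "std": "std", "var": "var", "sum": "sum",
    "median": "median", "min": "min", "max": "max",
}


def add_window_features(df: pd.DataFrame, windows=(7, 30),
                        exclude_current_day: bool = False) -> pd.DataFrame:
    """Append rolling-window statistics computed separately for each station."""
    df = df.sort_values(["station", "date"]).copy()
    for prefix, col in SOURCE_COLS.items():
        series = df[col]
        if exclude_current_day:
            # Lag by one row within each station so a window never sees the current day.
            series = series.groupby(df["station"]).shift(1)
        for w in windows:
            # min_periods=w leaves NaN until a full window of history exists.
            rolled = series.groupby(df["station"]).rolling(window=w, min_periods=w)
            for suffix, method in STATS.items():
                values = getattr(rolled, method)()
                # Drop the station level so the result aligns with df's index.
                df[f"{w}d_{prefix}_{suffix}"] = values.reset_index(level=0, drop=True)
    return df
```

Setting `exclude_current_day=True` lags each variable by one day before windowing, so a row's snow depth window features never include that row's own snow depth. The report does not state whether its windows included the current day.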

#### Additional Features
- **Month**: Month of the observation.
- **Year**: Year of the observation.

### Data Preprocessing:
- Handling missing values.
- Scaling numeric features and encoding categorical features as dummy variables.
- Windowing the time series into 7-day and 30-day periods and computing the mean, median, variance, standard deviation, minimum, maximum, and sum within each window.
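The imputation, scaling, and dummy-encoding steps above can be chained with scikit-learn's `ColumnTransformer`. This is a hedged sketch: the column lists, the median and most-frequent imputation strategies, and the choice of `StandardScaler` are assumptions rather than details given in this report.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; substitute the dataset's real feature names.
NUMERIC = ["elevation", "precip_accumulation", "airtemp_avg", "soiltemp_max"]
CATEGORICAL = ["station"]

preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),  # fill gaps with column medians
            ("scale", StandardScaler()),                   # zero mean, unit variance
        ]), NUMERIC),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("dummies", OneHotEncoder(handle_unknown="ignore")),  # one column per station
        ]), CATEGORICAL),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Tiny illustrative frame with gaps in both numeric and categorical columns.
example = pd.DataFrame({
    "elevation": [8750.0, 9000.0, np.nan],
    "precip_accumulation": [10.0, np.nan, 30.0],
    "airtemp_avg": [20.0, 25.0, 30.0],
    "soiltemp_max": [32.0, 33.0, np.nan],
    "station": ["Brighton", "Dry Fork", np.nan],
})
X = preprocess.fit_transform(example)  # 4 scaled numeric columns + 2 station dummies
```

Scaling does not change where a Random Forest splits, but it does matter for the SVM baseline, which is sensitive to feature scale.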



### Exploratory Data Analysis (EDA) Report

Exploratory Data Analysis (EDA) was conducted to gain insights into the snow depth dataset, identify trends, detect anomalies, and evaluate relationships between variables. This process guided feature selection and informed modeling decisions for snow depth predictions.

---

## 1. Variable Analysis

### 1.1 Time Series Analysis
- **Snowfall Months (October-May)**:
  - The dataset was limited to months with consistent snowfall (October-May) to avoid skewness from snow-free periods.
    
 <div style="text-align: center;">
    <img src="time_snowdepth.jpg" alt="Time Series Snow Depth Graph" width="800" height="600">
</div>

---

<div style="text-align: center;">
    <img src="winter_vs_year.jpg" alt="Snow Depth Winter Data vs Full Seasons Data" width="800" height="600">
</div>

- **September's Skewed Distribution**:
  - The month of September exhibited a heavily skewed distribution due to rare snowstorm events.
  - *Action Taken*: September data was excluded for consistency.
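The month filter described above reduces to a simple mask once the month is extracted from the date. A minimal pandas sketch, assuming a `date` column; it also derives the `month` and `year` features listed under Additional Features.

```python
import pandas as pd

SNOW_MONTHS = {10, 11, 12, 1, 2, 3, 4, 5}  # October through May


def filter_snow_season(df: pd.DataFrame) -> pd.DataFrame:
    """Add month/year columns and keep only October-May observations."""
    df = df.copy()
    dates = pd.to_datetime(df["date"])
    df["month"] = dates.dt.month
    df["year"] = dates.dt.year
    # September and the summer months fall outside SNOW_MONTHS and are dropped.
    return df[df["month"].isin(SNOW_MONTHS)].reset_index(drop=True)


example = pd.DataFrame({"date": ["2024-09-15", "2024-10-01", "2024-07-04", "2025-05-31"]})
season = filter_snow_season(example)
```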




### 1.2 Station Data
- **Station Removal**:
  - Stations with insufficient data coverage or short timeframes were removed to ensure the dataset was robust and reliable.
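The report does not state the exact thresholds used to remove stations. As a hedged sketch, a per-station coverage check in pandas might look like the following; `min_rows`, `max_missing_frac`, and the `station`/`snow_depth` column names are hypothetical.

```python
import pandas as pd


def filter_stations(df: pd.DataFrame, min_rows: int = 1000,
                    max_missing_frac: float = 0.2) -> pd.DataFrame:
    """Drop stations with too few rows or too many missing snow depth values."""
    per_station = df.groupby("station")["snow_depth"].agg(
        rows="size",                              # counts rows, including NaN
        missing_frac=lambda s: s.isna().mean(),   # share of missing snow depth
    )
    keep = per_station.index[
        (per_station["rows"] >= min_rows)
        & (per_station["missing_frac"] <= max_missing_frac)
    ]
    return df[df["station"].isin(keep)]


example = pd.DataFrame({
    "station": ["A"] * 5 + ["B"] * 2 + ["C"] * 5,
    "snow_depth": [1.0] * 5 + [2.0] * 2 + [None, None, None, 3.0, 3.0],
})
kept = filter_stations(example, min_rows=5, max_missing_frac=0.2)
```

Here station B is dropped for having too few rows and station C for having 60% missing snow depth.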

### 1.3 Distribution Patterns
- Histograms and scatter plots were used to identify the distribution of variables such as snow depth, air temperature, and precipitation.

---

## 2. Relationships Between Variables and Target

### 2.1 Temperature Observations
- **Analysis of Observation Windows**:
  - Evaluated air temperature metrics over 7-day and 30-day windows.
  - Insights:
    - 7-day metrics demonstrated stronger relevance to snow depth compared to 30-day metrics.

  <div style="text-align: center;">
    <img src="windows_airtemp.jpg" alt="7-day vs. 30-day air temperature comparison" width="800" height="600">
</div>
  

### 2.2 Snow Depth Distribution
- **By Elevation and Stations**:
  - Violin plots and box plots revealed snow depth variability across different stations (e.g., Brighton, Mill-D North, Dry Fork) and elevation ranges.
  - Insights:
    - Higher elevations consistently showed greater snow depth.

  <div style="text-align: center;">
    <img src="station_violinplot.jpg" alt="Station vs Snowdepth Violin Plot" width="800" height="600">
</div>   

### 2.3 Correlation Analysis
#### **Strong Positive Correlations**
- **7-Day Snow Depth Metrics**:
  - `7d_snowdepth_avg (0.988)`, `7d_snowdepth_sum (0.988)`, and `7d_snowdepth_max (0.988)` showed extremely strong correlations with the target variable.
  - *Insight*: Short-term metrics are closely tied to snow depth.
- **30-Day Snow Depth Metrics**:
  - `30d_snowdepth_min (0.923)` and `30d_snowdepth_avg (0.916)` demonstrated strong correlations, though slightly weaker than 7-day metrics.

#### **Moderate Positive Correlations**
- **Precipitation Variables**:
  - Metrics such as `30d_precip_std (0.620)` and `precip_accumulation (0.589)` showed moderate correlations.

#### **Negative Correlations**
- **Air Temperature**:
  - Metrics like `airtemp_avg (-0.279)` showed an inverse relationship with snow depth.
- **Soil Temperature**:
  - Features like `soiltemp_max (-0.454)` exhibited a stronger negative relationship compared to air temperature.

#### **Weak Correlations**
- **Geographical Variables**:
  - `Latitude (0.239)` and `elevation (0.236)` suggested weak-to-moderate influences.
- **Soil Moisture Variables**:
  - Metrics such as `soilmoisture_avg (0.073)` exhibited very weak correlations.
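Correlations like those above can be computed directly with pandas. This sketch assumes Pearson correlation (the pandas default) and a target column named `snow_depth`; the report does not say which correlation coefficient was used.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "snow_depth") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr(method="pearson")[target].drop(target)
    # Rank by absolute value so strong negative correlations appear near the top.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


example = pd.DataFrame({
    "snow_depth": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],
    "b": [5, 4, 3, 2, 1],
    "c": [1, 3, 2, 5, 4],
    "station": ["X"] * 5,  # non-numeric columns are ignored
})
ranked = target_correlations(example)
```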

---

##  Key Insights

1. **Short-Term Metrics Dominate**:
   - Snow depth-specific 7-day metrics had the strongest correlations with the target variable, emphasizing the importance of recent conditions.
2. **Negative Impact of Temperature**:
   - Higher soil and air temperatures consistently correlated negatively with snow depth, reflecting snowmelt effects.
3. **Precipitation’s Role**:
   - While precipitation impacts snow depth, its correlation was moderate compared to snow depth-specific features.
4. **Snow Depth and Elevation**:
   - Snow depth generally increased with elevation across stations, particularly at Brighton and Mill-D North.

---

EDA provided valuable insights into the key factors influencing snow depth, such as short-term snow metrics, soil and air temperature, and elevation. These findings guided feature selection and model development for snow depth prediction.



### Model Selection:
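- **Model:** Random Forest Regressor
- **Baseline Models:** Decision Tree, Support Vector Machine (SVM)
- **Hyperparameter Optimization Methods:** Randomized Search, Grid Search CV, and Bayesian Optimization

### Feature Importance:
The chart below ranks features by their importance in the Random Forest model.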

![Feature Importance Graph](feature_importance.png)
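A ranking like the chart above can be read from a fitted forest's `feature_importances_` attribute. The sketch below uses synthetic data and hypothetical parameter values, not the project's data or tuned model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(X: pd.DataFrame, y, n_estimators: int = 200,
                  random_state: int = 0) -> pd.Series:
    """Fit a Random Forest and return impurity-based importances, highest first."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X, y)
    # feature_importances_ is normalized to sum to 1 across all features.
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)


rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"signal": rng.normal(size=300), "noise": rng.normal(size=300)})
y_demo = 3 * X_demo["signal"] + rng.normal(scale=0.1, size=300)
importances = rank_features(X_demo, y_demo, n_estimators=50)
```

Impurity-based importances can overstate correlated or high-cardinality features (such as the overlapping 7-day and 30-day snow depth metrics); `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.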

### Evaluation Metrics:
- Root Mean Squared Error (RMSE)
- Coefficient of determination (R²)
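For reference, RMSE is the square root of the mean squared prediction error (in the same units as snow depth), and R² is one minus the ratio of the residual sum of squares to the total sum of squares. A small NumPy sketch with made-up numbers:

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(rmse(actual, predicted))  # sqrt(26 / 4) ≈ 2.55
print(r2(actual, predicted))    # 1 - 26 / 500 ≈ 0.948
```

scikit-learn provides equivalent functions as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`.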

## Findings

### Model Performance: Root Mean Squared Error (RMSE)

| **Model**                                | **Root Mean Squared Error (RMSE)** |
|------------------------------------------|------------------------------------|
| Random Forest (Randomized Search)        | 3.00                               |
| Random Forest (Grid Search CV)           | 3.00                               |
| Random Forest (no hyperparameter tuning) | 3.03                               |
| Random Forest (Bayesian Optimization)    | 3.00                               |
| Support Vector Machine (SVM)             | 9.73                               |
| Decision Tree                            | 4.62                               |

---

### **Key Takeaways**
- All Random Forest models show consistently low RMSE values, indicating robust performance; hyperparameter tuning improved RMSE only marginally (3.03 untuned vs. 3.00 tuned).
- **Support Vector Machine (SVM)** has the highest RMSE (9.73), suggesting it struggles more with predicting snow depth accurately.
- The **Decision Tree** model performs moderately well with an RMSE of 4.62 but lags behind the Random Forest models.
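The tuned Random Forest variants in the table differ only in how the hyperparameters were searched. A hedged scikit-learn sketch of the grid and randomized variants follows; the search spaces, the `TimeSeriesSplit` folds, and RMSE scoring are assumptions, not the settings used in this project. Bayesian Optimization would swap in a different search object, for example `BayesSearchCV` from the separate scikit-optimize package, which is not shown here.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, TimeSeriesSplit


def tune_random_forest(X, y, method: str = "grid", n_iter: int = 10, random_state: int = 0):
    """Return (best_model, best_cv_rmse) from a grid or randomized search."""
    base = RandomForestRegressor(random_state=random_state)
    cv = TimeSeriesSplit(n_splits=3)  # assumes rows are in chronological order
    scoring = "neg_root_mean_squared_error"  # scikit-learn maximizes, so RMSE is negated
    if method == "grid":
        search = GridSearchCV(
            base,
            param_grid={
                "n_estimators": [50, 100],
                "max_depth": [None, 10],
                "min_samples_leaf": [1, 5],
            },
            scoring=scoring, cv=cv,
        )
    elif method == "random":
        search = RandomizedSearchCV(
            base,
            param_distributions={
                "n_estimators": randint(50, 200),
                "max_depth": [None, 10, 20],
                "min_samples_leaf": randint(1, 10),
            },
            n_iter=n_iter, scoring=scoring, cv=cv, random_state=random_state,
        )
    else:
        raise ValueError(f"unknown method: {method!r}")
    search.fit(X, y)
    # best_score_ is the mean negated RMSE across folds; flip the sign back.
    return search.best_estimator_, -search.best_score_


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 3))
y_demo = 2 * X_demo[:, 0] + rng.normal(scale=0.1, size=150)
model, cv_rmse = tune_random_forest(X_demo, y_demo, method="random", n_iter=3)
```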

---
**Conclusion**: Random Forest models tuned with Bayesian Optimization, Grid Search CV, or Randomized Search provide the best predictive accuracy (RMSE ~3.00).

<div style="text-align: center;">
    <img src="actual_predicted.jpg" alt="Predicted vs Actual Snowdepth" width="800" height="600">
</div>



### Scenario Analysis:
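1. **Seasonal Trends:** Predicted snow depth differs sharply between winter-like conditions (high precipitation accumulation, cold soil) and summer-like conditions (low precipitation accumulation, warm soil).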
2. **Extreme Weather Events:** High precipitation leads to substantial snow accumulation.
3. **Long-term Air Temperature Effects:** Persistent cold temperatures increase snow depth; warmer conditions result in snow melt.
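4. **Soil Temperature Influences:** Frozen soil was expected to support higher snow accumulation than thawed soil, but in this scenario the predictions changed very little between a maximum soil temperature of -5 and 20 (94.7 vs. 95.0).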
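One way to generate scenario tables like those below is to start from a baseline feature row and override only the inputs a scenario changes. This sketch holds every unlisted feature at its training median, which is an assumption (the report does not say how the other inputs were set), and it assumes all features are already numeric. A linear toy model stands in for the trained Random Forest so the arithmetic is easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def predict_scenarios(model, X_train: pd.DataFrame, scenarios: dict) -> pd.DataFrame:
    """Predict snow depth for scenarios that override a baseline feature row."""
    # Baseline: every feature at its training median (an assumption, not the report's method).
    baseline = X_train.median(numeric_only=True)
    rows = []
    for name, overrides in scenarios.items():
        unknown = set(overrides) - set(X_train.columns)
        if unknown:
            raise KeyError(f"unknown features in scenario {name!r}: {sorted(unknown)}")
        row = baseline.copy()
        for col, value in overrides.items():
            row[col] = value
        rows.append(row.rename(name))
    grid = pd.DataFrame(rows)[X_train.columns]
    grid["predicted_snow_depth"] = model.predict(grid)
    return grid


# Toy model standing in for the trained Random Forest: depth = 2 * precip - soiltemp.
X_toy = pd.DataFrame({"precip_accumulation": [10.0, 20.0, 30.0, 40.0, 50.0],
                      "soiltemp_max": [0.0, 5.0, 0.0, 5.0, 0.0]})
y_toy = 2 * X_toy["precip_accumulation"] - X_toy["soiltemp_max"]
toy_model = LinearRegression().fit(X_toy, y_toy)
result = predict_scenarios(toy_model, X_toy, {
    "cold soil": {"soiltemp_max": -5.0},
    "warm soil": {"soiltemp_max": 20.0},
})
```

With the baseline precipitation at its median of 30, the toy model predicts 65 for the cold-soil scenario and 40 for the warm-soil scenario.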

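### Predicted Snow Depth Across Scenarios

Each table lists only three of the model's inputs (elevation, precipitation accumulation, and maximum soil temperature); the remaining inputs also differ between scenarios, which is why rows with identical listed values (for example, 8750, 30, and 20) can produce different predictions.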

#### Seasonal Trends
| elevation | precip_accumulation | soiltemp_max | predicted_snow_depth |
|-----------|----------------------|--------------|-----------------------|
| 8750      | 50                   | 10           | 131.56                |
| 8750      | 5                    | 65           | 1.84                  |

---

#### Extreme Weather Events
| elevation | precip_accumulation | soiltemp_max | predicted_snow_depth |
|-----------|----------------------|--------------|-----------------------|
| 8750      | 100                  | 5            | 63.34                 |
| 8750      | 1                    | 25           | 14.36                 |

---

#### Long-term Air Temperature Effects
| elevation | precip_accumulation | soiltemp_max | predicted_snow_depth |
|-----------|----------------------|--------------|-----------------------|
| 8750      | 30                   | 0            | 93.56                 |
| 8750      | 30                   | 20           | 35.63                 |

---

#### Soil Temperature Influences
| elevation | precip_accumulation | soiltemp_max | predicted_snow_depth |
|-----------|----------------------|--------------|-----------------------|
| 8750      | 30                   | -5           | 94.72                 |
| 8750      | 30                   | 20           | 95.02                 |

---
<div style="text-align: center;">
    <img src="scenarios.jpg" alt="Snow Depth across Scenarios Graph" width="800" height="600">
</div>


## Recommendations

1. Use the model for real-time snow depth predictions in environmental monitoring systems.
2. Implement scenario analysis for anticipating extreme snowfall events.
3. Enhance data collection systems to include additional features like wind speed and snowfall type.

### Further Research Ideas:
- Expand datasets with new meteorological features.
- Explore advanced machine learning algorithms (e.g., deep learning).

## Conclusion

The developed model achieves high accuracy in snow depth predictions, demonstrating its value in environmental monitoring and disaster management. The findings provide actionable insights and a foundation for future research.

 <div style="text-align: center;">
    <img src="snow_predictionTIME.jpg" alt="Predicted Snow Depth" width="800" height="600">
</div>
