# Project 4: Comparative Predictive Models of Drought

by Daniel Groneberg, Ben McPeek, and Joey Notaro

---

## Notebook Summary

This notebook introduces a collective effort to build predictive machine learning models of drought. It covers why better drought warning models are needed and provides background on how different models have historically aimed to predict droughts. This notebook includes:

* Problem Statement
* Models & Evaluation Metrics
* Background Information
* Notebook Conclusion

---

## Problem Statement

The California Department of Water Resources has hired a team of data scientists to develop comparative machine learning models for predicting upcoming droughts. The effects of global climate change have made it increasingly challenging to predict when a drought may be looming, yet prediction is all the more imperative as extreme weather events occur with increasing frequency.

The team of data scientists wants to know:

**Which predictive model is the best indicator** of whether a drought is likely to occur **given historical weather, climate, and satellite image data?** **Does one model perform better** than the others at predicting drought or drought-like conditions, so that the state of California can forecast droughts more accurately?

The team of data scientists will use three different models, all employing different assumptions regarding risk of drought and all employing slightly different metrics to evaluate effectiveness. The team will then present its findings and make recommendations to the State Water Resources Board about how to best anticipate droughts for emergency preparedness.

---

## Models & Evaluation Metrics

The team of data scientists developed three different models, each with its own evaluation metric, based on specific assumptions drawn from current climate science.

**Model 1** focuses on historical weather factors over time to predict the amount of precipitation to expect in the state of California. In this model, we will create a **time series neural network regression model**. This model assumes that the amount of precipitation is a good proxy for whether the state will experience a drought. Therefore, the predictions of precipitation may also serve as a reasonable prediction of an impending drought. Since this is a regression model, it will also serve as a means to determine which weather and climate features have the greatest impact on precipitation over time and, by proxy, on drought conditions. This model will rely on an **evaluation metric of the coefficient of determination, R<sup>2</sup>**.
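
As a rough illustration of the metric for Model 1, the sketch below computes R<sup>2</sup> as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the precipitation values are hypothetical and are not taken from the project data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up monthly precipitation totals (inches), for illustration only.
observed = [2.0, 0.5, 0.0, 1.5]
predicted = [1.8, 0.7, 0.1, 1.2]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.3f}")
```

A score near 1 means the predictions explain most of the variation in observed precipitation, while a score at or below 0 means the model does no better than predicting the mean.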
    
**Model 2** focuses on satellite image data of land areas affected by wildfires and those not affected by wildfire. In this model, we will create a **convolutional neural network binary classification model**. This model assumes that the image data of an area affected by wildfires is a good proxy for whether an area is drought-prone or not. Therefore, if we can gather satellite image data of different regions, we may use these images to make a reasonable prediction of an impending drought. This model will rely on an **evaluation metric of accuracy**.
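
To make the accuracy metric concrete, here is a minimal sketch that assumes the classifier outputs a probability per image and that a threshold of 0.5 separates the two classes. The function name `accuracy`, the threshold, and the sample labels are illustrative assumptions, not values from the project.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    # Convert predicted probabilities into hard 0/1 class predictions.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = wildfire-affected area, 0 = unaffected area.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6]

acc = accuracy(labels, probs)
print(f"accuracy = {acc:.2f}")
```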

**Model 3** focuses on satellite image data over time showing vegetative coverage of land area, which combines elements of both the first and second models. In this model, we will create a **time series neural network classification model** to predict drought. This model assumes that the normalized difference vegetation index (NDVI) captured in satellite images is a good proxy for whether an area is experiencing arid or wet conditions, and can therefore help predict drought. This model is distinguished from the second model in that it reviews images over a period of time, so we may be able to anticipate future droughts based on past and current changes in the landscape. This model will rely on an **evaluation metric of accuracy**.
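
For reference, NDVI is computed per pixel from the near-infrared (NIR) and red reflectance bands as (NIR - Red) / (NIR + Red). The sketch below applies that formula to a tiny made-up grid. The helper name `ndvi`, the reflectance values, and the choice to return 0.0 when both bands are zero are assumptions made for illustration.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red), in the range [-1, 1]."""
    total = nir + red
    if total == 0:
        # Assumption: treat a pixel with no signal in either band as 0.0.
        return 0.0
    return (nir - red) / total


# Hypothetical 2x2 reflectance grids, not real satellite data.
nir_band = [[0.50, 0.40], [0.30, 0.20]]
red_band = [[0.10, 0.20], [0.30, 0.20]]

ndvi_grid = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]
print(ndvi_grid)
```

Higher values generally indicate denser, healthier vegetation, while values near zero suggest bare or sparsely vegetated ground.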

---

## Background Information

Before collecting and cleaning our multiple data sources, we will explore some of the background on modeling drought and the imperative for these models.

Drought prediction has historically relied on the accurate prediction of two main factors: precipitation and temperature. However, since climate is inherently variable, these factors remain notoriously difficult to predict accurately [source](https://water.unl.edu/drought/droughtprediction#:~:text=Predicting%20drought%20depends%20on%20the,several%20months%20to%20several%20decades). While there are indices, such as the Standardized Precipitation Index (SPI), which many climate scientists agree are valuable tools for making three- or six-month predictions, the precise models used to arrive at the indices continue to be refined using an ensemble of models [source](https://cpo.noaa.gov/Divisions-Programs/Earth-System-Science-and-Modeling/Modeling-Analysis-Predictions-and-Projections-MAPP/MAPP-Task-Forces/Drought-Task-Force-I/How-Research-Is-Improving-How-We-Monitor-and-Predict-Drought/Drought-Prediction). In addition to the difficulty of predicting droughts with various models and indices, the U.S. Drought Monitor (USDM) rates the severity of dryness and drought across the U.S. at five increasing levels of intensity, from D0 (abnormally dry) to D4 (exceptional drought), so predictive modeling is not simply a matter of predicting whether a drought will occur but also its likely severity and duration [source](https://droughtmonitor.unl.edu/). The image below is a recent USDM drought map for August 15, 2023:


![drought map](./images/current_usdm.png)


To make matters worse, global climate change is making an already unpredictable modeling scenario even more challenging. Take just some of the more immediate impacts on California as an example. Higher average global temperatures result in more evaporation from land masses, drying out vegetation and soils [source](https://www.c2es.org/content/drought-and-climate-change/). These intensely dry conditions are part of the reason that this group of data scientists has chosen to model both wildfire image classification and soil moisture time series to predict drought. Although neither wildfire nor soil moisture is a perfect predictor of drought, we can see a sizable overlap between these factors and drought. The map below shows active, large wildfires as of August 23, 2023, occurring in areas close to intense drought:


![wildfire map](./images/wildfire_map.png)


We believe that this multi-pronged approach, which predicts future precipitation, predicts future soil moisture, and classifies satellite images of land areas as resembling a wildfire area or not, may be at the forefront of machine learning models that give us better insight into future droughts.

---

## Notebook Conclusion

In this notebook, we introduced the problem statement presented to a team of data scientists building predictive models of drought. We discussed why an ensemble of models may be necessary to find more refined ways to predict droughts in an increasingly unpredictable climate future.

In Part 2, we begin exploring Model 1, the time series model to predict precipitation.