# FEMA Disaster Predictor and Capstone
*Year after year, disasters strike the United States, causing billions of dollars in damage and killing thousands of people. When a disaster strikes, a state can request a FEMA declaration covering the affected counties under one of three declaration types: Major Disaster (DR), Emergency (EM), and Fire Management (FM). A declaration allows the affected areas to receive federal funds and bring in additional relief. This project builds a prediction and visualization system for the number of disasters declared in a given season in the United States.*
## 1. Data
The data comes from a single source. Since 1953, FEMA has published data on every declared disaster.

[FEMA Data](https://www.fema.gov/openfema-data-page/disaster-declarations-summaries-v2)


## 2. Method
Create a weighted ensemble of the highest-performing models to predict the number of disasters per season.
1. Gather FEMA data
2. Clean data into a usable form
3. Create features to be used in model building
4. Build models 
5. Tune models
6. Ensemble models in a weighted voting regressor model
7. Determine the highest-performing model
8. Visualize data in a Tableau dashboard
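
Step 6 can be sketched with scikit-learn's `VotingRegressor`, which combines its members' predictions as a weighted average. This is a minimal sketch and not the project's code. The two member models and the 3:1 weights are hypothetical placeholders for whichever tuned models score highest, and `build_weighted_ensemble` is an illustrative helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression


def build_weighted_ensemble(weights=(3, 1), random_state=42):
    """Return a weighted voting regressor.

    The member models and default weights are hypothetical; in practice
    they would be the tuned, highest-performing models.
    """
    return VotingRegressor(
        estimators=[
            ("random_forest", RandomForestRegressor(random_state=random_state)),
            ("linear", LinearRegression()),
        ],
        # Each prediction is the weighted mean of the members' predictions.
        weights=list(weights),
    )


# Usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] + 1
ensemble = build_weighted_ensemble().fit(X, y)
predictions = ensemble.predict(X[:3])
```

Because the output is a weighted mean, a weaker member still pulls every prediction toward its own errors unless its weight is set to zero.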

## 3. Gather FEMA Data
[Data Wrangling Notebook](https://github.com/ColemanZ/Springboard/blob/main/Disaster%20Capstone%20Data%20Wrangling.ipynb)

All data came from the FEMA source listed above, delivered as a single CSV file containing a row for each d
* **Problem 1** Too many states
    The raw data listed 59 states. The extra entries turned out to be US territories, which are spread across the world. These territories also have different thresholds from the states for what counts as a disaster.
* **Solution 1** Drop territories
    After some deliberation I decided that this project would focus on the 50 US states and DC only.

* **Problem 2** Bad features
    Several features had issues. Three features contained null values, and many other columns held no useful information because they either repeated information found elsewhere or were unique identifiers.
* **Solution 2** Drop bad features
    I decided to drop every column that contained repeated information, unique identifiers, or null values.
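
The two solutions above amount to a row filter and a column drop. The following is a minimal pandas sketch. It assumes the CSV has a `state` column holding two-letter postal codes, and that the caller supplies the list of redundant or identifier columns. `clean_declarations` and the example column names are hypothetical.

```python
import pandas as pd

# Two-letter codes for the 50 states plus DC (assumes the `state` column
# uses postal abbreviations).
STATES_AND_DC = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY", "DC",
}


def clean_declarations(df: pd.DataFrame, redundant_cols: list) -> pd.DataFrame:
    """Keep only the 50 states + DC and drop uninformative columns."""
    # Solution 1: drop territories by keeping only state/DC rows.
    out = df[df["state"].isin(STATES_AND_DC)].copy()
    # Solution 2a: drop columns judged redundant or identifiers (caller's list).
    out = out.drop(columns=redundant_cols, errors="ignore")
    # Solution 2b: drop any remaining column that contains a null value.
    out = out.dropna(axis=1, how="any")
    return out
```

Filtering rows before dropping null columns means a column whose nulls appear only in territory rows survives. Reverse the order if such columns should go too.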
    

## 4. Clean Data Into a Usable Form
[Exploratory Data Analysis Notebook](https://github.com/ColemanZ/Springboard/blob/main/Disaster%20Project%20EDA.ipynb)

At this point the data needed very little cleaning. After reducing the number of incident types, I made some basic visualizations to get an understanding of the data.

* **Problem 1** Too many incident types
    Incident types define what kind of disaster took place. The raw data had nearly 35 incident types, many of them very similar to others. Because the incident type is chosen by the state requesting the declaration, there was no consistent standard across states for what counts as one incident type versus another. For instance, the difference between a blizzard and a winter storm might come down only to which state declared the disaster.
* **Solution 1** Consolidate incident types
    To limit the number of incident types and make the eventual visualizations more impactful, I grouped similar incident types together. This brought the total number of incident types down to 21.
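
The consolidation amounts to a lookup table from each raw incident type to its group. The following mapping is hypothetical and only illustrates the idea. The project's actual grouping, which reduced nearly 35 types to 21, is not reproduced in this README. `Series.replace` with a dict leaves any unlisted type unchanged.

```python
import pandas as pd

# Hypothetical grouping for illustration only; not the project's mapping.
INCIDENT_GROUPS = {
    "Snowstorm": "Winter Storm",
    "Severe Ice Storm": "Winter Storm",
    "Freezing": "Winter Storm",
    "Typhoon": "Hurricane",
    "Tropical Storm": "Hurricane",
}


def consolidate_incident_types(incidents: pd.Series) -> pd.Series:
    """Map similar incident types onto one label; unlisted types pass through."""
    return incidents.replace(INCIDENT_GROUPS)
```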
    

## 5. Create Features to be used in Model Building
[Preprocessing and Training Data Development Notebook](https://github.com/ColemanZ/Springboard/blob/main/Disaster%20Capstone%20Pre-Processing.ipynb)

Several features were created for use in model building. These included an "incident length" feature. Because the project predicts the number of disasters in a season, each disaster also needed the month and season in which it took place. The most difficult step was building a loop that counted the number of disasters in a given season. Finally, several features had to be encoded before they could be used in model building. One-hot encoding worked for declaration type, which has only three unique values. CatBoost encoding was used for incident type and state.
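
The following is a sketch of the month and season features and the per-season count. It rests on several assumptions that the README does not confirm:

* Seasons are meteorological (December to February is winter).
* The dates come from an `incidentBeginDate` column.
* A season's count is the number of distinct `disasterNumber` values, since one disaster can span several county rows.

The sketch uses a pandas `groupby` in place of the loop described above, and both function names are hypothetical.

```python
import pandas as pd

# Month -> meteorological season (an assumption about the season boundaries).
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def add_month_and_season(df: pd.DataFrame, date_col: str = "incidentBeginDate") -> pd.DataFrame:
    """Add year, month, and season columns derived from a date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # Simplification: December stays in its own calendar year, so its
    # winter is grouped with January/February of that same year.
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["season"] = out["month"].map(SEASON_BY_MONTH)
    return out


def disasters_per_season(df: pd.DataFrame, id_col: str = "disasterNumber") -> pd.DataFrame:
    """Count distinct disasters per (year, season) with a groupby."""
    return (
        df.groupby(["year", "season"])[id_col]
        .nunique()
        .reset_index(name="disaster_count")
    )
```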

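
The encoding step can be sketched as follows. `pd.get_dummies` handles the one-hot encoding. For CatBoost encoding, the sketch hand-rolls the ordered target statistic so it runs without extra packages. Each row is encoded using only the target values of earlier rows in the same category, smoothed toward the overall mean. The `category_encoders` package provides a `CatBoostEncoder` class for the same purpose. The README does not say which implementation the notebook used. The column names, the target column, and the smoothing weight `a` are assumptions.

```python
import pandas as pd


def catboost_encode(categories: pd.Series, target: pd.Series, a: float = 1.0) -> pd.Series:
    """Ordered target statistic in the style of CatBoost encoding.

    Row i becomes (sum of target over earlier rows with the same category
    + a * prior) / (number of those earlier rows + a), where prior is the
    overall target mean. Rows are processed in their current order.
    """
    prior = target.mean()
    grouped = target.groupby(categories)
    prev_sum = grouped.cumsum() - target  # exclude the current row's target
    prev_count = grouped.cumcount()       # earlier rows in the same category
    return (prev_sum + a * prior) / (prev_count + a)


def encode_features(df: pd.DataFrame, target_col: str) -> pd.DataFrame:
    """One-hot encode declarationType; CatBoost-encode incidentType and state.

    In a real pipeline, compute the encoding on training rows only so the
    target does not leak into the test set.
    """
    out = pd.get_dummies(df, columns=["declarationType"], prefix="decl")
    for col in ["incidentType", "state"]:
        out[col] = catboost_encode(out[col], out[target_col])
    return out
```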

## 6. Build Models
[Modeling Notebook](https://github.com/ColemanZ/Springboard/blob/main/Disaster%20Capstone%20Modeling.ipynb)
After the data was cleaned and the features were created, I used PyCaret to get a sense of which models to build first. After several runs, I chose four models to build, tune, and analyze.

**The Models Built**
* Linear Regression
* Random Forest Regression
* Random Forest Classifier
* XGBoost
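
PyCaret automates this comparison. The same idea can be sketched directly in scikit-learn by fitting each of the four model types on one train/test split and comparing holdout RMSE. This is an illustrative sketch with default hyperparameters, not the notebook's code. XGBoost is included only if the `xgboost` package is installed, and `build_models` and `holdout_rmse` are hypothetical helpers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

try:  # XGBoost is optional so the sketch runs without it installed.
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None


def build_models(random_state=42):
    """The four model types from the list above, with default settings."""
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest Regression": RandomForestRegressor(random_state=random_state),
        # A classifier treats each integer disaster count as its own class.
        "Random Forest Classifier": RandomForestClassifier(random_state=random_state),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(random_state=random_state)
    return models


def holdout_rmse(X, y, test_size=0.2, random_state=42):
    """Fit every model on one split and return its RMSE on the held-out rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        scores[name] = float(np.sqrt(mse))
    return scores
```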


## 7. Tune Models
After building the models, I tuned each of them. Some changed slightly after tuning; others did not. The two random forest models performed best:

**Random Forest Regression Model**
* **RMSE**: 19.38
* **MSE**: 375.51
* **Cross-validation score**: 0.9947

**Random Forest Classifier Model**
* **RMSE**: 18.11
* **MSE**: 328.13
* **Coefficient of determination (R²)**: 0.9832
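
The following is a sketch of tuning and scoring one model with scikit-learn's `GridSearchCV`. The parameter grid is hypothetical. The README does not say which metric its "Cross-validation score" reports, so the sketch assumes mean 5-fold R². The returned values also show that RMSE is the square root of MSE, which is how each pair of numbers above relates (for example, √375.51 ≈ 19.38). `tune_and_score` is an illustrative helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical search space; the notebook's actual grid is not given here.
PARAM_GRID = {"n_estimators": [50, 100], "max_depth": [None, 10]}


def tune_and_score(X_train, y_train, X_test, y_test, random_state=42):
    """Grid-search a random forest, then report holdout and CV metrics."""
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        PARAM_GRID,
        scoring="neg_root_mean_squared_error",
        cv=5,
    )
    search.fit(X_train, y_train)
    best = search.best_estimator_
    preds = best.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    return {
        "best_params": search.best_params_,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "MSE": float(mse),
        "R2": float(r2_score(y_test, preds)),
        # Assumed meaning of "cross-validation score": mean 5-fold R².
        "CV R2": float(cross_val_score(best, X_train, y_train, cv=5, scoring="r2").mean()),
    }
```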

## 8. Evaluate Models and Conclusion
Given how significantly better the random forest models performed, there was no need to build a weighted voting regressor. Had one been built, the strong performance of the random forest regression model would likely have been dragged down by the less successful models. The Random Forest Regressor model was 
## 9. Acknowledgements
Thank you to Raghunandan Patthar, my mentor at Springboard for his excellent advice throughout this project.