<img src="../img/intro.png">

---
## Problem Statement:


#### **Can we predict COVID-19 severity using demographic data?**
<br>

### Table of Contents
- [Models Summary](#Models-Summary)
- [Key Challenges](#Key-Challenges)
- [Interactive Demo](#Interactive-Demo)
- [Future Work](#Future-Work)

# Models Summary

We used regression models to predict COVID cases per 100 people and classification models to predict COVID severity.
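
As a rough illustration of these two targets, the sketch below computes cases per 100 people for a county and bins that rate into three severity classes. The function names and the cutoff values are hypothetical placeholders, not the thresholds used in the notebooks.

```python
def cases_per_100(total_cases: int, population: int) -> float:
    """Regression target: confirmed cases per 100 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100 * total_cases / population


def severity_class(rate: float, cutoffs=(1.0, 3.0)) -> int:
    """Classification target: 1 (low), 2 (medium), or 3 (high).

    The cutoffs are illustrative assumptions only.
    """
    low, high = cutoffs
    if rate < low:
        return 1
    if rate < high:
        return 2
    return 3


# Made-up county: 2,500 cases among 100,000 residents
rate = cases_per_100(total_cases=2_500, population=100_000)
print(rate, severity_class(rate))
```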

We trained models on the combined data for all 5 states as well as on each state's data separately. Below is a list containing all of our modeling notebooks.

#### Modeling Notebooks
- [All 5 states modeling](../code/06_modeling_5_states.ipynb)
- [California data modeling](../code/07_modeling_ca.ipynb)
- [Florida data modeling](../code/07_modeling_fl.ipynb)
- [Illinois data modeling](../code/07_modeling_il.ipynb)
- [New York data modeling](../code/07_modeling_ny.ipynb)
- [Texas data modeling](../code/07_modeling_tx.ipynb)


We weren't able to achieve high accuracy when we used the combined data for all 5 states. However, in some cases the models performed considerably better when we trained them on state-level data.

>*See the [table](#Table-comparing-all-models) below comparing all models.*

This gap in predictive accuracy between the combined and state-level models can be explained by how much the data varies from one state to another, which also changes which features matter most in each model. For instance, population density and income per capita were the two most important features in the model using data from all 5 states, whereas features such as race percentages, age group percentages, and health insurance coverage were more important in the state-level models.



### Table comparing all models


| Region             | Best Regression R2 | Best Classification Accuracy | Classification Baseline |
|--------------------|--------------------|------------------------------|-------------------------|
| CA, FL, IL, NY, TX | 47%                | 63%                          | 42%                     |
| California         | 75%                | 93%                          | 66%                     |
| Florida            | 76%                | 71%                          | 71%                     |
| Illinois           | 32%                | 73%                          | 54%                     |
| New York           | 81%                | 94%                          | 81%                     |
| Texas              | 49%                | 59%                          | 40%                     |
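
Accuracy is easier to judge relative to the baseline. The sketch below assumes the baseline is the majority-class share, that is, the accuracy of always predicting the most common severity class (the notebooks may define it differently), and computes each region's gain in percentage points from the figures in the table. The `majority_baseline` helper is hypothetical and is included only to show that definition.

```python
from collections import Counter


def majority_baseline(labels):
    """Share of the most common class: accuracy of always predicting it."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# (best classification accuracy %, classification baseline %) from the table
results = {
    "CA, FL, IL, NY, TX": (63, 42),
    "California": (93, 66),
    "Florida": (71, 71),
    "Illinois": (73, 54),
    "New York": (94, 81),
    "Texas": (59, 40),
}

# Gain over baseline in percentage points
gains = {region: acc - base for region, (acc, base) in results.items()}

for region, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<20} +{gain} points over baseline")
```

On these figures, California shows the largest gain (27 points), while the best Florida classifier does not beat its baseline.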

<br>



#### **Below is a list of all EDA notebooks.**

- [All 5 states EDA](../code/04_eda_preprocessing_5_states.ipynb)
- [California data EDA](../code/05_eda_ca.ipynb)
- [Florida data EDA](../code/05_eda_fl.ipynb)
- [Illinois data EDA](../code/05_eda_il.ipynb)
- [New York data EDA](../code/05_eda_ny.ipynb)
- [Texas data EDA](../code/05_eda_tx.ipynb)

# Key Challenges

- **COVID is an ongoing event.**
With more data being collected every day, we are improving our understanding of how different states and counties are affected by COVID. We will try to improve our models' performance in the future as more data becomes available.

- **Widely varying state-level data.**
As mentioned earlier, the data varies widely from state to state, which explains why our state-level models mostly performed better than our models using the combined data for all 5 states.

- **More features are needed.**
There are other factors that we didn't investigate in our models and that might also be important for predicting COVID severity, such as county mask-wearing policies and the percentage of people wearing masks in each county.


# Interactive Demo

We utilized [Folium](https://python-visualization.github.io/folium/) and [Flask](https://flask.palletsprojects.com/en/1.1.x/) to build an interactive demo app.

You can select any of the 5 states in our home page for a county-level visualization of COVID severity.

<img src="../img/homepage.png">

>Let's use Texas as an example. You can see that counties are highlighted in 3 different shades of red that go from light to dark as COVID severity increases.

<img src="../img/tx.png">
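
The three-shade coloring can be sketched as a lookup from severity class to fill color. The hex values, the `severity` property name, and the `style_for` helper are all hypothetical. A function of this shape could be supplied as the `style_function` of a `folium.GeoJson` layer, though the app may build its map differently.

```python
# Light-to-dark reds for severity classes 1 (low) to 3 (high).
# The hex values are illustrative, not taken from the app.
SEVERITY_COLORS = {1: "#fcbba1", 2: "#fb6a4a", 3: "#a50f15"}


def style_for(feature: dict) -> dict:
    """Return style options for one GeoJSON county feature.

    Assumes each feature stores its class under properties["severity"],
    a hypothetical field name.
    """
    severity = feature["properties"]["severity"]
    return {
        "fillColor": SEVERITY_COLORS[severity],
        "color": "#444444",  # county outline
        "weight": 0.5,
        "fillOpacity": 0.8,
    }


# Made-up feature for demonstration
county = {"type": "Feature", "properties": {"name": "Example County", "severity": 2}}
print(style_for(county)["fillColor"])
```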

>You can hover over any county with your cursor to get a more detailed breakdown of the county-level factors that contribute to its COVID severity score.

<img src="../img/txhover.png">

>Finally, we included a predictive model that you can use to predict COVID severity for other counties in the United States.

<img src="../img/demo.png">

To use our interactive demo, please download the repo from our GitHub and run the `app2.py` file in the [Flask](../Flask) folder.
