# Phase 3 Project
**Client:** Providence (Most medical centers) <br>
**Authors:** Tommy Phung

## Overview
With growing doubts about vaccine effectiveness, patients are questioning whether to take the Covid vaccine. We will be using the National 2009 H1N1 Flu Survey provided by the [United States National Center for Health Statistics](https://www.cdc.gov/nchs/index.htm).
We will build models to see whether we can predict if an individual has taken the seasonal flu vaccine based on several features from their responses to the survey.


***<p style="text-align: center;">Features</p>***


| Label                       | Description                                                                                                                                                                                                                                                                                                                                     |
|:-----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| h1n1_concern                | Level of concern about the H1N1 flu                                                                                                                                                                                                                                                                                                             |
| h1n1_knowledge              | Level of knowledge about H1N1 flu                                                                                                                                                                                                                                                                                                               |
| behavioral_antiviral_meds   | Has taken antiviral medications                                                                                                                                                                                                                                                                                                                 |
| behavioral_avoidance        | Has avoided close contact with others with flu-like symptoms                                                                                                                                                                                                                                                                                     |
| behavioral_face_mask        | Has bought a face mask                                                                                                                                                                                                                                                                                                                           |
| behavioral_wash_hands       | Has frequently washed hands or used hand sanitizer                                                                                                                                                                                                                                                                                               |
| behavioral_large_gatherings | Has reduced time at large gatherings                                                                                                                                                                                                                                                                                                             |
| behavioral_outside_home     | Has reduced contact with people outside of own household                                                                                                                                                                                                                                                                                         |
| behavioral_touch_face       | Has avoided touching eyes, nose, or mouth                                                                                                                                                                                                                                                                                                       |
| doctor_recc_h1n1            | H1N1 flu vaccine was recommended by doctor                                                                                                                                                                                                                                                                                                       |
| doctor_recc_seasonal        | Seasonal flu vaccine was recommended by doctor                                                                                                                                                                                                                                                                                                   |
| chronic_med_condition       | Has a chronic medical condition |
| child_under_6_months        | Has regular close contact with a child under the age of six months                                                                  |
| health_worker               | Is a healthcare worker                                                                                                                                                                                                                                                                                                                           |
| health_insurance          | Has health insurance                                                                                                                                                                                                                                                                                                                               |
| opinion_h1n1_vacc_effective | Respondent's opinion about H1N1 vaccine effectiveness                                                                                                                                                                                                                                                                                           |
| opinion_h1n1_risk           | Respondent's opinion about risk of getting sick with H1N1 flu without vaccine                                                                                                                                                                                                                                                                   |
| opinion_h1n1_sick_from_vacc | Respondent's worry of getting sick from taking H1N1 vaccine                                                                                                                                                                                                                                                                                     |
| opinion_seas_vacc_effective | Respondent's opinion about seasonal flu vaccine effectiveness                                                                                                                                                                                                                                                                                   |
| opinion_seas_risk           | Respondent's opinion about risk of getting sick with seasonal flu without vaccine                                                                                                                                                                                                                                                               |
| opinion_seas_sick_from_vacc | Respondent's worry of getting sick from taking seasonal flu vaccine                                                                                                                                                                                                                                                                             |
| age_group                   | Age group of respondent                                                                                                                                                                                                                                                                                                                         |
| education                   | Self-reported education level                                                                                                                                                                                                                                                                                                                   |
| race                        | Race of respondent                                                                                                                                                                                                                                                                                                                               |
| sex                         | Sex of respondent                                                                                                                                                                                                                                                                                                                               |
| income_poverty              | Household annual income of respondent with respect to 2008 Census poverty thresholds                                                                                                                                                                                                                                                             |
| marital_status              | Marital status of respondent                                                                                                                                                                                                                                                                                                                     |
| rent_or_own                 | Housing situation of respondent |
| employment_status           | Employment status of respondent |
| hhs_geo_region              | Respondent's residence using a 10-region geographic classification defined by the U.S. Dept. of Health and Human Services |
| census_msa                  | Respondent's residence within metropolitan statistical areas (MSA) as defined by the U.S. Census                                                                                                                                                                                                                                                 |
| household_adults            | Number of other adults in household, top-coded to 3                                                                                                                                                                                                                                                                                             |
| household_children          | Number of children in household, top-coded to 3                                                                                                                                                                                                                                                                                                 |
| employment_industry         | Type of industry respondent is employed in. Values are represented as short random character strings                                                                                                                                                                                                                                             |
| employment_occupation       | Type of occupation of respondent. Values are represented as short random character strings                                                                                                                                                                                                                                                       |


***<p style="text-align: center;">Targets</p>***

| Label            | Description                                      |
|------------------|--------------------------------------------------:|
| h1n1_vaccine     | Whether respondent received H1N1 flu vaccine     |
| seasonal_vaccine | Whether respondent received seasonal flu vaccine |

For this model, we will be focusing on the **seasonal flu vaccine** label from the dataset.

## Import Libraries
The majority of the libraries used are from sklearn, which we use to format the data and create the classification models.

In [2]:
import pandas as pd    # Read the dataset into a dataframe and general adjustments to data points
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score    # Split the dataset into training sets, search over hyperparameters and compute cross-validation scores
from sklearn.preprocessing import MinMaxScaler, StandardScaler     # Scalers to scale the dataset
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score    # Evaluation metrics
from sklearn.tree import DecisionTreeClassifier    # The basic classification model using a decision tree
from sklearn.ensemble import RandomForestClassifier    # A more complex model with random forests
import joblib    # Save and load previously trained models
import matplotlib.pyplot as plt 
from sklearn.model_selection import KFold
from sklearn import tree
import numpy as np
from sklearn.compose import ColumnTransformer
from xgboost import XGBClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

## Import Dataset
The dataset was separated into a training dataset and a testing dataset. Target labels are only provided for the training set, so we will only be using that set for our modeling. <br>
We will be focusing mainly on the seasonal vaccine. 

In [3]:
features = pd.read_csv('Data/training_set_features.csv')    # Original Dataset
labels = pd.read_csv('Data/training_set_labels.csv')    # Original Dataset
target = labels['seasonal_vaccine'].copy()    # Only the seasonal vaccine column

## Business Understanding 
Vaccines are a useful way to prevent viral infections. One of the most common viral infections is influenza, more commonly known as the flu. The CDC recommends that individuals receive the vaccine every flu season. However, since vaccines aren't mandatory, individuals may decline them because of personal beliefs or limited knowledge of the vaccine. Unused doses are then wasted when they could have been used at other medical centers. <br> 

**For example, 1.1 billion Covid Vaccines were estimated to be wasted due to expired vaccines and supply chain issues.** <br>

In order to give patients vaccines as efficiently as possible, hospitals and medical centers store vaccines so they can be administered quickly whenever requested. With the current dataset, we could potentially predict whether a patient would want a vaccine based on their answers to the survey. This way, medical centers can order an adequate amount of vaccines with minimal waste. 


## Data Exploration
1. Number of observations
2. Check for duplicates, NaNs and missing values -> Missing values found in multiple columns.
3. Check column data types

In [4]:
# 1. 26,707 observations
print(f'There are {len(features)} observations in the dataset')

# 2. Check for duplicates
df_list = [features, labels]
if sum(dataframe.duplicated().sum() for dataframe in df_list) > 0:
    print('Dataframes have duplicates')
else:
    print('No duplicates found')

# Check for NaN / missing values
if sum(dataframe.isna().sum().sum() for dataframe in df_list) > 0:
    print('Dataframes have missing values')
else:
    print('Dataframes have no missing values')

# 3. Check column data types (object columns will need to be converted)
for dtype in [object, 'number']:
    n_columns = len(features.select_dtypes(dtype).columns)
    print(f'There are {n_columns} columns with the data type {dtype}.')

There are 26707 observations in the dataset
No duplicates found
Dataframes have missing values
There are 12 columns with the data type <class 'object'>.
There are 24 columns with the data type number.
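
The check above only reports whether any missing values exist. To see which columns are affected and by how much, a per-column summary can help. This is a minimal sketch on a small made-up frame standing in for `features`; the values are invented and the name `missing` is only illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `features`; values are made up for illustration.
toy = pd.DataFrame({
    "h1n1_concern": [1.0, np.nan, 2.0, 3.0],
    "education": ["College Graduate", None, None, "12 Years"],
    "sex": ["Female", "Male", "Male", "Female"],
})

# Count and percentage of missing entries per column, largest first.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": toy.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)

# Keep only the columns that actually have gaps.
missing = missing[missing["n_missing"] > 0]
print(missing)
```

Running the same summary on the real `features` frame would show which survey questions were skipped most often.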


## Data Understanding
The dataset consists of binary, numerical, and categorical (object) entries based on respondents' answers to the survey. To make use of all the data from this survey, each missing value will be filled with a random value drawn according to the weighted probability of the observed values in its column.
The preprocessing steps are:
1. Dummy variables are created for the object columns so they can be modeled.
2. Scaling is applied so that differences in value ranges have as little influence as possible, since the majority of values are 1s and 0s.
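
The imputation and preprocessing steps described above could be sketched roughly as follows. This is an illustration under assumptions, not the notebook's actual implementation: the helper `fill_weighted_random` is hypothetical, the toy data is invented, and `MinMaxScaler` is picked here only because it is among the imported scalers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def fill_weighted_random(column, rng):
    """Hypothetical helper: fill NaNs by sampling the observed values
    in proportion to how often each one occurs in the column."""
    probs = column.value_counts(normalize=True)  # NaNs are excluded by default
    filled = column.copy()
    mask = filled.isna()
    if mask.any():
        filled.loc[mask] = rng.choice(
            probs.index.to_numpy(), size=int(mask.sum()), p=probs.to_numpy()
        )
    return filled


rng = np.random.default_rng(42)  # fixed seed so the fill is reproducible

# Toy frame standing in for `features`; values are made up for illustration.
toy = pd.DataFrame({
    "h1n1_concern": [0.0, 1.0, np.nan, 3.0, 2.0],
    "health_worker": [0.0, 1.0, 0.0, np.nan, 0.0],
    "sex": ["Female", "Male", None, "Female", "Female"],
})
for name in toy.columns:
    toy[name] = fill_weighted_random(toy[name], rng)

# Split columns by type: numeric columns are scaled, the rest get dummy variables.
numeric = toy.select_dtypes(include="number").columns.tolist()
categorical = [name for name in toy.columns if name not in numeric]

preprocess = ColumnTransformer(
    [
        ("dummies", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("scale", MinMaxScaler(), numeric),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(toy)
print(X.shape)
```

Sampling from the observed frequencies keeps each column's distribution roughly unchanged, whereas filling every gap with the most common value would inflate that value's share. In a real workflow the transformer would be fitted on the training split only and then applied to the held-out split.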