Module_4_Project

H1N1 Dataset Analysis

The Project: For my Module 4 project, I chose to do a quick data analysis of an H1N1 flu vaccination dataset from a DrivenData competition.

The Goal: The focus of this project was to study the behavioral patterns of those who got vaccinated against the H1N1 virus and to promote those behaviors to the general public.

The Problem:

Certain behavioral patterns are hard to enforce
Not all people believe in the same thing
Not all people are capable of following the recommendations

The Solution:

Create a model that approximates likely behaviors of those who got the vaccinations
Promote those behaviors so everyone will be safe from future outbreaks
Prepare for the next pandemic

The Process

The plan of attack was to do the following:

Explore & Clean data
Model the data
List the feature importances to see which factors were the biggest contributors
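
The three steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the notebooks: the column names, the median imputation, and the random forest settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the survey data; column names are illustrative only.
df = pd.DataFrame({
    "opinion_vaccine_effective": rng.integers(1, 6, n).astype(float),
    "has_health_insurance": rng.integers(0, 2, n).astype(float),
    "washes_hands": rng.integers(0, 2, n).astype(float),
})

# Build a target that depends mostly on the opinion column, plus noise.
signal = 0.8 * (df["opinion_vaccine_effective"] - 3) + 0.5 * df["has_health_insurance"]
y = (signal + rng.normal(0, 1, n) > 0.5).astype(int)

# Blank out some answers to mimic missing survey responses.
df.loc[rng.choice(n, 50, replace=False), "has_health_insurance"] = np.nan

# Step 1: clean the data (median imputation is an assumed choice).
imputer = SimpleImputer(strategy="median")
X = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# Step 2: model the data.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Step 3: list the feature importances, largest first.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances from a random forest sum to 1, so each value can be read as that feature's share of the total.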

The Data:

The data was provided by the DrivenData competition titled "Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines".

The Metrics:

The main metric used was ROC_AUC scoring.
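
To make the metric concrete: ROC AUC can be read as the probability that a randomly chosen vaccinated respondent receives a higher predicted score than a randomly chosen unvaccinated one. The sketch below computes it directly from that definition. The helper `roc_auc` and the four sample scores are made up for illustration; in practice `sklearn.metrics.roc_auc_score` does the same job.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = vaccinated) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.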

The Models Chosen:

The baseline model - Random Forest
Other models used - Gradient Boosting
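
A comparison between the two model families might look like the sketch below. It runs on a synthetic dataset from `make_classification` with default hyperparameters, both of which are assumptions for illustration, so the scores it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the cleaned survey data.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "random_forest": RandomForestClassifier(random_state=42),          # baseline
    "gradient_boosting": GradientBoostingClassifier(random_state=42),  # alternative
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC AUC is computed from the predicted probability of the positive class.
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

Scoring both models on the same held-out split keeps the comparison fair.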

Conclusion

The most important features were the individual's opinions on whether the vaccine was effective. Other high-impact features were whether the individual had health insurance and whether they already practiced protective behaviors, such as washing their hands, before the virus spread. The model was able to predict with a ROC_AUC score of 0.85 for the

Future Recommendations

Gather more data to further analyze behavioral patterns
Apply feature engineering to improve model performance.
Take a psychological approach and see if anything changes.

Repository Guide

Notebooks

Data Clean Up With Baseline Model: https://github.com/Tyasuoka/Module_4_Project/blob/master/Jupyter%20Notebooks/Step2_Modeling.ipynb

CSV Files

Original Data: https://github.com/Tyasuoka/Module_4_Project/blob/master/CSVs/training_set_features.csv
Cleaned Data: https://github.com/Tyasuoka/Module_4_Project/blob/master/CSVs/Finalized_Dataset.csv

Presentation: https://www.canva.com/design/DAD7cf_pXsc/Ldc6EKvxKc24pvs-FY2_WQ/view?utm_content=DAD7cf_pXsc&utm_campaign=designshare&utm_medium=link&utm_source=homepage_design_menu

Resources

The Data: Below is the direct link to the data source. https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/

Models: Below you will find model documentation

Random Forest: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Gradient Boost: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

Human Resources

My Last Data Bender Cohort 02/17/20 classmates
Lindsey Berlin DS 02/17/20 Instructor
Bryan Arnold DS 02/17/20 Instructor
