# **Predicting H1N1 Vaccine Uptake: Insights for Future Public Health Strategies**

## **Business Understanding**

The 2009 H1N1 influenza pandemic, like other infectious disease outbreaks, underscored the role of vaccination in public health. Understanding how individuals respond to vaccines can help shape future health initiatives and improve vaccination rates. By predicting an individual's likelihood of receiving the H1N1 vaccine, this project aims to help public health organizations, policymakers, and healthcare providers identify the factors that affect the decision to vaccinate.

These insights may inform the design of targeted public health campaigns and more effective vaccine distribution, and ultimately reduce the transmission of infectious diseases through improved herd immunity. The outcomes of this project can also suggest further research avenues and contribute to measures that increase the public's confidence in vaccination and support the control of present and future epidemics.



## **Objectives**

- Identify key factors that influence individuals' decisions to receive the H1N1 vaccine.
- Build a predictive model to forecast public response to newly introduced vaccines based on demographic and behavioral data.
- Provide actionable insights to public health organizations for developing targeted vaccination campaigns and strategies.
- Enhance preparedness for future pandemics by understanding vaccine acceptance and hesitancy patterns.

## **Data Understanding**

The dataset used in this project is derived from the National 2009 H1N1 Flu Survey conducted by the United States National Center for Health Statistics. This survey collected responses from individuals across the United States regarding their vaccination status against the H1N1 flu virus and the seasonal flu, alongside various demographic, behavioral, and opinion-based factors. The dataset provides a rich source of information to understand the factors influencing vaccination decisions, making it highly suitable for this project.

### Data Source and Suitability

The data comes from a reputable source, the U.S. Department of Health and Human Services (DHHS), specifically the National Center for Health Statistics (NCHS). The dataset includes responses from thousands of individuals, capturing a broad spectrum of the population. It is suitable for the project as it includes both the target variables (whether individuals received the H1N1 and seasonal flu vaccines) and numerous potential predictors, such as:

- Demographic information (e.g., age, sex, race, education, income)
- Health behaviors (e.g., hand washing, mask-wearing, social distancing)
- Health status (e.g., presence of chronic medical conditions, being a health worker)
- Opinions and concerns about the effectiveness and risks associated with the vaccines

These features provide comprehensive insights into the factors that may influence an individual's decision to get vaccinated, allowing us to build a predictive model with practical applicability.

### Dataset Size and Descriptive Statistics

The dataset consists of several thousand rows, each representing a unique respondent. There are 36 columns in the dataset, including the respondent ID, two target variables (h1n1_vaccine and seasonal_vaccine), and 34 features.
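
Once the data is in a DataFrame, a breakdown like the one above can be checked rather than counted by hand. The sketch below is only an illustration: `summarize_columns` is a hypothetical helper, the ID column name `respondent_id` is an assumption about how the respondent ID is stored, and the toy frame uses placeholder feature names instead of the real survey columns.

```python
import pandas as pd


def summarize_columns(df, id_col="respondent_id",
                      target_cols=("h1n1_vaccine", "seasonal_vaccine")):
    """Count rows and split columns into ID, target, and feature roles."""
    targets = [c for c in target_cols if c in df.columns]
    features = [c for c in df.columns if c != id_col and c not in targets]
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "has_id": id_col in df.columns,
        "n_targets": len(targets),
        "n_features": len(features),
    }


# Tiny stand-in frame; feature_a and feature_b are placeholders, not survey columns
toy = pd.DataFrame({
    "respondent_id": [0, 1, 2],
    "feature_a": [1, 0, 1],
    "feature_b": [3, 2, 1],
    "h1n1_vaccine": [0, 1, 0],
    "seasonal_vaccine": [1, 1, 0],
})

summary = summarize_columns(toy)
print(summary)
```

Run against the real DataFrame, the ID, target, and feature counts should add up to the total number of columns; a mismatch usually points to a dropped or duplicated column.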

Let's load the dataset and explore its size and some basic descriptive statistics for all features.


Import the necessary libraries.

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

# sklearn preprocessing
from sklearn.preprocessing import StandardScaler, LabelEncoder

# sklearn regression models and regression metric
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.metrics import mean_squared_error

# sklearn classification models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# sklearn evaluation metrics and validation
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
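
As a rough preview of how several of these imports usually fit together, here is a minimal sketch on synthetic data. It is not the project's actual pipeline: the random features, the way the binary label is generated, and the hyperparameters (`test_size=0.25`, `max_iter=1000`, `random_state=42`) are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 500 rows, 4 numeric features, binary label
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (signal + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out 25% of the rows, keeping the class balance similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(max_iter=1000)
clf.fit(scaler.transform(X_train), y_train)

# Score the held-out rows
y_pred = clf.predict(scaler.transform(X_test))
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={acc:.3f}  f1={f1:.3f}")
```

Fitting the scaler on the training split alone keeps information from the test rows out of the preprocessing step, which matters once the same pattern is applied to the survey data.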