The data for this competition comes from the National 2009 H1N1 Flu Survey (NHFS).
In their own words:
The National 2009 H1N1 Flu Survey (NHFS) was sponsored by the National Center for Immunization and Respiratory Diseases (NCIRD) and conducted jointly by NCIRD and the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC). The NHFS was a list-assisted random-digit-dialing telephone survey of households, designed to monitor influenza immunization coverage in the 2009-10 season.
The target population for the NHFS was all persons 6 months or older living in the United States at the time of the interview. Data from the NHFS were used to produce timely estimates of vaccination coverage rates for both the monovalent pH1N1 and trivalent seasonal influenza vaccines.
Data Collection: The dataset is obtained from DrivenData: https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/
Data Preprocessing: Cleaning and preprocessing the dataset to handle missing values, outliers, and inconsistencies. This step also involves transforming categorical variables into numerical representations, normalizing numeric features, and splitting the dataset into training and testing subsets.
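A minimal preprocessing sketch with pandas and scikit-learn is shown below. The file and column names (`training_set_features.csv`, `training_set_labels.csv`, `respondent_id`, `h1n1_vaccine`) are assumptions based on the DrivenData download page; adjust them to match your local copies.

```python
# Sketch of the preprocessing step: imputation, encoding, scaling, and splitting.
# File/column names are assumptions from the DrivenData competition download.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

features = pd.read_csv("training_set_features.csv", index_col="respondent_id")
labels = pd.read_csv("training_set_labels.csv", index_col="respondent_id")

# Separate categorical and numeric columns so each gets its own treatment.
categorical_cols = features.select_dtypes(include="object").columns
numeric_cols = features.select_dtypes(exclude="object").columns

preprocessor = ColumnTransformer([
    # Numeric features: fill missing values with the median, then normalize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Categorical features: fill with the most frequent value, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Hold out a test split; the h1n1_vaccine label is used here as the target.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels["h1n1_vaccine"],
    test_size=0.2, random_state=42, stratify=labels["h1n1_vaccine"],
)
```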
Model Training: A logistic regression model is trained with a library such as scikit-learn. Training fits the model's parameters (the coefficients) to the training data by minimizing the gap between the predicted class labels and the actual class labels, i.e. the logistic loss.
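A minimal training sketch follows, assuming the `preprocessor`, `X_train`, and `y_train` objects defined in the preprocessing sketch above.

```python
# Sketch of the training step: preprocessing and logistic regression in one pipeline.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

model = Pipeline([
    ("preprocess", preprocessor),
    # The solver adjusts the coefficients to minimize the logistic loss
    # between predicted probabilities and the true class labels.
    ("classifier", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
```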
Model Evaluation: The trained logistic regression model is assessed with classification metrics, such as accuracy, precision, recall, F1-score, the ROC curve with its AUC, and the confusion matrix, to gauge how well it classifies and discriminates between the classes.
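The sketch below evaluates the fitted `model` on the held-out split from the earlier steps; note that ROC AUC is computed from predicted probabilities rather than hard labels.

```python
# Sketch of the evaluation step: standard classification metrics on the test split.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))  # AUC uses probabilities
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```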
Contributions to this project are welcome. If you would like to contribute, please follow these steps:
- 1) Fork the repository and create a new branch from the main branch to work on your changes.
- 2) Make your modifications and commit your changes.
- 3) Push your branch to your forked repository.
- 4) Open a pull request to the original repository, describing the changes you made.
This project is licensed under the GNU General Public License (GPL).
- The dataset used in this project is sourced from: https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/
- The Gradient Boosting Machine algorithm is implemented using the scikit-learn library.
If you have any questions or suggestions regarding this project, please feel free to contact me at eswaraditya63@gmail.com.