
According to the CDC, heart disease is one of the leading causes of death for people of most races in the US. Our ML project leads to a better understanding of how we can predict heart disease.


Machine-Learning-Projects1/2020-BRFSS-Codebook-CDC


A project for data enthusiasts 👋🏼


A machine learning project to predict heart disease using 17 risk factors provided by the BRFSS

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is a system of ongoing health-related telephone surveys designed to collect data on health-related risk behaviors, chronic health conditions, and the use of preventive services from the non-institutionalized adult population (≥ 18 years) residing in the United States. The BRFSS is administered and supported by CDC's Population Health Surveillance Branch, under the Division of Population Health at CDC's National Center for Chronic Disease Prevention and Health Promotion.

The dataset originally comes from the CDC (1) and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked of respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]". In this dataset, we noticed many factors (questions) that directly or indirectly influence heart disease, so we selected the most relevant variables and cleaned them so that the data would be usable for machine learning projects (2).
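The selection and cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the column names `SMOKE100` and `DIFFWALK`, the 1 = Yes / 2 = No coding, and the `clean_rows` helper are all assumptions for the example and should be checked against the codebook (1).

```python
# Hypothetical subset of BRFSS columns; the names are assumptions and
# should be verified against the codebook before use.
SELECTED_COLUMNS = ["SMOKE100", "DIFFWALK"]

# Assumed coding for yes/no questions: 1 = Yes, 2 = No.
# Any other value (for example "don't know" or "refused") is treated as missing.
YES_NO = {1: "Yes", 2: "No"}


def clean_rows(rows):
    """Keep only the selected columns and drop rows with unusable answers."""
    cleaned = []
    for row in rows:
        decoded = {col: YES_NO.get(row.get(col)) for col in SELECTED_COLUMNS}
        if None not in decoded.values():
            cleaned.append(decoded)
    return cleaned


raw = [
    {"SMOKE100": 1, "DIFFWALK": 2, "OTHER": 5},
    {"SMOKE100": 7, "DIFFWALK": 1, "OTHER": 3},     # non-yes/no code: dropped
    {"SMOKE100": 2, "DIFFWALK": None, "OTHER": 1},  # missing answer: dropped
]
print(clean_rows(raw))
```

Dropping incomplete rows is only one possible choice. Depending on how many responses are lost, imputing or keeping an explicit "unknown" category may be preferable.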



Jupyter Notebook

A step-by-step explanation of how we trained different models on the CDC dataset and compared them
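As a rough idea of what training and comparing several models can look like, here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the data is synthetic (17 features to mirror the 17 risk factors), the 90/10 class split is an assumption, and the choice of models and metrics is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data: 17 features and a binary target.
# The 90/10 class balance is an assumption made for this example.
X, y = make_classification(
    n_samples=2000, n_features=17, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative candidates; the notebook may use different models.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Fit each model on the same split and score it on the same held-out data.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "recall": recall_score(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(f"{name}: ROC AUC={scores['roc_auc']:.3f}, recall={scores['recall']:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair. ROC AUC and recall are reported here instead of plain accuracy because accuracy can look high on an imbalanced target even when few positive cases are found.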

You can also try our Heart Disease Prediction app, which is based on this repository


References

  1. codebook20_llcp-v2-508.pdf
  2. personal-key-indicators-of-heart-disease
