# Business Understanding
Heart disease is the leading cause of death in the United States. The term "heart disease" refers to several types of heart conditions. The most common type in the United States is coronary artery disease (CAD), which can lead to a heart attack. Machine learning can help us understand which factors are associated with heart disease and predict who is likely to have it.



# Data Understanding
### About the dataset 
Factors assessed by the BRFSS in 2020 included health status and healthy days, exercise, inadequate sleep, chronic health conditions, oral health, tobacco use, cancer screenings, and health-care access (core section). Optional Module topics for 2020 included prediabetes and diabetes, cognitive decline, electronic cigarettes, cancer survivorship (type, treatment, pain management) and sexual orientation/gender identity (SOGI).

# Data Collection
The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is a system of ongoing health-related telephone surveys designed to collect data on health-related risk behaviors, chronic health conditions, and the use of preventive services from the non-institutionalized adult population (≥ 18 years) residing in the United States. The BRFSS is administered and supported by CDC's Population Health Surveillance Branch, under the Division of Population Health at CDC's National Center for Chronic Disease Prevention and Health Promotion.

### Dataset Description [source](https://www.cdc.gov/brfss/annual_data/2020/pdf/codebook20_llcp-v2-508.pdf)

<br>

|Category|Label|Question|Value| 
|-|-|-|-|
|<b>HeartDisease</b>|Ever had CHD or MI|<i>Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)</i>|-Yes<br>-No|
|<b>BMI</b>|Computed body mass index|<i>Computed body mass index</i>|1-9999 (two implied decimal places, i.e. 0.01-99.99)|
|<b>Smoking</b>|Smoked at Least 100 Cigarettes|<i>Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]</i>|-Yes<br>-No|
|<b>AlcoholDrinking</b>|Heavy Alcohol Consumption Calculated Variable|<i>Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)</i>|-Yes<br>-No|
|<b>Stroke</b>|Ever Diagnosed with a Stroke|<i>(Ever told) (you had) a stroke.</i>|-Yes<br>-No|
|<b>PhysicalHealth</b>|Number of Days Physical Health Not Good|<i>Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?</i>|Number of days [0-30]|
|<b>MentalHealth</b>|Number of Days Mental Health Not Good|<i>Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?</i>|Number of days [0-30]|
|<b>DiffWalking</b>|Difficulty Walking or Climbing Stairs|<i>Do you have serious difficulty walking or climbing stairs?</i>|-Yes<br>-No|
|<b>Sex</b>|Are you male or female?|<i>Are you male or female?</i>|-Male<br>-Female|
|<b>AgeCategory</b>|Reported age in five-year age categories calculated variable|<i>Fourteen-level age category</i>|-Age groups 18-24, 25-29, ..., 75-79<br>-80 or older|
|<b>Race</b>|Imputed race/ethnicity value|<i>Imputed race/ethnicity value (This value is the reported race/ethnicity or an imputed race/ethnicity, if the respondent refused to give a race/ethnicity. The value of the imputed race/ethnicity will be the most common race/ethnicity response for that region of the state)</i>|-White<br>-Black<br>-Asian<br>-American Indian/Alaskan Native<br>-Hispanic<br>-Other|
|<b>Diabetic</b>|(Ever told) you had diabetes|<i>(Ever told) (you had) diabetes? (If 'Yes' and respondent is female, ask 'Was this only when you were pregnant?'. If respondent says pre-diabetes or borderline diabetes, use response code 4.)</i>|-Yes<br>-No<br>-No, borderline diabetes<br>-Yes (during pregnancy)|
|<b>PhysicalActivity</b>|Exercise in Past 30 Days|<i>During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?</i>|-Yes<br>-No|
|<b>GenHealth</b>|General Health|<i>Would you say that in general your health is:</i>|-Excellent<br>-Very good<br>-Good<br>-Fair<br>-Poor|
|<b>SleepTime</b>|How Much Time Do You Sleep|<i>On average, how many hours of sleep do you get in a 24-hour period?</i>|Number of hours [1-24]|
|<b>Asthma</b>|Ever Told Had Asthma|<i>(Ever told) (you had) asthma?</i>|-Yes<br>-No|
|<b>KidneyDisease</b>|Ever told you have kidney disease?|<i>Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?</i>|-Yes<br>-No|
|<b>SkinCancer</b>|(Ever told) you had skin cancer?|<i>(Ever told) (you had) skin cancer?</i>|-Yes<br>-No|
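
Most columns in the table are text answers, so a model needs them converted to numbers first. Below is a minimal sketch, assuming the CSV stores answers spelled exactly as in the Value column and that `AgeCategory` labels look like `55-59` or `80 or older`. The helper `encode_features`, the constant names, and the chosen encodings are hypothetical illustrations, not part of the dataset or the original notebook.

```python
import pandas as pd

# Orderings and column lists taken from the Value column of the table (assumed spelling)
GEN_HEALTH_ORDER = ["Poor", "Fair", "Good", "Very good", "Excellent"]
YES_NO_COLUMNS = [
    "HeartDisease", "Smoking", "AlcoholDrinking", "Stroke", "DiffWalking",
    "PhysicalActivity", "Asthma", "KidneyDisease", "SkinCancer",
]


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn the survey's text answers into numeric features."""
    out = df.copy()
    # Binary Yes/No answers become 1/0
    for col in YES_NO_COLUMNS:
        if col in out.columns:
            out[col] = out[col].map({"Yes": 1, "No": 0})
    # GenHealth has a natural order, so encode it by rank (Poor=0 ... Excellent=4)
    if "GenHealth" in out.columns:
        out["GenHealth"] = out["GenHealth"].map(
            {label: rank for rank, label in enumerate(GEN_HEALTH_ORDER)}
        )
    # Age groups such as "55-59" or "80 or older": keep the lower bound as an ordinal value
    if "AgeCategory" in out.columns:
        out["AgeCategory"] = (
            out["AgeCategory"].str.extract(r"^(\d+)", expand=False).astype(int)
        )
    # Nominal columns (no natural order) are one-hot encoded
    nominal = [c for c in ("Sex", "Race", "Diabetic") if c in out.columns]
    return pd.get_dummies(out, columns=nominal)
```

Ordered answers (`GenHealth`, `AgeCategory`) keep their ranking as a single integer. Unordered ones (`Sex`, `Race`, `Diabetic`) are one-hot encoded, so the model does not read a false order into them.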

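```python
import pandas as pd
# NOTE: the original uploadify.net link served a Google Drive HTML page instead of the CSV;
# this assumes a local copy of heart_2020_cleaned.csv in the working directory.
df = pd.read_csv('heart_2020_cleaned.csv')
display(df)  # display() is available inside Jupyter/IPython
```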

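
A link that serves a web page, such as a sharing or preview page, instead of the raw file can still parse with `pd.read_csv`, just into the wrong table. One way to catch this early is to check for the columns listed in the dataset description. The sketch below assumes the CSV header uses exactly those names. `load_heart_data` and `EXPECTED_COLUMNS` are hypothetical, not the notebook's own loader.

```python
import pandas as pd

# Column names as listed in the dataset description table
EXPECTED_COLUMNS = {
    "HeartDisease", "BMI", "Smoking", "AlcoholDrinking", "Stroke", "PhysicalHealth",
    "MentalHealth", "DiffWalking", "Sex", "AgeCategory", "Race", "Diabetic",
    "PhysicalActivity", "GenHealth", "SleepTime", "Asthma", "KidneyDisease", "SkinCancer",
}


def load_heart_data(source) -> pd.DataFrame:
    """Hypothetical loader: read the CSV (path, URL or file-like) and fail loudly if it is the wrong table."""
    df = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        # An HTML page may parse as a one-column "CSV", but it will lack these columns
        raise ValueError(f"Not the heart disease CSV; missing columns: {sorted(missing)}")
    return df
```

Usage, assuming a local copy of the file: `df = load_heart_data('heart_2020_cleaned.csv')`.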
