# **Personal Key Indicators of Heart Disease**

**Why did we choose this dataset?**

According to the World Health Organization, cardiovascular diseases (CVDs) are the leading cause of death globally. In the first nine months of 2021, more than 300,000 people died in Ukraine from CVDs. Many factors affect heart disease. Therefore, our team decided to analyze the relationship between various factors and CVDs, in order to understand how these diseases can be avoided.

**Where did the dataset come from and what treatments did it undergo?**

The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020.

**About the Dataset**

The Personal Key Indicators of Heart Disease dataset contains about 320K rows (319,795) and 18 columns. Fortunately, we were able to skip the data cleaning step because the dataset was already cleaned (it has no missing values), and all of its data was used for the analysis. Below you can see a description of each column:


*   **HeartDisease**: Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
*   **BMI**: Body Mass Index (BMI)
*   **Smoking**: Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]
*   **AlcoholDrinking**: Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
*   **Stroke**: (Ever told) (you had) a stroke?
*   **PhysicalHealth**: Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good? (0-30 days)
*   **MentalHealth**: Thinking about your mental health, for how many days during the past 30 days was your mental health not good? (0-30 days)
*   **DiffWalking**: Do you have serious difficulty walking or climbing stairs?
*   **Sex**: Are you male or female?
*   **AgeCategory**: Fourteen-level age category
*   **Race**: Imputed race/ethnicity value
*   **Diabetic**: (Ever told) (you had) diabetes?
*   **PhysicalActivity**: Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
*   **GenHealth**: Would you say that in general your health is...
*   **SleepTime**: On average, how many hours of sleep do you get in a 24-hour period?
*   **Asthma**: (Ever told) (you had) asthma?
*   **KidneyDisease**: Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?
*   **SkinCancer**: (Ever told) (you had) skin cancer?












In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('heart_2020_cleaned.csv')
df.head()

Unnamed: 0,HeartDisease,BMI,Smoking,AlcoholDrinking,Stroke,PhysicalHealth,MentalHealth,DiffWalking,Sex,AgeCategory,Race,Diabetic,PhysicalActivity,GenHealth,SleepTime,Asthma,KidneyDisease,SkinCancer
0,No,16.6,Yes,No,No,3.0,30.0,No,Female,55-59,White,Yes,Yes,Very good,5.0,Yes,No,Yes
1,No,20.34,No,No,Yes,0.0,0.0,No,Female,80 or older,White,No,Yes,Very good,7.0,No,No,No
2,No,26.58,Yes,No,No,20.0,30.0,No,Male,65-69,White,Yes,Yes,Fair,8.0,Yes,No,No
3,No,24.21,No,No,No,0.0,0.0,No,Female,75-79,White,No,No,Good,6.0,No,No,Yes
4,No,23.71,No,No,No,28.0,0.0,Yes,Female,40-44,White,No,Yes,Very good,8.0,No,No,No


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 319795 entries, 0 to 319794
Data columns (total 18 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   HeartDisease      319795 non-null  object 
 1   BMI               319795 non-null  float64
 2   Smoking           319795 non-null  object 
 3   AlcoholDrinking   319795 non-null  object 
 4   Stroke            319795 non-null  object 
 5   PhysicalHealth    319795 non-null  float64
 6   MentalHealth      319795 non-null  float64
 7   DiffWalking       319795 non-null  object 
 8   Sex               319795 non-null  object 
 9   AgeCategory       319795 non-null  object 
 10  Race              319795 non-null  object 
 11  Diabetic          319795 non-null  object 
 12  PhysicalActivity  319795 non-null  object 
 13  GenHealth         319795 non-null  object 
 14  SleepTime         319795 non-null  float64
 15  Asthma            319795 non-null  object 
 16  KidneyDisease     31

## **How many people have more than 20 days of poor mental health when they sleep less than 8 hours, and how many when they sleep 8 hours or more?**

In [None]:
print(len(df.loc[(df.SleepTime < 8 ) & (df.MentalHealth > 20)].index))
print(len(df.loc[(df.SleepTime >= 8 ) & (df.MentalHealth > 20)].index))

14666
6263


### *The result of this research*

Of all people who report more than 20 days of poor mental health, 70% sleep less than 8 hours. This suggests that **SleepTime is associated with MentalHealth**, although a fair comparison would also need the share of all respondents who sleep less than 8 hours.
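
The 70% figure follows directly from the two counts printed above. A minimal sketch of that arithmetic, followed by a fairer comparison (the rate of poor mental health within each sleep group) on a small toy DataFrame; the toy rows and the variable names are assumptions for illustration, not survey data:

```python
import pandas as pd

# Counts printed by the cell above.
short_sleep_poor = 14666   # SleepTime < 8 and MentalHealth > 20
long_sleep_poor = 6263     # SleepTime >= 8 and MentalHealth > 20
share_short = short_sleep_poor / (short_sleep_poor + long_sleep_poor) * 100

# Toy rows for illustration only; not taken from the survey.
toy = pd.DataFrame({
    "SleepTime":    [5, 6, 7, 7, 8, 8, 9, 10],
    "MentalHealth": [30, 25, 0, 5, 0, 22, 0, 3],
})

poor = toy["MentalHealth"] > 20
sleep_group = (toy["SleepTime"] < 8).map({True: "< 8 h", False: ">= 8 h"})

# Share of each sleep group that reports more than 20 poor mental health days.
rate_by_group = poor.groupby(sleep_group).mean() * 100
print(round(share_short, 1))
print(rate_by_group)
```

On the real data, the same `groupby` applied to `df` would show whether short sleepers report many poor days more often than long sleepers, independent of how large each group is.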

## **Does BMI affect heart disease?**

In [None]:
def range_bmi(row):
    if row.BMI < 18.5:
        row["BMICategory"] = "Underweight (BMI < 18.5)"
    elif 18.5 <= row.BMI < 25:
        row["BMICategory"] = "Normal weight (18.5 <= BMI < 25.0)"
    elif 25 <= row.BMI < 30:
        row["BMICategory"] = "Overweight (25.0 <= BMI < 30.0)"
    elif 30 <= row.BMI < 40:
        row["BMICategory"] = "Obese (30.0 <= BMI < 40.0)"
    elif 40 <= row.BMI:
        row["BMICategory"] = "Extremely Obese (BMI >= 40.0)"
    return row

# Share of respondents with heart disease (in %) within each BMI category
(df.apply(range_bmi, axis="columns").groupby("BMICategory").apply(lambda dfy: dfy.HeartDisease.map(lambda x: x == "Yes").mean()) * 100).round(2)

BMICategory
Extremely Obese (BMI >= 40.0)         11.06
Normal weight (18.5 <= BMI < 25.0)     6.48
Obese (30.0 <= BMI < 40.0)            10.25
Overweight (25.0 <= BMI < 30.0)        8.72
Underweight (BMI < 18.5)               7.85
dtype: float64

### *The result of this research*

As we can see, starting with Overweight, the percentage of heart disease gradually increases as BMI grows. The difference between Extremely Obese and Normal weight is 4.6 percentage points (11.06% vs 6.48%), which is about 1.7 times higher in relative terms. That is, in absolute terms BMI **does not have such a strong effect** on heart disease, although the association is visible.
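
The row-wise `apply` above can be slow on a table of this size. A possible vectorized alternative uses `pd.cut` with the same category boundaries; this is only a sketch on toy rows (the toy data and the shortened labels are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not taken from the survey.
toy = pd.DataFrame({
    "BMI": [17.0, 22.0, 24.9, 27.0, 35.0, 45.0],
    "HeartDisease": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# right=False gives left-closed bins: [18.5, 25), [25, 30), and so on,
# matching the comparisons used in range_bmi.
bins = [-np.inf, 18.5, 25, 30, 40, np.inf]
labels = ["Underweight", "Normal weight", "Overweight", "Obese", "Extremely Obese"]
bmi_category = pd.cut(toy["BMI"], bins=bins, labels=labels, right=False)

# Percentage of respondents with heart disease in each category.
rate = (toy["HeartDisease"].eq("Yes")
        .groupby(bmi_category, observed=True)
        .mean() * 100).round(2)
print(rate)
```

Because the labels are an ordered categorical, the result is sorted from Underweight to Extremely Obese rather than alphabetically.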

## **Which Race has the most skin cancer patients?**

In [None]:
df.groupby('SkinCancer')['Race'].value_counts()['Yes']

Race
White                             28561
Other                               480
Hispanic                            414
American Indian/Alaskan Native      170
Black                               138
Asian                                56
Name: Race, dtype: int64

### *The result of this research*

As we can see, White respondents account for by far the most SkinCancer cases, and Asian respondents for the fewest. So let's not forget to protect our skin! However, this result is not as accurate as it could be, because we counted cases rather than percentages: White respondents also make up most of the sample, so the counts alone do not show which group has the highest risk.
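
To turn the counts into percentages, each count can be divided by the number of respondents of that Race. A minimal sketch using the counts printed above and the per-Race totals from the "Amount" column of the Race analysis later in this notebook (the variable names are assumptions):

```python
import pandas as pd

# SkinCancer == "Yes" counts, as printed above.
skin_cancer_yes = pd.Series({
    "White": 28561,
    "Other": 480,
    "Hispanic": 414,
    "American Indian/Alaskan Native": 170,
    "Black": 138,
    "Asian": 56,
})

# Total respondents per Race ("Amount" column of the Race analysis table).
respondents = pd.Series({
    "White": 245212,
    "Other": 10928,
    "Hispanic": 27446,
    "American Indian/Alaskan Native": 5202,
    "Black": 22939,
    "Asian": 8068,
})

# Percentage of respondents of each Race who reported skin cancer.
rate = (skin_cancer_yes / respondents * 100).round(2).sort_values(ascending=False)
print(rate)
```

With rates instead of counts, White respondents still rank highest, but the lowest rate belongs to Black respondents rather than Asian respondents.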

## **In which AgeCategory are Male and Female respondents most affected by KidneyDisease?**

In [None]:
df[df["KidneyDisease"] == "Yes"] [["Sex","AgeCategory"]].value_counts()

Sex     AgeCategory
Female  80 or older    1121
        70-74          1031
Male    70-74           939
Female  75-79           882
        65-69           871
Male    80 or older     846
        65-69           818
        75-79           717
Female  60-64           715
Male    60-64           657
Female  55-59           535
Male    55-59           463
Female  50-54           406
Male    50-54           294
Female  45-49           272
        40-44           229
Male    45-49           179
Female  35-39           153
Male    40-44           137
        35-39           103
Female  30-34           100
        18-24            74
        25-29            65
Male    30-34            64
        18-24            58
        25-29            50
dtype: int64

### *The result of this research*

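In general, we see the following trend: the number of KidneyDisease cases grows with age for both sexes. Women account for slightly more cases in most age groups, but these are counts, and the dataset also contains more women (167,805) than men (151,990).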

## **How many people feel bad for more than 15 days, with and without heart disease?**

In [None]:
x = df[df["PhysicalHealth"] > 15]
x["HeartDisease"].value_counts()

No     20058
Yes     5946
Name: HeartDisease, dtype: int64

### *The result of this research*

Among the 26,004 people who felt physically unwell for more than half a month, most (20,058) do not have heart disease. However, almost 23% of them (5,946) do, which is well above the share of heart disease in the whole dataset (about 8.6%, or 27,373 of 319,795 respondents).
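
The comparison with the whole dataset can be checked with a few lines of arithmetic. A minimal sketch using the counts printed above, the per-Sex heart disease counts from the Sex analysis later in this notebook, and the row count from `df.info()` (the variable names are assumptions):

```python
# PhysicalHealth > 15 group, as printed above.
unwell_with_hd = 5946
unwell_without_hd = 20058
share_in_group = unwell_with_hd / (unwell_with_hd + unwell_without_hd) * 100

# Whole dataset: heart disease counts for Male and Female, and the total row count.
overall_with_hd = 16139 + 11234
total_respondents = 319795
share_overall = overall_with_hd / total_respondents * 100

# How many times more common heart disease is in the unwell group.
ratio = share_in_group / share_overall
print(round(share_in_group, 1), round(share_overall, 2), round(ratio, 1))
```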

## **How many people have satisfactory GenHealth, if they don't smoke and do sports?**

In [None]:
print(len(df[df["GenHealth"].isin(["Fair", "Very good", "Good"]) & (df["PhysicalActivity"] == "Yes") & (df["Smoking"] == "No")]))
print(len(df[df["GenHealth"].isin(["Fair", "Very good", "Good"]) & (df["PhysicalActivity"] == "Yes") & (df["Smoking"] == "Yes")]))

108243
75987


### *The result of this research*

108,243 respondents (about 34% of the whole dataset) have Fair, Good or Very good GenHealth, do not smoke and do sports. Surprisingly, among the people who do sports and report such GenHealth, about 41% (75,987 of 184,230) are smokers. Therefore we draw conclusions!

## **How do Smoking and AlcoholDrinking affect the presence of diseases?**

In [None]:
def find_p_disease(df):
    return pd.Series([df.HeartDisease.map(lambda x: x == "Yes").mean() * 100, 
                      df.Stroke.map(lambda x: x == "Yes").mean() * 100, 
                      df.Diabetic.map(lambda x: x == "Yes").mean() * 100, 
                      df.Asthma.map(lambda x: x == "Yes").mean() * 100, 
                      df.KidneyDisease.map(lambda x: x == "Yes").mean() * 100,
                      df.SkinCancer.map(lambda x: x == "Yes").mean() * 100],
                     index=["HeartDisease", "Stroke", "Diabetic", "Asthma", "KidneyDisease", "SkinCancer"])
df.groupby(["Smoking", "AlcoholDrinking"]).apply(find_p_disease).round(2)

Smoking,AlcoholDrinking,HeartDisease,Stroke,Diabetic,Asthma,KidneyDisease,SkinCancer
No,No,6.17,2.86,11.43,12.7,3.22,8.53
No,Yes,3.0,1.35,4.65,13.07,1.24,7.75
Yes,No,12.78,5.41,16.1,14.53,4.75,10.64
Yes,Yes,6.63,3.01,6.15,13.16,2.01,9.31


### *The result of this research*

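We can see a very strange trend: in general, people who smoke but are not heavy drinkers suffer more from these diseases than people who both smoke and drink heavily. This may reflect other differences between the groups (for example, age) rather than a protective effect of alcohol. Most of all, **Smoking is associated with heart disease and stroke**, whose rates are roughly twice as high among smokers.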

## **How do Smoking and AlcoholDrinking affect DiffWalking?**

In [None]:
list_ = []
list_.append(len(df.loc[(df.Smoking == 'Yes' ) & (df.DiffWalking == 'Yes')].index))
list_.append(len(df.loc[(df.Smoking == 'Yes' ) & (df.DiffWalking == 'No')].index))
list_.append(len(df.loc[(df.AlcoholDrinking == 'Yes' ) & (df.DiffWalking == 'Yes')].index))
list_.append(len(df.loc[(df.AlcoholDrinking == 'Yes' ) & (df.DiffWalking == 'No')].index))
list_.append(len(df.loc[(df.AlcoholDrinking == 'No' ) & (df.DiffWalking == 'Yes') & (df.Smoking == 'No' ) ].index))
list_.append(len(df.loc[(df.AlcoholDrinking == 'No' ) & (df.DiffWalking == 'No') & (df.Smoking == 'No' ) ].index))

# Label each count and collect the results in a one-column table
df1 = pd.DataFrame(list_, index=["Smoking and DiffWalking", "Smoking but not DiffWalking",
                                 "Drinking and DiffWalking", "Drinking but not DiffWalking",
                                 "Not Drinking, Not Smoking but DiffWalking",
                                 "Not Drinking, Not Smoking and not DiffWalking"], columns=["Amount"])
df1

Unnamed: 0,Amount
Smoking and DiffWalking,24855
Smoking but not DiffWalking,107053
Drinking and DiffWalking,2040
Drinking but not DiffWalking,19737
"Not Drinking, Not Smoking but DiffWalking",19100
"Not Drinking, Not Smoking and not DiffWalking",160425


### *The result of this research*

About 19% of smokers have DiffWalking. About 9% of heavy drinkers have DiffWalking, compared with about 11% of people who neither smoke nor drink heavily. That is, we conclude that AlcoholDrinking **does not have a strong effect** on DiffWalking, while smokers report it noticeably more often.
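
The percentages can be derived from the counts in the table above. A minimal sketch (the dictionary layout and the group names are assumptions; the numbers are the ones printed above):

```python
# (DiffWalking == "Yes", DiffWalking == "No") counts from the table above.
counts = {
    "smokers": (24855, 107053),
    "heavy drinkers": (2040, 19737),
    "neither": (19100, 160425),
}

# Share of each group that reports serious difficulty walking, in percent.
shares = {
    group: round(yes / (yes + no) * 100, 1)
    for group, (yes, no) in counts.items()
}
print(shares)
```

Note that the smoker and heavy drinker groups overlap, since a respondent can be in both.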

## **Analysis Based on AgeCategory**

In [None]:
df = df.sort_values(by="AgeCategory")

df_groupby_age = pd.DataFrame(index=df["AgeCategory"].unique())
df_groupby_age["Amount"] = df["AgeCategory"].value_counts()
df_groupby_age["Amount with heartDisease"] = df[df["HeartDisease"] == "Yes"]["AgeCategory"].value_counts()
df_groupby_age["% of heart disease"] = round(df[df["HeartDisease"] == "Yes"]["AgeCategory"].value_counts() * 100 / df["AgeCategory"].value_counts(), 3)
df_groupby_age["% smokers with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Smoking"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["% alcoholics with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["AlcoholDrinking"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["%  stroke with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Stroke"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["%  DiffWalking with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["DiffWalking"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["%  Diabetic with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Diabetic"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["%  Asthma with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Asthma"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["%  KidneyDisease with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["KidneyDisease"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age["% of positive SkinCancer with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["SkinCancer"] == "Yes")]["AgeCategory"].value_counts() * 100 / df_groupby_age["Amount with heartDisease"], 3)
df_groupby_age

Unnamed: 0,Amount,Amount with heartDisease,% of heart disease,% smokers with heart disease,% alcoholics with heart disease,% stroke with heart disease,% DiffWalking with heart disease,% Diabetic with heart disease,% Asthma with heart disease,% KidneyDisease with heart disease,% of positive SkinCancer with heart disease
18-24,21064,130,0.617,24.615,6.923,14.615,8.462,7.692,30.769,4.615,3.846
25-29,16955,133,0.784,44.361,12.03,12.03,10.526,2.256,31.579,3.759,1.504
30-34,18753,226,1.205,58.85,9.292,13.274,14.602,7.08,28.319,4.867,1.327
35-39,20550,296,1.44,61.486,8.446,10.135,21.959,13.851,29.73,6.419,2.027
40-44,21006,486,2.314,65.638,8.436,14.815,29.424,22.84,30.453,8.642,4.321
45-49,21791,744,3.414,62.5,5.78,17.608,34.005,31.317,29.167,10.484,4.57
50-54,25382,1383,5.449,60.231,7.448,16.992,37.744,31.526,24.15,10.991,5.061
55-59,29757,2202,7.4,62.08,6.267,16.485,40.509,35.15,24.114,11.126,7.084
60-64,33686,3327,9.877,61.286,4.599,16.772,38.203,36.038,22.362,12.444,11.271
65-69,34151,4101,12.008,59.863,4.218,14.923,33.772,35.747,17.703,11.826,14.314


### *The result of this research*


*   With age, people are more likely to suffer from heart disease.
*   The difference between 18-24 and 80+ is as much as 21.9 percentage points.
*   Among heart disease patients, the share of smokers grows up to the 40-44 group (about 66%) and then declines slightly. It stays at about 60% from 30-34 onwards, which is consistent with smoking being linked to heart disease.
*   The share of heavy drinkers among heart disease patients is much smaller and tends to fall with age, so AlcoholDrinking does not appear to be linked to heart disease as strongly.
*   Stroke and Asthma often occur together with heart disease.







## **Analysis Based on Race**

In [None]:
df_groupby_race = pd.DataFrame(index=df["Race"].unique())
df_groupby_race["Amount"] = df["Race"].value_counts()
df_groupby_race["Amount with heartDisease"] = df[df["HeartDisease"] == "Yes"]["Race"].value_counts()
df_groupby_race["% of heart disease"] = round(df[df["HeartDisease"] == "Yes"]["Race"].value_counts() * 100 / df["Race"].value_counts(), 3)
df_groupby_race["% smokers with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Smoking"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["% alcoholics with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["AlcoholDrinking"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["%  stroke with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Stroke"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["%  DiffWalking with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["DiffWalking"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["%  Diabetic with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Diabetic"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["%  Asthma with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Asthma"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["%  KidneyDisease with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["KidneyDisease"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race["% of positive SkinCancer with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["SkinCancer"] == "Yes")]["Race"].value_counts() * 100 / df_groupby_race["Amount with heartDisease"], 3)
df_groupby_race

Unnamed: 0,Amount,Amount with heartDisease,% of heart disease,% smokers with heart disease,% alcoholics with heart disease,% stroke with heart disease,% DiffWalking with heart disease,% Diabetic with heart disease,% Asthma with heart disease,% KidneyDisease with heart disease,% of positive SkinCancer with heart disease
Other,10928,886,8.108,61.4,4.628,21.558,40.858,36.795,27.088,14.447,10.384
White,245212,22507,9.179,59.706,4.305,14.902,35.149,31.013,16.546,12.329,21.127
Asian,8068,266,3.297,44.737,1.88,19.549,22.556,33.459,20.677,7.519,3.759
Hispanic,27446,1443,5.258,44.213,4.643,17.256,41.026,39.709,25.433,12.474,4.019
American Indian/Alaskan Native,5202,542,10.419,69.373,3.137,26.199,51.107,41.144,26.568,14.945,7.934
Black,22939,1729,7.537,53.326,2.429,23.193,47.773,44.303,23.308,15.674,1.272


### *The result of this research*



*   White and American Indian/Alaskan Native respondents have the highest rates of heart disease (9.2% and 10.4%).
*   Black heart disease patients are the most likely to also have diabetes (44.3%).
*   Asian respondents have the lowest rate of heart disease (3.3%). This may be related to lifestyle, but the data here cannot confirm it.





## **Analysis Based on Sex**

In [None]:
df_groupby_sex = pd.DataFrame(index=df["Sex"].unique())
df_groupby_sex["Amount"] = df["Sex"].value_counts()
df_groupby_sex["Amount with heartDisease"] = df[df["HeartDisease"] == "Yes"]["Sex"].value_counts()
df_groupby_sex["% of heart disease"] = round(df[df["HeartDisease"] == "Yes"]["Sex"].value_counts() * 100 / df["Sex"].value_counts(), 3)
df_groupby_sex["% smokers with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Smoking"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["% alcoholics with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["AlcoholDrinking"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["%  stroke with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Stroke"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["%  DiffWalking with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["DiffWalking"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["%  Diabetic with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Diabetic"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["%  Asthma with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["Asthma"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["%  KidneyDisease with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["KidneyDisease"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex["% of positive SkinCancer with heart disease"] = round(df[(df["HeartDisease"] == "Yes") & (df["SkinCancer"] == "Yes")]["Sex"].value_counts() * 100 / df_groupby_sex["Amount with heartDisease"], 3)
df_groupby_sex

Unnamed: 0,Amount,Amount with heartDisease,% of heart disease,% smokers with heart disease,% alcoholics with heart disease,% stroke with heart disease,% DiffWalking with heart disease,% Diabetic with heart disease,% Asthma with heart disease,% KidneyDisease with heart disease,% of positive SkinCancer with heart disease
Male,151990,16139,10.618,62.935,4.418,14.734,30.324,33.224,13.817,11.333,20.46
Female,167805,11234,6.695,52.341,3.81,17.901,45.701,32.001,24.061,14.474,14.937


### *The result of this research*


*   Men suffer from heart disease more often than women (10.6% vs 6.7%).
*   Among heart disease patients, the share with DiffWalking is about 15 percentage points higher for women (45.7%) than for men (30.3%).
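
The AgeCategory, Race and Sex tables above are built with the same steps, repeated once per column. One way to avoid the repetition is a small helper; `summarize_by` and its `flags` parameter are hypothetical names introduced here, and the toy rows are for illustration only:

```python
import pandas as pd

def summarize_by(df, group_col, flags=("Smoking", "Stroke")):
    """Per-group size, heart disease rate, and share of each flag among patients."""
    has_hd = df["HeartDisease"].eq("Yes")
    out = pd.DataFrame({
        "Amount": df.groupby(group_col).size(),
        "Amount with heartDisease": has_hd.groupby(df[group_col]).sum(),
    })
    out["% of heart disease"] = (
        out["Amount with heartDisease"] * 100 / out["Amount"]
    ).round(3)

    # Shares are computed among heart disease patients only.
    patients = df[has_hd]
    for flag in flags:
        share = patients[flag].eq("Yes").groupby(patients[group_col]).mean() * 100
        out[f"% {flag} with heart disease"] = share.round(3)
    return out

# Toy rows for illustration only; not taken from the survey.
toy = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "HeartDisease": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "Smoking": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "Stroke": ["No", "No", "No", "Yes", "No", "No"],
})
summary = summarize_by(toy, "Sex")
print(summary)
```

On the real data, a call such as `summarize_by(df, "Race", flags=("Smoking", "AlcoholDrinking", "Stroke"))` would be expected to produce the same kind of table as the cells above, with one column per flag.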



## **The connection between MentalHealth, PhysicalHealth and diseases**

In [None]:
data_tmp = df.copy()
data_tmp["MentalHealth"] = data_tmp.MentalHealth.map(lambda x: ">20" if x > 20 else "<= 20")
data_tmp["PhysicalHealth"] = data_tmp.PhysicalHealth.map(lambda x: ">20" if x > 20 else "<= 20")

data_tmp.groupby(["MentalHealth", "PhysicalHealth"]).apply(find_p_disease).round(2)

MentalHealth,PhysicalHealth,HeartDisease,Stroke,Diabetic,Asthma,KidneyDisease,SkinCancer
<= 20,<= 20,7.38,3.1,11.53,12.19,3.03,9.15
<= 20,>20,23.57,11.19,28.76,21.66,11.78,14.13
>20,<= 20,8.74,4.6,12.32,21.11,3.8,6.78
>20,>20,22.8,13.19,28.31,29.21,12.2,10.82


### *The result of this research*
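We can conclude that all these diseases are associated with poor PhysicalHealth more than with poor MentalHealth: among people with more than 20 days of poor physical health, the rates of HeartDisease, Stroke and KidneyDisease are more than three times higher. Diabetic and Asthma have the highest rates in that group, so they seem to affect a person's life the most.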

## **Do people without bad habits suffer from heart disease?**

In [None]:
data_tmp = df.copy()

data_tmp["Smoking"] = (data_tmp.Smoking == "No").astype(int)
data_tmp["AlcoholDrinking"] = (data_tmp.AlcoholDrinking == "No").astype(int)
data_tmp["SleepTime"] = ((6 <= data_tmp.SleepTime) & (data_tmp.SleepTime <= 10)).astype(int)
data_tmp["PhysicalActivity"] = (data_tmp.PhysicalActivity == "Yes").astype(int)

data_tmp["health_lfstyle_per"] = data_tmp.Smoking * 0.3 + data_tmp.AlcoholDrinking * 0.2 + data_tmp.SleepTime * 0.2 + data_tmp.PhysicalActivity * 0.3

pd.DataFrame({"HeartDisease_per": [data_tmp[data_tmp["health_lfstyle_per"] > 0.75].HeartDisease.map(lambda x: x == "Yes").mean(), data_tmp[data_tmp["health_lfstyle_per"] <= 0.75].HeartDisease.map(lambda x: x == "Yes").mean()]}, 
              index=["> 0.75", "<= 0.75"]).rename_axis("health_lfstyle_per")

health_lfstyle_per,HeartDisease_per
> 0.75,0.050724
<= 0.75,0.116982


### *The result of this research*

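Only about 5% of people with healthy habits (a score above 0.75) have heart disease, compared with about 11.7% of everyone else. Note that the table shows fractions, so 0.050724 means 5.07%. But you must always remember that life is unpredictable. Therefore, let's appreciate every moment!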
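
To see what a score above 0.75 means in practice, the weights from the cell above can be applied to every combination of the four habits. A minimal sketch (the dictionary keys are descriptive names chosen here, not column names from the dataset):

```python
from itertools import product

# Weights used for health_lfstyle_per in the cell above.
weights = {
    "no_smoking": 0.3,
    "no_heavy_drinking": 0.2,
    "sleep_6_to_10h": 0.2,
    "physical_activity": 0.3,
}

# Enumerate all 16 habit combinations and keep those scoring above 0.75.
passing = []
for flags in product([0, 1], repeat=len(weights)):
    score = sum(w * f for w, f in zip(weights.values(), flags))
    if score > 0.75:
        passing.append(dict(zip(weights, flags)))

for combo in passing:
    print(combo)
```

Every passing combination includes both not smoking and physical activity, so with these weights the "> 0.75" group is made up of non-smokers who exercise and meet at least one of the other two criteria.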

## **Conclusion**

Our team analyzed a dataset that matters to every person. In fact, our analysis is only the beginning of research on this important topic. We examined many factors that may or may not be related to heart disease. This work can be used for further research so that these terrible diseases can be predicted even more accurately. Let's appreciate our lives!