# Heart Attack in Youth Vs Adult in Germany

*This project uses Python to explore and analyze heart attacks among youth and adults in Germany, with the goal of answering the following questions:*

- Between youth and adults, which group is more prone to heart attacks?

- Within each age group, which gender is most prone to heart attacks?

- How has the number of heart attacks evolved over time: has it increased or decreased?

- Are smokers more at risk?

- Are diabetics more at risk?

- Which states are most affected by heart attacks, and what characterizes them?

- Does an individual's income make them more prone to heart attacks?




## Import and clean Data

### Importing packages

In [34]:
import pandas as pd
import numpy as np

### Load and Inspect our Dataset

We will use this dataset, which we sourced from Kaggle: [Heart Attack in Youth Vs Adult in Germany](https://www.kaggle.com/datasets/ankushpanday1/heart-attack-in-youth-vs-adult-in-germany).

The heart attack data is stored in a single file called **heart_attack.csv**.

In [35]:
# Load dataset
heart_attack = pd.read_csv('heart_attack.csv')

# Print column info and preview the first five rows
# info() prints its report itself and returns None, so it is not wrapped in print()
heart_attack.info()
heart_attack.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 275644 entries, 0 to 275643
Data columns (total 22 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   State                     275644 non-null  object 
 1   Age_Group                 275644 non-null  object 
 2   Heart_Attack_Incidence    275644 non-null  int64  
 3   Year                      275644 non-null  int64  
 4   Gender                    275644 non-null  object 
 5   BMI                       275644 non-null  float64
 6   Smoking_Status            275644 non-null  object 
 7   Alcohol_Consumption       275644 non-null  float64
 8   Physical_Activity_Level   275644 non-null  object 
 9   Diet_Quality              275644 non-null  object 
 10  Family_History            275644 non-null  int64  
 11  Hypertension              275644 non-null  int64  
 12  Cholesterol_Level         275644 non-null  float64
 13  Diabetes                  275644 non-null  i

|   | State | Age_Group | Heart_Attack_Incidence | Year | Gender | BMI | Smoking_Status | Alcohol_Consumption | Physical_Activity_Level | Diet_Quality | ... | Cholesterol_Level | Diabetes | Urban_Rural | Socioeconomic_Status | Air_Pollution_Index | Stress_Level | Healthcare_Access | Education_Level | Employment_Status | Region_Heart_Attack_Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Lower Saxony | Youth | 0 | 2018 | Other | 25.6 | Former Smoker | 4.2 | Moderate | Average | ... | 154.4 | 0 | Rural | Low | 31.58 | Moderate | Moderate | Primary | Retired | 1.92 |
| 1 | Saxony | Adult | 0 | 2021 | Female | 36.7 | Smoker | 2.4 | Low | Poor | ... | 75.0 | 1 | Rural | Low | 46.22 | High | Easy | Primary | Unemployed | 14.16 |
| 2 | Hesse | Youth | 1 | 2022 | Female | 28.6 | Smoker | 29.5 | High | Poor | ... | 121.9 | 0 | Urban | Middle | 15.69 | High | Hard | Secondary | Student | 3.49 |
| 3 | Lower Saxony | Adult | 0 | 2015 | Male | 27.6 | Non-Smoker | 4.2 | Moderate | Poor | ... | 152.3 | 0 | Urban | Low | 26.5 | High | Hard | Tertiary | Student | 3.24 |
| 4 | Hamburg | Adult | 0 | 2015 | Female | 15.2 | Smoker | 4.3 | Moderate | Good | ... | 130.3 | 0 | Urban | High | 11.21 | High | Moderate | Tertiary | Employed | 9.98 |


In [36]:
heart_attack.describe()

|   | Heart_Attack_Incidence | Year | BMI | Alcohol_Consumption | Family_History | Hypertension | Cholesterol_Level | Diabetes | Air_Pollution_Index | Region_Heart_Attack_Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 | 275644.0 |
| mean | 0.15007 | 2018.997319 | 24.992669 | 4.985734 | 0.30054 | 0.400564 | 130.034229 | 0.199525 | 27.486225 | 10.489019 |
| std | 0.357141 | 2.582667 | 4.996535 | 5.001789 | 0.458494 | 0.490014 | 30.009242 | 0.399644 | 13.001527 | 5.483277 |
| min | 0.0 | 2015.0 | 1.4 | 0.0 | 0.0 | 0.0 | -7.9 | 0.0 | 5.0 | 1.0 |
| 25% | 0.0 | 2017.0 | 21.6 | 1.4 | 0.0 | 0.0 | 109.8 | 0.0 | 16.22 | 5.75 |
| 50% | 0.0 | 2019.0 | 25.0 | 3.5 | 0.0 | 0.0 | 130.1 | 0.0 | 27.43 | 10.49 |
| 75% | 0.0 | 2021.0 | 28.4 | 6.9 | 1.0 | 1.0 | 150.3 | 0.0 | 38.78 | 15.24 |
| max | 1.0 | 2023.0 | 47.9 | 70.0 | 1.0 | 1.0 | 272.4 | 1.0 | 50.0 | 20.0 |
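
The summary statistics above include a few values that look implausible, such as a minimum `Cholesterol_Level` of -7.9 and a minimum `BMI` of 1.4. One possible way to quantify this is to count the rows that fall outside an assumed plausible range per column. The sketch below uses a small made-up frame named `sample` in place of `heart_attack`, and the thresholds in `plausible_ranges` are illustrative assumptions, not clinical guidance.

```python
import pandas as pd

# Toy stand-in for heart_attack; the rows are made up for illustration only.
sample = pd.DataFrame({
    "BMI": [25.6, 1.4, 36.7, 28.6],
    "Cholesterol_Level": [154.4, 130.3, -7.9, 121.9],
    "Alcohol_Consumption": [4.2, 2.4, 70.0, 4.3],
})

# Assumed plausible ranges (hypothetical thresholds, adjust as needed).
plausible_ranges = {
    "BMI": (10, 60),
    "Cholesterol_Level": (0, 400),
    "Alcohol_Consumption": (0, 50),
}

# Count the rows that fall outside each assumed range (bounds are inclusive).
out_of_range = {
    col: int((~sample[col].between(low, high)).sum())
    for col, (low, high) in plausible_ranges.items()
}
print(out_of_range)
```

Whether such rows should be dropped, clipped, or kept depends on how the dataset was generated, so the counts are best treated as a prompt for inspection rather than as an automatic filter.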


In [37]:
heart_attack = heart_attack.rename(
    columns = {
        'Heart_Attack_Incidence': 'HA_incidence',
        'Region_Heart_Attack_Rate': 'Region_HA_rate',
        'Socioeconomic_Status': 'SocioStatus',
        'Air_Pollution_Index': 'Air_Pollution',
        'Physical_Activity_Level': 'Physical_Activity'
    }
)


In [73]:
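def dataCleaning(df):
    # Lower-case the column names so later cells can use e.g. 'ha_incidence'
    df.columns = df.columns.str.lower()
    # Lower-case and strip whitespace in every text column
    for col in df.columns:
        if df[col].dtype == 'object':
            df[col] = df[col].str.lower().str.strip()
    return df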

In [90]:
heart_attack = dataCleaning(heart_attack)
heart_attack.head()

|   | state | age_group | ha_incidence | year | gender | bmi | smoking_status | alcohol_consumption | physical_activity | diet_quality | ... | cholesterol_level | diabetes | urban_rural | sociostatus | air_pollution | stress_level | healthcare_access | education_level | employment_status | region_ha_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | lower saxony | youth | 0 | 2018 | other | 25.6 | former smoker | 4.2 | moderate | average | ... | 154.4 | 0 | rural | low | 31.58 | moderate | moderate | primary | retired | 1.92 |
| 1 | saxony | adult | 0 | 2021 | female | 36.7 | smoker | 2.4 | low | poor | ... | 75.0 | 1 | rural | low | 46.22 | high | easy | primary | unemployed | 14.16 |
| 2 | hesse | youth | 1 | 2022 | female | 28.6 | smoker | 29.5 | high | poor | ... | 121.9 | 0 | urban | middle | 15.69 | high | hard | secondary | student | 3.49 |
| 3 | lower saxony | adult | 0 | 2015 | male | 27.6 | non-smoker | 4.2 | moderate | poor | ... | 152.3 | 0 | urban | low | 26.5 | high | hard | tertiary | student | 3.24 |
| 4 | hamburg | adult | 0 | 2015 | female | 15.2 | smoker | 4.3 | moderate | good | ... | 130.3 | 0 | urban | high | 11.21 | high | moderate | tertiary | employed | 9.98 |
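
After normalizing the text columns, it can be worth confirming that each one holds only the labels we expect, since a leftover spelling variant would silently split a group in later counts. The sketch below applies the same lower-case and strip steps to a made-up frame named `toy`. The `expected` sets are assumptions based on the five-row preview, so the full dataset may legitimately contain more labels.

```python
import pandas as pd

# Made-up rows with inconsistent casing and stray whitespace.
toy = pd.DataFrame({
    "gender": ["Male", " male", "FEMALE ", "other"],
    "smoking_status": ["Smoker", "smoker ", "Non-Smoker", "Former Smoker"],
})

# Same normalization idea as dataCleaning: lower-case, then strip whitespace.
for col in toy.columns:
    toy[col] = toy[col].str.lower().str.strip()

# Labels we expect to see (assumed from the preview, not from the full data).
expected = {
    "gender": {"female", "male", "other"},
    "smoking_status": {"smoker", "non-smoker", "former smoker"},
}

# Any label outside the expected set points to a spelling variant.
unexpected = {col: set(toy[col]) - labels for col, labels in expected.items()}
print(unexpected)
```

An empty set for every column suggests the cleaning step collapsed the variants as intended.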


In [152]:
# heart_attack['year'].value_counts(normalize=True).sort_index()

ha_only = heart_attack[heart_attack['ha_incidence'] == 1]

age_group_HA = ha_only['age_group'].value_counts(normalize=True)*100

print(f"Percentage distribution by age group:\nAdult: {round(age_group_HA.loc['adult'])} %\nYouth: {round(age_group_HA.loc['youth'])} %")



Percentage distribution by age group:
Adult: 70 %
Youth: 30 %
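
The percentages above describe how heart-attack cases are split between the two age groups. They do not show how likely a heart attack is within each group, because the result also depends on how many adults and youths are in the dataset. A minimal sketch of the difference, using a made-up frame named `toy` with the same `age_group` and `ha_incidence` columns:

```python
import pandas as pd

# Made-up rows: 10 adults (3 with a heart attack) and 5 youths (2 with one).
toy = pd.DataFrame({
    "age_group": ["adult"] * 10 + ["youth"] * 5,
    "ha_incidence": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
})

# Share of all heart-attack cases that falls in each age group.
cases = toy[toy["ha_incidence"] == 1]
share_of_cases = cases["age_group"].value_counts(normalize=True) * 100

# Incidence rate inside each group: the mean of the 0/1 indicator.
rate_within_group = toy.groupby("age_group")["ha_incidence"].mean() * 100

print(share_of_cases.round(1).to_dict())
print(rate_within_group.round(1).to_dict())
```

In this toy data adults account for most of the cases, yet the rate within the youth group is higher. Applying the same `groupby` to `heart_attack` would give the within-group rate needed to judge which group is more prone.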


In [150]:
# Count heart-attack cases by age group and gender.
# 'ha_incidence' is the counted column; 'gender' cannot serve as both the values and the columns.
pivot_ha = pd.pivot_table(
    ha_only,
    values = 'ha_incidence',
    columns = 'gender',
    index = 'age_group',
    aggfunc = 'count'
)


pivot_ha.head()

| age_group | female | male | other |
| --- | --- | --- | --- |
| adult | 9622 | 9635 | 9895 |
| youth | 3980 | 4116 | 4118 |
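
Raw counts are hard to compare across age groups because the adult group contains more cases overall. One option is to convert each row to percentages, so the gender split can be read within each age group. The sketch below rebuilds the counts from the pivot output above in a small frame named `counts` (a name introduced here for illustration) and normalizes each row.

```python
import pandas as pd

# Heart-attack counts by age group and gender, copied from the pivot output.
counts = pd.DataFrame(
    {"female": [9622, 3980], "male": [9635, 4116], "other": [9895, 4118]},
    index=pd.Index(["adult", "youth"], name="age_group"),
)

# Divide each row by its total so that every row sums to 100 %.
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100

print(row_pct.round(1))
```

On the full data, `pd.crosstab(ha_only['age_group'], ha_only['gender'], normalize='index')` should give the same proportions directly. As with the age-group split, these are shares of cases, so they say little about risk unless each gender's size in the dataset is taken into account.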
