# <b>EDA on 2022 annual CDC survey data of 400k+ adults related to their health status</b>

### **Contributors:**

- Ali Bin Kashif : [Know about Ali here.](https://www.linkedin.com/in/ali-bin-kashif/)
- Muhammad Aman : [Know about Aman here.](https://www.linkedin.com/in/muhammad-aman-5a9317288/)

### <b>About Dataset:</b>

According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. In turn, developments in computing allow the application of machine learning methods to detect "patterns" in the data that can predict a patient's condition.

The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world."

### **Variables**:

There are 40 variables (columns) in this dataset.

- **State :** The U.S. state where the individual resides.

- **Sex :** Gender of the individual (Male or Female).

- **GeneralHealth :** Self-reported general health status of the individual.

- **PhysicalHealthDays :** Number of days in the past 30 days that physical health was not good.

- **MentalHealthDays :** Number of days in the past 30 days that mental health was not good.

- **LastCheckupTime :** Time since the last routine checkup or health examination.

- **PhysicalActivities :** Whether the individual engages in physical activities or exercise (Yes or No).

- **SleepHours :** Average number of hours of sleep per night.

- **RemovedTeeth :** Number of permanent teeth removed due to dental issues.

- **HadHeartAttack :** Whether the individual has had a heart attack.

- **HadAngina :** Whether the individual has experienced angina (chest pain or discomfort).

- **HadStroke :** Whether the individual has had a stroke.

- **HadAsthma :** Whether the individual has had asthma.

- **HadSkinCancer :** Whether the individual has had skin cancer.

- **HadCOPD :** Whether the individual has had Chronic Obstructive Pulmonary Disease (COPD).

- **HadDepressiveDisorder :** Whether the individual has had a depressive disorder.

- **HadKidneyDisease :** Whether the individual has had kidney disease.

- **HadArthritis :** Whether the individual has had arthritis.

- **HadDiabetes :** Whether the individual has had diabetes.

- **DeafOrHardOfHearing :** Whether the individual is deaf or hard of hearing.

- **BlindOrVisionDifficulty :** Whether the individual has blindness or vision difficulty.

- **DifficultyConcentrating :** Self-reported difficulty in concentrating.

- **DifficultyWalking :** Self-reported difficulty in walking.

- **DifficultyDressingBathing :** Self-reported difficulty in dressing or bathing.

- **DifficultyErrands :** Self-reported difficulty in running errands.

- **SmokerStatus :** Current smoking status of the individual (smoker, former smoker, non-smoker).

- **ECigaretteUsage :** Whether the individual uses e-cigarettes.

- **ChestScan :** Whether the individual has had a chest scan.

- **RaceEthnicityCategory :** Categorized race or ethnicity of the individual.

- **AgeCategory :** Categorized age group of the individual.

- **HeightInMeters :** Height of the individual in meters.

- **WeightInKilograms :** Weight of the individual in kilograms.

- **BMI :** Body Mass Index calculated from height and weight.

- **AlcoholDrinkers :** Whether the individual consumes alcohol.

- **HIVTesting :** Whether the individual has undergone HIV testing.

- **FluVaxLast12 :** Whether the individual received a flu vaccine in the last 12 months.

- **PneumoVaxEver :** Whether the individual has ever received a pneumonia vaccine.

- **TetanusLast10Tdap :** Whether the individual received a tetanus shot in the last 10 years, and whether that shot was Tdap.

- **HighRiskLastYear :** Whether the individual has been considered at high risk for the past year.

- **CovidPos :** Whether the individual tested positive for COVID-19.
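
The BMI description above can be sanity-checked against the height and weight columns. The following is a minimal sketch, assuming the standard formula BMI = weight (kg) / height (m)² and using the four complete rows (index 1 to 4) visible in the `heart_data.head()` preview later in this notebook. The names `sample`, `BMI_recomputed`, and `BMI_gap` are introduced here for illustration only.

```python
import pandas as pd

# Height, weight, and BMI values copied from rows 1-4 of the head() preview.
sample = pd.DataFrame({
    "HeightInMeters": [1.60, 1.57, 1.65, 1.57],
    "WeightInKilograms": [68.04, 63.50, 63.50, 53.98],
    "BMI": [26.57, 25.61, 23.30, 21.77],
})

# Standard BMI formula: weight in kilograms divided by height in meters squared.
sample["BMI_recomputed"] = sample["WeightInKilograms"] / sample["HeightInMeters"] ** 2

# Absolute gap between the reported and the recomputed values.
sample["BMI_gap"] = (sample["BMI"] - sample["BMI_recomputed"]).abs()

print(sample.round(2))
```

The recomputed values land close to the reported ones but not exactly on them. A plausible reason is that the heights shown are rounded to two decimals, so small gaps should not be read as data errors.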


# Problem Statement:

### **Understanding and Mitigating Cardiovascular Health Disparities:** An In-Depth Analysis of Risk Factors and Health Behaviors in the Adult Population.

The problem centers on identifying and understanding the factors that exert the most significant influence on heart disease prevalence. This exploration is crucial for healthcare initiatives and interventions aimed at prevention and management. The dataset encompasses a range of variables, including demographic information, health behaviors, chronic conditions, and COVID-19 status, providing a rich source for uncovering patterns, correlations, and disparities in cardiovascular health.

We address several broad questions: which factors make heart problems more likely, how mental and physical health are linked, which habits and preventive steps help protect the heart, how other chronic conditions affect it, and whether impairments such as hearing or vision difficulty are associated with heart disease. We also examine smoking and e-cigarette use and how COVID-19 relates to heart health.
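
One simple way to approach questions of this kind is to compare the share of respondents reporting a heart attack across the groups of another variable. The sketch below assumes Yes/No answers in `HadHeartAttack`, as shown in the data preview. The `toy` frame and its values are invented for illustration, and `prevalence_by` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

# Toy rows standing in for heart_data; the values are invented for illustration.
toy = pd.DataFrame({
    "SmokerStatus": ["Never smoked", "Former smoker", "Never smoked",
                     "Current smoker", "Former smoker", "Current smoker"],
    "HadHeartAttack": ["No", "Yes", "No", "Yes", "No", "No"],
})


def prevalence_by(df, group_col, outcome_col="HadHeartAttack"):
    """Share of 'Yes' answers in outcome_col within each group of group_col."""
    # Ignore rows where either the group or the outcome is missing.
    answered = df.dropna(subset=[group_col, outcome_col])
    is_yes = answered[outcome_col].eq("Yes")
    return is_yes.groupby(answered[group_col]).mean().sort_values(ascending=False)


rates = prevalence_by(toy, "SmokerStatus")
print(rates)
```

A difference in these shares describes an association only. It does not account for confounders such as age, so it should be treated as a starting point for the analysis rather than a conclusion.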

Our main goal is to extract actionable insights from this data that can inform targeted interventions, reduce disparities in heart health, and improve cardiovascular health across the adult population.

# Research and Analytical Questions:

Write question here

# Importing Python Libraries

In [6]:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```

# Loading dataset

In [7]:
```python
# Extract the given zip file if csv file is not present in the dataset folder
heart_data = pd.read_csv('./dataset/heart_2022_with_nans.csv')
```
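
The manual extraction step mentioned in the comment above could also be automated. This is a minimal sketch using only the standard library. `ensure_csv` is a hypothetical helper, and the archive name in the usage comment is an assumption, since the notebook does not name the zip file.

```python
import zipfile
from pathlib import Path


def ensure_csv(csv_path, zip_path):
    """Extract zip_path into the CSV's folder if the CSV is not there yet."""
    csv_path, zip_path = Path(csv_path), Path(zip_path)
    if not csv_path.exists():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(csv_path.parent)
    return csv_path


# Hypothetical archive name; adjust it to the zip file shipped with the project.
# heart_data = pd.read_csv(ensure_csv('./dataset/heart_2022_with_nans.csv',
#                                     './dataset/heart_2022_with_nans.zip'))
```

The helper only extracts when the CSV is missing, so re-running the cell does not unpack the archive again.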

# Data Identification and Cleaning

In [10]:
```python
heart_data.head()
```

| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | | No | ... | | | | No | No | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
| 1 | Alabama | Female | Excellent | 0.0 | 0.0 | | No | 6.0 | | No | ... | 1.6 | 68.04 | 26.57 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | No |
| 2 | Alabama | Female | Very good | 2.0 | 3.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | | No | ... | 1.57 | 63.5 | 25.61 | No | No | No | No | | No | Yes |
| 3 | Alabama | Female | Excellent | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | | No | ... | 1.65 | 63.5 | 23.3 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
| 4 | Alabama | Female | Fair | 2.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | | No | ... | 1.57 | 53.98 | 21.77 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | No |


In [11]:
```python
heart_data.shape
```

```
(445132, 40)
```
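
A natural next step in data identification is to measure how much of each column is missing, since the file name (`heart_2022_with_nans.csv`) and the blank cells in the preview both point to missing values. The sketch below assumes nothing about the real counts. It runs on a small invented `toy` frame, and `missing_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Small invented frame with gaps, standing in for heart_data.
toy = pd.DataFrame({
    "SleepHours": [8.0, 6.0, np.nan, 7.0],
    "BMI": [np.nan, 26.57, 25.61, np.nan],
    "HadHeartAttack": ["No", "No", None, "No"],
})


def missing_summary(df):
    """Count and percentage of missing values per column, largest first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(heart_data)` on the loaded dataset would give the same table for all 40 columns, which helps when deciding whether a column should be dropped, imputed, or kept as-is.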