# Heart Disease Prediction 

### Here, we explore the factors and indicators (often called symptoms) that help determine whether a person is prone to, or already has, some form of heart disease.


## Importing the necessary libraries


In [1]:
import pandas as pd

## Importing Dataset

In [4]:
file = "heart.csv"

df = pd.read_csv(file)

df.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   52    1   0       125   212    0        1      168      0      1.0      2   2     3       0
1   53    1   0       140   203    1        0      155      1      3.1      0   0     3       0
2   70    1   0       145   174    0        1      125      1      2.6      0   0     3       0
3   61    1   0       148   203    0        1      161      0      0.0      2   1     3       0
4   62    0   0       138   294    1        1      106      0      1.9      1   3     2       0


### Feature description

#### age - age of the patient
#### sex - sex of the patient (0 = female, 1 = male)
#### cp - chest pain type
#### trestbps - resting blood pressure
#### chol - amount of cholesterol in the blood, in mg/dl
#### fbs - fasting blood sugar (1 for high, 0 for normal)
#### restecg - resting electrocardiogram result
#### thalach - maximum heart rate achieved
#### exang - exercise-induced angina
#### oldpeak - ST depression induced by exercise relative to rest
#### slope - slope of the ST segment during peak exercise
#### ca - number of major vessels colored by fluoroscopy
#### thal - type of heart defect or abnormality identified during the exercise stress test. The name refers to thallium scintigraphy, a technique that uses a radioactive substance (thallium) to assess blood flow to the heart muscle.
####        In the original UCI coding, the "thal" feature has three possible values (the rows shown above contain 2 and 3, so this file appears to use a different encoding):
####        3: Normal
####        6: Fixed defect (prior heart attack, not reversible)
####        7: Reversible defect (partial blood flow restriction, potentially indicating ischemia)

### Target

#### target - whether the patient has heart disease (0 = no disease, 1 = heart disease)
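
Since several features are stored as integer codes, it can help to decode them into labels while exploring the data. The following is a minimal sketch on a made-up toy frame, not the real `heart.csv`; the `labels` dictionary and the `readable` name are illustrative assumptions, and only the codes stated in the descriptions above are mapped.

```python
import pandas as pd

# Hypothetical toy rows; the values are made up for illustration.
toy = pd.DataFrame({"sex": [1, 1, 0], "fbs": [0, 1, 1], "target": [0, 0, 1]})

# Code-to-label mappings taken from the feature descriptions.
labels = {
    "sex": {0: "female", 1: "male"},
    "fbs": {0: "normal", 1: "high"},
    "target": {0: "no disease", 1: "heart disease"},
}

# Work on a copy so the numeric codes stay available for modelling.
readable = toy.copy()
for column, mapping in labels.items():
    readable[column] = readable[column].map(mapping)

print(readable)
```

Keeping the decoded frame separate from the original means the numeric columns can still be fed to a model later.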


## Checking for Missing values

In [5]:
df.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

This indicates that there are no missing values in the dataset.
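
Note that `isnull()` only counts real `NaN`/`None` entries. If a file marks missing data with a placeholder string, that check would report zero. The sketch below illustrates this on a made-up toy frame; the `"?"` placeholder and the column values are assumptions for illustration, not something observed in `heart.csv`.

```python
import pandas as pd

# Hypothetical toy frame; the "?" placeholder is an assumption for illustration.
toy = pd.DataFrame({"ca": ["0", "2", "?", "1"], "chol": [212, 203, 174, 203]})

# isnull() reports no missing values because "?" is an ordinary string.
nan_counts = toy.isnull().sum()

# Coercing to numeric turns unparseable entries into NaN so they can be counted.
coerced = toy.apply(pd.to_numeric, errors="coerce")
hidden_missing = coerced.isnull().sum()

print(nan_counts)
print(hidden_missing)
```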

## Checking for Duplicate values and managing them

In [7]:
df_dup = df.duplicated().any()
df_dup

True

In [9]:
df = df.drop_duplicates()

In [11]:
df.duplicated().any()

False

The duplicate rows have been removed.
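
Before dropping duplicates, it can be useful to know how many rows are affected. The following is a small sketch on a made-up toy frame (not the real dataset) that counts the duplicates and compares the shape before and after; the `toy`, `n_duplicates` and `deduped` names are illustrative.

```python
import pandas as pd

# Hypothetical toy frame standing in for the heart data (values are made up).
toy = pd.DataFrame(
    {
        "age": [52, 53, 52, 61],
        "chol": [212, 203, 212, 203],
        "target": [0, 0, 0, 1],
    }
)

# duplicated() flags each repeat of an earlier row; the first occurrence is not flagged.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(f"duplicate rows: {n_duplicates}")
print(f"shape before: {toy.shape}, after: {deduped.shape}")
```

Resetting the index afterwards is optional; it simply avoids gaps in the row labels.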

## Data Processing

In [14]:
# Divide the dataset into categorical and numerical columns

# Categorical columns
cat_col = []

# Numerical columns
num_col = []


# Heuristic: a column with at most 10 distinct values is treated as categorical;
# a column with more distinct values is treated as numerical.

for column in df.columns:
    if df[column].nunique() <= 10:
        cat_col.append(column)
    else:
        num_col.append(column)

# Check cat_col and num_col

print("Categorical value columns are:", cat_col)
print("Numerical value columns are:", num_col)

Categorical value columns are: ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal', 'target']
Numerical value columns are: ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
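
The split above depends on the distinct-value cutoff of 10, which is a heuristic and not a property of the data. As a rough sketch, the same logic can be wrapped in a helper with an adjustable threshold; `split_columns` and `max_unique` are hypothetical names introduced here, and the toy values are made up.

```python
import pandas as pd


def split_columns(frame, max_unique=10):
    """Split column names into (categorical, numerical) by distinct-value count."""
    categorical, numerical = [], []
    for column in frame.columns:
        if frame[column].nunique() <= max_unique:
            categorical.append(column)
        else:
            numerical.append(column)
    return categorical, numerical


# Hypothetical toy frame: 2 distinct sexes, 12 distinct ages, 11 distinct oldpeak values.
toy = pd.DataFrame(
    {
        "sex": [0, 1] * 6,
        "age": list(range(40, 52)),
        "oldpeak": [0.0, 1.0, 2.6, 0.0, 1.9, 3.1, 0.5, 1.2, 0.8, 2.0, 1.4, 0.1],
    }
)

print(split_columns(toy))                 # default cutoff of 10
print(split_columns(toy, max_unique=11))  # a slightly higher cutoff moves oldpeak
```

A column that sits close to the cutoff can change groups when the threshold moves, so it is worth checking the resulting lists by eye.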
