# **TSK-000-184**

I am using the Heart Attack Dataset.

**Source:** The dataset was collected from Kaggle.

**The attributes of the dataset include the following:**

age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, CK-MB, and troponin, together with a negative or positive result.

The dataset classifies whether a patient is likely to have a heart attack. Positive and negative outputs are described as 1 and 0, respectively, and male and female as 1 and 0. Blood sugar is described as set to 1 when the blood glucose level is above 120 and to 0 otherwise. In the file loaded here, however, the Result column holds the strings `positive` and `negative`, and the Blood sugar column holds raw readings rather than a 0/1 flag.
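The 0/1 encodings described above can be sketched on a few made-up rows. This is only an illustration: the `sample` frame and the `Result_encoded` and `Blood sugar_flag` column names are assumptions and are not part of the dataset.

```python
import pandas as pd

# Hypothetical rows shaped like the dataset's Blood sugar and Result columns.
sample = pd.DataFrame({
    "Blood sugar": [160.0, 96.0, 120.0],
    "Result": ["negative", "positive", "positive"],
})

# Encode the outcome as 1 (positive) or 0 (negative).
sample["Result_encoded"] = sample["Result"].map({"positive": 1, "negative": 0})

# Flag blood glucose strictly above 120 as 1, otherwise 0.
sample["Blood sugar_flag"] = (sample["Blood sugar"] > 120).astype(int)

print(sample)
```

A reading of exactly 120 is flagged 0 here because the rule is stated as "above 120".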





In [139]:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

heartattack_data = pd.read_csv(r"D:\internee.pk\TSK-000-184\data preprocessing using python\Medicaldataset.csv")


# Data Cleaning

**Checking the dataset for duplicate rows to remove**

In [140]:
# heartattack_data = heartattack_data.drop_duplicates() # no duplicates were found in the dataset
heartattack_data[heartattack_data.duplicated()]

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,Blood sugar,CK-MB,Troponin,Result
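An empty result like the one above means no row is an exact copy of an earlier one. As a minimal sketch on a made-up frame (the `toy` data is an assumption, not taken from the dataset), the same check can be reduced to a single count before deciding whether `drop_duplicates()` is needed:

```python
import pandas as pd

# Hypothetical frame in which the third row repeats the first.
toy = pd.DataFrame({"Age": [63, 20, 63], "Heart rate": [66, 94, 66]})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean index.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```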


**Checking the dataset for null values to remove**

In [141]:

heartattack_data = heartattack_data.dropna() # no null values were found in the dataset
heartattack_data

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,Blood sugar,CK-MB,Troponin,Result
0,63,1,66,160,83,160.0,1.80,0.012,negative
1,20,1,94,98,46,296.0,6.75,1.060,positive
2,56,1,64,160,77,270.0,1.99,0.003,negative
3,66,1,70,120,55,270.0,13.87,0.122,positive
4,54,1,64,112,65,300.0,1.08,0.003,negative
...,...,...,...,...,...,...,...,...,...
1314,44,1,94,122,67,204.0,1.63,0.006,negative
1315,66,1,84,125,55,149.0,1.33,0.172,positive
1316,45,1,85,168,104,96.0,1.24,4.250,positive
1317,54,1,58,117,68,443.0,5.80,0.359,positive
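Calling `dropna()` directly removes rows without showing how many values were missing. A small sketch, using a made-up frame (`toy` is an assumption), of counting nulls per column first:

```python
import pandas as pd

# Hypothetical frame with one missing Troponin value.
toy = pd.DataFrame({"Age": [63, 20, 56], "Troponin": [0.012, None, 0.003]})

# Count missing values in each column before dropping anything.
null_counts = toy.isna().sum()

# Drop every row that contains at least one missing value.
cleaned = toy.dropna()

print(null_counts)
print(len(cleaned))
```

If every count is zero, as reported for this dataset, `dropna()` leaves the frame unchanged.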


In [142]:
heartattack_data.head(15)

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,Blood sugar,CK-MB,Troponin,Result
0,63,1,66,160,83,160.0,1.8,0.012,negative
1,20,1,94,98,46,296.0,6.75,1.06,positive
2,56,1,64,160,77,270.0,1.99,0.003,negative
3,66,1,70,120,55,270.0,13.87,0.122,positive
4,54,1,64,112,65,300.0,1.08,0.003,negative
5,52,0,61,112,58,87.0,1.83,0.004,negative
6,38,0,40,179,68,102.0,0.71,0.003,negative
7,61,1,60,214,82,87.0,300.0,2.37,positive
8,49,0,60,154,81,135.0,2.35,0.004,negative
9,65,1,61,160,95,100.0,2.84,0.011,negative


In [143]:
heartattack_data.tail(15)

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,Blood sugar,CK-MB,Troponin,Result
1304,56,0,68,123,70,102.0,2.28,0.255,positive
1305,47,1,71,117,61,140.0,4.39,1.37,positive
1306,60,0,78,109,69,230.0,19.47,0.003,positive
1307,56,1,75,157,67,104.0,2.41,0.45,positive
1308,45,1,117,100,68,202.0,3.18,0.003,negative
1309,47,1,94,105,81,135.0,36.24,0.263,positive
1310,70,0,80,135,75,351.0,2.21,10.0,positive
1311,85,1,112,115,69,114.0,2.19,0.062,positive
1312,48,1,84,118,68,96.0,5.33,0.006,negative
1313,86,0,40,179,68,147.0,5.22,0.011,negative


In [144]:
heartattack_data.describe()

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,Blood sugar,CK-MB,Troponin
count,1319.0,1319.0,1319.0,1319.0,1319.0,1319.0,1319.0,1319.0
mean,56.193328,0.659591,78.336619,127.170584,72.269143,146.634344,15.274306,0.360942
std,13.638173,0.474027,51.63027,26.12272,14.033924,74.923045,46.327083,1.154568
min,14.0,0.0,20.0,42.0,38.0,35.0,0.321,0.001
25%,47.0,0.0,64.0,110.0,62.0,98.0,1.655,0.006
50%,58.0,1.0,74.0,124.0,72.0,116.0,2.85,0.014
75%,65.0,1.0,85.0,143.0,81.0,169.5,5.805,0.0855
max,103.0,1.0,1111.0,223.0,154.0,541.0,300.0,10.3


# Data Preprocessing

**Removing the Blood sugar column, as it does not play a significant role in our analysis and prediction**

In [145]:
heartattack_data = heartattack_data.drop(columns="Blood sugar")
heartattack_data

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,CK-MB,Troponin,Result
0,63,1,66,160,83,1.80,0.012,negative
1,20,1,94,98,46,6.75,1.060,positive
2,56,1,64,160,77,1.99,0.003,negative
3,66,1,70,120,55,13.87,0.122,positive
4,54,1,64,112,65,1.08,0.003,negative
...,...,...,...,...,...,...,...,...
1314,44,1,94,122,67,1.63,0.006,negative
1315,66,1,84,125,55,1.33,0.172,positive
1316,45,1,85,168,104,1.24,4.250,positive
1317,54,1,58,117,68,5.80,0.359,positive


I have converted the Gender values from the integers 1 and 0 to the strings 'Male' and 'Female'.

In [146]:
mapping = { 1: 'Male', 0: 'Female'}

heartattack_data['Gender'] = heartattack_data['Gender'].map(mapping)
heartattack_data.head(15)

Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,CK-MB,Troponin,Result
0,63,Male,66,160,83,1.8,0.012,negative
1,20,Male,94,98,46,6.75,1.06,positive
2,56,Male,64,160,77,1.99,0.003,negative
3,66,Male,70,120,55,13.87,0.122,positive
4,54,Male,64,112,65,1.08,0.003,negative
5,52,Female,61,112,58,1.83,0.004,negative
6,38,Female,40,179,68,0.71,0.003,negative
7,61,Male,60,214,82,300.0,2.37,positive
8,49,Female,60,154,81,2.35,0.004,negative
9,65,Male,61,160,95,2.84,0.011,negative
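One property of `Series.map` with a dictionary is worth keeping in mind: any value that is not a key in the dictionary becomes `NaN`. The sketch below uses a made-up series (the code `2` is a deliberately invalid value, not something found in the dataset) to show how unmapped entries can be counted after the conversion:

```python
import pandas as pd

# Hypothetical gender codes; 2 is deliberately not covered by the mapping.
gender = pd.Series([1, 0, 1, 2])

mapped = gender.map({1: "Male", 0: "Female"})

# Values without a matching key become NaN, so count them explicitly.
unmapped = int(mapped.isna().sum())

print(mapped.tolist(), unmapped)
```

For the same reason, running the mapping cell a second time on the already converted column would turn every value into `NaN`, because the strings 'Male' and 'Female' are not keys in the dictionary.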


**Performing normalization on the CK-MB column**

In [147]:
# Initialize the MinMaxScaler
scaler = MinMaxScaler()

# Reshape the 'CK-MB' column to a 2D array (required by MinMaxScaler)
CK_MB = heartattack_data['CK-MB'].values.reshape(-1, 1)

# Fit the scaler to the data and transform the data
ck_mb_normalized = scaler.fit_transform(CK_MB)

# Convert the normalized data back to a 1D array and assign it to the DataFrame
heartattack_data['CK-MB_normalized'] = ck_mb_normalized.flatten()
heartattack_data.head(25)


Unnamed: 0,Age,Gender,Heart rate,Systolic blood pressure,Diastolic blood pressure,CK-MB,Troponin,Result,CK-MB_normalized
0,63,Male,66,160,83,1.8,0.012,negative,0.004935
1,20,Male,94,98,46,6.75,1.06,positive,0.021453
2,56,Male,64,160,77,1.99,0.003,negative,0.005569
3,66,Male,70,120,55,13.87,0.122,positive,0.045212
4,54,Male,64,112,65,1.08,0.003,negative,0.002533
5,52,Female,61,112,58,1.83,0.004,negative,0.005035
6,38,Female,40,179,68,0.71,0.003,negative,0.001298
7,61,Male,60,214,82,300.0,2.37,positive,1.0
8,49,Female,60,154,81,2.35,0.004,negative,0.006771
9,65,Male,61,160,95,2.84,0.011,negative,0.008406
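With its default range of 0 to 1, `MinMaxScaler` rescales each value as (x - min) / (max - min). The sketch below reproduces two of the normalized values by hand, using the CK-MB minimum (0.321) and maximum (300.0) reported by `describe()`. The helper name `min_max_scale` is an assumption introduced for illustration.

```python
def min_max_scale(value, col_min, col_max):
    """Rescale value to the 0-1 range given the column's min and max."""
    return (value - col_min) / (col_max - col_min)

# Column minimum and maximum taken from the describe() output.
ck_mb_min, ck_mb_max = 0.321, 300.0

scaled_first = min_max_scale(1.8, ck_mb_min, ck_mb_max)   # row 0
scaled_max = min_max_scale(300.0, ck_mb_min, ck_mb_max)   # row 7

print(round(scaled_first, 6), scaled_max)
```

Row 0 comes out at about 0.004935 and row 7 at 1.0, matching the `CK-MB_normalized` column. Because the maximum of 300.0 is far above the 75th percentile (5.805), most normalized values are compressed close to 0.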


In [148]:
heartattack_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1319 entries, 0 to 1318
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Age                       1319 non-null   int64  
 1   Gender                    1319 non-null   object 
 2   Heart rate                1319 non-null   int64  
 3   Systolic blood pressure   1319 non-null   int64  
 4   Diastolic blood pressure  1319 non-null   int64  
 5   CK-MB                     1319 non-null   float64
 6   Troponin                  1319 non-null   float64
 7   Result                    1319 non-null   object 
 8   CK-MB_normalized          1319 non-null   float64
dtypes: float64(3), int64(4), object(2)
memory usage: 92.9+ KB
