## About the Dataset

This dataset contains medical information used to predict the presence or absence of heart disease in patients. It includes 303 entries and 14 columns (13 features and one target) covering demographic and clinical characteristics.

### Features:

- **age**: Age of the patient (in years).
- **sex**: Sex of the patient (1 = male; 0 = female).
- **cp**: Type of chest pain experienced (typical angina, atypical angina, non-anginal pain, asymptomatic), encoded as integers 0–3.
- **trestbps**: Resting blood pressure (in mm Hg).
- **chol**: Serum cholesterol level (in mg/dl).
- **fbs**: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false).
- **restecg**: Resting electrocardiographic results (normal, ST-T wave abnormality, left ventricular hypertrophy), encoded as integers 0–2.
- **thalach**: Maximum heart rate achieved.
- **exang**: Exercise-induced angina (1 = yes; 0 = no).
- **oldpeak**: ST depression induced by exercise relative to rest.
- **slope**: The slope of the peak exercise ST segment (upsloping, flat, downsloping), encoded as integers 0–2.
- **ca**: The number of major vessels (nominal; described as 0–3, although this file also contains the value 4).
- **thal**: Result related to thalassemia, a blood disorder (nominal): 0 = NULL; 1 = normal blood flow; 2 = fixed defect (no blood flow in some part of the heart); 3 = reversible defect (blood flow is observed but it is not normal).
- **target**: Target variable (1 = presence of heart disease; 0 = absence).
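
The encodings listed above lend themselves to a quick sanity check before modeling. The following is a minimal sketch rather than part of the original notebook: `ALLOWED_VALUES` and `find_invalid_fields` are hypothetical names, the lowercase column names are the ones that appear in `heart.csv`, and the allowed sets are assumptions based on the feature descriptions and on the min/max values that `df.describe()` reports for this file.

```python
# Hypothetical sanity check for the integer-encoded columns.
# The allowed sets are assumptions, not an official schema.
ALLOWED_VALUES = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {0, 1, 2},
    "thal": {0, 1, 2, 3},
    "target": {0, 1},
}


def find_invalid_fields(record):
    """Return the names of encoded fields whose value is outside the allowed set."""
    return [
        name
        for name, allowed in ALLOWED_VALUES.items()
        if name in record and record[name] not in allowed
    ]


# First row of the dataset as shown by df.head()
first_row = {"age": 63, "sex": 1, "cp": 3, "trestbps": 145, "chol": 233,
             "fbs": 1, "restecg": 0, "thalach": 150, "exang": 0,
             "oldpeak": 2.3, "slope": 0, "ca": 0, "thal": 1, "target": 1}

print(find_invalid_fields(first_row))
print(find_invalid_fields({**first_row, "sex": 2}))
```

The same dictionary could be applied row by row to the loaded DataFrame (for example via `df.to_dict("records")`) to flag unexpected codes early.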


## Importing required modules and libraries

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
import pickle

## Data Collection and Preprocessing

In [3]:
# loading the dataset
df=pd.read_csv('heart.csv')
df.head()

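|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |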


In [4]:
df.tail()

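|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 | 0 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 | 0 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 | 0 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 | 0 |
| 302 | 57 | 0 | 1 | 130 | 236 | 0 | 0 | 174 | 0 | 0.0 | 1 | 1 | 2 | 0 |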


In [5]:
df.shape

(303, 14)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


In [7]:
df.isna().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [8]:
df.describe()

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [9]:
df['target'].value_counts()
# 1 ---> defective heart
# 0 ---> healthy heart

target
1    165
0    138
Name: count, dtype: int64
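
The counts show a mild imbalance (165 positive vs. 138 negative). As a rough reference point, one can compute the accuracy of a trivial model that always predicts the majority class; a trained model should do clearly better. A minimal sketch, with variable names chosen only for illustration:

```python
from collections import Counter

# Class counts reported by df['target'].value_counts()
class_counts = Counter({1: 165, 0: 138})

total = sum(class_counts.values())
majority_class, majority_count = class_counts.most_common(1)[0]

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = majority_count / total

print(f"total samples: {total}")
print(f"majority class: {majority_class}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

The baseline equals the mean of the `target` column, because the majority class is the one encoded as 1.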

In [10]:
# features (independent variables) and target (dependent variable)
X=df.drop(columns='target')
y=df['target']

In [11]:
X

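|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 |
| 302 | 57 | 0 | 1 | 130 | 236 | 0 | 0 | 174 | 0 | 0.0 | 1 | 1 | 2 |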


In [12]:
y

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: target, Length: 303, dtype: int64

## Train Test Split

In [13]:
X_train,X_test,y_train,y_test=train_test_split(X,y,stratify=y,test_size=0.2,random_state=2)
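
Passing `stratify=y` asks `train_test_split` to keep the class proportions roughly equal in both splits, which matters for a dataset this small. The sketch below illustrates the effect on a synthetic stand-in with the same class counts as the dataset; `X_demo` and `y_demo` are placeholders, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same class counts as the dataset (165 ones, 138 zeros)
y_demo = np.array([1] * 165 + [0] * 138)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, test_size=0.2, random_state=2
)

print("train size:", len(y_tr), "test size:", len(y_te))
print("positive rate overall:", round(y_demo.mean(), 3))
print("positive rate in train:", round(y_tr.mean(), 3))
print("positive rate in test:", round(y_te.mean(), 3))
```

With 303 rows and `test_size=0.2`, the split yields 242 training and 61 test samples, matching the lengths of `y_train` and `y_test` in the notebook.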

In [14]:
X_train

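|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 54 | 1 | 1 | 108 | 309 | 0 | 1 | 156 | 0 | 0.0 | 2 | 0 | 3 |
| 238 | 77 | 1 | 0 | 125 | 304 | 0 | 0 | 162 | 1 | 0.0 | 2 | 3 | 2 |
| 160 | 56 | 1 | 1 | 120 | 240 | 0 | 1 | 169 | 0 | 0.0 | 0 | 0 | 2 |
| 158 | 58 | 1 | 1 | 125 | 220 | 0 | 1 | 144 | 0 | 0.4 | 1 | 4 | 3 |
| 289 | 55 | 0 | 0 | 128 | 205 | 0 | 2 | 130 | 1 | 2.0 | 1 | 1 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | 42 | 1 | 3 | 148 | 244 | 0 | 0 | 178 | 0 | 0.8 | 2 | 2 | 2 |
| 49 | 53 | 0 | 0 | 138 | 234 | 0 | 0 | 160 | 0 | 0.0 | 2 | 0 | 2 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 |
| 194 | 60 | 1 | 2 | 140 | 185 | 0 | 0 | 155 | 0 | 3.0 | 1 | 0 | 2 |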


In [15]:
X_test

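|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 255 | 45 | 1 | 0 | 142 | 309 | 0 | 0 | 147 | 1 | 0.0 | 1 | 3 | 3 |
| 72 | 29 | 1 | 1 | 130 | 204 | 0 | 0 | 202 | 0 | 0.0 | 2 | 0 | 2 |
| 83 | 52 | 1 | 3 | 152 | 298 | 1 | 1 | 178 | 0 | 1.2 | 1 | 0 | 3 |
| 268 | 54 | 1 | 0 | 122 | 286 | 0 | 0 | 116 | 1 | 3.2 | 1 | 2 | 2 |
| 92 | 52 | 1 | 2 | 138 | 223 | 0 | 1 | 169 | 0 | 0.0 | 2 | 4 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 42 | 45 | 1 | 0 | 104 | 208 | 0 | 0 | 148 | 1 | 3.0 | 1 | 0 | 2 |
| 187 | 54 | 1 | 0 | 124 | 266 | 0 | 0 | 109 | 1 | 2.2 | 1 | 1 | 3 |
| 8 | 52 | 1 | 2 | 172 | 199 | 1 | 1 | 162 | 0 | 0.5 | 2 | 0 | 3 |
| 122 | 41 | 0 | 2 | 112 | 268 | 0 | 0 | 172 | 1 | 0.0 | 2 | 0 | 2 |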


In [16]:
y_train

61     1
238    0
160    1
158    1
289    0
      ..
100    1
49     1
300    0
194    0
131    1
Name: target, Length: 242, dtype: int64

In [17]:
y_test

255    0
72     1
83     1
268    0
92     1
      ..
42     1
187    0
8      1
122    1
19     1
Name: target, Length: 61, dtype: int64

## Standardization

In [18]:
scaler=StandardScaler()

In [19]:
X_train=scaler.fit_transform(X_train)
X_test=scaler.transform(X_test)
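
Note that the scaler is fitted on the training split only and then reused for the test split, so no information from the test data leaks into the preprocessing. `StandardScaler` rescales each column to `z = (x - mean) / std`, using the population standard deviation of the training data. The plain-Python sketch below reproduces that arithmetic for a single column; the sample values are a handful of `trestbps` entries taken from the `X_train` and `X_test` previews, and the variable names are illustrative.

```python
from statistics import fmean, pstdev

# A few resting blood pressure values (trestbps) from the X_train and X_test previews
train_values = [108, 125, 120, 125, 128]
test_values = [142, 130, 152]

# "fit": learn mean and population standard deviation from the training data only
mean = fmean(train_values)
std = pstdev(train_values)

# "transform": apply the training statistics to both splits
train_scaled = [(v - mean) / std for v in train_values]
test_scaled = [(v - mean) / std for v in test_values]

print("mean:", mean, "std:", round(std, 3))
print("train scaled:", [round(z, 3) for z in train_scaled])
print("test scaled:", [round(z, 3) for z in test_scaled])
```

The scaled training values have mean 0 and standard deviation 1 by construction; the scaled test values generally do not, which is expected.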

In [20]:
X_train

array([[-0.04180248,  0.69617712,  0.04467671, ...,  0.96628239,
        -0.69876652,  1.18825929],
       [ 2.48724773,  0.69617712, -0.93821081, ...,  0.96628239,
         2.28537756, -0.50326276],
       [ 0.17811493,  0.69617712,  0.04467671, ..., -2.30421185,
        -0.69876652, -0.50326276],
       ...,
       [ 1.49761939,  0.69617712, -0.93821081, ..., -0.66896473,
         1.29066287,  1.18825929],
       [ 0.61794975,  0.69617712,  1.02756422, ..., -0.66896473,
        -0.69876652, -0.50326276],
       [-0.59159601, -1.43641607,  0.04467671, ..., -0.66896473,
        -0.69876652, -0.50326276]])

In [21]:
X_test

array([[-1.03143083,  0.69617712, -0.93821081,  0.67190529,  1.17731284,
        -0.39735971, -1.00386825, -0.16057723,  1.4500221 , -0.89200846,
        -0.66896473,  2.28537756,  1.18825929],
       [-2.79077011,  0.69617712,  0.04467671, -0.04125734, -0.78243294,
        -0.39735971, -1.00386825,  2.34503   , -0.68964466, -0.89200846,
         0.96628239, -0.69876652, -0.50326276],
       [-0.26171989,  0.69617712,  2.01045173,  1.26620749,  0.97200614,
         2.51661148,  0.87935746,  1.25167412, -0.68964466,  0.17224485,
        -0.66896473, -0.69876652,  1.18825929],
       [-0.04180248,  0.69617712, -0.93821081, -0.5166991 ,  0.74803519,
        -0.39735971, -1.00386825, -1.57282858,  1.4500221 ,  1.94600038,
        -0.66896473,  1.29066287, -0.50326276],
       [-0.26171989,  0.69617712,  1.02756422,  0.43418441, -0.42781227,
        -0.39735971,  0.87935746,  0.84166566, -0.68964466, -0.89200846,
         0.96628239,  3.28009226, -0.50326276],
       ...
(output truncated)

## Model Training

### Logistic Regression

In [22]:
model=LogisticRegression()

In [23]:
model.fit(X_train,y_train)
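
For a binary problem, a fitted logistic regression model computes a weighted sum of the features plus an intercept, passes it through the sigmoid function, and predicts class 1 when the resulting probability exceeds 0.5. The sketch below checks this against scikit-learn on synthetic data; `make_classification` output stands in for the standardized heart features, and all `demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the standardized heart features (assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=13, n_informative=5, random_state=0
)

demo_model = LogisticRegression()
demo_model.fit(X_demo, y_demo)

# Linear score: one learned weight per feature plus an intercept
scores = X_demo @ demo_model.coef_.ravel() + demo_model.intercept_[0]

# The sigmoid maps the score to a probability for class 1
probabilities = 1.0 / (1.0 + np.exp(-scores))

# Predict class 1 when the probability exceeds 0.5 (i.e. when the score is positive)
manual_predictions = (probabilities > 0.5).astype(int)

print(np.allclose(probabilities, demo_model.predict_proba(X_demo)[:, 1]))
print(np.array_equal(manual_predictions, demo_model.predict(X_demo)))
```

Because the features are standardized, the magnitudes in `coef_` can be compared across features as a rough indication of influence.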

## Model Evaluation

### Accuracy Score

In [24]:
y_train_pred=model.predict(X_train)
print("Accuracy on training data:",accuracy_score(y_train,y_train_pred))

Accuracy on training data: 0.8471074380165289


In [25]:
y_test_pred=model.predict(X_test)
print("Accuracy on testing data:",accuracy_score(y_test,y_test_pred))

Accuracy on testing data: 0.7868852459016393
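
Accuracy is the fraction of samples whose predicted label matches the true label. Given the split sizes (242 training and 61 test samples), the two scores above are consistent with 205 and 48 correct predictions respectively; these counts are derived here from the reported numbers, not printed by the notebook. A minimal sketch of the calculation, with a hypothetical `accuracy` helper:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Tiny illustrative example (made-up labels): 3 of 4 predictions are correct
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))

# The reported scores correspond to whole numbers of correct predictions
print(205 / 242)  # training split: 242 samples
print(48 / 61)    # test split: 61 samples
```

With only 61 test samples, each misclassified sample shifts the test accuracy by about 1.6 percentage points, so the gap between training and test accuracy should be read with some caution.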


## Making the predictive system

In [26]:
input_data=(41,0,1,130,204,0,0,172,0,1.4,2,0,2) # input for defective heart

# changing the input data to numpy array
input_data_np=np.asarray(input_data)

# reshaping the array as we are predicting for one instance
input_data_reshaped=input_data_np.reshape(1,-1)

# standardizing the input_data
input_data_std=scaler.transform(input_data_reshaped)

prediction=model.predict(input_data_std)
print(prediction)
# predict() returns an array, so compare its single element
if prediction[0]==0:
    print("the person has a healthy heart")
else:
    print("the person has a defective heart")

[1]
the person has a defective heart


In [27]:
input_data=(62,0,0,140,268,0,0,160,0,3.6,0,2,2) # input for healthy heart

# changing the input data to a numpy array
input_data_np=np.asarray(input_data)

# reshaping the array as we are predicting for one instance
input_data_reshaped=input_data_np.reshape(1,-1)

#standardizing the input data
input_data_std=scaler.transform(input_data_reshaped)

prediction=model.predict(input_data_std)
print(prediction)
# predict() returns an array, so compare its single element
if prediction[0]==0:
    print("the person has a healthy heart")
else:
    print("the person has a defective heart")

[0]
the person has a healthy heart
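
The two cells above repeat the same steps, so they could be folded into one helper. The sketch below is one possible shape, not the notebook's own code: `predict_heart_disease` and `FEATURE_NAMES` are hypothetical names, and the stand-in model is trained on random numbers purely so that the example runs on its own. Building a one-row DataFrame keeps the column names the scaler was fitted with, and `predict_proba` adds a probability next to the hard label.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

FEATURE_NAMES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def predict_heart_disease(model, scaler, values):
    """Scale one sample and return (predicted label, probability of class 1)."""
    if len(values) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} values, got {len(values)}")
    # A one-row DataFrame keeps the column names the scaler was fitted with
    sample = pd.DataFrame([values], columns=FEATURE_NAMES)
    sample_std = scaler.transform(sample)
    label = int(model.predict(sample_std)[0])
    probability = float(model.predict_proba(sample_std)[0, 1])
    return label, probability


# Stand-in training data (random numbers, NOT the real dataset) so the sketch runs alone
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 13)), columns=FEATURE_NAMES)
y_demo = (X_demo["oldpeak"] + X_demo["ca"] > 0).astype(int)

demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_demo), y_demo)

label, probability = predict_heart_disease(
    demo_model, demo_scaler, (41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2)
)
print(label, round(probability, 3))
```

The printed values are meaningless here because the stand-in model never saw real data; with the notebook's `model` and `scaler` passed in instead, the helper would reproduce the predictions shown above.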


## Saving the model using pickle

In [28]:
# the context manager closes the file once the model is written
with open('heart_model.pkl','wb') as f:
    pickle.dump(model,f)

In [29]:
# the scaler is saved too, because new inputs must be standardized the same way
with open('standardized_heart.pkl','wb') as f:
    pickle.dump(scaler,f)
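
Both objects are needed at prediction time: the scaler to standardize new inputs with the training statistics, and the model to classify them. The sketch below shows a save-and-load round trip under stated assumptions: the stand-in model and scaler are trained on random numbers, and the files are written to a temporary directory rather than next to the notebook.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in model and scaler trained on random data (NOT the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 13))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "heart_model.pkl")
    scaler_path = os.path.join(tmp_dir, "standardized_heart.pkl")

    # Save both objects; the context manager closes each file
    with open(model_path, "wb") as f:
        pickle.dump(demo_model, f)
    with open(scaler_path, "wb") as f:
        pickle.dump(demo_scaler, f)

    # Load them back, as a separate application would
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)
    with open(scaler_path, "rb") as f:
        loaded_scaler = pickle.load(f)

# The reloaded pair should reproduce the original predictions
original = demo_model.predict(demo_scaler.transform(X_demo))
reloaded = loaded_model.predict(loaded_scaler.transform(X_demo))
print(np.array_equal(original, reloaded))
```

Pickle files should only be loaded from trusted sources, and it is advisable to load them with the same scikit-learn version that created them.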