### Heart disease prediction

#### Step 1: Importing the libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

#### Data collection and analysis

#### Step 2: Loading the dataset
##### Cleveland Heart Disease dataset

In [None]:
hd = pd.read_csv("heart_disease_data.csv")

#### Dataset Description

Below is a brief description of the features present in the dataset:
- Age: Patient's age in years
- Sex: Sex of the patient (Male: 1; Female: 0) (Nominal)
- cp: Type of chest pain experienced by the patient, in 4 categories:
  0: typical angina; 1: atypical angina; 2: non-anginal pain; 3: asymptomatic (Nominal)
- trestbps: Resting blood pressure in mm Hg
- chol: Serum cholesterol in mg/dl
- fbs: Fasting blood sugar > 120 mg/dl (1: true; 0: false) (Nominal)
- restecg: Resting electrocardiogram result, in 3 categories:
  0: normal; 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); 2: probable or definite left ventricular hypertrophy by Estes' criteria (Nominal)
- thalach: Maximum heart rate achieved
- exang: Exercise-induced angina (0: no; 1: yes) (Nominal)
- oldpeak: ST depression induced by exercise relative to rest (Numeric)
- slope: Slope of the ST segment at peak exercise:
  0: upsloping; 1: flat; 2: downsloping (Nominal)
- ca: Number of major vessels (0–3) (Nominal)
- thal: Result of the thallium stress test (often mislabeled as the blood disorder thalassemia):
  0: NULL; 1: normal blood flow; 2: fixed defect (no blood flow in some part of the heart); 3: reversible defect (blood flow is observed but is not normal) (Nominal)
- target: The variable to predict. 1 means the patient has heart disease; 0 means the patient does not.
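
The coded columns described above can be sanity-checked before modelling. The following is a minimal sketch, not part of the original notebook: the helper `find_unexpected_codes` and the `ALLOWED_CODES` mapping are hypothetical names, the lowercase column names are an assumption about the CSV header, and the small `demo` frame is made up for illustration. A real copy of the dataset may contain codes outside the documented ranges, which is exactly what this check would surface.

```python
import pandas as pd

# Allowed codes per column, copied from the description above.
# Column names are assumed to be lowercase in the CSV.
ALLOWED_CODES = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {0, 1, 2},
    "ca": {0, 1, 2, 3},
    "thal": {0, 1, 2, 3},
    "target": {0, 1},
}


def find_unexpected_codes(df, allowed=ALLOWED_CODES):
    """Return {column: sorted unexpected values} for the coded columns present in df."""
    problems = {}
    for column, codes in allowed.items():
        if column not in df.columns:
            continue  # skip columns this frame does not have
        unexpected = set(df[column].dropna().tolist()) - codes
        if unexpected:
            problems[column] = sorted(unexpected)
    return problems


# Tiny made-up frame, for illustration only (not rows from the dataset).
demo = pd.DataFrame({"sex": [1, 0, 1], "cp": [0, 3, 5], "target": [1, 0, 1]})
report = find_unexpected_codes(demo)
print(report)
```

On the notebook's own data the call would be `find_unexpected_codes(hd)`; an empty dictionary means every coded column matches the description.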

In [None]:
hd.head()

In [None]:
hd.shape

In [None]:
hd.describe()

In [None]:
hd['target'].value_counts()

In [None]:
hd.groupby('target').mean()

In [None]:
X = hd.drop(columns='target')
Y = hd['target']

In [None]:
X

In [None]:
Y

In [None]:
hd.info()

In [None]:
hd.isnull().sum()

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)

In [None]:
X.shape

In [None]:
X_train.shape

In [None]:
Y_train.shape

In [None]:
X_test.shape

In [None]:
Y_test.shape

In [None]:
model = LogisticRegression()

In [None]:
model.fit(X_train,Y_train)

In [None]:
X_train_prediction = model.predict(X_train)
# accuracy_score expects (y_true, y_pred)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [None]:
training_data_accuracy

In [None]:
X_test_prediction = model.predict(X_test)
# accuracy_score expects (y_true, y_pred)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [None]:
testing_data_accuracy
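
`StandardScaler` is imported in Step 1 but never applied. Logistic regression on unscaled features such as `chol` and `trestbps` may converge slowly or emit a convergence warning, so one option is to chain the scaler and the classifier in a pipeline. The sketch below is an assumption about how that could look, not the notebook's method: it runs on synthetic data from `make_classification` (13 features, like `X`), and the names `scaled_model`, `X_demo`, and `y_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 13 features; NOT the heart disease dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The scaler is fitted on the training split only, then reused at predict time.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

demo_accuracy = accuracy_score(y_te, scaled_model.predict(X_te))
print(f"{demo_accuracy:.3f}")
```

Because the scaler lives inside the pipeline, test rows and new inputs are transformed with the training statistics automatically, which avoids leaking test-set information into the fit.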

In [None]:
input_data = (67, 1, 0, 160, 286, 0, 0, 108, 1, 1.5, 1, 3, 2)

# Convert the input data to a NumPy array
input_numpy = np.asarray(input_data)

# Reshape to a single row, since we are predicting for one instance
input_reshape = input_numpy.reshape(1, -1)

prediction = model.predict(input_reshape)
print(prediction)

# target: 1 = heart disease, 0 = no heart disease
if prediction[0] == 1:
    print('The person has heart disease')
else:
    print('The person has no heart disease')


In [None]:
import pickle

filename = 'heart.sav'
# Use a context manager so the file is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(model, f)

In [None]:
import pickle

# Save the trained model (a second copy, under a .pkl name)
with open('Heart_model.pkl', 'wb') as f:
    pickle.dump(model, f)
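
A saved model is only useful if it can be loaded back. The sketch below shows a pickle round trip under stated assumptions: it trains a throwaway model on a tiny made-up dataset and writes to a temporary directory so it runs on its own, and the names `demo_model` and `loaded_model` are hypothetical. With the notebook's files, the same `pickle.load` call would be pointed at `'Heart_model.pkl'` or `'heart.sav'`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set so the sketch is self-contained.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
demo_model = LogisticRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")

    # Save the fitted model.
    with open(path, "wb") as f:
        pickle.dump(demo_model, f)

    # Load it back into a new object.
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# The reloaded model should reproduce the original predictions.
same = np.array_equal(demo_model.predict(X_demo), loaded_model.predict(X_demo))
print(same)
```

Pickle files should only be loaded from trusted sources, and a model is best reloaded with the same scikit-learn version that saved it.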