# Machine Learning Model for Heart Disease Prediction 
#### This notebook builds a machine learning model that predicts whether a person has heart disease, based on the given medical information

## What is Heart Disease?

Heart disease describes a range of conditions that affect your heart. Heart diseases include:

* Blood vessel disease, such as coronary artery disease
* Heart rhythm problems (arrhythmias)
* Heart defects you're born with (congenital heart defects)
* Heart valve disease
* Disease of the heart muscle
* Heart infection

Many forms of heart disease can be prevented or treated with healthy lifestyle choices.

## About Dataset

https://www.kaggle.com/johnsmith88/heart-disease-dataset

### Context
This data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments use a subset of 14 of them. The "target" field indicates the presence of heart disease in the patient. It is integer-valued: 0 = no disease and 1 = disease.


### Content
Attribute Information:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect

The names and social security numbers of the patients were recently removed from the database, replaced with dummy values.
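
In the CSV these attributes appear under short column names. The mapping below is a reference sketch: `cp`, `trestbps`, `chol`, `thalach`, `oldpeak`, and `ca` are used later in this notebook, while `fbs`, `restecg`, `exang`, `slope`, and `thal` are assumed from the usual layout of `heart.csv` and should be checked against `df.columns`.

```python
# Short column name -> description, following the attribute list above.
# Names not used elsewhere in this notebook (fbs, restecg, exang, slope, thal)
# are assumptions; verify them with df.columns after loading the data.
COLUMN_DESCRIPTIONS = {
    "age": "age",
    "sex": "sex (1 = male, 0 = female)",
    "cp": "chest pain type (4 values)",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol in mg/dl",
    "fbs": "fasting blood sugar > 120 mg/dl",
    "restecg": "resting electrocardiographic results (values 0, 1, 2)",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels (0-3) colored by fluoroscopy",
    "thal": "0 = normal; 1 = fixed defect; 2 = reversible defect",
}


def describe(column: str) -> str:
    """Look up a column's description, with a readable fallback for unknown names."""
    return COLUMN_DESCRIPTIONS.get(column, f"unknown column: {column}")


print(describe("thalach"))
```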

### Importing Packages

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report,confusion_matrix
from lightgbm import LGBMClassifier

### Importing Dataset

In [None]:
df = pd.read_csv('/content/drive/MyDrive/Heart_Disease_Project/heart.csv')
df.head()

In [None]:
df.info()

In [None]:
df.isna().sum()

In [None]:
df.describe()

In [None]:
df.target.value_counts().plot(kind="bar")#this shows that our dataset is balanced!
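
The bar plot gives a visual impression of balance; the class shares can also be read off as numbers. A minimal sketch, using a toy series with hypothetical labels in place of `df["target"]`:

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.Series:
    """Return the share of each class label, sorted by label."""
    return target.value_counts(normalize=True).sort_index()


# Toy labels standing in for df["target"] (hypothetical values, not the real data).
toy_target = pd.Series([0, 1, 1, 0, 1, 0, 1, 1])
shares = class_balance(toy_target)
print(shares)
```

On the real data the call would be `class_balance(df["target"])`; shares close to 0.5 each would support the balance claim.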

In [None]:
df.sex.value_counts().plot(kind="bar")
plt.xlabel("1 = Male                 0 = Female");

In [None]:
pd.crosstab(df["cp"],df["target"]).plot(kind="bar");
plt.xlabel("Type of Chest pain")

In [None]:
plt.scatter(df.age,df.thalach);
plt.xlabel("Age")
plt.ylabel("Max Heart Rate");

In [None]:
plt.figure(figsize=(10,6))
plt.scatter(df.age[df.target == 0],df.chol[df.target == 0],c="g")
plt.scatter(df.age[df.target == 1],df.chol[df.target == 1],c="r")
plt.xlabel("Age\nReference: Green = no heart disease (target 0), Red = heart disease (target 1)")
plt.ylabel("Cholesterol");

In [None]:
plt.figure(figsize=(10,6))
plt.scatter(df.age[df.target==0],df.trestbps[df.target==0],c="b")
plt.scatter(df.age[df.target==1],df.trestbps[df.target==1],c="r")
plt.xlabel("Age\nReference: Blue = no heart disease (target 0), Red = heart disease (target 1)")
plt.ylabel("Resting BP");

In [None]:
corre = df.corr()
plt.figure(figsize=(15,10))
sns.heatmap(corre,annot=True,cbar=False);
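
A full heatmap can be hard to scan for the one column of interest. The sketch below ranks features by the absolute Pearson correlation with the target; the helper name `rank_by_target_correlation` and the toy values are hypothetical.

```python
import pandas as pd


def rank_by_target_correlation(frame: pd.DataFrame, target: str = "target") -> pd.Series:
    """Absolute Pearson correlation of each feature with the target, largest first."""
    corr = frame.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Toy frame standing in for df (hypothetical values, not the real data).
toy = pd.DataFrame({
    "thalach": [150, 170, 120, 110, 165, 130],
    "chol": [230, 240, 250, 220, 210, 260],
    "target": [1, 1, 0, 0, 1, 0],
})
ranking = rank_by_target_correlation(toy)
print(ranking)
```

Correlation only captures linear association, so a low rank here does not mean a feature is useless to a tree-based model.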

### Splitting our data into Train and Test

In [None]:
from sklearn.model_selection import train_test_split
X = df.drop(["target",],axis=1)
y = df["target"]
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=10) 
X_train.shape,X_test.shape
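
An optional variation on the split above is to pass `stratify`, which keeps the class ratio the same in the training and test sets. This notebook does not use it; the sketch below only illustrates the argument on toy arrays with hypothetical values.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for X and y (hypothetical values): 20 rows, 10 per class.
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 10 + [1] * 10)

# stratify=y_toy preserves the 50/50 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=10, stratify=y_toy
)
print(X_tr.shape, X_te.shape, y_te.sum())
```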

In [None]:
#from sklearn.preprocessing import StandardScaler
#sc = StandardScaler()
#scaled_X_train = sc.fit_transform(X_train)
#scaled_X_test = sc.transform(X_test)

### Initialising our model

Here we use the LightGBM classifier to train on our dataset and make predictions.

#### About LGBM:
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be fast and computationally efficient.

Most tree-based boosting algorithms grow trees level-wise (horizontally), whereas LightGBM grows trees leaf-wise (vertically): at each step it splits the leaf with the largest loss reduction. For the same number of leaves, leaf-wise growth can reduce the loss more than level-wise growth.
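
The difference between the two growth strategies can be seen on a toy problem. The sketch below is not LightGBM's implementation: it fits piecewise-constant leaves to a short, already sorted list of target values, uses squared error as the loss, and all function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (the loss of a constant leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values):
    """Return (gain, index) of the best split point, or (0.0, None) if none helps."""
    best_gain, best_idx = 0.0, None
    parent = sse(values)
    for i in range(1, len(values)):
        gain = parent - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def grow_leaf_wise(values, num_leaves):
    """Best-first growth: always split the leaf with the largest loss reduction."""
    leaves = [values]
    while len(leaves) < num_leaves:
        gains = [best_split(leaf) for leaf in leaves]
        k = max(range(len(leaves)), key=lambda j: gains[j][0])
        _, idx = gains[k]
        if idx is None:  # no leaf can be improved any further
            break
        leaf = leaves.pop(k)
        leaves[k:k] = [leaf[:idx], leaf[idx:]]
    return leaves


def grow_level_wise(values, depth):
    """Breadth-first growth: split every leaf on a level before going deeper."""
    leaves = [values]
    for _ in range(depth):
        next_leaves = []
        for leaf in leaves:
            _, idx = best_split(leaf)
            next_leaves += [leaf] if idx is None else [leaf[:idx], leaf[idx:]]
        leaves = next_leaves
    return leaves


def total_loss(leaves):
    """Total squared error over all leaves."""
    return sum(sse(leaf) for leaf in leaves)


# Toy targets (hypothetical): a nearly flat region followed by a steep one.
toy = [0, 0, 0, 1, 1, 1, 10, 14, 18, 22]
leaf_wise = grow_leaf_wise(toy, num_leaves=4)
level_wise = grow_level_wise(toy, depth=2)
print(total_loss(leaf_wise), total_loss(level_wise))
```

Both trees end with four leaves, but the level-wise tree spends one split on the nearly flat region, while the leaf-wise tree spends it where the loss is still large and so ends with the lower total loss on this input.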



In [3]:
model1 = LGBMClassifier()
model1.fit(X_train,y_train)
model1.score(X_test,y_test) #Accuracy

### Evaluating the model

In [None]:
#Classification Report
y_preds = model1.predict(X_test)
print(classification_report(y_true =y_test,y_pred =y_preds))


In [None]:
#Confusion Matrix
cm = confusion_matrix(y_test,y_preds)
sns.heatmap(cm,annot=True,cbar=False);
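
The numbers in the classification report can be derived from the confusion matrix. For binary labels 0 and 1, scikit-learn's `confusion_matrix` returns rows as true labels and columns as predicted labels, that is `[[TN, FP], [FN, TP]]`. A minimal sketch with hypothetical counts:

```python
def metrics_from_confusion(cm):
    """Derive accuracy, precision and recall for the positive class (label 1)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical counts, not the notebook's actual results.
toy_cm = [[90, 10], [5, 95]]
metrics = metrics_from_confusion(toy_cm)
print(metrics)
```

In a medical screening setting recall on the disease class is usually the figure to watch, since a false negative means a missed case.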

In [None]:
#Cross Validation Score
cvs=cross_val_score(model1,X_train,y_train,cv=5,scoring="accuracy")
print("Accuracy = ",np.mean(cvs)*100)
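
To make the `cv=5` argument concrete, the sketch below shows how plain k-fold splitting partitions row indices. It is a simplification: for a classifier with an integer `cv`, `cross_val_score` uses stratified folds, which this hypothetical helper does not reproduce.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(k_fold_indices(12, 5))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Each row is used for validation exactly once, and the reported score is the mean over the five validation folds.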


### Feature Importance

In [None]:
feature_imp = pd.DataFrame({'Value':model1.feature_importances_,'Feature':X_train.columns})
plt.figure(figsize=(40, 20))
sns.set(font_scale = 5)
sns.barplot(x="Value", y="Feature", data=feature_imp.sort_values(by="Value", ascending=False))
plt.title('LightGBM Features')
plt.tight_layout()
#plt.savefig('lgbm_importances-01.png')
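
The notebook does not state how a reduced feature set would be chosen from this plot. One possible approach, shown here only as a sketch with hypothetical importance values, is to keep the `k` features with the largest importance:

```python
def top_k_features(importances, columns, k):
    """Return the k column names with the largest importance values."""
    ranked = sorted(zip(columns, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importances, not values produced by the model above.
toy_columns = ["age", "sex", "cp", "fbs"]
toy_importances = [120, 30, 85, 5]
selected = top_k_features(toy_importances, toy_columns, k=2)
print(selected)
```

With the fitted model this could be called as `top_k_features(model1.feature_importances_, X_train.columns, k=8)`.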


## Final Model

In [None]:
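# Train on the full dataset, restricted to the eight final features, so that the saved model accepts 8-value inputs
final_features = ["age", "sex", "cp", "trestbps", "chol", "thalach", "oldpeak", "ca"]
model = LGBMClassifier()
model.fit(X[final_features], y)
model.score(X[final_features], y)  # accuracy on the training data itself, so an optimistic figure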

### Save and Load model

In [4]:
import pickle
with open("testmodel.pkl", "wb") as f:
    pickle.dump(model, f)  # the context manager closes the file after writing

In [None]:
with open("testmodel.pkl", "rb") as f:
    model_load = pickle.load(f)
# Input order: age, sex, cp, trestbps, chol, thalach, oldpeak, ca
model_load.predict([[65, 1, 2, 200, 200, 250, 1.8, 2]])

### Final Features
1. age
2. sex
3. cp
4. trestbps
5. chol 
6. thalach
7. oldpeak
8. ca
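
Because the model receives a bare list of numbers, the order of the values must match the feature list above. The sketch below builds the input row from a dictionary so that the order is fixed in one place; the helper name `to_feature_row` is hypothetical.

```python
FINAL_FEATURES = ["age", "sex", "cp", "trestbps", "chol", "thalach", "oldpeak", "ca"]


def to_feature_row(patient: dict) -> list:
    """Order a patient's values by FINAL_FEATURES, failing loudly on missing keys."""
    missing = [name for name in FINAL_FEATURES if name not in patient]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [patient[name] for name in FINAL_FEATURES]


# Same values as the prediction example in this notebook.
patient = {
    "age": 65, "sex": 1, "cp": 2, "trestbps": 200,
    "chol": 200, "thalach": 250, "oldpeak": 1.8, "ca": 2,
}
row = to_feature_row(patient)
print(row)
```

The result can be passed to the loaded model as `model_load.predict([row])`.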