

---


**Predictive Analysis for Heart Disease Patient Re-admission in Hospital**


---


We propose a predictive-analytics approach for **heart-disease patients**.

Using machine-learning (ML) algorithms, we predict whether a patient is likely to be re-admitted to the hospital in the future, based on the patient's past and present medical reports.


---



In [1]:
# Required libraries
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, classification_report
#from lazypredict.Supervised import LazyClassifier

In [2]:
data = pd.read_csv('framingham5k_New.csv')
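Before modelling, it helps to check for missing values and for the class balance of the target, since both affect every model below (for example, `GaussianNB` and `KNeighborsClassifier` raise an error on NaN inputs). This is a minimal sketch, assuming, as the cells below do, that the target is the last column; the helper `summarize_dataset` and the toy frame are hypothetical stand-ins for `data`.

```python
import numpy as np
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Count missing values per column and the class balance of the last column."""
    target = df.iloc[:, -1]
    return {
        "n_rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        # Fraction of rows per class; a large majority class inflates accuracy.
        "class_fractions": target.value_counts(normalize=True).sort_index().to_dict(),
    }


# Hypothetical toy frame standing in for `data` (last column = target).
toy = pd.DataFrame({
    "feature_1": [50.0, 61.0, np.nan, 45.0, 39.0, 70.0],
    "feature_2": [210.0, 250.0, 195.0, np.nan, 180.0, 260.0],
    "target": [0, 1, 0, 0, 0, 1],
})
summary = summarize_dataset(toy)
print(summary)
```

On the real data the call would simply be `summarize_dataset(data)`.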




---


Next, we **evaluate the accuracy of several classification algorithms.**


---

1. We evaluate the accuracy of the **Decision Tree**.

---




In [3]:
# Features: every column except the last; target: the last column
X = data.iloc[:, :-1]
y = data.iloc[:, -1]



In [4]:
# Split the features and target into training (70%) and test (30%) sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_train.shape, X_test.shape

((2968, 15), (1272, 15))
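The split above is purely random. Because the target is imbalanced (the reports further down show roughly 85% class 0 in the test set), passing `stratify=y` to `train_test_split` would keep the class ratio the same in the training and test parts. A minimal sketch on synthetic data from `make_classification`, which stands in for the Framingham features; the 85/15 class weights are an assumption chosen to mimic that imbalance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for (X, y): about 85% class 0, 15% class 1.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, weights=[0.85, 0.15], random_state=0
)

# stratify=y_demo preserves the class ratio in both the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print("positive rate, full :", round(y_demo.mean(), 3))
print("positive rate, train:", round(y_tr.mean(), 3))
print("positive rate, test :", round(y_te.mean(), 3))
```

On the real data the call would be `train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)`; the scores reported in this notebook would then change.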

In [5]:
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

In [6]:
# Making Predictions with Our Model
predictions = dt.predict(X_test)
print(predictions[:5])

[0. 1. 0. 0. 0.]


In [7]:
# Measuring the accuracy of our model
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, predictions))

0.7539308176100629
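`dt` is created without `max_depth` or `random_state`, so the tree grows until its leaves are pure and its 75.4% accuracy can vary from run to run. One common remedy is to limit the depth and choose it by cross-validation. A minimal sketch on synthetic stand-in data; the candidate depths are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the Framingham features.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, weights=[0.85, 0.15], random_state=0
)

depth_scores = {}
for depth in [2, 4, 6, 8, None]:  # None = grow until leaves are pure
    # random_state makes tie-breaking between equally good splits reproducible.
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    depth_scores[depth] = cross_val_score(tree, X_demo, y_demo, cv=5).mean()

for depth, score in depth_scores.items():
    print(f"max_depth={depth}: mean CV accuracy {score:.3f}")

best_depth = max(depth_scores, key=depth_scores.get)
print("best max_depth:", best_depth)
```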




---


2. We evaluate the accuracy of the **Naive Bayes** algorithm.


---



In [8]:
model = GaussianNB()
cv_scores = cross_val_score(model, X, y, cv=5)
    
print(model, 'Accuracy: ', round(cv_scores.mean()*100, 3), '%')

GaussianNB() Accuracy:  82.335 %


The mean accuracy over the five cross-validation folds is 82.335%.
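The 82.335% figure is the mean of five per-fold accuracies. For a classifier and a binary target, `cv=5` in `cross_val_score` means a 5-fold `StratifiedKFold` without shuffling, and the default score is the estimator's accuracy. The sketch below, on synthetic stand-in data, rebuilds that loop by hand and checks it against `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for (X, y).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, weights=[0.85, 0.15], random_state=0
)

# cv=5 for a classifier is equivalent to StratifiedKFold(n_splits=5) without shuffling.
skf = StratifiedKFold(n_splits=5)
fold_scores = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    nb = GaussianNB().fit(X_demo[train_idx], y_demo[train_idx])
    # .score() returns accuracy on the held-out fold
    fold_scores.append(nb.score(X_demo[test_idx], y_demo[test_idx]))

manual_mean = np.mean(fold_scores)
library_mean = cross_val_score(GaussianNB(), X_demo, y_demo, cv=5).mean()
print(f"manual: {manual_mean:.4f}, cross_val_score: {library_mean:.4f}")
```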

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [10]:
predict_train = model.fit(X_train, y_train).predict(X_train)


# predict the target on the test dataset
predict_test = model.predict(X_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(y_test,predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

accuracy_score on test dataset :  0.8207547169811321




---


The accuracy on the test set is 82%.


---



In [11]:
from sklearn.metrics import classification_report
#print(classification_report(y_train,predict_train))
print(classification_report(y_test,predict_test))

              precision    recall  f1-score   support

         0.0       0.87      0.93      0.90      1076
         1.0       0.37      0.23      0.28       196

    accuracy                           0.82      1272
   macro avg       0.62      0.58      0.59      1272
weighted avg       0.79      0.82      0.80      1272
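The report shows that the test set contains 1076 class-0 and 196 class-1 rows, so a model that always predicts class 0 already reaches 1076 / 1272 ≈ 84.6% accuracy while catching no re-admissions. scikit-learn's `DummyClassifier(strategy='most_frequent')` gives this baseline directly. A minimal sketch that rebuilds the test labels from those support counts (the feature matrix is a placeholder, since this baseline ignores features); on the real data it would be fitted on `y_train` and scored on `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Test labels with the class counts from the report: 1076 zeros and 196 ones.
y_true = np.array([0] * 1076 + [1] * 196)
X_placeholder = np.zeros((len(y_true), 1))  # ignored by DummyClassifier

# Always predicts the most frequent class seen during fit (here: 0).
baseline = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_true)
y_base = baseline.predict(X_placeholder)

print("baseline accuracy:", round(accuracy_score(y_true, y_base), 4))
print("baseline recall for class 1:", recall_score(y_true, y_base))
```

Against that baseline, accuracies in the low-to-mid 80s say little on their own; the class-1 recall (0.23 for Naive Bayes above) is the more telling number for re-admission.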





---


3. We evaluate the accuracy of **K-Nearest Neighbors (KNN)**.


---



In [12]:
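# Create the KNN classifier (default: n_neighbors=5).
knn = KNeighborsClassifier()
# Features (all columns but the last) and target (last column).
x = data.iloc[:, :-1]
y = data.iloc[:, -1]
# Split into training and test sets.
# Note: this cell uses an 80/20 split with random_state=4, unlike the 70/30 split
# (random_state=0) used for the other models, so its scores are not directly comparable.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
# Train the model.
knn.fit(x_train, y_train)
# Generate predictions on the test set.
y_pred = knn.predict(x_test)
# Check the model's performance with a classification report.
print(classification_report(y_test, y_pred))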


              precision    recall  f1-score   support

         0.0       0.85      0.95      0.89       719
         1.0       0.17      0.06      0.09       129

    accuracy                           0.81       848
   macro avg       0.51      0.50      0.49       848
weighted avg       0.75      0.81      0.77       848





---


The accuracy on the test set is 81%, but the recall for class 1 is only 0.06: the model finds very few of the positive cases.
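KNN assigns a class from the nearest training rows under Euclidean distance (the default metric), so features with large numeric ranges dominate the distance. The cell above feeds the raw columns to `KNeighborsClassifier`; whether `framingham5k_New.csv` is already scaled is not stated, so the following is only a sketch of standardising inside a `Pipeline`, on synthetic data where one feature is deliberately inflated (an assumption for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, weights=[0.85, 0.15], random_state=0
)
# Put one feature on a much larger scale, as raw clinical measurements can be.
X_demo[:, 0] *= 100

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

raw_knn = KNeighborsClassifier().fit(X_tr, y_tr)
# Inside the pipeline, the scaler's mean/std are learned from the training data only.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_tr, y_tr)

print("unscaled accuracy:", round(raw_knn.score(X_te, y_te), 3))
print("scaled accuracy  :", round(scaled_knn.score(X_te, y_te), 3))
```

Putting the scaler in the pipeline (rather than scaling the whole dataset up front) keeps test-set statistics out of training, which also holds when the pipeline is passed to `cross_val_score`.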


---





---


4. We evaluate the accuracy of the **Random Forest**.


---



In [13]:
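# Re-create the features/target and the 70/30 split
# (the KNN cell above overwrote y_train and y_test with its 80/20 split).
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)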

In [14]:
# Apply the random forest algorithm
rm = RandomForestClassifier()
rm.fit(X_train, y_train)

# Predict on the test set
y_pred = rm.predict(X_test)

# Performance evaluation metrics (classification_report expects y_true first)
print(classification_report(y_test, y_pred))
score = accuracy_score(y_test, y_pred)

score

              precision    recall  f1-score   support

         0.0       0.85      0.99      0.92      1076
         1.0       0.50      0.05      0.09       196

    accuracy                           0.85      1272
   macro avg       0.68      0.52      0.50      1272
weighted avg       0.80      0.85      0.79      1272



0.8459119496855346



---

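
The random forest reaches an accuracy of 85% (0.846) on the test set. This equals the share of class-0 rows in that test set (1076 of 1272), i.e. the accuracy of always predicting class 0.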


---
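The random forest rarely predicts class 1, so most re-admission cases in the test set are missed even though accuracy looks high. Two standard scikit-learn levers are `class_weight='balanced'` (which reweights classes inversely to their frequency during training) and lowering the probability threshold used to call a patient class 1, which trades precision for recall. A minimal sketch on synthetic stand-in data; the thresholds 0.5/0.3/0.2 are arbitrary assumptions, and in practice they should be chosen on validation data rather than on the test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 15% class 1).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=15, weights=[0.85, 0.15], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# class_weight="balanced" up-weights the minority class during training.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
rf.fit(X_tr, y_tr)
proba_pos = rf.predict_proba(X_te)[:, 1]  # estimated probability of class 1

recalls = {}
for threshold in [0.5, 0.3, 0.2]:
    # Predict class 1 whenever its probability reaches the threshold.
    y_hat = (proba_pos >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, y_hat)
    precision = precision_score(y_te, y_hat, zero_division=0)
    print(f"threshold={threshold}: recall {recalls[threshold]:.3f}, precision {precision:.3f}")
```

Lowering the threshold can only add class-1 predictions, so recall never decreases; precision usually drops, and the right balance depends on the clinical cost of a missed re-admission.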



5. We combine all four algorithms in a stacking ensemble and predict with it.


In [15]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score

# Base models; StackingClassifier clones and refits each of them.
estimators = [
    ('knn', knn),
    ('dt', dt),
    ('nb', model),
    ('rm', rm),
]

# Build the stacked model: a logistic regression combines the base models' predictions
stack_model = StackingClassifier(
    estimators=estimators, final_estimator=LogisticRegression()
)

# Train the stacked model
stack_model.fit(X_train, y_train)

# Make predictions
y_train_pred = stack_model.predict(X_train)
y_test_pred = stack_model.predict(X_test)
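`StackingClassifier` trains the final `LogisticRegression` on out-of-fold predictions of the base models (5-fold cross-validation by default), so the meta-learner never sees a base model's predictions on rows that model was trained on; the base models are then refitted on all of the training data. The sketch below reproduces that idea by hand with `cross_val_predict`, on synthetic stand-in data and with only two base models for brevity. It uses each base model's class-1 probability as a meta-feature, which is what the default `stack_method='auto'` ends up using for a binary target when `predict_proba` is available; it is an illustration, not a drop-in replacement for the library class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, weights=[0.85, 0.15], random_state=0
)
base_models = [GaussianNB(), DecisionTreeClassifier(max_depth=4, random_state=0)]

# Level-1 features: out-of-fold class-1 probabilities, one column per base model.
meta_features = np.column_stack([
    cross_val_predict(m, X_demo, y_demo, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-learner learns how much to trust each base model.
meta_model = LogisticRegression().fit(meta_features, y_demo)

# For prediction, the base models are refitted on all training rows.
fitted_bases = [m.fit(X_demo, y_demo) for m in base_models]


def stacked_predict(X_new):
    """Feed base-model probabilities for X_new into the meta-learner."""
    feats = np.column_stack([m.predict_proba(X_new)[:, 1] for m in fitted_bases])
    return meta_model.predict(feats)


print("meta-learner weights:", meta_model.coef_.round(3))
print("first predictions:", stacked_predict(X_demo[:5]))
```

The learned coefficients indicate how much weight the meta-learner gives each base model.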



In [16]:
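# Training set model performance
stack_model_train_accuracy = accuracy_score(y_train, y_train_pred)  # Accuracy
stack_model_train_f1 = f1_score(y_train, y_train_pred, average='weighted')  # F1-score
stack_model_train_precision = precision_score(y_train, y_train_pred, average='weighted')  # Precision
stack_model_train_recall = recall_score(y_train, y_train_pred, average='weighted')  # Recall

# Test set model performance
stack_model_test_accuracy = accuracy_score(y_test, y_test_pred)  # Accuracy
stack_model_test_f1 = f1_score(y_test, y_test_pred, average='weighted')  # F1-score
stack_model_test_precision = precision_score(y_test, y_test_pred, average='weighted')  # Precision
stack_model_test_recall = recall_score(y_test, y_test_pred, average='weighted')  # Recall

print('Model performance for Training set')
print('- Accuracy: %s' % stack_model_train_accuracy)
print('- Precision: %s' % stack_model_train_precision)
print('- Recall: %s' % stack_model_train_recall)
print('- F1 score: %s' % stack_model_train_f1)
print('-------------------------------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % stack_model_test_accuracy)
print('- Precision: %s' % stack_model_test_precision)
print('- Recall: %s' % stack_model_test_recall)
print('- F1 score: %s' % stack_model_test_f1)

Model performance for Training set
- Accuracy: 0.9221698113207547
- Precision: 0.9287051706755005
- Recall: 0.9221698113207547
- F1 score: 0.9103573788263035
-------------------------------------------------------
Model performance for Test set
- Accuracy: 0.8474842767295597
- Precision: 0.805665400767577
- Recall: 0.8474842767295597
- F1 score: 0.7911134404845096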
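The precision, recall and F1 above use `average='weighted'`, i.e. per-class scores weighted by class support, so the majority class dominates them. Weighted recall always equals accuracy, which is why both read 0.8474842767295597 on the test set. For a re-admission model, the class-1 scores (the default `average='binary'`) are more informative. A small worked example with hypothetical labels:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical labels: 8 negatives and 2 positives; the model catches one positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)
weighted_recall = recall_score(y_true, y_pred, average="weighted")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
# Default average="binary" scores only the positive class (pos_label=1).
class1_recall = recall_score(y_true, y_pred)
class1_f1 = f1_score(y_true, y_pred)

print("accuracy       :", round(acc, 4))              # 0.9
print("weighted recall:", round(weighted_recall, 4))  # 0.9
print("weighted F1    :", round(weighted_f1, 4))      # 0.8863
print("class-1 recall :", round(class1_recall, 4))    # 0.5
print("class-1 F1     :", round(class1_f1, 4))        # 0.6667
```

Weighted recall is 0.8 × 1.0 + 0.2 × 0.5 = 0.9, the same as accuracy, while half of the positive cases are missed.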


In [17]:
import pickle

# Save the trained stacking model to disk
with open("1.pkl", "wb") as pickle_out:
    pickle.dump(stack_model, pickle_out)
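To use the saved model later, it can be loaded back with `pickle.load` and applied to new rows that have the same 15 feature columns, in the same order, as `X`. Only unpickle files from trusted sources, and load them with the same scikit-learn version that saved them. A self-contained sketch of the round trip, using a small `GaussianNB` on synthetic data and a temporary file in place of `1.pkl` (both assumptions for illustration):

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Hypothetical small model standing in for stack_model.
X_demo, y_demo = make_classification(n_samples=200, n_features=15, random_state=0)
model_demo = GaussianNB().fit(X_demo, y_demo)

path = os.path.join(tempfile.mkdtemp(), "model_demo.pkl")  # hypothetical path
with open(path, "wb") as f:
    pickle.dump(model_demo, f)

# Later (e.g. in a serving script): load the model and predict.
with open(path, "rb") as f:
    restored = pickle.load(f)

print("predictions match:", (restored.predict(X_demo) == model_demo.predict(X_demo)).all())
```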