**Data Description**

age: The person's age in years

sex: The person's sex (1 = male, 0 = female)

cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)

trestbps: The person's resting blood pressure (mm Hg on admission to the hospital)

chol: The person's cholesterol measurement in mg/dl

fbs: The person's fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)

restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria)

thalach: The person's maximum heart rate achieved

exang: Exercise induced angina (1 = yes; 0 = no)

oldpeak: ST depression induced by exercise relative to rest ('ST' relates to positions on the ECG plot)

slope: The slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)

ca: The number of major vessels (0-3)

thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)

target: Heart disease (0 = no, 1 = yes)

In [None]:
import pandas as pd  # needed for pd.read_csv, pd.get_dummies, pd.concat below
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = pd.read_csv('heart.csv')
df.head()

In [None]:
df.shape

In [None]:
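# pandas_profiling has since been renamed ydata-profiling;
# on newer installs use: from ydata_profiling import ProfileReport
import pandas_profiling as ppl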

In [None]:
profile = ppl.ProfileReport(df)
profile

Check for null values in the dataset. This is not strictly needed, since pandas_profiling already reports missing values.

In [None]:
df.isnull().sum().sort_values(ascending=False)

Now, check the correlation between the features.

In [None]:
plt.figure(figsize=(12,10))
sns.heatmap(df.corr(),cmap='viridis',annot=True)

**Check the Correlation of features with the target variable.**

In [None]:
df.corr()['target'].sort_values(ascending=False)

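The following plot shows the distribution of age. According to this graph, the largest number of people in the dataset are in the 55-65 age group. Note that the histogram includes all patients, with and without heart disease, so on its own it does not show which age group has the most heart-disease cases.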

In [None]:
sns.set_style('whitegrid')
plt.figure(figsize=(10,5))
sns.histplot(df['age'], color='cyan')  # histplot replaces the deprecated distplot; no KDE curve by default
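Because the histogram above counts every patient, splitting the age counts by target shows where heart disease is actually concentrated. The sketch below uses a small invented DataFrame (not the real heart.csv), and the 10-year bin edges and labels are assumptions chosen for illustration. On the real data, the same idea would be `pd.crosstab(pd.cut(df['age'], ...), df['target'])`.

```python
import pandas as pd

# Invented toy rows standing in for the heart dataset (illustration only).
toy = pd.DataFrame({
    "age":    [41, 52, 57, 58, 60, 63, 45, 67, 59, 62],
    "target": [1,  1,  0,  1,  0,  1,  1,  0,  1,  0],
})

# Bin ages into 10-year groups (right-closed intervals, e.g. (49, 59] -> "50s").
age_groups = pd.cut(toy["age"], bins=[29, 39, 49, 59, 69, 79],
                    labels=["30s", "40s", "50s", "60s", "70s"])

# Count patients per age group (rows) and target class (columns).
table = pd.crosstab(age_groups, toy["target"])
print(table)
```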

### Now, let's look at the target. It is fairly balanced, with an almost equal number of samples in both classes.

In [None]:
sns.countplot(x='target', data=df, palette='rainbow')
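The balance claim can also be checked numerically with `value_counts`. This is a minimal sketch on an invented target series; on the real data the equivalent call is `df['target'].value_counts(normalize=True)`.

```python
import pandas as pd

# Invented binary target (illustration only).
toy_target = pd.Series([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], name="target")

counts = toy_target.value_counts()                   # absolute count per class
proportions = toy_target.value_counts(normalize=True)  # share of each class
print(counts)
print(proportions)
```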

## It's time to do some other plots.

In [None]:
plt.figure(figsize=(10,7))
sns.boxplot(x='target', y='trestbps', hue='sex', data=df, palette='viridis')

In [None]:
sns.countplot(x='target',hue='sex',data=df)

In [None]:
sns.boxplot(x='target',y='age',hue='sex',data=df)

### The following function converts integer-typed categorical columns (those with fewer than 10 distinct values) to object dtype so that they can be one-hot encoded with pd.get_dummies. By default, pd.get_dummies only encodes object, string, or category columns and leaves integer columns unchanged, which is why the conversion is needed. Each converted column name is also appended to the categories list.

In [None]:
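categories = []
def categorical(df):
    # Treat every feature with fewer than 10 distinct values as categorical:
    # convert it to object dtype and record its name in `categories`.
    for column in df.drop('target', axis=1).columns:
        # The dtype check skips columns that are already object-typed (there are none in this dataset).
        if df[column].nunique() < 10 and df[column].dtype != 'object':
            df[column] = df[column].astype('object')
            categories.append(column)
    return df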
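The reason for the dtype conversion can be seen on a tiny invented frame: `pd.get_dummies` skips integer columns unless they are cast to object or named explicitly. The column names mirror the dataset, but the values are made up for illustration.

```python
import pandas as pd

# Invented rows: "cp" is categorical but stored as integers.
toy = pd.DataFrame({"cp": [0, 2, 1, 2], "chol": [233, 250, 204, 236]})

# Integer dtype: get_dummies treats "cp" as numeric and leaves it unchanged.
as_int = pd.get_dummies(toy)
print(as_int.columns.tolist())  # ['cp', 'chol']

# After casting "cp" to object, it is one-hot encoded; "chol" stays numeric.
as_obj = pd.get_dummies(toy.astype({"cp": "object"}), dtype=int)
print(as_obj.columns.tolist())

# Alternative: name the columns explicitly, which encodes them regardless of dtype.
by_name = pd.get_dummies(toy, columns=["cp"], dtype=int)
print(by_name.columns.tolist())
```

Passing `columns=` would make the `astype('object')` step unnecessary, although the notebook's approach also records the categorical column names for later use.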

In [None]:
df = categorical(df)

In [None]:
categories

In [None]:
df.head()

In [None]:
df.info()

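### Create dummy variables for those categorical columns. Set drop_first=True to avoid the "dummy variable trap": if all k dummies of a variable are kept, they sum to 1 in every row and are therefore perfectly collinear with the intercept.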

In [None]:
onehot = pd.get_dummies(df[categories],drop_first = True, dtype=int)  # dtype=int: recent pandas returns bool dummies by default
onehot
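The effect of `drop_first=True` can be checked on invented data: without it, the dummies of each variable sum to 1 in every row (the collinearity behind the trap); with it, one level per variable is dropped and acts as the baseline.

```python
import pandas as pd

# Invented categorical columns (object dtype so get_dummies encodes them).
toy = pd.DataFrame({"sex": [1, 0, 1, 1], "cp": [0, 1, 2, 3]}).astype("object")

full = pd.get_dummies(toy, dtype=int)                      # k dummies per variable
reduced = pd.get_dummies(toy, drop_first=True, dtype=int)  # k - 1 dummies per variable

# In `full`, the cp dummies always add up to 1, i.e. they duplicate the intercept.
cp_row_sums = full[["cp_0", "cp_1", "cp_2", "cp_3"]].sum(axis=1)
print(full.columns.tolist())
print(reduced.columns.tolist())
print(cp_row_sums.tolist())
```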

In [None]:
df.drop(categories,axis=1,inplace=True) # Removing those categorical columns
df

In [None]:
y = df['target']

In [None]:
df.drop('target',axis=1,inplace=True)
df = pd.concat([df,onehot],axis=1)
df.head()

In [None]:
X = df.values.astype(float)  # make sure X is a purely numeric float array (Keras rejects object arrays)

In [None]:
X.shape

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state=0)

In [None]:
X_train.shape,X_test.shape

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
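# After the concat above, the first five columns are the numeric features
# (age, trestbps, chol, thalach, oldpeak); only these are standardized.
# The scaler is fitted on the training set only and then applied to the test set.
X_train[:,0:5] = sc.fit_transform(X_train[:,0:5])
X_test[:,0:5] = sc.transform(X_test[:,0:5])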
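The slice `X_train[:,0:5]` relies on the numeric columns coming first, and the scaler is fitted once on the whole training set before the cross-validated searches below. One common alternative (not what this notebook does) is a scikit-learn `Pipeline` with a `ColumnTransformer`, which selects columns by name and re-fits the scaler inside each CV fold. This is a minimal sketch on random toy data; the column names mirror the notebook, but the values, the reduced feature set, and the `C` grid are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data (illustration only, not the heart dataset).
rng = np.random.default_rng(0)
n = 80
X_toy = pd.DataFrame({
    "age": rng.integers(30, 75, n),
    "chol": rng.integers(150, 350, n),
    "sex_1": rng.integers(0, 2, n),  # already one-hot encoded
    "cp_1": rng.integers(0, 2, n),   # already one-hot encoded
})
y_toy = rng.integers(0, 2, n)

numeric = ["age", "chol"]

# Scale only the numeric columns (selected by name); pass the dummies through unchanged.
prep = ColumnTransformer([("num", StandardScaler(), numeric)], remainder="passthrough")
toy_pipe = Pipeline([("prep", prep), ("clf", LogisticRegression(max_iter=1000))])

# Inside GridSearchCV the whole pipeline, scaler included, is re-fitted on each training fold.
toy_grid = GridSearchCV(toy_pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=5)
toy_grid.fit(X_toy, y_toy)

# The refit best pipeline's scaler was fitted on all of X_toy.
scaler = toy_grid.best_estimator_.named_steps["prep"].named_transformers_["num"]
print(toy_grid.best_params_, scaler.mean_)
```

Hyperparameters of a step are addressed as `<step>__<param>` (here `clf__C`), which is how the existing grids could be adapted to a pipeline.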

In [None]:
from sklearn.ensemble import RandomForestClassifier,VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

In [None]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [None]:
rf = RandomForestClassifier()
rf.fit(X_train,y_train)

In [None]:
predictions = rf.predict(X_test)
confusion_matrix(y_test,predictions)

# Hyperparameter Tuning Starts...!

## Tuning Random Forest

In [None]:
n_estimators = [200,300,400,500,600,700]
max_depth = range(1,12)
criterions = ['gini', 'entropy']
parameters = {'n_estimators':n_estimators,
              'max_depth':max_depth,
              'criterion': criterions
              }
# max_features='auto' was removed from recent scikit-learn; for classifiers it meant 'sqrt'
grid = GridSearchCV(estimator=RandomForestClassifier(max_features='sqrt',n_jobs=-1),
                    param_grid=parameters,
                    cv=5,
                    verbose=1,
                    n_jobs = -1)
grid.fit(X_train,y_train)

In [None]:
rf_grid = grid.best_estimator_  # already refit on the whole training set (refit=True by default), so the next fit is optional
rf_grid.fit(X_train,y_train)

In [None]:
predictions = rf_grid.predict(X_test)
confusion_matrix(y_test,predictions)

## Let's look at some important features...!

In [None]:
feature_importances = pd.DataFrame(rf_grid.feature_importances_,
                                   index=df.columns,
                                   columns=['importance'])
feature_importances.sort_values(by='importance', ascending=False)

## Tuning Logistic Regression

In [None]:
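C_vals = [0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,2,3,3.2,3.6,
          4,5,6,7,8,9,10]
# 'sag' and 'lbfgs' only support the 'l2' penalty, so the grid is split into two
# sub-grids; otherwise GridSearchCV would try (and fail on) invalid combinations.
parameters = [{'penalty': ['l1', 'l2'], 'solver': ['liblinear'], 'C': C_vals},
              {'penalty': ['l2'], 'solver': ['sag', 'lbfgs'], 'C': C_vals}]

grid = GridSearchCV(estimator=LogisticRegression(max_iter=1000),  # extra iterations help 'sag' converge
                    param_grid=parameters,
                    scoring='accuracy',
                    cv=5,
                    verbose=1,
                    n_jobs=-1)
grid.fit(X_train,y_train)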

In [None]:
lr_grid = grid.best_estimator_
lr_grid.fit(X_train,y_train)

In [None]:
predictions = lr_grid.predict(X_test)
confusion_matrix(y_test,predictions)

## Tuning SVM

In [None]:
C = [0.01, 0.1, 1,1.2,1.5,2,2.5,3,3.2,3.5,4]
gamma = [0.0001,0.001,0.005, 0.01, 0.1, 1]
parameters = {'C': C, 'gamma' : gamma}
grid = GridSearchCV(estimator=SVC(kernel = 'rbf', probability=True),
                    param_grid=parameters,
                    scoring='accuracy',
                    verbose=1,
                    cv=5,
                    n_jobs=-1)
grid.fit(X_train,y_train)

In [None]:
svm_grid = grid.best_estimator_
svm_grid.fit(X_train,y_train)

In [None]:
predictions = svm_grid.predict(X_test)
confusion_matrix(y_test,predictions)


## Tuning Bagging Classifier

In [None]:
from sklearn.ensemble import BaggingClassifier

In [None]:
n_estimators = [200,300,330,370,400,430,470,500,600,700]


parameters = {'n_estimators':n_estimators}

# base_estimator was renamed to estimator in recent scikit-learn; the default (None) is a decision tree
grid = GridSearchCV(BaggingClassifier(),
                    param_grid=parameters,
                    cv=5,verbose=1,
                    n_jobs = -1)
grid.fit(X_train,y_train)

In [None]:
bag_grid = grid.best_estimator_
bag_grid.fit(X_train,y_train)

In [None]:
predictions = bag_grid.predict(X_test)
confusion_matrix(y_test,predictions)

## Tuning XGBClassifier

In [None]:
base_score = [0.1,0.3,0.5,0.7,0.9]
max_depth = range(4,15)
learning_rate = [0.01,0.1,0.2,0.3,0.4]
gamma = [0.001,0.01,0.1,0.3,0.5]
parameters = {'base_score':base_score,
              'max_depth':max_depth,
              'learning_rate': learning_rate,
              'gamma':gamma
              }
# 5 x 11 x 5 x 5 = 1375 parameter combinations x 5 folds = 6875 fits, so this search can be slow
grid = GridSearchCV(estimator=XGBClassifier(n_jobs=-1),
                    param_grid=parameters,
                    cv=5,
                    verbose=1,
                    n_jobs = -1)
grid.fit(X_train,y_train)

In [None]:
xgb_grid = grid.best_estimator_
xgb_grid.fit(X_train,y_train)

In [None]:
predictions = xgb_grid.predict(X_test)
confusion_matrix(y_test,predictions)

## Now, combine all of them using a Voting Classifier...!

In [None]:
vot_clf = VotingClassifier(estimators=[('rf',rf_grid),
                                       ('lr',lr_grid),
                                       ('svc',svm_grid),
                                       ('bag',bag_grid),
                                       ('xgb',xgb_grid)], voting='hard')
vot_clf.fit(X_train,y_train)
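With `voting='hard'`, the voting classifier predicts, for each sample, the class chosen by the majority of the fitted estimators. A minimal NumPy sketch of that rule, using invented predictions from three hypothetical models; `hard_vote` is an illustrative helper, not part of scikit-learn.

```python
import numpy as np

# Invented predictions: one row per model, one column per test sample.
votes = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
])

def hard_vote(votes):
    # For each column (sample), count the votes per class label and pick the most frequent.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), axis=0, arr=votes)

print(hard_vote(votes))
```

`np.bincount(...).argmax()` breaks ties in favour of the smaller class label; with five estimators and two classes, as in `vot_clf`, an unweighted tie cannot occur.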

In [None]:
predictions = vot_clf.predict(X_test)
confusion_matrix(y_test,predictions)

In [None]:
vot_clf.score(X_test,y_test)

In [None]:
rf_grid.score(X_test,y_test)

In [None]:
bag_grid.score(X_test,y_test)

In [None]:
xgb_grid.score(X_test,y_test)

### Let's use an Artificial Neural Network (ANN)...!

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
model = Sequential()

model.add(Dense(units=30,activation = 'relu' ,input_shape=(X_train.shape[1],)))  # 22 features after one-hot encoding

model.add(Dropout(0.2))

model.add(Dense(units=15,activation = 'relu'))

model.add(Dropout(0.2))

model.add(Dense(units=7,activation = 'relu'))

model.add(Dropout(0.2))

model.add(Dense(units=1,activation = 'sigmoid'))


model.compile(optimizer = 'adam',loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)

In [None]:
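history = model.fit(x=X_train,
                    y=y_train.to_numpy(),
                    epochs=200,
                    # Hold out 20% of the training data for early stopping, so the
                    # test set is not used to decide when training stops.
                    validation_split=0.2,
                    verbose=1,
                    callbacks=[early_stop]
                    )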

In [None]:
predictions = model.predict(X_test)

In [None]:
predictions = (predictions > 0.5).astype(int).ravel()  # threshold the sigmoid outputs at 0.5

In [None]:
confusion_matrix(y_test,predictions)

## Tuning the ANN Using GridSearchCV...!

In [None]:
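# Note: tensorflow.keras.wrappers.scikit_learn has been removed in recent TensorFlow/Keras
# releases; there, the SciKeras package (scikeras.wrappers.KerasClassifier) is the maintained
# replacement, although its parameter naming differs slightly.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.layers import BatchNormalization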

### Create a function to build our ANN model.

### Keras provides a wrapper class, KerasClassifier, that lets us use our deep learning models with scikit-learn. This is especially useful for tuning hyperparameters with scikit-learn's RandomizedSearchCV or GridSearchCV: the keyword arguments of the build function (here layers and dropout_rate) become hyperparameters that the search can set.

In [None]:
def build_model(layers,dropout_rate=0):
    model = Sequential()
    for i,nodes in enumerate(layers):
        if i==0:
            model.add(Dense(nodes,activation='relu',input_dim=X_train.shape[1]))
        else :
            model.add(Dense(nodes,activation='relu'))
            
        model.add(BatchNormalization())
        
        if dropout_rate:
            model.add(Dropout(dropout_rate))
    
    model.add(Dense(1,activation='sigmoid'))
    model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
    return model

    
model = KerasClassifier(build_fn=build_model,verbose=0)
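To see why the arguments of `build_model` can appear in a `GridSearchCV` parameter grid, here is a simplified, hypothetical analogue of the wrapper idea: a scikit-learn estimator that exposes a build function's keyword arguments as constructor parameters and builds a fresh model in every `fit`. `BuildFnClassifier` and `build_toy_model` are invented for illustration, and a `LogisticRegression` stands in for the Keras model so the sketch runs without TensorFlow; the real KerasClassifier does more (for example, forwarding fit arguments such as epochs and batch_size).

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def build_toy_model(C=1.0):
    # Hypothetical build function standing in for build_model: returns an unfitted model.
    return LogisticRegression(C=C, max_iter=1000)

class BuildFnClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical, simplified wrapper: build-function arguments are constructor
    parameters, so get_params/set_params (and thus GridSearchCV) can see them."""

    def __init__(self, build_fn=None, C=1.0):
        self.build_fn = build_fn
        self.C = C

    def fit(self, X, y):
        # Build a fresh model with the current hyperparameters, then train it.
        self.model_ = self.build_fn(C=self.C)
        self.model_.fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)

X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
wrapper_grid = GridSearchCV(BuildFnClassifier(build_fn=build_toy_model),
                            param_grid={"C": [0.01, 1.0]}, cv=3)
wrapper_grid.fit(X_demo, y_demo)
print(wrapper_grid.best_params_)
```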

### Define the arguments passed to fit (other than X and y), such as epochs and callbacks.

In [None]:
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
callbacks = [early_stop]

fit_parameters = {'callbacks': callbacks,
                  'epochs': 200,
                  # use part of each training fold for early stopping instead of the test set
                  'validation_split': 0.2,
                  'verbose' : 0}

### Define some of the Hyperparameters of our model.

In [None]:
layers = [(15,), (20,10), (30,15,7)]  # hidden layers only; build_model already adds the sigmoid output layer

parameters = dict(layers=layers,dropout_rate=[0,0.1,0.2,0.3],batch_size=[32,64,128,256])

grid = GridSearchCV(estimator=model,
                    param_grid=parameters,
                    cv=5,
                    verbose=1,
                    n_jobs=-1)

### To pass these fit arguments through GridSearchCV, unpack the dictionary with **fit_parameters.

In [None]:
grid.fit(X_train,y_train,**fit_parameters)

In [None]:
predictions = grid.predict(X_test)
confusion_matrix(y_test,predictions)

### The variable grid was reused for every tuned model, but at this point it holds the tuned ANN because that was the most recent search.

In [None]:
all_models = [rf_grid,
              lr_grid,
              svm_grid,
              bag_grid,
              xgb_grid,
              vot_clf,
              grid]
# Test-set accuracy of every tuned model, keyed by the model object
c = {}
for clf in all_models:
    c[clf] = accuracy_score(y_test, clf.predict(X_test))

In [None]:
c

## Final Prediction !!!

In [None]:
predictions = (max(c,key=c.get)).predict(X_test)

confusion_matrix(y_test,predictions)

In [None]:
print(classification_report(y_test,predictions))

## Save and Load the Model

In [None]:
import pickle

### Here I save the vot_clf model with pickle. The ANN, like other Keras deep learning models, is better saved in Keras's own format (for example an h5 file via model.save) than with pickle.

In [None]:
filename = 'model.pkl'
with open(filename, 'wb') as f:  # the context manager closes the file after writing
    pickle.dump(vot_clf, f)

In [None]:
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
predictions = loaded_model.predict(X_test)
confusion_matrix(y_test,predictions)
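A quick way to confirm that saving and loading worked is to compare predictions before and after the round trip. This is a minimal self-contained sketch: a small RandomForestClassifier trained on synthetic data stands in for `vot_clf`, and the temporary file path is an assumption for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and a small stand-in model (illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Write and read the model; the context managers close the files.
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(same)
```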