**CIS499**: Independent Study - Internship (Summer 2021)




This notebook implements a neural network (multi-layer perceptron) model on the heart disease dataset. It uses scikit-learn for the model and pandas and NumPy for data handling.

**Step 1** Importing the required libraries.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Import the data-splitting and evaluation utilities
from math import sqrt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

**Step 2** Reading the data and performing a basic data check.

In [None]:
# Load the heart disease dataset from GitHub
df = pd.read_csv('https://raw.githubusercontent.com/JWALT99/CIS420/main/heart.csv')
df.head()  # Show the first 5 rows

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


**Data contains** <br>

* age - age in years <br>
* sex - (1 = male; 0 = female) <br>
* cp - chest pain type <br>
* trestbps - resting blood pressure (in mm Hg on admission to the hospital) <br>
* chol - serum cholesterol in mg/dl <br>
* fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) <br>
* restecg - resting electrocardiographic results <br>
* thalach - maximum heart rate achieved <br>
* exang - exercise induced angina (1 = yes; 0 = no) <br>
* oldpeak - ST depression induced by exercise relative to rest <br>
* slope - the slope of the peak exercise ST segment <br>
* ca - number of major vessels (0-3) colored by fluoroscopy <br>
* thal - 3 = normal; 6 = fixed defect; 7 = reversible defect <br>
* target - heart disease present (1 = yes; 0 = no)
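
A basic data check usually also covers column types, missing values, and the balance of the label. The sketch below runs those checks on the five rows shown above, rebuilt by hand so that the block is self-contained; on the full dataset the same calls would be applied to `df`. The name `sample` is introduced here for illustration only.

```python
import pandas as pd

# The five rows printed by df.head(), re-entered by hand (illustrative stand-in for df)
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
rows = [
    [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1],
    [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1],
    [41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2, 1],
    [56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2, 1],
    [57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2, 1],
]
sample = pd.DataFrame(rows, columns=columns)

# Column types: a quick way to spot numbers that were parsed as text
print(sample.dtypes)

# Missing values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# How many rows fall into each value of the disease label
target_counts = sample["target"].value_counts()
print(target_counts)
```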

In [None]:
print(df.shape)            # Number of rows and columns
df.describe().transpose()  # Summary statistics for each column

(303, 14)


Unnamed: 0,count,mean,std,min,25%,50%,75%,max
age,303.0,54.366337,9.082101,29.0,47.5,55.0,61.0,77.0
sex,303.0,0.683168,0.466011,0.0,0.0,1.0,1.0,1.0
cp,303.0,0.966997,1.032052,0.0,0.0,1.0,2.0,3.0
trestbps,303.0,131.623762,17.538143,94.0,120.0,130.0,140.0,200.0
chol,303.0,246.264026,51.830751,126.0,211.0,240.0,274.5,564.0
fbs,303.0,0.148515,0.356198,0.0,0.0,0.0,0.0,1.0
restecg,303.0,0.528053,0.52586,0.0,0.0,1.0,1.0,2.0
thalach,303.0,149.646865,22.905161,71.0,133.5,153.0,166.0,202.0
exang,303.0,0.326733,0.469794,0.0,0.0,0.0,1.0,1.0
oldpeak,303.0,1.039604,1.161075,0.0,0.0,0.8,1.6,6.2


**Step 3** Defining the target and scaling the predictors

In [None]:
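# 'chol' (serum cholesterol) is the column the model will predict
target_column = ['chol']
# Every other column is a predictor; the list comprehension keeps the original column order
predictors = [col for col in df.columns if col not in target_column]
# Divide each predictor by its maximum so that its largest value becomes 1
df[predictors] = df[predictors] / df[predictors].max()
df.describe().transpose()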

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
age,303.0,0.706056,0.117949,0.376623,0.616883,0.714286,0.792208,1.0
sex,303.0,0.683168,0.466011,0.0,0.0,1.0,1.0,1.0
cp,303.0,0.322332,0.344017,0.0,0.0,0.333333,0.666667,1.0
trestbps,303.0,0.658119,0.087691,0.47,0.6,0.65,0.7,1.0
chol,303.0,246.264026,51.830751,126.0,211.0,240.0,274.5,564.0
fbs,303.0,0.148515,0.356198,0.0,0.0,0.0,0.0,1.0
restecg,303.0,0.264026,0.26293,0.0,0.0,0.5,0.5,1.0
thalach,303.0,0.740826,0.113392,0.351485,0.660891,0.757426,0.821782,1.0
exang,303.0,0.326733,0.469794,0.0,0.0,0.0,1.0,1.0
oldpeak,303.0,0.167678,0.18727,0.0,0.0,0.129032,0.258065,1.0
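
Dividing by the column maximum caps each column at 1, but unlike min-max scaling it does not move the minimum to 0 (age, for instance, starts at 0.376623 in the table above), and it only lands in [0, 1] when the values are non-negative. The sketch below contrasts the two approaches on the `age` and `oldpeak` values from the first five rows. `MinMaxScaler` is brought in purely for comparison and is not what this notebook uses.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# age and oldpeak from the first five rows of the dataset
values = np.array([[63, 2.3],
                   [37, 3.5],
                   [41, 1.4],
                   [56, 0.8],
                   [57, 0.6]])

# Max scaling, as in the cell above: each column divided by its own maximum
max_scaled = values / values.max(axis=0)

# Min-max scaling: each column shifted and stretched to span exactly [0, 1]
minmax_scaled = MinMaxScaler().fit_transform(values)

print(max_scaled.min(axis=0), max_scaled.max(axis=0))
print(minmax_scaled.min(axis=0), minmax_scaled.max(axis=0))
```

In a stricter pipeline the scaling statistics would be computed on the training split only and then reused on the test split, so that no information from the test rows influences preprocessing.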


**Step 4** Creating the training and test datasets

In [None]:
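# Separate the predictor matrix and the target column
X = df[predictors].values
y = df[target_column].values

# Hold out 30% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape)
print(X_test.shape)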

(212, 13)
(91, 13)
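
The split sizes follow from the row count: with `test_size=0.30`, scikit-learn rounds the test share up, so 303 rows give ceil(0.30 × 303) = 91 test rows and the remaining 212 go to training. The sketch below reproduces that arithmetic on a placeholder array of the same shape; the zero-filled `X_demo` and `y_demo` are stand-ins, not the real data.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows, n_features = 303, 13

# Placeholder arrays with the same shape as X and y in this notebook
X_demo = np.zeros((n_rows, n_features))
y_demo = np.zeros(n_rows)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=40)

# The test split is rounded up; the training split gets the rest
expected_test = math.ceil(0.30 * n_rows)
expected_train = n_rows - expected_test

print(X_tr.shape, X_te.shape)
print(expected_train, expected_test)
```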


**Step 5** Building, Predicting and Evaluating the machine learning model

In [None]:
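from sklearn.neural_network import MLPClassifier

# Three hidden layers of 8 units each, ReLU activations, Adam optimizer
mlp = MLPClassifier(hidden_layer_sizes=(8,8,8), activation='relu', solver='adam', max_iter=500)
mlp.fit(X_train,y_train)

# Predict on both splits so that training and test performance can be compared
predict_train = mlp.predict(X_train)
predict_test = mlp.predict(X_test)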

  y = column_or_1d(y, warn=True)
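
The `column_or_1d` line above is the tail of a warning that scikit-learn emits when `y` has shape `(n_samples, 1)` instead of `(n_samples,)`. That is the shape `df[target_column].values` produces when `target_column` is a one-element list. A minimal sketch of flattening such a column with `ravel()`, using the five `chol` values from the first rows of the dataset in place of the full column:

```python
import numpy as np

# A column vector, like df[['chol']].values for the first five rows
y_column = np.array([[233], [250], [204], [236], [354]])

# ravel() flattens it to the 1-D shape that fit() expects
y_flat = y_column.ravel()

print(y_column.shape)  # (5, 1)
print(y_flat.shape)    # (5,)
```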


In [None]:
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_train,predict_train))
print(classification_report(y_train,predict_train))

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
              precision    recall  f1-score   support

         131       0.00      0.00      0.00         1
         149       0.00      0.00      0.00         1
         160       0.00      0.00      0.00         1
         164       0.00      0.00      0.00         1
         166       0.00      0.00      0.00         1
         169       0.00      0.00      0.00         1
         172       0.00      0.00      0.00         1
         174       0.00      0.00      0.00         1
         175       0.06      1.00      0.11         3
         176       0.00      0.00      0.00         1
         177       0.00      0.00      0.00         1
         180       0.00      0.00      0.00         1
         182       0.00      0.00      0.00         1
         185       0.00      0.00      0.00         1
         187       0.00      0.00      0.00         1
         188   

  _warn_prf(average, modifier, msg_start, len(result))
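
Because `chol` is a continuous measurement, the classifier treats every distinct cholesterol value as its own class, which is why the report lists one row per value and most rows have a support of 1. If the aim is to predict the cholesterol level itself, a regressor scored with RMSE and R² (both already imported in Step 1) may be a better fit. The following is a sketch under that assumption, run on synthetic data from `make_regression` rather than the heart dataset. The layer sizes mirror the classifier above, and `max_iter=2000` and the `_r` variable names are arbitrary choices made for this illustration.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in with the notebook's shape: 303 rows, 13 predictors
X_syn, y_syn = make_regression(n_samples=303, n_features=13,
                               noise=10.0, random_state=40)

X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=40)

# Same architecture as the classifier, but with a continuous output
reg = MLPRegressor(hidden_layer_sizes=(8, 8, 8), activation='relu',
                   solver='adam', max_iter=2000, random_state=40)
reg.fit(X_train_r, y_train_r)

# Score the held-out rows with regression metrics
pred_test_r = reg.predict(X_test_r)
rmse = sqrt(mean_squared_error(y_test_r, pred_test_r))
r2 = r2_score(y_test_r, pred_test_r)
print(rmse, r2)
```

With the real data, `X_syn` and `y_syn` would be replaced by `X` and a flattened `y`, and the scores would of course differ from those on the synthetic set.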
