## The Heart Dataset

File name: 'D6_Heart_Dataset_2.csv'

This dataset has been obtained from Kaggle.

The dataset contains 303 observations, each with 13 features and one binary class label (0 or 1).
The features and the label are described below:
1. age: in years
2. sex: (1 = male; 0 = female)
3. cp: chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)
4. trestbps: resting blood pressure, in mm Hg on admission to the hospital
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (values: 0,1,2)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
12. ca: number of major vessels (0-3) coloured by fluoroscopy
13. thal: (3 = normal; 6 = fixed defect; 7 = reversible defect)
14. target: the predicted attribute, diagnosis of heart disease (0 = fit; 1 = diseased)

This is a binary classification problem.
The dataset is clean and contains no categorical data that needs encoding; every feature is stored as a number.
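
The description above (number of rows and features, no missing or non-numeric values, a binary label) can be checked directly once the file is loaded. The sketch below uses a hypothetical helper, `summarize_dataset`, which is not part of the original notebook, and demonstrates it on a tiny made-up frame rather than on the real data.

```python
import pandas as pd

def summarize_dataset(df, target="target"):
    """Return basic sanity-check facts about a labelled dataset (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "n_features": df.shape[1] - 1,  # every column except the target
        "n_missing": int(df.isna().sum().sum()),
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
        "class_counts": df[target].value_counts().sort_index().to_dict(),
    }

# Tiny made-up frame for illustration only; these are not rows from the real dataset
toy = pd.DataFrame({
    "age": [63, 41, 57, 52],
    "sex": [1, 0, 1, 1],
    "chol": [233, 204, 192, 212],
    "target": [1, 1, 0, 0],
})
summary = summarize_dataset(toy)
print(summary)
```

Once the CSV has been read into `data`, calling `summarize_dataset(data)` should report 303 rows and 13 features if the description above holds.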

In [None]:
# Loading and exploring dataset
import pandas as pd
# Reading the file into a dataframe
# PATH='C:/Users/maria/Dropbox/Machine Learning and Deep Learning/Machine Learning/Undergraduate/Lectures/Datasets and Notebooks'  # laptop
PATH='C:/Users/admin/Dropbox/Machine Learning and Deep Learning/Machine Learning/Undergraduate/Lectures/Datasets and Notebooks'  # office
data = pd.read_csv(f'{PATH}/D6_Heart_Dataset_2.csv')
# Displaying the read contents
data

In [None]:
# separating predictors and target
X = data.drop("target",axis=1) #predictors
Y = data["target"]  #target

In [None]:
# Splitting into train and test sets
from sklearn.model_selection import train_test_split 
X_train,X_test,Y_train,Y_test = train_test_split(X, Y,test_size=0.20,random_state=0) 

## Logistic Regression
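
As a quick reminder of what the models below compute: for a binary problem, logistic regression passes a linear score through the sigmoid function to obtain the probability of class 1, and `predict` returns class 1 when that probability exceeds 0.5. The sketch below illustrates this on synthetic data generated with `make_classification` (an assumption made to keep the example self-contained; it does not use the heart dataset).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, used only to keep the example self-contained
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

scores = clf.decision_function(X_demo)      # linear score w.x + b
p_class1 = 1.0 / (1.0 + np.exp(-scores))    # sigmoid of the score
proba = clf.predict_proba(X_demo)           # columns: P(class 0), P(class 1)

# The second column of predict_proba matches the sigmoid of the score
print(np.allclose(proba[:, 1], p_class1))
# predict() is equivalent to thresholding P(class 1) at 0.5
print(np.array_equal(clf.predict(X_demo), (proba[:, 1] > 0.5).astype(int)))
```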

### Model 1 - Without Data Scaling

In [None]:
%%time
# Create logistic regression object
from sklearn.linear_model import LogisticRegression 
logistic_regression1 = LogisticRegression(solver="liblinear", 
                                         random_state=0) 
# try different values for max_iter and observe the difference in training time

# Train model 
model1 = logistic_regression1.fit(X_train, Y_train) 
#Predictions 
Y_pred1 = model1.predict(X_test) 

In [None]:
# Printing results
from sklearn import metrics 
import matplotlib.pyplot as plt  
from sklearn.metrics import confusion_matrix, classification_report 

print("The accuracy is "+str(metrics.accuracy_score(Y_test,Y_pred1)*100)+"%") 
print(confusion_matrix(Y_test, Y_pred1))  
target_names = ['class 0', 'class 1'] 
print(classification_report(Y_test, Y_pred1, target_names=target_names))  

In [None]:
Y_pred1

In [None]:
model1.predict_proba(X_test)

### Model 2 - With Data Scaling
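
Scaling parameters (mean and standard deviation) should be learned from the training split only and then reused on the test split. One way to make this hard to get wrong is scikit-learn's `make_pipeline`, sketched below on synthetic data. The variable names and the synthetic dataset are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data (13 features, binary label)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

# fit() learns the scaling statistics from the training data only;
# score()/predict() reuse those statistics when transforming new data
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", random_state=0),
)
pipe.fit(Xd_train, yd_train)

scaler = pipe.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, Xd_train.mean(axis=0)))  # statistics come from the train split
print(pipe.score(Xd_test, yd_test))                      # accuracy on the held-out split
```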

In [None]:
#Let us see values in X_train
X_train

In [None]:
# Standardizing train data
from sklearn import preprocessing
standard_scaler = preprocessing.StandardScaler()
X_train_standardized=pd.DataFrame(standard_scaler.fit_transform(X_train)) # returns standardized array
X_train_standardized

# Normalizing train data
#normal_scaler = preprocessing.MinMaxScaler()
#X_train_normalized=pd.DataFrame(normal_scaler.fit_transform(X_train)) # returns normalized array
#X_train_normalized

In [None]:
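# Standardizing test data with the scaler already fitted on the training data
# (transform only: refitting on the test set would scale it with different statistics)
X_test_standardized=pd.DataFrame(standard_scaler.transform(X_test))
X_test_standardized

#Normalizing test data
#X_test_normalized=pd.DataFrame(normal_scaler.transform(X_test)) # returns normalized array
#X_test_normalized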

In [None]:
%%time
# Create logistic regression object
logistic_regression2 = LogisticRegression(solver="liblinear", 
                                         random_state=0) 
# try different values for max_iter and observe the difference in training time

# Train model 
model2 = logistic_regression2.fit(X_train_standardized, Y_train) 
#Predictions 
Y_pred2 = model2.predict(X_test_standardized) 

In [None]:
# Printing results

print("The accuracy is "+str(metrics.accuracy_score(Y_test,Y_pred2)*100)+"%") 
print(confusion_matrix(Y_test, Y_pred2))  
target_names = ['class 0', 'class 1'] 
print(classification_report(Y_test, Y_pred2, target_names=target_names)) 