# <font color='#31394d'> Logistic Regression Practice Exercise </font>

For this exercise we are going to use the heart dataset to predict whether or not someone has heart disease. You can read more about the dataset here: https://archive.ics.uci.edu/ml/datasets/Heart+Disease. The dataset is provided as a CSV file in the `data` folder.

🚀 <font color='#d9c4b1'> Exercise: </font> Start by reading in the dataset from the `data` folder and having a look at the data. Don't forget to import the necessary packages!

In [1]:
# Your code goes here!
import pandas as pd
import numpy as np
df = pd.read_csv('data/heart.csv')
df.head()

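|   | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0.0 | False |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3.0 | True |
| 2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2.0 | True |
| 3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0.0 | False |
| 4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0.0 | False |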
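Beyond `df.head()`, a few quick checks are worth running before modeling: the shape of the data, the number of missing values (logistic regression in scikit-learn will not accept NaNs), and the class balance of `Y`. The sketch below rebuilds only the five rows printed above as a stand-in for the full file so it runs on its own; in the notebook you would pass `df` directly. The helper name `quick_look` is hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_csv('data/heart.csv'): only the five rows shown by df.head()
rows = [
    [63, 1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0.0, False],
    [67, 1, 4, 160, 286, 0, 2, 108, 1, 1.5, 2, 3.0, True],
    [67, 1, 4, 120, 229, 0, 2, 129, 1, 2.6, 2, 2.0, True],
    [37, 1, 3, 130, 250, 0, 0, 187, 0, 3.5, 3, 0.0, False],
    [41, 0, 2, 130, 204, 0, 2, 172, 0, 1.4, 1, 0.0, False],
]
columns = [f"X{i}" for i in range(1, 13)] + ["Y"]
sample = pd.DataFrame(rows, columns=columns)


def quick_look(frame: pd.DataFrame, target: str = "Y") -> dict:
    """Hypothetical helper: summarize size, missing values and class balance."""
    return {
        "shape": frame.shape,
        # Total count of missing cells across all columns
        "n_missing": int(frame.isna().sum().sum()),
        # Fraction of rows in each class of the target column
        "class_balance": frame[target].value_counts(normalize=True).to_dict(),
    }


summary = quick_look(sample)
print(summary["shape"])      # (5, 13)
print(summary["n_missing"])  # 0
print(summary["class_balance"])
```

A strongly imbalanced `Y` would make accuracy alone a misleading score, which is one reason the later exercise also asks for precision and recall.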


🚀 <font color='#d9c4b1'> Exercise: </font> Now standardize the features. You can learn more about standardization in the `Logistic Regression.ipynb` notebook that we used during the session!

In [2]:
# Your code goes here!
from sklearn.preprocessing import StandardScaler

# Separate the features and target variable
X = df.drop('Y', axis=1)
y = df['Y']

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Verify that each feature now has mean ~0 (up to floating-point round-off) and standard deviation 1
print(X.mean(axis=0))
print(X.std(axis=0))

[-3.14872617e-16 -1.78229783e-17  6.68361687e-17 -7.12919133e-17
 -6.23804241e-17 -5.04984386e-17 -4.75279422e-17  4.69338429e-16
  1.85656024e-18 -7.12919133e-17 -4.45574458e-17 -3.56459566e-17]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
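`StandardScaler` rescales each column as $z = (x - \mu)/\sigma$, where $\mu$ is the column mean and $\sigma$ the population standard deviation (`ddof=0`). That is why the standard deviations above print as 1 and the means print as tiny values around 1e-16 rather than exact zeros. A minimal check, using the five `X1` values shown by `df.head()` as a toy column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The five X1 values printed by df.head(), as a single-column matrix
x = np.array([[63.0], [67.0], [67.0], [37.0], [41.0]])

# Manual z-scores: subtract the column mean, divide by the population std
mu = x.mean(axis=0)
sigma = x.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
z_manual = (x - mu) / sigma

z_sklearn = StandardScaler().fit_transform(x)

print(mu, sigma)                         # mean 55.0, std ~13.21
print(np.allclose(z_manual, z_sklearn))  # True
```

Standardizing matters here mainly because it puts all coefficients on the same "per standard deviation" scale, which is what makes comparing their sizes in the next exercise meaningful.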


🚀 <font color='#d9c4b1'> Exercise: </font> Fit a standard logistic regression model and determine which features look most promising.
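As a reminder of what the fitted model computes: logistic regression scores each row linearly, $s = w^\top x + b$, and turns that score into a probability of the positive class with the sigmoid $p = 1/(1 + e^{-s})$. The sketch below checks this on synthetic data from `make_classification` (a stand-in, not the heart features): `decision_function` returns $s$, `predict_proba` returns the sigmoid of it, and `predict` picks the positive class when $s > 0$, i.e. when $p > 0.5$.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 samples, 12 features (not the heart dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=12, random_state=0)

model = LogisticRegression().fit(X_demo, y_demo)

# Linear score w.x + b for each sample
scores = X_demo @ model.coef_[0] + model.intercept_[0]
# Sigmoid turns the score into P(y = 1 | x)
p_manual = 1.0 / (1.0 + np.exp(-scores))

print(np.allclose(scores, model.decision_function(X_demo)))      # True
print(np.allclose(p_manual, model.predict_proba(X_demo)[:, 1]))  # True
# predict() returns class 1 exactly when the decision score is positive
print(np.array_equal(model.predict(X_demo),
                     (model.decision_function(X_demo) > 0).astype(int)))  # True
```

Each coefficient in `coef_` is therefore the change in log-odds $\log\frac{p}{1-p}$ for a one-unit increase in its feature, holding the others fixed.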

In [3]:
# Your code goes here!
from sklearn.linear_model import LogisticRegression

# Fit a logistic regression model to the standardized data
lr = LogisticRegression()
lr.fit(X, y)

# Print the coefficient (weight) of each feature in the model
coefficients = lr.coef_[0]
# Feature names in the same order as the columns of X (everything except the target)
feature_names = df.drop('Y', axis=1).columns
for feature, coef in zip(feature_names, coefficients):
    print(f'{feature}: {coef}')

X1: -0.061239983466839225
X2: 0.8521175171444971
X3: 0.6401165299116853
X4: 0.4356129975064694
X5: 0.28421529995424744
X6: -0.29379629453707523
X7: 0.16021017860054784
X8: -0.489033044816707
X9: 0.5175927985841722
X10: 0.3199261247543762
X11: 0.41258918993635796
X12: 1.1111390742332032
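Because every feature was standardized, each coefficient is the change in the log-odds of `Y` per one-standard-deviation increase in that feature (holding the others fixed), so sorting by absolute value gives a reasonable first ranking. Correlated features can share or swap weight, so treat the ranking as a guide rather than a verdict. Exponentiating a coefficient gives the corresponding odds ratio. The sketch below simply re-uses the coefficients printed above, rounded to four decimals:

```python
import numpy as np
import pandas as pd

# Coefficients printed above, rounded to four decimals
coefs = pd.Series({
    "X1": -0.0612, "X2": 0.8521, "X3": 0.6401, "X4": 0.4356,
    "X5": 0.2842, "X6": -0.2938, "X7": 0.1602, "X8": -0.4890,
    "X9": 0.5176, "X10": 0.3199, "X11": 0.4126, "X12": 1.1111,
})

# Rank by absolute size: the sign gives direction, the magnitude gives strength
ranking = coefs.reindex(coefs.abs().sort_values(ascending=False).index)

# Odds ratio per one-standard-deviation increase, other features held fixed
odds_ratios = np.exp(ranking)

print(pd.DataFrame({"coef": ranking, "odds_ratio": odds_ratios.round(2)}))
```

By this ranking, X12, X2, X3 and X9 carry the most weight, while X7 and X1 carry the least. In the notebook, `pd.Series(lr.coef_[0], index=df.drop('Y', axis=1).columns)` builds the same Series without retyping the values.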


🚀 <font color='#d9c4b1'> Exercise: </font> Fit another model that includes only the features that you think look promising. Use cross validation and the accuracy, precision, and recall scoring metrics to determine which model is best.
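All three metrics come from the confusion matrix counts TP, FP, FN and TN: accuracy $= (TP + TN)/N$, precision $= TP/(TP + FP)$ (the share of predicted positives that are real), and recall $= TP/(TP + FN)$ (the share of real positives the model catches). A small check on made-up labels (hypothetical, not model output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels for illustration only (True = positive class)
y_true = np.array([True, True, True, True, False, False, False, False, False, False])
y_pred = np.array([True, True, True, False, True, True, False, False, False, False])

# For binary labels sorted [False, True], ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tn, fp, fn, tp)               # 4 2 1 3
print(accuracy, precision, recall)  # 0.7 0.6 0.75

# The hand-computed values agree with scikit-learn's scorers
print(np.isclose(precision, precision_score(y_true, y_pred, pos_label=True)))  # True
print(np.isclose(recall, recall_score(y_true, y_pred, pos_label=True)))        # True
```

For a screening task such as this one, a missed positive (FN) is usually the costlier error, which is why recall deserves attention alongside accuracy.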

In [8]:
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Select a candidate subset of features to compare against the full 12-feature model
X = df[['X1', 'X2', 'X3', 'X5', 'X7', 'X8', 'X9', 'X11']]

# Standardize the selected features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Fit a logistic regression model to the standardized data
lr = LogisticRegression()
lr.fit(X, y)

# Estimate out-of-sample accuracy with 5-fold cross validation (cross_val_score clones and re-fits the model on each fold)
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')

# Print the cross validation scores for each fold
print('Cross validation scores:', scores)
print('Mean cross validation score:', scores.mean())

# Predict on the same rows the model was trained on (in-sample, so the metrics below are optimistic)
y_pred = lr.predict(X)

# Compute the in-sample accuracy, precision, and recall of the model
accuracy = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)

# Print the accuracy, precision, and recall of the model
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)

Cross validation scores: [0.76666667 0.83333333 0.75       0.75       0.79661017]
Mean cross validation score: 0.7793220338983051
Accuracy: 0.8060200668896321
Precision: 0.8125
Recall: 0.7536231884057971
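The cell above cross-validates only the reduced model, scores only accuracy under cross validation, and computes precision and recall on the rows the model was trained on, which tends to overstate performance. It also fits the scaler on all rows before cross validation, so every validation fold has already influenced the scaling. A sketch of a fairer comparison, under the assumption that a `Pipeline` plus `cross_validate` fits the exercise's intent: the scaler is re-fit inside each training fold, and both models are scored on all three metrics over the same folds. The data below is synthetic (`make_classification`, with columns named X1..X12 purely for illustration) so the block runs on its own, and the scores it prints say nothing about the heart data; in the notebook you would pass `df.drop('Y', axis=1)` and `df['Y']` instead.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 299 rows, 12 features named X1..X12
X_arr, y_arr = make_classification(n_samples=299, n_features=12, n_informative=5, random_state=0)
features = pd.DataFrame(X_arr, columns=[f"X{i}" for i in range(1, 13)])
target = pd.Series(y_arr, name="Y")

candidates = {
    "all 12 features": list(features.columns),
    "subset": ["X1", "X2", "X3", "X5", "X7", "X8", "X9", "X11"],  # subset used in the cell above
}

# Same shuffled, stratified folds for every model so the comparison is like-for-like
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "precision", "recall"]

results = {}
for name, cols in candidates.items():
    # The scaler is re-fit on each training fold, so validation folds never leak into it
    model = make_pipeline(StandardScaler(), LogisticRegression())
    res = cross_validate(model, features[cols], target, cv=cv, scoring=metrics)
    results[name] = {m: res[f"test_{m}"].mean() for m in metrics}

comparison = pd.DataFrame(results).T  # one row per model, mean score per metric
print(comparison.round(3))
```

Which row "wins" depends on which metric matters most; for a screening problem, recall is often weighted more heavily than precision, so it is worth reading all three columns rather than accuracy alone.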
