## Logistic Regression

In [11]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.metrics import precision_score, recall_score

# Import the dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Create a binary target variable where 1 is 'setosa' and 0 is 'not setosa'
data['Species'] = np.where(iris.target == 0, 1, 0)
data.head()


,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),Species
0,5.1,3.5,1.4,0.2,1
1,4.9,3.0,1.4,0.2,1
2,4.7,3.2,1.3,0.2,1
3,4.6,3.1,1.5,0.2,1
4,5.0,3.6,1.4,0.2,1


In [12]:
# Identify the independent (sepal length and width, petal length and width) and dependent variables (Species)
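X = data.iloc[:, [0, 1, 2, 3]].values  # already has shape (150, 4), so no reshape is needed
y = data.iloc[:, 4].values

# Standardise each feature to zero mean and unit variance so the solver converges more easily
# (note: the mean and variance are computed over all rows, including those later used for testing)
X = preprocessing.scale(X)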

# Split the dataset into training (75%) and testing (25%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
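
Scaling the full dataset before splitting lets the test rows influence the scaling statistics. A minimal sketch of one alternative, assuming a `Pipeline` is acceptable here: split the raw features first and let the pipeline fit `StandardScaler` on the training rows only. The names `X_raw`, `y_bin`, `pipe` and the `Xr_`/`yb_` variables are hypothetical and chosen so they do not overwrite the notebook's own variables.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_raw = iris.data                        # unscaled features
y_bin = (iris.target == 0).astype(int)   # 1 = setosa, 0 = not setosa

# Split first, so the test rows play no part in computing the scaling statistics
Xr_train, Xr_test, yb_train, yb_test = train_test_split(
    X_raw, y_bin, test_size=0.25, random_state=0
)

# The pipeline fits StandardScaler on the training rows only, then applies
# the same transformation to whatever is passed to predict() or score()
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(Xr_train, yb_train)

pipe_accuracy = pipe.score(Xr_test, yb_test)
print('Test accuracy:', pipe_accuracy)
```

On a problem this easy the difference is unlikely to change the result, but the pattern keeps the test set untouched until evaluation.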

In [13]:
# Fit a model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Make predictions on test data (predict already returns a 1-D array of labels)
y_pred = log_reg.predict(X_test)
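
To make explicit what `predict` does for a binary model, here is a rough sketch: the model computes a linear score for each row, passes it through the sigmoid function to get the probability of the positive class, and predicts 1 when that probability exceeds 0.5. The names `demo_model`, `X_demo_train` and so on are hypothetical; the preparation steps mirror the cells above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

iris = load_iris()
X_demo = scale(iris.data)
y_demo = (iris.target == 0).astype(int)  # 1 = setosa, 0 = not setosa

X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
demo_model = LogisticRegression().fit(X_demo_train, y_demo_train)

# Linear score z = w . x + b for each test row
z = X_demo_test @ demo_model.coef_.ravel() + demo_model.intercept_[0]

# The sigmoid maps the score to an estimated P(setosa)
p_manual = 1.0 / (1.0 + np.exp(-z))

# Compare with scikit-learn's own probabilities and labels
p_sklearn = demo_model.predict_proba(X_demo_test)[:, 1]
labels_manual = (p_manual > 0.5).astype(int)

print('Probabilities match:', np.allclose(p_manual, p_sklearn))
print('Labels match:', np.array_equal(labels_manual, demo_model.predict(X_demo_test)))
```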

In [14]:
from sklearn.metrics import confusion_matrix

classes = ['not setosa', 'setosa']  # Re-define the classes for the confusion matrix

# Create a confusion matrix (rows are the actual classes, columns are the predicted classes)
conf_mat = confusion_matrix(y_test, y_pred)
cm_df = pd.DataFrame(conf_mat, columns=classes, index=classes)
cm_df

,not setosa,setosa
not setosa,25,0
setosa,0,13


Every count in the confusion matrix above lies on the diagonal: all 25 'not setosa' and all 13 'setosa' test samples were classified correctly. With no false positives and no false negatives, the accuracy, precision and recall on this test set should each be exactly 1.0.
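
As a quick check of that reasoning, the three scores can be worked out by hand from the four counts in the confusion matrix above. This is only an illustrative sketch; the array `cm` is typed in from the displayed output rather than taken from the notebook's `conf_mat` variable.

```python
import numpy as np

# Counts copied from the confusion matrix above
# (rows = actual class, columns = predicted class; order: not setosa, setosa)
cm = np.array([[25, 0],
               [0, 13]])

# For a 2x2 matrix in this layout, ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # of the predicted setosa, how many really are setosa
recall = tp / (tp + fn)           # of the actual setosa, how many were found

print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
```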

In [15]:
# Use score method to get accuracy of model
score = log_reg.score(X_test, y_test)

# Calculate precision and recall for setosa (the positive class, labelled 1, which is the default pos_label)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)

print('Accuracy:', score)
print('Precision:', prec)
print('Recall:', rec)

Accuracy: 1.0
Precision: 1.0
Recall: 1.0


As expected, the accuracy, precision and recall are all 1.0: the model classified every one of the 38 test samples correctly. This is a result for this particular train/test split, not a guarantee for unseen data.

I have decided not to complete the optional task, as it is almost identical to the provided iris_logistic_regression.ipynb notebook and would largely repeat that work.