<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Exercise.png"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

# Exercise: Binary classification metrics
© ExploreAI Academy

In this exercise, we train a logistic regression model and evaluate its performance by calculating overall accuracy from its confusion matrix.

## Learning objectives

By the end of this train, you should be able to:
* Train a logistic regression model
* Calculate the model's overall accuracy

## Import libraries and dataset

In an effort to conserve a particular endangered animal species, we want to be able to predict the suitability of various habitats. We have a dataset, `habitat_suitability`, that contains various environmental and ecological features used to determine whether or not a habitat is suitable for the species.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [None]:
# Load dataset
habitat_df = pd.read_csv("https://raw.githubusercontent.com/Explore-AI/Public-Data/master/habitat_suitability.csv")
habitat_df.head(5)

## Exercises

Using the dataset, we want to build a classification model that will be able to classify habitats as suitable (1) or unsuitable (0).

The code below carries out the preliminary steps: it separates the features from the target, splits the data into training and test sets, scales the features, and trains a logistic regression model.

In [None]:
# Prepare the data
X = habitat_df.drop('Habitat Suitability', axis=1)  # Features
y = habitat_df['Habitat Suitability']  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
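
As an optional design alternative (not part of the exercise), the scaler and the model can be bundled into a single scikit-learn `Pipeline`. Calling `fit` then fits the scaler on the training data only, and calling `predict` automatically applies that same fitted scaler to new data, so the test set cannot accidentally be left unscaled or re-fitted. The sketch below assumes synthetic data from `make_classification` in place of `habitat_df`, and the names `X_demo`, `pipe`, etc. are hypothetical.

```python
# A minimal sketch, assuming synthetic data stands in for habitat_df.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical binary dataset (the real features come from the CSV)
X_demo, y_demo = make_classification(n_samples=1000, n_features=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# fit() runs fit_transform on the scaler (training data only), then trains the model
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)

# predict() applies the already-fitted scaler's transform to X_te before predicting
y_te_pred = pipe.predict(X_te)
print(y_te_pred[:10])
```

The explicit `fit_transform` / `transform` steps used in this notebook do exactly the same thing; the pipeline simply makes the ordering impossible to get wrong.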

### Exercise 1

**a)** 

Now that we have trained a logistic regression model on the `habitat_suitability` dataset, let's evaluate how it performs on new, unseen data.

To do this:

1. Use the trained logistic regression model to make predictions on the test set.
2. Import and use `confusion_matrix` from `sklearn.metrics` to generate the confusion matrix for your predictions.
3. Display the confusion matrix.

**Note:** Remember to scale the test set features to ensure consistency.

In [None]:
# Your solution here...

**b)** 

The raw confusion matrix is not easy to read. Let's improve on this by converting it into a DataFrame with the row and column labels `0: Unsuitable` and `1: Suitable`.

In [None]:
# Your solution here...

### Exercise 2

Now that we can easily interpret our confusion matrix, we want to compare the distribution of the ground truth classifications and the classifications made by the model. 

That is, we want to find out how many observations were classified as suitable (1) and unsuitable (0) habitats by the model and compare this to the counts originally in the test set.


In [None]:
# Your solution here...

### Exercise 3

From the two-dimensional array `conf_matrix`, access the true positive, true negative, false positive, and false negative counts and store them in the variables `TP`, `TN`, `FP`, and `FN`, respectively.

Print each value together with its label.

**Hint:** Apply your knowledge of where each of these values is located in the confusion matrix.

In [None]:
# Your solution here...

### Exercise 4

Let's now find out the overall accuracy of our model. 

**a)** Using the values from **Exercise 3**, calculate the overall accuracy using the formula:

$$\text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total predictions}}$$

**b)** Comment on the suitability of using accuracy as the sole metric for evaluating the performance of our model.

In [None]:
# Your solution here...

## Solutions

### Exercise 1

**a)** 

In [None]:
# Import the confusion_matrix function from sklearn's metrics module
from sklearn.metrics import confusion_matrix

# Scale the test dataset features using the same scaler that was applied to the training dataset
X_test_scaled = scaler.transform(X_test)

# Use the trained logistic regression model to predict the outcomes for the scaled test dataset.
y_pred = model.predict(X_test_scaled)

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

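We use the `confusion_matrix` function from `sklearn.metrics` to compare the model's predicted values against the actual values from the test set.

The matrix shows the correct and incorrect predictions for each class. In scikit-learn's layout, each row corresponds to an actual class and each column to a predicted class, both in sorted label order (`0`, then `1`), so the correct predictions lie on the diagonal.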
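
To see this layout on something small enough to check by hand, the sketch below builds a confusion matrix from eight hypothetical labels (`y_true_demo` and `y_pred_demo` are made up for illustration, not taken from the habitat data).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: five actual 0s followed by three actual 1s
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 0, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)
# [[4 1]    row 0 = actual 0: 4 predicted 0, 1 predicted 1
#  [2 1]]   row 1 = actual 1: 2 predicted 0, 1 predicted 1
```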

**b)** 

In [None]:
# Define the labels for the confusion matrix
labels = ['0: Unsuitable', '1: Suitable']

# Create a pandas DataFrame from the confusion matrix data and the labels defined above
matrix_df = pd.DataFrame(data=conf_matrix, index=labels, columns=labels)

# Display the resulting DataFrame
matrix_df

We create a pandas DataFrame to neatly display our previously generated confusion matrix by labeling the rows and columns according to the outcomes they represent. 
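
A related option, sketched here with hypothetical labels, is `pd.crosstab`, which counts each (actual, predicted) pair directly and lets you name the axes with `rownames` and `colnames`. One caveat: `crosstab` only creates rows and columns for labels that actually occur, whereas `confusion_matrix` always returns a square matrix, so the DataFrame approach above is the safer choice if the model never predicts one of the classes.

```python
import pandas as pd

# Hypothetical labels for illustration only
y_true_demo = pd.Series([0, 0, 0, 0, 0, 1, 1, 1])
y_pred_demo = pd.Series([0, 0, 0, 0, 1, 1, 0, 0])

# Map the class codes to readable labels before cross-tabulating
label_map = {0: "0: Unsuitable", 1: "1: Suitable"}

crosstab_df = pd.crosstab(
    y_true_demo.map(label_map),
    y_pred_demo.map(label_map),
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(crosstab_df)
```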

### Exercise 2

In [None]:
# Sum of each row: Ground truth totals for each class
ground_truth_totals = matrix_df.sum(axis=1)
print("Ground Truth Totals for Each Class:")
print(ground_truth_totals)

# Sum of each column: Totals for the predictions for each class
prediction_totals = matrix_df.sum(axis=0)
print("\nPrediction Totals for Each Class:")
print(prediction_totals)

We calculate the ground truth totals for each class by summing up the rows of the confusion matrix DataFrame. 

On the other hand, we calculate the totals for the predictions for each class by summing up the columns of the confusion matrix DataFrame.

Analysing the ground truth totals helps us understand the class balance (or imbalance) in the test set, while examining the prediction totals can reveal whether the model is biased towards predicting one class over the other.
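
The same comparison can be made straight from the label vectors with `value_counts`, which is a handy cross-check on the row and column sums. The sketch below uses hypothetical labels in which the model predicts the majority class `0` more often than it actually occurs.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only
y_true_demo = pd.Series([0, 0, 0, 0, 0, 1, 1, 1])
y_pred_demo = pd.Series([0, 0, 0, 0, 1, 1, 0, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Class counts per vector, sorted by class label (0, then 1)
comparison = pd.DataFrame({
    "Ground truth": y_true_demo.value_counts().sort_index(),
    "Predicted": y_pred_demo.value_counts().sort_index(),
})
print(comparison)

# Row sums = actual counts, column sums = predicted counts
print("Row sums:", cm_demo.sum(axis=1), "Column sums:", cm_demo.sum(axis=0))
```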

### Exercise 3

In [None]:
# Extracting True Positives (TP) from the confusion matrix, located at index [1, 1]
TP = conf_matrix[1, 1]

# Extracting True Negatives (TN) from the confusion matrix, located at index [0, 0]
TN = conf_matrix[0, 0]

# Extracting False Positives (FP) from the confusion matrix, located at index [0, 1]
FP = conf_matrix[0, 1]

# Extracting False Negatives (FN) from the confusion matrix, located at index [1, 0]
FN = conf_matrix[1, 0]

print("True positive:", TP)
print("True negative:", TN)
print("False positive:", FP)
print("False negative:", FN)

We extract each component of the confusion matrix. Together, these counts show both the model's successes and the types of errors it makes, which is crucial for understanding how the model is performing.
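
For a binary problem, scikit-learn's documentation also shows a shortcut: flattening the 2×2 matrix with `ravel()` yields the counts row by row, i.e. in the order TN, FP, FN, TP. A small sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 0, 0]

# Row-by-row flattening of [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

print("True positive:", tp)
print("True negative:", tn)
print("False positive:", fp)
print("False negative:", fn)
```

Unpacking in this fixed order avoids having to remember the index of each cell, but it only works for two classes.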

### Exercise 4

**a)**

In [None]:
accuracy = (TP + TN) / (TP + TN + FP + FN)
print("Overall Accuracy:", accuracy)

Following the formula, we divide the number of correct predictions (`TP` + `TN`) by the total number of predictions (`TP` + `TN` + `FP` + `FN`).
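
As a sanity check, the manual calculation should agree with `sklearn.metrics.accuracy_score`, which computes the fraction of matching labels directly. The sketch uses hypothetical labels with 5 correct predictions out of 8.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Correct predictions divided by all predictions
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
sklearn_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print("Manual:", manual_accuracy, "scikit-learn:", sklearn_accuracy)  # both 0.625
```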

**b)**

From the ground truth totals for each class, we see that the test set contains significantly more instances labeled as `Unsuitable` (152) than `Suitable` (48), a clear class imbalance.

The high accuracy of 87.5% could therefore be partly due to the model's tendency to predict the majority class: a model that always predicted `Unsuitable` would already score 152 / 200 = 76%. This shows that we cannot rely on accuracy as the sole metric for evaluating our model's performance.
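
A majority-class baseline makes this comparison concrete. The sketch below uses scikit-learn's `DummyClassifier(strategy="most_frequent")`, which ignores the features and always predicts the most common training label. The label vector is a hypothetical stand-in built from the class counts quoted above, and for brevity it is fitted and scored on the same labels; in practice you would fit on `y_train` and score on `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels with the quoted counts: 152 Unsuitable (0), 48 Suitable (1)
y_counts_demo = np.array([0] * 152 + [1] * 48)
X_placeholder = np.zeros((len(y_counts_demo), 1))  # features are ignored by this baseline

# Always predicts the most frequent class seen during fit (here, 0)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_placeholder, y_counts_demo)

baseline_accuracy = accuracy_score(y_counts_demo, baseline.predict(X_placeholder))
print("Majority-class baseline accuracy:", baseline_accuracy)  # 0.76
```

Any model worth keeping should clearly beat this baseline; accuracy alone does not tell us how well the minority `Suitable` class is being identified.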

To get a more complete picture of the model's performance, you may also want to try other evaluation metrics, such as precision, recall, F1 score, and ROC-AUC.
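
As a starting point, precision, recall, and F1 score can all be computed from the same confusion matrix counts and checked against scikit-learn's `precision_score`, `recall_score`, and `f1_score`, which treat class `1` (`Suitable`) as the positive class by default. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels for illustration only
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Of the habitats predicted Suitable, the share that actually are Suitable
precision = tp / (tp + fp)
# Of the habitats that are actually Suitable, the share the model finds
recall = tp / (tp + fn)
# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print("Manual:      ", precision, recall, f1)
print("scikit-learn:", precision_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo), f1_score(y_true_demo, y_pred_demo))
```

ROC-AUC differs in that it is computed from predicted probabilities rather than hard labels; with this notebook's variables that would be `roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1])`.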

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>