To help conserve an endangered animal species, we want to predict whether a given habitat is suitable for it. The habitat_suitability dataset contains environmental and ecological features for each habitat, along with a label indicating whether the habitat is suitable (1) or unsuitable (0) for the species.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [3]:
habitat_df = pd.read_csv("https://raw.githubusercontent.com/Explore-AI/Public-Data/master/habitat_suitability.csv")
habitat_df.head(5)

,Average Temperature (°C),Annual Rainfall (mm),Vegetation Density (% coverage),Predator Presence (0 or 1),Human Disturbance Index,Altitude (meters),Water Source Availability (0 or 1),Habitat Suitability
0,20.009527,1270.407873,90.142754,1,0.39275,355.433041,1,1
1,16.228576,1419.881504,58.246594,0,0.356556,64.890245,1,1
2,25.472638,991.750374,57.89806,1,0.832856,301.426259,1,0
3,34.030446,1431.824231,41.892067,1,0.044347,390.152269,1,0
4,38.334526,1018.262946,56.814597,1,0.308421,450.584113,1,0


In [4]:
# Prepare the data
X = habitat_df.drop('Habitat Suitability', axis=1)  # Features
y = habitat_df['Habitat Suitability']  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

In [5]:
# Import the confusion_matrix function from sklearn's metrics module
from sklearn.metrics import confusion_matrix

# Scale the test dataset features using the same scaler that was applied to the training dataset
X_test_scaled = scaler.transform(X_test)

# Use the trained logistic regression model to predict the outcomes for the scaled test dataset.
y_pred = model.predict(X_test_scaled)

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

Confusion Matrix:
 [[143   9]
 [ 16  32]]
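
To make the layout of this matrix concrete, the following minimal sketch counts a 2x2 confusion matrix by hand. It follows the convention scikit-learn uses for `confusion_matrix`: rows are the actual classes, columns are the predicted classes. The helper `confusion_counts` and the toy labels are hypothetical and for illustration only; they are not taken from the habitat data.

```python
# Illustrative sketch: count a 2x2 confusion matrix by hand.
# The toy labels below are made up; they are NOT rows from habitat_df.

def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1.

    Rows index the actual class, columns index the predicted class.
    """
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 1, 1, 0]

toy_matrix = confusion_counts(toy_true, toy_pred)
print(toy_matrix)
```

Read the same way, the output above says that 143 unsuitable habitats were predicted as unsuitable, 9 unsuitable habitats were predicted as suitable, 16 suitable habitats were predicted as unsuitable, and 32 suitable habitats were predicted as suitable.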


In [6]:
# Define the labels for the confusion matrix
labels = ['0: Unsuitable', '1: Suitable']

# Create a Pandas DataFrame from the confusion matrix data and the labels defined above
matrix_df = pd.DataFrame(data=conf_matrix, index=labels, columns=labels)

# Display the resulting DataFrame
matrix_df

,0: Unsuitable,1: Suitable
0: Unsuitable,143,9
1: Suitable,16,32


In [7]:
# Sum of each row: Ground truth totals for each class
ground_truth_totals = matrix_df.sum(axis=1)
print("Ground Truth Totals for Each Class:")
print(ground_truth_totals)

# Sum of each column: Totals for the predictions for each class
prediction_totals = matrix_df.sum(axis=0)
print("\nPrediction Totals for Each Class:")
print(prediction_totals)

Ground Truth Totals for Each Class:
0: Unsuitable     152
1: Suitable        48
dtype: int64

Prediction Totals for Each Class:
0: Unsuitable     159
1: Suitable        41
dtype: int64


In [8]:
# Extracting True Positives (TP) from the confusion matrix, located at index [1, 1]
TP = conf_matrix[1, 1]

# Extracting True Negatives (TN) from the confusion matrix, located at index [0, 0]
TN = conf_matrix[0, 0]

# Extracting False Positives (FP) from the confusion matrix, located at index [0, 1]
FP = conf_matrix[0, 1]

# Extracting False Negatives (FN) from the confusion matrix, located at index [1, 0]
FN = conf_matrix[1, 0]

print("True positive:", TP)
print("True negative:", TN)
print("False positive:", FP)
print("False negative:", FN)

True positive: 32
True negative: 143
False positive: 9
False negative: 16


In [9]:
# Accuracy: the share of all test habitats that the model classified correctly
accuracy = (TP + TN) / (TP + TN + FP + FN)
print("Overall Accuracy:", accuracy)

Overall Accuracy: 0.875
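
Because the test set holds 152 unsuitable habitats and only 48 suitable ones, accuracy alone may hide how well the suitable class is recognised. As a rough sketch, the same four counts can be turned into a few per-class metrics using their standard definitions. The variable names below are illustrative, and the counts are copied from the output above rather than recomputed from the model.

```python
# Counts copied from the confusion matrix output above.
TP, TN, FP, FN = 32, 143, 9, 16

total = TP + TN + FP + FN

# Share of all habitats classified correctly.
accuracy = (TP + TN) / total

# Of the habitats predicted suitable, the share that really are suitable.
precision = TP / (TP + FP)

# Of the truly suitable habitats, the share the model found.
recall = TP / (TP + FN)

# Of the truly unsuitable habitats, the share the model rejected.
specificity = TN / (TN + FP)

# Harmonic mean of precision and recall.
f1 = 2 * TP / (2 * TP + FP + FN)

# Accuracy of always predicting the larger class, as a point of comparison.
baseline_accuracy = max(TP + FN, TN + FP) / total

print(f"Accuracy:          {accuracy:.3f}")
print(f"Precision:         {precision:.3f}")
print(f"Recall:            {recall:.3f}")
print(f"Specificity:       {specificity:.3f}")
print(f"F1 score:          {f1:.3f}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

Comparing recall with specificity shows how differently the model treats the two classes, and the baseline gives a reference point for judging the 0.875 accuracy.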
