# Machine Learning Fundamentals: Sensitivity and Specificity
In this notebook, we will discuss two key concepts in Machine Learning: Sensitivity and Specificity. These concepts are fundamental to understanding the performance of a classification model.

## Introduction
Sensitivity and Specificity are statistical measures of the performance of a binary classification test. Sensitivity is the proportion of actual positives that are correctly identified, and specificity is the proportion of actual negatives that are correctly identified.

## Importing Libraries
First, we import the necessary libraries.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
```


## Confusion Matrix
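Before we can understand Sensitivity and Specificity, we need to understand the confusion matrix. The confusion matrix is a table that cross-tabulates the actual classes against the classes predicted by a model, so each cell counts one kind of outcome. For a binary problem it has four cells: true positives (positives predicted as positive), true negatives (negatives predicted as negative), false positives (negatives wrongly predicted as positive), and false negatives (positives wrongly predicted as negative).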
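As a minimal sketch on hypothetical toy labels (the arrays `y_true` and `y_pred` are invented for illustration), scikit-learn's `confusion_matrix` places actual classes on the rows and predicted classes on the columns, in the order given by `labels`. With `labels=[0, 1]` and 1 as the positive class, `ravel()` unpacks the four counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Rows = actual class, columns = predicted class, ordered by `labels`
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[4 2]
#  [1 3]]

# For labels=[0, 1] the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=2, TN=4
```

Note that which cell holds the true positives depends entirely on which label you call positive and on the `labels` order, so it is worth stating that choice explicitly.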

## Sensitivity and Specificity
Sensitivity (also known as the true positive rate) measures the proportion of actual positives that are correctly identified. The formula for sensitivity is:

$$ \text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$

Specificity (also known as the true negative rate) measures the proportion of actual negatives that are correctly identified. The formula for specificity is:

$$ \text{Specificity} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}} $$
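The two formulas translate directly into code. This is a small sketch with hypothetical counts (TP = 3, FN = 1, FP = 2, TN = 4, invented for illustration); the helper names `sensitivity` and `specificity` are our own, not library functions.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives the model catches."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives the model correctly rejects."""
    return tn / (tn + fp)


# Hypothetical counts chosen for illustration
tp, fn, fp, tn = 3, 1, 2, 4
print(sensitivity(tp, fn))  # 3 / (3 + 1) = 0.75
print(specificity(tn, fp))  # 4 / (4 + 2), roughly 0.667
```

Each metric only looks at one row of the confusion matrix (actual positives or actual negatives), which is why a model can score well on one and poorly on the other.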

## Example: Breast Cancer Dataset
We will use the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn to illustrate these concepts. It contains 569 tumor samples, each described by 30 numeric features of the cell nuclei seen in a digitized image, and labeled as malignant or benign. Our task is to build a model that predicts whether a tumor is malignant or benign from these features.

```python
# Load the Breast Cancer Wisconsin dataset bundled with scikit-learn
cancer_data = load_breast_cancer()

# Convert to pandas objects for easier manipulation
X = pd.DataFrame(cancer_data.data, columns=cancer_data.feature_names)
y = pd.Series(cancer_data.target)  # 0 = malignant, 1 = benign
```
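The labels are stored as integers, so before deciding which class counts as "positive" it helps to check what each integer means. A quick sketch using the dataset's `target_names` attribute: in this dataset label 0 is malignant and label 1 is benign, and the classes are imbalanced (212 malignant, 357 benign).

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()
y = pd.Series(cancer_data.target)

# target_names[i] is the class name for integer label i
for label, name in enumerate(cancer_data.target_names):
    print(label, name, int((y == label).sum()))
# 0 malignant 212
# 1 benign 357
```

Because the condition we want to detect (malignancy) has label 0, scikit-learn's default convention of treating label 1 as positive would silently make "benign" the positive class.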

## Building a Classification Model
We will split the dataset into a training set and a testing set. Then, we will build a logistic regression model using the training data and make predictions on the testing data.

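```python
# Hold out 20% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a logistic regression on the raw (unscaled) features and predict the test set
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)
lr_predictions = lr_model.predict(X_test)
```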

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
```
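The warning above means the default `lbfgs` solver stopped at its iteration cap (`max_iter=100`) before converging. A likely cause is that the 30 features live on very different scales. One remedy the warning itself suggests is scaling the data; the sketch below assumes we standardize features with `StandardScaler` inside a pipeline (a preprocessing step we are swapping in, not part of the original cell). The scaled model's sensitivity and specificity may differ from the values reported in this notebook, which come from the unscaled model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize each feature (statistics learned on the training split only),
# then fit the same default logistic regression on the scaled features
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X_train, y_train)
scaled_predictions = scaled_lr.predict(X_test)

# Iterations lbfgs actually used, stored on the final pipeline step
print(scaled_lr[-1].n_iter_)
```

Putting the scaler inside the pipeline keeps test-set statistics out of training. Simply raising `max_iter` would also silence the warning, but it leaves the optimization poorly conditioned.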


## Evaluating the Model
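We can evaluate the model's performance by calculating the sensitivity and specificity from the confusion matrix. Because the condition we want to detect is a malignant tumor, we treat malignant (label 0 in this dataset) as the positive class and benign (label 1) as the negative class.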

```python
def calculate_sensitivity_specificity(y_test, y_pred_test):
    # Label 0 = malignant (treated as positive), 1 = benign (negative).
    # Rows = actual, columns = predicted: [[TP, FN], [FP, TN]]
    cm = confusion_matrix(y_test, y_pred_test, labels=[0, 1])
    tp, fn, fp, tn = cm.ravel()
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

sensitivity_lr, specificity_lr = calculate_sensitivity_specificity(y_test, lr_predictions)
print('Logistic Regression Sensitivity: ', sensitivity_lr)
print('Logistic Regression Specificity: ', specificity_lr)
```

```
Logistic Regression Sensitivity:  0.9302325581395349
Logistic Regression Specificity:  0.9859154929577465
```
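Sensitivity is the recall of the positive class and specificity is the recall of the negative class, so scikit-learn's `recall_score` with an explicit `pos_label` offers an independent check on `calculate_sensitivity_specificity`. The sketch uses hypothetical toy labels in which label 0 plays the positive role, the same convention the function uses for malignant tumors.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical toy labels; 0 is the positive class here (like malignant above)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0]

# Sensitivity = recall of the positive class (0); specificity = recall of the negative class (1)
sens = recall_score(y_true, y_pred, pos_label=0)
spec = recall_score(y_true, y_pred, pos_label=1)

# The same quantities read off the confusion matrix (rows = actual, columns = predicted)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
assert np.isclose(sens, cm[0, 0] / cm[0].sum())
assert np.isclose(spec, cm[1, 1] / cm[1].sum())
print(sens, spec)  # 0.75 0.8
```

The `recall` column of `classification_report` (already imported above) shows the same two numbers, one row per class.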


## Conclusion
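Sensitivity and Specificity are important concepts in evaluating the performance of a classification model. Sensitivity tells us what share of the actual positives the model catches, and specificity tells us what share of the actual negatives it correctly rejects. The two usually trade off against each other: lowering the decision threshold for the positive class catches more positives but also raises more false alarms. Depending on the problem at hand, we might want to optimize our model for higher sensitivity or higher specificity. In cancer screening, for example, missing a malignant tumor (a false negative) is usually considered more costly than a false alarm, so higher sensitivity is often preferred.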
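One common way to move along this trade-off is to change the probability threshold instead of retraining. The sketch below assumes a standardized logistic regression (the scaling step is our addition) and an arbitrary list of thresholds. It flags a tumor as malignant whenever the predicted probability of malignancy reaches the threshold. Lowering the threshold can only flag more tumors, so sensitivity never decreases and specificity never increases.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# predict_proba columns follow model.classes_ ([0, 1]), so column 0 is P(malignant)
p_malignant = model.predict_proba(X_test)[:, 0]

results = []
for threshold in [0.9, 0.7, 0.5, 0.3, 0.1]:  # arbitrary thresholds for illustration
    # Predict malignant (0) when P(malignant) >= threshold, otherwise benign (1)
    y_pred = np.where(p_malignant >= threshold, 0, 1)
    sens = np.mean(y_pred[y_test == 0] == 0)  # share of malignant tumors caught
    spec = np.mean(y_pred[y_test == 1] == 1)  # share of benign tumors correctly cleared
    results.append((threshold, sens, spec))
    print(f"threshold={threshold:.1f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

In practice the threshold should be chosen on a separate validation set (or via cross-validation), not on the test set used for the final report.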