# Day 11: Logistic Regression (Binary Classification)

Today, we will explore Logistic Regression, one of the most popular algorithms for binary classification problems. Unlike regression models that predict continuous values, logistic regression predicts the probability of a binary outcome.

## Topics Covered
- Introduction to Binary Classification
- Theory of Logistic Regression
- Differences Between Logistic and Linear Regression
- Key Concepts: Odds Ratios and Probabilities
- Evaluating Model Performance: Accuracy, Precision, Recall, F1-Score, ROC-AUC

## 1. Introduction to Binary Classification

Binary classification is a type of classification where the target variable can take one of two possible outcomes (often referred to as "classes"). Logistic Regression is particularly useful for this because it predicts the probability of an outcome falling into a specific class.

### Examples of Binary Classification:


- Spam Detection: Classifying emails as "spam" or "not spam".
- Customer Churn: Predicting whether a customer will leave a service ("churn") or stay.
- Fraud Detection: Detecting if a transaction is "fraudulent" or "legitimate".

In binary classification, the two classes are typically labeled as 0 and 1. Logistic regression estimates the probability that the target is 1, given the input features.

## 2. Theory Behind Logistic Regression

Unlike linear regression, logistic regression does not predict continuous outcomes but rather the probability of a certain class. 

It uses the logistic (sigmoid) function to convert the output of the linear model into a probability between 0 and 1.

### Logistic (Sigmoid) Function

$$
P(y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}
$$


Where:

- $P(y=1 \mid X)$ is the probability of the positive class (e.g., customer churning).
- $\beta_0, \beta_1, ..., \beta_n$ are the coefficients of the model.
- $X_1, X_2, ..., X_n$ are the features (input variables).
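
As a minimal sketch of the formula above, the following computes the probability by hand with NumPy. The coefficient and feature values are made up for illustration and do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and feature values, chosen only for illustration
beta_0 = -1.0                    # intercept
beta = np.array([0.5, -0.25])    # beta_1, beta_2
x = np.array([4.0, 2.0])         # X_1, X_2

z = beta_0 + np.dot(beta, x)     # linear part: -1 + 2 - 0.5 = 0.5
p = sigmoid(z)                   # P(y=1 | X)

print(f"linear score z = {z:.2f}")
print(f"P(y=1 | X)     = {p:.4f}")
```

A score of 0 maps to a probability of exactly 0.5; positive scores give probabilities above 0.5 and negative scores give probabilities below it.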

### Log Odds

In logistic regression, the linear combination of the features models the log-odds of the positive class, and the sigmoid function maps those log-odds back to a probability:

$$
\log\left(\frac{P(y=1)}{1 - P(y=1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
$$


This makes logistic regression a linear classifier at its core: the log-odds are linear in the features, but the output is a probability between 0 and 1 rather than an unbounded continuous value.
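
To illustrate the relationship between the two equations, here is a small sketch showing that the log-odds (logit) function is the inverse of the sigmoid. The score values are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    """Convert log-odds z into a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Convert a probability p in (0, 1) into log-odds."""
    return np.log(p / (1.0 - p))

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])  # arbitrary linear scores
p = sigmoid(z)                             # scores -> probabilities
z_back = logit(p)                          # probabilities -> scores again

for zi, pi, zb in zip(z, p, z_back):
    print(f"z = {zi:+.1f} -> p = {pi:.4f} -> log-odds = {zb:+.4f}")
```

Applying `logit` to the sigmoid output recovers the original linear score (up to floating-point error), which is why the model is linear on the log-odds scale.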

## 3. Differences Between Logistic and Linear Regression

| Logistic Regression                            | Linear Regression                               |
|------------------------------------------------|------------------------------------------------|
| Used for binary classification problems (two outcomes: 0 or 1). | Used for regression problems (predicting continuous values). |
| Outputs probabilities between 0 and 1.         | Outputs continuous numeric values.              |
| Uses the sigmoid function to map the predictions to probabilities. | Uses a straight line to fit the data points.    |
| Predicted probability is a non-linear (S-shaped) function of the features; the log-odds are linear. | Predictions are a linear function of the features. |
| Loss function: Log Loss (Cross-Entropy).       | Loss function: Mean Squared Error (MSE).        |
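
The contrast in output ranges can be seen in a short sketch. The one-feature dataset below is invented for illustration; it fits both models with scikit-learn and evaluates them at points outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up one-feature dataset: the label switches from 0 to 1 as x grows
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Evaluate both models at points outside the training range
X_new = np.array([[0], [20]])
linear_out = linear.predict(X_new)                  # unbounded values
logistic_out = logistic.predict_proba(X_new)[:, 1]  # P(y=1), always in [0, 1]

print("Linear regression output:  ", linear_out)
print("Logistic regression P(y=1):", logistic_out)
```

The linear model produces values below 0 and above 1 at these points, which cannot be read as probabilities, while the logistic model's outputs always stay within the valid range.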


## 4. Key Concepts: Odds Ratios and Probabilities

### Odds and Odds Ratio

Odds: the ratio of the probability of an event happening to the probability of it not happening.

$$ Odds = \frac{P}{1-P} $$

where $P$ is the probability of the event happening.

In logistic regression, we model the logarithm of the odds (also called log-odds) as a linear function of the features.

Odds ratio: the ratio of two odds. In logistic regression, $e^{\beta_j}$ is the odds ratio for a one-unit increase in $X_j$ with the other features held fixed.

#### Example

If $P = 0.8$ (the probability that the event happens), then the odds are

$$ Odds = \frac{0.8}{1-0.8} = 4 $$

meaning the event is 4 times as likely to happen as not to happen.
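
The same arithmetic can be sketched in a few lines of Python. The helper names `odds_from_probability` and `probability_from_odds` are hypothetical, introduced here only for illustration.

```python
import math

def odds_from_probability(p):
    """Odds = P / (1 - P), for a probability p in [0, 1)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Invert the odds formula: P = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

p = 0.8
odds = odds_from_probability(p)
log_odds = math.log(odds)  # the quantity logistic regression models linearly

print(f"odds     = {odds:.2f}")
print(f"log-odds = {log_odds:.4f}")
print(f"back to probability = {probability_from_odds(odds):.2f}")
```

For `p = 0.8` the odds come out as 4.00, matching the worked example, and converting the odds back returns the original probability.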

### Probability

Logistic regression outputs a probability between 0 and 1, representing the likelihood of an event happening. With the commonly used threshold of 0.5, if the probability is:

- Greater than 0.5 → Predicts class 1 (event happens).
- Less than 0.5 → Predicts class 0 (event does not happen).
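
A minimal sketch of this thresholding step, using made-up probabilities rather than real model output:

```python
import numpy as np

# Hypothetical predicted probabilities of class 1 for four samples
p_class1 = np.array([0.05, 0.40, 0.55, 0.93])

# Usual decision rule: predict class 1 when the probability exceeds 0.5
threshold = 0.5
y_pred = (p_class1 > threshold).astype(int)
print("Threshold 0.5:", y_pred)

# A stricter threshold predicts class 1 less often
y_pred_strict = (p_class1 > 0.9).astype(int)
print("Threshold 0.9:", y_pred_strict)
```

The threshold is a choice rather than a fixed property of the model: raising it makes positive predictions rarer, which trades recall for precision.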

### In simpler terms:

- Logistic regression models the log-odds as a linear function of the features and then converts them into a probability.
- Odds tell you how much more likely something is to happen compared to it not happening.
- The probability (output of the logistic model) is what you use to make the final prediction:
    - If it's over 0.5, the event is predicted to occur.
    - If it's under 0.5, the event is predicted not to occur.


### Example: Customer Churn Prediction

Let’s build a logistic regression model to predict whether a customer will churn based on several features: their monthly charges, contract duration, and total data usage.

- Data: 
    - The dataset includes features like Monthly_Charges, Contract_Duration, and Total_Usage_GB to predict whether the customer will churn (Churn column: 1 for churn, 0 for no churn).
- Logistic Regression Model: 
    - We use LogisticRegression to fit the data.
- Predictions: 
    - The model predicts whether the customer will churn (y_pred).
- Probabilities: 
    - The model also outputs the probability of each prediction (y_prob).
- Accuracy: 
    - We evaluate the model's performance using accuracy score.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Example data for customer churn prediction
data = {
    'Monthly_Charges': [70, 40, 100, 60, 80, 90],
    'Contract_Duration': [12, 24, 36, 12, 24, 36],
    'Total_Usage_GB': [200, 100, 600, 300, 500, 700],
    'Churn': [0, 0, 1, 0, 1, 1]  # 1: Churn, 0: No Churn
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Features (X) and target (y)
X = df[['Monthly_Charges', 'Contract_Duration', 'Total_Usage_GB']]
y = df['Churn']

# Split the data into training and testing sets
# (test_size=0.2 rounds up to 2 of the 6 rows, leaving 4 for training)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict churn on test data
y_pred = model.predict(X_test)

# Output predictions and model accuracy
print("Predicted churn outcomes:", y_pred)
print("Actual churn outcomes:   ", y_test.values)
print("Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred)))

# Probability of each prediction (columns: P(class 0), P(class 1))
y_prob = model.predict_proba(X_test)
print("Probabilities of predictions:", y_prob)
```


```text
Predicted churn outcomes: [0 0]
Actual churn outcomes:    [0 0]
Accuracy: 1.00
Probabilities of predictions: [[9.99999820e-01 1.80351628e-07]
 [1.00000000e+00 6.50760764e-11]]
```


In this case:

- Predicted churn outcomes: The model predicts that neither customer will churn ([0 0]), meaning both are classified as non-churners (class 0).

- Actual churn outcomes: The actual data confirms that both customers did not churn ([0 0]), which means the model's prediction matches the real-world outcomes.

- Accuracy: The accuracy score is 1.00, meaning the model made perfect predictions on the test set (100% correct).

- Probabilities of predictions: The first customer has a probability of approximately 9.99999820e-01 (very close to 1) for not churning (class 0) and 1.80351628e-07 (very close to 0) for churning (class 1). Similarly, the second customer has a probability of 1.00000000e+00 for not churning and a very small probability (6.50760764e-11) for churning. These probabilities show high confidence in predicting no churn.

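The model is highly confident that neither customer will churn, and both predictions match the actual outcomes. However, the test set contains only two customers, both from class 0, so this perfect score says little about how well the model would generalize. A meaningful evaluation needs far more data, with both classes represented in the test set.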
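
To connect the fitted model back to the odds discussion, the sketch below refits a model on all six toy rows and exponentiates the coefficients to get odds ratios. Fitting on the full toy data and raising `max_iter` (a precaution because the features are unscaled) are choices made for this illustration, not part of the original example. Note that scikit-learn applies L2 regularization by default, so the coefficients are shrunk relative to an unregularized fit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy data as in the example above
df = pd.DataFrame({
    'Monthly_Charges': [70, 40, 100, 60, 80, 90],
    'Contract_Duration': [12, 24, 36, 12, 24, 36],
    'Total_Usage_GB': [200, 100, 600, 300, 500, 700],
    'Churn': [0, 0, 1, 0, 1, 1],
})
features = ['Monthly_Charges', 'Contract_Duration', 'Total_Usage_GB']

# Assumption: fit on all rows just to inspect coefficients (no train/test split)
model = LogisticRegression(max_iter=1000)
model.fit(df[features], df['Churn'])

# exp(coefficient) = multiplicative change in the odds of churn
# for a one-unit increase in that feature, other features held fixed
odds_ratios = np.exp(model.coef_[0])

for name, coef, ratio in zip(features, model.coef_[0], odds_ratios):
    print(f"{name}: coefficient = {coef:+.4f}, odds ratio = {ratio:.4f}")
print(f"Intercept: {model.intercept_[0]:+.4f}")
```

An odds ratio above 1 means the feature increases the odds of churn as it grows, and a value below 1 means it decreases them. With only six rows, the specific numbers should not be read as meaningful estimates.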

## 5. Evaluating Performance of a Logistic Regression Model


Before discussing performance metrics, let's look at how each prediction falls into one of four outcomes:

1. True Positive (TP)
- Definition: The model correctly predicted the positive class (e.g., predicted churn, and the customer actually churned).
- Example: If a model predicts that a customer will churn, and they actually do churn, this is a True Positive.

2. True Negative (TN)
- Definition: The model correctly predicted the negative class (e.g., predicted no churn, and the customer did not churn).
- Example: If a model predicts that a customer will not churn, and they indeed do not churn, this is a True Negative.

3. False Positive (FP) (also called Type I Error)
- Definition: The model predicted the positive class, but the actual outcome was negative (e.g., predicted churn, but the customer did not churn).
- Example: If a model predicts that a customer will churn, but the customer stays, this is a False Positive.

4. False Negative (FN) (also called Type II Error)
- Definition: The model predicted the negative class, but the actual outcome was positive (e.g., predicted no churn, but the customer did churn).
- Example: If a model predicts that a customer will not churn, but they actually churn, this is a False Negative.

|                    | **Predicted Churn**                                     | **Predicted No Churn**                              |
|--------------------|---------------------------------------------------------|-----------------------------------------------------|
| **Actual Churn**    | **True Positive (TP):** Customer churned and predicted to churn | **False Negative (FN):** Customer churned but predicted not to churn |
| **Actual No Churn** | **False Positive (FP):** Customer did not churn but predicted to churn | **True Negative (TN):** Customer did not churn and predicted not to churn |
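
The four counts can be obtained with scikit-learn's `confusion_matrix`. The labels below are invented for illustration. Note that scikit-learn orders rows and columns by label value (0 first, then 1), so its layout differs from the table above, which lists churn (class 1) first.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = churn, 0 = no churn
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are actual classes, columns are predicted classes, ordered [0, 1]:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```

Here three churners are caught (TP), one is missed (FN), one loyal customer is wrongly flagged (FP), and five are correctly left alone (TN).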


To evaluate a binary classification model like logistic regression, we use several performance metrics:

- Accuracy: 
    - The proportion of correct predictions out of the total predictions made. 
    $$ Accuracy = \frac{TP+TN}{TP+TN+FP+FN} $$
    - Example
        - If the model made 100 predictions and 90 of them were correct, the accuracy is 90%.

- Precision: 
    - Precision tells us, out of all the positive predictions made, how many were actually correct.
    $$ Precision = \frac{TP}{TP+FP} $$
    - Example
        - If the model predicted 10 customers would churn, but only 7 actually did, the precision would be 70%.

- Recall (Sensitivity):
    - Recall tells us, out of all the actual positive cases, how many were correctly predicted by the model.
    $$ Recall =  \frac{TP}{TP+FN} $$
    - Example
        - If there were 20 customers who churned, and the model correctly identified 15 of them, the recall is 75%.

- F1 Score: 
    - The F1 score is the harmonic mean of precision and recall. It’s useful when you need to balance false positives and false negatives.
    $$ F1\ score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} $$
    - Example
        - If precision is 80% and recall is 60%, the F1 score helps balance them, yielding a score around 69%.

- ROC-AUC (Receiver Operating Characteristic - Area Under Curve)
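    - The ROC-AUC score measures how well the model separates the two classes (0 and 1) across all classification thresholds. It is the area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR). A value of 0.5 corresponds to random guessing, and a value closer to 1 indicates better performance.
    $$ AUC = \int_0^1 TPR \, d(FPR), \quad TPR = \frac{TP}{TP+FN}, \quad FPR = \frac{FP}{FP+TN} $$
    - Example
        - A score of 0.90 means that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative example 90% of the time.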
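
The metrics above can be computed with `sklearn.metrics`. The sketch below uses invented labels and scores, not output from the churn model. It also checks ROC-AUC against a direct count of how often a positive example outscores a negative one.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# Made-up true labels and predicted probabilities of class 1
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1])

# Hard predictions at the 0.5 threshold: TP=3, FN=1, FP=1, TN=5
y_pred = (y_score > 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

# ROC-AUC uses the scores, not the thresholded predictions
auc = roc_auc_score(y_true, y_score)

# Pairwise check: fraction of (positive, negative) pairs
# in which the positive example has the higher score
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise = np.mean(pos[:, None] > neg[None, :])

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1 score:  {f1:.2f}")
print(f"ROC-AUC:   {auc:.4f} (pairwise ranking: {pairwise:.4f})")
```

Accuracy, precision, recall, and F1 all depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality over every possible threshold. That is why it is computed from `y_score` rather than `y_pred`.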

## Conclusion of Day 11

In Day 11, we explored the fundamentals of Logistic Regression for Binary Classification. 
- We covered how logistic regression models the probability of an event occurring, and the key concepts of odds, log-odds, and probabilities in binary classification problems.
- We also looked at evaluation metrics for assessing a logistic regression model: Accuracy, Precision, Recall, F1 Score, and ROC-AUC.

By the end of this day, you should be comfortable using logistic regression to solve binary classification problems and interpret its outputs effectively.

In Day 12, we will dive into Decision Trees for Regression and Classification. Decision trees are powerful, interpretable models that can handle both regression and classification tasks. We will explore how they work, their advantages, and when to use them. Furthermore, we'll introduce key concepts such as information gain, Gini impurity, and entropy. Stay tuned!