### Logistic Regression in Machine Learning

Logistic regression is a widely used machine learning algorithm that falls under the **Supervised Learning** category. It is mainly used to predict a **categorical dependent variable** from a set of **independent variables**.

- **Output**: Unlike linear regression, which predicts continuous values, logistic regression predicts the probability of an outcome that is either **0** or **1**, which means the output is **binary** (e.g., Yes/No, True/False).

- **Probabilistic Predictions**: Instead of predicting exact values, logistic regression predicts probabilities between 0 and 1, indicating the likelihood of a certain class.

- **Logistic vs Linear Regression**: While linear regression fits a straight line, logistic regression fits an "S" shaped **logistic function** to predict binary outcomes.

- **Applications**: Logistic regression is commonly used in classification tasks, such as determining whether a cell is cancerous, or whether a mouse is obese based on its weight.

The **logistic function curve** is used to calculate the probability of an event, which is essential for classifying data into one of two categories.

### Key Points:
- Logistic regression is used for **classification** problems, not regression.
- The algorithm provides **probabilities** for binary outcomes.
- The output is constrained between **0 and 1**, making it ideal for binary classification tasks.

Below is an example of the **logistic function curve** used in logistic regression:
![logistic-regression-in-machine-learning.png](attachment:logistic-regression-in-machine-learning.png)


### Sigmoid Function Properties

The sigmoid function, denoted as $ \sigma(z) $, has the following key properties:

- As $ z \to \infty $, $ \sigma(z) \to 1 $.
- As $ z \to -\infty $, $ \sigma(z) \to 0 $.
- The value of $ \sigma(z) $ is always bounded between 0 and 1.

This behavior allows the sigmoid function to convert continuous values into probabilities within the range of 0 to 1.
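
A short derivation of why these properties hold, assuming the standard definition of the sigmoid given in the Logistic Function section of this document:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad e^{-z} > 0 \;\Rightarrow\; 0 < \sigma(z) < 1
$$

$$
1 - \sigma(z) = \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{z}} = \sigma(-z)
$$

The second identity shows that the curve is symmetric about the point $ (0, 0.5) $, so $ \sigma(0) = 0.5 $.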

### Probability of Class Prediction

In Logistic Regression, the probability of a class can be measured as:

- $ P(y=1) = \sigma(z) $
- $ P(y=0) = 1 - \sigma(z) $

Where:
- $ \sigma(z) $ is the sigmoid function applied to the linear combination $ z = \beta_0 + \beta_1 x $ of the input features.
- $ P(y=1) $ represents the probability of the data point belonging to class 1.
- $ P(y=0) $ represents the probability of the data point belonging to class 0.
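
The two class probabilities can be computed in a few lines of Python. This is a minimal sketch for a single feature; the coefficient values `beta0` and `beta1` and the helper name `class_probabilities` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(x: float, beta0: float, beta1: float):
    """Return (P(y=0), P(y=1)) for a single feature value x."""
    z = beta0 + beta1 * x      # linear score
    p1 = sigmoid(z)            # P(y=1)
    return 1.0 - p1, p1        # the two probabilities always sum to 1


# Hypothetical coefficients, not fitted to any data.
p0, p1 = class_probabilities(x=4.0, beta0=-1.0, beta1=0.5)
print(f"P(y=0) = {p0:.3f}, P(y=1) = {p1:.3f}")  # P(y=0) = 0.269, P(y=1) = 0.731
```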


**Note: Logistic regression takes its name from the regression-style predictive model it fits (a linear model of the log-odds). Because its output is used to classify samples, it is treated as a classification algorithm.**

### Logistic Function (Sigmoid Function)

The **sigmoid function** is a mathematical function used in **logistic regression** to map predicted values to probabilities between **0 and 1**.

- **Range**: The sigmoid function takes any real-valued input and maps it to a value between **0** and **1**, ensuring the output represents a probability.
- **S-Shape Curve**: The curve formed by the sigmoid function is "S"-shaped, indicating the transition from 0 to 1. This curve is essential for binary classification, as it predicts the likelihood of an outcome.
- **Threshold**: In logistic regression, the **threshold value** is used to classify the output. For example, if the predicted probability is above the threshold (typically 0.5), the output is classified as **1**; otherwise, it's classified as **0**.

The logistic function is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Where:
- $ z $ is the input (the weighted sum of the features).
- $ e $ is the base of the natural logarithm.
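
The definition and the threshold rule above can be sketched in Python as follows. The input is called `z` in the code, and the `classify` helper is a hypothetical name. The two-branch form is one common way to avoid overflow in `math.exp` for very negative inputs; it computes the same function as the formula.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)           # avoids overflow of exp(-z) when z is very negative
    return ez / (1.0 + ez)


def classify(z: float, threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability is above the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0


for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(f"z={z:8.1f}  sigmoid={sigmoid(z):.4f}  class={classify(z)}")
```

With the default threshold of 0.5, the predicted class switches from 0 to 1 as `z` crosses zero.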

### Key Points:
- The sigmoid function maps any real number into a probability between **0 and 1**.
- The curve's S-shape makes it suitable for **binary classification**.
- **Thresholding** determines the classification based on predicted probabilities.


## Assumptions for Logistic Regression:

- The dependent variable must be categorical in nature.
- The independent variables should not exhibit multicollinearity, that is, they should not be strongly correlated with one another.
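
One rough way to screen for the second assumption is to look at pairwise correlations between features. The sketch below is only a first check: the feature values are made up, the 0.9 cutoff is an arbitrary assumption, and pairwise correlation cannot detect collinearity that involves three or more variables at once.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)


# Hypothetical feature columns; height_in is height_cm in other units.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 25, 41, 35, 28],
}

CUTOFF = 0.9  # arbitrary threshold for "strongly correlated"
flagged = [
    (a, b)
    for a, b in combinations(features, 2)
    if abs(pearson(features[a], features[b])) > CUTOFF
]
print(flagged)
```

Here only the two height columns are flagged, which suggests dropping one of them before fitting.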

### Logistic Regression Equation

The **Logistic Regression equation** is derived from the **Linear Regression equation**. Here's how it's mathematically formulated:

1. **Linear Regression Equation**:
   The equation of a straight line is given by:
   $$
   y = \beta_0 + \beta_1 x
   $$

2. **Transforming for Logistic Regression**:
   In logistic regression, $ y $ is a probability and must lie between 0 and 1, so it cannot be set equal to the unbounded linear expression directly. We first form the **odds** by dividing $ y $ by $(1 - y)$. The odds are 0 when $ y = 0 $ and grow towards $+\infty$ as $ y \to 1 $:
   $$
   0 \le \frac{y}{1 - y} < \infty
   $$

3. **Logarithmic Transformation**:
   To get a range from $-\infty$ to $+\infty$, we take the logarithm of the odds and model it with the linear expression:
   $$
   \log \left( \frac{y}{1 - y} \right) = \beta_0 + \beta_1 x
   $$

   This transformation is called the **log-odds** or **logit**.

4. **Final Logistic Regression Equation**:
   The final form of the logistic regression equation is the log-odds model:
   $$
   \log \left( \frac{y}{1 - y} \right) = \beta_0 + \beta_1 x
   $$
   Solving for $ y $ gives the sigmoid applied to the linear expression:
   $$
   y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
   $$

   - Where:
     - $ y $ is the probability of the event occurring (between 0 and 1).
     - $ \beta_0 $ is the intercept.
     - $ \beta_1 $ is the coefficient for the feature $ x $.

This equation allows logistic regression to output probabilities, which are used for classification tasks.
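
To make the relationship between probability, odds, and log-odds concrete, the sketch below evaluates all three for a few feature values and checks that the logit recovers the linear score. The coefficients are hypothetical and the function names `logit` and `inverse_logit` are chosen for this example.

```python
import math


def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Sigmoid: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


beta0, beta1 = -3.0, 1.5   # hypothetical coefficients, not fitted to data

for x in (0.0, 2.0, 4.0):
    z = beta0 + beta1 * x          # linear score
    y = inverse_logit(z)           # probability
    odds = y / (1.0 - y)           # always non-negative
    print(f"x={x:.1f}  z={z:+.1f}  y={y:.3f}  odds={odds:.3f}  logit(y)={logit(y):+.3f}")
```

At `x = 2.0` the linear score is zero, so the probability is 0.5 and the odds are 1.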


## Types of Logistic Regression

Logistic regression can be classified into three types based on the categories of the dependent variable:

1. **Binomial Logistic Regression**:
   - In binomial logistic regression, the dependent variable has only two possible categories.
   - Example: 0 or 1, Pass or Fail, Yes or No.

2. **Multinomial Logistic Regression**:
   - In multinomial logistic regression, the dependent variable has more than two possible unordered categories.
   - Example: "Cat", "Dog", "Sheep".

3. **Ordinal Logistic Regression**:
   - In ordinal logistic regression, the dependent variable has more than two ordered categories.
   - Example: "Low", "Medium", "High".
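
For the multinomial case, one common formulation (an assumption here, since the text above does not name a specific method) replaces the sigmoid with the softmax function, which turns one linear score per class into probabilities that sum to 1. The scores below are made up for illustration; ordinal logistic regression uses a different construction and is not covered by this sketch.

```python
import math


def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["Cat", "Dog", "Sheep"]
scores = [2.0, 1.0, 0.1]                         # hypothetical linear scores, one per class

probs = softmax(scores)
predicted = labels[probs.index(max(probs))]      # class with the highest probability
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("Predicted:", predicted)

# With two classes and scores [z, 0], softmax reduces to the sigmoid of z.
z = 1.3
two_class = softmax([z, 0.0])[0]
print(abs(two_class - 1.0 / (1.0 + math.exp(-z))) < 1e-12)
```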


### How to Evaluate a Logistic Regression Model

To evaluate the performance of a Logistic Regression model, we can use the following metrics:

- **Accuracy**: It provides the proportion of correctly classified instances.
  
  $$
  \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total}}
  $$



- **Precision**: Precision focuses on the accuracy of positive predictions.
  
  $$
  \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
  $$



- **Recall** (Sensitivity or True Positive Rate): Recall measures the proportion of correctly predicted positive instances among all actual positive instances.
  
  $$
  \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
  $$



- **F1 Score**: The F1 score is the harmonic mean of precision and recall, providing a balance between the two.
  
  $$
  \text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
  $$



- **Area Under the Receiver Operating Characteristic Curve (AUC-ROC)**: The ROC curve plots the true positive rate against the false positive rate at various thresholds. AUC-ROC measures the area under this curve, providing an aggregate measure of a model’s performance across different classification thresholds.



- **Area Under the Precision-Recall Curve (AUC-PR)**: Similar to AUC-ROC, AUC-PR measures the area under the precision-recall curve, providing a summary of a model’s performance across different precision-recall trade-offs.
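
The metrics above can be computed directly from labels and predicted probabilities. The following is a minimal pure-Python sketch on a small made-up dataset. The function names are hypothetical, and `roc_auc` uses the rank interpretation of AUC-ROC (the fraction of positive/negative pairs in which the positive example receives the higher score), which equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_prob):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
y_pred = [1 if p > 0.5 else 0 for p in y_prob]   # threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_prob)
print(metrics)   # accuracy, precision, recall and f1 are all 0.75 on this data
print(auc)       # 0.875
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC-ROC is computed from the probabilities alone.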


### Precision-Recall Tradeoff and Threshold Selection

When choosing the threshold for a Precision-Recall tradeoff, we consider the following scenarios:

- **Low Precision / High Recall**:  
  In applications where we want to reduce the number of false negatives, we select a threshold that results in low precision but high recall. This is ideal when missing a positive case is more costly than incorrectly classifying a negative case as positive.  
  **Example**: In cancer diagnosis, we prioritize identifying all possible cancer patients (high recall) even if some non-cancer patients are falsely diagnosed (low precision), as it’s better to investigate further than miss a true positive case.



- **High Precision / Low Recall**:  
  In applications where we want to minimize the number of false positives, we choose a threshold that results in high precision but low recall. This ensures that positive predictions are very accurate, even if some true positives are missed.  
  **Example**: In personalized advertising, we aim for high precision when predicting which customers will react positively to an ad, because a false positive (showing the ad to a customer who reacts negatively) can cost potential sales from that customer. Missing a customer who would have reacted positively (low recall) is less critical in this case.
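
The effect of the threshold on precision and recall can be seen by sweeping it over a small made-up set of predictions. This is a sketch with hypothetical data and a hypothetical helper name; on real data the trade-off is usually less clean.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting 1 for probabilities above the threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # reported as 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

for threshold in (0.25, 0.5, 0.7):
    precision, recall = precision_recall_at(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.3f}  recall={recall:.3f}")
```

On this data, the low threshold (0.25) catches every positive case at the cost of two false positives, while the high threshold (0.7) makes no false positive predictions but misses half of the positives.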


### Differences Between Linear and Logistic Regression

| **Aspect**                           | **Linear Regression**                                      | **Logistic Regression**                                      |
|--------------------------------------|------------------------------------------------------------|-------------------------------------------------------------|
| **Purpose**                          | Used to predict a continuous dependent variable.           | Used to predict a categorical dependent variable.            |
| **Type of Problem**                  | Solves regression problems.                                | Solves classification problems.                              |
| **Prediction**                       | Predicts continuous values (e.g., price, age).             | Predicts probabilities of categories (e.g., 0 or 1, Yes or No).|
| **Model Type**                       | Finds the best fit line.                                   | Finds an S-curve (sigmoid function).                         |
| **Estimation Method**                | Uses least squares estimation to fit the coefficients.     | Uses maximum likelihood estimation to fit the coefficients.  |
| **Output**                           | Continuous output values.                                  | Categorical output values (0/1, True/False, etc.).           |
| **Relationship Requirement**         | Requires a linear relationship between dependent and independent variables. | Does not require a linear relationship between the dependent and independent variables, but assumes the log-odds are linear in the independent variables. |
| **Collinearity**                     | Tolerates some collinearity between independent variables for prediction, though coefficient estimates become unstable. | Should have little to no collinearity between independent variables. |


![linear-regression-vs-logistic-regression.png](attachment:linear-regression-vs-logistic-regression.png)
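
The "Estimation Method" row of the table can be illustrated with a small sketch of maximum likelihood fitting for a single feature. It uses plain gradient descent on the negative log-likelihood, which is only one of several ways to find the maximum likelihood estimate. The data, learning rate, and number of epochs are assumptions chosen for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0, b1, xs, ys):
    """Sum of log-probabilities the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit intercept and slope by gradient descent on the mean negative log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # derivative of the loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Made-up data: the classes overlap, so a finite maximum likelihood estimate exists.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print("log-likelihood before:", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood after: ", log_likelihood(b0, b1, xs, ys))
```

The fitted slope is positive, so the predicted probability of class 1 rises along an S-shaped curve as the feature value increases.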