# Logistic Regression

## 1. Purpose of Logistic Regression: Best Classification

The goal of `Logistic Regression` is to find the best classifier (decision boundary) that effectively separates two classes by estimating the probability of a sample belonging to class 1.

- Unlike Linear Regression, which predicts continuous values, Logistic Regression transforms inputs into probabilities using the **sigmoid function** and classifies them into two categories (e.g., 0 or 1).
- It works well for binary classification problems, such as spam detection, fraud detection, and disease prediction.

<img src="images/binary.ppm" width='350px'>

### Mathematical Representation
For a feature vector $x$, the logistic (sigmoid) function is:

$$
h(x) = \frac{1}{1 + e^{-(w^T x + b)}}
$$

where:
- $h(x)$ is the estimated probability that the point belongs to **class 1**.
- $w$ is the **weight vector** (parameters to be learned).
- $x$ is the **feature vector**.
- $b$ is the **bias term**.

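In the geometric view used in the rest of these notes, the best classifier is the boundary that places as many training points as possible on their correct side, as far from the boundary as possible.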
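The logistic function above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the values of `w`, `b`, and `x` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that feature vector x belongs to class 1."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical parameters, for illustration only
w = np.array([2.0, -1.0])
b = 0.5

x = np.array([1.0, 3.0])      # w.x + b = 2 - 3 + 0.5 = -0.5
p = predict_proba(x, w, b)
print(round(float(p), 4))     # 0.3775
```

Because $w^T x + b$ is negative here, the probability falls below 0.5, so this point would be assigned to class 0.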

---

## 3. Assumptions in Logistic Regression
1. **Binary Output**: The target variable should have only two categories (e.g., spam or not spam).
2. **No Multicollinearity**: Independent variables should not be highly correlated.
3. **Independence of Observations**: Each observation should be independent.
4. **Large Dataset Size**: Logistic Regression performs best with large, well-balanced datasets.
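One rough way to screen for the multicollinearity mentioned in assumption 2 is to look at pairwise correlations between feature columns. The sketch below assumes a small made-up feature matrix `X` and a hypothetical cutoff of 0.9. The helper name `highly_correlated_pairs` is also an illustration, and pairwise correlation will not catch every form of multicollinearity.

```python
import numpy as np

def highly_correlated_pairs(X, threshold=0.9):
    """Return index pairs (i, j) of feature columns whose absolute
    Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)   # columns are treated as variables
    n = corr.shape[0]
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]

# Made-up data: column 1 is roughly twice column 0, column 2 is unrelated
X = np.array([
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.2, 2.0],
    [5.0, 9.8, 3.0],
])

pairs = highly_correlated_pairs(X)
print(pairs)   # [(0, 1)]
```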

---


### 2. Equations of Decision Boundary: Line and Plane

#### Equation of a Decision Boundary in 2D (Line)
In a two-dimensional space, the decision boundary is a straight line given by:

$$
w^T x + b = 0
$$

Expanding it in terms of coordinates:

$$
w_1 X_1 + w_2 X_2 + b = 0
$$

where:
- $w_1, w_2$ are the weights (coefficients) for features $X_1, X_2$.
- $b$ is the bias term.

#### Equation of a Decision Boundary in Higher Dimensions (Plane/Hyperplane)
For an $n$-dimensional space, the decision boundary is a **hyperplane**:

$$
w^T x + b = w_1 X_1 + w_2 X_2 + \dots + w_n X_n + b = 0
$$

- If $w^T x + b > 0$ → Class 1
- If $w^T x + b < 0$ → Class 0
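The sign rule above can be applied to a whole batch of points at once. The following is a minimal sketch with hypothetical weights and bias (the boundary is the line $X_1 + X_2 = 3$). Points that fall exactly on the boundary are assigned to class 0 here, which is an arbitrary choice made for this example.

```python
import numpy as np

def predict_class(X, w, b):
    """Return 1 where w.x + b > 0 and 0 otherwise, one label per row of X."""
    scores = X @ w + b
    return (scores > 0).astype(int)

w = np.array([1.0, 1.0])   # hypothetical weights
b = -3.0                   # hypothetical bias: boundary is X1 + X2 = 3

X = np.array([
    [1.0, 1.0],   # score -1.0 -> class 0
    [2.0, 2.0],   # score  1.0 -> class 1
    [4.0, 0.5],   # score  1.5 -> class 1
])

labels = predict_class(X, w, b)
print(labels)   # [0 1 1]
```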

---



## Working of Logistic Regression: Distance Calculation

To classify a point correctly, we calculate its distance from the decision boundary.

### Distance from a Point to a Line (2D Space)
For a point $(X_1, X_2)$, the perpendicular distance **$d$** from the decision boundary is given by:

$$
d = \frac{|w^T x + b|}{\| w \|}
$$

### Distance from a Point to a Hyperplane (Higher Dimension)
For a point $(X_1, X_2, \dots, X_n)$ in an $n$-dimensional space, the distance is:

$$
d = \frac{|w^T x + b|}{\| w \|}
$$

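The absolute value gives how far the point lies from the boundary. Dropping it gives the signed distance $\frac{w^T x + b}{\| w \|}$, whose sign tells us on which side of the boundary the point lies. Comparing that side with the point's actual class determines whether it is classified correctly.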
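The distance formula can be checked numerically. The sketch below uses hypothetical values of `w` and `b` with $\| w \| = 5$ so the results are round numbers. The `signed_distance` helper, which omits the absolute value, is an addition for illustration.

```python
import numpy as np

def signed_distance(x, w, b):
    """Perpendicular distance from x to the hyperplane w.x + b = 0, keeping the sign."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

def distance(x, w, b):
    """Unsigned distance d = |w.x + b| / ||w||."""
    return abs(signed_distance(x, w, b))

w = np.array([3.0, 4.0])   # hypothetical weights, ||w|| = 5
b = -5.0                   # hypothetical bias

p = np.array([3.0, 4.0])   # w.p + b = 20  -> positive side
q = np.array([0.0, 0.0])   # w.q + b = -5  -> negative side

print(signed_distance(p, w, b), distance(p, w, b))   # 4.0 4.0
print(signed_distance(q, w, b), distance(q, w, b))   # -1.0 1.0
```

The same two functions work unchanged in any number of dimensions, since only the dot product and the norm depend on the length of `w`.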

---


### Condition for Finding the Best Classifier Line or Plane

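The best classifier is chosen by maximizing the sum of signed scores over the training set, given by the **argmax equation**:

$$
\underset{w,b}{\arg\max} \sum_{i=1}^{m} y_i (w^T x_i + b)
$$

This means:
- For this formula the labels are encoded as $y_i \in \{+1, -1\}$ (class 1 as $+1$, class 0 as $-1$), so that $y_i (w^T x_i + b)$ is positive when $x_i$ lies on its correct side of the boundary and negative when it does not.
- We compute $y_i (w^T x_i + b)$ for each of the $m$ training samples $x_i$ and add the results.
- The classifier that maximizes this summation is the best decision boundary.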
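To make the summation concrete, the sketch below scores two candidate boundaries on a tiny made-up dataset and keeps the one with the larger sum. It assumes the labels are encoded as $+1$ and $-1$, because with 0/1 labels the class-0 terms would drop out of the sum. The two candidates are hypothetical and share the same $\| w \|$, since multiplying $w$ and $b$ by a constant would scale the sum without moving the boundary.

```python
import numpy as np

def objective(X, y, w, b):
    """Sum of y_i * (w.x_i + b) over the training set, with y_i in {+1, -1}."""
    return float(np.sum(y * (X @ w + b)))

# Made-up training data: two positive points, two negative points
X = np.array([[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical candidate classifiers (w, b), both with ||w|| = sqrt(2)
candidates = {
    "A": (np.array([1.0, 1.0]), -2.5),
    "B": (np.array([1.0, -1.0]), 0.0),
}

scores = {name: objective(X, y, w, b) for name, (w, b) in candidates.items()}
best = max(scores, key=scores.get)

print(scores)   # {'A': 7.0, 'B': 1.0}
print(best)     # A
```

Candidate A places all four points on their correct side, so every term in its sum is positive. Candidate B leaves two points on the boundary and one on the wrong side, which lowers its total.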

---




## Four Cases in Classification

| Case | Condition | Sign of $w^T x + b$ | Correct Classification? |
|------|-----------|---------------------|-------------------------|
| 1. Positive point in positive region | $w^T x + b > 0$ and actual class = 1 | Positive | Correct ✅ |
| 2. Negative point in negative region | $w^T x + b < 0$ and actual class = 0 | Negative | Correct ✅ |
| 3. Negative point in positive region | $w^T x + b > 0$ but actual class = 0 | Positive | Incorrect ❌ |
| 4. Positive point in negative region | $w^T x + b < 0$ but actual class = 1 | Negative | Incorrect ❌ |

Thus, cases 1 and 2 are correct classifications, while cases 3 and 4 are misclassifications.
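The four cases in the table can be expressed as a small function that takes a labelled point and returns its case number. This is a sketch with hypothetical weights and bias. It uses the 0/1 class labels from the table and, as an arbitrary choice, treats a point exactly on the boundary as lying in the negative region.

```python
import numpy as np

def classify_case(x, y_true, w, b):
    """Return the case number (1-4) from the table for one labelled point.

    y_true is the actual class, encoded as 0 or 1.
    """
    in_positive_region = (np.dot(w, x) + b) > 0
    if y_true == 1:
        return 1 if in_positive_region else 4
    return 3 if in_positive_region else 2

w = np.array([1.0, 1.0])   # hypothetical weights
b = -3.0                   # hypothetical bias: boundary is X1 + X2 = 3

examples = [
    (np.array([2.0, 2.0]), 1),   # positive point, positive region
    (np.array([1.0, 1.0]), 0),   # negative point, negative region
    (np.array([4.0, 0.5]), 0),   # negative point, positive region
    (np.array([0.0, 1.0]), 1),   # positive point, negative region
]

cases = [classify_case(x, y_true, w, b) for x, y_true in examples]
print(cases)   # [1, 2, 3, 4]
```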

---



### Condition After Training
After training the model, the final decision boundary equation is:

$$
w^T x + b = 0
$$

This equation **divides the feature space into two regions**:
- Region 1 (+ class): $w^T x + b > 0$
- Region 0 (- class): $w^T x + b < 0$

The **classifier with the highest** $\sum_{i=1}^{m} y_i (w^T x_i + b)$ is chosen as the best classifier.

---


