# Logistic Regression

Logistic regression is a popular statistical model for binary classification. It estimates the probability that an instance belongs to a particular class, and it assumes a linear relationship between the predictors and the log-odds of the event occurring. This section explains the geometric intuition behind logistic regression and provides the necessary formulas.
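Written out, the log-odds assumption takes the following form. Here $p$ denotes the probability of the positive class (a symbol introduced only for this illustration), $w$ is the weight vector, and $b$ is the intercept:

\begin{equation}
\log\frac{p}{1-p} = w^Tx + b
\end{equation}

The points where $p = 0.5$ have log-odds equal to zero, so they satisfy $w^Tx + b = 0$, which is the equation of a hyperplane.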

- **Assumption:** The classes are almost perfectly linearly separable.

## Geometrical Intuition

In logistic regression, the output can be interpreted as the probability that an instance belongs to a particular class. The decision boundary that separates the two classes is a hyperplane in the feature space. Since logistic regression is a binary classifier, this hyperplane divides the feature space into two regions, one for each class.

The decision boundary is determined by the weights (coefficients) assigned to the predictors. These weights control the orientation and tilt of the hyperplane. By adjusting the weights, logistic regression finds the best-fitting decision boundary that maximizes the likelihood of the observed data.

In logistic regression, we take $y=+1$ for positive points and $y=-1$ for negative points. The line $y=mx+c$ generalizes in higher dimensions to the hyperplane $w^Tx+b=0$, where $w$ is the normal vector to the hyperplane and $b$ is the intercept.

When the plane passes through the origin, $b=0$, so the equation becomes $w^Tx=0$.

![Logistic Regression](./../../assets/logistic.jpg)

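From the figure, the signed distance from the plane to a point $x_i$ is
$S_i = \frac{w^Tx_i}{||w||}$

If $w$ is a unit vector, i.e. $||w||=1$, then for a point $x_i$ on the side that $w$ points toward and a point $x_j$ on the opposite side: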

$S_i = w^Tx_i > 0$

$S_j = w^Tx_j < 0$
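The signed distance can be checked numerically. The following is a minimal sketch in plain Python; the function name `signed_distance`, the vector `w`, and the two sample points are hypothetical values chosen for illustration, not taken from the figure.

```python
import math

def signed_distance(w, x, b=0.0):
    """Signed distance from point x to the hyperplane w^T x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

# Hypothetical normal vector with ||w|| = 5; the plane passes through the origin.
w = [3.0, 4.0]
x_i = [2.0, 1.0]     # on the side that w points toward
x_j = [-1.0, -3.0]   # on the opposite side

s_i = signed_distance(w, x_i)   # (6 + 4) / 5 = 2.0
s_j = signed_distance(w, x_j)   # (-3 - 12) / 5 = -3.0
print(s_i, s_j)
```

Dividing by $||w||$ only rescales the value; the sign, which is what determines the side of the plane, depends on $w^Tx$ alone.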

Multiplying by the label combines both sides into a single test: $y_i * w^Tx_i > 0$ means the plane defined by $w$ classifies the point $x_i$ correctly, while $y_i * w^Tx_i < 0$ means it misclassifies the point.

- **Case I:** $y_i = +ve$ and $w^Tx_i = +ve$ then $y_i * w^Tx_i > 0$ i.e. $x_i$ is correctly classified
- **Case II:** $y_i = -ve$ and $w^Tx_i = -ve$ then $y_i * w^Tx_i > 0$ i.e. $x_i$ is correctly classified
- **Case III:** $y_i = +ve$ and $w^Tx_i = -ve$ then $y_i * w^Tx_i < 0$ i.e. $x_i$ is incorrectly classified
- **Case IV:** $y_i = -ve$ and $w^Tx_i = +ve$ then $y_i * w^Tx_i < 0$ i.e. $x_i$ is incorrectly classified
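The four cases above can be reproduced with a short sketch. The helper names `dot` and `is_correct`, the normal vector `w`, and the sample points are assumptions made for this example only.

```python
def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def is_correct(w, x, y):
    """True when the label y (+1 or -1) and w^T x have the same sign."""
    return y * dot(w, x) > 0

# Hypothetical unit normal; the decision boundary is the vertical axis.
w = [1.0, 0.0]

cases = [
    ("Case I",   [2.0, 1.0],  +1),   # positive label, positive side
    ("Case II",  [-2.0, 1.0], -1),   # negative label, negative side
    ("Case III", [-1.0, 3.0], +1),   # positive label, negative side
    ("Case IV",  [3.0, -1.0], -1),   # negative label, positive side
]

results = {name: is_correct(w, x, y) for name, x, y in cases}
for name, ok in results.items():
    print(name, "correct" if ok else "misclassified")
```

A single sign check on $y_i * w^Tx_i$ replaces the four separate comparisons, which is why the objective that follows is written in terms of this product.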

So the value of $w$ is chosen such that as many points as possible are correctly classified, which is expressed here as maximizing the sum $\sum_i y_i w^Tx_i$. We therefore need to optimize $w$, and the optimal value of the normal $w$ is given by:

\begin{equation}
w^* = \arg\max_w \sum_i y_i \, w^T x_i
\end{equation}
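To make the objective concrete, the sketch below evaluates it over a coarse grid of candidate unit vectors in two dimensions and keeps the best one. This brute-force scan is an illustration under stated assumptions, not the way logistic regression is fitted in practice: the toy data `X`, `y` and the one-degree grid are hypothetical, and $w$ is restricted to unit length because otherwise the sum could be made arbitrarily large just by scaling $w$.

```python
import math

def objective(w, X, y):
    """Sum of y_i * w^T x_i over all points."""
    return sum(
        yi * sum(wj * xj for wj, xj in zip(w, xi))
        for xi, yi in zip(X, y)
    )

# Hypothetical, linearly separable toy data.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [+1, +1, -1, -1]

best_w, best_score = None, -math.inf
for k in range(360):                             # one candidate per degree
    theta = math.radians(k)
    w = [math.cos(theta), math.sin(theta)]       # unit vector, ||w|| = 1
    score = objective(w, X, y)
    if score > best_score:
        best_w, best_score = w, score

print(best_w, best_score)
```

For this data the sum simplifies to $6\cos\theta + 6\sin\theta$, so the scan settles on the direction at 45 degrees, and every point satisfies $y_i * w^Tx_i > 0$ for that $w$.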