## Logistic regression

Logistic regression is a classification model that, like linear regression, uses just a small set of learned coefficients, but its output is a probability for each class instead of a raw numeric value.  

This makes it efficient at prediction time and well suited to problems like classifying iris flowers by their measurements.

#### From kNN to logistic regression

k-nearest neighbors (kNN) classifies a new point by looking up the k closest training examples and taking a majority vote of their labels.  Because it stores all training points and computes distances to each one at prediction time, kNN can become slow and memory‑hungry as the dataset grows.
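A minimal pure-Python sketch of the kNN prediction step described above. The function name `knn_predict`, the use of Euclidean distance, `k=3`, and the toy (petal length, petal width) values are all assumptions for illustration, not real Iris rows. Every call loops over the full stored training set, which is where the prediction-time cost comes from.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    All training points are kept and compared at prediction time, so the
    cost of each prediction grows with the size of the training set.
    """
    # Euclidean distance from the query to every stored training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (petal length, petal width) pairs in cm, made up for illustration.
train_points = [(4.0, 1.3), (4.5, 1.5), (4.7, 1.4), (5.6, 2.1), (5.8, 2.2), (6.1, 2.3)]
train_labels = ["versicolor", "versicolor", "versicolor", "virginica", "virginica", "virginica"]

print(knn_predict(train_points, train_labels, (4.4, 1.4)))  # versicolor
print(knn_predict(train_points, train_labels, (5.9, 2.0)))  # virginica
```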

Logistic regression takes the opposite approach: during training it learns a small set of parameters (coefficients and an intercept), and at prediction time it needs only those parameters, not the whole training set.  This makes each classification very fast and memory‑light, similar in spirit to linear regression, where only the line’s coefficients are needed to make predictions.

#### Iris dataset setup

The Iris dataset contains 150 flowers, each described by four numeric features: sepal length, sepal width, petal length, and petal width.  Each flower also has a species label encoded as 0, 1, or 2, typically corresponding to setosa, versicolor, and virginica.

Sepals are the outer parts of the flower that protect the bud, and petals are the inner parts; in irises the sepals are usually the larger of the two, and the petals are typically shorter and narrower.  The learning task is: given a flower’s four measurements, predict which of the three species it is.

#### Starting with a simpler binary problem

Before tackling all three species, it is common to simplify the task to just two classes, for example distinguishing versicolor from virginica.  A plot of petal length for these two species shows that virginica petals tend to be longer than versicolor petals, with some overlap.
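As a small sketch of this simplification, assuming the common encoding 0 = setosa, 1 = versicolor, 2 = virginica and using a few made-up petal lengths, the two-class subset can be extracted and relabeled so that 1 means virginica and 0 means versicolor. `binary_subset` is a hypothetical helper name.

```python
# Assumed species codes (the common Iris encoding): 0 = setosa, 1 = versicolor, 2 = virginica.
VERSICOLOR, VIRGINICA = 1, 2


def binary_subset(features, labels):
    """Keep only versicolor and virginica rows and relabel them.

    Returns (features, labels) where label 1 means virginica and 0 means versicolor.
    """
    kept_features, kept_labels = [], []
    for row, label in zip(features, labels):
        if label in (VERSICOLOR, VIRGINICA):
            kept_features.append(row)
            kept_labels.append(1 if label == VIRGINICA else 0)
    return kept_features, kept_labels


# Hypothetical petal lengths (cm) with three-class labels, made up for illustration.
petal_lengths = [1.4, 4.1, 4.7, 1.3, 5.5, 6.0]
species = [0, 1, 1, 0, 2, 2]

x, y = binary_subset(petal_lengths, species)
print(x)  # [4.1, 4.7, 5.5, 6.0]
print(y)  # [0, 0, 1, 1]
```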

A very simple classifier in this 1‑D setting is a threshold rule: choose a cut‑off, such as 4.9 cm, and classify any flower whose petal length exceeds it as virginica and any other flower as versicolor.  The key question then becomes how to choose this threshold from data in a principled way.
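One simple, non-probabilistic way to pick the cut-off, offered only as a baseline and not as the approach logistic regression takes, is to try a candidate midway between each pair of neighboring petal lengths and keep the one with the fewest training errors. The sketch assumes this error-count criterion and uses made-up petal lengths; `count_errors` and `best_threshold` are hypothetical names.

```python
def count_errors(threshold, lengths, labels):
    """Misclassifications of the rule: virginica (1) if length > threshold, else versicolor (0)."""
    return sum(
        (1 if length > threshold else 0) != label
        for length, label in zip(lengths, labels)
    )


def best_threshold(lengths, labels):
    """Try a cut-off midway between each pair of neighboring sorted lengths; keep the best.

    On a tie, min() keeps the first (smallest) candidate.
    """
    values = sorted(set(lengths))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: count_errors(t, lengths, labels))


# Hypothetical petal lengths (cm); label 1 = virginica, 0 = versicolor.
lengths = [4.0, 4.4, 4.6, 4.7, 5.1, 5.2, 5.4, 5.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

t = best_threshold(lengths, labels)
print(f"cut-off {t:.2f} cm, {count_errors(t, lengths, labels)} training error(s)")
# cut-off 4.65 cm, 1 training error(s)
```

Here two cut-offs (4.65 and 5.15 cm) tie at one error each and `min` keeps the first. Ties like this are common because the error count only changes when the cut-off crosses a data point.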

#### Why plain linear regression is not ideal

One naive idea is to treat the two species labels as numbers (for example, 0 for versicolor and 1 for virginica) and fit an ordinary linear regression line that predicts the label from petal length.  A threshold can then be taken where this line crosses the midpoint between the two label values (0.5 for labels 0 and 1), which might correspond to a petal length around 4.9 cm in this example.
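A minimal sketch of this naive approach in pure Python, assuming ordinary least squares with a single input and a made-up, balanced set of petal lengths (not real Iris measurements). `fit_line` and `midpoint_threshold` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def midpoint_threshold(slope, intercept, midpoint=0.5):
    """Petal length at which the fitted line crosses the midpoint between labels 0 and 1."""
    return (midpoint - intercept) / slope


# Hypothetical petal lengths (cm); label 1 = virginica, 0 = versicolor.
lengths = [4.0, 4.4, 4.6, 4.7, 5.1, 5.2, 5.4, 5.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

slope, intercept = fit_line(lengths, labels)
print(f"slope {slope:.3f}, threshold {midpoint_threshold(slope, intercept):.3f} cm")
# slope 0.630, threshold 4.900 cm
```

Because the toy classes are balanced, the mean label is 0.5, and a least-squares line always passes through the point (mean length, mean label), so here the crossing lands exactly at the mean petal length of 4.9 cm.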

This can work on clean data but has problems.  Linear regression minimizes squared error, so it penalizes a prediction for being far from 0 or 1 even when that prediction is on the correct side of 0.5; as a result it is sensitive to outliers, and adding even a single unusual flower with an extreme measurement can noticeably tilt the fitted line.  In such a case the decision threshold might not move much, but the line itself changes a lot, which shows that the model is fitting the numeric labels in a way that is not robust and not designed for classification.
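The sketch below illustrates this on made-up petal lengths, assuming one deliberately extreme but correctly labeled virginica flower with an 8.0 cm petal (a hypothetical value chosen to act as an outlier). `fit_line` and `crossing` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def crossing(slope, intercept):
    """Petal length where the fitted line crosses 0.5 (the decision threshold)."""
    return (0.5 - intercept) / slope


# Hypothetical petal lengths (cm); label 1 = virginica, 0 = versicolor.
lengths = [4.0, 4.4, 4.6, 4.7, 5.1, 5.2, 5.4, 5.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

slope0, intercept0 = fit_line(lengths, labels)
# Refit after adding one extreme (but correctly labeled) virginica flower.
slope1, intercept1 = fit_line(lengths + [8.0], labels + [1])

print(f"slope {slope0:.3f} -> {slope1:.3f}")  # slope 0.630 -> 0.263
print(f"threshold {crossing(slope0, intercept0):.3f} -> {crossing(slope1, intercept1):.3f} cm")
# threshold 4.900 -> 5.034 cm
```

In this toy run the slope drops to less than half its original value while the threshold shifts by only about 0.13 cm.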

#### Key ideas of logistic regression

Logistic regression is specifically built for classification and directly models the probability that an input belongs to a particular class.  For the binary case (versicolor vs. virginica), the model outputs a value between 0 and 1 that can be interpreted as “probability of being virginica given the measurements,” and a natural decision rule is to predict virginica when this probability is at least 0.5.
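For the one-feature case (petal length $x$), the binary model can be written as follows, where the weight $w$ and intercept $b$ are symbols introduced here for illustration and are learned from data:

$$
p(\text{virginica} \mid x) = \sigma(wx + b) = \frac{1}{1 + e^{-(wx + b)}}
$$

Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the rule "predict virginica when the probability is at least 0.5" is the same as $wx + b \ge 0$. When $w > 0$ this means $x \ge -b/w$, so the fitted curve implies an ordinary petal-length cut-off at $-b/w$.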

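Logistic regression passes its linear score through the logistic (sigmoid) function, so the prediction changes smoothly with the input and saturates near probabilities 0 and 1.  A flower far out on the correct side of the boundary already receives a probability close to its label and therefore contributes almost nothing to the training loss, which makes the model less dramatically affected by such extreme points than ordinary linear regression on raw labels (an extreme point with the wrong label can still pull the fit).  In the kind of petal‑length example described, the resulting probability curve still defines an effective threshold near 4.9 cm, but that threshold is much less sensitive to adding a single strange data point than the linear regression line is.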
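To make the saturation argument concrete, here is a minimal sketch that fits a one-feature logistic regression by plain gradient descent on the mean log-loss, without regularization. It assumes made-up petal lengths whose classes overlap slightly, so the unregularized fit has a finite solution; the helper names, learning rate, step count, and the 8.0 cm outlier are illustrative assumptions, and real libraries use more sophisticated solvers.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, lr=1.0, steps=20000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the mean log-loss.

    x is centered on its mean during training only to improve conditioning;
    the returned (w, b) apply to the original, uncentered x.
    """
    n = len(xs)
    center = sum(xs) / n
    us = [x - center for x in xs]
    w, c = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_c = 0.0
        for u, y in zip(us, ys):
            err = sigmoid(w * u + c) - y  # predicted probability minus label
            grad_w += err * u
            grad_c += err
        w -= lr * grad_w / n
        c -= lr * grad_c / n
    # Undo the centering: w * (x - center) + c == w * x + b with b = c - w * center.
    return w, c - w * center


def effective_threshold(w, b):
    """Petal length where the predicted probability is exactly 0.5 (w * x + b = 0)."""
    return -b / w


# Hypothetical petal lengths (cm); label 1 = virginica, 0 = versicolor.
lengths = [4.0, 4.4, 4.6, 4.7, 5.1, 5.2, 5.4, 5.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

before = effective_threshold(*fit_logistic_1d(lengths, labels))
# Refit after adding one extreme (but correctly labeled) virginica flower.
after = effective_threshold(*fit_logistic_1d(lengths + [8.0], labels + [1]))
print(f"threshold {before:.3f} -> {after:.3f} cm")  # threshold 4.900 -> 4.900 cm
```

In this toy run the extra flower's predicted probability is already extremely close to 1, so its gradient contribution is negligible and the effective threshold stays at 4.9 cm to three decimals. A least-squares line fitted to the same points instead moves its 0.5 crossing from 4.9 cm to roughly 5.03 cm.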

#### From thresholds to probabilities and multiclass

Unlike a bare threshold rule, which only answers “yes” or “no”, logistic regression produces a probability, which also conveys how confident each prediction is.  Probabilities become especially important beyond two classes: multinomial logistic regression extends the model to assign a probability to each of three or more classes at once, such as the three Iris species setosa, versicolor, and virginica.
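For reference, the standard multinomial (softmax) form can be written as follows, assuming $K$ classes, a feature vector $\mathbf{x}$ (the four measurements, for Iris), and per-class weights $\mathbf{w}_k$ and intercepts $b_k$ (symbols introduced here for illustration):

$$
p(y = k \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{x} + b_k)}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{x} + b_j)}, \qquad k = 1, \dots, K
$$

These probabilities are positive and sum to 1 over the $K$ classes. With $K = 2$, dividing numerator and denominator by $\exp(\mathbf{w}_1^\top \mathbf{x} + b_1)$ gives $p(y = 1 \mid \mathbf{x}) = \sigma\big((\mathbf{w}_1 - \mathbf{w}_2)^\top \mathbf{x} + b_1 - b_2\big)$, which is the binary sigmoid model again.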

In multiclass settings with the Iris data, logistic regression uses the four measurements together to estimate a probability for each species, and then predicts the class with the highest probability.  This remains efficient because the model still uses only a fixed number of coefficients rather than storing all training points like kNN, while providing interpretable probabilities for each possible class.
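A minimal sketch of multiclass prediction from a fixed set of parameters. The weights and intercepts are invented for illustration and are not fitted to the Iris data; they simply score flowers by petal length plus petal width. `predict_proba` and `predict` are hypothetical function names for this sketch. Only 15 stored numbers (a 3 × 4 weight matrix and 3 intercepts) are needed at prediction time, regardless of how many flowers were used in training.

```python
import math

SPECIES = ["setosa", "versicolor", "virginica"]

# Hypothetical parameters, invented for illustration (NOT fitted to Iris).
# One row of weights per species, ordered as
# (sepal length, sepal width, petal length, petal width), plus one intercept each.
WEIGHTS = [
    [0.0, 0.0, -2.0, -2.0],  # setosa: favors small petals
    [0.0, 0.0, 0.0, 0.0],    # versicolor: baseline class
    [0.0, 0.0, 2.0, 2.0],    # virginica: favors large petals
]
INTERCEPTS = [7.0, 0.0, -14.0]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(measurements):
    """Per-species probabilities for one flower's four measurements (cm)."""
    scores = [
        sum(w * x for w, x in zip(row, measurements)) + b
        for row, b in zip(WEIGHTS, INTERCEPTS)
    ]
    return softmax(scores)


def predict(measurements):
    """Predict the species with the highest probability."""
    probs = predict_proba(measurements)
    return SPECIES[probs.index(max(probs))]


print(predict([5.0, 3.4, 1.5, 0.2]))  # setosa
print(predict([5.9, 2.8, 4.3, 1.3]))  # versicolor
print(predict([6.5, 3.0, 5.5, 2.0]))  # virginica
print(sum(len(row) for row in WEIGHTS) + len(INTERCEPTS))  # 15
```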
