In [20]:
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np

# **Training a Binary Classifier**

In [3]:
# Load data with only two classes
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]

In [4]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

In [5]:
# Create logistic regression object and fit it
lr = LogisticRegression()
model = lr.fit(features_standardized, target)

**Logistic Regression – Discussion**

Despite having **"regression"** in its name, **logistic regression** is a widely used **binary classifier**: the target variable $y$ takes only two values, 0 or 1.

In logistic regression, a linear model like:

$$
z = \beta_0 + \beta_1 x
$$

is passed through a **logistic (sigmoid) function**:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Thus, the probability that the $i$-th observation belongs to class 1 is:

$$
P(y_i = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
$$

Where:

* $P(y_i = 1 \mid X)$: probability that $y_i = 1$, given data $X$
* $\beta_0, \beta_1$: parameters learned during training
* $e$: Euler's number ($\approx 2.718$)
* $x$: input feature(s)
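
To make the formula concrete, here is a minimal sketch in plain Python. The values of `beta_0` and `beta_1` are made up for illustration and are not taken from the fitted model; `sigmoid` and `prob_class_1` are helper names introduced here.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration
beta_0, beta_1 = -1.0, 2.0

def prob_class_1(x):
    """P(y = 1 | x) for a single feature x under the hypothetical parameters."""
    return sigmoid(beta_0 + beta_1 * x)

print(prob_class_1(0.5))   # z = 0, so the probability is exactly 0.5
print(prob_class_1(2.0))   # z = 3, so the probability is close to 1
```

With these parameters, any $x$ above 0.5 gives $z > 0$ and therefore a probability above 0.5.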



In [6]:
# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
model.predict(new_observation)

array([1])


The **logistic function** constrains the output to the open interval $(0, 1)$, so it can be interpreted as a probability.

**Classification Rule**

* If $P(y_i = 1 \mid X) > 0.5$, predict **class 1**
* Otherwise, predict **class 0**

In [7]:
# View predicted probabilities
model.predict_proba(new_observation)

array([[0.17740549, 0.82259451]])
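
As a sanity check, the predicted probability can be rebuilt by hand from the learned parameters. The sketch below refits the same two-class model so that it runs on its own; `z`, `p_class_1`, and `predicted` are names introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same two-class setup as above
iris = datasets.load_iris()
features = StandardScaler().fit_transform(iris.data[:100, :])
target = iris.target[:100]
model = LogisticRegression().fit(features, target)

new_observation = np.array([[.5, .5, .5, .5]])

# Linear score z = intercept + coefficients . x, built from the learned parameters
z = model.intercept_ + new_observation @ model.coef_.T

# Pass the score through the sigmoid to get P(y = 1 | x)
p_class_1 = 1.0 / (1.0 + np.exp(-z.ravel()))

# Apply the 0.5 threshold to get the predicted class
predicted = (p_class_1 > 0.5).astype(int)

print(p_class_1, model.predict_proba(new_observation)[:, 1])
print(predicted, model.predict(new_observation))
```

Each printed line should show the hand-computed value followed by the matching scikit-learn value.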

# **Training a Multiclass Classifier**

In [8]:
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

In [9]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

**Logistic Regression for Multiclass Classification**

In its basic form, **logistic regression** is a **binary classifier**: it can only distinguish between two classes.

However, two clever extensions allow it to handle **multiclass classification** problems:

---

**One-vs-Rest Logistic Regression (OvR)**

* Trains **one binary classifier per class**.
* For each classifier, it tries to distinguish:
  **“Is this observation class $k$? Yes or No?”**

The model then selects the class with the **highest probability**.
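
One way to sketch this idea in scikit-learn is with `OneVsRestClassifier`, a general-purpose wrapper that trains one copy of a binary estimator per class. This illustrates the technique and is not the approach used in the cells of this notebook.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = StandardScaler().fit_transform(iris.data)
target = iris.target

# Wrap a binary logistic regression so that one copy is trained per class
ovr = OneVsRestClassifier(LogisticRegression(random_state=0))
ovr.fit(features, target)

# One fitted binary classifier per class
print(len(ovr.estimators_))

# Predict by picking the class whose classifier is most confident
print(ovr.predict([[.5, .5, .5, .5]]))
```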

---

**Multinomial Logistic Regression (MLR)**

Instead of multiple independent classifiers, **a single model** is trained using the **softmax function**, which generalizes the sigmoid for multiclass:

$$
P(y_i = k \mid X) = \frac{e^{\beta_k \cdot x_i}}{\sum_{j=1}^{K} e^{\beta_j \cdot x_i}}
$$

Where:

* $P(y_i = k \mid X)$: Probability that the $i$-th sample is of class $k$
* $K$: Total number of classes
* $\beta_k$: Parameters for class $k$
* $x_i$: Feature vector for the $i$-th observation
* $e$: Euler's number
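
The softmax formula can be sketched in a few lines of NumPy. The `scores` values stand in for the products $\beta_k \cdot x_i$ and are invented for illustration; the `softmax` helper is defined here, not imported from a library.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of per-class scores into probabilities that sum to 1."""
    # Subtracting the max avoids overflow in exp() and leaves the result unchanged
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical scores for K = 3 classes (illustrative values only)
scores = np.array([-1.0, 2.0, 0.5])

probabilities = softmax(scores)
print(probabilities)
print(probabilities.sum())      # the probabilities add up to 1
print(probabilities.argmax())   # index of the most probable class
```

The class with the largest score always receives the largest probability, so prediction amounts to taking the `argmax`.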


In [10]:
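# Create multinomial (softmax) logistic regression object
# Note: recent scikit-learn releases deprecate `multi_class`; multinomial is
# already the default there for targets with more than two classes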
logistic_regression = LogisticRegression(random_state=0, multi_class="multinomial")

# Train model
model = logistic_regression.fit(features_standardized, target)



In [11]:
new_observation = [[.5, .5, .5, .5]]

model.predict(new_observation)

array([1])

In [12]:
model.predict_proba(new_observation)

array([[0.0198333 , 0.74472208, 0.23544462]])

# **Reducing Variance Through Regularization**

Tune the regularization strength hyperparameter, `C`.

In [13]:
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

In [14]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

**Regularization Strength**

* $\alpha$: Controls how strong the penalty is
* In **scikit-learn**, we use:

$$
C = \frac{1}{\alpha}
$$

So:

* **Larger $C$** = **weaker** regularization (more flexible model)
* **Smaller $C$** = **stronger** regularization (more constrained model)
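
The effect of `C` can be seen by comparing coefficient sizes. In this sketch the two values `C=0.01` and `C=100` are arbitrary picks for illustration, and `max_iter=1000` is set only to give the solver room to converge.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = StandardScaler().fit_transform(iris.data)
target = iris.target

# Fit the same model with strong (small C) and weak (large C) regularization
strong = LogisticRegression(C=0.01, max_iter=1000).fit(features, target)
weak = LogisticRegression(C=100, max_iter=1000).fit(features, target)

# The L2 penalty shrinks coefficients toward zero, so compare their overall size
print(np.linalg.norm(strong.coef_))
print(np.linalg.norm(weak.coef_))
```

The strongly regularized model should report the smaller norm.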


In [15]:
# Create cross-validated logistic regression object;
# Cs=10 searches a grid of 10 candidate values of C
logistic_regression = LogisticRegressionCV(
    penalty='l2', Cs=10, random_state=0, n_jobs=-1)

# Train model
model = logistic_regression.fit(features_standardized, target)
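
After fitting, the searched grid and the selected value can be inspected through the `Cs_` and `C_` attributes. This sketch refits the model so that it runs on its own, and relies on the default L2 penalty; `max_iter=1000` is an assumption added to help convergence.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = StandardScaler().fit_transform(iris.data)
target = iris.target

model = LogisticRegressionCV(Cs=10, random_state=0, max_iter=1000)
model.fit(features, target)

# Candidate values of C that were searched (10 values on a log scale)
print(model.Cs_)

# Value(s) of C selected by cross-validation
print(model.C_)
```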

# **Training a Classifier on Very Large Data**

Use the stochastic average gradient (SAG) solver.

In [16]:
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

In [17]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

In [19]:
# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0, solver="sag")

# Train model
model = logistic_regression.fit(features_standardized, target)

**When to Use `solver='sag'`?**

* Your dataset has **tens or hundreds of thousands** of samples
* You want **efficient training**
* You're using **L2 regularization** (`penalty='l2'`); `sag` does not support an L1 penalty
* Your features are **on a similar scale**, since fast convergence is only guaranteed in that case (hence the standardization step above)
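
Iris is far too small to show the benefit, so here is a sketch on a larger synthetic dataset. The dataset sizes and `max_iter=200` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a large dataset (sizes are arbitrary)
features, target = make_classification(
    n_samples=20_000, n_features=20, random_state=0)

# SAG converges fastest when features share a similar scale
features_standardized = StandardScaler().fit_transform(features)

# L2-penalized logistic regression trained with the SAG solver
model = LogisticRegression(solver="sag", random_state=0, max_iter=200)
model.fit(features_standardized, target)

# Training accuracy, as a quick check that the fit is reasonable
print(model.score(features_standardized, target))
```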



# **Handling Imbalanced Classes**

In [24]:
iris = datasets.load_iris()
features = iris.data
target = iris.target

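# Make the classes highly imbalanced by removing the first 40 observations
features = features[40:,:]
target = target[40:]

# Create target vector: 0 for class 0, 1 for every other class
target = np.where((target == 0), 0, 1)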

In [25]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

In [26]:
# Create logistic regression object with balanced class weights
logistic_regression = LogisticRegression(random_state=0, class_weight="balanced")

# Train model
model = logistic_regression.fit(features_standardized, target)

When using `class_weight="balanced"`, scikit-learn computes the weight for class $j$ as:

$$
w_j = \frac{n}{k \cdot n_j}
$$

Where:

* $w_j$: Weight assigned to class $j$
* $n$: Total number of observations
* $n_j$: Number of observations in class $j$
* $k$: Total number of classes

So, **rare classes get higher weights**, making the model pay more attention to them during training.
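
The formula can be checked by hand. The sketch below rebuilds a target with the same class counts as the imbalanced data above (10 observations of class 0 and 100 of class 1) and compares the manual result with scikit-learn's `compute_class_weight` helper; `manual_weights` is a name introduced here.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Same class counts as the imbalanced target above: 10 of class 0, 100 of class 1
target = np.array([0] * 10 + [1] * 100)

n = len(target)                      # total number of observations
class_counts = np.bincount(target)   # n_j for each class
k = len(class_counts)                # number of classes

# w_j = n / (k * n_j): 5.5 for class 0 and 0.55 for class 1
manual_weights = n / (k * class_counts)
print(manual_weights)

# scikit-learn's own computation, for comparison
sklearn_weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=target)
print(sklearn_weights)
```

Class 0 is ten times rarer than class 1, so its weight is ten times larger.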
