# LogisticRegression

- Logistic Regression is a classification algorithm, despite the name "regression".

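- It predicts probabilities by passing a linear score, z = w·x + b, through the sigmoid (logistic) function σ(z) = 1 / (1 + e^(-z)). This function maps any real number into the range (0, 1).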

- It predicts the probability that a given input belongs to a class.

- Works well for binary classification (e.g., spam vs not-spam), but can also be extended to multiclass classification.
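The sketch below shows the "linear score + sigmoid" idea in code. It rebuilds `predict_proba` by hand from the fitted `coef_` and `intercept_`. The dataset is synthetic, and the helper name `sigmoid` was chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary dataset (sizes are arbitrary, for illustration only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)

def sigmoid(z):
    """Illustrative helper: logistic function, maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score z = X @ w + b, then squash it into a probability
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = sigmoid(z)

# For a binary problem, column 1 of predict_proba is P(y = 1 | x)
p_sklearn = clf.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```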
  

***



- When scikit-learn trains Logistic Regression, it needs to solve an optimisation problem (minimise a cost function).

  **The Cost Function (Loss Function)**
  
  - To measure how good/bad our predictions are, we use a cost function.
    
  - For Logistic Regression, this is usually the Log Loss (Cross-Entropy Loss):
     - If prediction matches reality --> cost is low.

     - If prediction is wrong with high confidence --> cost is very high.

  **Optimisation Problem**
  
  - We need to find the values of the weights w (the coefficients) and the bias b (the intercept) that minimise the cost function.
 
  - Since there is no closed-form formula for these values, we rely on numerical optimisation methods (solvers).
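Here is a small sketch of the log loss described above: the mean of -[y·log(p) + (1 - y)·log(1 - p)] over all samples. It computes the loss by hand and checks the result against `sklearn.metrics.log_loss`. The labels and probabilities are made up to show the two cases. A correct prediction costs little, and a confidently wrong one costs a lot.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 0])
# Made-up predicted P(y = 1) values; the last one is confidently wrong
p_pred = np.array([0.9, 0.1, 0.6, 0.95])

def binary_log_loss(y, p):
    """Illustrative helper: mean binary cross-entropy."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Per-sample cost: small when the prediction matches, large when confidently wrong
per_sample = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(per_sample.round(3))
print(binary_log_loss(y_true, p_pred), log_loss(y_true, p_pred))
```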

 



***

## Solvers

- Scikit-learn uses different algorithms (solvers) to minimise the cost function:

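1. Stochastic gradient methods (sag, saga) --> repeatedly adjust w and b by stepping in the opposite direction of the gradient. The gradient is estimated one sample at a time, using a running average of previously computed per-sample gradients.

2. Newton and quasi-Newton methods (newton-cg, lbfgs) --> use second-derivative (Hessian) information for faster convergence. newton-cg works with the exact Hessian, while lbfgs approximates it from recent gradients.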

3. Coordinate Descent (liblinear) --> optimises one weight at a time.

 - Each solver has pros/cons depending on dataset size, number of features, and type of regularisation.
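To make "move in the opposite direction of the gradient" concrete, here is plain batch gradient descent on the unregularised log loss, written with NumPy. It is a simplified illustration and not what scikit-learn runs: sag and saga use stochastic, variance-reduced updates. The function names, learning rate and iteration count are hypothetical choices made for this sketch.

```python
import numpy as np

def fit_logreg_gd(X, y, lr=0.1, n_iter=1000):
    """Hypothetical helper: batch gradient descent on the mean log loss.

    Not scikit-learn's solver; it only shows the basic update rule.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        error = p - y                            # d(loss)/dz for each sample
        w -= lr * (X.T @ error) / n_samples      # step opposite the gradient
        b -= lr * error.mean()
    return w, b

def mean_log_loss(X, y, w, b):
    """Hypothetical helper: mean log loss for given w and b."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic, noisy (not perfectly separable) data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

w0, b0 = np.zeros(2), 0.0
w, b = fit_logreg_gd(X, y)
print(mean_log_loss(X, y, w0, b0), mean_log_loss(X, y, w, b))
```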

**liblinear**

- Good for small datasets.

- Supports L1 and L2 regularisation.

- Only supports binary classification directly.

- For multiclass, you need One-vs-Rest (OvR): train one model per class vs all others.

**lbfgs**

- Handles L2 regularisation.

- The default solver. It is robust in practice and well suited to multiclass problems, though for very large datasets sag/saga are often faster.

- Optimises the multinomial loss (a proper multiclass logistic regression formulation).

**newton-cg**

- Similar to lbfgs, also supports L2 regularisation and multinomial loss.

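- Often slower than lbfgs, because each iteration approximately solves a Newton system using conjugate gradient, but it converges precisely.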

**sag (Stochastic Average Gradient)**

- Fast for large datasets with many samples and features.

- Supports L2 regularisation only.

**saga**

- Very flexible.

- Supports L1, L2, and Elastic-Net regularisation.

- Works for multiclass problems with multinomial loss.

- Scales well for large datasets.
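This sketch fits the same L2-regularised model with each solver listed above and compares training accuracy. It assumes the bundled breast-cancer dataset and standardised features, since sag and saga converge quickly only when features share a similar scale. The raised `max_iter` is an arbitrary choice to avoid convergence warnings. On a well-conditioned problem like this one, the solvers should reach very similar models. In practice the choice mostly affects speed and which penalties are allowed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

scores = {}
for solver in ["liblinear", "lbfgs", "newton-cg", "sag", "saga"]:
    # Standardise first: sag/saga in particular converge slowly on unscaled data
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver=solver, max_iter=5000),  # default L2 penalty
    )
    model.fit(X, y)
    scores[solver] = model.score(X, y)  # training accuracy, not a held-out score

for solver, acc in scores.items():
    print(f"{solver:>10}: {acc:.3f}")
```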

***

## Overfitting

Overfitting happens when a model learns too much detail from the training data, including noise and random fluctuations, instead of just the true underlying patterns.

As a result, the model performs very well on training data but badly on new, unseen data (poor generalisation).

- Think of it like memorising past exam papers word-for-word instead of understanding the concepts: you will fail if the exam questions change slightly.

**Signs of Overfitting**

1. Training accuracy very high, test accuracy much lower.

2. Coefficients are very large (model is too sensitive to features).

3. Model reacts strongly to small variations in data.
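The first two signs can be reproduced on purpose with a sketch like this one. It uses synthetic data with far more features than training samples, plus a very large `C` (almost no regularisation). Under those assumptions the training accuracy is typically perfect while the test accuracy is noticeably lower, and the coefficients grow large. The dataset sizes and the value of `C` are arbitrary illustration choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Many features, few samples (arbitrary illustrative sizes)
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Very large C means very weak regularisation
clf = LogisticRegression(C=1e4, max_iter=10_000).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train={train_acc:.2f}  test={test_acc:.2f}")
print("largest |coefficient|:", abs(clf.coef_).max())
```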


Overfitting is caused mainly by:

1. Too many features vs too little data

2. High model complexity

3. Noisy or unclean data

4. Training too long without control

5. Lack of regularisation

## Regularisation


- To prevent overfitting, scikit-learn adds a penalty term to the cost function.

- This pushes coefficients to stay small (or exactly zero with L1), which keeps the model simpler.
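Written out, the penalised binary objective takes roughly the form given in the scikit-learn user guide. Here C is the same `C` parameter of `LogisticRegression`, and ρ corresponds to `l1_ratio`. Because C multiplies the data term, a smaller C makes the penalty r(w) relatively more important. For most solvers the intercept b is left out of the penalty. This form is a simplified sketch; the user guide's exact notation (for example, how sample weights are included) differs between versions.

$$
\min_{w,\,b}\; C \sum_{i=1}^{n} \Big[-y_i \log \hat p_i - (1 - y_i)\log(1 - \hat p_i)\Big] + r(w),
\qquad \hat p_i = \sigma(w^\top x_i + b)
$$

$$
r(w) = \begin{cases}
\tfrac{1}{2}\lVert w\rVert_2^2 & \text{L2} \\
\lVert w\rVert_1 & \text{L1} \\
\tfrac{1-\rho}{2}\lVert w\rVert_2^2 + \rho\,\lVert w\rVert_1 & \text{Elastic-Net}
\end{cases}
$$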


**L1 regularisation (Lasso)**

- Encourages sparsity (some coefficients become exactly 0).

- Useful for feature selection.

**L2 regularisation (Ridge)**

- Shrinks coefficients but doesn’t force them to 0.

- Helps when features are correlated.
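The sparsity difference between L1 and L2 shows up when you count exact zeros in `coef_`. This sketch fits both penalties with liblinear on synthetic data where only 3 of 20 features are informative. The data and `C=0.05` are assumptions chosen so the effect is visible. L2 typically leaves every coefficient non-zero, while L1 sets many of the uninformative ones to exactly 0.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 3 informative (synthetic, illustrative)
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.05).fit(X, y)

# Count coefficients that are exactly zero
n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))
print("zero coefficients  L1:", n_zero_l1, " L2:", n_zero_l2)
```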

**Elastic Net**

- A mix of L1 and L2.

- Only available with the saga solver.
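A minimal Elastic-Net sketch with saga follows. `l1_ratio` sets the mix: 0 is pure L2 and 1 is pure L1. The values `l1_ratio=0.5` and `C=0.1` are arbitrary, and the features are standardised because saga converges much faster on similarly scaled inputs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration
X, y = make_classification(n_samples=300, n_features=20, n_informative=3, random_state=0)

enet = make_pipeline(
    StandardScaler(),  # helps saga converge
    LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.1, max_iter=5000
    ),
)
enet.fit(X, y)

coef = enet[-1].coef_  # the LogisticRegression step is the last pipeline step
print(coef.round(3))
```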

 By default, scikit-learn applies L2 regularisation to logistic regression.

C is the inverse of the regularisation strength, so it controls how strong the penalty is:

- Small C --> stronger regularisation (more shrinking).

- Large C --> weaker regularisation (behaves closer to no penalty).
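To see the effect of C, the following sweeps it over several orders of magnitude with the default L2 penalty and prints the size of the coefficient vector. The breast-cancer dataset and the C grid are illustrative assumptions. As C grows, the penalty weakens and the coefficient norm increases.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

Cs = [0.001, 0.01, 0.1, 1.0, 10.0]  # arbitrary grid
norms = []
for C in Cs:
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)  # default L2 penalty
    norms.append(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} ||w||={norms[-1]:.3f}  train acc={clf.score(X, y):.3f}")
```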

## Multiclass Handling

Logistic Regression is naturally binary.

For multiclass:

1. Multinomial loss (softmax) --> one model that directly predicts multiple classes.
(Supported by lbfgs, newton-cg, sag, saga)

2. One-vs-Rest (OvR) --> train one classifier per class vs all others.
(Used by liblinear)

Example:

Classify flowers into 3 types (Setosa, Versicolor, Virginica).

Multinomial: one model outputs probabilities for 3 classes.

OvR: train 3 separate binary classifiers.
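Both strategies can be sketched on the iris dataset. The multinomial model is a single estimator with one coefficient row per class. For explicit One-vs-Rest, this sketch wraps liblinear in `OneVsRestClassifier`, which trains one binary classifier per class. That wrapper is one way to request OvR. In recent scikit-learn versions the `multi_class` parameter is deprecated, as the signature below shows with `multi_class='deprecated'`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# Multinomial (softmax): one model, one coefficient row per class
multi = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-Rest: three separate binary classifiers
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)

print(multi.coef_.shape)                          # one row per class
print(len(ovr.estimators_))                       # one binary model per class
print(multi.predict_proba(X[:2]).sum(axis=1))     # each row sums to 1
```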

## class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='deprecated', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)


## What do these parameters mean?

- **penalty : {'l1', 'l2', 'elasticnet', None}, default='l2'**

  - None: no penalty is added.

  - 'l2': add an L2 penalty term (the default choice).

  - 'l1': add an L1 penalty term.

  - 'elasticnet': both L1 and L2 penalty terms are added.

    **NOTE**
    - Not every solver supports every penalty. For example, 'elasticnet' is only available with saga, and 'l1' needs liblinear or saga.
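Here is a quick way to see a solver/penalty mismatch. lbfgs supports only 'l2' or no penalty, so asking it for 'l1' should raise a `ValueError` when `fit` is called, while liblinear accepts the same penalty. The tiny dataset is made up for illustration, and the exact error message varies between versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

try:
    LogisticRegression(penalty="l1", solver="lbfgs").fit(X, y)
    raised = False
except ValueError as exc:  # lbfgs does not support an L1 penalty
    raised = True
    print("ValueError:", exc)

# The same penalty with a compatible solver fits without error
ok = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
```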

`*` means every parameter after it must be passed as a keyword argument, not as a positional argument.

- dual : bool, default=False

   - Whether to solve the dual (constrained) or primal (regularized) formulation.
   - Dual formulation is only implemented for l2 penalty with liblinear solver.
   - Prefer dual=False when n_samples > n_features.

- tol : float, default=1e-4
 
   - Tolerance for stopping criteria.


- C : float, default=1.0
   - Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization.

- fit_intercept : bool, default=True

  - Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
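A short sketch of `fit_intercept`. With the default `True`, a bias is learned and stored in `intercept_`. With `False`, `intercept_` stays at zero, so the decision boundary must pass through the origin. The toy data are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: all feature values positive, so a bias is genuinely useful
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

with_b = LogisticRegression().fit(X, y)
without_b = LogisticRegression(fit_intercept=False).fit(X, y)

print(with_b.intercept_)     # learned bias
print(without_b.intercept_)  # fixed at zero when fit_intercept=False
```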