# Supervised Learning → Logistic Regression (Classification)
This notebook is part of the **ML-Methods** project.

The first sections focus on data preparation
and are intentionally repeated in every notebook.

This allows each notebook to be read independently
and makes model comparisons clearer.

1. Project setup and common pipeline
2. Dataset loading
3. Train-test split
4. Feature scaling (why we do it)

----------------------------------

5. What is this model? (Intuition)
6. Model training
7. Model behavior and key parameters
8. Predictions
9. Model evaluation
10. When to use it and when not to
11. Model persistence
12. Mathematical formulation (deep dive)

-----------------------------------------------------

## How this notebook should be read

This notebook is designed to be read **top to bottom**.

Before every code cell, you will find a short explanation describing:
- what we are about to do
- why this step is necessary
- how it fits into the overall process

The goal is not just to run the code,
but to understand what is happening at each step
and be able to adapt it to your own data.

-----------------------------------------------------

## What is Logistic Regression?

Logistic Regression is a **classification model**,
even though its name contains the word “regression”.

Instead of predicting a continuous value,
Logistic Regression predicts the **probability**
that an input belongs to a given class.
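
As a minimal sketch of this idea (not this notebook's own code), the snippet below turns a linear score into a probability using the sigmoid function. The weights `w`, the bias `b`, and the input `x` are made-up values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, chosen only for illustration
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

score = np.dot(w, x) + b        # linear part, as in Linear Regression
probability = sigmoid(score)    # squashed into a probability for class 1

print(f"score = {score:.2f}, P(class 1) = {probability:.3f}")
```

A score of zero maps to a probability of exactly 0.5. Larger positive scores approach 1, and larger negative scores approach 0.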

-----------------------------------------------------

## Why we start with intuition

Logistic Regression looks similar to Linear Regression,
but it behaves very differently.

Understanding this difference early
helps avoid confusion when reading the code
and interpreting the results.

-----------------------------------------------------

## What you should expect from the results

With Logistic Regression, you should expect:
- probabilistic outputs
- linear decision boundaries
- strong performance when the classes are close to linearly separable

It is often used as:
- a baseline classification model
- a simple and interpretable solution
- a reference point for more complex classifiers
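
To make the link between probabilistic outputs and linear decision boundaries concrete, here is a small sketch. It assumes a sigmoid applied to a linear score and uses hypothetical parameters `w` and `b`. It checks that thresholding the probability at 0.5 gives the same labels as checking the sign of the linear score, which is why the boundary is a straight line (or a hyperplane in higher dimensions).

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, chosen only for illustration
w = np.array([1.0, -2.0])
b = 0.5

# Random 2D points, seeded so the sketch is reproducible
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))

scores = points @ w + b                   # linear score for each point
by_probability = sigmoid(scores) >= 0.5   # threshold on the probability
by_score_sign = scores >= 0               # threshold on the linear score

# Both rules assign the same class to every point
same_labels = np.array_equal(by_probability, by_score_sign)
print(same_labels)
```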



____________________________________________________

## 1. Project setup and common pipeline

In this section we set up the common pipeline
used across classification models in this project.

Although the pipeline is similar to the one used for regression models,
the evaluation and interpretation steps will differ.


In [1]:
# Common imports used across all classification models

import numpy as np
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report
)

import joblib


____________________________________________________

## 2. Dataset loading

In this section we load the dataset
used for the Logistic Regression classification task.

We use a binary classification dataset
to clearly illustrate how the model works.


In [2]:
# Load the dataset as pandas objects (as_frame=True)

data = load_breast_cancer(as_frame=True)

X = data.data      # DataFrame of input features
y = data.target    # Series of class labels (0 or 1)


### Inputs and target

- `X` contains the input features
- `y` contains the target labels

In this dataset:
- the task is binary classification
- the target takes two possible values: 0 (malignant) or 1 (benign)
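
A quick inspection along the following lines can confirm these points before moving on. The snippet reloads the data so that it runs on its own. Inside the notebook, the `data`, `X`, and `y` objects from the cell above can be reused instead.

```python
from sklearn.datasets import load_breast_cancer

# Reload the dataset so this sketch is self-contained
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target

# Number of samples and number of input features
print(X.shape)

# Class names, indexed by label value (position 0 is label 0)
print(list(data.target_names))

# Number of samples in each class
print(y.value_counts().sort_index())
```

Checking the class counts early is useful because the balance between classes affects how accuracy should be interpreted later.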


### Why this dataset is suitable

This dataset is well suited for Logistic Regression because:
- it is clean and well-structured
- the classes are reasonably separable
- it is commonly used as a reference for classification models

This makes it ideal for understanding
both the strengths and limitations of Logistic Regression.


____________________________________________________

## 3. Train-test split

In this section we split the dataset
into training and test sets.

This step is essential to evaluate
how well the classification model generalizes
to unseen data.
