### Why Hyperparameter Tuning Is Mandatory (Beyond Accuracy)

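* Accuracy on a single dataset is **not enough** to judge a model.

* A model can:
    - Perform very well on **training data**
    - Perform poorly on **unseen (test) data**

* Poor results on unseen data are usually caused by:
    - Overfitting (large gap between training and test performance)
    - Underfitting (poor performance on both training and test data)
    - Poor hyperparameter configuration, which can push the model toward either extreme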

* Hyperparameter Tuning (HPT) helps to:
    - Improve **generalization**
    - Control **bias–variance tradeoff**
    - Make performance **stable across different data splits**


## Bias–Variance Tradeoff (Critical Concept)

Bias and variance are the two reducible sources of a model's prediction error (the third, irreducible noise, cannot be tuned away).

### High Bias
- Model is too simple
- Misses patterns
- Underfitting
- Example: Linear Regression on highly non-linear data

### High Variance
- Model is too complex
- Learns noise
- Overfitting
- Example: Decision Tree with very high depth

### Goal
Find a balance where:
- Bias is low enough to capture patterns
- Variance is low enough to generalize
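The two failure modes can be made concrete with a small experiment. This is a minimal sketch, assuming scikit-learn is installed and using its built-in breast cancer dataset; the depths compared (1 and unlimited) and the `random_state` values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Compare a very simple tree (prone to high bias) with an unconstrained
# tree (prone to high variance). Depth values are illustrative assumptions.
results = {}
for name, depth in [("shallow", 1), ("deep", None)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[name] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for name, (train_acc, test_acc) in results.items():
    gap = train_acc - test_acc
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}, gap={gap:.3f}")
```

A large train/test gap points toward variance; low scores on both sets point toward bias.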


## Grid Search vs Random Search (When to Use What)

### Grid Search
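- Exhaustively tries all combinations in a predefined grid
- Guaranteed to find the best combination within the grid (values outside the grid are never tested)
- Computationally expensive: the number of combinations multiplies with every added hyperparameter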
- Suitable for:
  - Small datasets
  - Few hyperparameters

### Random Search
- Samples random combinations from the search space
- Cost is set by the number of trials you choose, not by the size of the space
- Often finds good results quicker
- Suitable for:
  - Large datasets
  - Many hyperparameters

### Key Insight
Random Search is usually more efficient than Grid Search when:
- The hyperparameter space is large
- Only a few of the hyperparameters strongly affect performance
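To see how the costs differ, the sketch below counts candidates for a four-hyperparameter decision tree grid. It assumes scikit-learn's `ParameterGrid` and `ParameterSampler` helpers; the random-search budget of 50 is an arbitrary illustrative value.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

search_space = {
    "criterion": ["gini", "entropy", "log_loss"],
    "max_depth": [2, 5, 10, 15, 20, 25, 30],
    "min_samples_split": [2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
    "min_samples_leaf": [1, 5, 10, 15, 20],
}

# Grid search must evaluate every combination: 3 * 7 * 11 * 5.
grid = ParameterGrid(search_space)

# Random search evaluates only as many candidates as the chosen budget.
n_random = 50  # illustrative budget (assumption)
random_candidates = list(
    ParameterSampler(search_space, n_iter=n_random, random_state=0)
)

print("grid candidates:", len(grid))
print("random candidates:", len(random_candidates))
```

With 5-fold cross-validation, each candidate costs five model fits, so the grid here needs 5775 fits against 250 for the random budget.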


### Bayesian Optimization (Sequential Search) – Why It's Powerful

Unlike Grid or Random Search:
- Learns from previous trials
- Builds a surrogate model of performance
- Chooses next hyperparameters intelligently

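Key Concepts:
- Surrogate function: a cheap probabilistic model of how hyperparameters map to the score
- Acquisition function: scores candidate hyperparameters to decide which to try next
- Expected Improvement (EI): a common acquisition function that favors candidates likely to beat the best score so far

Advantages:
- Usually needs fewer trials than Grid or Random Search to reach a good configuration
- Ideal for expensive models (deep learning, large datasets)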

Used in libraries like:
- Optuna
- Hyperopt
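As an illustration of how an acquisition function trades off a promising mean against uncertainty, here is a sketch of the textbook closed-form Expected Improvement for maximization, assuming the surrogate predicts a Gaussian with mean `mu` and standard deviation `sigma`. This is the Gaussian-surrogate form only; it is not what Optuna's TPE sampler computes internally, and the candidate values are made up for the example.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """EI for maximization when the surrogate predicts N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: improvement is just the (clipped) difference.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf


best = 0.94  # best score observed so far (hypothetical)
candidates = {
    "A": (0.945, 0.005),  # slightly better mean, low uncertainty
    "B": (0.930, 0.030),  # worse mean, high uncertainty
    "C": (0.930, 0.005),  # worse mean, low uncertainty
}
scores = {
    name: expected_improvement(mu, sigma, best)
    for name, (mu, sigma) in candidates.items()
}
for name, ei in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: EI={ei:.6f}")
```

Candidate B ranks first even though its predicted mean is below the current best, because its high uncertainty leaves room for a large improvement. This is the exploration side of the search.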


### Optuna Hyperparameter Optimization Workflow (Conceptual Understanding)

1. Define an objective function
   - Includes model
   - Includes hyperparameters
   - Returns evaluation metric

2. Create a study
   - Direction: minimize or maximize

3. Optimize the study
   - Number of trials
   - Sampler decides next parameters

4. Extract:
   - Best parameters
   - Best score

By default, Optuna uses the TPE (Tree-structured Parzen Estimator) sampler; a different sampler can be passed to `create_study`.


### Why Linear Regression is Still Important

Despite many advanced models, Linear Regression is important because:

- Highly interpretable
- Acts as a strong baseline
- Helps understand feature relationships
- Fast to train
- Works well when assumptions are satisfied

For regression problems, start with Linear Regression as a baseline before moving to more complex models.


### Linear Regression Geometry Intuition

For 1 feature:
- Equation represents a straight line
- y = mx + c

For 2 features:
- Equation represents a plane

For n features:
- Equation represents a hyperplane

Model learns the best hyperplane that minimizes error.
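In every case the prediction is the same computation: a weighted sum of the features plus an intercept. A minimal sketch with NumPy follows; the weights and inputs are made-up values chosen for easy mental arithmetic.

```python
import numpy as np


def predict(X, weights, intercept):
    """Evaluate the hyperplane y = X @ weights + intercept for each row of X."""
    return X @ weights + intercept


# 1 feature: the line y = 2x + 1
X_line = np.array([[0.0], [1.0], [2.0]])
line_pred = predict(X_line, np.array([2.0]), 1.0)

# 2 features: the plane y = 2*x1 - 1*x2 + 1
X_plane = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
plane_pred = predict(X_plane, np.array([2.0, -1.0]), 1.0)

print(line_pred)
print(plane_pred)
```

Adding features only lengthens the weight vector; the function itself does not change.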


### How Linear Regression Learns (OLS vs GD)

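#### Ordinary Least Squares (OLS)
- Analytical (closed-form) solution
- Minimizes the sum of squared errors
- Fast when the number of samples and features is moderate

#### Gradient Descent
- Iterative optimization
- Preferred when the dataset is too large for the closed-form solution
- The learning rate is critical: too large diverges, too small converges slowly

Both aim to minimize the sum of squared errors:
- Sum of (y_true - y_pred)^2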
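The two approaches can be compared on a tiny noise-free dataset, where both should recover the same coefficients. This is a sketch under stated assumptions: `np.linalg.lstsq` stands in for the analytical least-squares solution, and the learning rate and iteration count are hand-picked for this data rather than general defaults.

```python
import numpy as np

# Noise-free data generated from y = 3x + 2.
x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 2.0
X = np.column_stack([np.ones_like(x), x])  # columns: intercept, slope

# Analytical route: solve the least-squares problem directly.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative route: gradient descent on the mean squared error.
w_gd = np.zeros(2)
learning_rate = 0.1  # illustrative value (assumption)
for _ in range(2000):
    gradient = (2.0 / len(y)) * X.T @ (X @ w_gd - y)
    w_gd -= learning_rate * gradient

print("OLS:", w_ols)
print("GD: ", w_gd)
```

Both print an intercept near 2 and a slope near 3. A much larger learning rate would make the gradient descent loop diverge, which is why that setting is called critical.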


### Assumptions of Linear Regression


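1. Linearity: the target is a linear function of the features
2. Independence of errors
3. Homoscedasticity: errors have constant variance
4. No multicollinearity: features are not strongly correlated with each other
5. Normal distribution of errors (matters mainly for confidence intervals and hypothesis tests)

Violation of assumptions can lead to:
- Biased or unstable coefficient estimates
- Misleading interpretations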
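One of these assumptions, no multicollinearity, is easy to check numerically with the variance inflation factor (VIF), defined as 1 / (1 - R²) when one feature is regressed on all the others. The sketch below is a NumPy-only illustration on synthetic data; the helper name `variance_inflation_factors` and the commonly quoted "VIF above 10" rule of thumb are assumptions, not part of the original notes.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column, regressing it on the other columns."""
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.01 * rng.normal(size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                    # unrelated feature

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
for name, vif in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF={vif:.1f}")
```

The two nearly collinear features get very large VIFs, while the unrelated feature stays close to 1.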


In [2]:
!pip install optuna



In [3]:
import optuna

In [4]:
from sklearn.datasets import load_breast_cancer

In [5]:
dataset = load_breast_cancer()

In [6]:
import pandas as pd

In [7]:
data = pd.DataFrame(dataset.data, columns = dataset.feature_names)
data['target'] = dataset.target
data.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [8]:
# Split the data into features (X) and target (y)
X = data.drop('target', axis = 1)
y = data['target']

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42, stratify = y)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((455, 30), (114, 30), (455,), (114,))

In [11]:
from sklearn.tree import DecisionTreeClassifier

In [12]:
dt = DecisionTreeClassifier()
dt

In [13]:
dt.fit(X_train, y_train)

In [14]:
# Training accuracy: an unconstrained tree can memorize the training set
dt.score(X_train, y_train)

1.0

In [15]:
y_pred = dt.predict(X_test)

In [16]:
from sklearn.metrics import accuracy_score

In [17]:
# Test accuracy: compare with the training accuracy to see the overfitting gap
accuracy_score(y_test, y_pred)

0.9210526315789473

In [18]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

In [19]:
# Objective: returns the mean cross-validated accuracy for one sampled configuration
def objective(trial):
  # Search space ('log_loss' requires scikit-learn >= 1.1)
  criterion = trial.suggest_categorical('criterion', ['gini', 'entropy', 'log_loss'])
  max_depth = trial.suggest_int('max_depth', 2, 30)
  min_sam_split = trial.suggest_int('min_samples_split', 2, 50)
  min_sam_leaf = trial.suggest_int('min_samples_leaf', 1, 20)
  # Model
  dt = DecisionTreeClassifier(criterion = criterion, max_depth = max_depth, min_samples_split = min_sam_split, min_samples_leaf = min_sam_leaf)
  # Metric: a fixed random_state keeps the folds identical across trials, so scores are comparable
  skf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)
  score = cross_val_score(dt, X_train, y_train, scoring = 'accuracy', cv = skf).mean()
  return score

In [20]:
# Create a study - Random Sampler
# Note: the next cell reassigns `study`, so run only the cell for the sampler you want to use
study = optuna.create_study(study_name = 'DT_Study', direction = 'maximize', sampler = optuna.samplers.RandomSampler())
study

[I 2026-01-08 15:45:20,573] A new study created in memory with name: DT_Study


<optuna.study.study.Study at 0x26c60539430>

In [21]:
# Create a study - Grid Sampler (reassigns `study`, replacing the Random Sampler study created above)
search_space = {'criterion': ['gini', 'entropy', 'log_loss'],
                'max_depth': [2, 5, 10, 15, 20, 25, 30],
                'min_samples_split': [2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
                'min_samples_leaf': [1, 5, 10, 15, 20]}
study = optuna.create_study(study_name = 'DT_Study', direction = 'maximize', sampler = optuna.samplers.GridSampler(search_space))
study

[I 2026-01-08 15:45:23,466] A new study created in memory with name: DT_Study


<optuna.study.study.Study at 0x26c621a2450>

In [22]:
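# Optimize the study: with GridSampler and no n_trials, the run stops once all 3 x 7 x 11 x 5 = 1155 grid combinations have been tried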
study.optimize(objective)

[I 2026-01-08 15:45:25,183] Trial 0 finished with value: 0.9472527472527472 and parameters: {'criterion': 'log_loss', 'max_depth': 20, 'min_samples_split': 2, 'min_samples_leaf': 5}. Best is trial 0 with value: 0.9472527472527472.
[I 2026-01-08 15:45:25,227] Trial 1 finished with value: 0.9208791208791208 and parameters: {'criterion': 'log_loss', 'max_depth': 30, 'min_samples_split': 15, 'min_samples_leaf': 20}. Best is trial 0 with value: 0.9472527472527472.
[I 2026-01-08 15:45:25,274] Trial 2 finished with value: 0.9098901098901099 and parameters: {'criterion': 'log_loss', 'max_depth': 20, 'min_samples_split': 20, 'min_samples_leaf': 5}. Best is trial 0 with value: 0.9472527472527472.
[I 2026-01-08 15:45:25,315] Trial 3 finished with value: 0.9252747252747252 and parameters: {'criterion': 'entropy', 'max_depth': 10, 'min_samples_split': 35, 'min_samples_leaf': 20}. Best is trial 0 with value: 0.9472527472527472.
[I 2026-01-08 15:45:25,357] Trial 4 finished with value: 0.9252747252747

In [23]:
study.best_params

{'criterion': 'entropy',
 'max_depth': 30,
 'min_samples_split': 30,
 'min_samples_leaf': 1}

In [24]:
study.best_value

0.9472527472527474

In [25]:
dt = DecisionTreeClassifier(**study.best_params)
dt

In [26]:
dt.fit(X_train, y_train)

In [27]:
dt.score(X_train, y_train)

0.967032967032967

In [28]:
y_pred = dt.predict(X_test)

In [29]:
accuracy_score(y_test, y_pred)

0.9473684210526315

## Model Selection

In [30]:
from sklearn.neighbors import KNeighborsClassifier

In [31]:
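# Objective: choose the model family and its hyperparameters in a single search
def objective(trial):
  # Conditional search space: the sampled model decides which hyperparameters are suggested
  model = trial.suggest_categorical('model', ['knn', 'dt'])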
  if model == 'knn':
    n_neighbors = trial.suggest_int('n_neighbors', 3, 30)
    weights = trial.suggest_categorical('weights', ['uniform', 'distance'])
    clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights = weights)
  else:
    criterion = trial.suggest_categorical('criterion', ['gini', 'entropy', 'log_loss'])
    max_depth = trial.suggest_int('max_depth', 2, 30)
    min_sam_split = trial.suggest_int('min_samples_split', 2, 50)
    min_sam_leaf = trial.suggest_int('min_samples_leaf', 1, 20)
    clf = DecisionTreeClassifier(criterion= criterion, max_depth= max_depth, min_samples_split = min_sam_split, min_samples_leaf = min_sam_leaf)
  # Metric
  skf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)
  score = cross_val_score(clf, X_train, y_train, scoring = 'accuracy', cv = skf).mean()
  return score

In [32]:
# Create a study - Sequential Search with the default sampler (TPE)
study = optuna.create_study(study_name = 'DT_Study', direction = 'maximize')
study

[I 2026-01-08 15:47:15,565] A new study created in memory with name: DT_Study


<optuna.study.study.Study at 0x26c623a9e50>

In [33]:
# Optimize the study
study.optimize(objective, n_trials = 100)

[I 2026-01-08 15:47:16,086] Trial 0 finished with value: 0.9274725274725275 and parameters: {'model': 'dt', 'criterion': 'entropy', 'max_depth': 6, 'min_samples_split': 48, 'min_samples_leaf': 1}. Best is trial 0 with value: 0.9274725274725275.
[I 2026-01-08 15:47:16,469] Trial 1 finished with value: 0.9340659340659341 and parameters: {'model': 'knn', 'n_neighbors': 11, 'weights': 'uniform'}. Best is trial 1 with value: 0.9340659340659341.
[I 2026-01-08 15:47:16,546] Trial 2 finished with value: 0.9274725274725275 and parameters: {'model': 'dt', 'criterion': 'entropy', 'max_depth': 7, 'min_samples_split': 19, 'min_samples_leaf': 6}. Best is trial 1 with value: 0.9340659340659341.
[I 2026-01-08 15:47:16,584] Trial 3 finished with value: 0.9274725274725275 and parameters: {'model': 'dt', 'criterion': 'gini', 'max_depth': 8, 'min_samples_split': 18, 'min_samples_leaf': 18}. Best is trial 1 with value: 0.9340659340659341.
[I 2026-01-08 15:47:16,623] Trial 4 finished with value: 0.931868131

In [34]:
study.best_params

{'model': 'knn', 'n_neighbors': 5, 'weights': 'distance'}

In [38]:
# Drop the 'model' key so the remaining entries can be passed straight to KNeighborsClassifier
params = study.best_params
params.pop('model')
params

{'n_neighbors': 5, 'weights': 'distance'}

In [39]:
study.best_value

0.9428571428571428

In [40]:
knn = KNeighborsClassifier(**params)
knn

In [41]:
knn.fit(X_train, y_train)

In [42]:
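# With weights='distance', each training point is its own zero-distance neighbor, so training accuracy is 1.0
knn.score(X_train, y_train)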

1.0

In [43]:
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)

0.9122807017543859
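KNN relies on distances, so features on very different scales (such as `mean area` versus `mean smoothness`) can dominate the result. As a possible follow-up, not something the notebook above does, the sketch below compares cross-validated accuracy with and without standardization. It assumes scikit-learn's `make_pipeline` and `StandardScaler`, reuses the tuned `n_neighbors=5` and `weights='distance'`, and runs on the full dataset for brevity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Same KNN settings, with and without feature scaling.
raw_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
scaled_knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights="distance"),
)

raw_score = cross_val_score(raw_knn, X, y, scoring="accuracy", cv=skf).mean()
scaled_score = cross_val_score(scaled_knn, X, y, scoring="accuracy", cv=skf).mean()

print(f"KNN without scaling: {raw_score:.4f}")
print(f"KNN with scaling:    {scaled_score:.4f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking statistics from the validation fold.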