Hyperparameter tuning is the process of finding the best "settings" for your machine learning algorithm. These settings are fixed before training begins rather than learned from the data.

Think of **parameters** (weights and biases) as what the model learns *internally* from the data. Think of **hyperparameters** as the *external* configuration knobs you turn to control how the model learns.

For **Logistic Regression**, the most critical knobs control **regularization** (preventing overfitting).

#### 1. Key Hyperparameters in Logistic Regression


* **`C` (Inverse of Regularization Strength):**
    * This is the most important setting.
    * **Small `C` (e.g., 0.01):** Strong regularization. The model simplifies the decision boundary (good for noisy data, prevents overfitting).
    * **Large `C` (e.g., 100):** Weak regularization. The model fits the training data more closely, including its noise (risk of overfitting).


* **`penalty`:**
    * Decides *how* to penalize complex models.
    * **`'l2'` (Ridge):** Shrinks coefficients toward zero (default).
    * **`'l1'` (Lasso):** Can shrink coefficients *to* zero (performs feature selection).
    * **`'elasticnet'`:** A mix of both.


* **`solver`:**
    * The mathematical algorithm used to find the weights.
    * Examples: `liblinear` (good for small binary datasets), `saga` (good for large datasets and supports ElasticNet).



---

#### 2. What is Cross-Validation?

**CV stands for Cross-Validation.**
When tuning, you cannot test on your final "Test Set" (that's cheating/data leakage). Instead, the algorithm:

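1. Splits your **Training Data** into $k$ folds (e.g., $k = 5$ parts).
2. Trains on $k - 1$ folds (here, 4) and validates on the remaining fold.
3. Repeats this $k$ times so that every fold serves as the validation fold exactly once, then averages the $k$ scores.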

This ensures the hyperparameters you choose are robust and not just lucky for one specific split of data.
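A minimal sketch of the procedure above, assuming scikit-learn's `cross_val_score` and a synthetic dataset (`X_demo`, `y_demo` and `cv_demo` are illustrative names, not part of the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic training data (illustrative only)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# 5 folds: every sample lands in the validation fold exactly once
cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=cv_demo, scoring="accuracy")

print("Per-fold accuracy:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))
```

The mean of the five fold scores is the single number a tuning search compares across candidate settings.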

In [2]:
import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
# Create a dataset
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=10, random_state=55, n_classes=2)

In [4]:
pd.DataFrame(X).head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | -0.764892 | 0.614783 | -0.682799 | 1.349154 | -1.253422 | -2.843722 | -1.179356 | 0.871903 | -2.162518 | 0.189528 |
| 1 | -0.076614 | 0.7505 | 0.357315 | 1.60941 | 0.357707 | 1.475502 | 0.262632 | 2.063346 | 2.915461 | -1.173271 |
| 2 | -0.644742 | 0.531535 | -0.850795 | -1.772644 | 0.028788 | 1.232742 | 0.537949 | -1.671866 | 0.015055 | -0.807296 |
| 3 | 0.726646 | 0.406491 | -0.255955 | -2.181476 | 0.921972 | 0.806065 | -1.151218 | -2.206911 | -0.784092 | 0.110835 |
| 4 | -0.266286 | 0.537875 | -0.049964 | 2.086994 | -0.824239 | -0.739098 | 0.408712 | 2.118067 | 0.78632 | 0.405554 |


In [5]:
y[:50]

array([0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 1, 0, 0])

In [6]:
# train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=55)

In [7]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((700, 10), (300, 10), (700,), (300,))

In [8]:
# Train the model

---

#### **1. Class Overview**

```python
sklearn.linear_model.LogisticRegression(
    penalty='l2', dual=False, tol=0.0001, C=1.0,
    fit_intercept=True, intercept_scaling=1,
    class_weight=None, random_state=None,
    solver='lbfgs', max_iter=100,
    multi_class='auto', verbose=0,
    warm_start=False, n_jobs=None, l1_ratio=None)
```

---

#### **2. Critical Parameters for Tuning**

In practice, most tuning effort goes into these four parameters. They determine the model's ability to generalize and its computational efficiency.

##### **A. `penalty` (Regularization Type)**

* **Options:** `None`, `'l1'`, `'l2'`, `'elasticnet'` (Default: `'l2'`)
* **Professional Context:** Used to prevent overfitting by penalizing large coefficients.
* **`l2` (Ridge):** Shrinks coefficients but keeps them all. Good for general use.
* **`l1` (Lasso):** Shrinks some coefficients to exactly zero. Useful for **feature selection** if you have many irrelevant features.
* **`elasticnet`:** A combination of L1 and L2. Requires the `l1_ratio` parameter.
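A small sketch comparing the two penalties on synthetic data where only 3 of 10 features are informative (the dataset and `C=0.1` are illustrative assumptions). With `l1`, some coefficients typically become exactly zero; with `l2` they are shrunk but stay non-zero:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 10 features, only 3 informative; the other 7 are pure noise
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Same regularization strength, different penalty type (liblinear supports both)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_demo, y_demo)
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X_demo, y_demo)

n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("Exactly-zero coefficients with l1:", n_zero_l1)
print("Exactly-zero coefficients with l2:", n_zero_l2)
```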



##### **B. `C` (Inverse Regularization Strength)**

* **Type:** `float` (Default: `1.0`)
* **Tuning Range:** Typically logarithmic (e.g., `np.logspace(-4, 4, 20)`)
* **Professional Context:** This is the most important "knob."
* **Small `C`:** High regularization (simpler model, high bias, low variance).
* **Large `C`:** Low regularization (complex model, low bias, high variance).
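A sketch of how `C` controls coefficient size, assuming synthetic data and the default `'l2'` penalty: as `C` grows, regularization weakens and the overall coefficient norm grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Smaller C = stronger regularization = coefficients pulled toward zero
norms = {}
for c_value in np.logspace(-3, 3, 4):  # 0.001, 0.1, 10, 1000
    model = LogisticRegression(C=c_value, max_iter=1000).fit(X_demo, y_demo)
    norms[c_value] = np.linalg.norm(model.coef_)
    print(f"C={c_value:>8.3f}  ||coef|| = {norms[c_value]:.3f}")
```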



##### **C. `solver` (The Optimizer)**

* **Options:** `'lbfgs'`, `'liblinear'`, `'newton-cg'`, `'newton-cholesky'`, `'sag'`, `'saga'`
* **Selection Criteria:**
    * **`liblinear`:** Best for small datasets. Only supports OvR for multiclass.
    * **`lbfgs`:** The default. Robust for most medium-sized problems.
    * **`saga`:** The "Swiss Army Knife." Only solver that supports all penalties (L1, L2, ElasticNet) and is fast for very large datasets.
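Not every solver supports every penalty. This sketch (synthetic data, illustrative only) shows `lbfgs` rejecting an L1 penalty at fit time, while `saga` accepts it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# lbfgs only supports 'l2' or None, so an L1 penalty is rejected when fitting
try:
    LogisticRegression(penalty="l1", solver="lbfgs").fit(X_demo, y_demo)
    l1_lbfgs_ok = True
except ValueError as err:
    l1_lbfgs_ok = False
    print("lbfgs + l1 failed:", err)

# saga handles L1 (and elasticnet); raise max_iter because saga can converge slowly
saga_model = LogisticRegression(penalty="l1", solver="saga", max_iter=5000).fit(X_demo, y_demo)
print("saga + l1 fitted")
```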



##### **D. `class_weight` (Imbalance Handling)**

* **Options:** `None`, `'balanced'`, or a `dict` (Default: `None`)
* **Professional Context:** If your dataset has 95% "No" and 5% "Yes," set this to `'balanced'`. It automatically adjusts weights inversely proportional to class frequencies, ensuring the model doesn't ignore the minority class.
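The `'balanced'` weights can be inspected with scikit-learn's `compute_class_weight`; each class gets `n_samples / (n_classes * count_of_that_class)`. A sketch using a made-up 95/5 label vector:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced labels: 95 "No" (0) and 5 "Yes" (1)
y_imb = np.array([0] * 95 + [1] * 5)

# 'balanced' weight = n_samples / (n_classes * count of that class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_imb)
print(weights)  # 100/(2*95) ≈ 0.526 and 100/(2*5) = 10.0
```

The minority class ends up weighted about 19 times more heavily, which is the same ratio as the class counts.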

---

#### **3. Secondary & Computational Parameters**

These parameters rarely affect accuracy but significantly affect **stability** and **speed**.

| Parameter | Type | Professional Insight |
| --- | --- | --- |
| **`max_iter`** | `int` | Default is 100. For large datasets or complex features, the solver often fails to converge. **Common fix:** Increase to 1,000 or 10,000; scaling the features (e.g., with `StandardScaler`) also often speeds up convergence. |
| **`multi_class`** | `str` | `'ovr'` (One-vs-Rest) or `'multinomial'`. `'auto'` chooses based on the solver and data. Deprecated since scikit-learn 1.5; for One-vs-Rest, wrap the model in `OneVsRestClassifier` instead. |
| **`tol`** | `float` | Tolerance for the stopping criterion: the solver stops once its improvement between iterations falls below `tol`. |
| **`l1_ratio`** | `float` | Only used if `penalty='elasticnet'`. `0` is equivalent to L2, `1` is equivalent to L1. |
| **`n_jobs`** | `int` | Number of CPU cores used when parallelizing over classes in One-vs-Rest multiclass; set to `-1` to use all cores. Ignored when `solver='liblinear'`. |

---

#### **4. Attributes (Post-Training Inspection)**

A professional doesn't just look at the accuracy; they inspect the model's internals:

* `.coef_`: The weights assigned to features. Larger absolute values indicate more influential features, but only when the features are on comparable scales (e.g., after standardization).
* `.intercept_`: The bias term.
* `.n_iter_`: Useful to check if the model actually reached convergence or just stopped because it hit `max_iter`.
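A sketch of inspecting these attributes after fitting, on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(max_iter=100).fit(X_demo, y_demo)

print("coef_:", model.coef_)            # shape (1, n_features) for a binary problem
print("intercept_:", model.intercept_)  # shape (1,)
print("n_iter_:", model.n_iter_)        # iterations the solver actually ran

# If n_iter_ reached max_iter, the solver stopped early instead of converging
if model.n_iter_.max() >= model.max_iter:
    print("Hit max_iter: consider scaling the features or raising max_iter")
```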

---

*Two common methods for hyperparameter tuning (for Logistic Regression and most other scikit-learn estimators) are **1. Grid Search CV** and **2. Randomized Search CV**.*

##### 1. Grid Search CV: "The Perfectionist"
Grid Search is a brute-force method. You define a discrete set of values for each hyperparameter, and the algorithm evaluates every single possible combination.

* *How it works:* Imagine a grid. If you have 3 options for C and 2 options for penalty, Grid Search builds $3 \times 2 = 6$ candidate models; with 5-fold CV, that means $6 \times 5 = 30$ fits.

* *Pros:* Guaranteed to find the best combination within the grid you provided.

* *Cons:* Extremely computationally expensive. If you have many hyperparameters, the number of combinations grows exponentially.

* *Example Scenario:*

    * C = [0.1, 1, 10, 100] (4 values)

    * penalty = ['l1', 'l2'] (2 values)
    
    * Total candidates: $4 \times 2 = 8$ combinations ($8 \times 5 = 40$ fits with 5-fold CV).
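scikit-learn's `ParameterGrid` enumerates the same Cartesian product that `GridSearchCV` iterates over, so it is a cheap way to check the size of a grid before running it. A sketch using the scenario above (`n_folds` is an illustrative variable):

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({"C": [0.1, 1, 10, 100], "penalty": ["l1", "l2"]})
candidates = list(grid)

print(len(candidates))  # 4 x 2 = 8
for candidate in candidates[:3]:
    print(candidate)

# With 5-fold CV, GridSearchCV would run 8 x 5 = 40 fits (plus one final refit)
n_folds = 5
print("Total fits:", len(candidates) * n_folds)
```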

---

#### **1. Core Functional Parameters**

##### **`estimator` (Required)**

* **Type:** `object`
* **Description:** This is the model instance you want to tune (e.g., `LogisticRegression()`, `RandomForestClassifier()`).
* **Professional Tip:** You can also pass a `Pipeline` object here. This is highly recommended to ensure that preprocessing steps (like scaling) are inside the cross-validation loop to prevent **data leakage**.

##### **`param_grid` (Required)**

* **Type:** `dict` or `list of dictionaries`
* **Description:** The dictionary where keys are the hyperparameter names and values are the settings to try.
* **Example:** `{'C': [0.1, 1], 'penalty': ['l1', 'l2']}`.
* **Logic:** Grid Search performs a **Cartesian Product** of these values. The example above results in $2 \times 2 = 4$ unique combinations.

##### **`scoring`**

* **Type:** `str`, `callable`, or `list/dict` (Default: `None`, which uses the estimator's own `score` method, i.e. accuracy for classifiers)
* **Description:** The metric used to evaluate the performance of the cross-validated model.
* **Values:** Common strings include `'accuracy'`, `'f1'`, `'roc_auc'`, `'neg_mean_squared_error'`.
* **Professional Tip:** For imbalanced datasets, avoid the default (accuracy), which can look high even when the minority class is ignored. Prefer metrics such as `'f1'`, `'balanced_accuracy'`, `'roc_auc'`, or `'average_precision'`.

---

#### **2. Cross-Validation & Execution Parameters**

##### **`cv`**

* **Type:** `int` or `cross-validation generator` (Default: `5`)
* **Description:** Determines the cross-validation splitting strategy.
* **Logic:** An integer specifies the number of folds. For classifiers with binary or multiclass targets, scikit-learn uses `StratifiedKFold`; otherwise it uses `KFold`.
* **Professional Tip:** Use `cv=5` or `cv=10` as a standard. Higher values provide more robust estimates but increase training time linearly.
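A sketch of what stratification means in practice, using made-up 80/20 labels: each `StratifiedKFold` validation fold keeps roughly the same class proportions as the full data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 80% class 0, 20% class 1
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.zeros((100, 1))  # the features do not affect how the folds are built

skf = StratifiedKFold(n_splits=5)
shares = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X_toy, y_toy)):
    share = y_toy[val_idx].mean()  # fraction of class 1 in this validation fold
    shares.append(share)
    print(f"fold {fold}: val size={len(val_idx)}, class-1 share={share:.2f}")
```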

##### **`refit`**

* **Type:** `bool` or `str` (Default: `True`)
* **Description:** After finding the best hyperparameters using cross-validation, should the best model be retrained on **all the data passed to `.fit()`** (in this notebook, the full training set)?
* **Professional Tip:** Keep this `True` in most cases (with multiple scoring metrics, set it to the name of the metric to optimize). It allows the fitted search object to act as a model itself, so you can call `.predict()` immediately after tuning.

##### **`n_jobs`**

* **Type:** `int` (Default: `None`)
* **Description:** Number of CPU cores to run in parallel.
* **Values:** Set to `-1` to use all available processors. This drastically reduces the time spent waiting for the grid to finish.

---

#### **3. Verbosity & Robustness Parameters**

##### **`verbose`**

* **Type:** `int`
* **Description:** Controls the amount of messages printed during the search.
* **Values:** `0` (silent), `1` (summary of the number of candidates and fits), `2` (also the computation time for each fold and candidate), `3` (also the score).
* **Professional Tip:** Set `verbose=2` or `3` when running long grids so you can monitor progress and confirm the search is still running.

##### **`error_score`**

* **Type:** `'raise'` or `numeric` (Default: `np.nan`)
* **Description:** If one specific combination of hyperparameters fails (e.g., you try a solver that doesn't support L1 penalty), what should the model do?
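* **Professional Tip:** The default `np.nan` (or a number such as `0`) lets the search complete even if a few combinations error out, which is common when mixing solvers and penalties; scikit-learn emits a `FitFailedWarning` for the failed fits. Use `'raise'` to stop immediately and debug.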

---

#### **4. Summary Table of Attributes (The Results)**

Once you call `.fit()`, the object generates these attributes:

| Attribute | Description |
| --- | --- |
| **`best_params_`** | Dictionary of the specific settings that gave the best score. |
| **`best_score_`** | The mean cross-validated score of the best estimator. |
| **`best_estimator_`** | The actual model instance, refitted on all the data passed to `.fit()` (available only when `refit` is enabled). |
| **`cv_results_`** | A massive dictionary (convertible to a Pandas DataFrame) showing the performance of **every** fold for **every** combination. |
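A sketch of turning `cv_results_` into a DataFrame for inspection, using a small illustrative grid on synthetic data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]},
    cv=5,
)
search.fit(X_demo, y_demo)

# One row per candidate; split0_test_score ... split4_test_score hold the per-fold scores
results = pd.DataFrame(search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score").head())
print("Best:", search.best_params_, round(search.best_score_, 3))
```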

---

Let's perform hyperparameter tuning over `C`, `penalty`, and `solver` (plus `l1_ratio` for ElasticNet).

In [9]:
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression

# 1. Initialize the model
logistic = LogisticRegression(max_iter=5000)  # high max_iter so slower solvers (e.g. saga) can converge

# 2. Parameter grid
# A list of dictionaries pairs each penalty only with the solvers that support it
params = [
    {
        'penalty': ['l2'],
        'C': [100, 10, 1.0, 0.1, 0.01],
        'solver': ['newton-cg', 'lbfgs', 'sag']
    },
    {
        'penalty': ['l1', 'l2'],
        'C': [100, 10, 1.0, 0.1, 0.01],
        'solver': ['liblinear', 'saga']
    },
    {
        'penalty': ['elasticnet'],
        'C': [100, 10, 1.0, 0.1, 0.01],
        'solver': ['saga'],
        'l1_ratio': [0.5]  # required when penalty='elasticnet'
    }
]

# 3. Set up 5-fold stratified CV (keeps class proportions in every fold)
cv = StratifiedKFold(n_splits=5)

# 4. Initialize GridSearchCV
gridcv = GridSearchCV(
    estimator=logistic, 
    param_grid=params, 
    scoring='accuracy', 
    cv=cv, 
    n_jobs=-1
)

# 5. Fit the model
gridcv.fit(X_train, y_train)

# View the results
print("Best Parameters:", gridcv.best_params_)
print("Best Accuracy:", gridcv.best_score_)
print("Best Estimator: ",gridcv.best_estimator_)

Best Parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
Best Accuracy: 0.8842857142857143
Best Estimator:  LogisticRegression(C=0.1, max_iter=5000, solver='newton-cg')


In [10]:
# 6. Predict on the held-out test set (uses best_estimator_ because refit=True)
y_pred = gridcv.predict(X_test)

In [11]:
# Performance metrics
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [12]:
print("Model Accuracy: ")
score = accuracy_score(y_test, y_pred)
print(score)

Model Accuracy: 
0.8933333333333333


In [13]:
print("Confusion Matrix: ")
cm = confusion_matrix(y_test, y_pred)
print(cm)

Confusion Matrix: 
[[134  15]
 [ 17 134]]


In [14]:
print('Values of Precision, Recall, and F1-score: ')
print(classification_report(y_test, y_pred))

Values of Precision, Recall, and F1-score: 
              precision    recall  f1-score   support

           0       0.89      0.90      0.89       149
           1       0.90      0.89      0.89       151

    accuracy                           0.89       300
   macro avg       0.89      0.89      0.89       300
weighted avg       0.89      0.89      0.89       300



---

##### 2. Randomized Search CV: "The Pragmatist"

Randomized Search does not try every combination. Instead, you define a range (distribution) of values, and the algorithm randomly samples a fixed number of combinations (`n_iter`) from that space.

* *How it works:* It picks random points in the hyperparameter space to test.

* *Pros:* Much faster. It is statistically likely to find a "very good" model (close to the optimal) in a fraction of the time. It allows you to search continuous ranges for C rather than fixed steps.

* *Cons:* It might miss the absolute peak performance that Grid Search would hit if that peak lies between your random samples.

* *Example Scenario:*
    
    * C = Any number between 0.1 and 100.

    * penalty = ['l1', 'l2']

    * Total Iterations: You set n_iter=10. It will try 10 random pairs.
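`ParameterSampler` is the sampler that `RandomizedSearchCV` uses internally, so it can preview what a search would try. A sketch of the scenario above, assuming `C` is drawn uniformly from [0.1, 100] (`space` and `draws` are illustrative names):

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# C drawn uniformly from [0.1, 100]; uniform(loc, scale) covers [loc, loc + scale]
space = {"C": uniform(loc=0.1, scale=99.9), "penalty": ["l1", "l2"]}

draws = list(ParameterSampler(space, n_iter=10, random_state=0))
for d in draws:
    print(f"C={d['C']:.3f}, penalty={d['penalty']}")
```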

---

#### **1. Core Functional Parameters**

##### **`estimator` (Required)**

* **Type:** `object`
* **Description:** The model you want to tune (e.g., `LogisticRegression()`).
* **Professional Tip:** Wrap this in a `Pipeline` (e.g., `Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])`) to ensure that scaling and other preprocessing steps are performed correctly within each cross-validation fold.

##### **`param_distributions` (Required)**

* **Type:** `dict` or `list of dicts`
* **Description:** Dictionary where keys are parameter names and values are either **discrete lists** or **continuous distributions**.
* **Professional Tip:** For parameters like `C` in Logistic Regression, use a continuous distribution rather than a fixed list. Because `C` spans several orders of magnitude, a log-scaled distribution such as `scipy.stats.loguniform` is usually a better fit than `scipy.stats.uniform`. This lets the search explore values you might not have included in a discrete grid.
* *Example:* `{'C': scipy.stats.expon(scale=100), 'penalty': ['l1', 'l2']}`.
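Why a log scale for `C`: a sketch comparing 10,000 draws from `scipy.stats.uniform` and `scipy.stats.loguniform` over roughly [0.01, 100]. The range and sample size are illustrative.

```python
import numpy as np
from scipy.stats import loguniform, uniform

rng = np.random.default_rng(0)
n = 10_000

# Both distributions cover roughly [0.01, 100]
lin_draws = uniform(loc=0.01, scale=100 - 0.01).rvs(size=n, random_state=rng)
log_draws = loguniform(0.01, 100).rvs(size=n, random_state=rng)

# uniform almost never explores C < 1; loguniform spends about half its draws there
print("share of C < 1 (uniform):   ", np.mean(lin_draws < 1).round(3))
print("share of C < 1 (loguniform):", np.mean(log_draws < 1).round(3))
```

In a search, the distribution object itself goes into `param_distributions`, e.g. `{'C': loguniform(1e-2, 1e2)}`.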



##### **`n_iter` (Critical)**

* **Type:** `int` (Default: `10`)
* **Description:** The number of parameter settings that are sampled.
* **Professional Tip:** This is your "budget" parameter. It balances runtime vs. solution quality. Increasing `n_iter` will likely lead to a better model but will take more time. A common rule of thumb: assuming independent draws, 60 iterations give about a 95% probability that at least one sampled setting lies within the top 5% of the search space.
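The 60-iteration rule of thumb follows from simple probability, assuming each draw is independent and has a 5% chance of landing in the top 5% of the search space (`p_hit_top` is an illustrative helper):

```python
import math


def p_hit_top(n, top_fraction=0.05):
    """Probability that at least one of n independent draws lands in the top fraction."""
    return 1 - (1 - top_fraction) ** n


print(round(p_hit_top(60), 3))  # 0.954

# Smallest n that gives at least a 95% chance
n_min = math.ceil(math.log(0.05) / math.log(0.95))
print(n_min)  # 59
```

Note that "top 5%" refers to the volume of the search space, not to scores within 5% of the true optimum.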

##### **`scoring`**

* **Type:** `str` or `callable`
* **Description:** Strategy to evaluate the performance of the cross-validated model.
* **Professional Tip:** Always choose a metric aligned with your business goal (e.g., `'recall'` for fraud detection, `'f1'` for balanced precision/recall, or `'roc_auc'` for general classification ranking).

---

#### **2. Cross-Validation & Execution Parameters**

##### **`cv`**

* **Type:** `int` or `CV generator` (Default: `5`)
* **Description:** Determines the cross-validation splitting strategy.
* **Logic:** Usually uses `StratifiedKFold` for classification.

##### **`random_state` (Essential for Reproducibility)**

* **Type:** `int` or `RandomState instance`
* **Description:** Seed for the random number generator.
* **Professional Tip:** **Never leave this as None.** Because this algorithm relies on random sampling, your results will change every time you run the code unless you set a fixed `random_state`. This is vital for debugging and documenting model experiments.
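A sketch of the reproducibility point using `ParameterSampler` (the sampler behind `RandomizedSearchCV`): the same integer seed yields the same candidates. The search space is illustrative.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

space = {"C": loguniform(1e-3, 1e3), "penalty": ["l1", "l2"]}

run_a = list(ParameterSampler(space, n_iter=5, random_state=42))
run_b = list(ParameterSampler(space, n_iter=5, random_state=42))
run_c = list(ParameterSampler(space, n_iter=5, random_state=7))

print(run_a == run_b)  # True: same seed, same candidates
print(run_a == run_c)  # different seed, so (almost certainly) different candidates
```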

##### **`n_jobs`**

* **Type:** `int`
* **Description:** Number of jobs to run in parallel.
* **Professional Tip:** Set to `-1` to utilize all CPU cores. Since Randomized Search involves many independent model fits, it is "embarrassingly parallel" and benefits significantly from multi-core processing.

---

#### **3. Attributes (Post-Execution Inspection)**

| Attribute | Description |
| --- | --- |
| **`best_params_`** | The combination of parameters that achieved the highest score. |
| **`best_score_`** | Mean cross-validated score of the best estimator. |
| **`cv_results_`** | A dictionary containing all information about the search, including time taken for each fit and scores for every fold. |

---

#### **Why use Randomized over Grid Search?**

1. **Efficiency:** It doesn't waste its budget on unimportant parameters. A grid tests the important parameter at only its few grid values, repeated for every value of the unimportant one; random sampling tries a new value of every parameter on each iteration.
2. **Granularity:** It can find the "peak" of performance that might fall *between* the fixed steps of a grid.
3. **Scalability:** When you have 5+ hyperparameters, the number of Grid Search combinations explodes (10 values for each of 5 hyperparameters already gives $10^5 = 100{,}000$ combinations, a symptom of the "Curse of Dimensionality"). Randomized Search stays fixed at your `n_iter` budget.
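A sketch of the efficiency argument with an equal budget of 9 candidates. The parameter name `unimportant` is hypothetical and stands for any knob that does not affect the score; what matters is how many distinct `C` values each strategy tests.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

budget = 9  # same number of candidates for both strategies

# Grid: 3 values of C (important) x 3 values of a hypothetical unimportant knob
grid = list(ParameterGrid({"C": [0.01, 1, 100], "unimportant": [1, 2, 3]}))

# Random: 9 draws, with C sampled on a log scale over the same range
rand = list(
    ParameterSampler(
        {"C": loguniform(0.01, 100), "unimportant": [1, 2, 3]},
        n_iter=budget,
        random_state=0,
    )
)

print("candidates:", len(grid), len(rand))  # 9 9
print("distinct C values, grid:  ", len({p["C"] for p in grid}))  # 3
print("distinct C values, random:", len({p["C"] for p in rand}))  # 9, since C is continuous
```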

In [17]:
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression

logistic = LogisticRegression()

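cv = StratifiedKFold(n_splits=5)
# Reuses the `params` list of dicts from the Grid Search cell. Because every value is a list,
# n_iter=10 (the default) candidates are drawn without replacement from that same grid.
# No random_state is set, so the sampled candidates can differ between runs.
randomcv = RandomizedSearchCV(estimator=logistic, param_distributions=params, scoring='accuracy', cv=cv, n_jobs=-1)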

randomcv.fit(X_train, y_train)

# View the results
print("Best Parameters:", randomcv.best_params_)
print("Best Accuracy:", randomcv.best_score_)
print("Best Estimator: ",randomcv.best_estimator_)

Best Parameters: {'solver': 'liblinear', 'penalty': 'l2', 'C': 0.1}
Best Accuracy: 0.8842857142857143
Best Estimator:  LogisticRegression(C=0.1, solver='liblinear')


In [18]:
# Predict the values and then find accuracy 
y_pred = randomcv.predict(X_test)

In [20]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

print('Confusion Matrix: \n', confusion_matrix(y_test, y_pred))
print('Accuracy Score: ', accuracy_score(y_test, y_pred))
print('Values of Precision, Recall, and F1-score: \n', classification_report(y_test, y_pred))

Confusion Matrix: 
 [[134  15]
 [ 17 134]]
Accuracy Score:  0.8933333333333333
Values of Precision, Recall, and F1-score: 
               precision    recall  f1-score   support

           0       0.89      0.90      0.89       149
           1       0.90      0.89      0.89       151

    accuracy                           0.89       300
   macro avg       0.89      0.89      0.89       300
weighted avg       0.89      0.89      0.89       300

