### Hyperparameter Tuning in Machine Learning:
- Hyperparameter tuning is the process of selecting the optimal values for a machine learning model's hyperparameters.
- Hyperparameters are set before training begins and control aspects of the learning process itself, for example the C parameter of a logistic regression or the max_depth of a decision tree.
- Effective tuning helps the model learn better patterns, avoid overfitting or underfitting, and achieve higher accuracy on unseen data.

### Techniques for Hyperparameter Tuning:
#### 1. Grid Search CV:
- Grid search is a brute-force technique for hyperparameter tuning.
- It trains the model on every combination of the specified hyperparameter values to find the best-performing setup.
- It is slow and computationally expensive because the number of combinations is the product of the number of values per hyperparameter, which makes it hard to use with large datasets or many hyperparameters.
- It works in the following steps:
  1. Create a grid of candidate values for each hyperparameter.
  2. Train the model for every combination in the grid.
  3. Evaluate each model using cross-validation.
  4. Select the combination that gives the highest score.
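
To make the cost of the steps above concrete, here is a minimal sketch that counts the combinations in a grid using scikit-learn's `ParameterGrid`. The grid values are hypothetical and chosen only for illustration; nothing is trained here.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for illustration: 3 values of C x 2 penalties x 2 solvers
param_grid = {
    "C": [0.01, 1.0, 100.0],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
}

grid = ParameterGrid(param_grid)

# The grid size is the product of the number of values per hyperparameter
n_combinations = len(grid)  # 3 * 2 * 2 = 12

# With k-fold cross-validation, every combination is trained k times
cv_folds = 5
n_fits = n_combinations * cv_folds  # 12 * 5 = 60

print(n_combinations, n_fits)  # prints "12 60"

# Each element of the grid is one dictionary of hyperparameter values
for params in list(grid)[:3]:
    print(params)
```

Adding one more hyperparameter with four values would multiply the number of fits by four, which is why grid search scales poorly. By default GridSearchCV also refits the best combination once on the full dataset at the end.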

### Tuning Logistic Regression with GridSearchCV
- We generate a sample dataset using make_classification.
- We define a range of C values on a logarithmic scale.
- GridSearchCV tries every value in param_grid and scores each one with 5-fold cross-validation.
- It returns the best hyperparameter value (C) and its corresponding mean cross-validation score.

In [4]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

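# Synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)

# 15 candidate values for C (inverse regularization strength),
# log-spaced from 1e-5 to 1e8
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

logreg = LogisticRegression()

# Exhaustive search over param_grid, scored with 5-fold cross-validation
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)
logreg_cv.fit(X, y)

print('Tuned Logistic Regression Parameters: {}'.format(logreg_cv.best_params_))
print('Best score is {}'.format(logreg_cv.best_score_))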

Tuned Logistic Regression Parameters: {'C': np.float64(0.006105402296585327)}
Best score is 0.853
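
The best score alone hides how the other candidates performed. As a sketch, the fitted search object also exposes a `cv_results_` dictionary with the mean and standard deviation of the validation score for every value tried. The variable name `search` is introduced here only for illustration; the data and grid repeat the setup above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Same synthetic data and grid as in the example above
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)
search = GridSearchCV(LogisticRegression(), {"C": np.logspace(-5, 8, 15)}, cv=5)
search.fit(X, y)

# cv_results_ holds one entry per candidate, in the order they were tried
results = search.cv_results_
for C, mean, std in zip(results["param_C"],
                        results["mean_test_score"],
                        results["std_test_score"]):
    print(f"C={float(C):.3g}  mean accuracy={mean:.3f}  std={std:.3f}")
```

Looking at the full table shows whether the best value sits on a broad plateau or at the edge of the grid, in which case the range of C may need to be extended.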


#### 2. Random Search CV:
- Instead of checking every combination like GridSearchCV, random search samples random combinations of hyperparameters from the given ranges or distributions. It works in the following steps:
  1. In each iteration it tries a new random combination of hyperparameter values.
  2. It records the model's performance for each combination.
  3. After a fixed number of iterations it selects the best-performing set.
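
The sampling step can be sketched on its own with scikit-learn's `ParameterSampler`, which draws random combinations from lists and distributions. The search space and the values of `n_iter` and `random_state` below are assumptions chosen for illustration; no model is trained.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Hypothetical search space: a list is sampled uniformly,
# randint(1, 9) draws integers from 1 to 8 (upper bound is exclusive)
param_dist = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": randint(1, 9),
}

# Draw 5 random combinations; random_state makes the draws repeatable
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))

for params in samples:
    print(params)
```

The number of draws is fixed by `n_iter`, so the cost of random search is set by the budget rather than by the size of the search space.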

### Tuning Decision Tree with RandomizedSearchCV
- We define a range of values for each hyperparameter, e.g. max_depth and min_samples_leaf.
- Random combinations are picked and evaluated using 5-fold cross-validation.
- The best combination and score are printed.

In [8]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier
from scipy.stats import randint

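# Synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)

# Lists are sampled uniformly; randint(1, 9) draws integers from 1 to 8
param_dist = {
    'max_depth': [3, None],
    'max_features': randint(1, 9),
    'min_samples_leaf': randint(1, 9),
    'criterion': ['gini', 'entropy']
}

tree = DecisionTreeClassifier()
# Evaluates 10 random combinations (the default n_iter) with 5-fold cross-validation.
# No random_state is set, so the selected parameters and score vary between runs.
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)

print('Tuned Decision Tree Parameter:{}'.format(tree_cv.best_params_))
print('Best score is {}'.format(tree_cv.best_score_))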

Tuned Decision Tree Parameter:{'criterion': 'gini', 'max_depth': None, 'max_features': 8, 'min_samples_leaf': 8}
Best score is 0.805


#### 3. Bayesian Optimization:
- Grid search and random search can be inefficient because they choose each combination without using the results of earlier trials, so they keep evaluating regions of the search space that have already performed poorly.
- Bayesian optimization treats hyperparameter tuning as a mathematical optimization problem and learns from past results to decide what to try next.
  1. Build a probabilistic model (surrogate function) that predicts performance from the hyperparameters.
  2. Update this model after each evaluation.
  3. Use the model to choose the most promising set to try next.
  4. Repeat until the evaluation budget is used up or the score stops improving.
- The surrogate function models P(y | x), the probability of a score y given the hyperparameters x.
  - By updating this model iteratively with each new evaluation, Bayesian optimization makes more informed decisions.
  - Common surrogate models used in Bayesian optimization include:
    1. Gaussian Processes
    2. Random Forest Regression
    3. Tree-structured Parzen Estimators (TPE)
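
### Tuning Logistic Regression with a Bayesian Optimization Loop
The loop described above can be sketched with scikit-learn and SciPy alone. This is a minimal illustration, not a production implementation, and it rests on several assumptions that do not come from the text: a Gaussian Process is used as the surrogate, expected improvement is used as the rule for choosing the next point, the search is over log10(C) in the range -5 to 5, and the budget is 3 random starts plus 7 guided evaluations. The helper names `objective` and `expected_improvement` are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)


def objective(log_c):
    """Score to maximize: mean 5-fold CV accuracy for C = 10**log_c."""
    model = LogisticRegression(C=10.0 ** log_c, max_iter=1000)
    return cross_val_score(model, X, y, cv=5).mean()


def expected_improvement(candidates, gp, best_score, xi=0.01):
    """Expected improvement over best_score at each candidate (maximization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    improvement = mu - best_score - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
low, high = -5.0, 5.0  # assumed search range for log10(C)

# Start from a few random evaluations so the surrogate has data to fit
tried = [float(v) for v in rng.uniform(low, high, size=3)]
scores = [objective(v) for v in tried]

# Candidate points at which the surrogate is queried
candidates = np.linspace(low, high, 200).reshape(-1, 1)

for _ in range(7):
    # Steps 1 and 2: (re)fit the surrogate on all evaluations so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                  normalize_y=True, random_state=0)
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))

    # Step 3: evaluate the candidate with the highest expected improvement
    ei = expected_improvement(candidates, gp, max(scores))
    next_x = float(candidates[np.argmax(ei), 0])
    tried.append(next_x)
    scores.append(objective(next_x))

best = int(np.argmax(scores))
print(f"Best log10(C): {tried[best]:.2f}, CV accuracy: {scores[best]:.3f}")
```

Each new point is chosen where the surrogate predicts either a high score or high uncertainty, which is how the method avoids spending evaluations on regions that earlier trials already showed to be poor.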