# ***Hyperparameter Tuning***

**Hyperparameter tuning** is the process of finding the best set of hyperparameters for a machine learning model to improve its performance on a specific task. Hyperparameters are set before training and are not learned during the model's training process.

**Key Points:**

Hyperparameters are external settings that control the learning process (e.g., the learning rate, or the number of neighbors in KNN).

**Objective:** Improve model performance and generalization by selecting the best hyperparameters.

**Goal:** Reduce overfitting or underfitting so that the model performs well on unseen data.
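
The distinction between hyperparameters and learned parameters can be illustrated with a minimal sketch. The choice of `LogisticRegression` and the values `C=1.0` and `max_iter=1000` are assumptions made for illustration only; they do not appear elsewhere in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training (illustrative values)
model = LogisticRegression(C=1.0, max_iter=1000)
hyperparams = model.get_params()
print("Hyperparameter C:", hyperparams["C"])

# Learned parameters: estimated from the data during fit()
model.fit(X, y)
learned_shape = model.coef_.shape  # one row of weights per class
print("Learned coefficient matrix shape:", learned_shape)
```

Tuning changes values such as `C`; it never sets `coef_` directly, which is always the result of training.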

**Methods for Hyperparameter Tuning:**

**Grid Search:**

**Definition:** Evaluate every combination of values in a predefined grid of hyperparameters.

Pros: Exhaustive; guaranteed to find the best-scoring combination within the grid.

Cons: Computationally expensive, because the number of combinations multiplies with every hyperparameter added.
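
To make the cost concrete, the sketch below enumerates a grid with scikit-learn's `ParameterGrid` and counts the candidates. The grid values mirror the ones used later in this notebook; the variable names `grid`, `n_candidates`, and `first` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# ParameterGrid expands the dict into every combination of values
grid = ParameterGrid(param_grid)
n_candidates = len(grid)  # 4 * 4 * 3 * 3 combinations
print("Number of candidates:", n_candidates)

# Each candidate is a plain dict that could be passed to an estimator
first = next(iter(grid))
print("One candidate:", first)
```

With 5-fold cross-validation, each candidate is fitted five times, so adding even one more hyperparameter with a few values multiplies the total training cost.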

**Random Search:**

**Definition:** Randomly sample from a set of hyperparameter values.

Pros: More efficient than grid search for a fixed budget of evaluations, especially when the hyperparameter space is large.

Cons: No guarantee of finding the best set.
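
The sampling step can be sketched on its own with scikit-learn's `ParameterSampler`. The grid values mirror the ones used later in this notebook, while `n_iter=10` and `random_state=0` are assumptions chosen for illustration.

```python
from sklearn.model_selection import ParameterSampler

param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# Draw 10 combinations at random instead of enumerating the full grid.
# When every entry is a list, sampling is done without replacement.
sampler = ParameterSampler(param_grid, n_iter=10, random_state=0)
samples = list(sampler)

for candidate in samples:
    print(candidate)
```

Only the sampled candidates are ever trained, which is why the best combination in the full grid may be missed.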

**Advantages of Hyperparameter Tuning:**

Improved Model Performance

Better Generalization

Increased Accuracy

Flexibility with Different Models

Automated Techniques

**Disadvantages of Hyperparameter Tuning:**

Computationally Expensive

Overfitting to the Validation Data During Search

Requires Large Amount of Data

Complexity in Selecting Hyperparameters

Risk of Getting Stuck in Local Optima

Time-Consuming

**Load And Explore Dataset**

In [9]:
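# Load the Iris dataset (returned as a scikit-learn Bunch, not a DataFrame)
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix
y = iris.target  # class labels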

**Define Hyperparameter Grid**

In [10]:
from sklearn.model_selection import train_test_split

# Candidate values for each random forest hyperparameter
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Now split into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Apply Grid Search CV**

In [12]:
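from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Grid search: 5-fold cross-validation over every combination in param_grid
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

# Get best parameters
print('Best Parameters from Grid Search:', grid_search.best_params_)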

Fitting 5 folds for each of 144 candidates, totalling 720 fits
Best Parameters from Grid Search: {'max_depth': 5, 'min_samples_leaf': 4, 'min_samples_split': 5, 'n_estimators': 100}


**Apply Randomized Search CV**

In [13]:
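from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Randomized search: 5-fold cross-validation over 10 combinations sampled from param_grid
random_search = RandomizedSearchCV(RandomForestClassifier(), param_grid, n_iter=10, cv=5, n_jobs=-1, verbose=1)
random_search.fit(X_train, y_train)

# Get best parameters
print('Best Parameters from Random Search:', random_search.best_params_)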

Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best Parameters from Random Search: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_depth': 5}
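
Random search is not limited to a fixed list of values: `RandomizedSearchCV` also accepts distributions that expose an `rvs` method, such as those in `scipy.stats`. The following is a minimal sketch of that usage. The ranges, `n_iter=5`, `cv=3`, and the `random_state` values are hypothetical choices made to keep the example small; they are not the settings used in the cells above.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical search space: randint(low, high) draws integers in [low, high)
param_distributions = {
    'n_estimators': randint(10, 60),
    'max_depth': [3, 5, None],
    'min_samples_leaf': randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,         # number of sampled candidates
    cv=3,
    random_state=0,   # makes the sampled candidates reproducible
)
search.fit(X, y)

print('Best parameters:', search.best_params_)
```

Sampling from a range lets the search try values that a hand-written grid would not contain, at the same cost per candidate.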


**Evaluate Model Performance**

In [16]:
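from sklearn.metrics import accuracy_score

# Retrieve the best models; each search has already refit them on the
# full training set (refit=True is the default)
best_grid_model = grid_search.best_estimator_
best_random_model = random_search.best_estimator_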

# Predictions
y_pred_grid = best_grid_model.predict(X_test)
y_pred_random = best_random_model.predict(X_test)

# Accuracy scores
acc_grid = accuracy_score(y_test, y_pred_grid)
acc_random = accuracy_score(y_test, y_pred_random)

print(f'Grid Search Accuracy: {acc_grid:.2f}')
print(f'Random Search Accuracy: {acc_random:.2f}')

Grid Search Accuracy: 1.00
Random Search Accuracy: 1.00
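
A perfect test accuracy on a small test set is worth cross-checking against the cross-validated score from the search. The sketch below shows one way to do that. It is self-contained and uses a deliberately smaller, hypothetical grid (`small_grid`) and a fixed `random_state=0` on the forest so that it runs quickly; its numbers will not necessarily match the outputs above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical reduced grid, only to keep this example fast
small_grid = {'n_estimators': [10, 50], 'max_depth': [5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), small_grid, cv=5)
search.fit(X_train, y_train)

cv_score = search.best_score_              # mean accuracy across the 5 validation folds
test_score = search.score(X_test, y_test)  # accuracy of the refit best model on held-out data

print(f'Best cross-validated accuracy: {cv_score:.3f}')
print(f'Test accuracy: {test_score:.3f}')
print(f'Test set size: {len(X_test)}')
```

The test split here holds 30 samples, so each misclassified sample moves test accuracy by about 3.3 percentage points. Two models can therefore tie on the test set even if their cross-validated scores differ slightly.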
