## Introduction to Hyperparameters
### What are Parameters?
- Parameters are values that a machine learning model learns during training.
  They are adjusted to minimize the loss function and improve the model's predictions.

  These values are estimated from the data itself rather than set by hand.

#### Examples:
- Coefficients in Linear Regression
   *E.g., the slope and intercept in a straight-line model.*

- Weights and biases in Neural Networks
   *E.g., the connection strength between neurons.*
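
To make the linear regression example concrete, here is a minimal sketch that fits a straight line and reads back the learned parameters. The synthetic data (points on the line y = 2x + 1) is an assumption chosen for illustration; `coef_` and `intercept_` are the scikit-learn attributes that hold the fitted slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data lying exactly on y = 2x + 1 (not taken from the tutorial)
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)

# The slope and intercept are parameters: they were learned from the data
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Nothing in the call to `LinearRegression()` specified the slope or intercept; both came out of `fit`.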

---
### What are Hyperparameters?
- Hyperparameters are settings defined before training begins.
  They control the learning process of the model but are not learned from the data.

  They are set manually or tuned with a search method such as grid search or random search.

#### Examples:
- Maximum depth of decision trees
  *E.g., limiting how deep a tree can go to prevent overfitting.*
- Learning rate in gradient descent
  *E.g., how big a step we take during optimization.*
- Number of estimators in ensemble models
  *E.g., how many trees to build in a Random Forest.*
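
The difference between the two kinds of values can be seen on a single decision tree. The sketch below assumes scikit-learn's `DecisionTreeClassifier` and its built-in breast cancer dataset; the choice of `max_depth=2` is arbitrary and only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth is a hyperparameter: it is fixed here, before fit() sees any data
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
hyperparams = tree.get_params()
print("max_depth set before training:", hyperparams["max_depth"])

tree.fit(X, y)

# The tree structure (which features to split on, and where) is learned from the data
learned_depth = tree.get_depth()
print("depth of the fitted tree:", learned_depth)
print("number of nodes in the fitted tree:", tree.tree_.node_count)
```

`get_params()` returns what was chosen up front, while attributes such as `tree_` only exist after fitting.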

---

### Why Tune Hyperparameters?
Tuning hyperparameters is essential because it:

1. Improves Model Generalization
   - Proper tuning helps the model perform well on unseen data.
2. Prevents Overfitting/Underfitting
   - For instance, a tree with no depth limit may overfit; one that's too shallow may underfit.
3. Saves Time and Resources
   - Optimal hyperparameters can lead to faster training and efficient computation.
4. Tailors Model to Data
   - Different datasets need different settings to bring out the best model performance.
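
The overfitting/underfitting point can be checked by comparing training and test accuracy at different tree depths. This is a rough sketch, assuming scikit-learn's breast cancer dataset and three arbitrarily chosen depths; a large gap between training and test accuracy hints at overfitting, while low accuracy on both hints at underfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Depths picked for illustration: very shallow, moderate, and unlimited (None)
scores = {}
for depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = scores[depth]
    print(f"max_depth={str(depth):>4}  train={train_acc:.3f}  test={test_acc:.3f}")
```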

---

### Common Hyperparameters (by Model Type)

#### Tree-Based Models (e.g., Decision Trees, Random Forests)
- **`max_depth`**: Controls the depth of the tree.  
- **`min_samples_split`**: Minimum number of samples required to split a node.  
- **`n_estimators`**: Number of trees in an ensemble model.  

#### Boosting Models (e.g., XGBoost, LightGBM)
- **`learning_rate`**: Shrinks each tree's contribution to the overall prediction; smaller values usually need more trees.  
- **`subsample`**: Fraction of the training rows sampled to build each tree.  
- **`max_depth`**: Limits tree complexity.  
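
As a sketch of where these settings go, the example below uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost or LightGBM (an assumption made so the example needs no extra packages). It exposes hyperparameters with the same three names; the values shown are illustrative, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,  # shrinks each tree's contribution
    subsample=0.8,      # fraction of training rows sampled for each tree
    max_depth=3,        # limits the complexity of each tree
    random_state=0,
)
gb.fit(X_train, y_train)

gb_accuracy = gb.score(X_test, y_test)
print(f"Gradient boosting test accuracy: {gb_accuracy:.4f}")
```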

#### Neural Networks
- **`learning_rate`**: Step size used in gradient descent to update weights.  
- **`num_layers`**: Number of layers, i.e. the depth of the neural network.  
- **`batch_size`**: Number of training examples used in one forward/backward pass.  
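
The names in this list are generic, and individual libraries spell them differently. As one possible mapping, the sketch below assumes scikit-learn's `MLPClassifier`, where the step size is `learning_rate_init`, the depth follows from the length of `hidden_layer_sizes`, and `batch_size` keeps its name. The layer sizes and other values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Neural networks usually train better on standardized features
scaler = StandardScaler().fit(X_train)

mlp = MLPClassifier(
    hidden_layer_sizes=(32, 16),  # two hidden layers -> controls network depth
    learning_rate_init=0.001,     # step size for weight updates
    batch_size=32,                # examples per forward/backward pass
    max_iter=300,
    random_state=0,
)
mlp.fit(scaler.transform(X_train), y_train)

mlp_accuracy = mlp.score(scaler.transform(X_test), y_test)
print(f"Layers (input + hidden + output): {mlp.n_layers_}")
print(f"MLP test accuracy: {mlp_accuracy:.4f}")
```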

---

### Hands-On Exercise
#### Objective:
- Train a model using default hyperparameters, evaluate its performance, and then manually adjust a few hyperparameters to observe how performance changes.

📊 Dataset:
We'll use the Breast Cancer Wisconsin dataset (loaded with `load_breast_cancer`):
- A binary classification dataset (malignant vs. benign) with 30 numeric features.
- A default Random Forest already performs well on it, so tuning might not show drastic changes.

### ⚠️ Note:
- Even though we'll manually adjust a few hyperparameters, this dataset might not show significant improvements, because the default settings already work well on it.





In [6]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

# load dataset
data = load_breast_cancer()

# features and target
x, y = data.data, data.target

# Split dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Display some dataset info
print(f"Feature names: {data.feature_names}")
print(f"Class names: {data.target_names}")

# train model with default hyperparameters
rf_default = RandomForestClassifier(random_state=7)
rf_default.fit(x_train, y_train)

# predict
y_pred_default = rf_default.predict(x_test)
accuracy_default = accuracy_score(y_test, y_pred_default)

# display Evaluation
print(f"Default Model Accuracy Score: {accuracy_default:.4f}")
print(f"Default Classification report:\n {classification_report(y_test, y_pred_default)}")


Feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Class names: ['malignant' 'benign']
Default Model Accuracy Score: 0.9649
Default Classification report:
               precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114

In [None]:
# Train random forest with adjusted hyperparameters
rf_tuned = RandomForestClassifier(
    n_estimators=200,
    max_depth=5,
    random_state=7
)
rf_tuned.fit(x_train, y_train)

# Predict
y_pred_tuned = rf_tuned.predict(x_test)
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)

# Display model's evaluation 
print(f"Tuned Accuracy Score: {accuracy_tuned:.4f}")
print(f"Tuned Classification Report:\n {classification_report(y_test, y_pred_tuned)}")

Tuned Accuracy Score: 0.9649
Tuned Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
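
Both models score the same on this single 114-sample test split, which makes them hard to tell apart. One way to get a steadier comparison is k-fold cross-validation. The sketch below is an optional extra rather than part of the original exercise; it assumes 5 folds and reuses the two configurations from the cells above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Same two configurations as in the exercise
candidates = {
    "default": RandomForestClassifier(random_state=7),
    "tuned": RandomForestClassifier(n_estimators=200, max_depth=5, random_state=7),
}

cv_means = {}
for name, model in candidates.items():
    # Accuracy on each of 5 held-out folds
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_means[name] = fold_scores.mean()
    print(f"{name}: mean={fold_scores.mean():.4f}, std={fold_scores.std():.4f}")
```

If the two means differ by less than the fold-to-fold spread, the adjustment has not made a meaningful difference.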



### 🔁 Hyperparameter Tuning Workflow

1. **Choose a Model**  
   Start by selecting a model.  
   👉 _Example_: `RandomForestClassifier()`

2. **Train with Default Hyperparameters**  
   Fit the model on the training data using its default settings.

3. **Tune Hyperparameters**  
   Adjust key hyperparameters to potentially improve model performance.  
   👉 _Example_: Change `n_estimators`, `max_depth`, or `min_samples_split` in Random Forest.

4. **Evaluate the Performance**  
   Test the model on validation/test data and compare metrics (e.g., accuracy, F1-score) before and after tuning.

5. **Note**:  
   - Sometimes tuning leads to better performance.  
   - Sometimes the change is negligible, or performance even gets worse (for example, when the new settings overfit or underfit).  
   - The outcome depends on the dataset and the problem you're solving.
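
The workflow above can also be automated instead of adjusting values by hand. The following is a minimal sketch using scikit-learn's `GridSearchCV`; the grid values and the choice of 3-fold cross-validation are assumptions made to keep the example fast, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 1-2: choose a model and train it with default hyperparameters
baseline = RandomForestClassifier(random_state=7).fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Step 3: search a small, illustrative grid using cross-validation on the training set
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=7),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 4: evaluate the best configuration on the held-out test set
tuned_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))

print(f"Best hyperparameters: {search.best_params_}")
print(f"Baseline test accuracy: {baseline_acc:.4f}")
print(f"Tuned test accuracy:    {tuned_acc:.4f}")
```

The search only ever sees the training data; the test set is touched once at the end, so the final comparison stays honest.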
