### What is Hyperparameter Tuning?

**Hyperparameter tuning** is the process of finding the best set of hyperparameters (configuration settings) for a machine learning algorithm.

Unlike model parameters (learned from data), hyperparameters are set before training and control how the model learns.

The goal is to improve model performance (accuracy, precision, recall, RMSE, etc.) by choosing optimal hyperparameters.

### Parameters vs Hyperparameters  

### Parameters  
- Parameters are values **learned from the training data** by the model.  
- They are estimated during training, often by an optimization algorithm such as gradient descent.  
- They represent the "knowledge" the model has gained.  

#### Examples:
- **Linear Regression:** coefficients (weights) `β1, β2, ...` and the intercept `β0`.  
- **Logistic Regression:** weights associated with each feature and the bias.  
- **KNN:** no learned parameters; it stores the training set and compares new points against it.  
- **Decision Trees:** the split features, split thresholds, and the resulting tree structure.  
- **SVM:** support vectors, weights, and bias learned from the data.  
- **Naive Bayes:** prior probabilities and the probability of each feature given a class.  

Example: In a Linear Regression equation:  
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
\]  
The `β`s are the **parameters** that the algorithm learns.  
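
To make this concrete, here is a minimal sketch that fits a linear regression and reads back the learned `β`s. It assumes scikit-learn and NumPy are installed; the data is synthetic and the true values (3, 2, -1) are chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3 + 2*x1 - 1*x2
# (illustrative values, not taken from any real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

# The learned parameters are stored on the fitted model.
print("intercept (beta_0):", model.intercept_)
print("coefficients (beta_1, beta_2):", model.coef_)
```

Because the data contains no noise, the fitted intercept and coefficients should match the values used to generate it, up to floating-point error.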

### Hyperparameters  
- Hyperparameters are set **before training** and control how the model learns.  
- They are **not learned** from the data but chosen by the data scientist, either by hand or with a search method such as grid search or random search.  

#### Examples:
- **Logistic Regression:** `C`, `penalty`, `solver`.  
- **Decision Trees:** `max_depth`, `min_samples_split`.  
- **KNN:** `n_neighbors`, `metric`.  
- **SVM:** `C`, `kernel`, `gamma`.  
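
The difference shows up directly in code. The sketch below assumes scikit-learn and uses the Iris dataset only as a convenient example: hyperparameters are passed to the constructor and stay fixed, while parameters such as `coef_` exist only after fitting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen up front, as constructor arguments.
clf = LogisticRegression(C=0.5, max_iter=1000)
params_before = clf.get_params()

# Learned parameters do not exist yet.
had_coef_before_fit = hasattr(clf, "coef_")

clf.fit(X, y)
params_after = clf.get_params()

print("coef_ present before fit:", had_coef_before_fit)
print("coef_ present after fit:", hasattr(clf, "coef_"))
print("hyperparameters unchanged by fit:", params_before == params_after)
```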

#### Common Hyperparameter Tuning Methods

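1. **Grid Search**
   - Tries every combination of the hyperparameter values listed in a predefined grid.
   - Computationally expensive: the number of combinations is the product of the number of values per hyperparameter.

2. **Random Search**
   - Samples a fixed number of random combinations from specified ranges or distributions.
   - Usually cheaper than an exhaustive grid search and often finds comparably good settings.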
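
Both methods can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The dataset, model, and value ranges below are assumptions picked for illustration, not recommended settings.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grid search: evaluates every combination in the grid (3 * 2 = 6 candidates).
param_grid = {"max_depth": [2, 3, 4], "min_samples_split": [2, 5]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# Random search: draws a fixed number of candidates (n_iter) from distributions.
param_dist = {"max_depth": randint(1, 10), "min_samples_split": randint(2, 20)}
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_dist,
    n_iter=4,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print("grid candidates:", len(grid.cv_results_["params"]), grid.best_params_)
print("random candidates:", len(rand.cv_results_["params"]), rand.best_params_)
```

The grid search cost grows with every value added to the grid, whereas the random search cost is controlled directly through `n_iter`.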

### Examples of Hyperparameters  

#### 1. Linear Regression  
- `fit_intercept` – whether to calculate the intercept (boolean).  
- `normalize` – whether to normalize input features (deprecated in scikit-learn 1.0 and removed in 1.2; scale features in a preprocessing step such as `StandardScaler` instead).  

#### 2. Logistic Regression  
- `C` – inverse of regularization strength (smaller values = stronger regularization).  
- `penalty` – type of regularization (`l1`, `l2`, `elasticnet`, or no penalty).  
- `solver` – optimization algorithm (`liblinear`, `saga`, `newton-cg`, `lbfgs`); not every solver supports every penalty.  
- `max_iter` – maximum number of iterations for convergence.  
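
As a rough illustration of how `C` controls regularization, the sketch below fits the same model with three values of `C` and compares the size of the learned weights. The dataset and the specific `C` values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the solver converge

norms = {}
for C in [0.01, 1.0, 100.0]:
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    # L2 norm of the weight vector: a simple measure of how large the weights are.
    norms[C] = np.linalg.norm(clf.coef_)

for C, norm in norms.items():
    print(f"C={C}: weight norm = {norm:.3f}")
```

Smaller `C` shrinks the weights toward zero, which is what "stronger regularization" means in practice.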

#### 3. Decision Trees  
- `max_depth` – maximum depth of the tree.  
- `min_samples_split` – minimum number of samples required to split a node.  
- `min_samples_leaf` – minimum number of samples required at a leaf node.  
- `max_features` – number of features to consider when searching for the best split.  
- `criterion` – function to measure split quality (`gini` or `entropy` for classification).  
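
The effect of `max_depth` can be checked on a fitted tree. This is a small sketch assuming scikit-learn and the Iris dataset; the depth limit of 2 is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One tree with a depth limit, one allowed to grow until its leaves are pure.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
full = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X, y)

print("shallow:", shallow.get_depth(), "levels,", shallow.get_n_leaves(), "leaves")
print("full:   ", full.get_depth(), "levels,", full.get_n_leaves(), "leaves")
```

A depth limit of 2 allows at most four leaves, so the shallow tree is a much simpler model than the unrestricted one.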

#### 4. Random Forest  
Includes all Decision Tree hyperparameters, plus:  
- `n_estimators` – number of trees in the forest.  
- `bootstrap` – whether bootstrap samples are used when building trees.  
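
A brief sketch of how these combine with the tree-level settings, assuming scikit-learn; the values 25 and 3 are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=25,   # number of trees
    max_depth=3,       # tree-level hyperparameter, applied to every tree
    bootstrap=True,    # each tree is trained on a bootstrap sample
    random_state=0,
)
forest.fit(X, y)

print("trees in forest:", len(forest.estimators_))
print("deepest tree:", max(tree.get_depth() for tree in forest.estimators_))
```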

#### 5. K-Nearest Neighbors (KNN)  
- `n_neighbors` – number of neighbors to consider.  
- `metric` – distance metric (`euclidean`, `manhattan`, `minkowski`).  
- `weights` – how to weight neighbors (`uniform`, `distance`).  
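
The `weights` setting can change a prediction even when `n_neighbors` stays the same. The toy one-dimensional data below is made up for illustration: the query point is close to a single class-0 point and farther from two class-1 points.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, two classes.
X_train = [[0.0], [3.0], [4.0]]
y_train = [0, 1, 1]
query = [[1.0]]

uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

# Uniform: each neighbor gets one vote, so class 1 wins 2 to 1.
uniform_pred = uniform.predict(query)[0]
# Distance: votes are weighted by 1/distance, so class 0 gets 1.0
# and class 1 gets 1/2 + 1/3, which is about 0.83.
distance_pred = distance.predict(query)[0]

print("uniform:", uniform_pred)    # prints 1
print("distance:", distance_pred)  # prints 0
```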

#### 6. Support Vector Machine (SVM)  
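- `C` – regularization parameter; as in logistic regression, smaller values mean stronger regularization.  
- `kernel` – type of kernel (`linear`, `rbf`, `poly`, `sigmoid`).  
- `gamma` – kernel coefficient for `rbf`, `poly`, and `sigmoid`; larger values make the influence of each training example more local.  
- `degree` – degree of the polynomial kernel (used only when `kernel` is `poly`).  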
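
To see why the choice of `kernel` matters, the sketch below compares a linear and an RBF kernel on synthetic data that is not linearly separable (two concentric circles). The dataset and its settings are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

print(scores)
```

On this kind of data the RBF kernel should score clearly higher than the linear one, since it can draw a curved decision boundary.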
