# 🔧 Understanding Hyperparameters in Machine Learning Models  

## What Are Hyperparameters?  
Hyperparameters are **configurable parameters** set before a machine learning model begins training. Unlike model parameters (e.g., weights in neural networks), hyperparameters **are not learned from the data** but are instead manually specified or optimized using techniques like **grid search** or **random search**.  

## Why Do We Need Hyperparameters?  
Hyperparameters play a crucial role in determining the **performance, speed, and generalization** of a model. Choosing the right hyperparameters can:  
- Improve **accuracy** and **efficiency**  
- Prevent **overfitting** (learning noise instead of patterns)  
- Enhance **generalization** to unseen data  
- Speed up **training and inference**  

## Examples of Hyperparameters in Different Models  
Here are some common hyperparameters across different models:  

### 🏆 Decision Trees & Random Forests  
- `max_depth`: Controls tree depth to prevent overfitting  
- `min_samples_split`: Minimum samples required to split a node  
- `n_estimators` (for ensembles): Number of trees in a forest  

### 🔥 Neural Networks  
- `learning_rate`: Step size used when updating the weights  
- `batch_size`: Number of training samples per batch  
- `epochs`: Number of complete passes through the dataset  

### 📈 Gradient Boosting (XGBoost, LightGBM)  
- `learning_rate`: Controls the contribution of each tree  
- `n_estimators`: Number of boosting rounds  
- `max_depth`: Limits tree depth to prevent overfitting  
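As a minimal sketch (assuming scikit-learn is installed, with synthetic data from `make_classification`), the hyperparameters above are passed to an estimator's constructor before training, whereas learned parameters only exist as fitted attributes afterwards. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost/LightGBM here, and its `MLPClassifier` names the neural-network settings `learning_rate_init` and `max_iter` rather than `learning_rate` and `epochs`. All values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: fixed by us in the constructor, before fit() is called
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
boost = GradientBoostingClassifier(
    learning_rate=0.1, n_estimators=50, max_depth=3, random_state=0
)
# With the default "adam" solver, max_iter is the number of epochs
mlp = MLPClassifier(learning_rate_init=0.001, batch_size=32, max_iter=200, random_state=0)

for model in (tree, forest, boost):
    model.fit(X, y)

# Parameters: learned from the data during fit()
print(tree.tree_.node_count)    # number of nodes in the learned tree
print(len(forest.estimators_))  # 50 fitted trees
print(boost.estimators_.shape)  # (50, 1): one regression tree per boosting round
print(mlp.get_params()["batch_size"], mlp.get_params()["max_iter"])
```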

## What Is the "Perfect" Hyperparameter Value?  
There is **no universal perfect value** for hyperparameters. The optimal settings depend on:  
- The **dataset** size and complexity  
- The **model type** and architecture  
- The **goal** (e.g., maximizing accuracy vs. minimizing inference time)  

To find the best hyperparameters, we use:  
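✅ **Grid Search**: Exhaustively evaluates every combination in a predefined grid of values  
✅ **Random Search**: Evaluates a fixed number of randomly sampled combinations, which is often cheaper for large search spaces  
✅ **Bayesian Optimization**: Uses the results of past trials to choose promising hyperparameters to try next  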
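A sketch of the first two strategies with scikit-learn, assuming a synthetic dataset and an arbitrary decision-tree search space: both searches get a budget of 12 candidates, but `GridSearchCV` enumerates a fixed grid while `RandomizedSearchCV` draws values from distributions (here `scipy.stats.randint`). Bayesian optimization is not built into scikit-learn and needs a separate library, so it is not shown.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
tree = DecisionTreeClassifier(random_state=0)

# Grid search: every combination of the listed values (4 x 3 = 12 candidates)
grid = GridSearchCV(
    tree,
    {"max_depth": [2, 4, 6, 8], "min_samples_split": [2, 10, 20]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

# Random search: a fixed budget of 12 candidates sampled from distributions
rand = RandomizedSearchCV(
    tree,
    {"max_depth": randint(2, 11), "min_samples_split": randint(2, 31)},
    n_iter=12,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```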

## 🔍 Conclusion  
Hyperparameters **define how a model learns**, impacting its **accuracy, speed, and generalization**. Proper tuning is essential for achieving **optimal performance** without overfitting or underfitting.  


## **1. Pruning in Classification Trees**  
Pruning helps prevent **overfitting** by reducing the size of a decision tree, which often improves accuracy on unseen data. Without pruning, a tree can keep splitting until it **memorizes** the training data rather than generalizing to new data.  

### **Post-Pruning (Cost Complexity Pruning - CCP)**  
In post-pruning, the tree is first grown to full depth (even if it overfits) and then pruned back by removing subtrees, controlled by a complexity parameter α.  

#### **How Does CCP Work?**  
The pruning process minimizes the following cost:  

$$
\text{Total Cost}(T) = R(T) + \alpha \times |T|
$$


- **R(T)** is the total error of the tree's leaves. For a classification tree this is an impurity or misclassification measure; scikit-learn uses the total sample-weighted impurity of the leaves. (For regression trees, the analogous term is the **RSS**, Residual Sum of Squares.)  
- **|T|** is the number of leaves.  
- **α** is a tuning parameter that controls the trade-off between tree complexity and error.  
  - **Higher α** → More pruning → Simpler tree.  
  - **Lower α** → Less pruning → More complex tree.  
- The value of **α** can be chosen with cross-validation, as in the functions below.
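The cost above can be observed directly with scikit-learn's `cost_complexity_pruning_path`, which the functions below also rely on. In this sketch, synthetic data from `make_classification` stands in for the notebook's `Train`: as α grows, whole subtrees are removed, the total leaf impurity R(T) rises, and the leaf count falls until only the root remains, which is why the functions below drop the last α.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, flip_y=0.1, random_state=0)

# Grow the full tree, then get the effective alphas at which subtrees are pruned
full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X, y)
alphas, impurities = path.ccp_alphas, path.impurities

# Refit one tree per alpha and count its leaves
leaves = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X, y).get_n_leaves()
    for a in alphas
]

# Print every fifth step of the path
for a, imp, n in list(zip(alphas, impurities, leaves))[::5]:
    print(f"alpha={a:.4f}  total leaf impurity={imp:.3f}  leaves={n}")
```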


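In [1]:
# %run Parameters.ipynb   (presumably defines cv_parameters_base/_roll/_full and roll(), used below)
import numpy as np
import pandas as pd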

### Baseline Predictors

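In [40]:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Function to find optimal ccp_alpha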
def find_optimal_alpha_base(Train):
    static_predictors = cv_parameters_base(Train)
    
    dt = DecisionTreeClassifier(random_state=1)
    path = dt.cost_complexity_pruning_path(Train[static_predictors], Train["Target"])
    ccp_alphas = path.ccp_alphas[:-1]  # Exclude the last value to avoid a single-node tree
    
    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    alpha_scores = {}
    
    for alpha in ccp_alphas:
        dt = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
        scores = cross_val_score(dt, Train[static_predictors], Train["Target"], cv=kf, scoring='accuracy')
        alpha_scores[alpha] = np.mean(scores)
    
    best_alpha = max(alpha_scores, key=alpha_scores.get)
    print(f"Best ccp_alpha: {best_alpha:.6f} with Accuracy: {alpha_scores[best_alpha]:.4f}")
    return best_alpha


### Baseline Predictors + Rolling Predictors

In [41]:
# Function to find optimal ccp_alpha
def find_optimal_alpha_roll(Train):

    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)
    
    dt = DecisionTreeClassifier(random_state=1)
    path = dt.cost_complexity_pruning_path(Train[all_predictors], Train["Target"])
    ccp_alphas = path.ccp_alphas[:-1]  # Exclude the last value to avoid a single-node tree

    
    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    alpha_scores = {}
    
    for alpha in ccp_alphas:
        dt = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
        scores = cross_val_score(dt, Train[all_predictors], Train["Target"], cv=kf, scoring='accuracy')
        alpha_scores[alpha] = np.mean(scores)
    
    best_alpha = max(alpha_scores, key=alpha_scores.get)
    print(f"Best ccp_alpha: {best_alpha:.6f} with Accuracy: {alpha_scores[best_alpha]:.4f}")
    return best_alpha

### Full Feature Set

In [42]:
# Function to find optimal ccp_alpha
def find_optimal_alpha_full(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)

    dt = DecisionTreeClassifier(random_state=1)
    path = dt.cost_complexity_pruning_path(Train[all_predictors], Train["Target"])
    ccp_alphas = path.ccp_alphas[:-1]  # Exclude the last value to avoid a single-node tree
 
    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    alpha_scores = {}
    
    for alpha in ccp_alphas:
        dt = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
        scores = cross_val_score(dt, Train[all_predictors], Train["Target"], cv=kf, scoring='accuracy')
        alpha_scores[alpha] = np.mean(scores)
    
    best_alpha = max(alpha_scores, key=alpha_scores.get)
    print(f"Best ccp_alpha: {best_alpha:.6f} with Accuracy: {alpha_scores[best_alpha]:.4f}")
    return best_alpha

## **2. C in Logistic Regression**  
In **Logistic Regression**, `C` is the **inverse of the regularization strength** (also called the **inverse of lambda** in regularization).

$$ 
C = \frac{1}{\lambda}
$$
where **λ (lambda)** is the regularization parameter.

### 🔹 What Does `C` Do?
- It **controls the trade-off** between model complexity and generalization.
- **Higher values of `C`** → Less regularization (**more complex model, risk of overfitting**).
- **Lower values of `C`** → More regularization (**simpler model, less overfitting but a risk of underfitting**).

### 🔹 Impact of `C` Values

| `C` Value  | Effect on Model |
|------------|---------------|
| **Very Small (`C → 0.0001`)** | Strong regularization, may underfit |
| **Moderate (`C = 1.0`)** | Balanced regularization |
| **Very Large (`C → 10000`)** | Almost no regularization, may overfit |
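To see the trade-off numerically, this sketch on synthetic data (an assumption, not the notebook's `Train`) fits L2-regularized logistic regression at several `C` values and reports the size of the coefficient vector: stronger regularization (smaller `C`, larger λ) pulls the weights toward zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Smaller C = larger lambda = stronger L2 penalty on the coefficients
C_values = [0.001, 0.1, 10.0, 1000.0]
norms = []
for C in C_values:
    lr = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms.append(np.linalg.norm(lr.coef_))
    print(f"C={C:<8} lambda={1 / C:<8.3g} ||w||={norms[-1]:.3f}")
```

The notebook's functions use `solver='liblinear'`, which also regularizes the intercept; the default `lbfgs` solver used in this sketch does not, so exact numbers can differ between the two.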



### Baseline Predictors

In [43]:
from sklearn.linear_model import LogisticRegression

# Function to find the optimal C for Logistic Regression
def find_optimal_C_base(Train):
    static_predictors = cv_parameters_base(Train)

    # Define a range of C values to test (logarithmically spaced)
    C_values = np.logspace(-2, 3, 20)   # Testing C from 0.01 to 1000

    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    C_scores = {}

    for C in C_values:
        lr = LogisticRegression(C=C, solver='liblinear', random_state=1)
        scores = cross_val_score(lr, Train[static_predictors], Train["Target"], cv=kf, scoring='accuracy')
        C_scores[C] = np.mean(scores)

    best_C = max(C_scores, key=C_scores.get)
    print(f"Best C: {best_C:.6f} with Accuracy: {C_scores[best_C]:.4f}")
    return best_C


### Baseline Predictors + Rolling Predictors

In [44]:
# Function to find the optimal C for Logistic Regression
def find_optimal_C_roll(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)

    # Define a range of C values to test (logarithmically spaced)
    C_values = np.logspace(-2, 3, 20)   # Testing C from 0.01 to 1000

    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    C_scores = {}

    for C in C_values:
        lr = LogisticRegression(C=C, solver='liblinear', random_state=1)
        scores = cross_val_score(lr, Train[all_predictors], Train["Target"], cv=kf, scoring='accuracy')
        C_scores[C] = np.mean(scores)

    best_C = max(C_scores, key=C_scores.get)
    print(f"Best C: {best_C:.6f} with Accuracy: {C_scores[best_C]:.4f}")
    return best_C

### Full Feature Set

In [45]:
# Function to find the optimal C for Logistic Regression
def find_optimal_C_full(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)
    # Define a range of C values to test (logarithmically spaced)
    C_values = np.logspace(-2, 3, 20)   # Testing C from 0.01 to 1000

    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    C_scores = {}

    for C in C_values:
        lr = LogisticRegression(C=C, solver='liblinear', random_state=1)
        scores = cross_val_score(lr, Train[all_predictors], Train["Target"], cv=kf, scoring='accuracy')
        C_scores[C] = np.mean(scores)

    best_C = max(C_scores, key=C_scores.get)
    print(f"Best C: {best_C:.6f} with Accuracy: {C_scores[best_C]:.4f}")
    return best_C

## LDA Shrinkage
### What is Shrinkage in LDA?
Shrinkage is a regularization technique used in **Linear Discriminant Analysis (LDA)** to improve the estimation of the covariance matrix. It blends the empirical covariance matrix with a more structured version, reducing overfitting and improving stability, especially when dealing with high-dimensional data.

### When and Why is Shrinkage Needed?
- When **the number of features is large** compared to the number of samples, the empirical covariance matrix is poorly estimated.
- Shrinkage **adds regularization** to avoid overfitting and makes the model more robust.
- It is useful when **the covariance matrix is nearly singular or unstable**.
- It works **only with `solver="lsqr"` or `solver="eigen"`**; the default `solver="svd"` does not support shrinkage.

### How is Shrinkage Controlled?
The shrinkage parameter (`shrinkage`) is a value between **0 and 1**:
- `shrinkage=0`: No shrinkage (uses the empirical covariance matrix).
- `shrinkage=1`: Full shrinkage (replaces the empirical covariance matrix entirely with a diagonal target matrix).
- **Optimal values** can be found via cross-validation (`GridSearchCV`).
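A sketch of the blending idea, using scikit-learn's `shrunk_covariance` helper, which computes $(1 - a)\,S + a\,\mu I$ with $\mu = \operatorname{trace}(S)/p$. How LDA builds its shrunk estimate internally (for instance, whether features are standardized first) is an implementation detail that may differ, so treat the hand-made blend as an illustration only. The second half compares a fixed value with `shrinkage="auto"`, the built-in alternative to a grid that picks the intensity analytically with the Ledoit-Wolf lemma, on synthetic data with few samples and many features (all sizes are arbitrary).

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_small = rng.normal(size=(30, 5))

# Blend by hand: (1 - a) * S + a * mu * I, with mu = trace(S) / n_features
S = empirical_covariance(X_small)
a = 0.3
mu = np.trace(S) / S.shape[0]
manual = (1 - a) * S + a * mu * np.eye(S.shape[0])
print(np.allclose(manual, shrunk_covariance(S, shrinkage=a)))  # True

# Few samples, many features: the setting where shrinkage tends to matter
X, y = make_classification(n_samples=60, n_features=40, n_informative=5, random_state=0)
for shrink in [None, "auto", 0.5]:
    lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=shrink)
    acc = cross_val_score(lda, X, y, cv=5, scoring="accuracy").mean()
    print(f"shrinkage={shrink!s:<5} mean CV accuracy={acc:.3f}")
```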

### Baseline Predictors

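In [46]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

def find_best_shrinkage_base(Train):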
    static_predictors = cv_parameters_base(Train)
    param_grid ={"shrinkage": np.linspace(0.0, 1.0, 10)}
    lda = LinearDiscriminantAnalysis(solver="lsqr")
    grid_search = GridSearchCV(lda, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[static_predictors], Train["Target"])
    return grid_search.best_params_["shrinkage"]


### Baseline Predictors + Rolling Predictors

In [47]:
def find_best_shrinkage_roll(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)

    param_grid = {"shrinkage": np.linspace(0.0, 1.0, 10)}
    lda = LinearDiscriminantAnalysis(solver="lsqr")
    grid_search = GridSearchCV(lda, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    return grid_search.best_params_["shrinkage"]


### Full Feature Set

In [48]:
def find_best_shrinkage_full(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)

    param_grid = {"shrinkage": np.linspace(0.0, 1.0, 10)}
    lda = LinearDiscriminantAnalysis(solver="lsqr")
    grid_search = GridSearchCV(lda, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    return grid_search.best_params_["shrinkage"]


## Choosing the Best k in K-Nearest Neighbors (KNN)

### What is k in KNN?
In K-Nearest Neighbors (KNN), **k** represents the number of nearest data points used to classify a new instance. Choosing the right **k** value is crucial for balancing bias and variance in the model.
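The voting rule can be sketched from scratch in NumPy. `knn_predict` is a hypothetical helper using Euclidean distance and an unweighted majority vote, which match the defaults of scikit-learn's `KNeighborsClassifier` (`weights="uniform"`, Minkowski distance with `p=2`). The toy points show the prediction for the same query flipping as k changes.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k):
    """Hypothetical helper: classify x_new by majority vote of its k nearest points."""
    # Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 0, 1, 1])
x_new = np.array([0.8, 0.9])

print(knn_predict(X_train, y_train, x_new, k=1))  # 1: the nearest point is class 1
print(knn_predict(X_train, y_train, x_new, k=3))  # 1: two of the three nearest are class 1
print(knn_predict(X_train, y_train, x_new, k=5))  # 0: three of the five points are class 0
```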

### Why Do We Need to Find the Best k?
Finding the optimal **k** is essential because:

- **Too Small k (e.g., k=1-3)**  
  - The model is **highly sensitive** to noise and outliers.  
  - Leads to **high variance**, meaning the model overfits the training data.  
  - Predictions can be unstable with slight changes in data.  

- **Too Large k (e.g., k > 20)**  
  - The model becomes **too smooth** and may **underfit** the data.  
  - Reduces sensitivity to individual data points, which can decrease accuracy.  
  - Can bias predictions toward the majority class in imbalanced datasets.  

### How to Find the Best k?
To find the optimal **k**, we use **cross-validation** (e.g., GridSearchCV): test multiple values of **k** and select the one with the highest mean validation accuracy.

### Key Takeaways:
✅ **A balanced k-value** prevents both overfitting (high variance) and underfitting (high bias).  
✅ **Cross-validation** helps choose the best k without relying on a single dataset split.  
✅ **Typically, k is an odd number** to avoid ties in binary classification.  
✅ **The best k varies per dataset** and should always be determined experimentally.  
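Because k-NN relies on distances, features on larger scales dominate the vote. The search functions below pass raw features to `KNeighborsClassifier`; a common variant, sketched here on synthetic data with one deliberately rescaled feature, puts a `StandardScaler` in a `Pipeline` so that scaling is refit inside each CV fold, and restricts the grid to odd k as suggested above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X[:, 0] *= 1000  # one feature on a much larger scale would dominate raw distances

pipe = Pipeline([
    ("scale", StandardScaler()),  # fitted on the training folds only
    ("knn", KNeighborsClassifier()),
])

# Odd k only, to avoid tied votes in this binary problem
param_grid = {"knn__n_neighbors": list(range(1, 22, 2))}
search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

print(search.best_params_["knn__n_neighbors"], round(search.best_score_, 3))
```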


### Baseline Predictors

In [49]:
from sklearn.neighbors import KNeighborsClassifier

def find_best_k_base(Train):
    static_predictors = cv_parameters_base(Train)
    
    # Define parameter grid for k values (searching between 1 and 20)
    param_grid = {"n_neighbors": np.arange(1, 21)}
    
    knn = KNeighborsClassifier()
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(knn, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[static_predictors], Train["Target"])
    
    return grid_search.best_params_["n_neighbors"]


### Baseline Predictors + Rolling Predictors

In [50]:
def find_best_k_roll(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)
 
    # Define parameter grid for k values (searching between 1 and 20)
    param_grid = {"n_neighbors": np.arange(1, 21)}
    
    knn = KNeighborsClassifier()
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(knn, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    
    return grid_search.best_params_["n_neighbors"]

### Full Feature Set

In [51]:
def find_best_k_full(Train):
    # Define the feature columns for which we'll calculate rolling averages
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)

    
    # Define parameter grid for k values (searching between 1 and 20)
    param_grid = {"n_neighbors": np.arange(1, 21)}
    
    knn = KNeighborsClassifier()
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(knn, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    
    return grid_search.best_params_["n_neighbors"]

## **What is C in Support Vector Machines (SVM)?**  

In **Support Vector Machines (SVMs)**, the **C parameter** (regularization parameter) controls the trade-off between maximizing the margin and minimizing classification errors. It determines how much **misclassification is tolerated** when finding the optimal hyperplane.  

### **How C Affects the Model:**  
- **High C (approaching a hard margin)**:  
  - Penalizes misclassified and margin-violating points heavily, so fewer training points are misclassified.  
  - Leads to a **smaller margin** and may **overfit** the training data.  
  - Sensitive to noise and outliers.  

- **Low C (softer margin)**:  
  - Tolerates some misclassification in exchange for better generalization.  
  - Leads to a **larger margin** and helps **avoid overfitting**.  
  - More robust to noisy data.  

### **Why Do We Need to Find the Best C?**  
Choosing an inappropriate **C** can significantly affect model performance:  
✅ **Too High → Overfitting**: The model memorizes the training data but may fail on unseen data.  
✅ **Too Low → Underfitting**: The model allows too many misclassifications, reducing accuracy.  
✅ **Optimal C → Best Trade-off**: Finding the right **C** ensures the model generalizes well to new data.  

### **How to Find the Best C?**  
We use **cross-validation** (e.g., **GridSearchCV**) to test multiple values of **C** and select the one with the highest mean accuracy on the validation folds. This helps the model generalize to unseen data instead of only fitting the training data.  
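For a linear kernel, the margin band has width $2 / \lVert w \rVert$, so the effect of C can be read off the fitted `coef_`. This sketch uses synthetic 2-D data with 10% label noise (all values are arbitrary): as C increases, the margin narrows, and typically fewer points end up on or inside it as support vectors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           flip_y=0.1, class_sep=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

margins, n_sv = [], []
for C in [0.01, 1.0, 100.0]:
    svm = SVC(kernel="linear", C=C).fit(X, y)
    w = svm.coef_.ravel()
    margins.append(2 / np.linalg.norm(w))   # width of the margin band
    n_sv.append(int(svm.n_support_.sum()))  # points on or inside the margin
    print(f"C={C:<6} margin width={margins[-1]:.3f}  support vectors={n_sv[-1]}")
```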


### Baseline Predictors 

In [52]:
from sklearn.svm import SVC
def best_c_base_linear(Train):
    static_predictors = cv_parameters_base(Train)
    
    # Define parameter grid for C values (searching in log scale between 0.001 and 1000)
    param_grid = {"C": np.logspace(-3, 3, 10)}  
    
    svm = SVC(kernel="linear")  # Using a linear kernel
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[static_predictors], Train["Target"])
    
    return grid_search.best_params_["C"]


### Baseline Predictors + Rolling Predictors

In [53]:
from sklearn.svm import SVC
def best_c_roll_linear(Train):
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)
    # Define parameter grid for C values (searching in log scale between 0.001 and 1000)
    param_grid = {"C": np.logspace(-3, 3, 10)}  
    
    svm = SVC(kernel="linear")  # Using a linear kernel
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    
    return grid_search.best_params_["C"]


### Full Feature Set

In [54]:
from sklearn.svm import SVC
def best_c_full_linear(Train):
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)
    
    # Define parameter grid for C values (searching in log scale between 0.001 and 1000)
    param_grid = {"C": np.logspace(-3, 3, 10)}  
    
    svm = SVC(kernel="linear")  # Using a linear kernel
    
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])
    
    return grid_search.best_params_["C"]


## **What is the Degree $d$ in a Polynomial Kernel for SVM?**  

In **Support Vector Machines (SVMs)** with a **polynomial kernel**, the **degree $d$** determines the complexity of the decision boundary. The polynomial kernel is defined as:  

$$
K(x, y) = (x \cdot y + c)^d
$$

where $d$ is the **degree of the polynomial** and $c \ge 0$ is a constant term (`coef0` in scikit-learn, which additionally multiplies $x \cdot y$ by `gamma`). **Higher degrees** create more complex decision boundaries.

### **How the Degree $d$ Affects the Model:**  
- **Low Degree (e.g., $d = 2$)** → Creates **simpler decision boundaries** and reduces the risk of overfitting.  
- **High Degree (e.g., $d = 5$ or more)** → Allows **complex decision boundaries** but can lead to overfitting.  
- **Very High Degree ($d \gg 5$)** → Can make the model too flexible, capturing noise instead of meaningful patterns.

### **Why Do We Need to Find the Best Degree $d$?**  
✅ **Too Low $d$ → Underfitting**: The model may not capture the true structure of the data.  
✅ **Too High $d$ → Overfitting**: The model may memorize training data but fail on new data.  
✅ **Optimal $d$ → Best Generalization**: A well-chosen degree balances flexibility and robustness, leading to good performance on unseen data.

### **How to Find the Best $d$?**  
We use **cross-validation** (e.g., **GridSearchCV**) to test multiple values of $d$ and keep the one with the best mean validation accuracy. The functions below search `C` jointly with $d$, since the best degree depends on how strongly the model is regularized. This helps the model generalize rather than just fit the training data.  
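The kernel formula can be checked against scikit-learn's `polynomial_kernel`, which computes $(\gamma \, x \cdot y + c)^d$; setting `gamma=1.0` recovers the form above. This sketch uses two arbitrary 2-D points and also prints the size of the implicit feature space, $\binom{p + d}{d}$ monomials of degree at most $d$ in $p$ features, which is one way to see why higher degrees allow more flexible boundaries.

```python
from math import comb

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[0.5, 1.0]])
c = 1.0  # constant term; scikit-learn calls it coef0

values = {}
for d in [2, 3, 5]:
    manual = ((x @ y.T + c) ** d).item()  # (x . y + c)^d, with x . y = 2.5
    # gamma=1.0 makes scikit-learn's (gamma * <x, y> + coef0)^d equal (x . y + c)^d
    sk = polynomial_kernel(x, y, degree=d, gamma=1.0, coef0=c).item()
    values[d] = sk
    # Implicit feature space: all monomials of degree <= d in the p input features
    n_implicit = comb(x.shape[1] + d, d)
    print(f"d={d}: manual={manual:.4f}  sklearn={sk:.4f}  implicit features={n_implicit}")
```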


### Baseline Predictors

In [55]:
def best_d_base(Train):
    static_predictors = cv_parameters_base(Train)

    # Define parameter grid for polynomial degree (testing degrees 2 to 5) and C values
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # Testing C values from 0.001 to 1000
        "degree": [2, 3, 4, 5]  # Testing polynomial degrees 2 to 5
    }
    
    svm = SVC(kernel="poly")  # Use polynomial kernel
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[static_predictors], Train["Target"])

    # Extract best C and degree
    best_C = grid_search.best_params_["C"]
    best_d = grid_search.best_params_["degree"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "degree": best_d}])

    return results_df


### Baseline Predictors + Rolling Predictors

In [56]:
def best_d_roll(Train):
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)
   
    # Define parameter grid for polynomial degree (testing degrees 2 to 5) and C values
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # Testing C values from 0.001 to 1000
        "degree": [2, 3, 4, 5]  # Testing polynomial degrees 2 to 5
    }
    
    svm = SVC(kernel="poly")  # Use polynomial kernel
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])

    # Extract best C and degree
    best_C = grid_search.best_params_["C"]
    best_d = grid_search.best_params_["degree"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "degree": best_d}])

    return results_df


### Full Feature Set

In [57]:
def best_d_full(Train):
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)
    # Define parameter grid for polynomial degree (testing degrees 2 to 5) and C values
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # Testing C values from 0.001 to 1000
        "degree": [2, 3, 4, 5]  # Testing polynomial degrees 2 to 5
    }
    
    svm = SVC(kernel="poly")  # Use polynomial kernel
    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])

    # Extract best C and degree
    best_C = grid_search.best_params_["C"]
    best_d = grid_search.best_params_["degree"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "degree": best_d}])

    return results_df


## **Gamma (γ) in Support Vector Machines (SVM)**  

### **What is Gamma (γ)?**  
Gamma (γ) is a hyperparameter of the **Radial Basis Function (RBF) kernel** used in Support Vector Machines (SVM):

$$
K(x, y) = \exp\left(-\gamma \, \lVert x - y \rVert^2\right)
$$

It controls how far the influence of a single training example reaches, and therefore the **complexity** of the decision boundary.  

### **How Does Gamma Work?**  
- A **small γ (e.g., 0.001)** means **far-reaching** influence, leading to a **smoother decision boundary** (more generalized model).  
- A **large γ (e.g., 100)** means **short-range** influence, letting the model **memorize the training data** and potentially overfit.  

### **Effect of Gamma on Decision Boundary**  
| Gamma (γ) Value | Model Behavior |
|---------------|---------------|
| **Low (e.g., 0.001)** | Simpler decision boundary, may underfit |
| **Medium (e.g., 1.0)** | Balanced generalization and flexibility |
| **High (e.g., 100)** | Complex decision boundary, may overfit |

### **Why is Finding the Best Gamma Important?**  
- **Too small γ** → The model is too simple, leading to high bias and **underfitting**.  
- **Too large γ** → The model is too complex, leading to high variance and **overfitting**.  
- **Optimal γ** → Balances generalization and complexity, leading to **better performance on unseen data**.  
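A sketch of the "reach" of one training point, using scikit-learn's `rbf_kernel`, which computes $\exp(-\gamma \lVert x - y \rVert^2)$. For three points at distances 0.5, 1 and 2 from a query (arbitrary values), a small γ keeps all similarities near 1, while a large γ drives all of them to effectively zero, so each training point only influences its immediate neighborhood.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
# Three training points at distances 0.5, 1 and 2 from x
points = np.array([[0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])

similarities = {}
for gamma in [0.01, 1.0, 100.0]:
    sims = rbf_kernel(x, points, gamma=gamma).ravel()
    # Same quantity by hand: exp(-gamma * squared Euclidean distance)
    manual = np.exp(-gamma * np.sum((points - x) ** 2, axis=1))
    similarities[gamma] = (sims, manual)
    print(f"gamma={gamma:<6} similarity={np.round(sims, 4)}")
```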



### Baseline Predictors

In [58]:
def best_g_base(Train):
    static_predictors = cv_parameters_base(Train)

    # Define parameter grid for C and gamma (for RBF)
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # C values from 0.001 to 1000
        "gamma": np.logspace(-3, 3, 5)  # Gamma values from 0.001 to 1000
    }

    svm = SVC(kernel="rbf")  # Use RBF kernel

    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[static_predictors], Train["Target"])

    # Extract best C and gamma
    best_C = grid_search.best_params_["C"]
    best_gamma = grid_search.best_params_["gamma"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "gamma": best_gamma}])

    return results_df


### Baseline Predictors + Rolling Predictors

In [59]:
def best_g_roll(Train):
    all_predictors = cv_parameters_roll(Train)
    Train = roll(Train)

    # Define parameter grid for C and gamma (for RBF)
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # C values from 0.001 to 1000
        "gamma": np.logspace(-3, 3, 5)  # Gamma values from 0.001 to 1000
    }

    svm = SVC(kernel="rbf")  # Use RBF kernel

    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])

    # Extract best C and gamma
    best_C = grid_search.best_params_["C"]
    best_gamma = grid_search.best_params_["gamma"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "gamma": best_gamma}])

    return results_df

### Full Feature Set

In [60]:
def best_g_full(Train):
    all_predictors = cv_parameters_full(Train)
    Train = roll(Train)

    # Define parameter grid for C and gamma (for RBF)
    param_grid = {
        "C": np.logspace(-3, 3, 5),  # C values from 0.001 to 1000
        "gamma": np.logspace(-3, 3, 5)  # Gamma values from 0.001 to 1000
    }

    svm = SVC(kernel="rbf")  # Use RBF kernel

    # Perform GridSearchCV with 5-fold cross-validation
    grid_search = GridSearchCV(svm, param_grid, scoring="accuracy", cv=5)
    grid_search.fit(Train[all_predictors], Train["Target"])

    # Extract best C and gamma
    best_C = grid_search.best_params_["C"]
    best_gamma = grid_search.best_params_["gamma"]

    # Store results in a DataFrame
    results_df = pd.DataFrame([{"C": best_C, "gamma": best_gamma}])

    return results_df