# **Distance Predictor Part Optuna**
Author: Declan Costello

Date: 10/21/2023

## **Part Optuna Description**

Here I tune the XGBoost hyperparameters with Optuna.

## **Table of Contents**

1. Installation
2. Data Import
3. Optuna
4. Viz

# **Installation**

The following cell imports the necessary packages.

In [1]:
import optuna
import plotly
import matplotlib
import numpy as np
import pandas as pd
import xgboost as xgb
import sklearn.metrics
import sklearn.datasets
from xgboost import XGBRegressor
import optuna.visualization as ov
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler, OneHotEncoder

# **Data Import**

In [2]:
data = pd.read_csv('FE_data.csv')

# **Optuna**

In [3]:
feature_cols = ['launch_angle','launch_speed','pfx_x','pfx_z',"release_speed","domed", "spray_angle",'is_barrel','Pop','pull_percent','home_team',"stand","p_throws",'grouped_pitch_type','fav_platoon_split_for_batter']

X = data.loc[:, feature_cols]

categorical_cols = ['home_team',"stand","p_throws",'grouped_pitch_type','fav_platoon_split_for_batter'] 
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

target_cols = ['hit_distance_sc'] 
y = data.loc[:, target_cols]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=X[['is_barrel', 'domed']]
)

# Define the objective function for Optuna to optimize
def objective(trial):
    params = {
        "verbosity": 0,
        "objective": "reg:squarederror",  # Regression task
        "eval_metric": "rmse",  # Use RMSE as the evaluation metric
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        # suggest_loguniform / suggest_uniform are deprecated in favor of suggest_float
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.1, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
        "scale_pos_weight": trial.suggest_float("scale_pos_weight", 1e-6, 10.0, log=True),
        "max_delta_step": trial.suggest_float("max_delta_step", 1e-8, 1.0, log=True),
        "min_split_loss": trial.suggest_float("min_split_loss", 1e-8, 1.0, log=True),
        "max_bin": trial.suggest_int("max_bin", 32, 512),
        "max_leaves": trial.suggest_int("max_leaves", 4, 32),
        "tweedie_variance_power": trial.suggest_float("tweedie_variance_power", 1.0, 2.0),
        "monotone_constraints": trial.suggest_categorical("monotone_constraints", [None, "(-1,1,1,0,0)"]),
    }

    if params["booster"] in ["gbtree", "dart"]:
        params["min_child_weight"] = trial.suggest_int("min_child_weight", 2, 10)
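        # Note: eta is XGBoost's alias for learning_rate, so this overlaps with the value sampled above
        params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)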
        params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
        params["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])

    if params["booster"] == "dart":
        params["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])
        params["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])
        params["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)
        params["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)

    model = xgb.XGBRegressor(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    return mse

# Create and run the Optuna study
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100, timeout=600)  # Stops after 100 trials or 600 seconds, whichever comes first

# Print the best hyperparameters
print('Best trial:')
best_trial = study.best_trial
print('  Value: {:.4f}'.format(best_trial.value))
print('  Params: ')
for key, value in best_trial.params.items():
    print('    {}: {}'.format(key, value))


[I 2023-10-21 22:15:33,852] A new study created in memory with name: no-name-5a026d65-d03c-4a4c-8c77-a2e1af99217d
[I 2023-10-21 22:15:35,056] Trial 0 finished with value: 11926.49411481096 and parameters: {'n_estimators': 199, 'max_depth': 8, 'learning_rate': 0.0023195273357063013, 'subsample': 0.72246738371336, 'colsample_bytree': 0.9188697918335424, 'booster': 'gblinear', 'lambda': 4.185675776166643e-08, 'alpha': 0.00026792218120358385, 'scale_pos_weight': 2.1777106365586555e-06, 'max_delta_step': 0.2204688584740738, 'min_split_loss': 8.47322077472643e-08, 'max_bin': 217, 'max_leaves': 30, 'tweedie_variance_power': 1.967490784461485, 'monotone_constraints': '(-1,1,1,0,0)'}. Best is trial 0 with value: 11926.49411481096.

Best trial:
  Value: 11926.4941
  Params: 
    n_estimators: 199
    max_depth: 8
    learning_rate: 0.0023195273357063013
    subsample: 0.72246738371336
    colsample_bytree: 0.9188697918335424
    booster: gblinear
    lambda: 4.185675776166643e-08
    alpha: 0.00026792218120358385
    scale_pos_weight: 2.1777106365586555e-06
    max_delta_step: 0.2204688584740738
    min_split_loss: 8.47322077472643e-08
    max_bin: 217
    max_leaves: 30
    tweedie_variance_power: 1.967490784461485
    monotone_constraints: (-1,1,1,0,0)
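
The objective returns the mean squared error, so the best value above is in squared target units. A minimal sketch of converting it to RMSE, which is in the same units as `hit_distance_sc` (assumed here to be feet, the usual Statcast convention; the notebook itself does not state the unit):

```python
import math

# Best objective value copied from the output above (test-set MSE).
best_mse = 11926.49411481096

# RMSE is the square root of MSE and shares the target's units
# (assumed to be feet; not confirmed in this notebook).
best_rmse = math.sqrt(best_mse)
print(f"RMSE: {best_rmse:.2f}")
```

Read this way, the best trial's predictions are off by roughly 109 units of distance on average in the root-mean-square sense.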


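The split in the cell above passes two columns to `stratify`, which keeps the joint distribution of `is_barrel` and `domed` the same in the train and test sets. A small sketch on made-up data (the toy counts below are hypothetical and only chosen so the proportions divide evenly):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows with four (is_barrel, domed) combinations
# occurring 5, 15, 20 and 60 times respectively.
toy = pd.DataFrame({
    "is_barrel": [1] * 20 + [0] * 80,
    "domed": [1] * 5 + [0] * 15 + [1] * 20 + [0] * 60,
})

# Same call pattern as the notebook: stratify on both columns at once.
train, test = train_test_split(
    toy, test_size=0.2, random_state=42, stratify=toy[["is_barrel", "domed"]]
)

# Each combination should appear in the test set at 20% of its overall count.
test_counts = test.groupby(["is_barrel", "domed"]).size()
print(test_counts)
```

Without `stratify`, a rare combination such as domed barrels could end up almost entirely in one split.
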
In [4]:
study.trials_dataframe()

|  | number | value | datetime_start | datetime_complete | duration | params_alpha | params_booster | params_colsample_bytree | params_eta | params_gamma | ... | params_monotone_constraints | params_n_estimators | params_normalize_type | params_rate_drop | params_sample_type | params_scale_pos_weight | params_skip_drop | params_subsample | params_tweedie_variance_power | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 11926.494115 | 2023-10-21 22:15:33.852983 | 2023-10-21 22:15:35.056332 | 0 days 00:00:01.203349 | 0.0002679222 | gblinear | 0.91887 |  |  | ... | (-1,1,1,0,0) | 199 |  |  |  | 2e-06 |  | 0.722467 | 1.967491 | COMPLETE |
| 1 | 1 | 47596.759319 | 2023-10-21 22:15:35.056943 | 2023-10-21 22:21:38.208518 | 0 days 00:06:03.151575 | 1.032394e-08 | dart | 0.777796 | 7.45817e-05 | 1.282554e-08 | ... |  | 720 | tree | 0.000249 | weighted | 0.000204 | 0.00108 | 0.828721 | 1.277906 | COMPLETE |
| 2 | 2 | 47596.581386 | 2023-10-21 22:21:38.210969 | 2023-10-21 22:27:09.186896 | 0 days 00:05:30.975927 | 1.238147e-08 | dart | 0.817482 | 7.333916e-08 | 0.8390021 | ... |  | 647 | forest | 0.001392 | uniform | 6.392154 | 9e-06 | 0.840535 | 1.534871 | COMPLETE |
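
Only three of the requested 100 trials completed. A likely explanation is the `timeout=600` argument: as far as I understand Optuna's behaviour, the timeout is checked between trials, so a running trial is allowed to finish but no new one starts once 600 seconds have elapsed. The sketch below checks this against the timestamps in the table above (values copied by hand):

```python
from datetime import datetime

# (datetime_start, datetime_complete) for each trial, copied from the table.
trials = [
    ("2023-10-21 22:15:33.852983", "2023-10-21 22:15:35.056332"),
    ("2023-10-21 22:15:35.056943", "2023-10-21 22:21:38.208518"),
    ("2023-10-21 22:21:38.210969", "2023-10-21 22:27:09.186896"),
]
fmt = "%Y-%m-%d %H:%M:%S.%f"
timeout = 600  # seconds, as passed to study.optimize

study_start = datetime.strptime(trials[0][0], fmt)
offsets = []
for number, (start, end) in enumerate(trials):
    # Seconds since the first trial started.
    started = (datetime.strptime(start, fmt) - study_start).total_seconds()
    finished = (datetime.strptime(end, fmt) - study_start).total_seconds()
    offsets.append((started, finished))
    print(f"trial {number}: started at {started:.1f}s, finished at {finished:.1f}s")

# The last trial started inside the budget but finished after it.
print("budget exceeded after last trial:", offsets[-1][1] > timeout)
```

Trial 2 starts about 364 seconds in and ends about 695 seconds in, so the budget is spent before a fourth trial can begin. The two `dart` trials account for nearly all of that time.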


# **[Viz](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_contour.html)**

In [5]:
ov.plot_param_importances(study)

In [6]:
ov.plot_optimization_history(study)

In [7]:
ov.plot_timeline(study)


plot_timeline is experimental (supported from v3.2.0). The interface can change in the future.



In [8]:
ov.plot_contour(study, params=["subsample", "colsample_bytree"])

In [9]:
ov.plot_contour(study, params=["max_depth", "max_leaves"])

# **[TODO](https://optuna.org/#dashboard)**
- [Create Optuna Dashboard](https://github.com/optuna/optuna-dashboard)