# **Distance Predictor Part Optuna**
Author: Declan Costello

Date: 10/21/2023

## **Part Optuna Description**

Here I tune the XGBoost hyperparameters with Optuna.

## **Table of Contents**

1. Installation
2. Data Import
3. Optuna
4. Viz

# **Installation**

The following imports the necessary packages.

In [1]:
import optuna
import plotly
import matplotlib
import numpy as np
import pandas as pd
import xgboost as xgb
import sklearn.metrics
import sklearn.datasets
from xgboost import XGBRegressor
import optuna.visualization as ov
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler, OneHotEncoder

# **Data Import**

In [2]:
data = pd.read_csv('FE_data.csv')

# **Optuna**

In [3]:
# Model inputs: batted-ball, pitch, park, and matchup features
feature_cols = [
    'launch_angle', 'launch_speed', 'pfx_x', 'pfx_z', 'release_speed',
    'domed', 'spray_angle', 'is_barrel', 'Pop', 'pull_percent',
    'home_team', 'stand', 'p_throws', 'grouped_pitch_type',
    'fav_platoon_split_for_batter',
]

X = data.loc[:, feature_cols]

# One-hot encode the categorical features, dropping the first level of each
categorical_cols = ['home_team', 'stand', 'p_throws', 'grouped_pitch_type', 'fav_platoon_split_for_batter']
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

# Target: hit distance
target_cols = ['hit_distance_sc']
y = data.loc[:, target_cols]
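
The encoding step above determines the column names used later for stratification (for example `stand_R` and `p_throws_R`). As a minimal sketch of how those names arise, the toy frame below (made-up values, not rows from `FE_data.csv`) applies the same `pd.get_dummies(..., drop_first=True)` call to two of the categorical columns.

```python
import pandas as pd

# Toy stand-in data (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "launch_speed": [98.1, 101.4, 87.3],
    "stand": ["L", "R", "R"],
    "p_throws": ["R", "L", "R"],
})

# drop_first=True drops the first level of each category (here "L"),
# so only the "<column>_R" indicator remains for each encoded column.
encoded = pd.get_dummies(toy, columns=["stand", "p_throws"], drop_first=True)

print(list(encoded.columns))
```

Dropping the first level avoids a redundant indicator column: a row with `stand_R` equal to 0 is implicitly a left-handed batter.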

In [4]:
# Split the data into training and testing sets, stratified on selected columns
stratify_cols = ['home_team_COL', 'is_barrel', 'stand_R', 'p_throws_R']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=X[stratify_cols]
)

# Define the objective function for Optuna to minimize (test-set MSE)
def objective(trial):
    params = {
        "verbosity": 0,
        "objective": "reg:squarederror",  # Regression task
        "eval_metric": "rmse",  # Use RMSE as the evaluation metric
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        # suggest_float replaces the deprecated suggest_loguniform / suggest_uniform
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.1, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),  # L2 regularization
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),  # L1 regularization
        "scale_pos_weight": trial.suggest_float("scale_pos_weight", 1e-6, 10.0, log=True),
        "max_delta_step": trial.suggest_float("max_delta_step", 1e-8, 1.0, log=True),
        "min_split_loss": trial.suggest_float("min_split_loss", 1e-8, 1.0, log=True),
        "max_bin": trial.suggest_int("max_bin", 32, 512),
        "max_leaves": trial.suggest_int("max_leaves", 4, 32),
        # Only used by the reg:tweedie objective
        "tweedie_variance_power": trial.suggest_float("tweedie_variance_power", 1.0, 2.0),
        "monotone_constraints": trial.suggest_categorical("monotone_constraints", [None, "(-1,1,1,0,0)"]),
    }

    # Tree-booster parameters.
    # NOTE: in XGBoost, eta is an alias of learning_rate and gamma is an alias of
    # min_split_loss, so these two duplicate parameters already sampled above.
    if params["booster"] in ["gbtree", "dart"]:
        params["min_child_weight"] = trial.suggest_int("min_child_weight", 2, 10)
        params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
        params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
        params["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])

    # DART-specific dropout parameters
    if params["booster"] == "dart":
        params["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])
        params["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])
        params["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)
        params["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)

    model = xgb.XGBRegressor(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # The study minimizes mean squared error on the test split
    mse = mean_squared_error(y_test, y_pred)
    return mse

# Create and run the Optuna study
study = optuna.create_study(direction='minimize')
# Stops after 100 trials or 600 seconds, whichever comes first; adjust either limit as needed
study.optimize(objective, n_trials=100, timeout=600)

# Print the best hyperparameters
print('Best trial:')
best_trial = study.best_trial
print('  Value: {:.4f}'.format(best_trial.value))
print('  Params: ')
for key, value in best_trial.params.items():
    print('    {}: {}'.format(key, value))


[I 2023-10-22 15:40:47,836] A new study created in memory with name: no-name-99c61598-51ca-4683-bef9-9dd06b174250
[I 2023-10-22 15:40:52,094] Trial 0 finished with value: 10738.585753134497 and parameters: {'n_estimators': 731, 'max_depth': 4, 'learning_rate': 0.0010710047639041912, 'subsample': 0.9424360104054774, 'colsample_bytree': 0.5034239920578778, 'booster': 'gblinear', 'lambda': 0.42675662598085534, 'alpha': 0.0005354464058488776, 'scale_pos_weight': 5.618742956129845e-06, 'max_delta_step': 0.20079119608673593, 'min_split_loss': 0.06099540881538169, 'max_bin': 470, 'max_leaves': 23, 'tweedie_variance_power': 1.9973805925374783, 'monotone_constraints': None}. Best is trial 0 with value: 10738.585753134497.

Best trial:
  Value: 10738.5858
  Params: 
    n_estimators: 731
    max_depth: 4
    learning_rate: 0.0010710047639041912
    subsample: 0.9424360104054774
    colsample_bytree: 0.5034239920578778
    booster: gblinear
    lambda: 0.42675662598085534
    alpha: 0.0005354464058488776
    scale_pos_weight: 5.618742956129845e-06
    max_delta_step: 0.20079119608673593
    min_split_loss: 0.06099540881538169
    max_bin: 470
    max_leaves: 23
    tweedie_variance_power: 1.9973805925374783
    monotone_constraints: None


In [5]:
study.trials_dataframe()

| | number | value | datetime_start | datetime_complete | duration | params_alpha | params_booster | params_colsample_bytree | params_eta | params_gamma | ... | params_monotone_constraints | params_n_estimators | params_normalize_type | params_rate_drop | params_sample_type | params_scale_pos_weight | params_skip_drop | params_subsample | params_tweedie_variance_power | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 10738.585753 | 2023-10-22 15:40:47.836930 | 2023-10-22 15:40:52.093838 | 0 days 00:00:04.256908 | 0.000535 | gblinear | 0.503424 | | | ... | | 731 | | | | 6e-06 | | 0.942436 | 1.997381 | COMPLETE |
| 1 | 1 | 47383.70305 | 2023-10-22 15:40:52.094474 | 2023-10-22 15:41:10.498867 | 0 days 00:00:18.404393 | 0.006502 | gbtree | 0.991509 | 4e-06 | 3.1e-05 | ... | (-1,1,1,0,0) | 873 | | | | 9.752522 | | 0.613248 | 1.818351 | COMPLETE |
| 2 | 2 | 47403.421196 | 2023-10-22 15:41:10.499465 | 2023-10-22 15:41:50.301493 | 0 days 00:00:39.802028 | 0.00166 | dart | 0.746317 | 0.716371 | 0.180983 | ... | (-1,1,1,0,0) | 227 | tree | 4e-06 | uniform | 0.294155 | 1e-06 | 0.721042 | 1.773353 | COMPLETE |
| 3 | 3 | 45337.128731 | 2023-10-22 15:41:50.303648 | 2023-10-22 15:52:00.243963 | 0 days 00:10:09.940315 | 2.2e-05 | dart | 0.591218 | 0.000188 | 0.000855 | ... | (-1,1,1,0,0) | 922 | tree | 0.027392 | weighted | 8e-06 | 1.2e-05 | 0.773255 | 1.663299 | COMPLETE |
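
The `value` column holds mean squared error, which is in squared units of the target. Taking the square root gives RMSE in the units of `hit_distance_sc` (feet, assuming the usual Statcast convention), which is easier to interpret. A small sketch using the four values from the table above:

```python
import math

# Trial values (MSE) copied from the trials dataframe above
trial_mse = {
    0: 10738.585753,
    1: 47383.70305,
    2: 47403.421196,
    3: 45337.128731,
}

# RMSE is the square root of MSE, expressed in the target's own units
trial_rmse = {number: math.sqrt(mse) for number, mse in trial_mse.items()}

for number, rmse in trial_rmse.items():
    print(f"trial {number}: RMSE = {rmse:.1f}")
```

Because the square root is monotonic, ranking trials by RMSE gives the same order as ranking them by MSE.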


# **[Viz](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_contour.html)**

In [6]:
ov.plot_param_importances(study)

In [7]:
ov.plot_optimization_history(study)

In [8]:
ov.plot_timeline(study)


plot_timeline is experimental (supported from v3.2.0). The interface can change in the future.



In [9]:
ov.plot_contour(study, params=["subsample", "colsample_bytree"])

In [10]:
ov.plot_contour(study, params=["max_depth", "max_leaves"])

# **[TODO](https://optuna.org/#dashboard)**
- [Create Optuna Dashboard](https://github.com/optuna/optuna-dashboard)