# **Distance Predictor Part Lazy Predict**
Author: Declan Costello

Date: 10/22/2023

## **Part Lazy Predict Description**

This part uses Lazy Predict to screen candidate regression models for the final ensemble.

## **Table of Contents**

1. Installation
2. Data Import
3. Train Test Split
4. Lazy Predict
5. Results

**Installation**
- The following imports the necessary packages (they must already be installed).

In [2]:
import lazypredict
import pandas as pd
from lazypredict.Supervised import LazyRegressor
from sklearn.model_selection import train_test_split

**Data Import**

In [6]:
data = pd.read_csv('FE_data.csv')

**Train Test Split**

In [8]:
# Feature columns used as model inputs
feature_cols = [
    'launch_angle', 'launch_speed', 'pfx_x', 'pfx_z', 'release_speed',
    'home_team', 'stand', 'p_throws', 'fav_platoon_split_for_batter',
    'grouped_pitch_type', 'domed', 'spray_angle', 'is_barrel', 'Pop',
    'pull_percent',
]
X = data.loc[:, feature_cols]

# One-hot encode the categorical features, dropping the first level of each
categorical_cols = ['home_team', 'stand', 'p_throws', 'grouped_pitch_type', 'fav_platoon_split_for_batter']
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

target_cols = ['hit_distance_sc']
y = data.loc[:, target_cols]

# Hold out 5% for validation, stratified on four binary columns
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.95, test_size=0.05, random_state=42,
    stratify=X[['home_team_COL', 'is_barrel', 'stand_R', 'p_throws_R']],
)
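
The split above stratifies on several columns at once. As a minimal sketch of what that does, the example below runs the same kind of split on synthetic data and compares how often each combination of strata values occurs in the two partitions. The `toy` frame, its values, and the sample size are made up for illustration; only the column names `launch_speed`, `is_barrel`, and `stand_R` are borrowed from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (values are random, for illustration only)
rng = np.random.default_rng(42)
n = 2000
toy = pd.DataFrame({
    "launch_speed": rng.normal(90, 10, n),
    "is_barrel": rng.integers(0, 2, n),
    "stand_R": rng.integers(0, 2, n),
})
toy_y = rng.normal(170, 60, n)

# Passing several columns to `stratify` treats each distinct row of values as one stratum
strata_cols = ["is_barrel", "stand_R"]
X_tr, X_va, y_tr, y_va = train_test_split(
    toy, toy_y, train_size=0.95, test_size=0.05, random_state=42,
    stratify=toy[strata_cols],
)

# Share of each stratum in the training and validation partitions
train_share = X_tr.groupby(strata_cols).size() / len(X_tr)
valid_share = X_va.groupby(strata_cols).size() / len(X_va)
max_gap = (train_share - valid_share).abs().max()
print(pd.DataFrame({"train": train_share, "valid": valid_share}))
```

The two columns of shares should be close to each other, which is the point of stratifying: a 5% validation set still reflects the mix of strata in the full data.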

**[Lazy Predict](https://lazypredict.readthedocs.io/en/latest/usage.html#regression)**

In [13]:
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)

models, predictions = reg.fit(X_train, X_valid, y_train.values.ravel(), y_valid.values.ravel())

'tuple' object has no attribute '__name__'
Invalid Regressor(s)


 24%|██▍       | 10/42 [03:02<15:30, 29.09s/it]

GammaRegressor model failed to execute
Some value(s) of y are out of the valid range of the loss 'HalfGammaLoss'.


 26%|██▌       | 11/42 [03:02<10:28, 20.26s/it]

GaussianProcessRegressor model failed to execute
Unable to allocate 90.7 GiB for an array with shape (110346, 110346) and data type float64


 38%|███▊      | 16/42 [03:56<03:21,  7.75s/it]

KernelRidge model failed to execute
Unable to allocate 90.7 GiB for an array with shape (110346, 110346) and data type float64


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res)


QuantileRegressor model failed to execute
Unable to allocate 90.7 GiB for an array with shape (110346, 110346) and data type float64




[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.005516 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2010
[LightGBM] [Info] Number of data points in the train set: 110346, number of used features: 20
[LightGBM] [Info] Start training from score 172.164356


100%|██████████| 42/42 [52:13<00:00, 74.61s/it] 
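
Three of the failures in the log report the same allocation error for an array of shape (110346, 110346), that is, a dense square array with one row and one column per training sample. The arithmetic below is a quick check of why that request comes to the 90.7 GiB shown in the messages, assuming 8 bytes per `float64` value.

```python
# Size of a dense square float64 array with one row/column per training sample
n_rows = 110_346          # training-set size reported in the log
bytes_per_float64 = 8

array_bytes = n_rows * n_rows * bytes_per_float64
array_gib = array_bytes / 2**30  # GiB = 2**30 bytes

print(f"{array_gib:.1f} GiB")  # 90.7 GiB
```

Memory for such an array grows with the square of the sample count, so these models would likely need a much smaller training subset to run on typical hardware.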


**Results**

Models with an adjusted R-squared of 0.98 or better:

- HistGradientBoostingRegressor
- LGBMRegressor
- XGBRegressor
- MLPRegressor
- RandomForestRegressor
- ExtraTreesRegressor
- BaggingRegressor
- GradientBoostingRegressor
- DecisionTreeRegressor
- ExtraTreeRegressor
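
For reference, adjusted R-squared discounts R-squared according to the number of predictors. A minimal sketch of the standard formula follows; the sample and feature counts in the example call are hypothetical, chosen only to show the size of the adjustment, and are not taken from this notebook.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical sizes, for illustration only
example = adjusted_r2(r2=0.99, n_samples=5000, n_features=50)
print(round(example, 4))
```

When the number of samples is large relative to the number of features, the adjustment is small, so the two scores can round to the same value.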

In [14]:
models

| Model | Adjusted R-Squared | R-Squared | RMSE | Time Taken |
|---|---|---|---|---|
| HistGradientBoostingRegressor | 0.99 | 0.99 | 12.75 | 1.75 |
| LGBMRegressor | 0.99 | 0.99 | 12.79 | 0.87 |
| XGBRegressor | 0.99 | 0.99 | 12.92 | 0.83 |
| MLPRegressor | 0.99 | 0.99 | 13.04 | 154.75 |
| RandomForestRegressor | 0.99 | 0.99 | 13.29 | 297.12 |
| ExtraTreesRegressor | 0.99 | 0.99 | 13.46 | 128.09 |
| BaggingRegressor | 0.99 | 0.99 | 13.89 | 29.8 |
| GradientBoostingRegressor | 0.99 | 0.99 | 14.56 | 47.81 |
| DecisionTreeRegressor | 0.98 | 0.98 | 19.09 | 4.17 |
| ExtraTreeRegressor | 0.98 | 0.98 | 19.49 | 1.54 |
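
Since the R-squared values are nearly identical across the top models, RMSE and fit time may be more useful for narrowing the list. The sketch below rebuilds those two columns from the `models` output and filters on both. The thresholds `max_rmse` and `max_seconds` are arbitrary illustrative choices, not criteria from this analysis, and the time unit is assumed to be seconds.

```python
import pandas as pd

# RMSE and fit time copied from the `models` output above
scores = pd.DataFrame(
    {
        "RMSE": [12.75, 12.79, 12.92, 13.04, 13.29, 13.46, 13.89, 14.56, 19.09, 19.49],
        "Time Taken": [1.75, 0.87, 0.83, 154.75, 297.12, 128.09, 29.8, 47.81, 4.17, 1.54],
    },
    index=[
        "HistGradientBoostingRegressor", "LGBMRegressor", "XGBRegressor",
        "MLPRegressor", "RandomForestRegressor", "ExtraTreesRegressor",
        "BaggingRegressor", "GradientBoostingRegressor",
        "DecisionTreeRegressor", "ExtraTreeRegressor",
    ],
)

# Illustrative thresholds (assumptions): keep accurate models that also fit quickly
max_rmse = 15.0
max_seconds = 60.0
shortlist = scores[(scores["RMSE"] < max_rmse) & (scores["Time Taken"] < max_seconds)]

print(shortlist.sort_values("RMSE"))
```

In the notebook itself the same filter could be applied directly to the `models` DataFrame instead of a hand-built copy.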


In [15]:
predictions

| Model | Adjusted R-Squared | R-Squared | RMSE | Time Taken |
|---|---|---|---|---|
| HistGradientBoostingRegressor | 0.99 | 0.99 | 12.75 | 1.75 |
| LGBMRegressor | 0.99 | 0.99 | 12.79 | 0.87 |
| XGBRegressor | 0.99 | 0.99 | 12.92 | 0.83 |
| MLPRegressor | 0.99 | 0.99 | 13.04 | 154.75 |
| RandomForestRegressor | 0.99 | 0.99 | 13.29 | 297.12 |
| ExtraTreesRegressor | 0.99 | 0.99 | 13.46 | 128.09 |
| BaggingRegressor | 0.99 | 0.99 | 13.89 | 29.8 |
| GradientBoostingRegressor | 0.99 | 0.99 | 14.56 | 47.81 |
| DecisionTreeRegressor | 0.98 | 0.98 | 19.09 | 4.17 |
| ExtraTreeRegressor | 0.98 | 0.98 | 19.49 | 1.54 |
