# AutoML

PyCaret
* Main Site - https://pycaret.org/
* Docs - https://pycaret.readthedocs.io/en/latest/

## Table of Contents

* [Setup and Preprocessing](#setup)  
* [Compare Models](#compare)  
* [Create Model](#create)  
* [Tune Model](#tune)  
* [Evaluate Model](#evaluate)  
* [Finalize and Store Model](#finalize_and_store)  

## Imports and Global Settings

In [4]:
import sys
import mlflow
import pandas as pd
from sqlalchemy import create_engine
from pycaret import regression as py_reg
from pycaret import classification as py_cls

sys.path.append('../')
from passkeys import RDS_ENDPOINT, RDS_PASSWORD

# Pandas Settings
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
pd.options.display.max_info_columns = 200
pd.options.display.precision = 5


mlflow.set_tracking_uri('file:/home/jeff/Documents/Data_Science_Projects/NBA_Betting/models/AutoML')


## Database Connection

In [None]:
username = 'postgres'
password = RDS_PASSWORD
endpoint = RDS_ENDPOINT
database = 'nba_betting'
port = '5432'

connection = create_engine(f'postgresql+psycopg2://{username}:{password}@{endpoint}:{port}/{database}').connect()
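`create_engine` parses the credentials out of the URL, so characters such as `@`, `:` or `/` in a password can break the connection string. Below is a minimal sketch that percent-encodes the credentials with the standard library and includes the port. `build_postgres_url` is a hypothetical helper and not part of SQLAlchemy. SQLAlchemy 1.4+ also provides `sqlalchemy.engine.URL.create` for the same purpose.

```python
from urllib.parse import quote_plus


def build_postgres_url(username: str, password: str, host: str,
                       port: str, database: str) -> str:
    """Hypothetical helper: build a psycopg2 connection URL for create_engine."""
    # Percent-encode credentials so '@', ':' or '/' do not confuse URL parsing
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"postgresql+psycopg2://{user}:{pwd}@{host}:{port}/{database}"


# Placeholder credentials for illustration only
url = build_postgres_url("postgres", "p@ss/word", "example-host", "5432", "nba_betting")
print(url)  # postgresql+psycopg2://postgres:p%40ss%2Fword@example-host:5432/nba_betting
```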

### Datasets

In [None]:
df = pd.read_sql_table('model_training_data', connection)

### Restrict to games from previous seasons only

In [None]:
df = df[df['league_year_end'] != 23]

In [None]:
df.sort_values('game_id', ascending=False).head()

<a id='basic_data_overview'></a>

## Basic Data Overview

In [None]:
df.info(verbose=True, show_counts=True)

In [None]:
df.head()

## PyCaret - Regression

<a id=setup></a>

### Setup and Preprocessing

In [None]:
target = ['REG_TARGET_actual_home_margin']
drop_features = ['fd_line_home', 'dk_line_home', 'covers_consensus_home', 'game_id',
                 'REG_TARGET_actual_home_margin', 'CLS_TARGET_home_margin_GT_home_spread']
main_features = ['home_team_num', 'away_team_num', 'home_spread',
                 'league_year_end', 'day_of_season', 'elo1_pre',
                 'elo2_pre', 'elo_prob1', 'elo_prob2']
rank_features = [feature for feature in list(df) if 'rank' in feature]
zscore_features = [feature for feature in list(df) if 'zscore' in feature]
                 
other_features = [feature for feature in list(df) if feature not in target + main_features + drop_features]

features_to_use = target + main_features + rank_features + zscore_features
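`rank_features` and `zscore_features` are built from substring matches. A column whose name contains both `rank` and `zscore` would therefore be listed twice, and nothing prevents a match from pulling in a column that belongs in `drop_features`. Below is a minimal sketch of an order-preserving, de-duplicated assembly that also skips dropped columns while keeping the target. `assemble_features` and the sample column names are hypothetical.

```python
def assemble_features(target, groups, drop):
    """Hypothetical helper: concatenate feature groups without duplicates or dropped columns."""
    seen = set()
    ordered = []
    for col in list(target) + [c for group in groups for c in group]:
        # Keep the target even though it is also listed in `drop`
        if col in drop and col not in target:
            continue
        if col not in seen:
            seen.add(col)
            ordered.append(col)
    return ordered


columns = ["game_id", "home_spread", "off_rank", "off_rank_zscore", "net_zscore"]
rank = [c for c in columns if "rank" in c]      # ['off_rank', 'off_rank_zscore']
zscore = [c for c in columns if "zscore" in c]  # ['off_rank_zscore', 'net_zscore']
features = assemble_features(["REG_TARGET_actual_home_margin"],
                             [["home_spread"], rank, zscore],
                             drop={"game_id", "REG_TARGET_actual_home_margin"})
print(features)
# ['REG_TARGET_actual_home_margin', 'home_spread', 'off_rank', 'off_rank_zscore', 'net_zscore']
```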

In [None]:
model_ready_df = df[features_to_use]

In [None]:
model_ready_df.info(verbose=True)

`setup` accepts many options, and only a subset is set below. See the regression API reference for the full list:   
https://pycaret.readthedocs.io/en/latest/api/regression.html#module-pycaret.regression

In [None]:
setup_params = {'log_experiment': True,
                'log_profile': False,
                'log_plots': False,
                'experiment_name': 'NBA_Betting_REG_Main_Features',
                'data': model_ready_df,
                'target': 'REG_TARGET_actual_home_margin',
                'train_size': 0.7,
                'preprocess': True,
                'normalize': False,        # z-score scaling
                'transformation': False,   # Yeo-Johnson power transform to make features more Gaussian
                'remove_outliers': False,  # PCA (SVD)-based outlier removal
                'remove_multicollinearity': False,
                'polynomial_features': False,
                'trigonometry_features': False,
                'feature_interaction': False,
                'feature_ratio': False,
                'feature_selection': False,
                'feature_selection_threshold': 0.8,
                'pca': False,
                'pca_components': 10,
                'numeric_features': [],
                'ignore_features': []
               }
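The `normalize` option above refers to z-score scaling. Below is a minimal standard-library sketch of that transform. It assumes the usual preprocessing convention that the mean and standard deviation are estimated on the training split only and then reused on the hold-out split. It uses the population standard deviation, as scikit-learn's `StandardScaler` does. This is an illustration, not PyCaret's internal code.

```python
from statistics import fmean, pstdev


def fit_zscore(train_values):
    """Estimate mean and population standard deviation on the training split only."""
    mu = fmean(train_values)
    sigma = pstdev(train_values, mu)
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Scale values with parameters learned from the training split."""
    # Treat a constant column's scale as 1 to avoid dividing by zero
    scale = sigma if sigma > 0 else 1.0
    return [(v - mu) / scale for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy feature column
holdout = [3.0, 11.0]
mu, sigma = fit_zscore(train)                     # 5.0 and 2.0
print(apply_zscore(holdout, mu, sigma))           # [-1.0, 3.0]
```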

In [None]:
nba_betting_regression = py_reg.setup(**setup_params)

<a id=compare></a>

### Compare Models

In [None]:
best_3_models = py_reg.compare_models(n_select=3)
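`compare_models` cross-validates each estimator in PyCaret's library on the training split (10 folds by default). It sorts the results by the metric named in `sort` (R2 by default for regression), and `n_select=3` returns the top three models. Below is a minimal pure-Python sketch of the same idea. It uses two hypothetical toy candidates, a mean predictor and one-feature least squares. For readability it ranks them by cross-validated MAE, and it uses unshuffled contiguous folds.

```python
from statistics import fmean


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Toy candidate: always predict the training mean."""
    mu = fmean(ys)
    return lambda x: mu


def fit_ols(xs, ys):
    """Toy candidate: one-feature least squares, slope = cov(x, y) / var(x)."""
    mx, my = fmean(xs), fmean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def cv_mae(fit, xs, ys, k=5):
    """Average held-out MAE across k folds."""
    scores = []
    for fold in kfold_indices(len(xs), k):
        held = set(fold)
        tr = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        scores.append(fmean(abs(model(xs[i]) - ys[i]) for i in fold))
    return fmean(scores)


def compare(candidates, xs, ys, n_select=1, k=5):
    """Rank candidate names by cross-validated MAE (lower is better)."""
    ranked = sorted(candidates, key=lambda name: cv_mae(candidates[name], xs, ys, k))
    return ranked[:n_select]


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # toy, perfectly linear target
print(compare({"mean": fit_mean, "ols": fit_ols}, xs, ys, n_select=2))  # ['ols', 'mean']
```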

<a id=create></a>

### Create Selected Model

In [None]:
model = py_reg.create_model('lasso')
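The `'lasso'` ID maps to scikit-learn's `Lasso` estimator. Per the scikit-learn documentation, it minimizes the following objective over the weight vector $w$, given $n$ training rows, feature matrix $X$ and target $y$:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
$$

A larger `alpha` pushes more coefficients to exactly zero. This acts as built-in feature selection over the many rank and z-score columns.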

<a id=tune></a>

### Tune Selected Model

In [None]:
tuned_model = py_reg.tune_model(model)
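By default `tune_model` runs a random search (scikit-learn's `RandomizedSearchCV`, 10 iterations) over a predefined grid. It keeps the candidate with the best cross-validated score on the `optimize` metric. Below is a minimal sketch of random search over a hypothetical `alpha` grid. `score_fn` is an assumption that stands in for a cross-validated score (higher is better); it is not PyCaret's API.

```python
import random


def random_search(grid, score_fn, n_iter=10, seed=0):
    """Sample up to n_iter distinct parameter values and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = rng.sample(grid, k=min(n_iter, len(grid)))
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = score_fn(value)  # e.g. mean cross-validated R2 for this alpha
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


alphas = [0.001, 0.01, 0.1, 1.0, 10.0]


def toy_score(a):
    # Toy score that peaks at alpha = 0.1 (stands in for a CV metric)
    return 1.0 - abs(a - 0.1)


# With only 5 grid values and n_iter=10, every value is tried
print(random_search(alphas, toy_score))  # (0.1, 1.0)
```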

<a id=evaluate></a>

### Evaluate Model

https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.evaluate_model

In [None]:
py_reg.evaluate_model(tuned_model)
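`evaluate_model` renders diagnostic plots. For regression, the metrics grid printed by `create_model` and `tune_model` reports MAE, MSE, RMSE, R2, RMSLE and MAPE. Below is a minimal sketch that computes four of them by hand. The predicted and actual home margins are made up for illustration.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for paired actual/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        # R2 = 1 - SS_res / SS_tot (undefined when the actuals are constant)
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


actual = [7.0, -3.0, 12.0, 1.0]     # toy home margins
predicted = [5.0, -1.0, 10.0, 1.0]
print(regression_metrics(actual, predicted))
```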

https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.interpret_model

In [None]:
# py_reg.interpret_model(tuned_model)

<a id=finalize_and_store></a>

### Model Finalization and Storage

In [None]:
final_model = py_reg.finalize_model(tuned_model)

In [None]:
# py_reg.save_model(final_model, '../models/AutoML/vlastd_Rank_Lasso_Reg_PyCaret')

In [None]:
# !mlflow ui

## PyCaret - Classification

<a id=setup></a>

### Setup and Preprocessing

In [None]:
target = ['CLS_TARGET_home_margin_GT_home_spread']
drop_features = ['fd_line_home', 'dk_line_home', 'covers_consensus_home', 'game_id',
                 'REG_TARGET_actual_home_margin', 'CLS_TARGET_home_margin_GT_home_spread']
main_features = ['home_team_num', 'away_team_num', 'home_spread',
                 'league_year_end', 'day_of_season', 'elo1_pre',
                 'elo2_pre', 'elo_prob1', 'elo_prob2']
rank_features = [feature for feature in list(df) if 'rank' in feature]
zscore_features = [feature for feature in list(df) if 'zscore' in feature]
                 
other_features = [feature for feature in list(df) if feature not in target + main_features + drop_features]

features_to_use = target + main_features

In [None]:
model_ready_df = df[features_to_use]

In [None]:
model_ready_df.info()

`setup` accepts many options, and only a subset is set below. See the classification API reference for the full list:   
https://pycaret.readthedocs.io/en/latest/api/classification.html#module-pycaret.classification

In [None]:
setup_params = {'log_experiment': True,
                'log_profile': False,
                'log_plots': False,
                'experiment_name': 'NBA_Betting_CLS_Main_Features',
                'data': model_ready_df,
                'target': 'CLS_TARGET_home_margin_GT_home_spread',
                'train_size': 0.7,
                'preprocess': True,
                'normalize': False,        # z-score scaling
                'transformation': False,   # Yeo-Johnson power transform to make features more Gaussian
                'remove_outliers': False,  # PCA (SVD)-based outlier removal
                'remove_multicollinearity': False,
                'polynomial_features': False,
                'trigonometry_features': False,
                'feature_interaction': False,
                'feature_ratio': False,
                'feature_selection': False,
                'feature_selection_threshold': 0.8,
                'pca': False,
                'pca_components': 10,
                'numeric_features': ['league_year_end'],
                'ignore_features': []
               }

In [None]:
nba_betting_classification = py_cls.setup(**setup_params)

<a id=compare></a>

### Compare Models

In [None]:
best_3_models = py_cls.compare_models(n_select=3)
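For classification, PyCaret's default `fold_strategy` is stratified k-fold. Each fold therefore keeps roughly the same share of positive labels (rows where `CLS_TARGET_home_margin_GT_home_spread` is 1) as the full training split. Below is a minimal sketch of stratified fold assignment that deals each class round-robin across folds. This scheme is a simplification, not scikit-learn's exact `StratifiedKFold` algorithm.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each row index to one of k folds, spreading every class evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal this class's rows round-robin so every fold gets a similar share
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return [sorted(f) for f in folds]


labels = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # toy binary labels with 4 positives
for fold in stratified_folds(labels, k=2):
    print(fold, sum(labels[i] for i in fold))  # each fold holds 2 positives
```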

<a id=create></a>

### Create Selected Model

In [None]:
model = py_cls.create_model('lr')
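The `'lr'` ID maps to scikit-learn's `LogisticRegression`. It models the probability that `CLS_TARGET_home_margin_GT_home_spread` equals 1 as a sigmoid of a linear score over the features $x$:

$$
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
$$

The model predicts class 1 when this probability exceeds 0.5. With scikit-learn's default L2 penalty, the weights minimize the summed log-loss plus $\frac{1}{2C}\lVert w \rVert_2^2$, where `C` is the inverse regularization strength.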

<a id=tune></a>

### Tune Selected Model

In [None]:
tuned_model = py_cls.tune_model(model)

<a id=evaluate></a>

### Evaluate Model

https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.evaluate_model

In [None]:
py_cls.evaluate_model(tuned_model)
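For classifiers, the metrics grid reports Accuracy, AUC, Recall, Prec., F1, Kappa and MCC. Below is a minimal standard-library sketch that computes four of these from a confusion matrix. The labels and predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Accuracy": accuracy, "Prec.": precision, "Recall": recall, "F1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))  # all four equal 0.75
```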

https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.interpret_model

In [None]:
# py_cls.interpret_model(tuned_model)

<a id=finalize_and_store></a>

### Model Finalization and Storage

In [None]:
final_model = py_cls.finalize_model(tuned_model)

In [None]:
# py_cls.save_model(final_model, '../models/AutoML/vlastd_rank_LR_CLS_PyCaret')

In [None]:
# !mlflow ui