# AutoGluon
AutoGluon is an automated machine learning (AutoML) toolkit that simplifies developing and fine-tuning machine learning models. At its core, it automates model selection, hyperparameter tuning, and ensemble creation for tabular, image, and text data. Because it hides much of the complexity of the underlying algorithms, both beginners and experienced practitioners can reach strong model performance with little manual effort, which makes it well suited to rapid prototyping.

## Setup

In [None]:
import sys
import os

# Get the current working directory
current_working_directory = os.getcwd()

# Go up one level from the current working directory
parent_directory = os.path.join(current_working_directory, '..')

# Add the parent directory to sys.path
sys.path.append(parent_directory)

os.getcwd()

In [None]:
%pip install autogluon
%pip install scikit-learn

In [None]:
%load_ext autoreload

In [None]:
%autoreload 

from sklearn.metrics import accuracy_score, classification_report
import autogluon.core as ag
from autogluon.tabular import TabularDataset, TabularPredictor

from src.features.post_processor import save_predictions
from src.features.ml_service import prepare_data, prepare_test_data
from src.config import TARGET_FEATURES

## Load data

In [None]:
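# Split the data; no separate validation set is requested because AutoGluon creates its own
x_train, _, x_test, y_train, _, y_test = prepare_data(validation_size=0, test_size=0.1)

# Attach the label column to the training features: AutoGluon expects features and label in one table.
# NOTE: this assumes TARGET_FEATURES holds a single target; target_feature_name is reused when fitting.
for target_feature_name in TARGET_FEATURES:
    x_train[target_feature_name] = y_train

data = TabularDataset(x_train)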

## Train model

AutoGluon does not require any hyperparameters to be set; it selects models and hyperparameters automatically based on the data.
It does not need separate tuning data either. It splits the provided data into training and validation sets itself, in a way that suits its own training strategy.

**Evaluation metrics:**
* 'f1' (for binary classification)
* 'roc_auc' (for binary classification)
* 'log_loss' (for classification)
* 'mean_absolute_error' (for regression)
* 'median_absolute_error' (for regression) 
* You can also define your own custom metric function; see the examples in the folder `autogluon/core/metrics/`.

See the AutoGluon documentation for more details: [AutoGluon Documentation](https://auto.gluon.ai/scoredebugweight/tutorials/tabular_prediction/tabular-quickstart.html)
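
As a rough illustration of the custom-metric option, the sketch below defines a plain-Python scoring function and, if AutoGluon is importable, wraps it with `make_scorer` from `autogluon.core.metrics`. The function `mean_per_class_recall` and the scorer name are made up for this example and are not part of this project; the resulting scorer object could then be passed as `eval_metric`.

```python
def mean_per_class_recall(y_true, y_pred):
    """Hypothetical example metric: average recall over all classes in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        n_c = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c)
    return sum(recalls) / len(recalls)


try:
    # Only works when AutoGluon is installed; otherwise the scorer is skipped.
    from autogluon.core.metrics import make_scorer

    custom_metric = make_scorer(
        name="mean_per_class_recall",  # illustrative name
        score_func=mean_per_class_recall,
        optimum=1,                     # best achievable score
        greater_is_better=True,
    )
except ImportError:
    custom_metric = None

# Quick check on toy labels: class 0 recall is 1/2, class 1 recall is 2/2
print(mean_per_class_recall([0, 0, 1, 1], [0, 1, 1, 1]))
```

With the scorer defined, `TabularPredictor(label=..., eval_metric=custom_metric)` should use it in place of a built-in metric name.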

The `num_bag_folds`, `num_bag_sets`, and `num_stack_levels` parameters are also worth a look: they control bagging and stack ensembling and can help to improve the model's performance.
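
To get a feel for what these three parameters cost, here is a back-of-the-envelope sketch. It assumes that each bagged model is fit `num_bag_folds * num_bag_sets` times and that every stack level adds one more layer of such models. The helper `estimate_child_fits` is hypothetical and only approximates what AutoGluon actually schedules.

```python
def estimate_child_fits(num_bag_folds: int, num_bag_sets: int = 1, num_stack_levels: int = 0) -> int:
    """Rough count of child-model fits per model type (illustrative estimate only)."""
    if num_bag_folds < 2:
        raise ValueError("this sketch assumes bagging is enabled (num_bag_folds >= 2)")
    if num_bag_sets < 1 or num_stack_levels < 0:
        raise ValueError("num_bag_sets must be >= 1 and num_stack_levels >= 0")
    fits_per_layer = num_bag_folds * num_bag_sets  # k-fold bagging repeated num_bag_sets times
    layers = num_stack_levels + 1                  # base layer plus each stack level
    return fits_per_layer * layers


# Example: 8 folds, 1 bagging set, 1 stack level
print(estimate_child_fits(num_bag_folds=8, num_bag_sets=1, num_stack_levels=1))
```

Larger values usually mean longer training, so it makes sense to choose them together with `time_limit`.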

For all available parameters of the `.fit()` method, see the [AutoGluon documentation .fit()](https://auto.gluon.ai/scoredebugweight/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit)

In [None]:
# Initialize and train the AutoGluon TabularPredictor
time_limit = 24 * 60 * 60  # Set this to the longest time you are willing to wait (in seconds)
metric = 'roc_auc'
# target_feature_name is the label column name set in the data-loading cell
predictor = TabularPredictor(label=target_feature_name, eval_metric=metric).fit(
    data, time_limit=time_limit, presets='best_quality'
)

### Loading pre-trained model
A previously trained predictor can be loaded from the directory it was saved in with `TabularPredictor.load`.

In [None]:
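# Load a saved predictor if none was trained in this session
# (checking globals() avoids a NameError when the training cell was skipped)
if 'predictor' not in globals() or predictor is None:
    # TODO Correct model path to the one that was saved
    model_path = "AutogluonModels/ag-20240326_133920/"
    predictor = TabularPredictor.load(model_path)
predictor.fit_summary()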
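
Instead of hard-coding the path, the most recent run could be looked up automatically. The sketch below assumes models are saved under `AutogluonModels/` in folders named `ag-<timestamp>`, as in the path above, so that sorting by name is the same as sorting by time. The helper `latest_model_dir` is hypothetical and not part of AutoGluon.

```python
from pathlib import Path


def latest_model_dir(root="AutogluonModels"):
    """Return the newest 'ag-<timestamp>' folder under root, or None if there is none."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    candidates = [p for p in root_path.iterdir() if p.is_dir() and p.name.startswith("ag-")]
    if not candidates:
        return None
    # Timestamps of the form YYYYMMDD_HHMMSS sort chronologically as plain strings
    return max(candidates, key=lambda p: p.name)


print(latest_model_dir())
```

If a folder is found, `TabularPredictor.load(str(latest_model_dir()))` would load it.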

## Make predictions

In [None]:
# Evaluate on the test set
y_test_pred = predictor.predict(x_test)
test_accuracy = accuracy_score(y_test, y_test_pred)
print("Test Accuracy: ", test_accuracy)
print("Test Classification Report:\n", classification_report(y_test, y_test_pred))
# predictor.leaderboard(x_test, silent=True)
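
The predictor is tuned for `roc_auc`, while the cell above reports accuracy, so it may be worth checking AUC on the test set as well. As a minimal sketch of what that metric measures, the function below computes ROC AUC as the fraction of positive/negative pairs that are ranked correctly. It assumes binary labels encoded as 0 and 1; `roc_auc_pairwise` is an illustrative helper, and in practice `sklearn.metrics.roc_auc_score` does the same job more efficiently.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With AutoGluon, the scores would be the positive-class column of `predictor.predict_proba(x_test)`.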


## Save predictions

In [None]:
# Load the final test data (this overwrites the held-out x_test used above) and predict on it
x_test = prepare_test_data()
final_predictions = predictor.predict(x_test)

In [None]:
# Save the final predictions as a CSV file
save_predictions(final_predictions, 'predictions_auto_gluon')