## AutoML Training

### Steps to perform AutoML Model Training on Your Data

To train the models on your data, follow these four steps:

1. **Read the Data**  
   Load your dataset for processing and model training.

2. **Split the Data into X and y**  
   Separate the features (X) and the target variable (y) for training.

3. **Specify the Training Config**  
   Set up the configuration parameters required for training.

4. **Initialize the AutoML class and call the `fit` function**  
   Pass the training config to the AutoML class and your data to the `fit` function to start the training process.
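
Because the training config in step 3 refers to columns by name, a typo there may only surface once training has started. The following is a minimal sketch of a pre-flight check, assuming the config layout used in the example in this notebook (`ignore_columns_for_training`, `fit_numerical_to_categorical`, and `preproc_steps` entries with `columns_to_include`). `find_unknown_columns` is a hypothetical helper and not part of the `automl` package.

```python
def find_unknown_columns(configs, available_columns):
    """Return {config location: [column names not present in the data]}.

    Hypothetical helper; assumes the config layout shown in this notebook.
    """
    available = set(available_columns)

    # Collect every place in the config that names columns.
    referenced = {
        "ignore_columns_for_training": configs.get("ignore_columns_for_training", []),
        "fit_numerical_to_categorical": configs.get("fit_numerical_to_categorical", []),
    }
    for i, step in enumerate(configs.get("preproc_steps", [])):
        location = f"preproc_steps[{i}] ({step.get('step')}/{step.get('method')})"
        referenced[location] = step.get("columns_to_include", [])

    # Keep only the locations that mention at least one unknown column.
    problems = {}
    for location, columns in referenced.items():
        unknown = [c for c in columns if c not in available]
        if unknown:
            problems[location] = unknown
    return problems


# Illustrative input with a deliberate typo ('Fair' instead of 'Fare').
example_columns = ["Name", "Pclass", "Sex", "Age", "Fare"]
example_configs = {
    "ignore_columns_for_training": ["Name"],
    "preproc_steps": [
        {"step": "impute", "method": "mean", "columns_to_include": ["Age", "Fair"]},
    ],
}
problems = find_unknown_columns(example_configs, example_columns)
print(problems)
```

With a pandas DataFrame, `find_unknown_columns(configs, X.columns)` would run the same check against the real feature columns; an empty dict means every referenced column exists.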


In [None]:
import pandas as pd
from automl import AutoML

# 1. Read the data
train_data = pd.read_csv('data/train_titanic.csv')

# 2. Split the data into X (features) and y (target)
X, y = train_data.drop('Survived', axis=1), train_data['Survived']

# 3. Specify your configs according to your needs
# example config below
configs = {

    "ignore_columns_for_training": ['Name'],
    "fit_numerical_to_categorical": ['Pclass'],
    "preproc_steps": [
        {
            'step': 'impute',
            'method': 'mean',
            'columns_to_include': ['Age', 'Fare']
        },
        {
            'step': 'impute',
            'method': 'mode',
            'columns_to_include': ['Sex', 'Pclass']
        },
        {
            'step': 'encode',
            'method': 'one_hot',
            'columns_to_include': ['Sex', 'Pclass']
        },
        {
            'step': 'scale',
            'method': 'standard',
            'columns_to_include': ['Age', 'Fare']
        },
        {
            'step': 'scale',
            'method': 'min_max',
            'columns_to_include': ['Age', 'Fare']
        },
        {
            'step': 'outlier',
            'method': 'robust',
            'columns_to_include': ['Age', 'Fare']
        },
        {
            'step': 'skew',
            'method': 'yeo_johnson',
            'columns_to_include': ['Age', 'Fare']
        }
    ],

    'include_features': 4,
    'validation_split_size': 0.5,
    'cv_folds': 7,
    'task': 'Classification',
    'ensemble': False,
    'stacking': True,
    'tune': True,
    'include_models': ['LogisticRegression', 'DecisionTree', 'KNN', 'XGBoost'],
    'focus': 'recall',

    'experiment_name': 'titanic-experiment'
}

# 4. Initialize AutoML with the config and call fit
# The train parameter defaults to True, so passing it explicitly is optional
automl = AutoML(configs, train=True)
automl.fit(X, y)

Once training finishes, open MLflow to inspect the logged artifacts and compare the performance of the different models.

Based on that comparison, select the model you want to use for inference (predictions).
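
The comparison can also be done programmatically instead of through the MLflow UI. The sketch below ranks runs by a single metric. It assumes the metric is logged under the name `recall` (matching `'focus': 'recall'` in the example config) and that each run's name identifies the model; neither is confirmed by this notebook, so adjust both to whatever the experiment actually shows. `rank_runs` and `fetch_runs` are hypothetical helpers, and `mlflow.search_runs` is the only MLflow call used.

```python
import math


def rank_runs(runs, metric="recall"):
    """Return run records sorted best-first by `metric`, skipping runs without it."""
    def has_metric(run):
        value = run.get(metric)
        return isinstance(value, (int, float)) and not math.isnan(value)

    return sorted(filter(has_metric, runs), key=lambda run: run[metric], reverse=True)


def fetch_runs(experiment_name, metric="recall"):
    """Read runs from MLflow as plain dicts (needs `mlflow` and access to the tracking store)."""
    import mlflow  # lazy import so rank_runs stays usable without MLflow installed

    df = mlflow.search_runs(experiment_names=[experiment_name])
    return [
        {
            "run_id": row["run_id"],
            # Assumption: the run name identifies the model.
            "run_name": row.get("tags.mlflow.runName"),
            # Assumption: the metric was logged under this exact name.
            metric: row.get(f"metrics.{metric}"),
        }
        for _, row in df.iterrows()
    ]


# Placeholder records, not real results. With MLflow available, use instead:
#   runs = fetch_runs("titanic-experiment")
runs = [
    {"run_name": "model_a", "recall": 0.5},
    {"run_name": "model_b", "recall": 0.75},
    {"run_name": "model_c", "recall": float("nan")},  # metric missing for this run
]
ranked = rank_runs(runs)
best = ranked[0]
print(best["run_name"])
```

The name of the best run is then a candidate for the `model_name` field of the inference config, provided it matches what AutoML expects there.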

## AutoML Inference

### Steps to perform AutoML Inference on Your Data

To perform inference, follow these three steps:

1. **Read the Data**  
   Load your dataset and make sure it has the same columns that were used during training.

2. **Specify the Inference Config**  
   Specify only the experiment name and the model name. AutoML uses them to fetch the corresponding model artifacts and run inference.

3. **Call the AutoML `predict` / `predict_proba` Function**  
   Pass the inference config to the AutoML class, then pass the data to the `predict` / `predict_proba` function to run inference.
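
Step 1 depends on the inference data having the same shape as the training features, so a quick comparison before predicting can save a failed run. A minimal sketch, assuming the target column (here `Survived`) is present in the training file and absent from the inference file; `compare_columns` is a hypothetical helper, not part of the `automl` package.

```python
def compare_columns(training_columns, inference_columns, target=None):
    """Return (missing, extra) feature columns of the inference data
    relative to the training data, ignoring the target column."""
    expected = [c for c in training_columns if c != target]
    expected_set = set(expected)
    inference_set = set(inference_columns)

    missing = [c for c in expected if c not in inference_set]
    extra = [c for c in inference_columns if c not in expected_set and c != target]
    return missing, extra


# Illustrative column lists; with pandas, pass the DataFrames' `.columns` instead.
training_columns = ["Survived", "Pclass", "Sex", "Age", "Fare"]
inference_columns = ["Pclass", "Sex", "Age", "Cabin"]

missing, extra = compare_columns(training_columns, inference_columns, target="Survived")
print("missing:", missing)
print("extra:", extra)
```

Both lists being empty means the inference data carries exactly the training feature columns. Whether AutoML tolerates extra columns is not stated here, so treat a non-empty `extra` as something to verify rather than as an error.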


In [None]:
# 1. Read the inference data
inferenced_data = pd.read_csv('data/test_titanic.csv')

# 2. Specify the inference config. Only these two fields are needed for inference; get their values from MLflow
configs = {
    "experiment_name": "titanic-experiment",
    "model_name": "logistic"
}

# 3. Run the inference
# For inference, set the train parameter to False
automl = AutoML(configs, train=False)
predictions = automl.predict(inferenced_data)
print(predictions)
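
The inference steps also mention `predict_proba`, which the cell above does not exercise. Probabilities are useful when recall matters, since lowering the decision threshold labels more rows as positive. The sketch below assumes `predict_proba` returns one row per sample with one probability per class, positive class last, as scikit-learn classifiers do; this notebook does not confirm that shape, so check the actual output first. `labels_from_proba` is a hypothetical helper.

```python
def labels_from_proba(proba, threshold=0.5, positive_index=1):
    """Map per-row class probabilities to 0/1 labels.

    A row is labeled 1 when its positive-class probability is at least `threshold`.
    """
    return [1 if row[positive_index] >= threshold else 0 for row in proba]


# Made-up probabilities for illustration. With a trained model, use instead:
#   proba = automl.predict_proba(inferenced_data)
proba = [
    [0.80, 0.20],
    [0.55, 0.45],
    [0.30, 0.70],
]

default_labels = labels_from_proba(proba)                   # threshold 0.5
recall_labels = labels_from_proba(proba, threshold=0.4)     # more rows flagged positive
print(default_labels)
print(recall_labels)
```

A lower threshold trades precision for recall, so the chosen value is best validated on held-out data rather than picked by eye.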