# MLOps Training

This notebook gives an example of how to use MLOps to train an ML model.

## MLOpsTrainingClient

This is where you manage your training experiments.

In [1]:
from mlops_codex.training import MLOpsTrainingClient

In [None]:
# Start the client. The credentials are read from the NEOMARIL_TOKEN environment variable

client = MLOpsTrainingClient()
client
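The client cell above relies on `NEOMARIL_TOKEN` being set. A minimal sketch for checking the variable before instantiating the client could look like this; `token_is_configured` is a hypothetical helper written for illustration and is not part of `mlops_codex`.

```python
import os


def token_is_configured(var_name: str = "NEOMARIL_TOKEN") -> bool:
    """Return True when the environment variable holds a non-empty value.

    Hypothetical helper: it only inspects the local environment and does
    not validate the token against the server.
    """
    return bool(os.environ.get(var_name, "").strip())


if not token_is_configured():
    print("NEOMARIL_TOKEN is not set; export it before creating the client.")
```

The check only tells you whether a value is present, not whether the server accepts it.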

## MLOpsTrainingExperiment

This is where you create a training experiment to find the best model.

#### Custom training

With Custom training, you have to write the training function yourself. Data scientists often re-run an entire notebook over and over, which would otherwise create the same experiment repeatedly. Passing `force=False` prevents that: an existing experiment with the same attributes is reused instead of a new one being created. To create a new experiment with the same attributes anyway, pass `force=True`.

If two identical experiments already exist and you pass `force=False`, the one that was created first is chosen.
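The `force` behaviour described above can be sketched with a small in-memory stand-in. This is only an illustration of the described semantics, not how the platform implements it; `ExperimentKey` and `FakeExperimentRegistry` are hypothetical names that do not exist in `mlops_codex`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentKey:
    """The attributes that make two experiments 'equal' in this sketch."""
    experiment_name: str
    model_type: str
    group: str


class FakeExperimentRegistry:
    """In-memory stand-in for the server (hypothetical, for illustration)."""

    def __init__(self):
        self._experiments = []  # (experiment_id, key) pairs in creation order

    def create(self, key, force=False):
        if not force:
            # Reuse the first experiment created with the same attributes.
            for experiment_id, existing in self._experiments:
                if existing == key:
                    return experiment_id
        # Either nothing matched or the caller forced a new experiment.
        experiment_id = len(self._experiments) + 1
        self._experiments.append((experiment_id, key))
        return experiment_id


registry = FakeExperimentRegistry()
key = ExperimentKey("Teste notebook", "Classification", "test1")

first = registry.create(key)               # nothing exists yet: created
again = registry.create(key)               # same attributes: reused
forced = registry.create(key, force=True)  # duplicate created on purpose
after = registry.create(key)               # two matches: the first one wins

print(first, again, forced, after)  # 1 1 2 1
```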

In [None]:
# Creating a new training experiment
training = client.create_training_experiment(
    experiment_name='Teste notebook',   # Experiment name, this is how you find your model in MLflow
    model_type='Classification',        # Model type. Can be Classification, Regression or Unsupervised
    group='test1',                      # This is the default group. Create a new one for each new project
    # force=True                        # Forces the creation of a new experiment with the same attributes
)

In [None]:
training

In [None]:
# With the experiment class we can create multiple model runs
PATH = './samples/train/'

run = training.run_training(
    run_name='First test', # Run name
    train_data=PATH+'dados.csv', # Path to the file with training data
    source_file=PATH+'app.py', # Path of the source file
    requirements_file=PATH+'requirements.txt', # Path of the requirements file
    # env=PATH+'.env', # File with env variables (it will be encrypted on the server)
    # extra_files=[PATH+'utils.py'], # List of extra file paths to upload (they all end up in the same folder)
    training_reference='train_model', # Name of the entrypoint function that will be called inside the source file
    training_type='Custom',
    python_version='3.9', # Can be 3.8 to 3.10
    wait_complete=True
)

In [None]:
run.get_status()
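The runs in this notebook pass `wait_complete=True`, so the status is already final when `get_status()` is called. If a run were started without waiting, a generic polling loop along these lines could be used. This is a sketch under assumptions: `wait_until_done` is a hypothetical helper, and the status strings in the demo are placeholders, not the values the platform actually returns.

```python
import time


def wait_until_done(get_status, terminal_states, interval_s=5.0, timeout_s=600.0):
    """Call get_status() until it returns a terminal state or the timeout expires.

    get_status is any zero-argument callable; the terminal state names are
    supplied by the caller because this sketch does not know them.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout_s} seconds")
        time.sleep(interval_s)


# Demo with a fake status source; the status names are placeholders.
fake_statuses = iter(["Running", "Running", "Succeeded"])
final = wait_until_done(
    lambda: next(fake_statuses),
    terminal_states={"Succeeded", "Failed"},
    interval_s=0.0,
)
print(final)  # Succeeded
```

With a real run, the callable would wrap `run.get_status()` and extract whatever field carries the state.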

In [None]:
run.execution_info()

In [None]:
# When the run is finished you can download the model file
run.download_result()

In [None]:
# or promote it to a deployed model

PATH = './samples/syncModel/'

model = run.promote_model(
    model_name='Teste notebook promoted custom', # model_name
    model_reference='score', # name of the scoring function
    source_file=PATH+'app.py', # Path of the source file
    schema=PATH+'schema.json', # Path of the schema file, but it could be a dict
    # env=PATH+'.env', # File with env variables (it will be encrypted on the server)
    # extra_files=[PATH+'utils.py'], # List of extra file paths to upload (they all end up in the same folder)
    operation="Sync" # Can be Sync or Async
)

In [None]:
model

#### AutoML

With AutoML, you only need to upload the data and a configuration file.

In [None]:
PATH = './samples/autoML/'

run = training.run_training(
    run_name='First test', # Run name
    training_type='AutoML',
    train_data=PATH+'dados.csv', # Path to the file with training data
    conf_dict=PATH+'conf.json', # Path of the configuration file
    wait_complete=True
)

In [None]:
run

In [None]:
run.get_status()

In [None]:
# Promoting an AutoML model is a lot easier

PATH = './samples/autoML/'
MODEL_PATH = './samples/syncModel/'

model = run.promote_model(
    model_name='Teste notebook promoted autoML', # model_name
    operation="Async", # Can be Sync or Async
    input_type="json",
    schema=PATH+'schema.json'
)

In [None]:
model

#### Complete flow in MLOps

In the next cells, you'll learn how to use the MLOps platform end to end.

Import and instantiate the training and preprocessing clients

In [None]:
from mlops_codex.preprocessing import MLOpsPreprocessingClient
from mlops_codex.training import MLOpsTrainingClient

t_client = MLOpsTrainingClient()

p_client = MLOpsPreprocessingClient()

Create a new training experiment

In [None]:
# Creating a new training experiment
training = t_client.create_training_experiment(
    experiment_name='Teste notebook',
    model_type='Classification',
    group='<insert_group>',
)

PATH = './samples/completeFlow/customTrain/'

run = training.run_training(
    run_name='First test',
    train_data=PATH+'base_completa.parquet',
    source_file=PATH+'app.py',
    requirements_file=PATH+'requirements.txt',
    training_reference='train_model',
    training_type='Custom',
    python_version='3.9',
    wait_complete=True
)

Promote the experiment to a model

In [None]:
PATH = './samples/completeFlow/model/'

model = run.promote_model(
    model_name='Teste notebook promoted custom',
    model_reference='score',
    source_file=PATH+'app.py',
    schema=PATH+'schema.parquet',
    operation="Async",
    input_type="parquet",
    wait_complete=True
)

In [None]:
model.set_token("TOKEN")

In [None]:
model.get_logs()

In [None]:
model.info()

Create a new preprocessing script

In [None]:
PATH = "./samples/asyncPreprocessingMultiple/"

schemas = [
    ("base_cadastral", PATH+'base_cadastral.csv'),
    ("base_pagamentos", PATH+'base_pagamentos.csv'),
    ("base_info", PATH+'base_info.csv'),
]

preprocess = p_client.create(
    preprocessing_name='test_preprocessing',
    preprocessing_reference='build_df',
    source_file=PATH+'app.py',
    requirements_file=PATH+'requirements.txt',
    schema=schemas,
    python_version='3.9',
    operation="Async",
    group='<insert_group>',
    wait_complete=True
)
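The `schemas` list in the cell above pairs a dataset name with a local file path. A small pre-flight check like the following could catch a wrong path before anything is uploaded. It is only a sketch: `missing_files` is a hypothetical helper, not part of `mlops_codex`, and the demo uses temporary files rather than the sample data.

```python
import tempfile
from pathlib import Path


def missing_files(named_paths):
    """Return the (name, path) pairs whose path is not an existing file."""
    return [(name, path) for name, path in named_paths if not Path(path).is_file()]


# Demo in a temporary directory: one file exists, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "base_cadastral.csv"
    present.write_text("id\n1\n")
    demo = [
        ("base_cadastral", str(present)),
        ("base_info", str(Path(tmp) / "base_info.csv")),
    ]
    missing = missing_files(demo)

print([name for name, _ in missing])  # ['base_info']
```

In the notebook, calling `missing_files(schemas)` before `p_client.create(...)` should return an empty list when every path is correct.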

You can run predictions on preprocessed data. If you do, the preprocessed dataset is automatically downloaded to your computer as `preprocessed_data.parquet`.

In [None]:
execution1 = model.predict(
    data=schemas,
    preprocessing=preprocess,
    group_token="TOKEN",
    wait_complete=True
)

You can also predict directly, without a preprocessing step:

In [None]:
PATH = "./samples/completeFlow/model/"
execution2 = model.predict(
    data=PATH + "input.parquet"
)

In [None]:
model.execution_info(execution_id=1)

After approving your model, register its monitoring

In [None]:
PATH = "./samples/completeFlow/monitoring/"
model.register_monitoring(
    preprocess_reference="build_df",
    shap_reference="get_shap",
    configuration_file=PATH+"configuration.json",
    preprocess_file=PATH+"preprocess_async.py",
    wait_complete=True
)