# MLOps Training

This notebook gives an example of how to use MLOps to train an ML model.

## MLOpsTrainingClient

This is where you manage your training experiments.

In [1]:
from mlops_codex.training import MLOpsTrainingClient

### Initializing the MLOpsTrainingClient
In this cell, we are initializing the `MLOpsTrainingClient` which will be used to manage our training experiments.

In [2]:
client = MLOpsTrainingClient()
client

March 19, 2025 | INFO: __init__ Loading .env
March 19, 2025 | INFO: __init__ Successfully connected to MLOps


Codex version 2.2.8

## MLOpsTrainingExperiment

This is where you create a training experiment to find the best model.

#### Custom training

With custom training, you write the training function yourself. Data scientists often re-run an entire notebook over and over. To avoid creating the same experiment on every run, pass `force=False`: if an experiment with the same attributes already exists, it is loaded instead of being created again. To create a new experiment with the same attributes anyway, pass `force=True`.

If two identical experiments exist and you pass `force=False`, the one created first is chosen.

In [3]:
# Creating a new training experiment
training = client.create_training_experiment(
    experiment_name='experiment',
    model_type='Classification',
    group='<group>',
    force=False
)

March 19, 2025 | INFO: create_training_experiment Trying to load experiment...
March 19, 2025 | INFO: create_training_experiment Could not find experiment. Creating a new one...
March 19, 2025 | INFO: __register_training Training 'Te1cbbc0f00744739c548ce805215487caff0389f3ad47dc9df62d056144ec0c' created
March 19, 2025 | INFO: __init__ Loading .env
March 19, 2025 | INFO: __init__ Successfully connected to MLOps
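
The `force` behaviour described above can be pictured with a small in-memory model. This is only a sketch of the get-or-create idea, not the `mlops_codex` implementation: `ExperimentRegistry`, its `create` method, and the `T001`-style identifiers are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ExperimentRegistry:
    """Hypothetical in-memory stand-in for the server-side experiment store."""

    rows: list = field(default_factory=list)
    ids: count = field(default_factory=lambda: count(1))

    def create(self, name, model_type, group, force=False):
        key = (name, model_type, group)
        if not force:
            # Reuse the earliest experiment with identical attributes, if any.
            for row in self.rows:
                if row["key"] == key:
                    return row["id"]
        # No match (or force=True): register a brand-new experiment.
        new_id = f"T{next(self.ids):03d}"
        self.rows.append({"key": key, "id": new_id})
        return new_id


registry = ExperimentRegistry()
first = registry.create("experiment", "Classification", "<group>")
again = registry.create("experiment", "Classification", "<group>")
forced = registry.create("experiment", "Classification", "<group>", force=True)
after = registry.create("experiment", "Classification", "<group>")

print(first, again, forced, after)
```

In this toy model, re-running with `force=False` returns the same identifier, `force=True` produces a second one, and a later `force=False` call falls back to the experiment created first.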


If you want to load a training but don't remember its training hash, you can look it up with the `.list()` method. There are three modes available:
- `dict`: Default. Returns a list of dictionaries with information about all the trainings
- `count`: Returns the number of trainings you have
- `log`: Returns nothing; it only logs information about the trainings on the MLOps platform

In [3]:
result = client.list()
print(result)

[{'TrainingHash': 'Te1cbbc0f00744739c548ce805215487caff0389f3ad47dc9df62d056144ec0c', 'ExperimentName': 'experiment', 'ExperimentsQuantity': 0, 'GroupName': 'datarisk', 'ModelType': 'Classification', 'LastModificationDate': '2025-03-19T14:04:52.55242+00:00', 'RegisteredAt': '2025-03-19T14:04:52.55242+00:00'}, {'TrainingHash': 'T7c73dbd7ae245c2a0604943ae4f27f246c1573b34d9446a98785d8b050d1d93', 'ExperimentName': 'Teste', 'ExperimentsQuantity': 1, 'GroupName': 'datarisk', 'ModelType': 'Classification', 'LastModificationDate': '2025-03-19T13:54:15.285787+00:00', 'RegisteredAt': '2025-03-19T13:51:28.83022+00:00'}, {'TrainingHash': 'T21e6f6f3e784b9f89c5715217af0c38e417077cee4240e28021016881d96fb4', 'ExperimentName': 'custom Train1', 'ExperimentsQuantity': 1, 'GroupName': 'datarisk', 'ModelType': 'Classification', 'LastModificationDate': '2025-03-19T12:40:37.494559+00:00', 'RegisteredAt': '2025-03-19T12:38:18.716037+00:00'}, {'TrainingHash': 'T93c469877dd414e9f68f8d8cbe6d03f8555dac3bcb2413689

In [4]:
qt = client.list(mode='count')
print(qt)

4


In [5]:
client.list(mode='log')

- ExperimentName: experiment
  ExperimentsQuantity: 0
  GroupName: datarisk
  LastModificationDate: '2025-03-19T14:04:52.55242+00:00'
  ModelType: Classification
  RegisteredAt: '2025-03-19T14:04:52.55242+00:00'
  TrainingHash: Te1cbbc0f00744739c548ce805215487caff0389f3ad47dc9df62d056144ec0c
- ExperimentName: Teste
  ExperimentsQuantity: 1
  GroupName: datarisk
  LastModificationDate: '2025-03-19T13:54:15.285787+00:00'
  ModelType: Classification
  RegisteredAt: '2025-03-19T13:51:28.83022+00:00'
  TrainingHash: T7c73dbd7ae245c2a0604943ae4f27f246c1573b34d9446a98785d8b050d1d93
- ExperimentName: custom Train1
  ExperimentsQuantity: 1
  GroupName: datarisk
  LastModificationDate: '2025-03-19T12:40:37.494559+00:00'
  ModelType: Classification
  RegisteredAt: '2025-03-19T12:38:18.716037+00:00'
  TrainingHash: T21e6f6f3e784b9f89c5715217af0c38e417077cee4240e28021016881d96fb4
- ExperimentName: custom Train1
  ExperimentsQuantity: 2
  GroupName: datarisk
  LastModificationDate: '2025-03-19T12:36
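
Once you have the list of dictionaries, a small helper can look up a hash by experiment name. The sketch below works on plain Python data shaped like the `dict`-mode output shown above, trimmed to two keys; `find_training_hashes` is a hypothetical helper and not part of `mlops_codex`.

```python
def find_training_hashes(trainings, experiment_name):
    """Return the hash of every training whose ExperimentName matches."""
    return [
        item["TrainingHash"]
        for item in trainings
        if item["ExperimentName"] == experiment_name
    ]


# Sample shaped like the `dict`-mode output, keeping only the keys used here.
sample = [
    {"TrainingHash": "Te1cbbc0f00744739c548ce805215487caff0389f3ad47dc9df62d056144ec0c",
     "ExperimentName": "experiment"},
    {"TrainingHash": "T7c73dbd7ae245c2a0604943ae4f27f246c1573b34d9446a98785d8b050d1d93",
     "ExperimentName": "Teste"},
    {"TrainingHash": "T21e6f6f3e784b9f89c5715217af0c38e417077cee4240e28021016881d96fb4",
     "ExperimentName": "custom Train1"},
    {"TrainingHash": "T93c469877dd414e9f68f8d8cbe6d03f8555dac3bcb241368971ad7ef38f15e4",
     "ExperimentName": "custom Train1"},
]

matches = find_training_hashes(sample, "custom Train1")
print(len(matches))
```

The helper returns a list rather than a single value because experiment names are not unique: two trainings above share the name `custom Train1`.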

Or load an existing training by its hash:

In [3]:
# Loading an existing training experiment by its hash
training = client.get_training(training_hash="<TRAINING_HASH>", group='<GROUP>')

March 19, 2025 | INFO: __init__ Loading .env
March 19, 2025 | INFO: __init__ Successfully connected to MLOps


In [4]:
training

MLOpsTrainingExperiment(name="custom Train1", 
                                                        group="datarisk", 
                                                        training_id="T93c469877dd414e9f68f8d8cbe6d03f8555dac3bcb241368971ad7ef38f15e4",
                                                        model_type=Classification
                                                        )

Now, create your training!

In [None]:
# With the experiment class we can create multiple model runs
PATH = './samples/train/'

run = training.run_training(
    run_name='First test',
    training_type='Custom',
    train_data=PATH + 'dados.csv',
    requirements_file=PATH + 'requirements.txt',
    source_file=PATH + 'app.py',
    python_version='3.9',
    training_reference='train_model',
    wait_complete=True
)
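
Before submitting a custom run, it can help to check locally that the files passed to `run_training` exist and that the source file defines the function named in `training_reference`. The sketch below uses only the standard library; `missing_training_files` and `defines_function` are hypothetical helpers, and it assumes that `training_reference` names a top-level function in `source_file`.

```python
import ast
from pathlib import Path


def missing_training_files(base_dir, filenames):
    """Return the names in `filenames` that are not regular files in `base_dir`."""
    base = Path(base_dir)
    return [name for name in filenames if not (base / name).is_file()]


def defines_function(source_path, function_name):
    """Check whether a Python file defines a top-level function with this name."""
    tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    return any(
        isinstance(node, ast.FunctionDef) and node.name == function_name
        for node in tree.body
    )


# File names taken from the run_training call above.
REQUIRED = ["dados.csv", "requirements.txt", "app.py"]
BASE = "./samples/train/"

missing = missing_training_files(BASE, REQUIRED)
if missing:
    print(f"Missing files in {BASE}: {missing}")
elif not defines_function(Path(BASE) / "app.py", "train_model"):
    print("app.py does not define a top-level function named 'train_model'")
else:
    print("Local files look ready for run_training")
```

The check parses the source file without importing it, so it does not execute any training code.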

To get information about your training executions, use the `training.executions()` method. There are three modes available:

- `dict`: Default. Returns a dictionary with information about your training
- `count`: Returns how many executions are associated with the training hash
- `log`: Returns nothing; it only logs the information

In [5]:
# result = training.executions(mode='dict')
result = training.executions()

In [6]:
print(result)

{'TrainingHash': 'T93c469877dd414e9f68f8d8cbe6d03f8555dac3bcb241368971ad7ef38f15e4', 'ExperimentName': 'custom Train1', 'ExperimentsQuantity': 2, 'GroupName': 'datarisk', 'ModelType': 'Classification', 'LastModificationDate': '2025-03-19T12:36:01.263111+00:00', 'RegisteredAt': '2025-03-19T12:30:14.757279+00:00'}


In [7]:
qt = training.executions(mode='count')

In [8]:
print(qt)

2


In [9]:
training.executions(mode='log')

ExperimentName: custom Train1
ExperimentsQuantity: 2
GroupName: datarisk
LastModificationDate: '2025-03-19T12:36:01.263111+00:00'
ModelType: Classification
RegisteredAt: '2025-03-19T12:30:14.757279+00:00'
TrainingHash: T93c469877dd414e9f68f8d8cbe6d03f8555dac3bcb241368971ad7ef38f15e4



#### AutoML

With AutoML, you only need to upload the data and a configuration file.

In [None]:
PATH = './samples/autoML/'

run2 = training.run_training(
    run_name='First test',
    training_type='AutoML',
    conf_dict=PATH + "conf.json",
    train_data=PATH + 'dados.csv',
    wait_complete=True
)