# MLOps Training

This notebook gives an example of how to use MLOps to train an ML model.

### MLOpsTrainingClient

The `MLOpsTrainingClient` is where you manage your training experiments.

In [1]:
from mlops_codex.training import MLOpsTrainingClient

### Initializing the MLOpsTrainingClient
In this cell, we initialize the `MLOpsTrainingClient`, which we use to manage our training experiments.

In [2]:
client = MLOpsTrainingClient()
client

May 7, 2025 | INFO: __init__ Loading .env
May 7, 2025 | INFO: __init__ Successfully connected to MLOps


API version 1.0 
 Token="<redacted>"

## MLOpsTrainingExperiment

An `MLOpsTrainingExperiment` groups the training runs you execute while searching for the best model.

#### Custom training

With Custom training, you write the training function yourself. As a data scientist, you will often re-run the entire notebook over and over. To avoid creating the same experiment on every re-run, `force=False` reuses an existing experiment with the same attributes instead of creating a new one. To create a new experiment even when one with the same attributes already exists, set `force=True`.

If two experiments with identical attributes exist and you pass `force=False`, the one created first is returned.
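The reuse rule described above can be modeled with a small in-memory registry. This is a standard-library sketch, not the mlops_codex implementation: `ExperimentRegistry`, the SHA-256 attribute hash, and the ID format are all hypothetical. It only reproduces the behavior the text describes: `force=False` returns an existing match (the earliest one when several exist), while `force=True` always creates a new experiment.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experiment:
    training_id: str
    attributes: Dict[str, str]
    created_order: int


class ExperimentRegistry:
    """Hypothetical in-memory stand-in for a server-side experiment store."""

    def __init__(self) -> None:
        self.experiments: List[Experiment] = []

    @staticmethod
    def attribute_hash(attributes: Dict[str, str]) -> str:
        # Serialize with sorted keys so argument order does not change the hash
        payload = json.dumps(attributes, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def create(self, force: bool = False, **attributes: str) -> Experiment:
        key = self.attribute_hash(attributes)
        if not force:
            matches = [e for e in self.experiments
                       if self.attribute_hash(e.attributes) == key]
            if matches:
                # Reuse the earliest-created experiment with the same attributes
                return min(matches, key=lambda e: e.created_order)
        order = len(self.experiments)
        # Made-up ID format, for illustration only
        experiment = Experiment(f"T{key[:12]}-{order}", dict(attributes), order)
        self.experiments.append(experiment)
        return experiment


registry = ExperimentRegistry()
attrs = dict(experiment_name="experiment", model_type="Classification", group="datarisk")
first = registry.create(**attrs)               # new experiment
again = registry.create(**attrs)               # reused: same object as `first`
forced = registry.create(force=True, **attrs)  # new experiment despite identical attributes
```

After `forced` exists there are two matching experiments, and a further `registry.create(**attrs)` still returns `first`, mirroring the "first created experiment will be chosen" rule.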

In [3]:
# Creating a new training experiment
training = client.create_training_experiment(
    experiment_name='experiment',
    model_type='Classification',
    group='datarisk',
)

May 7, 2025 | INFO: create_training_experiment Trying to load experiment...
May 7, 2025 | INFO: __get_repeated_thash Found experiment with same attributes...
May 7, 2025 | INFO: __init__ Loading .env
May 7, 2025 | INFO: __init__ Successfully connected to MLOps


In [4]:
training

MLOpsTrainingExperiment(name="experiment", 
                                                        group="datarisk", 
                                                        training_id="T6cc61022f2640698545be2b931489921f29c9bae8844bc694361ee1a1d14918",
                                                        model_type=Classification
                                                        )
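The Custom run in the next cell sets `training_reference='train_model'`, naming a function in `app.py`. This notebook does not show that function's expected arguments or return value, so the following is a hypothetical sketch: it assumes MLOps calls `train_model(base_path)` with the directory containing the uploaded `dados.csv`, uses scikit-learn's `LogisticRegression` as a stand-in model, and returns a dictionary whose keys (`pipeline`, `model_output`, `metrics`) are illustrative. Check the mlops_codex documentation for the actual contract before relying on it.

```python
from pathlib import Path

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_model(base_path: str) -> dict:
    """Hypothetical Custom-training entry point (contract assumed, not confirmed)."""
    # Assumes the uploaded training file lands in base_path as dados.csv
    df = pd.read_csv(Path(base_path) / "dados.csv")
    X = df.drop(columns=["target"])
    y = df["target"]

    # Hold out 25% of rows, preserving the class balance
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y
    )

    # Impute missing values, then fit a simple stand-in classifier
    pipeline = make_pipeline(SimpleImputer(), LogisticRegression(max_iter=1000))
    pipeline.fit(X_train, y_train)

    # Predictions and positive-class probabilities on the held-out rows
    proba = pipeline.predict_proba(X_test)[:, 1]
    pred = pipeline.predict(X_test)
    model_output = pd.DataFrame({"pred": pred, "proba": proba}, index=X_test.index)

    return {
        "pipeline": pipeline,
        "model_output": model_output,
        "metrics": {
            "auc": float(roc_auc_score(y_test, proba)),
            "f1_score": float(f1_score(y_test, pred)),
        },
    }
```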

In [None]:
# With the experiment class we can create multiple model runs
PATH = './samples/train/'

run = training.run_training(
    run_name='First test',
    training_type='Custom',
    train_data=PATH + 'dados.csv',
    requirements_file=PATH + 'requirements.txt',
    source_file=PATH + 'app.py',
    python_version='3.9',
    training_reference='train_model',
    wait_complete=True
)

May 7, 2025 | INFO: __upload_training Result
DatasetHash: D4b0221dd4ea48039e78aaecfe9516fcb77852751c1045bfb28d5c622ffa5733
ExecutionId: 3
Message: Training files have been uploaded! Use the id '3' to execute the train experiment.

May 7, 2025 | INFO: __execute_training Model training starting - Hash: T6cc61022f2640698545be2b931489921f29c9bae8844bc694361ee1a1d14918
May 7, 2025 | INFO: __init__ Loading .env
May 7, 2025 | INFO: __init__ Successfully connected to MLOps
May 7, 2025 | INFO: __init__ Loading .env
Waiting the training run..
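With `wait_complete=True`, the call appears to block until the run finishes, as the "Waiting the training run.." message suggests. If you start a run without waiting, a generic polling loop like the one below could be used instead. It is a sketch under assumptions: `wait_until_done` is a hypothetical helper, `get_status` is any zero-argument callable (for example `lambda: run.status`, assuming `run.status` is refreshed on each access), and only "Succeeded" appears in this notebook; treating "Failed" as terminal is an assumption.

```python
import time
from typing import Callable

# "Succeeded" appears in this notebook's output; "Failed" is an assumed terminal state
TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_until_done(get_status: Callable[[], str],
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock changes while waiting.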

In [None]:
run.status

In [None]:
run.model_type

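##### Copying a training run

Use `copy_execution` on an existing run to start a new execution with the files and settings you pass to it.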

In [None]:
copy_run = run.copy_execution(
    train_data=PATH + 'dados.csv',
    requirements_file=PATH + 'requirements.txt',
    source_file=PATH + 'app.py',
    python_version='3.9',
    training_reference='train_model',
    wait_complete=True
)
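Conceptually, a copy reuses an existing run's configuration while letting you change selected inputs. The following standard-library sketch models that idea with a frozen dataclass and `dataclasses.replace`. `TrainingRunConfig`, `changed_fields`, and `dados_v2.csv` are hypothetical names, not part of mlops_codex, and it is an assumption that `copy_execution` behaves this way internally.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainingRunConfig:
    """Hypothetical local record of a Custom run's inputs."""
    train_data: str
    requirements_file: str
    source_file: str
    python_version: str
    training_reference: str


def changed_fields(old: TrainingRunConfig, new: TrainingRunConfig) -> set:
    """Return the names of fields whose values differ between two configs."""
    return {f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)}


PATH = "./samples/train/"
original = TrainingRunConfig(
    train_data=PATH + "dados.csv",
    requirements_file=PATH + "requirements.txt",
    source_file=PATH + "app.py",
    python_version="3.9",
    training_reference="train_model",
)

# Copy the configuration, overriding only the (hypothetical) training data file
copied = replace(original, train_data=PATH + "dados_v2.csv")
print(changed_fields(original, copied))  # {'train_data'}
```

Freezing the dataclass means the original configuration cannot be mutated by accident; every variation is a new object.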

In [None]:
copy_run.execution_id

In [None]:
copy_run.status

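##### Promoting a training run

`promote` creates a deployable model from a training run. Here the model is promoted with `operation="Async"`, using the `score` function in `app.py` as its entry point.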

In [None]:
PATH = './samples/asyncModel/'
model = run.promote(
    source_file_path=PATH + 'app.py',
    schema_path=PATH + 'schema.csv',
    operation="Async",
    model_name="AsyncModel",
    input_type=".csv",
    model_reference="score",
    wait_complete=True
)
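Promotion sends a `schema.csv` along with the scoring code. Assuming that file is a sample of the input the promoted model expects (so its header row lists the required columns), a quick local check that a batch of input data contains those columns might look like the sketch below. `read_header`, `missing_schema_columns`, and `input.csv` are hypothetical names used only for illustration.

```python
import csv
from typing import List


def read_header(path: str) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def missing_schema_columns(schema_path: str, input_path: str) -> List[str]:
    """List schema columns absent from the input file, in schema order."""
    present = set(read_header(input_path))
    return [col for col in read_header(schema_path) if col not in present]


# Usage (paths assumed): an empty list means every schema column is present
# missing_schema_columns('./samples/asyncModel/schema.csv', 'input.csv')
```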

#### AutoML

With AutoML, you only need to upload the training data and a configuration file.

In [None]:
PATH = './samples/autoML/'

run2 = training.run_training(
    run_name='First test',
    training_type='AutoML',
    conf_dict=PATH + "conf.json",
    train_data=PATH + 'dados.csv',
    wait_complete=True
)

#### External Training

Besides AutoML and Custom training, you can train a model on your own machine and upload the resulting files: features, target, predictions, metrics, parameters, requirements, and the serialized model.

See the example below.



In [None]:
PATH = './samples/externalUpload/'

run3 = training.run_training(
    run_name='First test',
    training_type="External",
    features_file=PATH + 'features.parquet',
    target_file=PATH + 'target.parquet',
    output_file=PATH + 'predictions.parquet',
    metrics_file=PATH + 'metrics.json',
    parameters_file=PATH + 'params.json',
    requirements_file=PATH + 'requirements.txt',
    model_file=PATH + 'model.pkl',
    python_version="3.9",
    wait_complete=True
)

April 30, 2025 | INFO: validate Validating external training execution...
April 30, 2025 | INFO: __init__ Loading .env
April 30, 2025 | INFO: __init__ Successfully connected to MLOps
April 30, 2025 | INFO: register_execution Training Execution '12' created for First test
April 30, 2025 | INFO: send_file Features for execution was created from file!
April 30, 2025 | INFO: send_file Dataset hash = D80b29eba7fd4a89887bc45d98d660d1de8f530ade5b47629b6437d96f574d5d
April 30, 2025 | INFO: send_file Target for execution was created from file!
April 30, 2025 | INFO: send_file Dataset hash = D4f3ff1032d84e0d89bd29e82232e9a361110aacb20f4b21a3be01c8f34894d5
April 30, 2025 | INFO: send_file Output for execution was created from file!
April 30, 2025 | INFO: send_file Dataset hash = D95f28a782a8449ba63d5526521de0f2ea281b6ba2b341eab213bf53bab96ff8
April 30, 2025 | INFO: send_file Metrics for execution 12 was created from file!
April 30, 2025 | INFO: send_file Parameters for execution 12 was created fr

In [None]:
run3.status

'Succeeded'
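The External run uploads files you produced yourself. As a sketch, the JSON and pickle artifacts could be written with the standard library as below; the three parquet files can be written with pandas `DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet`). `export_external_artifacts` is a hypothetical helper, and the flat `{name: value}` layout of `metrics.json` and `params.json` is an assumption about what MLOps accepts.

```python
import json
import pickle
from pathlib import Path
from typing import Any, Dict


def export_external_artifacts(out_dir: str, model: Any,
                              metrics: Dict[str, float],
                              params: Dict[str, Any]) -> Dict[str, str]:
    """Write model.pkl, metrics.json and params.json; return their paths.

    The returned keys mirror run_training's keyword names (model_file,
    metrics_file, parameters_file). The JSON layout is an assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {
        "model_file": out / "model.pkl",
        "metrics_file": out / "metrics.json",
        "parameters_file": out / "params.json",
    }
    # Serialize the fitted model with pickle
    with open(paths["model_file"], "wb") as f:
        pickle.dump(model, f)
    # Store metrics and hyperparameters as flat JSON objects (assumed layout)
    paths["metrics_file"].write_text(json.dumps(metrics, indent=2))
    paths["parameters_file"].write_text(json.dumps(params, indent=2))
    return {key: str(path) for key, path in paths.items()}
```

Because the keys match the keyword names used in the cell above, the returned dict could be unpacked into that call alongside the other file arguments.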

---

#### Interactive External Training

If you prefer a more interactive workflow, you can train the model inside the notebook and log the run as you go, as in the example below.

In [None]:
from mlops_codex.training import MLOpsTrainingClient
client = MLOpsTrainingClient()
training = client.create_training_experiment(
    experiment_name='Teste',
    model_type='Classification',
    group='<group>'
)

In [None]:
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

In [None]:
base_path = './samples/train/'
df = pd.read_csv(base_path + "dados.csv")
X = df.drop(columns=['target'])
y = df[["target"]]

In [None]:
import matplotlib.pyplot as plt

plt.scatter(df["mean_radius"], df["mean_texture"])

# Set the chart title
plt.title("Relationship between mean_radius and mean_texture")

# Set the axis labels
plt.xlabel("mean_radius")
plt.ylabel("mean_texture")

# Keep a reference to the figure so it can be logged later
fig = plt.gcf()

# Display the chart
plt.show()


In [None]:
# Impute missing values, then fit a LightGBM classifier
pipe = make_pipeline(SimpleImputer(), LGBMClassifier(force_col_wise=True))
pipe.fit(X, y)

In [None]:
# Log a training run: the model, its outputs, a plot, and metrics
with training.log_train(name='Teste 2', X_train=X, y_train=y) as logger:
    logger.save_model(pipe)

    # In-sample predictions and positive-class probabilities
    model_output = pd.DataFrame({"pred": pipe.predict(X), "proba": pipe.predict_proba(X)[:, 1]})

    logger.save_model_output(model_output)

    logger.save_plot(fig=fig, filename="test-image")

    # 5-fold cross-validated AUC and F1, averaged across folds
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    f_score = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    logger.save_metric(name='auc', value=auc.mean())
    logger.save_metric(name='f1_score', value=f_score.mean())
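A `with` block suits this workflow because it can collect everything logged inside it and commit only once the block finishes. It is an assumption that `log_train` works this way; the sketch below only illustrates the general collect-then-commit pattern with `contextlib.contextmanager`. `RunLogger`, `log_run`, and the `uploaded` list (standing in for an upload to the server) are hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class RunLogger:
    """Hypothetical collector for one run's metrics and artifacts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metrics: Dict[str, float] = {}
        self.artifacts: Dict[str, Any] = {}

    def save_metric(self, name: str, value: float) -> None:
        self.metrics[name] = float(value)

    def save_artifact(self, key: str, obj: Any) -> None:
        self.artifacts[key] = obj


@contextmanager
def log_run(name: str, uploaded: List[dict]) -> Iterator[RunLogger]:
    logger = RunLogger(name)
    # Hand the logger to the with-block; an exception raised there propagates
    # out of this yield, so the commit below is skipped.
    yield logger
    # Reached only when the with-block finished without an exception
    uploaded.append({
        "name": logger.name,
        "metrics": dict(logger.metrics),
        "artifacts": sorted(logger.artifacts),
    })


uploaded: List[dict] = []
with log_run("Teste 2", uploaded) as run_logger:
    run_logger.save_artifact("model", object())
    run_logger.save_metric("auc", 0.91)
print(uploaded[0]["metrics"])  # {'auc': 0.91}
```

If the block raises (for example, cross-validation fails), nothing is appended, so a half-finished run is never recorded.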
