# Neomaril Training

This notebook gives an example of how to use Neomaril to train an ML model.

### NeomarilTrainingClient

This is where you manage your training experiments.

In [1]:
from neomaril_codex.training import NeomarilTrainingClient

In [2]:
# Start the client. The credentials are read from the NEOMARIL_TOKEN environment variable

client = NeomarilTrainingClient()
client

October 17, 2024 | INFO: __init__ Loading .env
October 17, 2024 | INFO: __init__ Successfully connected to Neomaril


API version 1.0 - NeomarilTrainingClient(url="http://localhost:7070/api", Token="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6IlFnc0JWQ0I5WFc0V1YtSkVCVkJiZyJ9.eyJodHRwczovL25lb21hcmlsLmRhdGFyaXNrLm5ldC9uZW9tYXJpbC1ncm91cCI6ImRhdGFyaXNrIiwiaHR0cHM6Ly9uZW9tYXJpbC5kYXRhcmlzay5uZXQvZW1haWwiOiJuZW9tYXJpbC1jaUBkYXRhcmlzay5pbyIsImh0dHBzOi8vbmVvbWFyaWwuZGF0YXJpc2submV0L3RlbmFudCI6ImRhdGFyaXNrIiwiaHR0cHM6Ly9uZW9tYXJpbC5kYXRhcmlzay5uZXQvdGVuYW50LWFjdGl2ZSI6dHJ1ZSwiaHR0cHM6Ly9uZW9tYXJpbC5kYXRhcmlzay5uZXQvdXNlci1hY3RpdmUiOnRydWUsImh0dHBzOi8vbmVvbWFyaWwuZGF0YXJpc2submV0L3JvbGUiOiJtYXN0ZXIiLCJpc3MiOiJodHRwczovL2Rldi1tazNvN2xhenhsZTMwaHdxLnVzLmF1dGgwLmNvbS8iLCJzdWIiOiJhdXRoMHw2NTY0Y2M0NTlkYzAzODhlNDVlMDQzZTciLCJhdWQiOlsiaHR0cHM6Ly9kZXYtbWszbzdsYXp4bGUzMGh3cS51cy5hdXRoMC5jb20vYXBpL3YyLyIsImh0dHBzOi8vZGV2LW1rM283bGF6eGxlMzBod3EudXMuYXV0aDAuY29tL3VzZXJpbmZvIl0sImlhdCI6MTcyOTE5NDY3NywiZXhwIjoxNzI5MjA1NDc3LCJzY29wZSI6Im9wZW5pZCBwcm9maWxlIGVtYWlsIGFkZHJlc3MgcGhvbmUgcmVhZDpjdXJyZW50X3VzZXIgdXBkYXRlOmN1cnJlbn
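
Since the client reads its credentials from `NEOMARIL_TOKEN`, a quick pre-check can make a missing token easier to diagnose. The sketch below is only an illustration: `token_is_configured` is a hypothetical helper, not part of `neomaril_codex`, and it inspects the process environment only. The log above shows the client also loading a `.env` file, so a negative result here does not necessarily mean the connection will fail.

```python
import os


def token_is_configured(env=None, var_name="NEOMARIL_TOKEN"):
    """Return True when the credential variable is present and non-empty.

    Hypothetical helper for illustration; not part of neomaril_codex.
    """
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


if not token_is_configured():
    # The client may still find the token in a .env file.
    print("NEOMARIL_TOKEN is not set in the process environment")
```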

## NeomarilTrainingExperiment

This is where you create a training experiment to find the best model.

#### Custom training

With Custom training, you write the training function yourself. Data scientists often re-run an entire notebook over and over. To avoid creating the same experiment on every run, pass `force=False`: an existing experiment with the same attributes is loaded instead of a new one being created. To create a new experiment with the same attributes anyway, pass `force=True`.

If two identical experiments already exist and you pass `force=False`, the one that was created first is used.

In [3]:
# Creating a new training experiment
training = client.create_training_experiment(
    experiment_name='Teste notebook',   # Experiment name, this is how you find your model in MLflow
    model_type='Classification',        # Model type. Can be Classification, Regression or Unsupervised
    group='test1',                      # This is the default group. Create a new one when starting a new project
    # force=True                        # Forces the creation of a new experiment with the same attributes
)

October 17, 2024 | INFO: create_training_experiment Trying to load experiment...
October 17, 2024 | INFO: create_training_experiment Could not find experiment. Creating a new one...
October 17, 2024 | INFO: __create New Training 'Teste notebook' inserted.
October 17, 2024 | INFO: __init__ Loading .env
October 17, 2024 | INFO: __init__ Successfully connected to Neomaril


In [4]:
training

NeomarilTrainingExperiment(name="Teste notebook", 
                                                        group="test1", 
                                                        training_id="T240e260811942339393afdf4bf06dbaf66b22c5862c4103b241181ebc2e9dcd",
                                                        model_type=Classification
                                                        )

In [5]:
# With the experiment class we can create multiple model runs
PATH = './samples/train/'

run = training.run_training(
    run_name='First test', # Run name
    train_data=PATH+'dados.csv', # Path to the file with training data
    source_file=PATH+'app.py', # Path of the source file
    requirements_file=PATH+'requirements.txt', # Path of the requirements file
    # env=PATH+'.env',  # File with env variables (it will be encrypted on the server)
    # extra_files=[PATH+'utils.py'], # List of extra file paths to upload (they will all be placed in the same folder)
    training_reference='train_model', # Name of the entrypoint function that will be called inside the source file
    training_type='Custom',
    python_version='3.9', # Can be 3.8 to 3.10
    wait_complete=True
)

October 17, 2024 | INFO: __upload_training Result
ExecutionId: 12
Message: Training files have been uploaded! Use the id '12' to execute the train experiment.

October 17, 2024 | INFO: __execute_training Model training starting - Hash: T240e260811942339393afdf4bf06dbaf66b22c5862c4103b241181ebc2e9dcd
October 17, 2024 | INFO: __init__ Loading .env
October 17, 2024 | INFO: __init__ Successfully connected to Neomaril
October 17, 2024 | INFO: __init__ Loading .env
Waiting the training run........
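
Because `run_training` uploads several local files, it can help to confirm they exist before starting a run. The following is a minimal sketch using only the standard library. `missing_files` is a hypothetical helper, and the file names are the ones used in the cell above.

```python
from pathlib import Path


def missing_files(base_dir, names):
    """Return the names that are not regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in names if not (base / name).is_file()]


missing = missing_files('./samples/train/', ['dados.csv', 'app.py', 'requirements.txt'])
if missing:
    print(f"Missing training files: {missing}")
```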

In [6]:
run.get_status()

{'ExecutionId': '12',
 'Status': 'Succeeded',
 'Message': 'Training succeeded, successfully generated artifacts.'}
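
If a run is started without `wait_complete=True`, the status can be polled manually. The sketch below assumes only what the output above shows, namely that `get_status()` returns a dict with a `Status` key. The `wait_for_run` helper, the `FakeRun` stand-in, and the `"Running"` and `"Failed"` status names are assumptions for illustration and are not confirmed by the library.

```python
import time

# "Succeeded" appears in the output above; "Failed" is an assumed status name.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_run(run, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll run.get_status() until a terminal status or the timeout is reached."""
    waited = 0.0
    while True:
        status = run.get_status()
        if status.get("Status") in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still in state {status.get('Status')!r}")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeRun:
    """Stand-in for a training run, used only to exercise the helper."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_status(self):
        return {"ExecutionId": "12", "Status": next(self._statuses)}


result = wait_for_run(
    FakeRun(["Running", "Running", "Succeeded"]),
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
```

With a real run object, `wait_for_run(run)` would be called with the default polling interval.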

In [7]:
run.execution_info()

Deployable: true
Description: ''
ExecutionId: 12
ExecutionState: Succeeded
ExperimentName: Teste notebook
GroupName: test1
ModelType: Classification
RunAt: '2024-10-17T19:52:03.647118+00:00'
RunData:
  outputPath: /app/store/datarisk/test1/T240e260811942339393afdf4bf06dbaf66b22c5862c4103b241181ebc2e9dcd/12/output/12.zip
  paramsAndMetrics:
    metrics:
      auc: 0.9926209713058715
      f1_score: 0.9761809612166342
      training_accuracy_score: 1.0
      training_f1_score: 1.0
      training_log_loss: 0.00047178298806016674
      training_precision_score: 1.0
      training_recall_score: 1.0
      training_roc_auc: 1.0
      training_score: 1.0
    parameters:
      cols_with_missing: '0'
      lgbmclassifier: LGBMClassifier()
      lgbmclassifier__boosting_type: gbdt
      lgbmclassifier__class_weight: None
      lgbmclassifier__colsample_bytree: '1.0'
      lgbmclassifier__importance_type: split
      lgbmclassifier__learning_rate: '0.1'
      lgbmclassifier__max_depth: '-1'
      

In [8]:
# When the run is finished you can download the model file
run.download_result()

2024-04-22 17:32:50.098 | INFO     | neomaril_codex.base:download_result:408 - Output saved in ./output.zip


In [9]:
# or promote it to a deployed model

PATH = './samples/syncModel/'

model = run.promote_model(
    model_name='Teste notebook promoted custom', # model_name
    model_reference='score', # name of the scoring function
    source_file=PATH+'app.py', # Path of the source file
    schema=PATH+'schema.json', # Path of the schema file, but it could be a dict
    # env=PATH+'.env',  # File with env variables (it will be encrypted on the server)
    # extra_files=[PATH+'utils.py'], # List of extra file paths to upload (they will all be placed in the same folder)
    operation="Sync" # Can be Sync or Async
)

2024-04-22 17:32:50.290 | INFO     | neomaril_codex.training:__upload_model:492 - Model 'Teste notebook promoted custom' promoted from T508b769c7ca49a4a5e60d96fab003cace7533032858439f9b584cc026be02e1 - Hash: "Md68e3180ce2498cbfb88ca59b6c94ddec7eebc47dac4c9591e34ea88d191b04"
2024-04-22 17:32:52.204 | INFO     | neomaril_codex.training:__host_model:557 - Model host in process - Hash: Md68e3180ce2498cbfb88ca59b6c94ddec7eebc47dac4c9591e34ea88d191b04
2024-04-22 17:32:52.206 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-04-22 17:32:52.211 | INFO     | neomaril_codex.base:__init__:31 - Successfully connected to Neomaril


In [10]:
model

NeomarilModel(name="Teste notebook promoted custom", group="groupname", 
                                status="Building",
                                model_id="Md68e3180ce2498cbfb88ca59b6c94ddec7eebc47dac4c9591e34ea88d191b04",
                                operation="Sync",
                                )

#### AutoML

With AutoML, you only need to upload the data and a configuration file.

In [11]:
PATH = './samples/autoML/'

run = training.run_training(
    run_name='First test', # Run name
    training_type='AutoML',
    train_data=PATH+'dados.csv', # Path to the file with training data
    conf_dict=PATH+'conf.json', # Path of the configuration file
    wait_complete=True
)

2024-04-22 17:33:37.256 | INFO     | neomaril_codex.training:__upload_training:915 - {"ExecutionId":2,"Message":"Training files have been uploaded! Use the id \u00272\u0027 to execute the train experiment."}
2024-04-22 17:33:37.419 | INFO     | neomaril_codex.training:__execute_training:939 - Model training starting - Hash: T508b769c7ca49a4a5e60d96fab003cace7533032858439f9b584cc026be02e1
2024-04-22 17:33:37.440 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-04-22 17:33:37.445 | INFO     | neomaril_codex.base:__init__:31 - Successfully connected to Neomaril
2024-04-22 17:33:37.447 | INFO     | neomaril_codex.base:__init__:279 - Loading .env


Waiting the training run................

In [12]:
run

NeomarilTrainingExecution(name="First test",
                                        exec_id="2", status="Succeeded")

In [13]:
run.get_status()

{'ExecutionId': '2',
 'Status': 'Succeeded',
 'Message': 'wasbs://mlflow-dev@datariskmlops.blob.core.windows.net/artifacts/1/250f70714e5d4a6f9fa3b55c6b9aaf43/artifacts'}

In [14]:
# Promoting an AutoML model is a lot easier

PATH = './samples/autoML/'
MODEL_PATH = './samples/syncModel/'

model = run.promote_model(
    model_name='Teste notebook promoted autoML', # model_name
    operation="Async", # Can be Sync or Async
    input_type="json",
    schema=PATH+'schema.json'
)

2024-04-22 17:41:28.830 | INFO     | neomaril_codex.training:__upload_model:492 - Model 'Teste notebook promoted autoML' promoted from T508b769c7ca49a4a5e60d96fab003cace7533032858439f9b584cc026be02e1 - Hash: "M0fa3683553e41c4a1a99290c2451ff190785d06b90646e6afe7ffa352c00193"
2024-04-22 17:41:29.135 | INFO     | neomaril_codex.training:__host_model:557 - Model host in process - Hash: M0fa3683553e41c4a1a99290c2451ff190785d06b90646e6afe7ffa352c00193
2024-04-22 17:41:29.137 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-04-22 17:41:29.140 | INFO     | neomaril_codex.base:__init__:31 - Successfully connected to Neomaril


In [15]:
model

NeomarilModel(name="Teste notebook promoted autoML", group="groupname", 
                                status="Building",
                                model_id="M0fa3683553e41c4a1a99290c2451ff190785d06b90646e6afe7ffa352c00193",
                                operation="Async",
                                )