# Neomaril Training

This notebook gives an example of how to use Neomaril to train an ML model.

### NeomarilTrainingClient

It's where you can manage your training experiments.

In [1]:
# Import the client
from neomaril_codex.training import NeomarilTrainingClient

In [2]:
# Start the client. The credentials are read from the NEOMARIL_TOKEN environment variable

client = NeomarilTrainingClient()
client

2023-05-29 10:34:19.211 | INFO     | neomaril_codex.base:__init__:87 - Loading .env
2023-05-29 10:34:19.362 | INFO     | neomaril_codex.base:__init__:99 - Successfully connected to Neomaril


NeomarilTrainingClient(url="http://localhost:7070/api", version="1.0")
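The client reads its credentials from the `NEOMARIL_TOKEN` environment variable, and the log shows it also loads a `.env` file. A minimal sketch, using only the standard library, of failing early with a clear message when the token is missing. `require_token` is a hypothetical helper, not part of `neomaril_codex`.

```python
import os


def require_token(var_name: str = "NEOMARIL_TOKEN") -> str:
    """Return the token stored in var_name, or raise if it is missing or empty.

    Hypothetical helper: neomaril_codex performs its own lookup; this only
    makes a missing credential obvious before the client is created.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it or add it to your .env file"
        )
    return token
```

Call `require_token()` right before `NeomarilTrainingClient()` so a missing credential fails with a readable error instead of a connection error.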

## NeomarilTrainingExperiment

It's where you can create a training experiment to find the best model

#### Custom training

With Custom training, you have to write the training function yourself.
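The exact signature Neomaril expects for the entrypoint, and what it must return, are not shown in this notebook. As a hedged illustration, the helper below computes the kind of dataset summary this run later logs as params (`shape`, `cols_with_missing`, `target_proportion`) using only the standard library. `profile_training_data` and the `target` column name are hypothetical; fitting the model itself is left to your own entrypoint.

```python
import csv
from collections import Counter


def profile_training_data(csv_path: str, target_col: str = "target") -> dict:
    """Summarise a CSV training file: shape, columns with missing values,
    and the proportion of each target class.

    target_col is a hypothetical column name; use your dataset's label column.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{csv_path} has no data rows")

    feature_cols = [c for c in rows[0] if c != target_col]
    # A cell counts as missing when it is empty (or absent) in the CSV.
    cols_with_missing = sum(
        1 for c in feature_cols if any(r[c] in ("", None) for r in rows)
    )
    counts = Counter(r[target_col] for r in rows)
    n = len(rows)
    return {
        "shape": (n, len(feature_cols)),
        "cols_with_missing": cols_with_missing,
        "target_proportion": {label: k / n for label, k in counts.items()},
    }
```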

In [3]:
# Creating a new training experiment
training = client.create_training_experiment('Teste notebook Training custom', # Experiment name, this is how you find your model in MLflow
                                            'Classification', # Model type. Can be Classification, Regression or Unsupervised
                                            'Custom', # Training type. Can be Custom or AutoML
                                            group='datarisk' # This is the default group. Create a new one for each new project
                                            )

2023-05-29 10:34:20.241 | INFO     | neomaril_codex.training:create_training_experiment:719 - New Training 'Teste notebook Training custom' inserted.


In [4]:
training

NeomarilTrainingExperiment(name="Teste notebook Training custom", 
                                                        group="datarisk", 
                                                        training_id="Td2d7ca1a7f84110b25bd81f9429a2d026bf16e58d3a4bf59ef55a0fa101c160",
                                                        training_type="Custom",
                                                        model_type=Classification
                                                        )
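The comments in the cell above list the accepted values for the model type (Classification, Regression, Unsupervised) and the training type (Custom, AutoML). A small sketch that catches a typo in either value before any request is sent; `experiment_args` and the two enums are hypothetical helpers, not part of `neomaril_codex`.

```python
from enum import Enum


class ModelType(str, Enum):
    # Values listed in the create_training_experiment comments in this notebook.
    CLASSIFICATION = "Classification"
    REGRESSION = "Regression"
    UNSUPERVISED = "Unsupervised"


class TrainingType(str, Enum):
    CUSTOM = "Custom"
    AUTOML = "AutoML"


def experiment_args(name: str, model_type: str, training_type: str) -> tuple:
    """Validate the positional arguments of create_training_experiment.

    Raises ValueError if model_type or training_type is not a listed value.
    """
    return (name, ModelType(model_type).value, TrainingType(training_type).value)
```

Usage: `client.create_training_experiment(*experiment_args('Teste notebook Training custom', 'Classification', 'Custom'), group='datarisk')`.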

In [5]:
# With the experiment class we can create multiple model runs
PATH = './samples/train/'

run = training.run_training('First test', # Run name
                            PATH+'dados.csv', # Path to the file with training data
                            source_file=PATH+'app.py', # Path of the source file
                            requirements_file=PATH+'requirements.txt', # Path of the requirements file, 
#                           env=PATH+'.env', # File for environment variables (this will be encrypted on the server)
#                           extra_files=[PATH+'utils.py'], # List with extra files paths that should be uploaded along (they will be all in the same folder)
                            training_reference='train_model', # The name of the entrypoint function that is going to be called inside the source file 
                            python_version='3.9', # Can be 3.7 to 3.10
                            wait_complete=True
)

2023-05-29 10:34:20.493 | INFO     | neomaril_codex.training:__upload_training:448 - {"ExecutionId":1,"Message":"Training files have been uploaded. Use the execution id '1' to check its status."}
2023-05-29 10:34:20.958 | INFO     | neomaril_codex.training:__execute_training:472 - Model training starting - Hash: Td2d7ca1a7f84110b25bd81f9429a2d026bf16e58d3a4bf59ef55a0fa101c160


Wating the training run.......
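Since a run uploads several files and then calls an entrypoint by name, a local pre-flight check can save a failed round trip. A minimal sketch, assuming the entrypoint must be a top-level function in the source file (the platform's real lookup rules may differ); `check_custom_run` and `defines_function` are hypothetical helpers. The same `defines_function` check applies to the `'score'` function passed to `promote_model`.

```python
import ast
from pathlib import Path

# "Can be 3.7 to 3.10" according to the run_training comment in this notebook.
SUPPORTED_PYTHON = {"3.7", "3.8", "3.9", "3.10"}


def defines_function(source_file: str, func_name: str) -> bool:
    """True if source_file defines a top-level function called func_name."""
    tree = ast.parse(Path(source_file).read_text())
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name == func_name
        for node in tree.body
    )


def check_custom_run(data_file, source_file, requirements_file,
                     entrypoint, python_version):
    """Return a list of problems found before calling training.run_training."""
    problems = []
    for path in (data_file, source_file, requirements_file):
        if not Path(path).is_file():
            problems.append(f"missing file: {path}")
    if python_version not in SUPPORTED_PYTHON:
        problems.append(f"unsupported python_version: {python_version}")
    if Path(source_file).is_file() and not defines_function(source_file, entrypoint):
        problems.append(f"{entrypoint!r} is not defined in {source_file}")
    return problems
```

For the run above: `check_custom_run(PATH+'dados.csv', PATH+'app.py', PATH+'requirements.txt', 'train_model', '3.9')` should return an empty list.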

In [6]:
run.get_status()

{'trainingExecutionId': '1',
 'Status': 'Succeeded',
 'Message': '{\n    "artifacts": "wasbs://mlflow-dev@datariskmlops.blob.core.windows.net/artifacts/1/7d054f99a84c4baabd2db7305ddcd895/artifacts",\n    "mlflow_run_id": "7d054f99a84c4baabd2db7305ddcd895"\n}'}
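The `Message` field returned by `run.get_status()` is itself a JSON string holding the MLflow run id and the artifacts location. A short sketch that decodes it; `parse_training_message` is a hypothetical helper, and it assumes the message is JSON only for the succeeded runs shown here, so anything else is returned raw.

```python
import json


def parse_training_message(status: dict) -> dict:
    """Decode the JSON string in the 'Message' field of run.get_status().

    Falls back to {'raw': message} when the message is not a JSON object
    (other run states may carry plain text).
    """
    message = status.get("Message", "")
    try:
        decoded = json.loads(message)
    except (json.JSONDecodeError, TypeError):
        return {"raw": message}
    return decoded if isinstance(decoded, dict) else {"raw": message}
```

Usage: `parse_training_message(run.get_status())["mlflow_run_id"]` gives the id to look the run up in MLflow.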

In [7]:
run.execution_data

{'TrainingHash': 'Td2d7ca1a7f84110b25bd81f9429a2d026bf16e58d3a4bf59ef55a0fa101c160',
 'ExperimentName': 'Teste notebook Training custom',
 'GroupName': 'datarisk',
 'ModelType': 'Classification',
 'TrainingType': 'Custom',
 'ExecutionId': 1,
 'ExecutionState': 'Succeeded',
 'RunData': {'metrics': [{'key': 'auc',
    'value': 0.9938884816218586,
    'timestamp': 1685367433767,
    'step': 0},
   {'key': 'f1_score',
    'value': 0.9818934131003096,
    'timestamp': 1685367433767,
    'step': 0}],
  'params': [{'key': 'shape', 'value': '(569, 30)'},
   {'key': 'cols_with_missing', 'value': '0'},
   {'key': 'missing_distribution',
    'value': "{'mean_missings': nan, 'std_missings': nan, 'min_missings': nan, '25%_missings': nan, '50%_missings': nan, '75%_missings': nan, 'max_missings': nan}"},
   {'key': 'target_proportion',
    'value': '{(1,): 0.6274165202108963, (0,): 0.37258347978910367}'},
   {'key': 'pipeline_steps', 'value': 'simpleimputer, xgbclassifier'},
   {'key': 'hyperparam_si
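In `execution_data`, metrics and params come back as lists of `{'key': ..., 'value': ...}` entries, and param values are strings. A sketch that flattens the metrics and converts params that are Python literals (such as `'(569, 30)'`) back into objects with `ast.literal_eval`; values that are not literals (for example ones containing `nan`) are returned as strings. `metrics_by_key` and `param` are hypothetical helpers.

```python
import ast


def metrics_by_key(execution_data: dict) -> dict:
    """Flatten RunData['metrics'] into {metric_name: value}."""
    return {m["key"]: m["value"] for m in execution_data["RunData"]["metrics"]}


def param(execution_data: dict, key: str):
    """Return a logged param, converted from its string form when it is a
    Python literal; otherwise return the raw string."""
    for p in execution_data["RunData"]["params"]:
        if p["key"] == key:
            try:
                return ast.literal_eval(p["value"])
            except (ValueError, SyntaxError):
                return p["value"]
    raise KeyError(key)
```

Usage: `metrics_by_key(run.execution_data)["auc"]` or `param(run.execution_data, "target_proportion")`.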

In [8]:
# When the run is finished you can download the model file
run.download_result()

# or promote it to a deployed model

PATH = './samples/syncModel/'

model = run.promote_model('Teste notebook promoted custom', # model_name
                            'score', # name of the scoring function
                            PATH+'app.py', # Path of the source file
                            PATH+'schema.json', # Path of the schema file, but it could be a dict
#                           env=PATH+'.env', # File for environment variables (this will be encrypted on the server)
#                           extra_files=[PATH+'utils.py'], # List with extra files paths that should be uploaded along (they will be all in the same folder)
                            operation="Sync" # Can be Sync or Async
)

2023-05-29 10:37:23.247 | INFO     | neomaril_codex.base:download_result:376 - Output saved in ./output_1.zip
2023-05-29 10:37:23.719 | INFO     | neomaril_codex.training:__upload_model:163 - Model 'Teste notebook promoted custom' promoted from Td2d7ca1a7f84110b25bd81f9429a2d026bf16e58d3a4bf59ef55a0fa101c160 - Hash: "Mba26915630b46c69960b0fd46c13bfc4eeab5196dfd47ea853687585fb69cd0"
2023-05-29 10:37:24.510 | INFO     | neomaril_codex.training:__host_model:225 - Model host in process - Hash: Mba26915630b46c69960b0fd46c13bfc4eeab5196dfd47ea853687585fb69cd0
2023-05-29 10:37:24.511 | INFO     | neomaril_codex.model:__init__:66 - Loading .env
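The log reports that `download_result()` saved the run output to `./output_1.zip`. What the archive contains depends on your training code, so a quick way to look inside is the standard `zipfile` module. A minimal sketch; `list_result_files` and `extract_result` are hypothetical helpers.

```python
import zipfile


def list_result_files(zip_path: str = "./output_1.zip") -> list:
    """Return the sorted names of the files inside the downloaded archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def extract_result(zip_path: str = "./output_1.zip", dest: str = "./output_1") -> None:
    """Unpack the downloaded archive into dest for local inspection."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
```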


In [9]:
model

NeomarilModel(name="Teste notebook promoted custom", group="datarisk", 
                                status="Building",
                                model_id="Mba26915630b46c69960b0fd46c13bfc4eeab5196dfd47ea853687585fb69cd0",
                                operation="Sync",
                                schema={
  "mean_radius": 17.99,
  "mean_texture": 10.38,
  "mean_perimeter": 122.8,
  "mean_area": 1001.0,
  "mean_smoothness": 0.1184,
  "mean_compactness": 0.2776,
  "mean_concavity": 0.3001,
  "mean_concave_points": 0.1471,
  "mean_symmetry": 0.2419,
  "mean_fractal_dimension": 0.07871,
  "radius_error": 1.095,
  "texture_error": 0.9053,
  "perimeter_error": 8.589,
  "area_error": 153.4,
  "smoothness_error": 0.006399,
  "compactness_error": 0.04904,
  "concavity_error": 0.05373,
  "concave_points_error": 0.01587,
  "symmetry_error": 0.03003,
  "fractal_dimension_error": 0.006193,
  "worst_radius": 25.38,
  "worst_texture": 17.33,
  "worst_perimeter": 184.6,
  "worst_area":

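The `promote_model` call notes that the schema can be a file path or a dict. The schema echoed by the model repr appears to be a sample input record (field name mapped to an example value). A hedged sketch of keeping it as a dict, writing it to `schema.json`, and checking a scoring payload against it on the client side; only a few fields from the visible output are used, and `write_schema` and `missing_fields` are hypothetical helpers.

```python
import json
from pathlib import Path

# A few fields from the schema shown in the model output; a real schema.json
# lists every input field the scoring function expects.
schema = {
    "mean_radius": 17.99,
    "mean_texture": 10.38,
    "mean_perimeter": 122.8,
}


def write_schema(schema: dict, path: str) -> str:
    """Save the schema dict as JSON so it can be passed to promote_model as a path."""
    Path(path).write_text(json.dumps(schema, indent=2))
    return path


def missing_fields(schema: dict, payload: dict) -> list:
    """Hypothetical client-side check: schema fields absent from a payload."""
    return sorted(set(schema) - set(payload))
```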
#### AutoML

With AutoML, you only need to upload the data and a configuration file.

In [10]:
# Creating a new training experiment
training = client.create_training_experiment('Teste notebook Training AutoML', # Experiment name
                                            'Classification', # Model type. Can be Classification, Regression or Unsupervised
                                            'AutoML', # Training type. Can be Custom or AutoML
                                            group='datarisk' # This is the default group. Create a new one for each new project
                                            )

PATH = './samples/autoML/'

run = training.run_training('First test', # Run name
                            PATH+'dados.csv', # Path to the file with training data
                            conf_dict=PATH+'conf.json', # Path of the configuration file
                            wait_complete=True
)

2023-05-29 10:37:24.621 | INFO     | neomaril_codex.training:create_training_experiment:719 - New Training 'Teste notebook Training AutoML' inserted.
2023-05-29 10:37:24.660 | INFO     | neomaril_codex.training:__upload_training:448 - {"ExecutionId":2,"Message":"Training files have been uploaded. Use the execution id '2' to check its status."}
2023-05-29 10:37:24.845 | INFO     | neomaril_codex.training:__execute_training:472 - Model training starting - Hash: T4c725a84d984525abdc3769ff4026a0b2cebd878135418ca8b9ad217abe9a31


Wating the training run.......................

In [11]:
run

NeomarilTrainingExecution(exec_id="2", status="Succeeded")

In [12]:
run.get_status()

{'trainingExecutionId': '2',
 'Status': 'Succeeded',
 'Message': '{\n    "artifacts": "wasbs://mlflow-dev@datariskmlops.blob.core.windows.net/artifacts/2/f04ef381a29a4dbb96d0ff1cb0b61544/artifacts",\n    "mlflow_run_id": "f04ef381a29a4dbb96d0ff1cb0b61544"\n}'}
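Both runs in this notebook use `wait_complete=True`, which blocks until training ends. If you start a run without waiting (assuming `wait_complete=False` returns immediately), you could poll `run.get_status()` yourself. A minimal sketch: `'Succeeded'` is the only status seen in this notebook, so `'Failed'` as the other terminal state is an assumption, and `wait_for_status` is a hypothetical helper.

```python
import time
from typing import Callable

# 'Succeeded' appears in this notebook; 'Failed' is an assumed error state.
TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_for_status(get_status: Callable[[], dict], interval: float = 10.0,
                    timeout: float = 3600.0) -> dict:
    """Poll get_status (for example run.get_status) until 'Status' is terminal.

    Raises TimeoutError if no terminal state is reached within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("Status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status.get('Status')!r} after {timeout} s")
        time.sleep(interval)
```

Usage: `final = wait_for_status(run.get_status, interval=30)`.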

In [13]:
run

NeomarilTrainingExecution(exec_id="2", status="Succeeded")

In [14]:
# Promoting an AutoML model is much simpler: no source file, scoring function or schema is passed

PATH = './samples/syncModel/'

model = run.promote_model('Teste notebook promoted autoML', # model_name
                            operation="Async" # Can be Sync or Async
)

2023-05-29 10:48:30.126 | INFO     | neomaril_codex.training:__upload_model:163 - Model 'Teste notebook promoted autoML' promoted from T4c725a84d984525abdc3769ff4026a0b2cebd878135418ca8b9ad217abe9a31 - Hash: "Macbee7fdf434dbd8f6e20bf71a6d41f64cdb629ec7b419e9e4e5c8a4c7e79df"
2023-05-29 10:48:30.149 | INFO     | neomaril_codex.training:__host_model:225 - Model host in process - Hash: Macbee7fdf434dbd8f6e20bf71a6d41f64cdb629ec7b419e9e4e5c8a4c7e79df
2023-05-29 10:48:30.152 | INFO     | neomaril_codex.model:__init__:66 - Loading .env


In [15]:
model

NeomarilModel(name="Teste notebook promoted autoML", group="datarisk", 
                                status="Building",
                                model_id="Macbee7fdf434dbd8f6e20bf71a6d41f64cdb629ec7b419e9e4e5c8a4c7e79df",
                                operation="Async",
                                schema={}
                                )