# MLOpsPreprocessing

This notebook gives an example of how to use MLOps to deploy a preprocessing script.

## Imports

In [1]:
from mlops_codex.preprocessing import MLOpsPreprocessingClient
from mlops_codex.model import MLOpsModelClient

## MLOpsPreprocessingClient

In [2]:
client = MLOpsPreprocessingClient()

January 29, 2025 | INFO: __init__ Loading .env
January 29, 2025 | INFO: __init__ Successfully connected to MLOps


## Creating sync preprocessing

In [3]:
PATH = './samples/syncPreprocessing/'

In [4]:
sync_preprocessing = client.create(
    preprocessing_name='Teste preprocessing Sync',  # Name of the preprocessing
    preprocessing_reference='process',  # Name of the entrypoint function in the source file
    source_file=PATH+'app.py',  # Path to the source file
    requirements_file=PATH+'requirements.txt',  # Path to the requirements file
    schema=PATH+'schema.json',  # Path to the schema file; a dict also works (only required for Sync)
    # env=PATH+'.env',  # File with environment variables (encrypted on the server)
    # extra_files=[PATH+'utils.py'],  # Extra files to upload alongside (all placed in the same folder)
    python_version='3.9',  # Can be 3.8 to 3.10
    operation="Sync",  # Can be Sync or Async
    group='groupname'  # Group name (create one using the client)
)

January 29, 2025 | INFO: __upload_preprocessing Script was registered! - Hash: "S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81" with response {"Hash":"S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81","Message":"Script was registered!"}
January 29, 2025 | INFO: __host_preprocessing Preprocessing host in process - Hash: S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81
January 29, 2025 | INFO: handle_common_errors Failed get status for preprocessing hash S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81 hash.
January 29, 2025 | ERROR: handle_common_errors Failed get status for preprocessing hash S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81 hash.
Waiting for deploy to be ready.
January 29, 2025 | INFO: handle_common_errors Failed get status for preprocessing hash S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81 hash.
January 29, 2025 | ERROR: handle_common_errors Failed get status for prepr
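
For context, `source_file` must expose the function named in `preprocessing_reference` (`process` here). The notebook does not show `app.py`, so the block below is only a sketch of what such an entrypoint could look like. The function signature, the `FEATURES` list, and the scaling rule are all assumptions, chosen to be consistent with the run output shown later in this notebook, where `{'variable': 100}` yields `mean_radius: 1000` and zeros elsewhere.

```python
# Hypothetical app.py for the sync preprocessing (NOT the actual sample file).
# Assumption: the entrypoint receives the request payload as a dict and
# returns a dict with the features expected by the downstream model.

BASE = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# 30 feature names: mean_*, *_error and worst_* for each base measurement.
FEATURES = (
    [f"mean_{b}" for b in BASE]
    + [f"{b}_error" for b in BASE]
    + [f"worst_{b}" for b in BASE]
)


def process(data: dict) -> dict:
    """Map the raw payload onto the model's feature names."""
    features = {name: 0 for name in FEATURES}
    # Assumed rule: the single raw input is scaled by 10.
    features["mean_radius"] = data["variable"] * 10
    return features
```

Under the same assumption, `schema.json` would simply hold a sample payload such as `{"variable": 100}`.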

In [None]:
sync_preprocessing.set_token('29d9d82e09bb4c11b9cd4ce4e36e6c58')
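
The token above is hard-coded for demonstration. A minimal sketch of reading it from the environment instead is shown below. The variable name `MLOPS_GROUP_TOKEN` and the helper `load_group_token` are hypothetical and not part of `mlops_codex`.

```python
import os


def load_group_token(var_name: str = "MLOPS_GROUP_TOKEN") -> str:
    """Return the group token from the environment, failing loudly if unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token


# Usage (assumes the variable was exported before starting the notebook):
# sync_preprocessing.set_token(load_group_token())
```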

In [8]:
result = sync_preprocessing.run(
    data={'variable' : 100}
)
result

January 29, 2025 | INFO: handle_common_errors Preprocessing hash S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81 not found.
January 29, 2025 | ERROR: handle_common_errors Preprocessing hash S060050b11fe4022a2dc6b8cfd61d1bb2bbb7b6dfa41430b8a41c66b7dbbca81 not found.


{'mean_radius': 1000,
 'mean_texture': 0,
 'mean_perimeter': 0,
 'mean_area': 0,
 'mean_smoothness': 0,
 'mean_compactness': 0,
 'mean_concavity': 0,
 'mean_concave_points': 0,
 'mean_symmetry': 0,
 'mean_fractal_dimension': 0,
 'radius_error': 0,
 'texture_error': 0,
 'perimeter_error': 0,
 'area_error': 0,
 'smoothness_error': 0,
 'compactness_error': 0,
 'concavity_error': 0,
 'concave_points_error': 0,
 'symmetry_error': 0,
 'fractal_dimension_error': 0,
 'worst_radius': 0,
 'worst_texture': 0,
 'worst_perimeter': 0,
 'worst_area': 0,
 'worst_smoothness': 0,
 'worst_compactness': 0,
 'worst_concavity': 0,
 'worst_concave_points': 0,
 'worst_symmetry': 0,
 'worst_fractal_dimension': 0}

## Creating async preprocessing

In [10]:
PATH = './samples/asyncPreprocessing/'

async_preprocessing = client.create(
    preprocessing_name='Teste preprocessing Async',  # Name of the preprocessing
    preprocessing_reference='build_df',  # Name of the entrypoint function in the source file
    source_file=PATH+'app.py',  # Path to the source file
    requirements_file=PATH+'requirements.txt',  # Path to the requirements file
    # env=PATH+'.env',  # File with environment variables (encrypted on the server)
    # extra_files=[PATH+'input.csv'],  # Extra files to upload alongside (all placed in the same folder)
    schema=PATH+'schema.csv',
    python_version='3.9',  # Can be 3.8 to 3.10
    operation="Async",  # Can be Sync or Async
    group='groupname',  # Group name (create one using the client)
    input_type='csv',
    wait_for_ready=True
)

January 29, 2025 | INFO: __upload_preprocessing Script was registered! - Hash: "S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93" with response {"Hash":"S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93","Message":"Script was registered!"}
January 29, 2025 | INFO: __host_preprocessing Preprocessing host in process - Hash: S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93
January 29, 2025 | INFO: handle_common_errors Failed get status for preprocessing hash S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93 hash.
January 29, 2025 | ERROR: handle_common_errors Failed get status for preprocessing hash S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93 hash.
Waiting for deploy to be ready.
January 29, 2025 | INFO: handle_common_errors Failed get status for preprocessing hash S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93 hash.
January 29, 2025 | ERROR: handle_common_errors Failed get status for prepr

In [11]:
async_preprocessing.set_token('29d9d82e09bb4c11b9cd4ce4e36e6c58')

January 29, 2025 | INFO: set_token Token for group datarisk added.


In [17]:
execution = async_preprocessing.run(data=PATH+'input.csv')

January 29, 2025 | INFO: handle_common_errors Preprocessing hash S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93 not found.
January 29, 2025 | ERROR: handle_common_errors Preprocessing hash S58671145d72475b9f8930ce1a8ade4fd9886434d07d498eb02f50591eb12e93 not found.
January 29, 2025 | INFO: run Execution '7' started to generate 'Dc39fce439a546a999be185115e2507acfbb4ae76c3b4e428953584352e6a559'. Use the id to check its status.


KeyboardInterrupt: 

In [14]:
execution.get_status()

{'ExecutionId': '6',
 'Status': 'Succeeded',
 'Message': '[/app/store/datarisk/datasets/D71a7cfa1e9344009a2e2c2427fb28a70b1bf3eefd754173a8fac04e08c4a395/processed_data.parquet]'}

In [16]:
execution.wait_ready()
execution.download_result()

KeyboardInterrupt: 
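
The interrupted cells above suggest that waiting on an execution can block for a long time. As an alternative, here is a rough sketch of a polling loop with a timeout. It takes any zero-argument callable that returns a status dict shaped like the `get_status()` output above. The helper `wait_for_status` is hypothetical, and the `'Failed'` and `'Running'` status values are assumptions; only `'Succeeded'` appears in this notebook.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    done=("Succeeded", "Failed"),  # 'Failed' is an assumed terminal state
) -> dict:
    """Poll get_status() until a terminal status is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("Status") in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status.get('Status')!r} after {timeout}s")
        time.sleep(interval)


# Usage with the execution object from this notebook:
# final = wait_for_status(execution.get_status, interval=10, timeout=900)
```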

## Access created preprocessing

In [None]:
client.search_preprocessing()

In [None]:
preprocessing = client.get_preprocessing(preprocessing_id='Sa79236b3dfc4f22a502e816a07dab382cee6327a5334c5bbba13c456233b8c4', group='groupname')

## Access created executions

In [None]:
old_execution = async_preprocessing.get_preprocessing_execution(exec_id='2')

old_execution.download_result()

## Using preprocessing with models

In [None]:
model_client = MLOpsModelClient()

#### Sync Model

In [None]:
sync_model = model_client.get_model(group='groupname', model_id='M7abe6af98484948ad63f3ad03f25b6496a93f06e23c4ffbaa43eba0f6a1bb91')

sync_model.set_token('29d9d82e09bb4c11b9cd4ce4e36e6c58')

data = {
 "mean_radius": 17.99,
 "mean_texture": 10.38,
 "mean_perimeter": 122.8,
 "mean_area": 1001.0,
 "mean_smoothness": 0.1184,
 "mean_compactness": 0.2776,
 "mean_concavity": 0.3001,
 "mean_concave_points": 0.1471,
 "mean_symmetry": 0.2419,
 "mean_fractal_dimension": 0.07871,
 "radius_error": 1.095,
 "texture_error": 0.9053,
 "perimeter_error": 8.589,
 "area_error": 153.4,
 "smoothness_error": 0.006399,
 "compactness_error": 0.04904,
 "concavity_error": 0.05373,
 "concave_points_error": 0.01587,
 "symmetry_error": 0.03003,
 "fractal_dimension_error": 0.006193,
 "worst_radius": 25.38,
 "worst_texture": 17.33,
 "worst_perimeter": 184.6,
 "worst_area": 2019.0,
 "worst_smoothness": 0.1622,
 "worst_compactness": 0.6656,
 "worst_concavity": 0.7119,
 "worst_concave_points": 0.2654,
 "worst_symmetry": 0.4601,
 "worst_fractal_dimension": 0.1189
}

sync_model.predict(data=data, preprocessing=sync_preprocessing)
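
When a preprocessing step is chained to a model, its output has to carry the fields the model expects. A quick local sanity check, sketched below, compares the keys of a preprocessing result against a sample model payload. `compare_keys` is a hypothetical helper and the small dicts are illustrative stand-ins for the 30-field payloads shown in this notebook.

```python
def compare_keys(preprocessed: dict, expected: dict) -> tuple:
    """Return (missing, extra) key sets of `preprocessed` relative to `expected`."""
    missing = set(expected) - set(preprocessed)
    extra = set(preprocessed) - set(expected)
    return missing, extra


# Illustrative stand-ins, not real outputs.
model_payload = {"mean_radius": 17.99, "mean_texture": 10.38, "mean_area": 1001.0}
preprocessing_result = {"mean_radius": 1000, "mean_texture": 0, "radius_total": 0}

missing, extra = compare_keys(preprocessing_result, model_payload)
print("missing:", sorted(missing))
print("extra:", sorted(extra))
```

Both sets being empty means the preprocessing output and the model input agree on field names. The check says nothing about value types or ranges.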

#### Async Model

In [None]:
async_model = model_client.get_model(group='groupname', model_id='Me6ebaa539cb4a738a66fc52fc34b5422a8c6ae3942b4ca1868624cfda964db3')

PATH = './samples/asyncModel/'

async_model.set_token('29d9d82e09bb4c11b9cd4ce4e36e6c58')

execution = async_model.predict(data=PATH+'input.csv', preprocessing=async_preprocessing)
execution.wait_ready()

In [None]:
execution.download_result()

-----

## New preprocessing

We're rebuilding the preprocessing module. The main new feature is the ability to send multiple datasets to the MLOps server. See the code below.

In [19]:
PATH = "./samples/asyncPreprocessingMultiple/"

schemas = [
    ("base_cadastral", PATH+'base_cadastral.csv'),
    ("base_pagamentos", PATH+'base_pagamentos.csv'),
    ("base_info", PATH+'base_info.csv'),
]

preprocess = client.create(
    preprocessing_name='test_preprocessing',  # Name of the preprocessing
    preprocessing_reference='build_df',  # Name of the entrypoint function in the source file
    source_file=PATH+'app.py',  # Path to the source file
    requirements_file=PATH+'requirements.txt',  # Path to the requirements file
    schema=schemas,  # List of (dataset name, schema file path) tuples
    # env=PATH+'.env',  # File with environment variables (encrypted on the server)
    # extra_files=[PATH+'utils.py'],  # Extra files to upload alongside (all placed in the same folder)
    python_version='3.9',  # Can be 3.8 to 3.10
    operation="Async",  # Can be Sync or Async
    group='groupname',  # Group name (create one using the client)
    wait_for_ready=True
)

January 29, 2025 | INFO: create Creating preprocessing for preprocessing hash
 Preprocessing hash = S66b0c6caa524f74829a1af9d6015e1e9d7df0bddf4a4e60b515aaac201ab4a0
January 29, 2025 | INFO: create Created dataset hash D83813d67d2c4c68afd340585650dca4f8d46846c1e048dd8af7be9d0d1548c0 with name base_cadastral
January 29, 2025 | INFO: create Created dataset hash D89d8a9c251a461d82dddecd31a9b3a9c5107cdc2c0b4c4d99b010f9d8d6c4e0 with name base_pagamentos
January 29, 2025 | INFO: create Created dataset hash D60dfe4564ad462389a675d25ce32487b36b6e6745024b40ae61b57a838c1088 with name base_info
January 29, 2025 | INFO: create Schema files uploaded
January 29, 2025 | INFO: create Script file uploaded
January 29, 2025 | INFO: create Requirements file uploaded
January 29, 2025 | INFO: create Hosting preprocessing script
Waiting for preprocessing script to finish.....January 29, 2025 | INFO: wait 
Preprocessing script finished successfully
January 29, 2025 | INFO: create Successfully hosted preprocess
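
The multi-dataset API takes a list of `(name, path)` tuples, both for `schema` and for the `data` argument of `run`. Since a missing file would only surface at upload time, it can help to build and validate that list locally first. The helper below is a hypothetical sketch, not part of `mlops_codex`. It assumes each dataset lives in a file called `<name>.csv` inside one folder, as in the sample above, and the demo uses a temporary folder with dummy files.

```python
import tempfile
from pathlib import Path


def build_inputs(folder, names):
    """Return [(name, path), ...], checking names are unique and files exist."""
    if len(set(names)) != len(names):
        raise ValueError("Dataset names must be unique")
    folder = Path(folder)
    pairs = []
    for name in names:
        csv_path = folder / f"{name}.csv"
        if not csv_path.is_file():
            raise FileNotFoundError(f"Missing input file: {csv_path}")
        pairs.append((name, str(csv_path)))
    return pairs


# Demo with dummy files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("base_cadastral", "base_info"):
        (Path(tmp) / f"{name}.csv").write_text("id\n1\n")
    inputs = build_inputs(tmp, ["base_cadastral", "base_info"])
    print(inputs)
```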

In [2]:
inputs = [
    ("base_cadastral", PATH+'base_cadastral.csv'),
    ("base_pagamentos", PATH+'base_pagamentos.csv'),
    ("base_info", PATH+'base_info.csv'),
]

run = preprocess.run(
    data=inputs,
    wait_complete=True
)

January 27, 2025 | INFO: register_execution Registered execution for preprocessing hash Sabf4c60b7a54759bf205e3eb3325e55deabf943a6a54b8cbf23557e60bfd937
 Message = Preprocess Execution '16' created
January 27, 2025 | INFO: run Preprocessing script execution Sabf4c60b7a54759bf205e3eb3325e55deabf943a6a54b8cbf23557e60bfd937 is registered. Execution ID = 16
January 27, 2025 | INFO: run Uploaded input file ('base_cadastral', './samples/asyncPreprocessingMultiple/base_cadastral.csv') - Output Hash D17861f1574f419a9350185dc01dd1658cd932d271234f8e98407ead4834311a
January 27, 2025 | INFO: run Uploaded input file ('base_pagamentos', './samples/asyncPreprocessingMultiple/base_pagamentos.csv') - Output Hash De1f49c6b5b84d1d9a42bc144d2948a5b87cd849bb8449d792b16c3a90112889
January 27, 2025 | INFO: run Uploaded input file ('base_info', './samples/asyncPreprocessingMultiple/base_info.csv') - Output Hash D7e77efaa50e4eda9293be071dada3115038a38638f04f5b93e3e3d5c64ecbf1
January 27, 2025 | INFO: run Start

In [None]:
run.download()