# Model Training
We simulate (poorly!) the training of an ML model and persist the resulting model to a joblib file.

In [1]:
import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset and hold out 30% of it as a stratified test split
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# Fit a linear-kernel SVM; probability=True enables predict_proba
svc_linear = SVC(kernel="linear", probability=True)
svc_linear.fit(X_train, y_train)

# Evaluate accuracy on the held-out split
y_pred = svc_linear.predict(X_test)
accuracy_value = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy_value)

accuracy: 0.9777777777777777


In [2]:
with open("model.joblib", "wb") as fo:
    joblib.dump(svc_linear, fo)

%ls -lA model*

-rw-r--r--@ 1 mmortari  staff  3299 Jun 17 10:22 model.joblib


# OCI Artifact
Let's use the OCI Artifact and OCI Distribution specifications to store our ML model together with its metadata.

In [3]:
from omlmd.helpers import Helper

omlmd = Helper()
omlmd.push("localhost:8080/matteo/ml-artifact:latest", "model.joblib", name="Model Example", author="John Doe", license="Apache-2.0", accuracy=accuracy_value)

Successfully pushed localhost:8080/matteo/ml-artifact:latest


| Zot | Quay |
| --- | --- |
| ![](./imgs/Screenshot%202024-06-07%20at%2018.12.04.png) | ![](./imgs/Screenshot%202024-06-12%20at%2010.02.44.png) |

Demonstrate a _pull_ with a **vanilla** OCI-compliant client (here, the ORAS Python client).

In [4]:
from oras.provider import Registry

oras_registry = Registry(insecure=True)
oras_registry.pull(target="localhost:8080/matteo/ml-artifact:latest", outdir="tmp/a")

%ls -lA tmp/a

total 24
-rw-r--r--@ 1 mmortari  staff  3299 Jun 17 10:22 model.joblib
-rw-r--r--@ 1 mmortari  staff   269 Jun 17 10:22 model_metadata.oml.json
-rw-r--r--@ 1 mmortari  staff   187 Jun 17 10:22 model_metadata.oml.yaml


Demonstrate a _custom pull_ that filters by media type, downloading only the ML artifact and nothing else.

In [5]:
omlmd.pull(target="localhost:8080/matteo/ml-artifact:latest", outdir="tmp/b", media_types=["application/x-mlmodel"])

%ls -lA tmp/b

total 8
-rw-r--r--@ 1 mmortari  staff  3299 Jun 17 10:22 model.joblib
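
The media-type filter used in the cell above can be pictured as a selection over the `layers` list of an OCI manifest. The following is a minimal sketch of that idea, not the `omlmd` implementation: the manifest is hand-written, the digests are placeholders, the media types of the two metadata layers are assumptions, and `select_layers` is a hypothetical helper.

```python
# Hypothetical manifest shaped like an OCI image manifest.
# Digests are placeholders; the metadata media types are assumptions.
manifest = {
    "schemaVersion": 2,
    "layers": [
        {
            "mediaType": "application/x-mlmodel",
            "digest": "sha256:placeholder-model",
            "size": 3299,
            "annotations": {"org.opencontainers.image.title": "model.joblib"},
        },
        {
            "mediaType": "application/json",  # assumed media type
            "digest": "sha256:placeholder-json",
            "size": 269,
            "annotations": {"org.opencontainers.image.title": "model_metadata.oml.json"},
        },
        {
            "mediaType": "application/yaml",  # assumed media type
            "digest": "sha256:placeholder-yaml",
            "size": 187,
            "annotations": {"org.opencontainers.image.title": "model_metadata.oml.yaml"},
        },
    ],
}


def select_layers(manifest, media_types):
    """Return only the layers whose mediaType is in media_types (hypothetical helper)."""
    wanted = set(media_types)
    return [layer for layer in manifest.get("layers", []) if layer.get("mediaType") in wanted]


selected = select_layers(manifest, ["application/x-mlmodel"])
titles = [layer["annotations"]["org.opencontainers.image.title"] for layer in selected]
print(titles)
```

A client that filters this way only needs to download the blobs of the selected layers, which is why `tmp/b` ends up with the model file alone.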


Demonstrate a custom fetch of the metadata layer only (following OCI Artifact conventions).

In [6]:
print(omlmd.get_config(target="localhost:8080/matteo/ml-artifact:latest"))

{"reference":"localhost:8080/matteo/ml-artifact:latest", "config": {
    "name": "Model Example",
    "description": null,
    "author": "John Doe",
    "customProperties": {
        "license": "Apache-2.0",
        "accuracy": 0.9777777777777777
    },
    "uri": null,
    "model_format_name": null,
    "model_format_version": null
} }
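
Since the printed value is JSON text, it can be parsed with the standard library to read individual metadata fields. A small sketch, assuming the text has exactly the shape printed above (here it is pasted in as a literal rather than fetched from the registry):

```python
import json

# Literal copy of the output above; in the notebook this would be the
# string returned by omlmd.get_config(...).
raw = """{"reference":"localhost:8080/matteo/ml-artifact:latest", "config": {
    "name": "Model Example",
    "description": null,
    "author": "John Doe",
    "customProperties": {
        "license": "Apache-2.0",
        "accuracy": 0.9777777777777777
    },
    "uri": null,
    "model_format_name": null,
    "model_format_version": null
} }"""

doc = json.loads(raw)
props = doc["config"]["customProperties"]

# JSON null becomes Python None; numbers become float
print(doc["reference"])
print(props["license"], props["accuracy"])
```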


# Crawl OCI-Artifacts

This demonstrates client-side crawling.
It is only a demonstrator: we are working on an analogous server-side concept (beyond the OCI specification, but integrating with it).

In [7]:
# Data prep (simulated): push 3 tags to the OCI registry, each with different `accuracy` metadata
omlmd.push("localhost:8080/matteo/ml-artifact:v1", "model.joblib", accuracy=.85, name="Model Example", author="John Doe", license="Apache-2.0")
omlmd.push("localhost:8080/matteo/ml-artifact:v2", "model.joblib", accuracy=.90, name="Model Example", author="John Doe", license="Apache-2.0")
omlmd.push("localhost:8080/matteo/ml-artifact:v3", "model.joblib", accuracy=.95, name="Model Example", author="John Doe", license="Apache-2.0")

Successfully pushed localhost:8080/matteo/ml-artifact:v1
Successfully pushed localhost:8080/matteo/ml-artifact:v2
Successfully pushed localhost:8080/matteo/ml-artifact:v3


| Zot | Quay |
| --- | --- |
| ![](./imgs/Screenshot%202024-06-07%20at%2018.12.29.png) | ![](./imgs/Screenshot%202024-06-12%20at%2010.07.10.png) |

In [8]:
crawl_result = omlmd.crawl([
    "localhost:8080/matteo/ml-artifact:v1",
    "localhost:8080/matteo/ml-artifact:v2",
    "localhost:8080/matteo/ml-artifact:v3"
])

Demonstrate integrating the crawl results with a query tool (in this case, jq).

> Of the crawled ML OCI artifacts, which one exhibits the maximum accuracy?

In [9]:
import jq

jq.compile("max_by(.config.customProperties.accuracy).reference").input_text(crawl_result).first()

'localhost:8080/matteo/ml-artifact:v3'
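
For readers without jq installed, the same question can be answered with the standard library. This sketch assumes the crawl result is JSON text holding an array of `{"reference": ..., "config": {...}}` entries, as the jq query above implies; the sample data below is hand-built from the three pushes earlier, not actual crawl output.

```python
import json

# Hand-built stand-in for crawl_result (assumed shape: a JSON array of entries)
crawl_text = json.dumps([
    {"reference": "localhost:8080/matteo/ml-artifact:v1",
     "config": {"name": "Model Example", "customProperties": {"accuracy": 0.85}}},
    {"reference": "localhost:8080/matteo/ml-artifact:v2",
     "config": {"name": "Model Example", "customProperties": {"accuracy": 0.90}}},
    {"reference": "localhost:8080/matteo/ml-artifact:v3",
     "config": {"name": "Model Example", "customProperties": {"accuracy": 0.95}}},
])

entries = json.loads(crawl_text)

# Equivalent of jq's max_by(.config.customProperties.accuracy).reference
best = max(entries, key=lambda e: e["config"]["customProperties"]["accuracy"])
print(best["reference"])
```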