# 🫡 Save the Model

We need to save this model so that we can use it from various locations, including other notebooks and the model server. To do that, we upload it to S3-compatible storage.

>NOTE: Don't run all the cells in one shot without first changing the cluster-specific variables.

### Install the required packages and define a function for the upload

If `pip` reports an error, don't worry about it. Things will run fine regardless.

In [None]:
!pip -q install boto3 botocore model-registry=="0.2.9"

#### 🙏 Thanks to data connections, S3 bucket credentials are available in the notebook!

In [None]:
import os
import boto3
import botocore

aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

if not all([aws_access_key_id, aws_secret_access_key, endpoint_url, region_name, bucket_name]):
    raise ValueError("One or more data connection variables are empty.  "
                     "Please check your data connection to an S3 bucket.")

session = boto3.session.Session(aws_access_key_id=aws_access_key_id,
                                aws_secret_access_key=aws_secret_access_key)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name)

bucket = s3_resource.Bucket(bucket_name)


def upload_directory_to_s3(local_directory, s3_prefix):
    num_files = 0
    local_directory = os.path.abspath(local_directory) 
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory) 
            s3_key = f"{s3_prefix}/{relative_path.replace(os.path.sep, '/')}"
            print(f"Uploading {file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)
            num_files += 1
    return num_files


def list_objects(prefix):
    # Avoid naming the collection `filter`, which would shadow the built-in
    objects = bucket.objects.filter(Prefix=prefix)
    for obj in objects:
        print(obj.key)
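
Before uploading, it can help to see which object keys the upload would produce. The sketch below repeats the path-to-key mapping of `upload_directory_to_s3` without contacting S3. The helper name `plan_s3_keys` and the temporary directory layout are assumptions made for illustration only.

```python
import os
import tempfile


def plan_s3_keys(local_directory, s3_prefix):
    """Return (file_path, s3_key) pairs using the same mapping as the upload.

    Hypothetical dry-run helper: nothing is sent to S3.
    """
    local_directory = os.path.abspath(local_directory)
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            # S3 keys always use forward slashes, whatever the local OS uses
            s3_key = f"{s3_prefix}/{relative_path.replace(os.path.sep, '/')}"
            plan.append((file_path, s3_key))
    return plan


# Demo on a throwaway directory that mimics the expected layout
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "models", "jukebox", "1")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "model.onnx"), "wb") as f:
        f.write(b"placeholder")  # not a real ONNX model

    demo_keys = [key for _path, key in plan_s3_keys(os.path.join(tmp, "models"), "models")]
    print(demo_keys)
```

In the notebook itself you could call `plan_s3_keys("models", "models")` to preview the keys before running the real upload.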

## 🕵️‍♀️ Check the bucket

Run the `list_objects` function to see what is stored under the `models` prefix in your S3 bucket. As a best practice, keep only one model and its required files under a given prefix or directory (like `jukebox`) so that model files don't get mixed up. That way you can download and serve a single directory containing everything the model requires.

If this is the first time running the code, this cell will have no output.

If you've already uploaded your model, you should see this output: `models/jukebox/1/model.onnx`

In [None]:
list_objects("models")

## 👩‍💻 Upload and check again

Use the function to upload the `models` folder recursively:

In [None]:
local_models_directory = "models/jukebox"

if not os.path.isdir(local_models_directory):
    raise ValueError(f"The directory '{local_models_directory}' does not exist.  "
                     "Did you finish training the model in the previous notebook?")

num_files = upload_directory_to_s3("models", "models")

if num_files == 0:
    raise ValueError("No files uploaded.  Did you finish training and "
                     "saving the model to the \"models\" directory?  "
                     "Check for \"models/jukebox/1/model.onnx\"")


To confirm this worked, run the `list_objects` function again:

In [None]:
list_objects("models")
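
Instead of reading the listing by eye, you could compare the keys you expect with the keys the bucket reports. This is a minimal sketch: `find_missing_keys` is a hypothetical helper, and the listed keys in the demo are made up so the example runs without an S3 connection.

```python
import os
import tempfile


def find_missing_keys(local_directory, s3_prefix, listed_keys):
    """Return the expected S3 keys that do not appear in `listed_keys`.

    Hypothetical helper: expected keys are derived from the local files
    with the same prefix/relative-path rule used for the upload.
    """
    local_directory = os.path.abspath(local_directory)
    expected = set()
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            relative_path = os.path.relpath(os.path.join(root, filename), local_directory)
            expected.add(f"{s3_prefix}/{relative_path.replace(os.path.sep, '/')}")
    return sorted(expected - set(listed_keys))


# Demo: two local files, but only one of them shows up in the (made-up) listing
with tempfile.TemporaryDirectory() as tmp:
    version_dir = os.path.join(tmp, "jukebox", "1")
    os.makedirs(version_dir)
    for name in ("model.onnx", "notes.txt"):
        with open(os.path.join(version_dir, name), "w") as f:
            f.write("placeholder")

    demo_missing = find_missing_keys(tmp, "models", ["models/jukebox/1/model.onnx"])
    print(demo_missing)
```

In the notebook, the listing could come from `[obj.key for obj in bucket.objects.filter(Prefix="models")]`, and an empty result would mean every local file has a matching object.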

# 🤩 Kubeflow Registry

We need a metadata registry to store information about the models we are building, such as version, author, and model location.

We use the Kubeflow Model Registry as the canonical source for this information.

Here are some reasons to use a registry (_from Kubeflow website_):

- Track models available on storage: once the model is stored, it can then be tracked in the Kubeflow Model Registry for managing its lifecycle. The Model Registry can catalog, list, index, share, record, organize this information. This allows the Data Scientist to compare different versions and revert to previous versions if needed.

- Track and compare performance: View key metrics like accuracy, recall, and precision for each model version. This helps identify the best-performing model for deployment.

- Create lineage: Capture the relationships between data, code, and models. This enables the Data Scientist to understand the origin of each model and reproduce specific experiments.

- Collaborate: Share models and experiment details with the MLOps Engineer for deployment preparation. This ensures a seamless transition from training to production.

An instance of the registry is available in your dev environment as well. 


In [None]:
from model_registry import ModelRegistry
from model_registry.exceptions import StoreError

‼️⚠️ ¡IMPORTANT! ⚠️‼️

Add the user name and cluster domain (apps.xxx) that were shared with you earlier.

We need them for the model registry URL.

In [None]:
# Add the user name and cluster domain (apps.xxx) that were shared with you earlier

username = "<USER_NAME>"
cluster_domain = "<CLUSTER_DOMAIN>"

In [None]:
# Set up the model registry connection
model_registry_url = f"https://{username}-registry-rest.{cluster_domain}"
author_name = username

registry = ModelRegistry(server_address=model_registry_url, port=443, author=author_name, is_secure=False)
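
A common mistake is to run the connection cell while `username` or `cluster_domain` still holds its placeholder value. The sketch below builds the same URL pattern as the cell above and fails early if a placeholder is left in. `build_registry_url` is a hypothetical helper, and the example values are invented.

```python
def build_registry_url(username, cluster_domain):
    """Build the registry REST URL, rejecting empty or placeholder values.

    Hypothetical helper: it only mirrors the URL pattern used in this notebook.
    """
    for name, value in (("username", username), ("cluster_domain", cluster_domain)):
        # Placeholders such as "<USER_NAME>" still contain angle brackets
        if not value or "<" in value or ">" in value:
            raise ValueError(f"Set '{name}' before connecting to the registry (got {value!r}).")
    return f"https://{username}-registry-rest.{cluster_domain}"


# Example values are made up for illustration
print(build_registry_url("user1", "apps.example.com"))
```

Calling it with the untouched placeholders raises a `ValueError` that names the variable to fix, which is easier to diagnose than a failed connection later on.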

In [None]:
# Model details we want to register
registered_model_name = "jukebox"
version = "0.0.1"
s3_endpoint_url = endpoint_url.split("https://")[-1]
model_path = "jukebox/1/model.onnx"  # no leading slash; the URI below adds the separator

# Check if the model has been registered already and otherwise register it
try:
    rm = registry.register_model(
        registered_model_name,
        f"s3://{s3_endpoint_url}/{model_path}",
        model_format_name="onnx",
        model_format_version="1",
        version=version,
        description="Dense Neural Network trained on music data",
        metadata={
            "accuracy": 0.3,
            "license": "apache-2.0"
        }
    )
    print(f"Model and version registered successfully as:\n{rm}")
except StoreError:
    rmver = registry.get_model_version(registered_model_name, version)
    print(f"Model and version already exist:\n{rmver}")
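
The model URI passed to the registry is assembled from the S3 endpoint and the model path. Here is a minimal sketch of one way to compose it so that it works for both `http://` and `https://` endpoints and avoids a doubled slash when the path starts with `/`. `build_model_uri` is a hypothetical helper, the endpoints are invented, and the `s3://<endpoint host>/<path>` layout simply follows this notebook.

```python
def build_model_uri(endpoint_url, model_path):
    """Compose an s3:// URI from an endpoint URL and a model path.

    Hypothetical helper: strips any URL scheme from the endpoint and
    normalizes the slashes between the host and the path.
    """
    # Drop "http://" or "https://" if present, plus any trailing slash
    host = endpoint_url.split("://", 1)[-1].rstrip("/")
    return f"s3://{host}/{model_path.lstrip('/')}"


# Endpoints below are made up for illustration
print(build_model_uri("https://minio.example.com", "/jukebox/1/model.onnx"))
print(build_model_uri("http://minio:9000/", "jukebox/1/model.onnx"))
```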

### Quiz Time 🤓

In [None]:
import sys
import os
sys.path.append(os.path.abspath('../.dontlookhere/'))
from quiz2 import *

In [None]:
quiz_versioning()

In [None]:
# Print the general info of registered model
model = registry.get_registered_model("jukebox")
print("Registered Model:", model, "with ID", model.id)

In [None]:
# Print the version info of registered model
# Use a separate name so the `version` string defined earlier is not overwritten
model_version = registry.get_model_version("jukebox", "0.0.1")
print("Model Version:", model_version, "with ID", model_version.id)

In [None]:
# Print the artifact info of registered model
art = registry.get_model_artifact("jukebox", "0.0.1")
print("Model Artifact:", art, "with ID", art.id)

### 🥁 Next Step

Now that you've saved the model to S3 storage and registered it in the model registry, you can refer to the model through a data connection and serve it as an API.

Go back to the instructions at https://rhoai-mlops.github.io/lab-instructions/#/1-when-the-music-starts/4-inner-data-science-loop?id=model-serving to view the model in the Model Registry UI first.