# Save the Model and the Evaluation Data

To save this model and the evaluation data so that you can use it from various locations, including other notebooks or the model server, upload it to s3-compatible storage.

## Install the required packages and define a function for the upload

In [None]:
!pip install python-dotenv==1.0.1



In [11]:
import os
from google.cloud import storage
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

# Set up the GCS credentials and bucket name
gcs_credentials_path = os.environ.get('GS_SERVICE_ACCOUNT_KEY')
bucket_name = os.environ.get('GS_BUCKET_NAME')

if not all([gcs_credentials_path, bucket_name]):
    raise ValueError("One or more data connection variables are empty. "
                     "Please check your data connection to a GCS bucket.")

# Initialize the GCS client
storage_client = storage.Client.from_service_account_json(gcs_credentials_path)
bucket = storage_client.bucket(bucket_name)

def upload_file_to_gs(local_file, gcs_blob_name):
    blob = bucket.blob(gcs_blob_name)
    blob.upload_from_filename(local_file)
    print(f"{local_file} -> {gcs_blob_name}")

def upload_directory_to_gs(local_directory, gcs_prefix):
    num_files = 0
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            gcs_blob_name = os.path.join(gcs_prefix, relative_path)
            upload_file_to_gcs(file_path, gcs_blob_name)
            num_files += 1
    return num_files

def list_objects(prefix):
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix)
    for blob in blobs:
        print(blob.name)


## Verify the upload

In your S3 bucket, under the `models` upload prefix, run the `list_object` command. As best practice, to avoid mixing up model files, keep only one model and its required files in a given prefix or directory. This practice allows you to download and serve a directory with all the files that a model requires. 

If this is the first time running the code, this cell will have no output.

If you've already uploaded your model, you should see this output: `models/fraud/1/model.onnx`


In [12]:
list_objects("models")

models/


If you've already uploaded your model, you should see this output: `scaler.pkl` and `test_data.pkl`

In [13]:
list_objects("artifact")

## Upload and check again

In [14]:
# Compress files: models/fraud/1/model.onnx, artifact/scaler.pkl and artifact/scaler.pkl using python's zipfile module
import zipfile
import os

# Define the files to be compressed
files_to_compress = [
    "models/fraud/1/model.onnx",
    "artifact/scaler.pkl",
    "artifact/test_data.pkl"
]

# Define the name of the compressed file
compressed_file_name = "models/evaluation_kit.zip"

# Create a zip file and add the files to it
with zipfile.ZipFile(compressed_file_name, 'w') as zipf:
    for file in files_to_compress:
        zipf.write(file)

# Verify the compressed file
if os.path.exists(compressed_file_name):
    print(f"Files compressed successfully. Compressed file: {compressed_file_name}")
    # Upload the compressed file to GS
    upload_file_to_gs(compressed_file_name, compressed_file_name)
else:
    print("Failed to compress files.")

Files compressed successfully. Compressed file: models/evaluation_kit.zip
models/evaluation_kit.zip -> models/evaluation_kit.zip


Use the function to upload the `models` and `artifact` folders in a rescursive fashion:

In [15]:
local_models_directory = "models"

if not os.path.isdir(local_models_directory):
    raise ValueError(f"The directory '{local_models_directory}' does not exist.  "
                     "Did you finish training the model in the previous notebook?")

num_files = upload_directory_to_gs("models", "models")

if num_files == 0:
    raise ValueError("No files uploaded.  Did you finish training and "
                     "saving the model to the \"models\" directory?  "
                     "Check for \"models/fraud/1/model.onnx\"")

local_artifacts_directory = "artifact"

if not os.path.isdir(local_artifacts_directory):
    raise ValueError(f"The directory '{local_artifacts_directory}' does not exist.  "
                     "Did you finish training the model in the previous notebook?")

num_files = upload_directory_to_gs(local_artifacts_directory, local_artifacts_directory)

if num_files == 0:
    raise ValueError("No files uploaded.  Did you finish training and "
                     "saving the model to the \"artifacts\" directory?")


models/evaluation_kit.zip -> models/evaluation_kit.zip
models/fraud/1/model.onnx -> models/fraud/1/model.onnx


NameError: name 'upload_directory_to_s3' is not defined

To confirm this worked, run the `list_objects` function again:

In [None]:
list_objects("models")
list_objects("artifact")

### Next Step

Now that you've saved the model to s3 storage, you can refer to the model by using the same data connection to serve the model as an API.
