# Load the MLflow model locally and try predictions

## Prerequisites

1. You must have successfully run the training notebook for this model, available in this same folder. At the end of that notebook, after training the model, it downloads the 'artifacts', which include the MLflow model folder ("./artifact_downloads/outputs/mlflow-model").

2. Create a conda environment from the 'conda.yaml' file provided in the "mlflow-model" folder, as follows:
   1. If you are running this notebook on a Windows machine, remove the "pycocotools" and "recordclass" lines from conda.yaml and install the C++ build tools (https://visualstudio.microsoft.com/visual-cpp-build-tools/) before running the steps below.

   1. (base) /> conda env create --file conda.yaml --name automl-model-image-multicls-cls-env

   1. (base) /> conda activate automl-model-image-multicls-cls-env

   1. (automl-model-image-multicls-cls-env) /> conda install jupyter nb_conda

3. Start Jupyter and make sure the notebook is using the 'automl-model-image-multicls-cls-env' kernel.

4. Run this notebook.

If the training notebook downloaded the MLflow model files successfully, the cell below should list them.

In [5]:
import os

# Local dir where you have downloaded and saved the artifacts
local_dir = "./"

mlflow_model_dir = os.path.join(local_dir, "blip-lavis")

# Show the contents of the MLFlow model folder
os.listdir(mlflow_model_dir)

# You should see a list of files such as the following:
# ['code', 'conda.yaml', 'LICENSE', 'MLmodel', 'python_env.yaml', 'python_model.pkl', 'requirements.txt']

['code',
 'conda.yaml',
 'LICENSE',
 'MLmodel',
 'python_env.yaml',
 'python_model.pkl',
 'requirements.txt']
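
The same check can be done programmatically instead of by eye. The following is a minimal sketch, assuming that `MLmodel`, `conda.yaml`, and `python_model.pkl` are the files you care about; that selection and the helper name `missing_model_files` are illustrative and not part of MLflow.

```python
import os
import tempfile


# Hypothetical helper: report which expected files are absent from a model folder.
def missing_model_files(model_dir, required=("MLmodel", "conda.yaml", "python_model.pkl")):
    present = set(os.listdir(model_dir))
    return [name for name in required if name not in present]


# Demo on a throwaway folder that contains only an MLmodel file
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "MLmodel"), "w").close()
    demo_missing = missing_model_files(demo_dir)
```

In this notebook you would call `missing_model_files(mlflow_model_dir)` and expect an empty list.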

In [6]:
# Change to a different location if you downloaded data at a different location
dataset_parent_dir = "./data"
dataset_name = "fridgeObjects"

os.listdir(os.path.join(dataset_parent_dir, dataset_name, "milk_bottle"))

['100.jpg',
 '101.jpg',
 '65.jpg',
 '66.jpg',
 '67.jpg',
 '68.jpg',
 '69.jpg',
 '70.jpg',
 '71.jpg',
 '72.jpg',
 '73.jpg',
 '74.jpg',
 '75.jpg',
 '76.jpg',
 '77.jpg',
 '78.jpg',
 '79.jpg',
 '80.jpg',
 '81.jpg',
 '82.jpg',
 '83.jpg',
 '84.jpg',
 '85.jpg',
 '86.jpg',
 '87.jpg',
 '88.jpg',
 '89.jpg',
 '90.jpg',
 '91.jpg',
 '92.jpg',
 '93.jpg',
 '94.jpg',
 '95.jpg',
 '96.jpg',
 '97.jpg',
 '98.jpg',
 '99.jpg']

### Load the test data into a Pandas DataFrame

Load some test images into a Pandas DataFrame in order to try some predictions with it.

In [7]:
test_image_paths = [
    os.path.join(dataset_parent_dir, dataset_name, "can", "1.jpg"),
    os.path.join(dataset_parent_dir, dataset_name, "carton", "33.jpg"),
    os.path.join(dataset_parent_dir, dataset_name, "milk_bottle", "99.jpg"),
    os.path.join(dataset_parent_dir, dataset_name, "water_bottle", "120.jpg"),
]

#### Prepare sample data for image embeddings

In [8]:
import pandas as pd
import base64


def read_image(image_path):
    with open(image_path, "rb") as f:
        return f.read()

images = [
    base64.encodebytes(read_image(image_path)).decode('utf-8')
    for image_path in test_image_paths
]
image_data = [[img, ""] for img in images]
test_df_image = pd.DataFrame(
    data=image_data,
    columns=["image", "text"],
)
test_df_image

|   | image | text |
|---|-------|------|
| 0 | /9j/4AAQSkZJRgABAQEASABIAAD/4XlnRXhpZgAASUkqAA... |  |
| 1 | /9j/4AAQSkZJRgABAQEASABIAAD/4XvwRXhpZgAASUkqAA... |  |
| 2 | /9j/4AAQSkZJRgABAQEASABIAAD/4aYSRXhpZgAASUkqAA... |  |
| 3 | /9j/4AAQSkZJRgABAQEASABIAAD/4Y+MRXhpZgAASUkqAA... |  |
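
Every encoded image above starts with `/9j/4AAQ` because that is the base64 form of the first bytes of a JPEG/JFIF file. The sketch below illustrates this, and also shows that the line breaks inserted by `base64.encodebytes` do not prevent decoding. It uses a synthetic byte string (a JPEG header followed by zero padding) rather than a real image; the variable names are illustrative.

```python
import base64

# First bytes of a JPEG/JFIF file: SOI marker followed by the start of an APP0 segment
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10"
payload = jpeg_header + bytes(100)  # zero padding stands in for image data

encoded = base64.encodebytes(payload).decode("utf-8")

# The header bytes always encode to the same eight characters
starts_like_jpeg = encoded.startswith("/9j/4AAQ")

# encodebytes inserts a newline after every 76 output characters
has_newlines = "\n" in encoded

# b64decode discards the newlines by default, so the round trip is lossless
round_trip_ok = base64.b64decode(encoded) == payload
```

If a consumer of the encoded string cannot tolerate embedded newlines, `base64.b64encode` produces the same encoding without them.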


#### Prepare sample data for text embeddings

In [9]:
import pandas as pd

# Placeholder texts, one per test image; the image column is left empty
text_list = ["text 1", "text 2", "text 3", "text 4"]
text_data = [["", text] for text in text_list]
test_df_text = pd.DataFrame(
    data=text_data,
    columns=["image", "text"],
)
test_df_text

|   | image | text |
|---|-------|------|
| 0 |  | text 1 |
| 1 |  | text 2 |
| 2 |  | text 3 |
| 3 |  | text 4 |


In [10]:
# Prepare sample data for image + text embeddings
combine_data = [[img, text] for img, text in zip(images, text_list)]
test_df_combine = pd.DataFrame(
    data=combine_data,
    columns=["image", "text"],
)
test_df_combine

|   | image | text |
|---|-------|------|
| 0 | /9j/4AAQSkZJRgABAQEASABIAAD/4XlnRXhpZgAASUkqAA... | text 1 |
| 1 | /9j/4AAQSkZJRgABAQEASABIAAD/4XvwRXhpZgAASUkqAA... | text 2 |
| 2 | /9j/4AAQSkZJRgABAQEASABIAAD/4aYSRXhpZgAASUkqAA... | text 3 |
| 3 | /9j/4AAQSkZJRgABAQEASABIAAD/4Y+MRXhpZgAASUkqAA... | text 4 |


## Load the best model in memory

Load the model using its MLflow flavor. The MLmodel file in the downloaded folder (artifact_downloads/outputs/mlflow-model) lists the flavors the model supports. For this example (and for AutoML for Images scenarios in general), the MLmodel file describes the python_function flavor, so the model is loaded here with the pyfunc flavor. For more information on MLflow flavors, visit: https://www.mlflow.org/docs/latest/models.html#storage-format

Loading the model locally assumes that you are running the notebook in an environment compatible with the model. The dependencies the model expects are specified in the MLflow model produced by AutoML (in the 'conda.yaml' file within the mlflow-model folder).
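
One way to spot an incompatible environment before loading is to compare the model's pinned packages with what is installed. The sketch below is a rough check using only the standard library; the helper name `find_unmet_pins` is hypothetical, and it only understands exact `name==version` pins (lines with extras, environment markers, or version ranges are skipped or reported as unmet).

```python
from importlib import metadata


def find_unmet_pins(requirement_lines):
    """Return (name, wanted, installed) for every 'name==version' pin that is not satisfied."""
    unmet = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only handles exact pins
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            unmet.append((name, wanted, installed))
    return unmet


# Demo with a package name that is assumed not to exist in any environment
demo_unmet = find_unmet_pins(
    ["surely-not-a-real-package-xyz==1.0", "# a comment", "unpinned-line"]
)
```

To run it against the model in this notebook, open the `requirements.txt` file in `mlflow_model_dir` and pass the file object to `find_unmet_pins`; an empty result means every exact pin matches the active environment.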

In [None]:
import pickle
import sys

# Make the wrapper module shipped in the model's "code" folder importable
sys.path.append(os.path.join(mlflow_model_dir, "code"))
from blip_embeddings_mlflow_wrapper import BLIPEmbeddingsMLFlowModelWrapper

# Re-create the wrapper and overwrite python_model.pkl with a freshly pickled copy
mlflow_model_wrapper = BLIPEmbeddingsMLFlowModelWrapper(task_type="embeddings")
with open(os.path.join(mlflow_model_dir, "python_model.pkl"), "wb") as f:
    pickle.dump(mlflow_model_wrapper, f)

In [None]:
import mlflow.pyfunc

# Load the MLflow model from the downloaded model files
pyfunc_model = mlflow.pyfunc.load_model(mlflow_model_dir)

In [None]:
# Get image embeddings
result = pyfunc_model.predict(test_df_image)
result

In [None]:
# Get text embeddings
result = pyfunc_model.predict(test_df_text)
result

In [None]:
# Get combined embeddings
result = pyfunc_model.predict(test_df_combine)
result

## Test invalid input handling

In [None]:
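test_df_invalid = test_df_combine.copy()

# Other invalid cases to try (uncomment one at a time):
# test_df_invalid.loc[0, "text"] = ""  # text present in some rows but not all
# test_df_invalid.loc[0, "image"] = ""  # image present in some rows but not all

# Every row has an empty image and an empty text.
# Assign through .loc on the DataFrame; chained assignment such as
# df["image"].iloc[0:4] = "" is not guaranteed to modify the DataFrame.
test_df_invalid.loc[:, ["image", "text"]] = ""

test_df_invalid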

In [None]:
# Run prediction on the invalid input to see how the model handles it
result = pyfunc_model.predict(test_df_invalid)
result
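
The invalid cases above (all inputs empty, or images or texts present in only some rows) can also be caught before calling the model. The following is a minimal sketch of such a pre-check. It assumes a batch must be uniformly image-only, text-only, or image plus text; that rule is inferred from the test cases in this notebook and the model wrapper's actual validation may differ. The function name `classify_batch` is hypothetical.

```python
def classify_batch(rows):
    """Return 'image', 'text', or 'combined' for a batch of (image, text) rows.

    Raises ValueError when the batch is empty, contains a row with neither
    input, or mixes different kinds of rows.
    """
    kinds = set()
    for image, text in rows:
        has_image, has_text = bool(image), bool(text)
        if has_image and has_text:
            kinds.add("combined")
        elif has_image:
            kinds.add("image")
        elif has_text:
            kinds.add("text")
        else:
            kinds.add("empty")
    if not kinds:
        raise ValueError("batch has no rows")
    if "empty" in kinds:
        raise ValueError("at least one row has neither image nor text")
    if len(kinds) > 1:
        raise ValueError(f"rows mix input kinds: {sorted(kinds)}")
    return kinds.pop()
```

With a DataFrame shaped like the ones in this notebook, the rows can be supplied as `test_df_invalid[["image", "text"]].itertuples(index=False, name=None)`.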