<h1 style=\"text-align: center; font-size: 50px;\">🎥 Advanced Recommender Systems with Tensorflow MLflow Integration</h1>

## Notebook Overview
- Start Execution
- User Constants
- Install and Import Libraries
- Configure Settings
- Verify Assets
- Loading Data
- Memory-Based Collaborative Filtering
- Logging Model to MLflow
- Fetching the Latest Model Version from MLflow
- Loading the Model and Running Inference

## Start Execution

In [1]:
import logging
import time

# Configure logger
logger: logging.Logger = logging.getLogger("register_model_logger")
logger.setLevel(logging.INFO)
logger.propagate = False  # Prevent duplicate logs from parent loggers

# Set formatter
formatter: logging.Formatter = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# Configure and attach stream handler
stream_handler: logging.StreamHandler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

In [2]:
start_time = time.time()  

logger.info("Notebook execution started.")

2025-07-16 12:58:07 - INFO - Notebook execution started.


## User Constants

In [3]:
MOVIE_ID = 5
RATING = 3.5

## Install and Import Libraries

In [4]:
# ------------------------ Data Manipulation ------------------------
import numpy as np
import pandas as pd

# # ------------------------ Statistical and Machine Learning tools ------------------------
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics import mean_squared_error
from math import sqrt
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# ------------------------ Deep learning framework ------------------------
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard

# ------------------------ System Utilities ------------------------
import os
import warnings
import datetime
from pathlib import Path

# ------------------------ Visualization Libraries ------------------------
import matplotlib.pyplot as plt

# ------------------------ MLflow Integration ------------------------
import mlflow
from mlflow import MlflowClient
from mlflow.types.schema import Schema, ColSpec
from mlflow.models import ModelSignature

2025-07-16 12:58:08.788409: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-07-16 12:58:08.801098: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752670688.813373     339 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752670688.817183     339 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-07-16 12:58:08.832428: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr

## Configure Settings

In [5]:
# Suppress Python warnings
warnings.filterwarnings("ignore")

In [6]:
# ------------------------- Paths -------------------------
DATA_PATH = "/home/jovyan/datafabric/tutorial/"
LOG_DIR = "/phoenix/tensorboard/tensorlogs/"
OUTPUT_DIR = "../model_artifacts"
ARTIFACT_PATH = "movie_recommender_model"
# Name of the MLflow experiment for tracking performance and metrics
EXPERIMENT_NAME = "MovieRecommenderExperiment"
RUN_NAME = "Movie_Recommender_Run"
MODEL_NAME = "Movie_Recommender_Model"

## Verify Assets

In [7]:
# Check whether the Dataset file exists
is_dataset_available = Path(DATA_PATH).exists()

# Log the configuration status of the dataset
if is_dataset_available:
    logger.info("The Dataset is properly configured.")
else:
    logger.info(
        "The Dataset is not properly configured. Please create and download the required assets "
        "in your project on AI Studio."
    )

2025-07-16 12:58:10 - INFO - The Dataset is properly configured.


## Loading Data

In [8]:
asset_folder = DATA_PATH

In [9]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(f"{asset_folder}ml-100k/u.data", sep='\t', names=column_names)

In [10]:
movie_titles = pd.read_csv(f"{asset_folder}Movie_Id_Titles.csv")

In [11]:
df = pd.merge(df,movie_titles,on='item_id')

In [12]:
display(df.sample(2))
display(df.shape)

Unnamed: 0,user_id,item_id,rating,timestamp,title
62513,500,77,3,883875793,"Firm, The (1993)"
52039,303,1509,1,879544435,Getting Even with Dad (1994)


(100000, 5)

In [13]:
train_data, test_data = train_test_split(df, test_size=0.25)

## Memory-Based Collaborative Filtering

In [14]:
n_users = df.user_id.nunique()
n_items = df.item_id.nunique()
#Create two user-item matrices, one for training and another for testing
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]  

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]

In [15]:
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
item_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')

In [16]:
def predict(ratings, similarity, type='user'):
    """
    Predicts ratings using collaborative filtering based on user or item similarity.

    Parameters:
        ratings (array): A matrix where each row represents a user and each column represents an item.
        similarity (array): A similarity matrix representing relationships between users or items.
        type (str): Defines the type of prediction. Defaults to 'user'.

    Returns:
        array: A matrix of predicted ratings.
    """
    try:
        if type == 'user':
            mean_user_rating = ratings.mean(axis=1)
            #You use np.newaxis so that mean_user_rating has same format as ratings
            ratings_diff = (ratings - mean_user_rating[:, np.newaxis]) 
            pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
        elif type == 'item':
            pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])     
        return pred
    except Exception as e:
        logger.error(f"Error predicting ratings: {str(e)}")
        raise

In [17]:
item_prediction = predict(train_data_matrix, item_similarity, type='item')
user_prediction = predict(train_data_matrix, user_similarity, type='user')

### SVD

In [18]:
def rmse(prediction, ground_truth):
    """
    Computes the Root Mean Square Error (RMSE) between predicted values and ground truth values.

    Parameters:
        prediction (array-like): Predicted values.
        ground_truth (array-like): Actual values.

    Returns:
        float: The RMSE value.
    """
    try:
        prediction = prediction[ground_truth.nonzero()].flatten() 
        ground_truth = ground_truth[ground_truth.nonzero()].flatten()
        return sqrt(mean_squared_error(prediction, ground_truth))
    except Exception as e:
            logger.error(f"Error computing rmse: {str(e)}")
            raise

In [19]:
#get SVD components from train matrix. Choose k.
u, s, vt = svds(train_data_matrix, k = 20)
s_diag_matrix=np.diag(s)
X_pred = np.dot(np.dot(u, s_diag_matrix), vt)
logger.info('User-based CF MSE: ' + str(rmse(X_pred, test_data_matrix)))

2025-07-16 12:58:14 - INFO - User-based CF MSE: 2.720641027318057


## Logging Model to MLflow

In [20]:
def normalize_ratings(ratings, min_rating=1, max_rating=5):
    """
    Normalizes rating values

    Parameters:
        ratings (array): A matrix where each row represents a user and each column represents an item.
        min_rating (int): Minimal movie rating. Defaults to 1.
        max_rating (int, optional): Maximal movie rating . Defaults to 5.

    Returns:
        array: Normalized values.
    """
    ratings = np.array(ratings)
    if ratings.max() == ratings.min():
        return np.full(ratings.shape, (max_rating + min_rating) / 2)
    normalized_ratings = (ratings - ratings.min()) / (ratings.max() - ratings.min()) * (max_rating - min_rating) + min_rating
    return np.clip(normalized_ratings, min_rating, max_rating)  

class MovieRecommender(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        """
        Loads training data from the MLflow context.
        """
        try:
            self.train_data_matrix = np.load(context.artifacts["train_data_matrix"])
            self.movie_titles = pd.read_csv(context.artifacts["movie_titles_path"])
            self.n_users, self.n_items = self.train_data_matrix.shape
        except Exception as e:
            logger.error(f"Error loading context: {str(e)}")
            raise

    def predict(self, context, model_input):
        """
        Performs prediction using specified prediction method.

        Parameters:
            context: MLflow context.
            model_input: Input data

        Returns:
            array: Model prediction output.
        """
        try:
            movie_id = int(model_input['movie_id'][0])
            rating = float(model_input['rating'][0])

            user_ratings = np.zeros(self.n_items)
            user_ratings[movie_id - 1] = rating

            ratings = np.vstack([self.train_data_matrix, user_ratings])

            item_similarity = pairwise_distances(ratings.T, metric='cosine')

            mean_item_rating = ratings[:-1].mean(axis=0)  
            ratings_diff = (ratings[:-1] - mean_item_rating)
            pred = mean_item_rating + item_similarity.dot(ratings_diff.T).T[-1]

            user_pred_normalized = normalize_ratings(pred)

            movie_titles = self.movie_titles['title'].tolist()

            predictions_with_titles = list(zip(movie_titles, user_pred_normalized))

            ordenaded_list = sorted(predictions_with_titles, reverse = True)

            best_five_movies = ordenaded_list[:5]

            return best_five_movies
        
        except Exception as e:
            logger.error(f"Error performing prediction: {str(e)}")
            raise

    @classmethod
    def log_model(cls, train_data_matrix_path, movie_titles_path):
        """
        Log the Recommender model to MLflow with model artifacts and signatures.

        Parameters:
            train_data_matrix_path (array): Path to training data.
            movie_titles_path (array): Path to movie titles.
        """
        try:
            input_schema = Schema([
                ColSpec("long", "movie_id"),
                ColSpec("double", "rating")
            ])
            output_schema = Schema([
                ColSpec("string", "movie_title"),
                ColSpec("double", "prediction")
            ])
            signature = ModelSignature(inputs=input_schema, outputs=output_schema)

            mlflow.pyfunc.log_model(
                artifact_path=ARTIFACT_PATH ,
                python_model=cls(),
                artifacts={
                    "train_data_matrix": train_data_matrix_path,
                    "movie_titles_path": movie_titles_path
                },
                signature=signature,
                pip_requirements=["mlflow", "pandas", "scikit-learn", "numpy"]
            )
        except Exception as e:
            logger.error(f"Error logging model: {str(e)}")
            raise

output_dir = OUTPUT_DIR
os.makedirs(output_dir, exist_ok=True)
train_data_matrix_path = os.path.join(output_dir, "train_data_matrix.npy")
np.save(train_data_matrix_path, train_data_matrix)
movie_titles_path = os.path.join(output_dir, "movie_titles.csv")
movie_titles.to_csv(movie_titles_path, index=False)

In [21]:
logger.info(f'Starting the experiment: {EXPERIMENT_NAME}')

# Set the MLflow experiment name
mlflow.set_tracking_uri("/phoenix/mlflow")
mlflow.set_experiment(experiment_name=EXPERIMENT_NAME)

# Start an MLflow run
with mlflow.start_run(run_name=RUN_NAME) as run:
    user_rmse = rmse(user_prediction, test_data_matrix)
    item_rmse = rmse(item_prediction, test_data_matrix)
    svd_rmse = rmse(X_pred, test_data_matrix)
    
    mlflow.log_metric("User_based_CF_RMSE", user_rmse)
    mlflow.log_metric("Item_based_CF_RMSE", item_rmse)
    mlflow.log_metric("User_based_CF_MSE_SVD", svd_rmse)
    # Print the artifact URI for reference
    logging.info(f"Run's Artifact URI: {run.info.artifact_uri}")

    # Log the model to MLflow
    MovieRecommender.log_model(train_data_matrix_path, movie_titles_path)

    # Register the logged model in MLflow Model Registry
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/{MODEL_NAME}", 
        name=MODEL_NAME
    )

logger.info(f'Registered the model: {MODEL_NAME}')

2025-07-16 12:58:14 - INFO - Starting the experiment: MovieRecommenderExperiment


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Registered model 'Movie_Recommender_Model' already exists. Creating a new version of this model...
Created version '8' of model 'Movie_Recommender_Model'.
2025-07-16 12:58:17 - INFO - Registered the model: Movie_Recommender_Model


## Fetching the Latest Model Version from MLflow

In [22]:
# Initialize the MLflow client
client = MlflowClient()

# Retrieve the latest version of the model
model_metadata = client.get_latest_versions(MODEL_NAME, stages=["None"])
latest_model_version = model_metadata[0].version  # Extract the latest model version

# Fetch model information, including its signature
model_info = mlflow.models.get_model_info(f"models:/{MODEL_NAME}/{latest_model_version}")

# Print the latest model version and its signature
logger.info(f"Latest Model Version: {latest_model_version}")
logger.info(f"Model Signature: {model_info.signature}")

2025-07-16 12:58:17 - INFO - Latest Model Version: 8
2025-07-16 12:58:17 - INFO - Model Signature: inputs: 
  ['movie_id': long (required), 'rating': double (required)]
outputs: 
  ['movie_title': string (required), 'prediction': double (required)]
params: 
  None



## Loading the Model and Running Inference

In [23]:
model = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/{latest_model_version}")
df_input = pd.DataFrame({
    'movie_id': [MOVIE_ID],
    'rating': [RATING],
})
prediction = model.predict(df_input)
logger.info(prediction)

2025-07-16 12:58:17 - INFO - [('Á köldum klaka (Cold Fever) (1994)', 4.646959466729642), ('unknown', 4.246263010290732), ('Zeus and Roxanne (1997)', 4.636585145483254), ("Young Poisoner's Handbook, The (1995)", 3.9477992567162175), ('Young Guns II (1990)', 2.1466161165198754)]


In [24]:
end_time: float = time.time()
elapsed_time: float = end_time - start_time
elapsed_minutes: int = int(elapsed_time // 60)
elapsed_seconds: float = elapsed_time % 60

logger.info(f"⏱️ Total execution time: {elapsed_minutes}m {elapsed_seconds:.2f}s")
logger.info("✅ Notebook execution completed successfully.")

2025-07-16 12:58:17 - INFO - ⏱️ Total execution time: 0m 9.59s
2025-07-16 12:58:17 - INFO - ✅ Notebook execution completed successfully.


Built with ❤️ using [**Z by HP AI Studio**](https://zdocs.datascience.hp.com/docs/aistudio/overview).