# Evaluating Performance of the Movie Recommendation Model

In this file, I evaluate the Movie Recommendation Model presented in the previous file, "Matrix Factorization Movie Recommendation System."

I do this by splitting the ratings into a training set (80%) and a test set (20%) and fitting the model on the training set only. The rest of the code is the same as the original model.

In [2]:
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Load the datasets
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Split the data into training and testing sets
train_data, test_data = train_test_split(ratings, test_size=0.2, random_state=42)

# Create the training user-item matrix
train_user_item_matrix = train_data.pivot(index='userId', columns='movieId', values='rating').fillna(0)

# Create the testing user-item matrix
test_user_item_matrix = test_data.pivot(index='userId', columns='movieId', values='rating').fillna(0)

# Reindex the test matrix to have the same columns as the train matrix
test_user_item_matrix = test_user_item_matrix.reindex(columns=train_user_item_matrix.columns, fill_value=0)
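
To make the pivot and reindex steps concrete, here is a minimal sketch on a made-up ratings table. The `toy_` frames and their values are illustrative assumptions, not part of the dataset. It shows that reindexing adds zero-filled columns for movies that appear only in the training split and drops movies that appear only in the test split.

```python
import pandas as pd

# Hypothetical toy ratings, standing in for the train/test split above
toy_train = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.0, 5.0],
})
toy_test = pd.DataFrame({
    "userId": [1, 2],
    "movieId": [30, 20],
    "rating": [2.0, 1.0],
})

# Same pivot + fillna pattern as in the cell above
toy_train_matrix = toy_train.pivot(index="userId", columns="movieId", values="rating").fillna(0)
toy_test_matrix = toy_test.pivot(index="userId", columns="movieId", values="rating").fillna(0)

# Align the test columns with the training columns
aligned = toy_test_matrix.reindex(columns=toy_train_matrix.columns, fill_value=0)
print(aligned)
```

In this toy case, movie 30 is rated only in the test split, so its rating disappears from `aligned`, while movie 10 gains an all-zero column.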


In [3]:
# Fit the SVD model on the training data
svd = TruncatedSVD(n_components=50, random_state=42)
svd.fit(train_user_item_matrix)

# Get the user and item factors for the training data
train_user_factors = svd.transform(train_user_item_matrix)
train_item_factors = svd.components_

# Reconstruct the matrix for the training data
reconstructed_train_matrix = np.dot(train_user_factors, train_item_factors)
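
As a rough illustration of the shapes involved, the sketch below runs the same fit, transform, and reconstruct steps on a small random matrix. The matrix, its size, and the `toy_` names are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical 6-user x 5-movie matrix of integer ratings 0..5
rng = np.random.default_rng(0)
toy_matrix = rng.integers(0, 6, size=(6, 5)).astype(float)

# Keep 2 latent factors, so the reconstruction is a low-rank approximation
toy_svd = TruncatedSVD(n_components=2, random_state=0)
toy_user_factors = toy_svd.fit_transform(toy_matrix)  # one row per user
toy_item_factors = toy_svd.components_                # one column per movie
toy_reconstructed = toy_user_factors @ toy_item_factors

print(toy_user_factors.shape, toy_item_factors.shape, toy_reconstructed.shape)
print("Share of variance kept:", toy_svd.explained_variance_ratio_.sum())
```

The reconstructed matrix has the same shape as the input, but each entry is now a sum over only the retained factors.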


In [4]:
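# Project the test rows onto the factors learned from the training data, then reconstruct.
# The test matrix is already zero-filled, so no further fillna is needed.
reconstructed_test_matrix = np.dot(svd.transform(test_user_item_matrix), svd.components_)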

# Function to calculate and print evaluation metrics
def evaluate_model(test_user_item_matrix, reconstructed_matrix):
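    # Flatten the matrices for evaluation. Every cell is compared, including
    # the zero-filled cells that stand for "no rating" in the test matrix.
    test_actual = test_user_item_matrix.values.flatten()
    test_predicted = reconstructed_matrix.flatten()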

    # Calculate evaluation metrics
    mse = mean_squared_error(test_actual, test_predicted)
    mae = mean_absolute_error(test_actual, test_predicted)

    print(f"Mean Squared Error: {mse}")
    print(f"Mean Absolute Error: {mae}")

# Evaluate the model using the test set
evaluate_model(test_user_item_matrix, reconstructed_test_matrix)


Mean Squared Error: 0.02537051458897461
Mean Absolute Error: 0.023424500077415658


## Performance Metrics

#### Mean Squared Error (MSE):
* Measures the average of the squared differences between the predicted ratings and the actual ratings, so large errors are penalized more heavily than small ones.
* In the context of a movie recommendation system:
  * Lower MSE indicates better performance: on average, predictions are closer to the actual ratings.
  * Higher MSE implies that predictions are further off from the actual ratings.

#### Mean Absolute Error (MAE):
* Measures the average of the absolute differences between the predicted ratings and the actual ratings, expressed in the same units as the ratings.
* In the context of a movie recommendation system:
  * Lower MAE indicates better performance: on average, predictions are closer to the actual ratings.
  * Higher MAE implies that predictions are less accurate.
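
Note that `evaluate_model` above compares every cell of the matrices, including the zero-filled cells that mean "no rating." When most cells are zero, a model that predicts values near zero scores very low errors, so those numbers say little about how well actual ratings are predicted. The sketch below is one possible alternative, not the method used above: it restricts both metrics to cells that hold a rating. The function name `evaluate_observed` and the toy arrays are hypothetical, and the sketch assumes that 0 never occurs as a real rating.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

def evaluate_observed(actual_matrix, predicted_matrix):
    """Return (MSE, MAE) computed only over cells with a nonzero actual rating."""
    actual = np.asarray(actual_matrix, dtype=float)
    predicted = np.asarray(predicted_matrix, dtype=float)
    observed = actual != 0  # assumption: 0 means "not rated"
    mse = mean_squared_error(actual[observed], predicted[observed])
    mae = mean_absolute_error(actual[observed], predicted[observed])
    return mse, mae

# Hypothetical toy data: two real ratings, four empty cells
toy_actual = np.array([[4.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
toy_predicted = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

all_cells_mse = mean_squared_error(toy_actual.flatten(), toy_predicted.flatten())
observed_mse, observed_mae = evaluate_observed(toy_actual, toy_predicted)

print(f"MSE over all cells:      {all_cells_mse:.4f}")
print(f"MSE over observed cells: {observed_mse:.4f}")
print(f"MAE over observed cells: {observed_mae:.4f}")
```

In the toy data each real rating is off by one full point, yet the all-cells MSE is only about 0.33 because the four empty cells dilute the error; the observed-only MSE is 1.0.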


In [6]:
def get_recommendations(user_id, user_item_matrix, reconstructed_matrix, movies, num_recommendations=3):
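    # Assumes user IDs are consecutive and start at 1, so ID - 1 is the row position
    user_index = user_id - 1
    user_ratings = reconstructed_matrix[user_index]
    # Column positions sorted by predicted score, highest first
    sorted_movie_indices = np.argsort(user_ratings)[::-1]
    # Keep only the positions the user has not rated (zero in the user-item matrix)
    recommended_movie_indices = [idx for idx in sorted_movie_indices if user_item_matrix.iloc[user_index, idx] == 0]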
    top_recommendations = recommended_movie_indices[:num_recommendations]

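    recommended_movie_titles = []
    # NOTE: these values are column positions in the matrix, used here as movieId values.
    # The lookup is only correct if column position and movieId coincide.
    for movie_id in top_recommendations:
        movie_title = movies[movies['movieId'] == movie_id].title.values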
        if len(movie_title) > 0:  # Ensure the movie title exists
            recommended_movie_titles.append(movie_title[0])
    return recommended_movie_titles

# Function to get top latent factors contributing to a recommendation
def get_top_factors(user_id, movie_id, user_factors, item_factors, num_factors=3):
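    # Assumes user and movie IDs are consecutive and start at 1,
    # so ID - 1 is the row/column position in the factor matrices
    user_index = user_id - 1
    movie_index = movie_id - 1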
    user_vector = user_factors[user_index]
    item_vector = item_factors[:, movie_index]
    factor_contributions = user_vector * item_vector
    top_factors = np.argsort(factor_contributions)[-num_factors:][::-1]
    return top_factors, factor_contributions[top_factors]
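
The functions above convert IDs to matrix positions with `ID - 1` and treat column positions as movie IDs, which only works when IDs are consecutive and gap-free. If that does not hold for the dataset, one option is to look positions up through the matrix's own index labels. The sketch below is a hypothetical variant (`recommend_by_label` and the toy data are assumptions for illustration, not the original method), and it assumes each `movieId` appears once in `movies`.

```python
import numpy as np
import pandas as pd

def recommend_by_label(user_id, user_item_matrix, reconstructed_matrix, movies, num_recommendations=3):
    """Recommend unrated movies, mapping IDs through the matrix labels."""
    row = user_item_matrix.index.get_loc(user_id)  # row position of this userId
    # Attach the movieId labels to the predicted scores for this user
    scores = pd.Series(reconstructed_matrix[row], index=user_item_matrix.columns)
    unrated = user_item_matrix.iloc[row] == 0
    top_movie_ids = scores[unrated].sort_values(ascending=False).index[:num_recommendations]
    titles = movies.set_index("movieId")["title"]
    return [titles.loc[m] for m in top_movie_ids if m in titles.index]

# Hypothetical toy data with non-consecutive user and movie IDs
toy_matrix = pd.DataFrame(
    [[5.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
    index=pd.Index([7, 9], name="userId"),
    columns=pd.Index([10, 50, 300], name="movieId"),
)
toy_scores = np.array([[4.8, 1.2, 3.1], [0.5, 2.9, 0.1]])
toy_movies = pd.DataFrame({"movieId": [10, 50, 300], "title": ["A", "B", "C"]})

result = recommend_by_label(7, toy_matrix, toy_scores, toy_movies, num_recommendations=2)
print(result)  # ['C', 'B']
```

User 7 has already rated movie 10, so only movies 300 and 50 are candidates, ordered by predicted score.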

In [8]:
# Example usage: Get recommendations for multiple users and understand recommendations
for i in range(1, 3):
    user_id = i
    recommendations = get_recommendations(user_id, train_user_item_matrix, reconstructed_train_matrix, movies, num_recommendations=3)
    print(f"\nRecommendations for user {user_id}:\n{recommendations}\n")

    for movie_title in recommendations:
        movie_id = movies[movies['title'] == movie_title].movieId.values[0]
        top_factors, contributions = get_top_factors(user_id, movie_id, train_user_factors, train_item_factors)
        print(f"Top factors for recommendation of '{movie_title}':")
        print(f"Factors: {top_factors}")
        print(f"Contributions: {contributions}\n")



Recommendations for user 1:
['Sunset Park (1996)', 'Inkwell, The (1994)', 'Ghost in the Shell (Kôkaku kidôtai) (1995)']

Top factors for recommendation of 'Sunset Park (1996)':
Factors: [29 31 33]
Contributions: [0.01374063 0.01127923 0.01110957]

Top factors for recommendation of 'Inkwell, The (1994)':
Factors: [18  6 31]
Contributions: [0.02010761 0.01885013 0.00912894]

Top factors for recommendation of 'Ghost in the Shell (Kôkaku kidôtai) (1995)':
Factors: [ 6  0 40]
Contributions: [0.13790542 0.06609049 0.05594322]


Recommendations for user 2:
['Scout, The (1994)', 'Frankie Starlight (1995)', 'Cabin Boy (1994)']

Top factors for recommendation of 'Scout, The (1994)':
Factors: [2 4 0]
Contributions: [0.78887295 0.49266802 0.41820295]

Top factors for recommendation of 'Frankie Starlight (1995)':
Factors: [ 4  8 33]
Contributions: [0.05340896 0.04266334 0.01562789]

Top factors for recommendation of 'Cabin Boy (1994)':
Factors: [ 1  3 40]
Contributions: [0.02028922 0.01977889 0.01

# Understanding the Recommendations
### Factors:

These numbers (e.g., [29, 31, 33] for 'Sunset Park (1996)') are the indices of the latent factors that contribute most to the user's predicted score for this movie. Latent factors are underlying dimensions that the matrix factorization algorithm extracts from the user-item interaction matrix. They carry no built-in labels. They may correlate with patterns such as genre preferences, actors, or directors, but any such reading has to be inferred, for example by inspecting which movies weigh heavily on each factor.

### Contributions:

These values (e.g., [0.01374063, 0.01127923, 0.01110957]) are the products of the user's weight and the movie's weight on each corresponding latent factor. Summed over all 50 factors, the contributions give the reconstructed score for that user and movie, so a higher value means the factor accounts for a larger share of the recommendation.
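
The relationship between contributions and the predicted score can be checked on a small example. The vectors below are made up for illustration (three factors instead of 50); the selection of top factors mirrors the `argsort` pattern used in `get_top_factors`.

```python
import numpy as np

# Hypothetical user and movie weights on three latent factors
toy_user_vector = np.array([2.0, -1.0, 0.5])
toy_item_vector = np.array([0.4, 0.3, 0.2])

# Per-factor contributions: element-wise product of the two vectors
contributions = toy_user_vector * toy_item_vector

# Their sum equals the dot product, i.e. the reconstructed score
predicted_score = contributions.sum()

# Indices of the two largest contributions, largest first
top_factors = np.argsort(contributions)[-2:][::-1]

print("Contributions:", contributions)
print("Predicted score:", predicted_score)
print("Top factors:", top_factors)
```

Here factor 1 has a negative contribution, so it pulls the predicted score down and never appears among the top factors, which only rank the largest positive contributions.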

# Explanation of User 1's Recommendations

### 1. Sunset Park (1996):

Factors: [29, 31, 33]

Contributions: [0.01374063, 0.01127923, 0.01110957]

Interpretation: The predicted score for 'Sunset Park (1996)' is driven mainly by factors 29, 31, and 33. These might reflect an inclination towards sports movies, specific actors, or certain movie styles, but the model does not label its factors, so this reading is speculative. The contributions are all small (about 0.01 each), so no single factor dominates.

### 2. Inkwell, The (1994):

Factors: [18, 6, 31]

Contributions: [0.02010761, 0.01885013, 0.00912894]

Interpretation: The recommendation is driven by factors 18, 6, and 31. These might reflect an interest in coming-of-age stories, particular time periods, or cultural settings depicted in the movie, although the factors themselves are unlabeled and this reading is speculative.

### 3. Ghost in the Shell (Kôkaku kidôtai) (1995):

Factors: [6, 0, 40]

Contributions: [0.13790542, 0.06609049, 0.05594322]

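Interpretation: Factors 6, 0, and 40 contribute noticeably more here than the top factors of the other two recommendations (factor 6 alone contributes about 0.14). This might reflect an interest in sci-fi, animation style, or the philosophical themes explored in the film, but as with the other movies, the factors are unlabeled and this reading is speculative.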

# Conclusion

By analyzing the top latent factors and their contributions, you can see which dimensions of the model drive each recommendation. Because the factors are unlabeled, linking them to concrete movie traits takes further inspection, but the breakdown still helps in understanding the reasons behind each recommendation and in tailoring the recommendation system to better match user tastes.