In [1]:
import os
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from scipy.sparse.linalg import svds
from sklearn.metrics import mean_squared_error, precision_score, recall_score, f1_score
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense, Dropout
from tensorflow.keras.regularizers import l2

### Data Loading
We start by loading the preprocessed user-item matrix saved from the data cleaning step. This data will be split into training and testing sets for model development.

In [2]:
# Load the preprocessed user-item matrix from CSV into a DataFrame
user_item = os.path.join("..", "Data", "Cleaned-Data", 'user_item_filtered.csv')
user_item_df = pd.read_csv(user_item)

# Convert the DataFrame to a sparse matrix
user_item_filtered = csr_matrix(user_item_df.values)

### Train/Test Split
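We split the user-item interaction matrix into training and testing subsets so that models can be trained on one portion of the data and evaluated on the rest. Note that `train_test_split` partitions the rows of the matrix, so each user falls entirely into either the training set or the test set.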

In [3]:
# Split the user-item interaction matrix into train and test subsets
train_data, test_data = train_test_split(user_item_filtered, test_size=0.2, random_state=42)
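
Because `train_test_split` partitions rows, every test user is unseen during training. A common alternative for collaborative filtering is to hold out individual interactions while keeping every user in both matrices. The following is a minimal sketch of that idea, not the method used in this notebook; the function name `holdout_interactions` and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def holdout_interactions(matrix, test_fraction=0.2, seed=42):
    """Move a random fraction of the nonzero entries into a test matrix of the same shape."""
    coo = matrix.tocoo()
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * coo.nnz))
    is_test = np.zeros(coo.nnz, dtype=bool)
    is_test[rng.choice(coo.nnz, size=n_test, replace=False)] = True

    def build(mask):
        # Rebuild a sparse matrix from the selected (row, col, value) triples.
        return csr_matrix((coo.data[mask], (coo.row[mask], coo.col[mask])), shape=matrix.shape)

    return build(~is_test), build(is_test)

# Toy binary matrix standing in for the real user-item matrix (hypothetical data).
toy_rng = np.random.default_rng(0)
toy = csr_matrix((toy_rng.random((50, 40)) < 0.1).astype(float))
toy_train, toy_test = holdout_interactions(toy)
print(toy.nnz, toy_train.nnz, toy_test.nnz)
```

With this kind of split, row i refers to the same user in both matrices, so predictions built from the training matrix can be compared directly against the held-out entries.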

### Model 1: Singular Value Decomposition (SVD)
The first model we develop is based on SVD. SVD is a popular technique for matrix factorization in recommendation systems, especially useful for capturing latent features in sparse matrices.
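
To illustrate what a rank-k factorization recovers, the sketch below applies `svds` to a small synthetic matrix of known rank 2 and rebuilds it from the factors. The data and variable names are illustrative only and do not come from the notebook's dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Build a 30 x 20 matrix that has exactly rank 2 (two latent factors).
left = rng.random((30, 2))
right = rng.random((2, 20))
dense = left @ right

# Keep the top k=2 singular triplets; svds does not guarantee their order.
u, s, vt = svds(csr_matrix(dense), k=2)

# Scaling the columns of u by s is equivalent to u @ np.diag(s).
approx = (u * s) @ vt
max_error = np.abs(approx - dense).max()
print("max reconstruction error:", max_error)
```

When the true rank does not exceed k, the reconstruction matches the input up to numerical precision. On real interaction data the rank is far higher, so the reconstruction is only an approximation.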

In [4]:
# Perform SVD on the training data
u, s, vt = svds(train_data, k=100)

# Convert singular values to a diagonal matrix
s_diag = np.diag(s)

# Predict ratings for the test data
predicted_ratings = np.dot(np.dot(u, s_diag), vt)

# Get the non-zero indices of the test data
test_nonzero_indices = test_data.nonzero()

# Get the actual ratings from the test data
actual_ratings = np.array(test_data[test_nonzero_indices].data).ravel()

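# Get the predicted ratings at the same (row, column) positions. Note that row i of
# predicted_ratings is a training user, while row i of test_data is a different user.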
predicted_ratings_nonzero = predicted_ratings[test_nonzero_indices[0], test_nonzero_indices[1]]

# Ensure predicted_ratings_nonzero is a 1D array
predicted_ratings_nonzero = predicted_ratings_nonzero.ravel()

# Debug: print lengths to check consistency
print("Length of actual ratings:", actual_ratings.shape)
print("Length of predicted ratings:", predicted_ratings_nonzero.shape)

# Evaluate the model using RMSE
mse = mean_squared_error(actual_ratings, predicted_ratings_nonzero)
rmse = np.sqrt(mse)
print("RMSE:", rmse)

Length of actual ratings: (1824,)
Length of predicted ratings: (1824,)
RMSE: 0.9996157395847921


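An RMSE this close to 1 is poor. If the held-out nonzero entries are all 1, as the binary rounding used below assumes, it means the predictions are close to 0 at almost every one of them. Classification metrics give a second view of the same result.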

In [5]:
# Compute precision, recall, and F1-score for SVD
precision_svd = precision_score(actual_ratings, predicted_ratings_nonzero.round())
recall_svd = recall_score(actual_ratings, predicted_ratings_nonzero.round())
f1_svd = f1_score(actual_ratings, predicted_ratings_nonzero.round())

print("SVD Evaluation:")
print("Precision:", precision_svd)
print("Recall:", recall_svd)
print("F1-score:", f1_svd)

SVD Evaluation:
Precision: 0.0
Recall: 0.0
F1-score: 0.0


  _warn_prf(average, modifier, msg_start, len(result))


In [6]:
# Create a Truncated SVD instance
svd = TruncatedSVD(n_components=100, random_state=42)

In [7]:
# Perform SVD on the training data
lsa = make_pipeline(svd)
train_data_reduced = lsa.fit_transform(train_data)

# Transform the test data using the same model
test_data_reduced = lsa.transform(test_data)

# Reconstruct the training rows from their reduced representation
# (test_data_reduced is computed above but not used in this reconstruction)
predicted_ratings_l2 = np.dot(train_data_reduced, svd.components_)

In [8]:
# Get the non-zero indices of the test data
test_nonzero_indices = test_data.nonzero()

# Extract actual ratings for the non-zero indices
actual_ratings = np.array(test_data[test_nonzero_indices]).ravel()

# Extract predicted ratings for the non-zero indices
predicted_ratings_nonzero_l2 = predicted_ratings_l2[test_nonzero_indices].ravel()

# Evaluate the model using RMSE
mse = mean_squared_error(actual_ratings, predicted_ratings_nonzero_l2)
rmse = np.sqrt(mse)
print("RMSE:", rmse)

RMSE: 0.999647555897552


Truncated SVD gives essentially the same RMSE as `svds`. This is expected, because both compute a rank-100 truncated SVD of the same training matrix. The same classification metrics follow.

In [9]:
# Compute precision, recall, and F1-score for Truncated SVD
precision_tsvd = precision_score(actual_ratings, predicted_ratings_nonzero_l2.round())
recall_tsvd = recall_score(actual_ratings, predicted_ratings_nonzero_l2.round())
f1_tsvd = f1_score(actual_ratings, predicted_ratings_nonzero_l2.round())


print("\nTruncated SVD Evaluation:")
print("Precision:", precision_tsvd)
print("Recall:", recall_tsvd)
print("F1-score:", f1_tsvd)


Truncated SVD Evaluation:
Precision: 0.0
Recall: 0.0
F1-score: 0.0


  _warn_prf(average, modifier, msg_start, len(result))


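Both SVD and Truncated SVD trigger scikit-learn's undefined-metric warning (the `_warn_prf` line above) rather than an actual error. Every prediction rounds to 0, so there are no predicted positives, precision is undefined, and scikit-learn reports it as 0.0. Recall is 0.0 because none of the held-out interactions is predicted as positive. The sparsity of the user-item matrix likely contributes, since a low-rank reconstruction of a mostly-zero matrix yields values near zero.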
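
The behaviour of these metrics when no positives are predicted can be reproduced on a tiny example. This is a sketch with made-up labels, assuming binary targets as in the cells above; it shows how the `zero_division` argument changes the reported precision while recall stays at 0.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Held-out interactions are all positive; predictions all round to 0.
y_true = np.ones(5, dtype=int)
y_pred = np.zeros(5, dtype=int)

# Precision is 0/0 here (no predicted positives), so the result is whatever
# zero_division specifies. It is a placeholder, not a measured value.
precision_as_zero = precision_score(y_true, y_pred, zero_division=0)
precision_as_one = precision_score(y_true, y_pred, zero_division=1)

# Recall is well defined: 0 of the 5 true positives were found.
recall = recall_score(y_true, y_pred, zero_division=0)

print(precision_as_zero, precision_as_one, recall)
```

A precision of 1.0 obtained with `zero_division=1` and a recall of 0.0 therefore describe the same situation as a precision of 0.0 with the warning: the model predicted no positives at all.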

### Model 2: Alternating Least Squares (ALS)
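The next model we explore is ALS, which factorizes the interaction matrix by alternately solving for user factors and item factors. Spark's implementation scales to large datasets and can handle implicit feedback by assigning confidence levels to interactions when `implicitPrefs=True`. The cells below use the default explicit-feedback setting.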
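
For reference, the implicit-feedback objective usually associated with ALS (Hu, Koren, and Volinsky) can be written as follows. The symbols are not taken from this notebook: $r_{ui}$ is the raw interaction value, $p_{ui}$ the binary preference, $c_{ui}$ the confidence, $x_u$ and $y_i$ the user and item factor vectors, and $\alpha$ and $\lambda$ correspond to Spark's `alpha` and `regParam` under the assumption that `implicitPrefs=True`.

```latex
\min_{x_*,\,y_*} \;
\sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
\;+\; \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

Holding the item factors fixed makes the problem a weighted least-squares fit in each $x_u$, and vice versa, which is what the alternating updates exploit.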

In [10]:
# Create a Spark session
spark = SparkSession.builder.appName("CollaborativeFiltering").config("spark.executor.memory", "8g").config("spark.driver.memory", "8g").getOrCreate()

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/05/25 14:54:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [11]:
# Suppress log warnings
spark.sparkContext.setLogLevel("ERROR")

In [12]:
# Convert the user-item interaction matrix to a Spark DataFrame
user_item_df_spark = spark.createDataFrame(
    [(i, j, float(user_item_df.iloc[i, j])) for i in range(user_item_df.shape[0]) for j in range(user_item_df.shape[1])],
    ["user", "item", "rating"]
)

# Sample 10% of the (user, item) cells to fit within local hardware limits
user_item_df_spark_sample = user_item_df_spark.sample(fraction=0.1, seed=42)
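
The cell above enumerates every (user, item) cell, including zeros, which is what makes the DataFrame large. One possible alternative, sketched here under the assumption that only observed interactions are needed, is to extract the nonzero entries in long format first. The helper name `to_long_format` and the toy array are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def to_long_format(matrix):
    """Return one (user, item, rating) row per nonzero entry of the matrix."""
    coo = csr_matrix(matrix).tocoo()
    return pd.DataFrame({
        "user": coo.row.astype(int),
        "item": coo.col.astype(int),
        "rating": coo.data.astype(float),
    })

# Toy matrix: 2 users, 3 items, 3 observed interactions.
toy = np.array([[0, 1, 0],
                [1, 0, 1]])
long_df = to_long_format(toy)
print(long_df)
```

A pandas DataFrame like `long_df` can be passed to `spark.createDataFrame`. If the matrix is binary, a table of positives only gives an explicit-feedback model a constant target, so this layout is better paired with the implicit-feedback setting or with sampled negatives.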

In [13]:
# Split the 10% sample (not the full DataFrame) into training and testing sets
(training_data_spark, test_data_spark) = user_item_df_spark_sample.randomSplit([0.8, 0.2])

In [14]:
# Create an ALS model
als = ALS(maxIter=10, regParam=0.01, userCol="user", itemCol="item", ratingCol="rating",
          coldStartStrategy="drop")

In [15]:
# Train the model
model = als.fit(training_data_spark)

                                                                                

In [16]:
# Generate predictions on the test data
predictions = model.transform(test_data_spark)

In [17]:
# Evaluate the model
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))



Root-mean-square error = 0.019916126083892014


                                                                                

In [18]:
# Generate top-k recommendations for each user
user_recs = model.recommendForAllUsers(10)
user_recs.show(truncate=False)



+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|user|recommendations                                                                                                                                                                                                                   |
+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|12  |[{3038, 7.1372866E-20}, {2833, 2.4407156E-20}, {3590, 5.903527E-21}, {4213, 2.2550454E-21}, {943, 1.3167236E-21}, {2896, 9.147783E-22}, {1781, 4.6416783E-22}, {374, 4.5766635E-22}, {2672, 3.3773645E-22}, {3532, 2.3147842E-22}]|
|22  |[{1176, 1.8060043E-20}, {2833, 8.592545E-21}, {4466, 3.458



In [19]:
# Build the parameter grid
param_grid = (ParamGridBuilder().addGrid(als.rank, [10, 50, 100]).addGrid(als.regParam, [0.01, 0.1, 1.0]).addGrid(als.maxIter, [10, 20]).build())

In [20]:
# Create a cross-validator
cross_validator = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)

In [21]:
# Run cross-validation to find the best model
cv_model = cross_validator.fit(training_data_spark)

                                                                                

In [22]:
# Get the best model from cross-validation
best_model = cv_model.bestModel

In [23]:
# Generate predictions on the test data
best_predictions = best_model.transform(test_data_spark)

In [24]:
# Evaluate the best model
best_rmse = evaluator.evaluate(best_predictions)
print("Best RMSE: ", best_rmse)

                                                                                

Best RMSE:  0.019916126083892014



In [25]:
# Generate top-k recommendations for each user using the best model
best_user_recs = best_model.recommendForAllUsers(10)
best_user_recs.show(truncate=False)



+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|user|recommendations                                                                                                                                                                                                                   |
+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|12  |[{3038, 7.1372866E-20}, {2833, 2.4407156E-20}, {3590, 5.903527E-21}, {4213, 2.2550454E-21}, {943, 1.3167236E-21}, {2896, 9.147783E-22}, {1781, 4.6416783E-22}, {374, 4.5766635E-22}, {2672, 3.3773645E-22}, {3532, 2.3147842E-22}]|
|22  |[{1176, 1.8060043E-20}, {2833, 8.592545E-21}, {4466, 3.458

                                                                                

In [26]:
# Convert predictions to Pandas DataFrame for easier evaluation
predictions_df = best_predictions.toPandas()

                                                                                

In [27]:
# Round predictions to get binary values
predictions_df['prediction'] = predictions_df['prediction'].round().astype(int)

In [28]:
# Compute precision, recall, and F1-score
precision = precision_score(predictions_df['rating'], predictions_df['prediction'], zero_division=1)
recall = recall_score(predictions_df['rating'], predictions_df['prediction'], zero_division=1)
f1 = f1_score(predictions_df['rating'], predictions_df['prediction'], zero_division=1)

print("Precision: ", precision)
print("Recall: ", recall)
print("F1-Score: ", f1)

Precision:  1.0
Recall:  0.0
F1-Score:  0.0


In [29]:
# Stop the Spark session
spark.stop()

### Model 3: Deep Learning
The final model we implement is a deep learning model using TensorFlow. Deep learning models can capture complex, non-linear relationships in the data, providing a flexible approach to learning user preferences.

In [30]:
# Prepare (user, item, rating) triples from the nonzero entries for the deep learning model
train_data_nonzero = train_data.nonzero()
test_data_nonzero = test_data.nonzero()

train_user = train_data_nonzero[0]
train_item = train_data_nonzero[1]
train_rating = train_data[train_user, train_item].A1

test_user = test_data_nonzero[0]
test_item = test_data_nonzero[1]
test_rating = test_data[test_user, test_item].A1
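
The triples above contain only nonzero entries, so the model never sees a zero target. One common remedy is to sample an equal number of unobserved (user, item) pairs as negatives. The sketch below uses simple rejection sampling on a toy matrix; the function name and data are hypothetical, and it assumes the matrix is sparse enough for the loop to finish quickly.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sample_negatives(matrix, n_samples, seed=42):
    """Draw (user, item) pairs whose entry in `matrix` is zero, by rejection sampling."""
    rng = np.random.default_rng(seed)
    n_users, n_items = matrix.shape
    rows, cols = matrix.nonzero()
    observed = set(zip(rows.tolist(), cols.tolist()))
    users, items = [], []
    while len(users) < n_samples:
        u = int(rng.integers(n_users))
        i = int(rng.integers(n_items))
        if (u, i) not in observed:  # keep only unobserved pairs
            users.append(u)
            items.append(i)
    return np.array(users), np.array(items)

# Toy binary matrix standing in for train_data (hypothetical data).
toy_rng = np.random.default_rng(0)
toy = csr_matrix((toy_rng.random((20, 15)) < 0.1).astype(float))

pos_u, pos_i = toy.nonzero()
neg_u, neg_i = sample_negatives(toy, n_samples=len(pos_u))

# Combine positives (label 1) and sampled negatives (label 0).
users = np.concatenate([pos_u, neg_u])
items = np.concatenate([pos_i, neg_i])
labels = np.concatenate([np.ones(len(pos_u)), np.zeros(len(neg_u))])
print(len(users), labels.mean())
```

The same sampling would need to be applied to the test set for precision and recall to be informative. The one-to-one ratio of negatives to positives is an arbitrary choice here.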

In [31]:
# Define the deep learning model
num_users = user_item_df.shape[0]
num_items = user_item_df.shape[1]

user_input = Input(shape=(1,))
item_input = Input(shape=(1,))

user_embedding = Embedding(num_users, 50)(user_input)
item_embedding = Embedding(num_items, 50)(item_input)

user_vector = Flatten()(user_embedding)
item_vector = Flatten()(item_embedding)

concat = Concatenate()([user_vector, item_vector])

dense = Dense(128, activation='relu', kernel_regularizer=l2(0.01))(concat)
dropout = Dropout(0.5)(dense)
output = Dense(1)(dropout)

model = Model(inputs=[user_input, item_input], outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')

In [32]:
# Implement early stopping
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

In [33]:
# Train the model
model.fit([train_user, train_item], train_rating, epochs=50, batch_size=64, validation_split=0.1, callbacks=[early_stopping])

Epoch 1/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 1.3745 - val_loss: 0.1430
Epoch 2/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.1023 - val_loss: 0.0102
Epoch 3/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0200 - val_loss: 8.5628e-04
Epoch 4/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0141 - val_loss: 3.0396e-04
Epoch 5/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0140 - val_loss: 2.8840e-04
Epoch 6/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0137 - val_loss: 2.0031e-04
Epoch 7/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0135 - val_loss: 2.4602e-04
Epoch 8/50
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0126 - val_loss: 5.4395e-04
Epoch 9/50
...

<keras.src.callbacks.history.History at 0x2a0cb5610>

In [34]:
# Predict ratings for the test data
predicted_ratings = model.predict([test_user, test_item])

57/57 ━━━━━━━━━━━━━━━━━━━━ 0s 334us/step


In [35]:
# Evaluate the model using RMSE
rmse = np.sqrt(mean_squared_error(test_rating, predicted_ratings))
print("Deep Learning Model RMSE:", rmse)

Deep Learning Model RMSE: 0.018339885321576637


In [36]:
# Compute precision, recall, and F1-score
# model.predict returns shape (n, 1); flatten it to match the 1D test_rating array
predicted_ratings_rounded = np.round(predicted_ratings).astype(int).ravel()

precision_dl = precision_score(test_rating, predicted_ratings_rounded, zero_division=1)
recall_dl = recall_score(test_rating, predicted_ratings_rounded, zero_division=1)
f1_dl = f1_score(test_rating, predicted_ratings_rounded, zero_division=1)

print("Deep Learning Model Evaluation:")
print("Precision:", precision_dl)
print("Recall:", recall_dl)
print("F1-score:", f1_dl)

Deep Learning Model Evaluation:
Precision: 1.0
Recall: 1.0
F1-score: 1.0


### Conclusion
We summarize the performance of each model, highlight their strengths and weaknesses, and outline the next steps for improving our recommendation system.

Summary of Model Evaluations

In this project, we developed and evaluated four different models to predict user preferences for books, music, and movies. Each model was assessed using RMSE, precision, recall, and F1-score, and the results are summarized below.

1. Singular Value Decomposition (SVD)

	•	Length of Actual Ratings: 1824
    
	•	Length of Predicted Ratings: 1824
    
	•	Root Mean Squared Error (RMSE): 0.9996
    
	•	Precision: 0.0
    
	•	Recall: 0.0
    
	•	F1-Score: 0.0
    

Analysis:
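The SVD model's RMSE of approximately 0.9996 means its predictions at the held-out nonzero entries are almost all close to 0, and the precision, recall, and F1-score of 0.0 confirm that no prediction rounded to 1. The evaluation setup likely contributes more than any modeling assumption: `train_test_split` separates users by row, so row i of the reconstructed training matrix belongs to a different user than row i of the test matrix, and the predictions are compared against another user's interactions. Given the high sparsity of the matrix, near-zero predictions at those positions are the expected outcome.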

2. Truncated SVD

	•	Root Mean Squared Error (RMSE): 0.9996
    
	•	Precision: 0.0
    
	•	Recall: 0.0
    
	•	F1-Score: 0.0
    

Analysis:
The Truncated SVD model produced nearly the same RMSE (approximately 0.9996) and the same precision, recall, and F1-score of 0.0. This is expected, since it computes a rank-100 factorization of the same training rows and is evaluated against the same test rows, which belong to different users. Its results therefore reflect the evaluation setup at least as much as the model itself.

3. Alternating Least Squares (ALS)

	•	Root Mean Squared Error (RMSE): 0.0199
    
	•	Precision: 1.0
    
	•	Recall: 0.0
    
	•	F1-Score: 0.0
    

Analysis:
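The ALS model achieved a much lower RMSE of 0.0199, but this figure is not directly comparable with the SVD results: the Spark test set contains every sampled cell, including the zeros that make up almost all of the matrix, whereas the SVD models were scored only on nonzero entries. The top-10 recommendation scores are on the order of 1e-20, so the model is effectively predicting 0 everywhere. No prediction rounds to 1, which gives a recall and F1-score of 0.0. The precision of 1.0 comes from `zero_division=1`, the value returned when precision is undefined, and does not indicate correct recommendations. Cross-validation over the 18 parameter combinations did not improve on the initial RMSE, so tuning alone is unlikely to help unless the way the training data is built also changes.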
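
As a rough check on the ALS figure, assuming the matrix is binary: a predictor that outputs 0 for every cell has an RMSE equal to the square root of the matrix density. The sketch below verifies that identity on toy data and computes the density the reported RMSE would imply under that assumption. The function name and toy matrix are illustrative.

```python
import numpy as np

def zero_baseline_rmse(binary_matrix):
    """RMSE of predicting 0 for every cell of a 0/1 matrix."""
    return float(np.sqrt(np.mean(np.square(binary_matrix))))

# Toy sparse binary matrix (hypothetical data).
rng = np.random.default_rng(0)
toy = (rng.random((1000, 500)) < 0.0004).astype(float)
density = toy.mean()

# For 0/1 data, mean squared error against zero equals the fraction of ones.
baseline = zero_baseline_rmse(toy)
print("toy density:", density, "baseline RMSE:", baseline)

reported_rmse = 0.019916126083892014  # ALS test RMSE reported in the notebook
implied_density = reported_rmse ** 2   # density if the predictor were all-zero
print(f"implied density: {implied_density:.6f}")
```

An implied density of roughly 0.04% would be consistent with a model that predicts values near zero for every cell. Comparing the ALS RMSE against this kind of baseline is one way to judge whether it reflects learning or only sparsity.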

4. Deep Learning Model

	•	Root Mean Squared Error (RMSE): 0.0183
    
	•	Precision: 1.0
    
	•	Recall: 1.0
    
	•	F1-Score: 1.0
    

Analysis:
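The deep learning model achieved the lowest RMSE of 0.0183 and perfect precision, recall, and F1-score, but these numbers say little about recommendation quality. The model was trained and tested only on nonzero entries, so if the matrix is binary, as the rounding step assumes, every target equals 1 and a model that outputs roughly 1 for any input scores perfectly. This points to a degenerate evaluation rather than clear evidence of overfitting; in the epochs shown, the validation loss stayed below the training loss. A meaningful test requires negative examples in both the training and test sets.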

Conclusion


	•	SVD and Truncated SVD: Both models scored an RMSE near 1 and zero precision, recall, and F1-score. The row-wise train/test split, which compares predictions for training users against test users' interactions, is a more likely cause than the linearity of the latent factors.
    
	•	ALS: The low RMSE is dominated by the zero cells in the test set, and the model predicts approximately 0 everywhere, so its top-k recommendations are not usable. The sparsity of the data needs to be addressed before further tuning can help.
    
	•	Deep Learning: The deep learning model achieved the best RMSE and perfect precision, recall, and F1-score, but it was trained and evaluated only on positive interactions, so these results cannot be taken as evidence of practical performance.
    

Recommendations


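	•	Hyperparameter Tuning: The grid search did not improve on the initial ALS configuration, so further tuning should accompany changes to the training data, such as the implicit-feedback setting (`implicitPrefs=True`), before widening the parameter grid.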
    
	•	Overfitting Mitigation: The deep learning model already uses L2 regularization, dropout, and early stopping. Including negative (zero) interactions in training and testing, and validating on held-out interactions, would show whether it generalizes to new data.
    
	•	Hybrid Approaches: Exploring hybrid recommendation systems that combine collaborative filtering with content-based methods can leverage additional user and item metadata, potentially improving overall recommendation quality.
    

By iterating on these models and incorporating the suggested improvements, we can develop a more accurate and reliable cross-platform recommendation system for books, music, and movies.