## Section 1:

In [2]:
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.decomposition import NMF
from math import sqrt

In [4]:
train_data = pd.read_csv(r"C:\Users\aryan\Downloads\Movie Ratings Data\train.csv")
test_data = pd.read_csv(r"C:\Users\aryan\Downloads\Movie Ratings Data\test.csv")
MV_movies = pd.read_csv(r"C:\Users\aryan\Downloads\Movie Ratings Data\movies.csv")
MV_users = pd.read_csv(r"C:\Users\aryan\Downloads\Movie Ratings Data\users.csv")

In [5]:
train_data.head()

Unnamed: 0,uID,mID,rating
0,744,1210,5
1,3040,1584,4
2,1451,1293,5
3,5455,3176,2
4,2507,3074,5


In [6]:
test_data.head()

Unnamed: 0,uID,mID,rating
0,2233,440,4
1,4274,587,5
2,2498,454,3
3,2868,2336,5
4,1636,2686,5


In [7]:
# Merge the train and test data for matrix factorization
all_data = pd.concat([train_data, test_data])

In [8]:
# Pivot the data to create the user-item matrix
ratings_matrix = all_data.pivot(index='uID', columns='mID', values='rating').fillna(0)

In [9]:
# Split the user-item matrix by rows: 80% of users for fitting, 20% of users held out
X_train, X_test = train_test_split(ratings_matrix, test_size=0.2, random_state=42)
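
Because `train_test_split` is applied to the user-item matrix, it splits rows, so whole users end up on one side or the other. The following is a minimal sketch of that behavior on a made-up 5-user matrix; `toy_matrix`, `toy_train`, and `toy_test` are hypothetical names and the values are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up user-item matrix: 5 users (rows) by 3 movies (columns), 0 = unrated.
toy_matrix = pd.DataFrame(
    [[5, 0, 3], [0, 4, 0], [2, 0, 0], [0, 1, 5], [4, 4, 0]],
    index=pd.Index([1, 2, 3, 4, 5], name="uID"),
    columns=pd.Index([10, 20, 30], name="mID"),
)

# Same call pattern as the cell above: rows (users) are shuffled and split.
toy_train, toy_test = train_test_split(toy_matrix, test_size=0.2, random_state=42)

# Each user appears in exactly one of the two parts; all movie columns are kept.
print("users used for fitting:", sorted(toy_train.index))
print("users held out:", sorted(toy_test.index))
```

With five users and `test_size=0.2`, one user is held out entirely and the other four are used for fitting, while every movie column stays in both parts.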

In [40]:
# Perform matrix factorization using Non-negative matrix factorization (NMF)
nmf = NMF(n_components=10, init='random', random_state=42)
X_train_nmf = nmf.fit_transform(X_train)
X_test_nmf = nmf.transform(X_test)

In [41]:
# Reconstruct the held-out users' rows of the ratings matrix from their latent factors
ratings_matrix_predicted = pd.DataFrame(nmf.inverse_transform(X_test_nmf), index=X_test.index, columns=X_test.columns)

In [42]:
# Merge the predicted ratings with test data on user and movie IDs
test_data_predicted = test_data.merge(ratings_matrix_predicted.stack().reset_index(name='predicted_rating'), 
                                     on=['uID', 'mID'], how='left')


In [43]:
# Test rows without a matching prediction get a predicted rating of 0
test_data_predicted['predicted_rating'] = test_data_predicted['predicted_rating'].fillna(0)

# Calculate the Root Mean Squared Error (RMSE)
rmse = sqrt(mean_squared_error(test_data_predicted['rating'], test_data_predicted['predicted_rating']))
print("Root Mean Squared Error (RMSE):", rmse)

Root Mean Squared Error (RMSE): 3.562539660933409
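
The RMSE above scores every test row that has no matching prediction against a predicted rating of 0. A quick way to see how much that matters is to count the missing predictions before filling them. The sketch below does this on made-up toy frames that reuse the notebook's column names; `toy_test_data`, `toy_predictions`, and the numbers in them are assumptions for illustration, not values from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for test_data and the stacked predictions (values are made up).
toy_test_data = pd.DataFrame(
    {"uID": [1, 1, 2, 3], "mID": [10, 20, 10, 30], "rating": [5, 3, 4, 2]}
)
toy_predictions = pd.DataFrame(
    {"uID": [1, 1], "mID": [10, 20], "predicted_rating": [4.5, 2.5]}
)

# Left merge keeps every test row; rows without a prediction get NaN.
merged = toy_test_data.merge(toy_predictions, on=["uID", "mID"], how="left")

# Share of test rows that have no prediction.
missing_share = merged["predicted_rating"].isna().mean()

# RMSE restricted to rows that actually received a prediction.
covered = merged.dropna(subset=["predicted_rating"])
rmse_covered = float(
    np.sqrt(((covered["rating"] - covered["predicted_rating"]) ** 2).mean())
)

# RMSE when missing predictions are replaced with 0, as in the cell above.
filled = merged["predicted_rating"].fillna(0)
rmse_filled = float(np.sqrt(((merged["rating"] - filled) ** 2).mean()))

print(missing_share, rmse_covered, rmse_filled)
```

In this toy case half of the rows have no prediction, and filling them with 0 raises the RMSE from 0.5 to roughly 2.26. Running the same two checks on `test_data_predicted` before the fill step would show how much of the reported error comes from missing predictions rather than from the model.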


## Section 2:

### Conclusion:

The Root Mean Squared Error (RMSE) is the square root of the average squared difference between the actual and predicted ratings, so a lower RMSE means better rating predictions. An RMSE of 3.563 means that the model's predictions deviate from the actual ratings by about 3.6 rating units in the root-mean-square sense, which is large relative to a rating scale whose maximum is 5. We may conclude that the non-negative matrix factorization (NMF) approach, as applied here, did not perform adequately.

Several factors could contribute. The ratings matrix is sparse, and unrated entries were filled with 0, so the model treats missing ratings as very low ratings. In addition, predictions were reconstructed only for the users in the held-out 20% of matrix rows, and every test rating without a matching prediction was scored against a predicted rating of 0, which inflates the RMSE. For NMF specifically, the technique adds complexity compared to simple baseline methods, so the choice of the number of components can affect the predictions. NMF may also miss certain patterns in the data because it enforces non-negativity constraints on the learned factors: if the data contain negative correlations, those relationships cannot be represented directly.

To improve the results, we can tune the hyperparameters and observe how the number of components (latent factors), the test size, and the strength of regularization affect the RMSE. The data could also be preprocessed to better suit the NMF model. We can also investigate adding more information to the data, such as user demographics.
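
As a rough illustration of the tuning step, the sketch below compares a few values of `n_components` by their RMSE on held-out entries. It runs on a small synthetic matrix, not the movie ratings; `R`, `mask`, `heldout_rmse`, and the candidate values are hypothetical choices made only for this example. Held-out entries are zeroed before fitting, which mirrors the `fillna(0)` treatment used in this notebook and carries the same bias.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic non-negative "ratings" with low-rank structure (made-up data).
W_true = rng.random((40, 3))
H_true = rng.random((3, 25))
R = W_true @ H_true

# Hold out roughly 20% of the entries for validation.
mask = rng.random(R.shape) < 0.2
R_train = np.where(mask, 0.0, R)  # held-out entries are zeroed before fitting


def heldout_rmse(k):
    """Fit NMF with k components on R_train and score only the held-out entries."""
    model = NMF(n_components=k, init="random", random_state=42, max_iter=500)
    W = model.fit_transform(R_train)
    R_hat = W @ model.components_
    return float(np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2)))


# Compare a few candidate numbers of components.
scores = {k: heldout_rmse(k) for k in (2, 5, 10)}
best_k = min(scores, key=scores.get)
print(scores)
print("lowest held-out RMSE at n_components =", best_k)
```

The same loop could be extended to other settings, such as the regularization strength, as long as every candidate is scored on the same held-out entries.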


### Citation:

Naik, G. R. (2016). *Non-negative matrix factorization techniques: Advances in theory and applications.* Springer.

*MovieLens*. (2021, December 8). GroupLens. https://grouplens.org/datasets/movielens/