# Unsupervised Learning Solution
### EDSA - Movie Recommendation 2022 
#### AI Incorporated - Team 4 EDSA

© Explore Data Science Academy

<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2205222%2Fbca114f2e4f6b9b46f2cc76527d7401e%2FImage_header.png?generation=1593773828621598&alt=media" width=100%/> 

<a id="cont"></a>
## Table of Contents

<a href=#one>1. Introduction</a>

    1.1 Overview
    1.2 Problem Statement
    1.3 Model Versioning with COMET
    1.4 Required Installations
       
<a href=#two>2. Import Packages</a>

<a href=#three>3. Collect Data</a>

<a href=#four>4. Exploratory Data Analysis (EDA)</a>
    
    4.1 

<a href=#five>5. Data Processing</a>
    
    5.1 

<a href=#six>6. Feature Engineering</a>

<a href=#seven>7. Modelling</a>
    
    7.1 

<a href=#eight>8. Model Performance</a>
    
    8.1 

<a href=#nine>9. Saving & Exporting Model</a>
    
    9.1 Export Test Prediction as CSV
    9.2 Log to Comet

<a href=#ten>10. Conclusion</a>

<a href=#eleven>11. Recommendation</a>

<a href=#ref>Reference Document Links</a>

<a id="one"></a>
## 1. INTRODUCTION
<a href=#cont>Back to Table of Contents</a>

#### 1.1 Overview

In today’s technology-driven world, recommender systems are socially and economically critical for helping individuals make good choices about the content they engage with daily. Movie recommendation is one application where this is especially true: intelligent algorithms can help viewers find great titles among tens of thousands of options.

We will therefore construct a recommendation algorithm based on `Content` and `Collaborative` filtering, capable of accurately predicting how a user will rate a movie they have not yet viewed, based on their historical preferences.

<img src="https://miro.medium.com/max/1400/1*odvftNNQJp3O6vpwmZsJOQ.png" width=100%/> 

#### 1.2 Problem Statement

In this era of Artificial Intelligence, everything from government to education to the ever-growing entertainment industry now relies on AI technology to boost efficiency. Organisations using recommender systems focus on increasing sales and deliveries through highly personalised offers and an enhanced customer experience.

Providing an accurate and robust solution to this challenge has immense economic potential: users of the system are exposed to personalised recommendations, generating platform affinity for the streaming services which best facilitate their audience's viewing.

#### 1.3 Model Versioning with COMET

To begin with, we will be using Comet, a tool for model versioning and experimentation. It records the parameters and conditions of each experiment, allowing us to reproduce our results or go back to a previous version of an experiment.

In [None]:
# Install Comet
# !pip install comet_ml

In [None]:
# Import Comet package
from comet_ml import Experiment

# Setting the API key

# experiment = Experiment(
#     api_key="__________",
#     project_name="____________",
#     workspace="________",
# )

Get your `api_key`, `project_name` and `workspace` from your Comet project folder.

####  1.4 Required Installations

In [1]:
!pip install surprise

Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.1.tar.gz (11.8 MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1630127 sha256=dde92c1b1faecbaeffdd935cefff8fc2e4033eb0aeac556647137bc53397cfc8
  Stored in directory: /root/.cache/pip/wheels/76/44/74/b498c42be47b2406bd27994e16c5188e337c657025ab400c1c
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1


So, let's proceed.

<a id="two"></a>
## 2. IMPORT PACKAGES
<a href=#cont>Back to Table of Contents</a>

In this section, we import the libraries required for this analysis and modelling. A library is a collection of modules grouped by functionality. We will need:

`For data manipulation, libraries such as pandas and NumPy.`

`For data visualisation, libraries such as matplotlib and seaborn.`

`Libraries for data preparation, feature selection, model building, performance calculation and more.`

**See** the in-line comments below for the purpose of each import.

In [2]:
""" 
For a seamless run, 
All required libraries will be imported here. 
"""

# Libraries for data loading, data manipulation and data visualisation
import pandas as pd                                                   # <-- for loading CSV data
import numpy as np                                                    # <-- used for mathematical operations
import matplotlib.pyplot as plt                                       # <-- for graphical representation
import seaborn as sns                                                 # <-- for specialized plots
import re                                                             # <-- for handling regular expressions
import scipy as sp                                                    # <-- used in our code for numerical efficiency
sns.set()                                                             # <-- set plot style

# Libraries for data preparation


# Libraries for featurization and similarity computation
from sklearn.metrics.pairwise import cosine_similarity 
from sklearn.feature_extraction.text import TfidfVectorizer

# Libraries for Model Building

# Libraries used during sorting procedures.
import operator                                                       # <-- Convenient item retrieval during iteration
import heapq                                                          # <-- Efficient sorting of large lists

# Libraries for calculating performance metrics
import time

# Libraries to Save/Restore Models
import pickle

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline 

<a id="three"></a>
## 3. Collect Data
<a href=#cont>Back to Table of Contents</a>

This dataset consists of several million 5-star ratings obtained from users of the online MovieLens movie recommendation service. The MovieLens dataset has long been used by industry and academic researchers to improve the performance of explicitly-based recommender systems.

We'll be using this special version of the MovieLens dataset which is enriched with additional data, and resampled for fair evaluation purposes.

**Source**

The data for the MovieLens dataset is maintained by the GroupLens research group in the Department of Computer Science and Engineering at the University of Minnesota. Additional movie content data was legally scraped from IMDB.

**Supplied Files**

* `genome_scores.csv` - a score mapping the strength between movies and tag-related properties.
* `genome_tags.csv` - user assigned tags for genome-related scores
* `imdb_data.csv` - Additional movie metadata scraped from IMDB using the links.csv file.
* `links.csv` - File providing a mapping between a MovieLens ID and associated IMDB and TMDB IDs.
* `sample_submission.csv` - Sample of the submission format for the hackathon.
* `tags.csv` - User-assigned tags for the movies within the dataset.
* `test.csv` - The test split of the dataset. Contains user and movie IDs with no rating data.
* `train.csv` - The training split of the dataset. Contains user and movie IDs with associated rating data.

Kindly refer to [Kaggle](https://www.kaggle.com/competitions/edsa-movie-recommendation-2022/data) for more information about the data.

In [12]:
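# Kaggle setup
!pip install -q kaggle

# Upload kaggle.json (the Kaggle API token) from the local machine
from google.colab import files
files.upload()

# Create the Kaggle folder (-p avoids an error if it already exists)
!mkdir -p ~/.kaggle
# Copy kaggle.json to the new folder
!cp kaggle.json ~/.kaggle/
# Restrict permissions on the token file
!chmod 600 ~/.kaggle/kaggle.json
# List datasets to confirm the API works
!kaggle datasets list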

Saving kaggle.json to kaggle.json
ref                                                            title                                             size  lastUpdated          downloadCount  voteCount  usabilityRating  
-------------------------------------------------------------  -----------------------------------------------  -----  -------------------  -------------  ---------  ---------------  
datasets/muratkokludataset/date-fruit-datasets                 Date Fruit Datasets                              408KB  2022-04-03 09:25:39           1745        326  0.9375           
datasets/piterfm/2022-ukraine-russian-war                      2022 Ukraine Russia War                            2KB  2022-04-21 09:38:14          10192        559  1.0              
datasets/muratkokludataset/acoustic-extinguisher-fire-dataset  Acoustic Extinguisher Fire Dataset               621KB  2022-04-02 22:59:36            207        286  0.9375           
datasets/kamilpytlak/personal-key-indicators-o

In [14]:
#Download dataset
! kaggle competitions download -c edsa-movie-recommendation-2022

Downloading edsa-movie-recommendation-2022.zip to /content
100% 238M/239M [00:01<00:00, 170MB/s]
100% 239M/239M [00:01<00:00, 155MB/s]


In [15]:
#Unzip datasets
! mkdir datasets
!unzip  edsa-movie-recommendation-2022.zip

Archive:  edsa-movie-recommendation-2022.zip
  inflating: genome_scores.csv       
  inflating: genome_tags.csv         
  inflating: imdb_data.csv           
  inflating: links.csv               
  inflating: movies.csv              
  inflating: sample_submission.csv   
  inflating: tags.csv                
  inflating: test.csv                
  inflating: train.csv               


In [16]:
# Load Data
genome_scores_df = pd.read_csv('genome_scores.csv')
genome_tags_df = pd.read_csv('genome_tags.csv')
imdb_data_df = pd.read_csv('imdb_data.csv')
links_df = pd.read_csv('links.csv')
movies_df = pd.read_csv('movies.csv')
tags_df = pd.read_csv(r'tags.csv')

In [17]:
train_df = pd.read_csv(r'train.csv')
test_df = pd.read_csv(r'test.csv')

In [18]:
sample_submission_df = pd.read_csv(r'sample_submission.csv')
sample_submission_df.head()

Unnamed: 0,Id,rating
0,1_2011,1.0
1,1_4144,1.0
2,1_5767,1.0
3,1_6711,1.0
4,1_7318,1.0


In [19]:
# View Dataset
train_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,5163,57669,4.0,1518349992
1,106343,5,4.5,1206238739
2,146790,5459,5.0,1076215539
3,106362,32296,2.0,1423042565
4,9041,366,3.0,833375837


In [20]:
train_df.shape

(10000038, 4)

This dataset contains just over 10 million ratings of various movies by various users.

We will use three columns from the data:
* userId
* movieId
* rating

# JUST EXPERIMENTING
(No Order/FORMAT yet)

Experiment done on Google Colab because of speed and system challenges.


**Designing our Movie Recommendation System**

To obtain recommendations for our users, we will predict their `ratings` for movies they haven’t watched yet. Movies are then indexed and suggested to users based on these predicted ratings.
To do this, we will use past records of movies and user ratings to predict their future ratings. At this point, it’s worth mentioning that in the real world, we will likely encounter new users or movies without a history. Such situations are called cold start problems.

`Cold start problems` can be handled by recommendations based on meta-information, such as:
* For new users, we can use their location, age, gender, browser, and user device to predict recommendations.
* For new movies, we can use genre, cast, and crew to recommend it to target users.
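
The ranking step described above (predict ratings, then suggest the highest-rated unseen movies) can be sketched as follows. This is a minimal illustration, not the notebook's final pipeline: the `predicted_ratings` dictionary, the `already_watched` set and the helper name `top_n_unseen` are all hypothetical.

```python
import heapq

def top_n_unseen(predicted, seen, n=3):
    """Return the n highest-rated (movie_id, rating) pairs the user has not seen."""
    candidates = (
        (movie_id, rating)
        for movie_id, rating in predicted.items()
        if movie_id not in seen
    )
    # heapq.nlargest avoids sorting the full candidate list.
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# Hypothetical predicted ratings for one user (movieId -> predicted rating).
predicted_ratings = {1: 4.6, 2: 3.1, 3: 4.9, 4: 2.4, 5: 4.2}
already_watched = {3}

recommendations = top_n_unseen(predicted_ratings, already_watched, n=3)
print(recommendations)
```

`heapq` is already imported in the packages section, so the same pattern should carry over once real predictions are available.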

**Implementation**

For our recommender system, we’ll use both of the techniques mentioned above: content-based and collaborative filtering. To find the similarity between movies for our content-based method, we’ll use a cosine similarity function. For our collaborative filtering method, we’ll use a matrix factorization technique.

The first step towards this is creating a matrix factorization based model. We’ll use the output of this model and a few handcrafted features to provide inputs to the final model. The basic process will look like this:
* Step 1: Build a matrix factorization-based model
* Step 2: Create handcrafted features
* Step 3: Implement the final model
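
To make the content-based side concrete, here is a small sketch of cosine similarity between movies. The one-hot genre vectors and movie names are made up for illustration; in the notebook, `cosine_similarity` from scikit-learn (imported earlier) performs the same calculation over whole matrices.

```python
import numpy as np

# Hypothetical one-hot genre vectors (columns: Action, Comedy, Romance, Sci-Fi).
movie_genres = {
    "Movie A": np.array([1, 0, 0, 1]),
    "Movie B": np.array([1, 0, 0, 0]),
    "Movie C": np.array([0, 1, 1, 0]),
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine(movie_genres["Movie A"], movie_genres["Movie B"])  # share Action
sim_ac = cosine(movie_genres["Movie A"], movie_genres["Movie C"])  # no shared genre
print(round(sim_ab, 3), round(sim_ac, 3))
```

Movies that share genres score closer to 1, while movies with no genres in common score 0.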

In [21]:
'''
Matrix factorization is a class of collaborative filtering algorithms used in recommender systems.
This family of methods became widely known during the Netflix Prize challenge because of how effective it was.

To implement matrix factorization, we use Surprise, a Python library
for building and testing recommender systems.
The data frame is first loaded into a Surprise Dataset,
which is later converted into the trainset format the library expects.
'''

from surprise import SVD
import surprise
from surprise import Reader, Dataset

# Tell Surprise how to read the data frame.
# MovieLens ratings run from 0.5 to 5.0 in half-star steps, so the scale starts at 0.5.
reader = Reader(rating_scale=(0.5, 5))

# Create the training data from the data frame
train_data_mf = Dataset.load_from_df(train_df[['userId', 'movieId', 'rating']], reader)

In [22]:
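# Build the trainset from the training data.
# A trainset is the internal data format used by the Surprise library.
trainset = train_data_mf.build_full_trainset()

# Biased SVD with 100 latent factors; random_state makes the run reproducible
svd = SVD(n_factors=100, biased=True, random_state=15, verbose=True)
svd.fit(trainset)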

Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
Processing epoch 5
Processing epoch 6
Processing epoch 7
Processing epoch 8
Processing epoch 9
Processing epoch 10
Processing epoch 11
Processing epoch 12
Processing epoch 13
Processing epoch 14
Processing epoch 15
Processing epoch 16
Processing epoch 17
Processing epoch 18
Processing epoch 19


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f57cdf92a50>
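
As a rough illustration of what happens during those epochs: a biased matrix factorization model predicts a rating as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$ and updates the biases and factors by stochastic gradient descent. The sketch below is not Surprise's implementation; the tiny ratings list, learning rate, regularisation value and epoch count are assumptions chosen only to show the idea.

```python
import numpy as np

# Tiny hypothetical ratings: (user index, movie index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 4.5)]
n_users, n_items, n_factors = 3, 3, 2

rng = np.random.default_rng(15)
mu = np.mean([r for _, _, r in ratings])       # global mean rating
b_u = np.zeros(n_users)                        # user biases
b_i = np.zeros(n_items)                        # movie biases
p = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
q = rng.normal(0, 0.1, (n_items, n_factors))   # movie factors

def predict(u, i):
    """Predicted rating: global mean + user bias + movie bias + factor interaction."""
    return mu + b_u[u] + b_i[i] + q[i] @ p[u]

def rmse():
    """Root mean squared error over the training ratings."""
    return np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings]))

rmse_before = rmse()
lr, reg = 0.01, 0.02  # illustrative learning rate and regularisation strength
for epoch in range(200):
    for u, i, r in ratings:
        err = r - predict(u, i)
        b_u[u] += lr * (err - reg * b_u[u])
        b_i[i] += lr * (err - reg * b_i[i])
        p_u_old = p[u].copy()                  # keep the old value for q's update
        p[u] += lr * (err * q[i] - reg * p[u])
        q[i] += lr * (err * p_u_old - reg * q[i])
rmse_after = rmse()
print(rmse_before, rmse_after)
```

The training error should drop as the epochs progress, which is the same behaviour the `Processing epoch` log reflects at a much larger scale.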

In [23]:
# Create Kaggle submission file
# Predict a rating for every (userId, movieId) pair in the test set;
# `.est` is the estimated rating of a Surprise prediction.
predictions = [
    svd.predict(user_id, movie_id).est
    for user_id, movie_id in zip(test_df['userId'], test_df['movieId'])
]
# The submission Id has the form "<userId>_<movieId>"
test_df['Id'] = test_df['userId'].map(str) + '_' + test_df['movieId'].map(str)
results = pd.DataFrame({"Id": test_df['Id'], "rating": predictions})
results.to_csv("T4_submission_1.csv", index=False)

In [25]:
# Run to Save locally
files.download('T4_submission_1.csv')


Now the model is ready. We’ll store these predictions to pass to the final model as an additional feature. This will help us incorporate collaborative filtering into our system.

In [None]:
# #getting predictions of train set
# train_preds = svd.test(trainset.build_testset())

In [None]:
# train_pred_mf = np.array([pred.est for pred in train_preds])

**Step 2: Creating Handcrafted Features**

Let’s convert the data in the data frame format into a user-movie interaction matrix. Matrices used in this type of problem are generally sparse because there’s a high chance users may only rate a few movies.

Storing the matrix in a sparse format such as CSR (Compressed Sparse Row) has the following advantages:
* efficient arithmetic operations: CSR + CSR, CSR * CSR, etc.
* efficient row slicing
* fast matrix-vector products

`scipy.sparse.csr_matrix` builds such a sparse matrix efficiently from the rating, user and movie columns of the data frame.
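
To show why row slicing is cheap in CSR, the sketch below builds the three CSR arrays by hand for a tiny, made-up ratings matrix. The helper `to_csr` is hypothetical and written only for illustration; the names `data`, `indices` and `indptr` mirror the attributes that `scipy.sparse.csr_matrix` exposes.

```python
def to_csr(dense):
    """Convert a dense list-of-lists into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)      # the non-zero rating
                indices.append(col)     # the column (movie) it belongs to
        indptr.append(len(data))        # where the next row starts in `data`
    return data, indices, indptr

# Hypothetical 3 users x 4 movies; 0 means "not rated".
dense_ratings = [
    [5.0, 0, 0, 3.0],
    [0, 0, 4.0, 0],
    [0, 2.5, 0, 0],
]
data, indices, indptr = to_csr(dense_ratings)
print(data)     # [5.0, 3.0, 4.0, 2.5]
print(indices)  # [0, 3, 2, 1]
print(indptr)   # [0, 2, 3, 4]

# Row slicing only needs two lookups in indptr: user 0's ratings.
user0_ratings = data[indptr[0]:indptr[1]]
```

Only the four non-zero ratings are stored, and each user's ratings sit in one contiguous slice.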

In [None]:
# Creating a sparse matrix (rows = userId, columns = movieId)
# from scipy import sparse   # needed for sparse.csr_matrix below
# train_sparse_matrix = sparse.csr_matrix((train_df.rating.values, (train_df.userId.values, train_df.movieId.values)))

`train_sparse_matrix` is the sparse matrix representation of the `train_df` data frame.

We’ll create 3 sets of features using this sparse matrix:
* Features which represent global averages
* Features which represent the top five similar users
* Features which represent the top five similar movies

Let’s take a look at how to prepare each in more detail.

**1. Features which represent the global averages**

The three global averages we’ll employ are:
* The average ratings of all movies given by all users
* The average ratings of a particular movie given by all users
* The average ratings of all movies given by a particular user
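
The three averages listed above can be illustrated on a tiny dense matrix before applying them to the sparse one. The matrix `R` below is made up, and the sketch assumes every user and every movie has at least one rating (otherwise the division would hit zero).

```python
import numpy as np

# Hypothetical 3 users x 3 movies; 0 means "not rated".
R = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 2.0, 4.0],
])
rated = R != 0  # boolean mask of the ratings that exist

# Average over all existing ratings.
global_avg = R.sum() / rated.sum()
# Average rating given by each user (one value per row).
user_avg = R.sum(axis=1) / rated.sum(axis=1)
# Average rating received by each movie (one value per column).
movie_avg = R.sum(axis=0) / rated.sum(axis=0)

print(global_avg, user_avg, movie_avg)
```

Dividing by the count of non-zero entries, rather than by the full row or column length, keeps unrated movies from dragging the averages down.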

In [None]:
# # get the global average of ratings in our train set.
# train_averages = dict()

# train_global_average = train_sparse_matrix.sum()/train_sparse_matrix.count_nonzero()
# train_averages['global'] = train_global_average
# train_averages
# # Output: {'global': 3.5199769425298757}

In [None]:
# """
# Next, let’s create a function which takes the sparse matrix
# as input and gives the average ratings of a movie given by all users, 
# and the average rating of all movies given by a single user.
# """
# # get the user averages in dictionary (key: user_id/movie_id, value: avg rating)
# def get_average_ratings(sparse_matrix, of_users):
#     # average ratings of user/axes
#     ax = 1 if of_users else 0    # <-- 1=User axes, 0=Movie axes

#     # ".A1" is for converting Column_Matrix to 1-D numpy array
#     sum_of_ratings = sparse_matrix.sum(axis=ax).A1
#     # Boolean matrix of ratings ( whether a user rated that movie or not)
#     is_rated = sparse_matrix!=0
#     # no of ratings that each user OR movie..
#     no_of_ratings = is_rated.sum(axis=ax).A1
#     # max_user and max_movie ids in sparse matrix
#     u,m = sparse_matrix.shape
    
#     # create a dictionary of users and their average ratings..
#     average_ratings = { i : sum_of_ratings[i]/no_of_ratings[i]
#                             for i in range(u if of_users else m)
#                             if no_of_ratings[i] !=0}
#     #return that dictionary of average ratings
#     return average_ratings

In [None]:
# """Average rating given by each user"""

# train_averages['user'] = get_average_ratings(train_sparse_matrix, of_users=True)

In [None]:
# """Average rating received by each movie"""

# train_averages['movie'] = get_average_ratings(train_sparse_matrix, of_users=False)

**2. Features which represent the top 5 similar users**

In this set of features, we collect the ratings that the top 5 most similar users gave to a particular movie. Similarity between users is calculated using cosine similarity.

In [None]:
# # compute the similarity of every user to the "user"
# user_sim = cosine_similarity(train_sparse_matrix[user], train_sparse_matrix).ravel()
# top_sim_users = user_sim.argsort()[::-1][1:] # we are ignoring 'The User' from its similar users.

# # get the ratings of most similar users for this movie
# top_ratings = train_sparse_matrix[top_sim_users, movie].toarray().ravel()

# # pad the list to length 5 with the movie's average rating if fewer than 5 similar users rated it
# top_sim_users_ratings = list(top_ratings[top_ratings != 0][:5])
# top_sim_users_ratings.extend([train_averages['movie'][movie]]*(5 -len(top_sim_users_ratings)))
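
A small dense version of the same idea may help when checking the logic of the cell above. The matrix `R`, the helper name `top_similar_user_ratings` and the chosen user and movie are all hypothetical, and the sketch assumes every user has at least one rating so that no norm is zero.

```python
import numpy as np

def top_similar_user_ratings(R, user, movie, movie_avg, k=5):
    """Ratings for `movie` from the users most similar to `user`, padded to length k."""
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[user]) / (norms * norms[user])   # cosine similarity to every user
    # Most similar first, skipping the user themselves.
    order = [u for u in np.argsort(sims)[::-1] if u != user]
    ratings = R[order, movie]
    ratings = list(ratings[ratings != 0][:k])      # keep only users who rated the movie
    ratings.extend([movie_avg] * (k - len(ratings)))  # pad with the movie average
    return ratings

# Hypothetical 4 users x 4 movies; 0 means "not rated".
R = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 3.0],
    [0.0, 0.0, 5.0, 4.0],
    [5.0, 3.0, 0.0, 2.0],
])
movie = 3
movie_avg = R[:, movie].sum() / (R[:, movie] != 0).sum()

features = top_similar_user_ratings(R, user=0, movie=movie, movie_avg=movie_avg)
print(features)
```

Here only three other users rated the movie, so the last two slots are filled with the movie's average rating, matching the padding step in the notebook code.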

**3. Features which represent the top 5 similar movies**

In this set of features, we obtain the top 5 similar movies rated by a particular user. This similarity is calculated using the cosine similarity between the movies.

In [None]:
# # compute the similarity of every movie to the "movie"
# movie_sim = cosine_similarity(train_sparse_matrix[:,movie].T,
# train_sparse_matrix.T).ravel()
# top_sim_movies = movie_sim.argsort()[::-1][1:] # we are ignoring 'The Movie' from its similar movies.

# # get the ratings of most similar movies rated by this user
# top_ratings = train_sparse_matrix[user, top_sim_movies].toarray().ravel()

# # pad the list to length 5 with the user's average rating if the user rated fewer than 5 similar movies
# top_sim_movies_ratings = list(top_ratings[top_ratings != 0][:5])
# top_sim_movies_ratings.extend([train_averages['user'][user]]*(5-len(top_sim_movies_ratings)))

We append all these features for each movie-user pair and create a data frame.

**Step 3: Creating a final model for our movie recommendation system**

To create our final model, let’s use XGBoost, an optimized distributed gradient boosting library.

In [None]:
# # prepare train data
# import xgboost as xgb
# x_train = final_data.drop(['user', 'movie', 'rating'], axis=1)
# y_train = final_data['rating']
# # initialize XGBoost model (`verbosity` replaces the deprecated `silent` flag)
# xgb_model = xgb.XGBRegressor(verbosity=1, n_jobs=13, random_state=15, n_estimators=100)
# # fit the model
# xgb_model.fit(x_train, y_train, eval_metric='rmse')

**Performance Metrics**

We evaluate the recommender with two error metrics:
* Root Mean Squared Error (RMSE), the square root of the mean squared difference between predicted and actual ratings, and
* Mean Absolute Percentage Error (MAPE), the mean absolute difference expressed as a percentage of the actual rating.

Lower values mean lower error and thus better performance.
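
As a quick worked example of both metrics, the sketch below computes RMSE and MAPE on four made-up ratings. The arrays `y_true` and `y_pred` are illustrative only; MAPE divides by the actual rating, so it assumes no actual rating is zero.

```python
import numpy as np

# Hypothetical actual and predicted ratings.
y_true = np.array([4.0, 3.0, 5.0, 2.0])
y_pred = np.array([3.5, 3.0, 4.0, 3.0])

# RMSE: square the errors, average them, take the square root.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# MAPE: absolute error relative to the actual rating, averaged, as a percentage.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(rmse)  # 0.75
print(mape)  # 20.625
```

RMSE penalises the two one-star misses more heavily than the half-star miss, while MAPE weights the miss on the 2.0 rating most because the error is large relative to the actual value.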

In [None]:
# # dictionary for storing the test results
# test_results = dict()
# # from the trained model, get the predictions
# y_test_pred = xgb_model.predict(x_test)
# # get the rmse and mape on the test data
# rmse_test = np.sqrt(np.mean((y_test.values - y_test_pred) ** 2))
# mape_test = np.mean(np.abs((y_test.values - y_test_pred) / y_test.values)) * 100
# # store the results in the test_results dictionary
# test_results = {'rmse': rmse_test, 'mape': mape_test, 'predictions': y_test_pred}

<a id="four"></a>
## 4. Exploratory Data Analysis (EDA)
<a href=#cont>Back to Table of Contents</a>

EDA involves understanding patterns in our data, pinpointing any outliers and identifying relationships between variables. In this phase we carry out some data analysis, descriptive statistics and data visualisations, all to understand the data well enough to refine it properly during feature engineering, in preparation for modelling.

Let's proceed with some EDA.

In [None]:
# Overview of data


In [None]:
# check to confirm count of null values


<a id="five"></a>
## 5. DATA PROCESSING
<a href=#cont>Back to Table of Contents</a>

The primary function of data processing is to provide higher-quality data that is faster to work with, which is key to any successful model building and enables more valuable insights to be extracted. Let's commence processing and cleaning our data.

<a id="six"></a>
## 6. Feature Engineering
<a href=#cont>Back to Table of Contents</a>

This involves preparing our data so that the selected, structured features are ready to be served to the models.

<a id="seven"></a>
## 7. Modeling
<a href=#cont>Back to Table of Contents</a>

There are several modelling techniques we can apply, and of the many options we will be trying:

`1. Base Model: ` Description..... 

`2. Model 2: ` Description..... 

`3. Model 3 `Description..... 


<a id="eight"></a>
## 8. MODEL PERFORMANCE
<a href=#cont>Back to Table of Contents</a>

Here we review the individual performance of our machine learning models and explain why one is used in place of another.

#### 8.1 Model Testing Scores


In [None]:
# Test scores 


#### 8.2 Best Model Resolution

From the Result we can conclusively say ..............

#### 8.3 Hypertune Best Model

For every model, our goal is to minimise the error, that is, to have predictions as close as possible to the actual values. This is the main objective of hyperparameter tuning.
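
The idea behind a grid search can be sketched in plain Python: try every combination of candidate values and keep the one with the lowest validation error. Everything in this example is hypothetical, including the helper `grid_search`, the stand-in scoring function `fake_validation_rmse` and the parameter values; in practice the scoring function would train a model and return its validation RMSE. Surprise also provides a `GridSearchCV` class in `surprise.model_selection` for this purpose.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Try every combination in param_grid and keep the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "train the model and return the validation RMSE".
def fake_validation_rmse(n_factors, reg_all):
    return 0.80 + abs(n_factors - 100) / 1000 + abs(reg_all - 0.05)

param_grid = {"n_factors": [50, 100, 150], "reg_all": [0.02, 0.05, 0.1]}
best_params, best_score = grid_search(fake_validation_rmse, param_grid)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (nine evaluations here), which is why grids are usually kept small when each evaluation means retraining on millions of ratings.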

In [None]:
# Creating a pipeline for the gridsearch

# set parameter grid
# param_grid = { }  

# hyper_best_model = Pipeline([ ])

# # Fitting data to the best model
# hyper_best_model.fit(X_train, y_train) 

# # predicting the fit on validation set
# y_pred = hyper_best_model.predict(X_val)  

In [None]:
# Best Model Score in %


#### 8.4 Best Model Visual Evaluation
Measuring effectiveness and performance is exactly what the confusion matrix is designed to do, so we will present it both in numbers and visually.

As you can see above........

<a id="nine"></a>
## 9. SAVING & EXPORTING MODEL
<a href=#cont>Back to Table of Contents</a>

We don't want our models to just sit in a Jupyter notebook. At this point, let's save the results in the desired format, preferably CSV, and the model as a pickle file. These will be used for deployment to solve real-life scenarios.

#### 9.1 Export Test Prediction as CSV

In [None]:
'''
Uncomment to run
(CTRL + /)
'''

# X_test = vect.transform(df_test['message']) 
# test_pred = hyper_best_model.predict(X_test)
# save_df = pd.DataFrame(test_pred, columns=['sentiment'])
# output=pd.DataFrame({'tweetid': df_test['tweetid']})
# submission=output.join(save_df)


In [None]:
# submission.to_csv('submission_hyper.csv', index=False)

In [None]:
# Export Model as pickle file

# model_save_path = "1.0_EDSA_T4_Content_Recommender.pkl"
# with open(model_save_path,'wb') as file:
#     pickle.dump(hyper_best_model, file)
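
A minimal save-and-restore round trip with `pickle` looks like the sketch below. The dictionary stands in for a fitted model and the temporary file path is only for illustration; any picklable Python object can be saved and loaded the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model (hypothetical); a real model object is handled the same way.
model = {"name": "svd", "n_factors": 100}

# Write the object to disk in binary mode.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as file:
    pickle.dump(model, file)

# Read it back; the restored object should equal the original.
with open(model_path, "rb") as file:
    restored = pickle.load(file)

print(restored == model)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.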

#### 9.2 Log to Comet

In [None]:
# Create dictionaries for the data we want to log
# These are defined by hand because the model applied here is the best one from the grid search.

# params ={"random_state": 42,
#          "model_type ": "LogisticsRegression",
#          "Bag of words": "Count_Vectorizer",
#          "C": 0.1,
#          "min_df": 1,
#          "max_df": 0.9,
#          "n_grams": "(1, 2)"
#         }

# nb_metrics ={"Accuracy": metrics.accuracy_score(y_val_en, y_pred),
#              "recall": metrics.recall_score(y_val_en, y_pred, average='micro'),
#              "f1": metrics.f1_score(y_val_en, y_pred, average='micro'),
#             }

# confusionmatrix = confusion_matrix(y_val_en, y_pred)

In [None]:
#log parameters and results
# experiment.log_parameters(params)
# experiment.log_metrics(nb_metrics)
# experiment.log_notebook('5.0 Advance_Classification_Notebook.ipynb', overwrite=False)
# experiment.log_confusion_matrix(labels=["News", "pro", "Neutral","Anti"], matrix=confusionmatrix)

NOTE: When using Comet within a Jupyter notebook, we must end the experiment on completion, as illustrated below.

In [None]:
# STRICTLY FOR LOCAL JUPYTER NOTEBOOKS
# experiment.end()

Kindly [go to the Streamlit webpage](http://) to test-run the actual performance of our model on the web.

<a id="ten"></a>
## 10. Conclusion
<a href=#cont>Back to Table of Contents</a>

   In summary .......

<a id="eleven"></a>
## 11. Recommendation
<a href=#cont>Back to Table of Contents</a>

.......

<a id="ref"></a>
## Reference Links
<a href=#cont>Back to Table of Contents</a>

* [EXPLORE Data Science Academy Resources](https://explore-datascience.net/)
* [GitHub Collab Ref.](https://github.com/)
* [Comet Collab Ref](https://www.comet.ml/)
* [Kaggle Collab Ref](https://www.kaggle.com/competitions/edsa-movie-recommendation-2022/overview)
* [How to Build a Movie Recommendation System by Ramya Vidiyala](https://towardsdatascience.com/how-to-build-a-movie-recommendation-system-67e321339109)