## Collaborative Filtering based Recommender System
Collaborative filtering is probably the most commonly used recommendation algorithm. There are two main types of methods:

- User-based collaborative filtering relies on similarity among users (the user neighborhood).
- Item-based collaborative filtering relies on similarity among items.

Both work in a similar way, but this project uses user-based collaborative filtering.

User-based collaborative filtering looks for users who are similar. This is very similar to the user clustering method where we employed explicit user profiles to calculate user similarity. However, the user profiles may not be available, so how can we determine if two users are similar?

#### User-item interaction matrix

For most collaborative filtering-based recommender systems, the main dataset format is a 2-D matrix called the user-item interaction matrix. Each row corresponds to a user id/index and each column to an item id/index, and the element (i, j) is the rating that user i gave to item j. Each row therefore describes one user's ratings, so two users can be compared through the items they have both rated.
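
As a minimal sketch of that layout, the snippet below builds a small dense matrix from `(user, item, rating)` triples in plain Python. The function name `build_interaction_matrix` and the toy ratings are made up for illustration and are not part of the notebook; missing ratings are stored as `None`.

```python
def build_interaction_matrix(ratings):
    """Build a dense user-item matrix from (user_id, item_id, rating) triples.

    Rows follow the sorted user ids, columns follow the sorted item ids,
    and cells without a rating are left as None.
    """
    user_ids = sorted({user for user, _, _ in ratings})
    item_ids = sorted({item for _, item, _ in ratings})
    row_of = {user: row for row, user in enumerate(user_ids)}
    col_of = {item: col for col, item in enumerate(item_ids)}

    matrix = [[None] * len(item_ids) for _ in user_ids]
    for user, item, rating in ratings:
        matrix[row_of[user]][col_of[item]] = rating
    return user_ids, item_ids, matrix


# Toy ratings, invented for this example
toy_ratings = [
    (1, 10, 4.0),
    (1, 20, 5.0),
    (2, 10, 3.0),
    (2, 30, 2.5),
]
user_ids, item_ids, matrix = build_interaction_matrix(toy_ratings)
```

With pandas, `ratings_df.pivot(index='userId', columns='movieId', values='rating')` should give the same layout, with `NaN` in place of `None`.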

A user profile can be seen as a feature vector that mathematically represents a user's interests. When no explicit profile is available, the user's row of ratings in the interaction matrix plays that role.
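
One common way to compare two such rating vectors is cosine similarity over the items both users have rated. The sketch below assumes each user's ratings are held in a plain dict mapping item id to rating; the function name and the example users are illustrative only.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity between two users, using only co-rated items.

    Each argument maps item id -> rating. Returns 0.0 when the users
    share no items or when either vector has zero length.
    """
    common_items = set(ratings_u) & set(ratings_v)
    if not common_items:
        return 0.0

    dot = sum(ratings_u[i] * ratings_v[i] for i in common_items)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common_items))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common_items))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented example users
alice = {1: 3.0, 2: 4.0, 3: 1.0}
bob = {1: 4.0, 2: 3.0}
carol = {4: 5.0}

sim_alice_bob = cosine_similarity(alice, bob)      # items 1 and 2 are shared
sim_alice_carol = cosine_similarity(alice, carol)  # no shared items
```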

#### About Dataset

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. This notebook uses `movies.csv`, `ratings.csv` and `links.csv`.

This and other GroupLens data sets are publicly available for download at <http://grouplens.org/datasets/>.

### Building the recommender system

Import necessary libraries

In [1]:
from surprise import Dataset, Reader, SVD, KNNBasic
from surprise.model_selection import train_test_split
from surprise import accuracy
import pandas as pd
import pickle

Load movies dataframe `movies.csv`

In [2]:
movies_df = pd.read_csv('movies.csv')
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


Load user's ratings dataframe `ratings.csv`

In [3]:
ratings_df = pd.read_csv('ratings.csv')
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


Load the ratings dataset into a Surprise Dataset

In [4]:
# MovieLens ratings run from 0.5 to 5.0 in half-star steps
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(ratings_df[['userId', 'movieId', 'rating']], reader)

Split the dataset into train and test sets

In [5]:
trainset, testset = train_test_split(data, test_size=0.2)

#### Using SVD algorithm

In [6]:
# Instantiate
SVD_algor = SVD()

# Train the model
SVD_algor.fit(trainset)

# Make predictions for testset
predictions = SVD_algor.test(testset)

# Evaluate the model
svd_rmse = accuracy.rmse(predictions)

RMSE: 0.8770
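
For reference, RMSE is the square root of the mean squared difference between true and predicted ratings, so lower values are better. A minimal plain-Python sketch of the metric is shown here; the function name `rmse` and the sample numbers are illustrative and this is not Surprise's own implementation.

```python
import math


def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equal-length rating lists."""
    if not true_ratings or len(true_ratings) != len(predicted_ratings):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (true - predicted) ** 2
        for true, predicted in zip(true_ratings, predicted_ratings)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 1.0 and 2.0 give sqrt((1 + 4) / 2)
example_rmse = rmse([4.0, 3.0], [3.0, 5.0])
```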


#### Using KNNBasic algorithm

In [7]:
# Instantiate
KNN_algor = KNNBasic(sim_options={'user_based': True})  # User-based collaborative filtering

# Train the model
KNN_algor.fit(trainset)

# Make predictions for testset
predictions = KNN_algor.test(testset)

# Evaluate the model
knn_rmse = accuracy.rmse(predictions)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9517
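
The log above shows that `KNNBasic` used MSD (mean squared difference) similarity. As a rough sketch of the idea, and not Surprise's actual code, the similarity between two users can be taken as `1 / (msd + 1)` over their co-rated items, and a rating can be predicted as the similarity-weighted average of the ratings given by the `k` most similar users who rated the item. The function names and toy data below are assumptions made for illustration.

```python
def msd_similarity(ratings_u, ratings_v):
    """MSD-based similarity: 1 / (mean squared difference + 1) on co-rated items."""
    common_items = set(ratings_u) & set(ratings_v)
    if not common_items:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common_items) / len(common_items)
    return 1.0 / (msd + 1.0)


def predict_rating(target_user, item, all_ratings, k=40):
    """Similarity-weighted average over the k nearest users who rated `item`.

    `all_ratings` maps user id -> {item id: rating}. Returns None when no
    neighbour with positive similarity has rated the item.
    """
    neighbours = []
    for other_user, other_ratings in all_ratings.items():
        if other_user == target_user or item not in other_ratings:
            continue
        sim = msd_similarity(all_ratings[target_user], other_ratings)
        if sim > 0:
            neighbours.append((sim, other_ratings[item]))

    # Keep the k most similar neighbours
    top_k = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    if not top_k:
        return None
    weighted_sum = sum(sim * rating for sim, rating in top_k)
    return weighted_sum / sum(sim for sim, _ in top_k)


# Invented toy data: user 'a' has not rated item 3
toy = {
    'a': {1: 4.0, 2: 2.0},
    'b': {1: 4.0, 2: 2.0, 3: 5.0},
    'c': {1: 2.0, 2: 4.0, 3: 1.0},
}
predicted = predict_rating('a', 3, toy)
```

In this toy data user `b` agrees with `a` exactly (similarity 1.0) while `c` disagrees (similarity 0.2), so the prediction is pulled strongly towards `b`'s rating of 5.0.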


#### Results

In [8]:
results = {
    "SVD": [svd_rmse],
    "KNN": [knn_rmse],
}
# Convert the dictionary to a DataFrame
results_df = pd.DataFrame(results).T
results_df = results_df.rename(columns={0:"RMSE"})
results_df.sort_values(by='RMSE', ascending=False)

Unnamed: 0,RMSE
KNN,0.951742
SVD,0.87702


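From the results, we can see that the SVD algorithm performed best, with an RMSE of 0.8770 against 0.9517 for KNN (a lower RMSE means more accurate predictions). The KNN model is still the one saved below, since this project uses user-based collaborative filtering.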

#### Save the model

In [9]:
# Save the KNN model to a file
file_name = 'knn_model.pkl'
with open(file_name, 'wb') as file:
    pickle.dump(KNN_algor, file)

### Scrape image URLs and build movie page URLs from `links.csv`

In [None]:
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

import warnings
warnings.filterwarnings('ignore')

##### Load links dataframe `links.csv`

In [None]:
# Load the links data
link_df = pd.read_csv('links.csv')
link_df.head()

Check the dataframe's column types and non-null counts

In [None]:
link_df.info()

In [None]:
# Create empty columns (object dtype, so they can hold URL strings)
link_df['url'] = None
link_df['img_url'] = None

# Placeholder used when a movie page has no image
fallback_image_url = 'https://www.firstcolonyfoundation.org/wp-content/uploads/2022/01/no-photo-available.jpeg'

for idx, tmdbId in tqdm(enumerate(link_df['tmdbId']), total=len(link_df)):
    try:
        # Build the movie page URL from the TMDB id
        url = 'https://www.themoviedb.org/movie/' + str(tmdbId)
        link_df.loc[idx, 'url'] = url

        # Fetch the page
        response = requests.get(url)

        # Parse the HTML of the response with BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find the <img> tag inside the image container
        obj = soup.find('div', 'image_content backdrop').img

        # Build the full image URL
        image_url = 'https://www.themoviedb.org' + obj.get('data-src')

        # Store the image URL (.loc avoids chained assignment)
        link_df.loc[idx, 'img_url'] = image_url
    except AttributeError:
        # No image container or no <img> inside it: use the placeholder
        link_df.loc[idx, 'img_url'] = fallback_image_url
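
Because `tmdbId` is read as a float column, `str(tmdbId)` produces URLs ending in `.0` (for example `.../movie/862.0`), and a missing id becomes the string `nan`. A small helper along the following lines could build cleaner URLs. This is only a sketch: `build_tmdb_url` and `TMDB_MOVIE_BASE` are hypothetical names, not part of the notebook.

```python
import math

# Base URL taken from the scraping loop
TMDB_MOVIE_BASE = 'https://www.themoviedb.org/movie/'


def build_tmdb_url(tmdb_id):
    """Return the movie page URL for a TMDB id, or None when the id is missing."""
    if tmdb_id is None:
        return None
    if isinstance(tmdb_id, float) and math.isnan(tmdb_id):
        return None
    # int() drops the trailing ".0" that float ids would otherwise leave in the URL
    return TMDB_MOVIE_BASE + str(int(tmdb_id))


toy_story_url = build_tmdb_url(862.0)
missing_url = build_tmdb_url(float('nan'))
```

Rows for which the helper returns `None` could then be skipped before making a request.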

##### Save link_df

In [None]:
# Save link_df
link_df.to_csv('link_df.csv', index=False)

### Make recommendations for a specific user

Import necessary libraries

In [10]:
import pandas as pd
import pickle

Load movies dataframe `movies.csv`

In [11]:
movies_df = pd.read_csv('movies.csv')

Load user's ratings dataframe `ratings.csv`

In [12]:
ratings_df = pd.read_csv('ratings.csv')
ratings_df.head(3)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224


Load links dataframe `link_df.csv`

In [13]:
links_df = pd.read_csv('link_df.csv')
links_df.head(3)

Unnamed: 0,movieId,imdbId,tmdbId,url,img_url
0,1,114709,862.0,https://www.themoviedb.org/movie/862.0,https://www.themoviedb.org/t/p/w300_and_h450_b...
1,2,113497,8844.0,https://www.themoviedb.org/movie/8844.0,https://www.themoviedb.org/t/p/w300_and_h450_b...
2,3,113228,15602.0,https://www.themoviedb.org/movie/15602.0,https://www.themoviedb.org/t/p/w300_and_h450_b...


Merge them on the 'movieId' column

In [14]:
merged_df = pd.merge(movies_df, links_df, on='movieId')
merged_df.head(3)

Unnamed: 0,movieId,title,genres,imdbId,tmdbId,url,img_url
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,114709,862.0,https://www.themoviedb.org/movie/862.0,https://www.themoviedb.org/t/p/w300_and_h450_b...
1,2,Jumanji (1995),Adventure|Children|Fantasy,113497,8844.0,https://www.themoviedb.org/movie/8844.0,https://www.themoviedb.org/t/p/w300_and_h450_b...
2,3,Grumpier Old Men (1995),Comedy|Romance,113228,15602.0,https://www.themoviedb.org/movie/15602.0,https://www.themoviedb.org/t/p/w300_and_h450_b...


In [15]:
# Save merged_df
merged_df.to_csv('merged_df.csv', index=False)

Load the KNN model

In [16]:
# Load the saved model from the file
file_name = 'knn_model.pkl'
with open(file_name, 'rb') as file:
    loaded_model = pickle.load(file)

Example of using the loaded model for prediction: estimate a rating for every movie the user has not rated yet, then keep the highest-scoring ones

In [17]:
user_id = 2
user_movies = ratings_df[ratings_df['userId'] == user_id]['movieId'].unique()
unrated_movies = merged_df[~merged_df['movieId'].isin(user_movies)].copy()

# Predict a rating for each unrated movie.
# predict() takes raw user and movie ids; get_neighbors() would instead return
# inner ids of neighbouring users for a user-based model, not movie ids.
unrated_movies['predicted_rating'] = [
    loaded_model.predict(user_id, movie_id).est
    for movie_id in unrated_movies['movieId']
]

# Get top-10 movie recommendations for the user
k = 10
recommended_movies = unrated_movies.nlargest(k, 'predicted_rating')

recommended_movies


# Author

## [Emuejevoke Eshemitan](https://www.linkedin.com/in/emuejevoke-eshemitan/)
