<a href="https://colab.research.google.com/github/Username1234jj/Movie-recommendation-engine/blob/main/Movie_Recommendation_Engine_Collaborative_Filter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
rounakbanik_the_movies_dataset_path = kagglehub.dataset_download('rounakbanik/the-movies-dataset')

print('Data source import complete.')


**What is a Recommendation System?**

Simply put, a recommendation system is a filtering program whose main goal is to predict the “rating” or “preference” a user would give to a domain-specific item. In our case, this item is a movie, so the job of our recommendation system is to filter and predict only those movies a user would prefer, given some data about that user.

**What are the different filtration strategies?**

![](https://editor.analyticsvidhya.com/uploads/88506recommendation%20system.png)

**Collaborative Filtering**

This filtering strategy compares and contrasts a user’s behavior with the behavior of the other users in the database. The history of all users plays an important role in this algorithm. The main difference between content-based filtering and collaborative filtering is that in the latter, the interactions of all users with the items influence the recommendations, while content-based filtering takes only the target user’s data into account.
There are multiple ways to implement collaborative filtering, but the key concept is that the outcome of a recommendation is influenced by many users’ data rather than by the data of a single user.
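To make “many users’ data influences the outcome” concrete, here is a minimal NumPy sketch of item-based collaborative filtering on a made-up 4 × 4 ratings matrix. The similarity-weighted average in the hypothetical `predict` helper is a common textbook formulation chosen for illustration; the notebook below does not predict ratings this way, it retrieves the nearest-neighbour movies instead.

```python
import numpy as np

# Made-up ratings: rows are users, columns are movies, 0 means "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    # Cosine similarity between two rating columns (unrated entries count as 0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, movie):
    # Weight the user's own ratings of other movies by how similar those movies'
    # rating columns are to the target movie's column across ALL users
    rated = [m for m in range(R.shape[1]) if m != movie and R[user, m] > 0]
    sims = np.array([cosine_sim(R[:, movie], R[:, m]) for m in rated])
    return float(sims @ R[user, rated] / sims.sum())

print(round(predict(R, user=0, movie=2), 2))  # 2.09
```

User 0 likes movies 0 and 1 but disliked movie 3; other users rated movie 2 much like movie 3, so its predicted rating for user 0 comes out low.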

**Let’s start coding up our own Movie recommendation system**

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
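# 'seaborn-bright' was renamed 'seaborn-v0_8-bright' in Matplotlib 3.6; the old name fails in recent releases
plt.style.use('seaborn-v0_8-bright')
from sklearn.neighbors import NearestNeighbors
from fuzzywuzzy import process  # may need `pip install fuzzywuzzy` outside Kaggle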
%matplotlib inline

In [None]:
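# Load the ratings from the folder downloaded by kagglehub; the timestamp is not needed
ratings_data = pd.read_csv(os.path.join(rounakbanik_the_movies_dataset_path, "ratings_small.csv"))
ratings_data = ratings_data.drop(columns='timestamp')
ratings_data.head()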

In [None]:
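# movies_metadata.csv has mixed-type columns, so disable chunked type inference
movie_names = pd.read_csv(os.path.join(rounakbanik_the_movies_dataset_path, "movies_metadata.csv"), low_memory=False)
# Keep the TMDB 'id' as well: it is the key that links the metadata to the ratings
movie_names = movie_names[['id', 'title', 'genres']]
movie_names.head()

In [None]:
movie_names.info()

In [None]:
# ratings_small.csv uses MovieLens movieIds, movies_metadata.csv uses TMDB ids, and
# pd.concat(axis=1) would pair rows by position. Map the ids via links_small.csv, then merge.
links = pd.read_csv(os.path.join(rounakbanik_the_movies_dataset_path, "links_small.csv"))
links = links.dropna(subset=['tmdbId']).astype({'tmdbId': int})
movie_names = (movie_names.assign(id=pd.to_numeric(movie_names['id'], errors='coerce'))  # a few ids are malformed
               .dropna(subset=['id']).astype({'id': int}).drop_duplicates(subset='id'))
movie_names = movie_names.merge(links[['movieId', 'tmdbId']], left_on='id', right_on='tmdbId')
movie_data = ratings_data.merge(movie_names[['movieId', 'title', 'genres']], on='movieId')
movie_data.head()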
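`pd.concat(..., axis=1)` aligns rows by their position, not by `movieId`, so it silently attaches titles to the wrong ratings. A toy comparison with made-up data shows the difference from a key-based `merge`:

```python
import pandas as pd

# Made-up ratings and metadata; the metadata rows are in a different order
ratings = pd.DataFrame({'userId': [1, 1, 2], 'movieId': [10, 20, 10], 'rating': [4.0, 3.0, 5.0]})
movies = pd.DataFrame({'movieId': [20, 10], 'title': ['Movie B', 'Movie A']})

# concat(axis=1) glues rows together by position: titles land on the wrong ratings
by_position = pd.concat([ratings, movies.drop(columns='movieId')], axis=1)

# merge joins on the shared key, so every rating gets its own movie's title
by_key = ratings.merge(movies, on='movieId')

print(by_position)
print(by_key)
```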

In [None]:
# Mean rating and number of ratings for each title
trend = movie_data.groupby('title')['rating'].agg(['mean', 'count'])
trend.columns = ['rating', 'total number of ratings']

trend.head()

In [None]:
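# Number of movies in each average-rating bucket (mean rating rounded to the nearest integer)
plt.figure(figsize=(10, 4))
movies_per_rating = trend['rating'].round().value_counts().sort_index()
plt.barh(movies_per_rating.index, movies_per_rating.values, color='b')
plt.xlabel('Number of movies')
plt.ylabel('Average rating (rounded)')
plt.show()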

In [None]:
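# Highest mean ratings among movies with at least 50 ratings (an arbitrary cut-off);
# without it, the top of the list is dominated by movies rated only once or twice
title_stats = movie_data.groupby('title')['rating'].agg(['mean', 'count'])
title_stats[title_stats['count'] >= 50]['mean'].sort_values(ascending=False).head(10)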

**Pivoting**

Pivot Table: A pivot table is a table of statistics that summarizes the data of a more extensive table, for example with sums or averages grouped in a meaningful way. Here we use pandas' `pivot` to reshape the long list of (userId, movieId, rating) rows into a matrix with one cell per user-movie pair, filling the cells of unrated movies with 0.

![](https://pandas.pydata.org/pandas-docs/stable/_images/reshaping_pivot.png)
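A minimal sketch of the long-to-wide reshape on made-up ratings, with one row per movie and one column per user as item-to-item filtering needs:

```python
import pandas as pd

# Made-up ratings in "long" format: one row per (user, movie) rating
ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3],
    'movieId': [10, 20, 10, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# "Wide" format: one row per movie, one column per user;
# missing ratings become NaN, which fillna(0) turns into 0
matrix = ratings.pivot(index='movieId', columns='userId', values='rating').fillna(0)
print(matrix)
```

`pivot` requires each (movieId, userId) pair to appear only once and raises a `ValueError` otherwise; `pivot_table` would aggregate duplicates instead.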

In [None]:
# Item-user matrix for item-to-item filtering: one row per movie, one column per user, 0 = not rated
movies_users = ratings_data.pivot(index='movieId', columns='userId', values='rating').fillna(0)
movies_users

**Removing sparsity**

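Our `movies_users` matrix covers 9,066 movies and 671 users, about 6.1 million cells, but `ratings_small.csv` holds only about 100,000 ratings, so roughly 98% of the entries are zeros. We are using only a small dataset; for the full MovieLens dataset, which has more than 100,000 users (features), a dense matrix may exhaust the available memory when it is fed to the model. To store the matrix efficiently we use the `csr_matrix` class from the scipy library, which keeps only the non-zero entries. This does not change the sparsity itself; it avoids spending memory and computation on the zeros.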
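A small sketch of what compressed sparse row (CSR) storage actually keeps, using a made-up 4 × 4 matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small, mostly-zero matrix of made-up ratings
dense = np.array([
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 1.0],
])
sparse = csr_matrix(dense)

# CSR keeps only the non-zero values plus two index arrays:
#   data    -> the non-zero values, row by row
#   indices -> the column of each stored value
#   indptr  -> where each row starts and ends inside data/indices
print(sparse.nnz)      # 4
print(sparse.data)     # [4. 5. 3. 1.]
print(sparse.indices)  # [0 2 0 3]
print(sparse.indptr)   # [0 1 2 2 4]

# The zeros still exist logically: converting back restores the original matrix
assert (sparse.toarray() == dense).all()
```

For the ratings matrix this means storing about 100,000 values instead of roughly 6.1 million.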

In [None]:
from scipy.sparse import csr_matrix
mat_movies_users=csr_matrix(movies_users.values)
mat_movies_users

In [None]:
# Cosine distance (1 - cosine similarity), computed by brute force against every row
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
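With `metric='cosine'`, scikit-learn measures cosine *distance*, i.e. 1 minus cosine similarity, so 0 means "same rating pattern". The NumPy sketch below shows what a brute-force cosine neighbour search computes; the `cosine_knn` helper is hypothetical and is not scikit-learn's implementation.

```python
import numpy as np

def cosine_knn(X, query_row, k):
    """Brute-force k nearest neighbours of X[query_row] under cosine distance."""
    # Normalise every row to unit length (all-zero rows would divide by zero)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity
    distances = 1.0 - unit @ unit[query_row]
    # Indices of the k smallest distances, nearest first
    order = np.argsort(distances, kind='stable')[:k]
    return order, distances[order]

# Made-up movie x user rating matrix
X = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 1.0, 5.0],
    [10.0, 8.0, 0.0],   # same direction as row 0, twice the magnitude
])
idx, dist = cosine_knn(X, query_row=0, k=3)
print(idx, dist.round(3))
```

Rows 0 and 3 both sit at distance (almost exactly) 0 because cosine ignores magnitude, so the query row is not guaranteed to come back first; code that drops "the first neighbour" to skip the query movie can be fooled by such ties.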

In [None]:
model_knn.fit(mat_movies_users)

In [None]:
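def Recommender(movie_name, data, model, n_recommendations):
    # Assumes the rows of `data` follow movies_users.index (one row per movieId)
    model.fit(data)
    # Title for every matrix row, looked up via its movieId
    titles_by_id = movie_data.drop_duplicates('movieId').set_index('movieId')['title']
    row_titles = pd.Series(movies_users.index).map(titles_by_id)
    # With a Series as choices, extractOne returns (title, score, row position)
    movie_index = process.extractOne(movie_name, row_titles.dropna())[2]
    print('Movie Selected: ', row_titles[movie_index], ', Index: ', movie_index)
    print('Searching for recommendations.....')
    # Ask for one extra neighbour because the query movie itself is among the results
    distances, indices = model.kneighbors(data[movie_index], n_neighbors=n_recommendations + 1)
    # kneighbors returns neighbours nearest-first; drop the query movie and keep the top n
    recc_movie_indices = [(i, d) for i, d in zip(indices.squeeze().tolist(), distances.squeeze().tolist())
                          if i != movie_index][:n_recommendations]
    recommend_frame = [{'Title': row_titles[i], 'Distance': d} for i, d in recc_movie_indices]

    return pd.DataFrame(recommend_frame, index=range(1, len(recommend_frame) + 1))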

In [None]:
n_recommendations = 20
Recommender('Jumanji', mat_movies_users, model_knn, n_recommendations)
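Two steps of the recommender are easy to get wrong: ordering the neighbours and finding the query title. The sketch below uses made-up neighbour output; `top_n_excluding_self` is a hypothetical helper, and the title lookup swaps in the standard library's `difflib.get_close_matches` as a stand-in for fuzzywuzzy, so its scores differ from fuzzywuzzy's.

```python
import difflib

def top_n_excluding_self(query_index, indices, distances, n):
    # kneighbors-style output is already sorted nearest-first; drop the query
    # itself (wherever it landed, in case of ties) and keep the next n rows
    pairs = [(i, d) for i, d in zip(indices, distances) if i != query_index]
    return pairs[:n]

# Made-up neighbour output for query row 7
indices = [7, 3, 9, 1]
distances = [0.0, 0.12, 0.30, 0.45]
print(top_n_excluding_self(7, indices, distances, n=2))  # [(3, 0.12), (9, 0.3)]

# A slice such as [:0:-1] does drop the first element, but it also reverses
# the order, so the least similar movie would be listed first
print(sorted(zip(indices, distances), key=lambda x: x[1])[:0:-1])  # [(1, 0.45), (9, 0.3), (3, 0.12)]

# Approximate title lookup with the standard library
titles = ['Jumanji', 'Toy Story', 'Heat']
print(difflib.get_close_matches('Jumanj', titles, n=1, cutoff=0.5))  # ['Jumanji']
```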

**Summary**

The above recommendation system uses an item-to-item collaborative filtering approach. It is one of the simplest implementations of a recommendation system and needs a lot of tuning. Also, the system's first recommendations tend to be popular movies that have been rated by more users, so this is another area where a change can be made.