<a href="https://colab.research.google.com/github/alexoliveros92/Recommendation_engine/blob/main/CA2_DM_recommendationEngine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer

## Movie recommendations based on ratings and descriptions

In [2]:
netFlxDts = pd.read_csv("/content/NetflixCleanedDataset.csv")
netFlxDts.head(10)

Unnamed: 0,movie_ID,title,year,certificate,duration_min,genre,Tag,rating,description,stars,votes
0,1,Cobra Kai,2018,TV-14,30,Action,"Action, Comedy, Drama",8.5,Decades after their 1984 All Valley Karate Tou...,"['Ralph Macchio','William Zabka','Courtney Hen...",177031
1,2,The Crown,2016,TV-MA,58,Biography,"Biography, Drama, History",8.7,Follows the political rivalries and romance of...,"['Claire Foy','Olivia Colman','Imelda Staunton...",199885
2,3,Better Call Saul,2022,TV-MA,46,Crime,"Crime, Drama",8.9,The trials and tribulations of criminal lawyer...,"['Bob Odenkirk','Rhea Seehorn','Jonathan Banks...",501384
3,4,Devil in Ohio,2022,TV-MA,356,Drama,"Drama, Horror, Mystery",5.9,When a psychiatrist shelters a mysterious cult...,"['Emily Deschanel','Sam Jaeger','Gerardo Celas...",9773
4,5,Cyberpunk: Edgerunners,2022,TV-MA,24,Animation,"Animation, Action, Adventure",8.6,A Street Kid trying to survive in a technology...,"['Zach Aguilar','Kenichiro Ohashi','Emi Lo','A...",15413
5,6,The Sandman,2022,TV-MA,45,Drama,"Drama, Fantasy, Horror",7.8,Upon escaping after decades of imprisonment by...,"['Tom Sturridge','Boyd Holbrook','Patton Oswal...",116358
6,7,Rick and Morty,2013,TV-MA,23,Animation,"Animation, Adventure, Comedy",9.2,An animated series that follows the exploits o...,"['Justin Roiland','Chris Parnell','Spencer Gra...",502160
7,8,Breaking Bad,2013,TV-MA,49,Crime,"Crime, Drama, Thriller",9.5,A high school chemistry teacher diagnosed with...,"['Bryan Cranston','Aaron Paul','Anna Gunn','Be...",1831340
8,9,The Imperfects,2022,TV-MA,45,Action,"Action, Adventure, Drama",6.3,After an experimental gene therapy turns them ...,"['Morgan Taylor Campbell','Italia Ricci','Rhia...",3123
9,10,Blonde,2022,NC-17,166,Biography,"Biography, Drama, Mystery",6.2,A fictionalized chronicle of the inner life of...,"['Andrew Dominik','Ana de Armas','Lucy DeVito'...",935


In [3]:
# Drop rows with missing values in the 'description' and 'rating' columns
netFlxDts = netFlxDts.dropna(subset=['description', 'rating'])

In [4]:
# Create a TF-IDF vectorizer to convert movie descriptions to vectors
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(netFlxDts['description'])

- TfidfVectorizer weighs how often each word appears in a particular document against how common that word is across all documents. In our code, it is applied to the description column.

- fit_transform method converts the movie descriptions in the DataFrame into a TF-IDF matrix. Each row of the matrix represents a movie, and each column represents a unique word in the descriptions.
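As a minimal sketch of the row-per-movie, column-per-word layout described above, the snippet below runs the same vectorizer on three made-up descriptions. The names toy_descriptions, toy_tfidf and toy_matrix are hypothetical and not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for movie descriptions
toy_descriptions = [
    "A chemistry teacher turns to crime",
    "A lawyer turns to crime",
    "An animated space adventure",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_descriptions)

# One row per description, one column per distinct term kept by the vectorizer
print(toy_matrix.shape)

# The learned vocabulary; English stop words such as "to" are absent
print(sorted(toy_tfidf.vocabulary_))
```

The result is a sparse matrix, which is why it can be passed directly to the pairwise similarity functions in scikit-learn.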

In [5]:
## Remove duplicate values based on titles
netFlxDts = netFlxDts.drop_duplicates(subset=['title'])
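Cell 4 builds tfidf_matrix before this de-duplication, and dropna keeps the original index labels, so the rows of the matrix and the rows of the DataFrame can fall out of step. One possible ordering, sketched here on a hypothetical miniature dataset (toy, clean and toy_matrix are illustrative names), is to finish cleaning, reset the index, and only then vectorize.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical miniature dataset: one repeated title, one missing description
toy = pd.DataFrame({
    "title": ["A", "B", "B", "C", "D"],
    "description": [
        "crime drama",
        "space adventure",
        "space adventure sequel",
        None,
        "crime thriller",
    ],
    "rating": [8.0, 7.0, 7.5, 6.0, 9.0],
})

# Clean first, then renumber rows 0..n-1 so index labels equal row positions
clean = (
    toy.dropna(subset=["description", "rating"])
       .drop_duplicates(subset=["title"])
       .reset_index(drop=True)
)

# Vectorize only after cleaning, so matrix row i describes clean.iloc[i]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(clean["description"])

print(list(clean.index), toy_matrix.shape[0])
```

With this ordering, an index label taken from the DataFrame can be used directly as a row number in the similarity matrix.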

In [6]:
# Compute the cosine similarity matrix based on movie descriptions
cosine_sim = cosine_similarity(tfidf_matrix)

- Cosine similarity is used to measure how similar the TF-IDF vectors of two descriptions are. Movie recommendations can then be drawn from the descriptions that score highest.

- The cosine similarity between two vectors is the cosine of the angle between them in a multidimensional space. In general it ranges from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal. TF-IDF weights are never negative, so the scores here fall between 0 (no terms in common) and 1 (same direction).
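To make the definition concrete, here is a small sketch on hand-written vectors (hypothetical values, not taken from the dataset). It compares the scikit-learn result with the dot product divided by the product of the vector norms.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical non-negative vectors standing in for TF-IDF rows
vectors = np.array([
    [1.0, 1.0, 0.0],   # shares terms with row 1
    [2.0, 2.0, 0.0],   # same direction as row 0, different length
    [0.0, 0.0, 3.0],   # no terms in common with rows 0 and 1
])

# Pairwise similarity between every pair of rows (3 x 3 matrix)
sim = cosine_similarity(vectors)

# Manual check for rows 0 and 1: dot product over the product of norms
manual = vectors[0] @ vectors[1] / (
    np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1])
)

print(np.round(sim, 3))
print(round(float(manual), 3))
```

Rows 0 and 1 point in the same direction, so their score is 1 regardless of length, while row 2 shares no non-zero component with them and scores 0.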

In [7]:

# Create a Series that maps each movie title to its row index in the DataFrame
indices = pd.Series(netFlxDts.index, index=netFlxDts['title']).drop_duplicates()


This line builds a reverse mapping from movie titles to row indices. The resulting indices Series has movie titles as its index and the DataFrame's index labels as its values, so the label of a movie can be looked up from its title. Note that get_recommendations below assigns its own local variable named indices, so the function does not read this Series.
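The lookup can be sketched on a hypothetical three-row frame whose labels have gaps, as they would after rows are dropped without resetting the index. The names toy and title_to_label are illustrative only.

```python
import pandas as pd

# Hypothetical miniature frame; the labels 0, 2, 5 mimic gaps left by dropna
toy = pd.DataFrame(
    {"title": ["Cobra Kai", "The Crown", "Breaking Bad"]},
    index=[0, 2, 5],
)

# Titles become the Series index, DataFrame labels become the values
title_to_label = pd.Series(toy.index, index=toy["title"])

label = title_to_label["Breaking Bad"]   # index label stored in the frame
position = toy.index.get_loc(label)      # 0-based row position in the frame

print(label, position)
```

The label and the position differ here, which matters because a similarity matrix built from the descriptions is addressed by position, not by label.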

In [8]:
def get_recommendations(title):
    # Get all the indices of the movies that match the title
    indices = netFlxDts.index[netFlxDts['title'] == title].tolist()

    # Get the pairwise similarity scores for all movies with those indices
    sim_scores = []
    for idx in indices:
        sim_scores += list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the indices of the top 10 most similar movies
    movie_indices = [i[0] for i in sim_scores[1:11] if i[0] < len(netFlxDts)]

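    # Return the 10 highest-rated titles in the whole dataset, ties broken by title
    # (note: movie_indices computed above is not used in this result)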
    return netFlxDts.sort_values(['rating', 'title'], ascending=[False, True]).head(10)

In [9]:
recommended_movies = get_recommendations('Breaking Bad')
recommended_movies

Unnamed: 0,movie_ID,title,year,certificate,duration_min,genre,Tag,rating,description,stars,votes
17,18,1899,2022,Unrated,60,Drama,"Drama, History, Horror",9.6,Multinational immigrants traveling from the ol...,"['Ben Ashenden','Aneurin Barnard','Emily Beech...",853
7,8,Breaking Bad,2013,TV-MA,49,Crime,"Crime, Drama, Thriller",9.5,A high school chemistry teacher diagnosed with...,"['Bryan Cranston','Aaron Paul','Anna Gunn','Be...",1831340
3220,3230,Elesin Oba: The King's Horseman,2022,Unrated,96,Adventure,"Adventure, Drama, History",9.4,"Inspired by true life events, in the Oyo Empir...","['Biyi Bandele','Odunlade Adekola','Shaffy Bel...",72
4638,4664,Story Time Book: Read-Along,2022,Unrated,70,Animation,Animation,9.4,Kids can read along with illustrated books tha...,"['Lileina Joy','Maya Aoki Tuttle','Emily Wold'...",16
192,194,Avatar: The Last Airbender,2008,TV-Y7-FV,23,Animation,"Animation, Action, Adventure",9.3,"In a war-torn world of elemental magic, a youn...","['Dee Bradley Baker','Zach Tyler Eisen','Mae W...",309241
781,784,Cosmos: A Spacetime Odyssey,2014,TV-PG,557,Documentary,Documentary,9.3,An exploration of our discovery of the laws of...,"['Neil deGrasse Tyson','Christopher Emerson','...",121400
1388,1392,Our Planet,2019,TV-G,403,Documentary,Documentary,9.3,Documentary series focusing on the breadth of ...,['David Attenborough'],43175
1193,1197,Reply 1988,2016,Unrated,90,Comedy,"Comedy, Drama, Family",9.2,Follows the lives of 5 families living on the ...,"['Hyeri Lee','Go Kyung-Pyo','Ryu Jun-Yeol','Pa...",7286
6,7,Rick and Morty,2013,TV-MA,23,Animation,"Animation, Adventure, Comedy",9.2,An animated series that follows the exploits o...,"['Justin Roiland','Chris Parnell','Spencer Gra...",502160
4340,4364,CM101MMXI Fundamentals,2013,Unrated,139,Documentary,"Documentary, Comedy",9.1,The funny little details of everyday life; the...,"['Murat Dündar','Cem Yilmaz']",46282
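
The table above is ordered purely by rating across the full dataset, because get_recommendations returns a sort of the whole DataFrame rather than the rows selected through the similarity scores. A minimal sketch of one way to combine both signals is shown below: pick the nearest neighbours by description first, then rank only those by rating. The function name recommend_by_description, its parameters, and the toy data are assumptions made for illustration, and the sketch assumes a DataFrame whose index runs 0..n-1 in step with the similarity matrix.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend_by_description(frame, sim_matrix, title, top_n=10):
    """Return the top_n rows most similar to `title`, ordered by rating.

    Assumes `frame` has a 0..n-1 index aligned with the rows of `sim_matrix`.
    """
    matches = frame.index[frame["title"] == title]
    if len(matches) == 0:
        raise KeyError(f"title not found: {title!r}")
    idx = matches[0]

    # Pair every row position with its similarity to the query title
    scores = list(enumerate(sim_matrix[idx]))
    # Drop the query itself, then keep the highest-scoring neighbours
    scores = [s for s in scores if s[0] != idx]
    scores = sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]
    positions = [i for i, _ in scores]

    # Only the similar titles are ranked by rating, not the whole dataset
    return frame.iloc[positions].sort_values(
        ["rating", "title"], ascending=[False, True]
    )


# Hypothetical miniature dataset for illustration
toy = pd.DataFrame({
    "title": ["Crime One", "Crime Two", "Space Show", "Crime Three"],
    "description": [
        "teacher becomes crime boss",
        "lawyer defends crime boss",
        "robots explore distant planets",
        "detective hunts crime boss",
    ],
    "rating": [9.5, 8.9, 7.0, 8.0],
})
toy_sim = cosine_similarity(
    TfidfVectorizer(stop_words="english").fit_transform(toy["description"])
)

result = recommend_by_description(toy, toy_sim, "Crime One", top_n=2)
print(result["title"].tolist())
```

In this toy run the unrelated space title is excluded even though rating alone would not have ruled it out, which is the behaviour the description-based step is meant to provide.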
