## GENERATION OF MOVIE RECOMMENDATIONS

Students: Vasile-Pătrașcu Mihaela & Pătrașcu Andreea Roxana

#### Step 1: Import libraries

In [None]:
import pandas as pd
import numpy as np

#### Step 2: Load IMDB dataset

In [None]:
df = pd.read_csv('DataSchool_IMDB_Clean.csv', encoding = 'utf-8', index_col=[0])

df.head()

Unnamed: 0,Name,Date,Rate,Votes,Genre,Duration,Type,Certificate,Episodes,Nudity,Violence,Profanity,Alcohol,Frightening
0,No Time to Die,2021,7.6,107163,"Action, Adventure, Thriller",163.0,Film,PG-13,1,Mild,Moderate,Mild,Mild,Moderate
1,The Guilty,2021,6.3,64375,"Crime, Drama, Thriller",90.0,Film,R,1,No Rate,No Rate,Severe,No Rate,Moderate
2,The Many Saints of Newark,2021,6.4,27145,"Crime, Drama",120.0,Film,R,1,Moderate,Severe,Severe,Moderate,Moderate
3,Venom: Let There Be Carnage,2021,6.4,30443,"Action, Adventure, Sci-Fi",97.0,Film,PG-13,1,No Rate,Moderate,Moderate,Mild,Moderate
4,Dune,2021,8.3,84636,"Action, Adventure, Drama",155.0,Film,PG-13,1,No Rate,Moderate,No Rate,Mild,Moderate


In [None]:
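# Votes is read as text; remove the thousands separators and convert to integers
df['Votes'] = pd.to_numeric(df['Votes'].str.replace(',', '', regex=False))

df['Votes'].head()

# df = df[df['Name'] != 'Hostel: Part II']

# Work on an independent copy so that later changes do not modify df
copy_df = df.copy()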

#### Step 3: Have an overview of your data
1. df.head() 
2. df.info() 
3. df.describe() 
4. etc...

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4804 entries, 0 to 5027
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         4804 non-null   object 
 1   Date         4804 non-null   int64  
 2   Rate         4804 non-null   float64
 3   Votes        4804 non-null   int64  
 4   Genre        4804 non-null   object 
 5   Duration     4804 non-null   float64
 6   Type         4804 non-null   object 
 7   Certificate  4804 non-null   object 
 8   Episodes     4804 non-null   int64  
 9   Nudity       4804 non-null   object 
 10  Violence     4804 non-null   object 
 11  Profanity    4804 non-null   object 
 12  Alcohol      4804 non-null   object 
 13  Frightening  4804 non-null   object 
dtypes: float64(2), int64(3), object(9)
memory usage: 563.0+ KB


INSIGHTS: 
- The dataset contains 4804 movies and TV shows.
- It has 14 columns. Several of them are categorical and should be transformed into numerical values: Type, Genre, Certificate and the five content-advisory columns (Nudity, Violence, Profanity, Alcohol, Frightening).
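
One possible way to make the advisory columns numeric is an ordinal mapping, sketched below on a toy frame. The ordering of the levels and the choice to treat `No Rate` as 0 are assumptions (the dataset does not document them), and `severity_order` and `toy` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Assumed ordering of the levels seen in the sample rows; 'No Rate' is
# mapped to 0 here, although it could also be treated as a missing value.
severity_order = {"No Rate": 0, "Mild": 1, "Moderate": 2, "Severe": 3}

# Toy data copied from two of the rows displayed above
toy = pd.DataFrame({
    "Name": ["No Time to Die", "The Guilty"],
    "Violence": ["Moderate", "No Rate"],
    "Profanity": ["Mild", "Severe"],
})

encoded = toy.copy()
for col in ["Violence", "Profanity"]:
    # Series.map yields NaN for any level missing from the dictionary
    encoded[col] = encoded[col].map(severity_order)

print(encoded)
```

Nominal columns such as Type or Certificate have no natural order, so a one-hot encoding would probably suit them better than this mapping.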

# Build a recommender system

A recommender system is a type of information filtering system that predicts user preferences and provides personalized recommendations. It is a system that produces a list of items (such as books, movies, music, etc.) that are most likely to be of interest to a user based on their past behavior and preferences.

- Recommender systems are useful because they narrow down the large amount of data and products available online, allowing users to discover the content most relevant to them.

- Recommender systems use data such as previous purchases, user profiles, and browsing habits to create personalized recommendations that are tailored to each individual user.

- They can help to increase customer engagement, reduce customer churn, and increase sales by providing users with content that is most likely to be of interest or use to them.

The concept of recommender systems can be traced back to the early 1990s, when researchers began to explore ways to filter growing volumes of electronic information. One of the earliest systems, Tapestry, was developed at Xerox PARC in 1992 to filter e-mail and newsgroup messages, and its authors coined the term collaborative filtering. GroupLens followed in 1994, applying automated collaborative filtering to Usenet news articles.

In the late 1990s, Amazon introduced product recommendations based on item-to-item collaborative filtering, which suggests products based on users’ past purchases and ratings. Since then, recommender systems have spread to industries such as retail, media, entertainment, and travel.

Alongside collaborative filtering, researchers developed other methods of personalization, such as content-based filtering and hybrid recommender systems. Content-based filtering uses the attributes of an item to make recommendations, while hybrid systems combine multiple algorithms to make more accurate predictions.

In the 2010s, researchers began to explore new applications of recommender systems, such as social recommendation, personalized search, and context-aware recommendation. Social recommendation systems leverage users’ social networks to make more accurate recommendations. Personalized search systems aim to return search results that are tailored to the user’s individual interests. Context-aware recommendation systems use contextual information, such as the user’s location or the time of day, to make more accurate recommendations.

Today, recommender systems are ubiquitous and are used to recommend products, services, and content to users in a variety of industries.

In [None]:
copy_df.head()

Unnamed: 0,Name,Date,Rate,Votes,Genre,Duration,Type,Certificate,Episodes,Nudity,Violence,Profanity,Alcohol,Frightening
0,No Time to Die,2021,7.6,107163,"Action, Adventure, Thriller",163.0,Film,PG-13,1,Mild,Moderate,Mild,Mild,Moderate
1,The Guilty,2021,6.3,64375,"Crime, Drama, Thriller",90.0,Film,R,1,No Rate,No Rate,Severe,No Rate,Moderate
2,The Many Saints of Newark,2021,6.4,27145,"Crime, Drama",120.0,Film,R,1,Moderate,Severe,Severe,Moderate,Moderate
3,Venom: Let There Be Carnage,2021,6.4,30443,"Action, Adventure, Sci-Fi",97.0,Film,PG-13,1,No Rate,Moderate,Moderate,Mild,Moderate
4,Dune,2021,8.3,84636,"Action, Adventure, Drama",155.0,Film,PG-13,1,No Rate,Moderate,No Rate,Mild,Moderate


In [None]:
# Split the comma-separated Genre string into up to three separate columns
data = pd.concat([copy_df, copy_df["Genre"].str.split(",", expand=True)], axis=1)
data = data.rename(columns={0: 'Genre1', 1: 'Genre2', 2: 'Genre3'})
# data.drop('Genre', axis = 1, inplace = True)
for col in ['Genre1', 'Genre2', 'Genre3']:
    # Strip the space left after each comma; titles with fewer genres get a placeholder
    data[col] = data[col].str.strip().fillna('No Genre')
data['Duration'] = data['Duration'].astype(int)
data.head()

Unnamed: 0,Name,Date,Rate,Votes,Genre,Duration,Type,Certificate,Episodes,Nudity,Violence,Profanity,Alcohol,Frightening,Genre1,Genre2,Genre3
0,No Time to Die,2021,7.6,107163,"Action, Adventure, Thriller",163,Film,PG-13,1,Mild,Moderate,Mild,Mild,Moderate,Action,Adventure,Thriller
1,The Guilty,2021,6.3,64375,"Crime, Drama, Thriller",90,Film,R,1,No Rate,No Rate,Severe,No Rate,Moderate,Crime,Drama,Thriller
2,The Many Saints of Newark,2021,6.4,27145,"Crime, Drama",120,Film,R,1,Moderate,Severe,Severe,Moderate,Moderate,Crime,Drama,No Genre
3,Venom: Let There Be Carnage,2021,6.4,30443,"Action, Adventure, Sci-Fi",97,Film,PG-13,1,No Rate,Moderate,Moderate,Mild,Moderate,Action,Adventure,Sci-Fi
4,Dune,2021,8.3,84636,"Action, Adventure, Drama",155,Film,PG-13,1,No Rate,Moderate,No Rate,Mild,Moderate,Action,Adventure,Drama


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4804 entries, 0 to 5027
Data columns (total 17 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         4804 non-null   object 
 1   Date         4804 non-null   int64  
 2   Rate         4804 non-null   float64
 3   Votes        4804 non-null   int64  
 4   Genre        4804 non-null   object 
 5   Duration     4804 non-null   int64  
 6   Type         4804 non-null   object 
 7   Certificate  4804 non-null   object 
 8   Episodes     4804 non-null   int64  
 9   Nudity       4804 non-null   object 
 10  Violence     4804 non-null   object 
 11  Profanity    4804 non-null   object 
 12  Alcohol      4804 non-null   object 
 13  Frightening  4804 non-null   object 
 14  Genre1       4804 non-null   object 
 15  Genre2       4804 non-null   object 
 16  Genre3       4804 non-null   object 
dtypes: float64(1), int64(4), object(12)
memory usage: 675.6+ KB


In [None]:
# Keep only titles with a rating above 3
data = data[data['Rate']>3]

In [None]:
# Keep only titles that have received more than 100 votes from users
data = data[data['Votes']>100]

In [None]:
data.shape

(4623, 17)

Content-based recommender systems use the characteristics of an item to recommend other items with similar characteristics.

Such a system relies on item metadata (genre, category, description, and so on) and recommends items similar to those a user has liked in the past. In this notebook the only metadata used is the Genre column. This type of recommender system is often used in retail, movies, music, and books.
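
As a minimal sketch of this idea, the snippet below ranks a toy catalogue by genre overlap with one liked title. The `catalogue` dictionary and the helper names are illustrative and not part of the notebook; the similarity used is the cosine between binary genre vectors.

```python
from math import sqrt

# Toy catalogue: title -> set of genres (taken from the rows shown earlier)
catalogue = {
    "No Time to Die": {"Action", "Adventure", "Thriller"},
    "The Guilty": {"Crime", "Drama", "Thriller"},
    "The Many Saints of Newark": {"Crime", "Drama"},
    "Dune": {"Action", "Adventure", "Drama"},
}

def genre_similarity(a, b):
    """Cosine similarity between two genre sets viewed as binary vectors."""
    return len(a & b) / sqrt(len(a) * len(b))

def similar_titles(liked, k=2):
    """Return the k (title, score) pairs closest in genre to the liked title."""
    scores = [
        (title, genre_similarity(catalogue[liked], genres))
        for title, genres in catalogue.items()
        if title != liked  # never recommend the liked title itself
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

top = similar_titles("The Guilty")
print(top)
```

The closest title to "The Guilty" here is "The Many Saints of Newark", since the two share both Crime and Drama.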

In [None]:
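# Reset the index so that row labels match the row positions of the similarity matrix built later
df2 = data[["Name", "Genre", "Rate"]].reset_index(drop=True)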

In [None]:
df2.head()

Unnamed: 0,Name,Genre,Rate
0,No Time to Die,"Action, Adventure, Thriller",7.6
1,The Guilty,"Crime, Drama, Thriller",6.3
2,The Many Saints of Newark,"Crime, Drama",6.4
3,Venom: Let There Be Carnage,"Action, Adventure, Sci-Fi",6.4
4,Dune,"Action, Adventure, Drama",8.3


In [None]:
from sklearn.feature_extraction.text import CountVectorizer
# Bag-of-words encoding of the genre strings: one row per title, one column per genre token
cv = CountVectorizer(max_features=10000, stop_words='english')
vector = cv.fit_transform(df2['Genre']).toarray()

In [None]:
vector.shape

(4623, 30)
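
The 30 columns are vocabulary tokens rather than whole genre names. With its default settings, `CountVectorizer` lowercases the text and keeps runs of two or more word characters, so a hyphenated genre such as Sci-Fi is split into two tokens. The sketch below imitates that behaviour with the standard `re` module; it is an approximation for illustration (stop-word removal is not reproduced), and `tokenize` is a hypothetical helper, not a scikit-learn function.

```python
import re

# scikit-learn's documented default token_pattern for CountVectorizer
TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def tokenize(text):
    """Approximate CountVectorizer's default tokenization (no stop-word removal)."""
    return re.findall(TOKEN_PATTERN, text.lower())

tokens = tokenize("Action, Adventure, Sci-Fi")
print(tokens)  # ['action', 'adventure', 'sci', 'fi']
```

If whole genres are preferred as features, one option is to pass a custom `tokenizer` callable that splits on commas.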

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
# similarity[i][j] is the cosine similarity between the genre vectors of titles i and j
similarity = cosine_similarity(vector)
similarity

array([[1.        , 0.33333333, 0.        , ..., 0.33333333, 0.        ,
        0.57735027],
       [0.33333333, 1.        , 0.81649658, ..., 1.        , 0.33333333,
        0.28867513],
       [0.        , 0.81649658, 1.        , ..., 0.81649658, 0.40824829,
        0.        ],
       ...,
       [0.33333333, 1.        , 0.81649658, ..., 1.        , 0.33333333,
        0.28867513],
       [0.        , 0.33333333, 0.40824829, ..., 0.33333333, 1.        ,
        0.        ],
       [0.57735027, 0.28867513, 0.        , ..., 0.28867513, 0.        ,
        1.        ]])
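
The off-diagonal values can be checked by hand. Assuming rows 0, 1 and 2 of the matrix correspond to the first three titles shown earlier (No Time to Die, The Guilty, The Many Saints of Newark) and that each genre token appears once per title, the vectors are binary and the cosine reduces to the number of shared genres divided by the geometric mean of the genre counts. Here $G_a$ is notation introduced for this check and denotes the set of genres of title $a$:

$$
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{|G_a \cap G_b|}{\sqrt{|G_a| \, |G_b|}}
$$

$$
\cos(\text{No Time to Die}, \text{The Guilty}) = \frac{1}{\sqrt{3 \cdot 3}} \approx 0.3333,
\qquad
\cos(\text{The Guilty}, \text{The Many Saints of Newark}) = \frac{2}{\sqrt{3 \cdot 2}} \approx 0.8165
$$

Both values agree with entries [0][1] and [1][2] of the array, and entry [0][2] is 0 because those two titles share no genre.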

In [None]:
# !pip3 install fuzzywuzzy
from fuzzywuzzy import process

In [None]:
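def recommend(movie):
    # Fuzzy-match the query against the titles; extractOne returns (match, score, index label)
    label = process.extractOne(movie, df2['Name'])[2]
    # Convert the label to a row position, because similarity is indexed by position
    index = df2.index.get_loc(label)
    distances = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])
    # Print the six most similar titles, skipping the matched title itself
    for i, _ in [d for d in distances if d[0] != index][:6]:
        print(df2.iloc[i].Name)


recommend('Brooklin Nine-Nine')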

This Means War
Mr. Right
Friends
The Big Bang Theory
How I Met Your Mother
The Duff
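
A fuzzy lookup lets a misspelled query such as 'Brooklin Nine-Nine' still resolve to an intended title. As a dependency-free sketch of the same idea, the standard library's `difflib` can stand in for fuzzywuzzy. The `titles` list, the `resolve_title` helper and the 0.6 cutoff are illustrative assumptions, and the scores will not match those computed by fuzzywuzzy.

```python
from difflib import get_close_matches

# Illustrative titles; in the notebook the candidates would come from df2['Name']
titles = ["Brooklyn Nine-Nine", "Friends", "The Big Bang Theory", "Dune"]

def resolve_title(query, candidates, cutoff=0.6):
    """Return the closest candidate title, or None if nothing is similar enough."""
    matches = get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(resolve_title("Brooklin Nine-Nine", titles))
print(resolve_title("zzzz", titles))
```

Returning `None` for a query with no close match makes it possible to report an unknown title instead of silently recommending from an unrelated one.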
