## **Movies Recommendation Systems Projects**

### **Definition of Recommendation System:**
- A recommendation system is a type of information filtering system that predicts the preferences or interests of a user and provides recommendations accordingly. 

- These systems are widely used in various domains such as e-commerce, social media, entertainment, and more to enhance user experience and engagement. 

### **Here are some common types of recommendation systems:**

#### **1.Content-Based Filtering:**
- Content-Based Filtering recommends items to users based on the attributes or features of the items and the user's historical preferences.

- It analyzes the content of the items and recommends items with similar content to those the user has liked in the past.

#### **2.Collaborative Filtering:**
- Collaborative Filtering (CF) recommends items based on the preferences of users who have
similar tastes to the target user. 

 - Types of Collaborative Filtering:
    - a.User-Based Collaborative Filtering: 

                ----> Recommends items to a user based on the preferences of similar users.
                
    - b.Item-Based Collaborative Filtering: 

                ----> Recommends items similar to those previously liked or interacted with by the user.

#### **3.Hybrid Recommender Systems:**
- Hybrid Recommender Systems combine multiple recommendation approaches (e.g., collaborative filtering and content-based filtering) to provide more accurate and diverse recommendations.

- They leverage the strengths of different techniques to overcome the limitations of individual methods and improve recommendation quality.

#### **4.Knowledge-Based Recommender Systems:**
- Knowledge-Based Recommender Systems recommend items based on explicit knowledge about user preferences and item characteristics.

#### **5.Context-Aware Recommender Systems:**
- Context-Aware Recommender Systems consider additional contextual information such as time, location, device, and user behavior to provide personalized recommendations.
           
#### **6.Matrix Factorization Methods:**
- These methods are effective in handling sparse and high-dimensional data and are commonly used in collaborative filtering-based recommendation systems.   

#### Let's Start Our Project: "Movies Recommendation System

**Here we uses " 1.Content based filtering " types of Recommendation System for these project.**
#### **Our Project goes like this:**

#### **1.Data ----> 2.preprocessing ----> 3.model ----> 4.website -----> 5.Deployment(Optional)**
#### 1. Data
- Data are the individual pieces of information that are gathered, recorded, or stored for analysis, interpretation, or processing. 
- Data can take various forms, including numbers, text, images, audio, video, and more.

##### **Data can be classified into two main types:**
- **a. Structured Data:**
   - Structured data is organized in a predefined format, typically stored in tables with rows and columns.

   - Examples of structured data include database records, spreadsheets, and CSV files.
             
- **b. Unstructured Data:**
   - Unstructured data lacks a predefined format and organization, making it more challenging to process and analyze.
   - Examples of unstructured data include text documents, images, audio recordings, and video files.
   - Unstructured data may require specialized techniques such as natural language processing (NLP) or computer vision to extract meaningful information.

- **c. Semi-structured Data:**
   - Semi-structured data has some organization but does not conform to the rigid structure of structured data. 
   - Examples of semi-structured data include XML files, JSON documents, and NoSQL databases.

### **Step 1: Load and import necessary libraries**

In [128]:
# Load and import necessary libraries
import numpy as np 
import pandas as pd 

# Load the movie data from a CSV file into a Pandas DataFrame
movies = pd.read_csv("dataset/tmdb_5000_movies.csv")

# Load the credits data from a CSV file into a Pandas DataFrame
credits = pd.read_csv("dataset/tmdb_5000_credits.csv")

In [129]:
# Display the first few rows of the movies DataFrame
movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [130]:
# Display the first few rows of the credits DataFrame
credits.head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [131]:
# Merge the two DataFrames based on the 'title' 
movies = movies.merge(credits, on="title")

In [132]:
# Display the first few rows of the merged DataFrame to ensure the merge was successful
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,206647,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,49026,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,49529,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [133]:
# print the information about datasets
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               

In [134]:
# genres
# id
# keyword
# title
# overview
# cast
# crew

# selects the relevant columns needed for the recommendation system
movies = movies[["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]]

In [135]:
# print first few rows of the merged DataFrame
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


### **Step 2: Preproccesing**
- Preprocessing refers to the steps and techniques applied to raw data before it is used for analysis, modeling, or any other downstream tasks. 

- The goal of preprocessing is to transform raw data into a clean, structured format that is suitable for further analysis or machine learning algorithms. 

- Preprocessing often involves tasks such as cleaning, transforming, and organizing the data

- In the context of recommendation systems, preprocessing typically involves cleaning and transforming the data to handle missing values,outliers, and sparsity, as well as encoding categorical variables, normalizing numerical features, and splitting the data into training and testing sets.

In [136]:
# Handling missing values
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [137]:
# Drop rows with missing values
movies.dropna(inplace = True)

In [138]:
# Check wheather missing value is present or not
movies.isnull().sum()

movie_id    0
title       0
overview    0
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [139]:
# Check for duplicated rows
movies.duplicated().sum()

0

In [140]:
# Access the 'genres' column value for the first row (index 0) of the 'movies' DataFrame.
movies.iloc[0].genres  

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [141]:
# '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
# ["Action", "Adventure", "Fantasy", "Science Fiction"]

In [142]:
# Access the entire 'genres' column from the 'movies' DataFrame
movies["genres"]

0       [{"id": 28, "name": "Action"}, {"id": 12, "nam...
1       [{"id": 12, "name": "Adventure"}, {"id": 14, "...
2       [{"id": 28, "name": "Action"}, {"id": 12, "nam...
3       [{"id": 28, "name": "Action"}, {"id": 80, "nam...
4       [{"id": 28, "name": "Action"}, {"id": 12, "nam...
                              ...                        
4804    [{"id": 28, "name": "Action"}, {"id": 80, "nam...
4805    [{"id": 35, "name": "Comedy"}, {"id": 10749, "...
4806    [{"id": 35, "name": "Comedy"}, {"id": 18, "nam...
4807                                                   []
4808                  [{"id": 99, "name": "Documentary"}]
Name: genres, Length: 4806, dtype: object

#### **Function to Convert Text to List**


In [143]:
# Import the Abstract Syntax Tree module for parsing strings
import ast

# Define a function to convert text into a list of names
def convert(text):
    L = []
    # Safely evaluate strings containing Python expressions and convert to a list
    for i in ast.literal_eval(text):  
        # Append the 'name' from each dictionary in the list
        L.append(i['name'])  
    return L

In [144]:
# Convert genres from strings to a list of genre names
movies["genres"] = movies["genres"].apply(convert)
movies["genres"]

0       [Action, Adventure, Fantasy, Science Fiction]
1                        [Adventure, Fantasy, Action]
2                          [Action, Adventure, Crime]
3                    [Action, Crime, Drama, Thriller]
4                [Action, Adventure, Science Fiction]
                            ...                      
4804                        [Action, Crime, Thriller]
4805                                [Comedy, Romance]
4806               [Comedy, Drama, Romance, TV Movie]
4807                                               []
4808                                    [Documentary]
Name: genres, Length: 4806, dtype: object

In [145]:
# Convert keywords from strings to a list of keyword names
movies["keywords"] = movies["keywords"].apply(convert)
movies["keywords"]

0       [culture clash, future, space war, space colon...
1       [ocean, drug abuse, exotic island, east india ...
2       [spy, based on novel, secret agent, sequel, mi...
3       [dc comics, crime fighter, terrorist, secret i...
4       [based on novel, mars, medallion, space travel...
                              ...                        
4804    [united states–mexico barrier, legs, arms, pap...
4805                                                   []
4806    [date, love at first sight, narration, investi...
4807                                                   []
4808            [obsession, camcorder, crush, dream girl]
Name: keywords, Length: 4806, dtype: object

In [146]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [147]:
# Define a function to fetch the first three cast members
def fetch_cast(text):
    L = []
    counter = 0
    # Safely evaluate strings containing Python expressions
    for i in ast.literal_eval(text):  
        # Append the first three cast members' names
        if counter != 3:  
            L.append(i['name'])  
            counter += 1
        else:
            break  # Stop after three cast members
    return L

In [148]:
# Convert cast from strings to a list of the first 3 cast names
movies["cast"] = movies["cast"].apply(fetch_cast)
movies["cast"]

0        [Sam Worthington, Zoe Saldana, Sigourney Weaver]
1           [Johnny Depp, Orlando Bloom, Keira Knightley]
2            [Daniel Craig, Christoph Waltz, Léa Seydoux]
3            [Christian Bale, Michael Caine, Gary Oldman]
4          [Taylor Kitsch, Lynn Collins, Samantha Morton]
                              ...                        
4804    [Carlos Gallardo, Jaime de Hoyos, Peter Marqua...
4805         [Edward Burns, Kerry Bishé, Marsha Dietlein]
4806           [Eric Mabius, Kristin Booth, Crystal Lowe]
4807            [Daniel Henney, Eliza Coupe, Bill Paxton]
4808    [Drew Barrymore, Brian Herzlinger, Corey Feldman]
Name: cast, Length: 4806, dtype: object

In [149]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [150]:
# Access the 'crew' column for the first row (index 0) of the 'movies' DataFrame.

movies["crew"][0]  # but we have to find "job" = "Director" only so


'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

In [151]:
# Define a function to fetch the director's name from the crew data
def fetch_director(text):
    L = []
    # Safely evaluate strings containing Python expressions
    for i in ast.literal_eval(text):  
        # Check if the job title is 'Director'
        if i['job'] == 'Director':  
            # Append the director's name to the list
            L.append(i['name'])  
            break  # Stop after finding the director
    return L

In [152]:
# Convert crew from strings to the director's name
movies["crew"] = movies["crew"].apply(fetch_director)
movies["crew"]

0           [James Cameron]
1          [Gore Verbinski]
2              [Sam Mendes]
3       [Christopher Nolan]
4          [Andrew Stanton]
               ...         
4804     [Robert Rodriguez]
4805         [Edward Burns]
4806          [Scott Smith]
4807          [Daniel Hsia]
4808     [Brian Herzlinger]
Name: crew, Length: 4806, dtype: object

In [153]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]


In [154]:
# Access the 'overview' column for the first row (index 0) of the 'movies' DataFrame.
movies["overview"][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'

#### **Text Processing**


In [155]:
# Split the overview into a list of words
movies["overview"] = movies["overview"].apply(lambda x: x.split())
movies["overview"]

0       [In, the, 22nd, century,, a, paraplegic, Marin...
1       [Captain, Barbossa,, long, believed, to, be, d...
2       [A, cryptic, message, from, Bond’s, past, send...
3       [Following, the, death, of, District, Attorney...
4       [John, Carter, is, a, war-weary,, former, mili...
                              ...                        
4804    [El, Mariachi, just, wants, to, play, his, gui...
4805    [A, newlywed, couple's, honeymoon, is, upended...
4806    ["Signed,, Sealed,, Delivered", introduces, a,...
4807    [When, ambitious, New, York, attorney, Sam, is...
4808    [Ever, since, the, second, grade, when, he, fi...
Name: overview, Length: 4806, dtype: object

In [157]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]


In [158]:
# Remove spaces from genre names
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x]) 

# Remove spaces from keyword names
movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])  

# Remove spaces from cast names
movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])  

# Remove spaces from crew names
movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])  

In [159]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]


#### **Creating the 'Tags' Column which is our main columns**

In [160]:
# Combine relevant columns to create 'tags'
movies["tags"] = movies["overview"] + movies["genres"] + movies["keywords"] + movies["cast"] + movies["crew"]

In [161]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."


In [162]:
# Create a new DataFrame with movie_id, title, and tags
new_df = movies[["movie_id", "title", "tags"]]
new_df

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4804,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4805,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4806,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4807,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


In [163]:
# Convert the list of tags into a single string
new_df["tags"] = new_df["tags"].apply(lambda x: " ".join(x))  # new_df.loc[:, "tags"] = new_df["tags"].apply(lambda x: " ".join(x))


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df["tags"] = new_df["tags"].apply(lambda x: " ".join(x))  # new_df.loc[:, "tags"] = new_df["tags"].apply(lambda x: " ".join(x))


In [165]:
# Access the 'tags' column from the 'new_df' DataFrame.
new_df["tags"]


0       In the 22nd century, a paraplegic Marine is di...
1       Captain Barbossa, long believed to be dead, ha...
2       A cryptic message from Bond’s past sends him o...
3       Following the death of District Attorney Harve...
4       John Carter is a war-weary, former military ca...
                              ...                        
4804    El Mariachi just wants to play his guitar and ...
4805    A newlywed couple's honeymoon is upended by th...
4806    "Signed, Sealed, Delivered" introduces a dedic...
4807    When ambitious New York attorney Sam is sent t...
4808    Ever since the second grade when he first saw ...
Name: tags, Length: 4806, dtype: object

In [166]:
new_df.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


In [167]:
# Access the 'tags' column for the first row (index 0) of the 'new_df' DataFrame.
# This will return the combined tags (as a single string) for the first movie.
new_df["tags"][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

In [168]:
# Convert all text in the 'tags' column to lowercase.
# This ensures consistency by making all tags lowercase, which can help improve the accuracy of text-based operations such as similarity calculations.
new_df["tags"] = new_df["tags"].apply(lambda x: x.lower())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df["tags"] = new_df["tags"].apply(lambda x: x.lower())


In [169]:
new_df.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a paraplegic marine is di..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believed to be dead, ha..."
2,206647,Spectre,a cryptic message from bond’s past sends him o...
3,49026,The Dark Knight Rises,following the death of district attorney harve...
4,49529,John Carter,"john carter is a war-weary, former military ca..."


In [170]:
# Import the Natural Language Toolkit (nltk) library for text processing tasks.
import nltk  

# Import the PorterStemmer class from nltk's stem module to perform stemming.
from nltk.stem.porter import PorterStemmer  

# Create an instance of the PorterStemmer class to use for stemming words.
ps = PorterStemmer()  

# Define a function 'Stem' that takes a string 'text' and returns its stemmed version.
def Stem(text):    # main work: if "loved", "love", "loving" convert into love
    y = []  
    for i in text.split():  
        # Apply the stemmer to each word and append the stemmed word to the list 'y'.
        y.append(ps.stem(i))  
    # Join the list of stemmed words into a single string separated by spaces and return it.
    return " ".join(y)  

# Apply the 'Stem' function to each entry in the 'tags' column of 'new_df' DataFrame and update the column with the stemmed text.
new_df["tags"] = new_df["tags"].apply(Stem)  
new_df["tags"]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df["tags"] = new_df["tags"].apply(Stem)


0       in the 22nd century, a parapleg marin is dispa...
1       captain barbossa, long believ to be dead, ha c...
2       a cryptic messag from bond’ past send him on a...
3       follow the death of district attorney harvey d...
4       john carter is a war-weary, former militari ca...
                              ...                        
4804    el mariachi just want to play hi guitar and ca...
4805    a newlyw couple' honeymoon is upend by the arr...
4806    "signed, sealed, delivered" introduc a dedic q...
4807    when ambiti new york attorney sam is sent to s...
4808    ever sinc the second grade when he first saw h...
Name: tags, Length: 4806, dtype: object

In [171]:
new_df.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believ to be dead, ha c..."
2,206647,Spectre,a cryptic messag from bond’ past send him on a...
3,49026,The Dark Knight Rises,follow the death of district attorney harvey d...
4,49529,John Carter,"john carter is a war-weary, former militari ca..."


### **3.Perform the Model**
- In machine learning (ML), a model is a mathematical representation or structure that captures the relationship between input data and output predictions.
     
#### Now we have to convert all the text into vector form, this method is called textvector.

##### There are three types:
  - 1. Bag of Words (BoW):
      - Bag of Words is a simple and commonly used technique for text vectorization.
   
  - 2. Term Frequency-Inverse Document Frequency (TF-IDF):
      - TF-IDF is useful for identifying important terms in a document and is commonly used in information retrieval, text mining, and search engine algorithms.
        
  - 3. Word Embeddings (Word Vectors):
      - Word embeddings are dense, low-dimensional representations of words in a continuous vector space.

      - Word embeddings are widely used in natural language processing (NLP) tasks such as text classification, named entity recognition, and machine translation.

#### But we have to use "1. Bags of Words(BoW)" method to convert text into vector form mostly in our projects.


In [172]:
# Import CountVectorizer from scikit-learn for text feature extraction
from sklearn.feature_extraction.text import CountVectorizer

# Initialize CountVectorizer with a max of 5000 features and English stop words
cv = CountVectorizer(max_features = 5000, stop_words = "english")

In [174]:
# Convert the tags into vectors using CountVectorizer
vectors = cv.fit_transform(new_df["tags"]).toarray()
vectors

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [175]:
# print the shape of the vectors
vectors.shape   # first show row and second show column

(4806, 5000)

In [176]:
# Retrieve the feature names (vocabulary) from the CountVectorizer object 'cv' using the 'get_feature_names_out()' method.
cv.get_feature_names_out()  # cv.get_feature_names()  


array(['000', '007', '10', ..., 'zone', 'zoo', 'zooeydeschanel'],
      dtype=object)

In [177]:
### for checking """ cv.get_feature_names()  """
if hasattr(cv, 'get_feature_names'):
    feature_names = cv.get_feature_names()
else:
    print("Attribute 'get_feature_names' not found.")

## so we use  cv.get_feature_names_out()

Attribute 'get_feature_names' not found.


In [178]:
# Print all the attributes and methods of the CountVectorizer object 'cv' to inspect its capabilities and properties.
print(dir(cv))

['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__sklearn_clone__', '__str__', '__subclasshook__', '__weakref__', '_build_request_for_signature', '_char_ngrams', '_char_wb_ngrams', '_check_feature_names', '_check_n_features', '_check_stop_words_consistency', '_check_vocabulary', '_count_vocab', '_doc_link_module', '_doc_link_template', '_doc_link_url_param_generator', '_get_default_requests', '_get_doc_link', '_get_metadata_request', '_get_param_names', '_get_tags', '_limit_features', '_more_tags', '_parameter_constraints', '_repr_html_', '_repr_html_inner', '_repr_mimebundle_', '_sort_features', '_stop_words_id', '_validate_data', '_validate_ngram_range', '_validate_params', '_validate_v

In [179]:
## As a sample  
ps.stem("Maked")

'make'

#### **Compute Cosine Similarity**

In [180]:
## Now we have to find minimum angle between vectors using cosine_distances or cosine_similarity

# Import cosine_similarity to compute the similarity between vectors
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity between the vectors
similarity = cosine_similarity(vectors)
similarity

array([[1.        , 0.08346223, 0.0860309 , ..., 0.04499213, 0.        ,
        0.        ],
       [0.08346223, 1.        , 0.06063391, ..., 0.02378257, 0.        ,
        0.02615329],
       [0.0860309 , 0.06063391, 1.        , ..., 0.02451452, 0.        ,
        0.        ],
       ...,
       [0.04499213, 0.02378257, 0.02451452, ..., 1.        , 0.03962144,
        0.04229549],
       [0.        , 0.        , 0.        , ..., 0.03962144, 1.        ,
        0.08714204],
       [0.        , 0.02615329, 0.        , ..., 0.04229549, 0.08714204,
        1.        ]])

In [182]:
# Get the shape (dimensions) of the 'similarity' array, 
similarity.shape

(4806, 4806)

In [None]:
# Retrieve the first row of the 'similarity' matrix, which represents similarity scores for the first item.
similarity[0]

array([1.        , 0.08346223, 0.0860309 , ..., 0.04499213, 0.        ,
       0.        ])

In [183]:
# Enumerate over the similarity scores of the first item, sort them in descending order, and retrieve the indices of the top 5 most similar items (excluding the first item itself).
sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:6]

[(1216, 0.28676966733820225),
 (2409, 0.26901379342448517),
 (3730, 0.2605130246476754),
 (507, 0.255608593705383),
 (539, 0.25038669783359574)]

In [184]:
# Check if the 'title' column in 'new_df' DataFrame has the value "Avatar".
new_df["title"] == "Avatar"

0        True
1       False
2       False
3       False
4       False
        ...  
4804    False
4805    False
4806    False
4807    False
4808    False
Name: title, Length: 4806, dtype: bool

In [185]:
# Filter 'new_df' DataFrame to get rows where the 'title' column equals "Avatar".
new_df[new_df["title"] == "Avatar"]

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."


In [188]:
# Get the index of the first row where the 'title' column equals "Avatar".
new_df[new_df["title"] == "Avatar"].index[0]

0

#### **Recommendation Function**

In [189]:
# Define a function to recommend movies based on similarity
def recommend(movie):
    # Find the index of the movie
    movie_index = new_df[new_df['title'] == movie].index[0]  
    # Get the similarity scores for that movie
    distances = similarity[movie_index]  
    # Sort and get the top 5 similar movies
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]  
    # movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[:5]  

    # Print the titles of the recommended movies
    for i in movies_list:
        print(new_df.iloc[i[0]].title)  

In [190]:
# Call the 'recommend' function with "Independence Day" as the input to get recommendations based on this title.
recommend("Independence Day")

Independence Daysaster
Meet Dave
Aliens vs Predator: Requiem
Escape from Planet Earth
The Day the Earth Stood Still


In [191]:
# Retrieve the title of the movie at index 1216 from the 'new_df' DataFrame.
new_df.iloc[1216].title

'Aliens vs Predator: Requiem'

#### **Save Model and Data**

In [192]:
# Import pickle for saving and loading Python objects
import pickle

In [195]:
# Save the new DataFrame as a pickle file
pickle.dump(new_df.to_dict(), open("Movie_Recommender_System/movies_dict.pkl", "wb"))  

In [196]:
# Save the similarity matrix as a pickle file
pickle.dump(similarity, open("Movie_Recommender_System/similarity.pkl", "wb"))

### **Step 4: Websites ( these step are performed on another folder "Movie_Recommender_System"**
- A website is a collection of web pages and related content that are typically identified by a common domain name and accessible via the Internet. 

- These web pages are stored on web servers and can contain various types of content such as text,images, videos, hyperlinks, forms, and interactive elements.