***Movie Recommendation System***


A movie recommendation system helps people find movies they might like. One way it does this is through **collaborative filtering**, where it looks at what movies people with similar tastes have enjoyed. This method works well when there's a lot of data about what movies people have watched and rated.
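
As a rough illustration of the idea, the sketch below scores movies a user has not seen by the ratings of other users, weighted by how similar those users are. The `ratings` data, the user names, and the helper functions are all hypothetical; this is a minimal user-based variant under those assumptions, not the method used later in this notebook.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie title: rating on a 1-5 scale}
ratings = {
    "ana":   {"Star Wars": 5, "Finding Nemo": 3, "Forrest Gump": 4},
    "ben":   {"Star Wars": 5, "Finding Nemo": 2, "American Beauty": 4},
    "chloe": {"Finding Nemo": 5, "Forrest Gump": 2, "Four Rooms": 4},
}

def user_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def recommend(user, top_n=3):
    """Rank unseen movies by the similarity-weighted ratings of other users."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        weight = user_similarity(user, other)
        for movie, rating in ratings[other].items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana"))
```

With so few ratings the ranking is fragile, which is why this approach depends on having plenty of rating data.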

Another way these systems work is called **content-based filtering**. This method suggests movies based on specific details like genres, actors, or plots. It picks movies that are similar to ones a person has already shown interest in.
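
A minimal sketch of this idea, assuming genre overlap is the only signal: the genre strings are copied from the first rows of the dataset used in this notebook, while the `jaccard` and `similar_to` helpers are hypothetical names chosen for illustration.

```python
# Genre strings for five movies from the dataset used in this notebook
genres = {
    "Four Rooms": "Crime Comedy",
    "Star Wars": "Adventure Action Science Fiction",
    "Finding Nemo": "Animation Family",
    "Forrest Gump": "Comedy Drama Romance",
    "American Beauty": "Drama",
}

def jaccard(a, b):
    """Fraction of genre tokens that two movies have in common."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def similar_to(title, top_n=2):
    """Rank the other movies by genre overlap with the given title."""
    others = [t for t in genres if t != title]
    return sorted(others, key=lambda t: jaccard(genres[title], genres[t]), reverse=True)[:top_n]

print(similar_to("Forrest Gump"))
```

The notebook itself uses a richer text representation (TF-IDF over several columns), but the principle is the same: compare item descriptions, not user behavior.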

Some systems combine both methods: pairing **collaborative filtering** with **content-based filtering** can further refine the recommendations. With these techniques, movie recommendation systems make it easier for users to discover movies they'll enjoy watching.
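
One simple way to combine the two signals is a weighted average. The sketch below assumes both scores are already scaled to the range 0 to 1; the scores, the `hybrid_score` helper, and the `alpha` weight are hypothetical.

```python
def hybrid_score(collab, content, alpha=0.5):
    """Weighted blend of two scores; alpha=1 uses only the collaborative score."""
    return alpha * collab + (1 - alpha) * content

# Hypothetical per-movie scores, both already scaled to [0, 1]
collab_scores = {"Star Wars": 0.9, "Finding Nemo": 0.4}
content_scores = {"Star Wars": 0.2, "Finding Nemo": 0.8}

blended = {
    movie: hybrid_score(collab_scores[movie], content_scores[movie], alpha=0.7)
    for movie in collab_scores
}
print(max(blended, key=blended.get))
```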

# **Import Libraries**

In [None]:
import pandas as pd

In [None]:
import numpy as np

# **Import Datasets**

In [7]:
df=pd.read_csv('https://raw.githubusercontent.com/YBI-Foundation/Dataset/main/Movies%20Recommendation.csv')

In [8]:
df.head()

Unnamed: 0,Movie_ID,Movie_Title,Movie_Genre,Movie_Language,Movie_Budget,Movie_Popularity,Movie_Release_Date,Movie_Revenue,Movie_Runtime,Movie_Vote,...,Movie_Homepage,Movie_Keywords,Movie_Overview,Movie_Production_House,Movie_Production_Country,Movie_Spoken_Language,Movie_Tagline,Movie_Cast,Movie_Crew,Movie_Director
0,1,Four Rooms,Crime Comedy,en,4000000,22.87623,09-12-1995,4300000,98.0,6.5,...,,hotel new year's eve witch bet hotel room,It's Ted the Bellhop's first night on the job....,"[{""name"": ""Miramax Films"", ""id"": 14}, {""name"":...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,"[{'name': 'Allison Anders', 'gender': 1, 'depa...",Allison Anders
1,2,Star Wars,Adventure Action Science Fiction,en,11000000,126.393695,25-05-1977,775398007,121.0,8.1,...,http://www.starwars.com/films/star-wars-episod...,android galaxy hermit death star lightsaber,Princess Leia is captured and held hostage by ...,"[{""name"": ""Lucasfilm"", ""id"": 1}, {""name"": ""Twe...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,"[{'name': 'George Lucas', 'gender': 2, 'depart...",George Lucas
2,3,Finding Nemo,Animation Family,en,94000000,85.688789,30-05-2003,940335536,100.0,7.6,...,http://movies.disney.com/finding-nemo,father son relationship harbor underwater fish...,"Nemo, an adventurous young clownfish, is unexp...","[{""name"": ""Pixar Animation Studios"", ""id"": 3}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton
3,4,Forrest Gump,Comedy Drama Romance,en,55000000,138.133331,06-07-1994,677945399,142.0,8.2,...,,vietnam veteran hippie mentally disabled runni...,A man with a low IQ has accomplished great thi...,"[{""name"": ""Paramount Pictures"", ""id"": 4}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,"[{'name': 'Alan Silvestri', 'gender': 2, 'depa...",Robert Zemeckis
4,5,American Beauty,Drama,en,15000000,80.878605,15-09-1999,356296601,122.0,7.9,...,http://www.dreamworks.com/ab/,male nudity female nudity adultery midlife cri...,"Lester Burnham, a depressed suburban father in...","[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4760 entries, 0 to 4759
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Movie_ID                  4760 non-null   int64  
 1   Movie_Title               4760 non-null   object 
 2   Movie_Genre               4760 non-null   object 
 3   Movie_Language            4760 non-null   object 
 4   Movie_Budget              4760 non-null   int64  
 5   Movie_Popularity          4760 non-null   float64
 6   Movie_Release_Date        4760 non-null   object 
 7   Movie_Revenue             4760 non-null   int64  
 8   Movie_Runtime             4758 non-null   float64
 9   Movie_Vote                4760 non-null   float64
 10  Movie_Vote_Count          4760 non-null   int64  
 11  Movie_Homepage            1699 non-null   object 
 12  Movie_Keywords            4373 non-null   object 
 13  Movie_Overview            4757 non-null   object 
 14  Movie_Pr

In [10]:
df.shape

(4760, 21)

In [11]:
df.columns

Index(['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language',
       'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date',
       'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count',
       'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview',
       'Movie_Production_House', 'Movie_Production_Country',
       'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew',
       'Movie_Director'],
      dtype='object')

# **Get Feature Selection**

In [12]:
df_features=df[['Movie_Genre','Movie_Keywords','Movie_Tagline','Movie_Cast','Movie_Director']].fillna('')

In [13]:
df_features.shape

(4760, 5)

In [14]:
df_features

Unnamed: 0,Movie_Genre,Movie_Keywords,Movie_Tagline,Movie_Cast,Movie_Director
0,Crime Comedy,hotel new year's eve witch bet hotel room,Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,Allison Anders
1,Adventure Action Science Fiction,android galaxy hermit death star lightsaber,"A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,George Lucas
2,Animation Family,father son relationship harbor underwater fish...,"There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,Andrew Stanton
3,Comedy Drama Romance,vietnam veteran hippie mentally disabled runni...,"The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,Robert Zemeckis
4,Drama,male nudity female nudity adultery midlife cri...,Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,Sam Mendes
...,...,...,...,...,...
4755,Horror,,The hot spot where Satan's waitin'.,Lisa Hart Carroll Michael Des Barres Paul Drak...,Pece Dingo
4756,Comedy Family Drama,,It’s better to stand out than to fit in.,Roni Akurati Brighton Sharbino Jason Lee Anjul...,Frank Lotito
4757,Thriller Drama,christian film sex trafficking,She never knew it could happen to her...,Nicole Smolen Kim Baldwin Ariana Stephens Brys...,Jaco Booyens
4758,Family,,,,


In [15]:
# Join the selected text columns into one string per movie
x = (df_features['Movie_Genre'] + ' ' + df_features['Movie_Keywords'] + ' '
     + df_features['Movie_Tagline'] + ' ' + df_features['Movie_Cast'] + ' '
     + df_features['Movie_Director'])

In [16]:
x.shape

(4760,)
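
The `fillna('')` call matters because concatenating a string with a missing value produces a missing value for the whole row. The sketch below shows this on a two-row toy frame (`toy` is a hypothetical stand-in for `df_features`; the second row has no keywords, like row 4755 above).

```python
import pandas as pd

# Toy frame: the second movie has no keywords
toy = pd.DataFrame({
    "Movie_Genre": ["Drama", "Horror"],
    "Movie_Keywords": ["midlife crisis", None],
})

# Without fillna, the missing keyword wipes out the combined text for that row
without_fill = toy["Movie_Genre"] + " " + toy["Movie_Keywords"]

# With fillna(''), the genre survives and the keyword part is simply empty
with_fill = toy["Movie_Genre"] + " " + toy["Movie_Keywords"].fillna("")

print(without_fill.isna().tolist())
print(with_fill.tolist())
```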

# **Get Features Text Conversion to Tokens**

In [25]:
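from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus to illustrate the API. It gets its own name so that it does not
# overwrite the combined movie features in x; pass x to fit_transform instead
# to vectorize the real dataset.
sample_texts = ["some text", "another text"]

# Create an instance of TfidfVectorizer
tfidf = TfidfVectorizer()

# Learn the vocabulary and IDF weights, then return the TF-IDF matrix
x_tfidf = tfidf.fit_transform(sample_texts)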


In [26]:
x_tfidf.shape

(2, 3)

In [27]:
print(x_tfidf)

  (0, 2)	0.5797386715376657
  (0, 1)	0.8148024746671689
  (1, 0)	0.8148024746671689
  (1, 2)	0.5797386715376657
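
To see where these numbers come from, the sketch below recomputes them by hand with only the standard library. It assumes `TfidfVectorizer`'s default settings (raw term counts, smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row) and uses plain whitespace splitting as a stand-in for the vectorizer's tokenizer, which is enough for this two-document corpus.

```python
import math

docs = ["some text", "another text"]
tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})  # ['another', 'some', 'text']
n_docs = len(docs)

def idf(term):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    doc_freq = sum(term in doc for doc in tokenized)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

def tfidf_row(doc):
    """Term count times IDF for each vocabulary term, scaled to unit length."""
    raw = [doc.count(term) * idf(term) for term in vocab]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm for value in raw]

for doc in tokenized:
    print([round(value, 4) for value in tfidf_row(doc)])
```

The word "text" appears in both documents, so it gets the lower weight (about 0.58), while the word unique to each document gets the higher one (about 0.81), matching the values printed above.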


# **Get Similarity Score Using Cosine Similarity**

In [28]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Example data
documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# Step 1: Compute TF-IDF vectors
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Step 2: Compute cosine similarity
cosine_similarities = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Output the cosine similarity matrix
print("Cosine Similarity Matrix:")
print(cosine_similarities)


Cosine Similarity Matrix:
[[1.         0.64692568 0.30777187 1.        ]
 [0.64692568 1.         0.22523955 0.64692568]
 [0.30777187 0.22523955 1.         0.30777187]
 [1.         0.64692568 0.30777187 1.        ]]


In [29]:
similarity_score = cosine_similarities[0, 1]
print(f"Cosine similarity between document 1 and document 2: {similarity_score}")


Cosine similarity between document 1 and document 2: 0.6469256761082494
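
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. The sketch below writes that out by hand; the three count vectors are hypothetical. It also suggests why documents 1 and 4 score 1.0 in the matrix above: they contain the same words, and word order and punctuation do not affect the vectors.

```python
import math

def cosine(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word-count vectors over a three-word vocabulary
a = [1, 1, 0]
b = [1, 0, 1]   # shares one word with a
c = [2, 2, 0]   # same words as a, each repeated twice

print(round(cosine(a, b), 4))
print(round(cosine(a, c), 4))
```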


# **Get Movie Name as Input from User and Validate for Closest Spelling**

In [38]:
import difflib

# Example list of movie names (you can replace this with your dataset)
All_movie_title_list = [
    "The Shawshank Redemption",
    "The Godfather",
    "The Dark Knight",
    "Pulp Fiction",
    "Fight Club",
    "Forrest Gump",
    "The Matrix",
    "Goodfellas",
    "Schindler's List",
    "Inception"
]

# Example user input (replace this with your actual input method)
Fav_movie_name = "The Shawsank Redempshun"

# Score each title by how closely its spelling matches Fav_movie_name
# (SequenceMatcher.ratio() returns a value between 0 and 1)
similarity_scores = {}
for movie in All_movie_title_list:
    similarity_scores[movie] = difflib.SequenceMatcher(None, Fav_movie_name, movie).ratio()

# Sort titles from closest to least close spelling
sorted_movies = sorted(similarity_scores.items(), key=lambda item: item[1], reverse=True)

# Print every title with its score; despite the label, this is a
# spelling-similarity score, not a content-based recommendation score
print("Movies sorted based on recommendation score:")
for movie, score in sorted_movies:
    print(f"{movie}: {score}")

# Print the same ranking with each title's position
print("\nNames of similar movies based on index:")
for i, (movie, score) in enumerate(sorted_movies):
    print(f"Index {i}: {movie}")


Movies sorted based on recommendation score:
The Shawshank Redemption: 0.851063829787234
The Dark Knight: 0.42105263157894735
The Godfather: 0.3333333333333333
The Matrix: 0.30303030303030304
Forrest Gump: 0.22857142857142856
Schindler's List: 0.20512820512820512
Fight Club: 0.18181818181818182
Goodfellas: 0.18181818181818182
Inception: 0.125
Pulp Fiction: 0.11428571428571428

Names of similar movies based on index:
Index 0: The Shawshank Redemption
Index 1: The Dark Knight
Index 2: The Godfather
Index 3: The Matrix
Index 4: Forrest Gump
Index 5: Schindler's List
Index 6: Fight Club
Index 7: Goodfellas
Index 8: Inception
Index 9: Pulp Fiction
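
When only the single best spelling match is needed, the standard library's `difflib.get_close_matches` does the scoring and sorting in one call and drops candidates below a cutoff. The sketch below is one way to wrap it; `closest_title` is a hypothetical helper name and the title list is shortened for brevity.

```python
import difflib

titles = [
    "The Shawshank Redemption",
    "The Godfather",
    "The Dark Knight",
    "Pulp Fiction",
    "Forrest Gump",
]

def closest_title(query, candidates, cutoff=0.6):
    """Return the best-matching title, or None when nothing clears the cutoff."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_title("The Shawsank Redempshun", titles))
print(closest_title("zzzz", titles))
```

Returning `None` for a query that matches nothing lets the caller ask the user to retype the title rather than silently picking an unrelated movie. The comparison is case-sensitive, so lowercasing both sides first may help with real user input.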


# **Top 10 Movie Recommendation System**

In [40]:
# Read the movie title from the user
# (reuses difflib and All_movie_title_list from the previous cell)
Fav_movie_name = input("Enter your favorite movie: ")

# Score each title by how closely its spelling matches Fav_movie_name
similarity_scores = {}
for movie in All_movie_title_list:
    similarity_scores[movie] = difflib.SequenceMatcher(None, Fav_movie_name, movie).ratio()

# Sort titles from closest to least close spelling
sorted_movies = sorted(similarity_scores.items(), key=lambda item: item[1], reverse=True)

# Print the 10 closest titles; the score reflects spelling similarity only
print(f"\nTop 10 movie recommendations for '{Fav_movie_name}':")
for i, (movie, score) in enumerate(sorted_movies[:10]):
    print(f"{i+1}. {movie} (Similarity Score: {score})")


Enter your favorite movie: The Shawsank Redempshun

Top 10 movie recommendations for 'The Shawsank Redempshun':
1. The Shawshank Redemption (Similarity Score: 0.851063829787234)
2. The Dark Knight (Similarity Score: 0.42105263157894735)
3. The Godfather (Similarity Score: 0.3333333333333333)
4. The Matrix (Similarity Score: 0.30303030303030304)
5. Forrest Gump (Similarity Score: 0.22857142857142856)
6. Schindler's List (Similarity Score: 0.20512820512820512)
7. Fight Club (Similarity Score: 0.18181818181818182)
8. Goodfellas (Similarity Score: 0.18181818181818182)
9. Inception (Similarity Score: 0.125)
10. Pulp Fiction (Similarity Score: 0.11428571428571428)
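
The ranking above reflects how closely each title is spelled to the input, not how similar the movies are. A content-based recommender would typically chain the earlier steps together: resolve the typed title to a known one, look up its row in the cosine similarity matrix of the TF-IDF features, and return the highest-scoring other movies. The sketch below shows one possible way to do that on five movies; the `titles` and `features` lists are shortened stand-ins for `df['Movie_Title']` and `x`, and `recommend` is a hypothetical helper, not part of the original notebook.

```python
import difflib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Shortened stand-ins for df['Movie_Title'] and the combined feature text x
titles = ["Four Rooms", "Star Wars", "Finding Nemo", "Forrest Gump", "American Beauty"]
features = [
    "Crime Comedy hotel new year's eve witch bet hotel room",
    "Adventure Action Science Fiction android galaxy hermit death star lightsaber",
    "Animation Family father son relationship harbor underwater",
    "Comedy Drama Romance vietnam veteran hippie mentally disabled",
    "Drama male nudity female nudity adultery midlife",
]

# Pairwise content similarity between all movies
similarity = cosine_similarity(TfidfVectorizer().fit_transform(features))

def recommend(query, top_n=3):
    """Resolve the typed title, then rank the other movies by content similarity."""
    match = difflib.get_close_matches(query, titles, n=1)
    if not match:
        return []
    idx = titles.index(match[0])
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the query movie itself, which always has similarity 1 with itself
    return [titles[i] for i, _ in ranked if i != idx][:top_n]

print(recommend("Forest Gump", top_n=2))
```

On the full dataset the same function would work with `titles = df['Movie_Title'].tolist()` and `features = x`, assuming `x` still holds the combined feature strings.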
