Title: **Movie Recommendation System**

**Description**: A Movie Recommendation System is a machine learning project that suggests movies to users based on their preferences and past behavior. Such a system predicts which movies a user might enjoy using collaborative filtering, content-based filtering, or a hybrid of the two. This notebook implements the content-based approach, using TF-IDF features and cosine similarity.

In collaborative filtering, the system looks at other users with similar tastes and recommends movies they liked. Content-based filtering, on the other hand, suggests movies similar to the ones the user has previously enjoyed, based on features like genre, director, or actors.
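
To make the collaborative idea concrete, here is a minimal sketch of user-based collaborative filtering. The ratings matrix is hypothetical (it is not taken from the dataset used below), and treating unrated movies as 0 inside the cosine is a simplification made for illustration only.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # target user, has not rated movie 2
    [5.0, 5.0, 4.0, 1.0],   # user with similar taste
    [1.0, 1.0, 2.0, 5.0],   # user with different taste
])

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = ratings[0]
# Similarity of the target user to each of the other users
sims = np.array([cosine(target, other) for other in ratings[1:]])

# Predicted rating for movie 2: similarity-weighted average of the other users' ratings
predicted = float(sims @ ratings[1:, 2] / sims.sum())
print(sims, predicted)
```

The prediction leans towards the rating given by the more similar user, which is the core intuition behind collaborative filtering.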

In a complete application, users would create profiles, rate movies, and receive personalized suggestions that update over time. Such a system draws on data like movie ratings, genres, and reviews to make its predictions.

Machine learning techniques such as matrix factorization, clustering algorithms, or even deep learning can be used to improve recommendation accuracy. Evaluation metrics like precision, recall, and RMSE (Root Mean Squared Error) help measure the performance of the system.
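
The metrics named above can be computed in a few lines. This is a sketch on made-up numbers: the ratings, the recommended list, and the relevant list are all hypothetical, and the helper names `rmse` and `precision_recall` are chosen here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between true and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def precision_recall(recommended, relevant):
    """Precision and recall of a recommended list against the relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical true and predicted ratings for three movies
error = rmse([4.0, 3.0, 5.0], [3.0, 3.0, 4.0])

# Hypothetical recommendations vs. movies the user actually liked
precision, recall = precision_recall(["A", "B", "C", "D"], ["B", "D", "E"])
print(error, precision, recall)
```

RMSE applies when the system predicts numeric ratings, while precision and recall apply when it returns a ranked list, as the notebook below does.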

Projects of this kind commonly use Python and scikit-learn, with datasets such as MovieLens or IMDb. This notebook uses pandas and scikit-learn. The system can be deployed as a web application using frameworks like Flask or Django.

**Import Libraries**

In [1]:
import pandas as pd

In [2]:
import numpy as np

**Import Dataset**

In [3]:
df=pd.read_csv('https://raw.githubusercontent.com/YBIFoundation/Dataset/refs/heads/main/Movies%20Recommendation.csv')

In [4]:
df.head()

Unnamed: 0,Movie_ID,Movie_Title,Movie_Genre,Movie_Language,Movie_Budget,Movie_Popularity,Movie_Release_Date,Movie_Revenue,Movie_Runtime,Movie_Vote,...,Movie_Homepage,Movie_Keywords,Movie_Overview,Movie_Production_House,Movie_Production_Country,Movie_Spoken_Language,Movie_Tagline,Movie_Cast,Movie_Crew,Movie_Director
0,1,Four Rooms,Crime Comedy,en,4000000,22.87623,09-12-1995,4300000,98.0,6.5,...,,hotel new year's eve witch bet hotel room,It's Ted the Bellhop's first night on the job....,"[{""name"": ""Miramax Films"", ""id"": 14}, {""name"":...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,"[{'name': 'Allison Anders', 'gender': 1, 'depa...",Allison Anders
1,2,Star Wars,Adventure Action Science Fiction,en,11000000,126.393695,25-05-1977,775398007,121.0,8.1,...,http://www.starwars.com/films/star-wars-episod...,android galaxy hermit death star lightsaber,Princess Leia is captured and held hostage by ...,"[{""name"": ""Lucasfilm"", ""id"": 1}, {""name"": ""Twe...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,"[{'name': 'George Lucas', 'gender': 2, 'depart...",George Lucas
2,3,Finding Nemo,Animation Family,en,94000000,85.688789,30-05-2003,940335536,100.0,7.6,...,http://movies.disney.com/finding-nemo,father son relationship harbor underwater fish...,"Nemo, an adventurous young clownfish, is unexp...","[{""name"": ""Pixar Animation Studios"", ""id"": 3}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton
3,4,Forrest Gump,Comedy Drama Romance,en,55000000,138.133331,06-07-1994,677945399,142.0,8.2,...,,vietnam veteran hippie mentally disabled runni...,A man with a low IQ has accomplished great thi...,"[{""name"": ""Paramount Pictures"", ""id"": 4}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,"[{'name': 'Alan Silvestri', 'gender': 2, 'depa...",Robert Zemeckis
4,5,American Beauty,Drama,en,15000000,80.878605,15-09-1999,356296601,122.0,7.9,...,http://www.dreamworks.com/ab/,male nudity female nudity adultery midlife cri...,"Lester Burnham, a depressed suburban father in...","[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4760 entries, 0 to 4759
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Movie_ID                  4760 non-null   int64  
 1   Movie_Title               4760 non-null   object 
 2   Movie_Genre               4760 non-null   object 
 3   Movie_Language            4760 non-null   object 
 4   Movie_Budget              4760 non-null   int64  
 5   Movie_Popularity          4760 non-null   float64
 6   Movie_Release_Date        4760 non-null   object 
 7   Movie_Revenue             4760 non-null   int64  
 8   Movie_Runtime             4758 non-null   float64
 9   Movie_Vote                4760 non-null   float64
 10  Movie_Vote_Count          4760 non-null   int64  
 11  Movie_Homepage            1699 non-null   object 
 12  Movie_Keywords            4373 non-null   object 
 13  Movie_Overview            4757 non-null   object 
 14  Movie_Pr

In [7]:
df.shape

(4760, 21)

In [8]:
df.columns

Index(['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language',
       'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date',
       'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count',
       'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview',
       'Movie_Production_House', 'Movie_Production_Country',
       'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew',
       'Movie_Director'],
      dtype='object')

**Feature Selection**

In [9]:
df_features=df[['Movie_Genre','Movie_Keywords','Movie_Tagline','Movie_Cast','Movie_Director']].fillna('')

In [10]:
df_features.shape

(4760, 5)

In [11]:
df_features

Unnamed: 0,Movie_Genre,Movie_Keywords,Movie_Tagline,Movie_Cast,Movie_Director
0,Crime Comedy,hotel new year's eve witch bet hotel room,Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,Allison Anders
1,Adventure Action Science Fiction,android galaxy hermit death star lightsaber,"A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,George Lucas
2,Animation Family,father son relationship harbor underwater fish...,"There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,Andrew Stanton
3,Comedy Drama Romance,vietnam veteran hippie mentally disabled runni...,"The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,Robert Zemeckis
4,Drama,male nudity female nudity adultery midlife cri...,Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,Sam Mendes
...,...,...,...,...,...
4755,Horror,,The hot spot where Satan's waitin'.,Lisa Hart Carroll Michael Des Barres Paul Drak...,Pece Dingo
4756,Comedy Family Drama,,It’s better to stand out than to fit in.,Roni Akurati Brighton Sharbino Jason Lee Anjul...,Frank Lotito
4757,Thriller Drama,christian film sex trafficking,She never knew it could happen to her...,Nicole Smolen Kim Baldwin Ariana Stephens Brys...,Jaco Booyens
4758,Family,,,,


In [12]:
x=df_features['Movie_Genre']+' '+df_features['Movie_Keywords']+' '+df_features['Movie_Tagline']+' '+df_features['Movie_Cast']+' '+df_features['Movie_Director']

In [13]:
x

Unnamed: 0,0
0,Crime Comedy hotel new year's eve witch bet ho...
1,Adventure Action Science Fiction android galax...
2,Animation Family father son relationship harbo...
3,Comedy Drama Romance vietnam veteran hippie me...
4,Drama male nudity female nudity adultery midli...
...,...
4755,Horror The hot spot where Satan's waitin'. Li...
4756,Comedy Family Drama It’s better to stand out ...
4757,Thriller Drama christian film sex trafficking ...
4758,Family


In [14]:
x.shape

(4760,)
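
The `fillna('')` call in the feature selection step matters because concatenating a string with a missing value yields a missing value. A small sketch on a hypothetical two-row frame (the column names mirror the dataset, the rows are made up):

```python
import pandas as pd

# Hypothetical frame with one missing tagline
toy = pd.DataFrame({
    "Movie_Genre": ["Drama", "Family"],
    "Movie_Tagline": ["Look closer.", float("nan")],
})

# Without fillna, the row with a missing tagline becomes NaN as a whole
without_fill = toy["Movie_Genre"] + " " + toy["Movie_Tagline"]

# With fillna(''), the missing tagline contributes an empty string instead
filled = toy.fillna("")
with_fill = filled["Movie_Genre"] + " " + filled["Movie_Tagline"]

print(without_fill.tolist())
print(with_fill.tolist())
```

This is why row 4758 in the output above still carries its genre (`Family`) even though its other feature columns are empty.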

**Convert Feature Text to TF-IDF Vectors**

In [15]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [16]:
tfidf=TfidfVectorizer()

In [17]:
x=tfidf.fit_transform(x)

In [18]:
x.shape

(4760, 17258)
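
Each of the 17258 columns corresponds to one token in the vocabulary, and each cell holds that token's TF-IDF weight for the movie. The sketch below reproduces the weighting in plain Python, assuming scikit-learn's default settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2-normalised rows). The three documents are hypothetical, and this is an illustration of the formula rather than a replacement for `TfidfVectorizer`.

```python
import math
import re
from collections import Counter

# Hypothetical feature strings, one per movie
docs = [
    "Drama crime thriller",
    "Drama romance",
    "Animation family",
]

def tokenize(text):
    # Lowercase, then keep tokens of two or more word characters
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

tokenized = [tokenize(d) for d in docs]
vocabulary = sorted({t for doc in tokenized for t in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each token
doc_freq = Counter(t for doc in tokenized for t in set(doc))

# Smoothed inverse document frequency
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

def tfidf_row(tokens):
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocabulary]
    norm = math.sqrt(sum(w * w for w in raw))
    # L2-normalise so that every non-empty row has unit length
    return [w / norm if norm else 0.0 for w in raw]

matrix = [tfidf_row(doc) for doc in tokenized]
print(vocabulary)
print(matrix[0])
```

In the first row, `drama` receives a lower weight than `crime` because it appears in two documents rather than one, which is the effect that makes rare, distinctive tokens count for more.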

In [21]:
print(x)

  (0, 3583)	0.06486754376295062
  (0, 3240)	0.04527089872278055
  (0, 7213)	0.25146675849405775
  (0, 10898)	0.17625708810661284
  (0, 17052)	0.26079573581490934
  (0, 5059)	0.29553419178998613
  (0, 16862)	0.12768803549311025
  (0, 1595)	0.15687561633854538
  (0, 13052)	0.1465525095337543
  (0, 15708)	0.17654247479915475
  (0, 11362)	0.18801785343006192
  (0, 6463)	0.18801785343006192
  (0, 5662)	0.1465525095337543
  (0, 13467)	0.19712637387361423
  (0, 12731)	0.19712637387361423
  (0, 614)	0.07642616241686973
  (0, 11244)	0.08262965296941757
  (0, 9206)	0.15186283580984414
  (0, 1495)	0.19712637387361423
  (0, 7454)	0.14745635785412262
  (0, 7071)	0.19822417598406614
  (0, 5499)	0.11454057510303811
  (0, 3878)	0.11998399582562203
  (0, 11242)	0.07277788238484746
  (0, 15219)	0.09800472886453934
  :	:
  (4757, 3485)	0.199161573117024
  (4757, 1184)	0.18890726729447022
  (4757, 14568)	0.24255077606762876
  (4757, 15508)	0.24255077606762876
  (4757, 5802)	0.24255077606762876
  (4757, 81

**Compute Similarity Scores Using Cosine Similarity**

In [22]:
from sklearn.metrics.pairwise import cosine_similarity

In [23]:
similarity_score=cosine_similarity(x)

In [24]:
similarity_score

array([[1.        , 0.01351235, 0.03570468, ..., 0.        , 0.        ,
        0.        ],
       [0.01351235, 1.        , 0.00806674, ..., 0.        , 0.        ,
        0.        ],
       [0.03570468, 0.00806674, 1.        , ..., 0.        , 0.08014876,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.08014876, ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])

In [25]:
similarity_score.shape

(4760, 4760)
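
The result is a square matrix with one row and one column per movie: entry (i, j) is the cosine of the angle between the TF-IDF vectors of movies i and j, which is why the diagonal is 1 and the matrix is symmetric. A minimal NumPy sketch of the same computation on hypothetical vectors (the function name `cosine_similarity_matrix` is chosen here for illustration):

```python
import numpy as np

def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    m = np.asarray(m, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0      # leave all-zero rows as zeros instead of dividing by zero
    unit = m / norms             # scale every row to unit length
    return unit @ unit.T         # dot products of unit vectors are cosines

# Hypothetical feature vectors for three movies
vectors = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])
sim = cosine_similarity_matrix(vectors)
print(sim)
```

Movies that share no tokens have a similarity of 0, which explains the many zeros in the output above.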

**Get the Movie Name from the User and Find the Closest Title Match**

In [26]:
favourite_movie_name=input('Enter your favourite movie name:')

Enter your favourite movie name:Family Star


In [27]:
All_movies_title_list=df['Movie_Title'].tolist()

In [28]:
import difflib

In [29]:
# get_close_matches returns up to 3 titles by default, best match first
close_matches = difflib.get_close_matches(favourite_movie_name,All_movies_title_list)
print(close_matches)

['Family Plot', 'The Family Stone', 'The Family Man']


In [30]:
Close_Match = close_matches[0]
print(Close_Match)

Family Plot
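
`difflib.get_close_matches` is case-sensitive, uses a default similarity cutoff of 0.6, and returns an empty list when nothing is similar enough, so indexing `[0]` on its result can raise an `IndexError`. The sketch below shows one way to handle both points. The helper name `closest_title` and the short title list are hypothetical, and titles that differ only by case would collapse into one entry in this version.

```python
import difflib

# Hypothetical subset of titles
titles = ["Family Plot", "The Family Stone", "The Family Man", "Star Wars"]

def closest_title(query, titles, cutoff=0.6):
    """Return the best-matching title ignoring case, or None if nothing is close."""
    lowered = {t.lower(): t for t in titles}   # map lowercase form back to original
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(closest_title("STAR WARZ", titles))
print(closest_title("zzzzqqqq", titles))
```

Returning `None` lets the caller decide how to report an unknown title instead of failing on an empty list.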


In [31]:
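# Note: this takes the Movie_ID value, which is used below as a row position in
# similarity_score. The df.head() output shows Movie_ID starting at 1 while row
# positions start at 0, so the two are not interchangeable; the row label from
# df.index[df.Movie_Title == Close_Match][0] identifies the matched row itself.
Index_of_Close_Match_Movie = df[df.Movie_Title == Close_Match]['Movie_ID'].values[0]
print(Index_of_Close_Match_Movie)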

932
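
The value printed above is a `Movie_ID`, while `similarity_score` is indexed by row position. The `df.head()` output shows `Movie_ID` 1 at row 0, so the two differ. The sketch below illustrates the difference on a hypothetical three-row frame built from the first titles of the dataset. It assumes the default `RangeIndex`, where the row label equals the row position.

```python
import pandas as pd

# Hypothetical frame mirroring the first rows of the dataset
toy = pd.DataFrame({
    "Movie_ID": [1, 2, 3],
    "Movie_Title": ["Four Rooms", "Star Wars", "Finding Nemo"],
})

close_match = "Star Wars"

# Lookup as in the cell above: returns the Movie_ID column value
movie_id = toy[toy.Movie_Title == close_match]["Movie_ID"].values[0]

# Alternative lookup: returns the row label, which equals the row position here
row_position = toy.index[toy["Movie_Title"] == close_match][0]

print(movie_id, toy.iloc[movie_id]["Movie_Title"])
print(row_position, toy.iloc[row_position]["Movie_Title"])
```

Using the row position selects the similarity row of the matched movie itself. Using `Movie_ID` selects a neighbouring row, which is consistent with the sorted scores further down, where the 1.0 self-similarity entry resolves to a title other than the matched one.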


In [32]:
Recommendation_Score = list(enumerate(similarity_score[Index_of_Close_Match_Movie]))
print(Recommendation_Score)

[(0, 0.0), (1, 0.015857237966330748), (2, 0.044845671090827245), (3, 0.0), (4, 0.0), (5, 0.0), (6, 0.024211050143785377), (7, 0.012753326647009032), (8, 0.0), (9, 0.0), (10, 0.0), (11, 0.0), (12, 0.0), (13, 0.0), (14, 0.04782528813938533), (15, 0.06262327876726285), (16, 0.003958391467168829), (17, 0.029286597267946222), (18, 0.004553941718540023), (19, 0.027942932966484445), (20, 0.0), (21, 0.005078716565588897), (22, 0.017627646202675224), (23, 0.0), (24, 0.0), (25, 0.0), (26, 0.011922080128988388), (27, 0.023375042680795287), (28, 0.018025109314420543), (29, 0.0), (30, 0.016080122650479506), (31, 0.021162976826709905), (32, 0.0), (33, 0.0), (34, 0.004941860582051031), (35, 0.016462962814537536), (36, 0.01630437246831304), (37, 0.0), (38, 0.0), (39, 0.022594890617962972), (40, 0.0), (41, 0.0), (42, 0.01824958846679104), (43, 0.0), (44, 0.00395336030030767), (45, 0.03422276375770694), (46, 0.020914897551345173), (47, 0.02005180755496404), (48, 0.005232896902209581), (49, 0.0), (50, 0.

In [33]:
len(Recommendation_Score)

4760

**Sort All Movies by Their Recommendation Score for the Favourite Movie**

In [34]:
Sorted_Similar_Movies = sorted(Recommendation_Score,key=lambda x:x[1],reverse=True)
print(Sorted_Similar_Movies)

[(932, 1.0), (2144, 0.16410832029459974), (147, 0.1521563051723581), (2536, 0.14339507573564866), (1869, 0.14070880385963858), (1716, 0.14069083216440637), (4483, 0.13094728986704185), (1361, 0.12953958891698955), (952, 0.12470771503091217), (248, 0.11746518975431287), (2802, 0.11595751592167765), (935, 0.1134351032047765), (1494, 0.11308607314520636), (136, 0.11172006610711943), (3852, 0.1108664603726183), (2601, 0.10736336473140194), (3323, 0.10331211708843029), (1709, 0.10290714815749846), (2226, 0.09805778519517545), (691, 0.09793701775230558), (126, 0.09792526108493765), (355, 0.09492708856891502), (3225, 0.09468269667569035), (2680, 0.09379920421669748), (1927, 0.09219753656190723), (218, 0.09155250550524374), (1182, 0.09130601648963135), (3252, 0.09097847488031324), (1319, 0.09059958807147513), (334, 0.08859334503688078), (3390, 0.08798521747556699), (4168, 0.08767084565573124), (1691, 0.08556615667479593), (988, 0.08548369981620281), (695, 0.08538436005282557), (2035, 0.0849540

In [35]:
print('Top 30 Movies Suggested for you:\n')
# Only the first 30 entries of the sorted list are needed
for i, (index, score) in enumerate(Sorted_Similar_Movies[:30], start=1):
  title_from_index = df.loc[index, 'Movie_Title']
  print(i, '.', title_from_index)

Top 30 Movies Suggested for you:

1 . The Mist
2 . Pollock
3 . Mystic River
4 . Welcome to Mooseport
5 . The Majestic
6 . Frequency
7 . Elsa & Fred
8 . Flubber
9 . Dreamcatcher
10 . Silent Hill
11 . Dear John
12 . Into the Wild
13 . The Hoax
14 . Meet Joe Black
15 . House at the End of the Street
16 . Duets
17 . The Helix... Loaded
18 . Spy Hard
19 . The Children of Huang Shi
20 . Death at a Funeral
21 . The Shawshank Redemption
22 . City of Angels
23 . Real Steel
24 . Pandorum
25 . Jeepers Creepers 2
26 . Ladyhawke
27 . Dear Frankie
28 . Trapeze
29 . He Got Game
30 . From Dusk Till Dawn


**Top 10 Movie Recommendation System**

In [40]:
Movie_name=input("Enter Your Favorite Movie Name:")
List_of_all_titles=df["Movie_Title"].tolist()
Find_close_Match=difflib.get_close_matches(Movie_name,List_of_all_titles)
if not Find_close_Match:
  # get_close_matches returns an empty list when no title is similar enough
  print('No close match found for:',Movie_name)
else:
  Close_Match=Find_close_Match[0]
  # Note: Movie_ID is used as a row position here, as in the cells above;
  # Movie_ID starts at 1 while row positions start at 0
  Index_of_Movie=df[df.Movie_Title==Close_Match]['Movie_ID'].values[0]
  Recommendation_Score=list(enumerate(similarity_score[Index_of_Movie]))
  sorted_similar_movies=sorted(Recommendation_Score,key=lambda x:x[1],reverse=True)
  print('Top 10 Movies Suggested for you:\n')
  for i,(index,score) in enumerate(sorted_similar_movies[:10],start=1):
    print(i,'.',df.iloc[index]['Movie_Title'])

Enter Your Favorite Movie Name:family star
Top 10 Movies Suggested for you:

1 . The Mist
2 . Pollock
3 . Mystic River
4 . Welcome to Mooseport
5 . The Majestic
6 . Frequency
7 . Elsa & Fred
8 . Flubber
9 . Dreamcatcher
10 . Silent Hill
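
The steps of this cell can also be wrapped in a reusable function. The following is a sketch under stated assumptions, not the notebook's own method: the name `recommend`, its parameters, and the toy titles and similarity matrix are hypothetical. It looks up the row position of the matched title instead of `Movie_ID`, skips the query movie itself, and returns an empty list when no close match exists.

```python
import difflib
import numpy as np

def recommend(query, titles, similarity, top_n=10):
    """Return up to top_n titles most similar to the closest match of query."""
    matches = difflib.get_close_matches(query, titles, n=1)
    if not matches:
        return []                                # no title is similar enough
    position = titles.index(matches[0])          # row position of the first matching title
    order = np.argsort(similarity[position])[::-1]   # highest similarity first
    return [titles[i] for i in order if i != position][:top_n]

# Hypothetical titles and similarity matrix
titles = ["Alpha", "Bravo", "Charlie"]
similarity = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
])

print(recommend("Alpha", titles, similarity))
print(recommend("zzz", titles, similarity))
```

With the notebook's objects, a call along the lines of `recommend(Movie_name, df['Movie_Title'].tolist(), similarity_score)` would apply, assuming titles are unique enough for `titles.index` to find the intended row.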
