
**Import Libraries**

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics.pairwise import cosine_similarity

**Load MovieLens Dataset**

In [None]:
!wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
!unzip ml-latest-small.zip

--2025-07-29 14:23:29--  http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘ml-latest-small.zip’


2025-07-29 14:23:29 (6.61 MB/s) - ‘ml-latest-small.zip’ saved [978202/978202]

Archive:  ml-latest-small.zip
   creating: ml-latest-small/
  inflating: ml-latest-small/links.csv  
  inflating: ml-latest-small/tags.csv  
  inflating: ml-latest-small/ratings.csv  
  inflating: ml-latest-small/README.txt  
  inflating: ml-latest-small/movies.csv  


In [None]:
# Load the ratings and the movie metadata
ratings = pd.read_csv("ml-latest-small/ratings.csv")
movies = pd.read_csv("ml-latest-small/movies.csv")

print("Movies:\n", movies.head())
print("Ratings:\n", ratings.head())

Movies:
    movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  
Ratings:
    userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931


**TF-IDF Vectorization & Cosine Similarity**

In [None]:
# Fill any missing genres
movies['genres'] = movies['genres'].fillna('')

# TF-IDF Vectorizer on genres
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])

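# Cosine similarity matrix: TF-IDF rows are L2-normalized by default,
# so the plain dot product (linear_kernel) equals the cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Build a title -> row index map, keeping the first row for any repeated title
# (Series.drop_duplicates() would compare the row numbers, not the titles)
indices = pd.Series(movies.index, index=movies['title'])
indices = indices[~indices.index.duplicated(keep='first')]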
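
A small check of why `linear_kernel` can stand in for `cosine_similarity` here. This is only a sketch on made-up genre strings (`toy_genres` is hypothetical, not part of the dataset); it relies on `TfidfVectorizer` normalizing each row to unit length by default.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy genre strings in the MovieLens pipe-separated style
toy_genres = [
    "Adventure|Animation|Children",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
]

toy_tfidf = TfidfVectorizer().fit_transform(toy_genres)

# Rows are L2-normalized by default (norm='l2'), so the dot product
# and the cosine similarity give the same matrix.
dot = linear_kernel(toy_tfidf, toy_tfidf)
cos = cosine_similarity(toy_tfidf, toy_tfidf)
kernels_match = np.allclose(dot, cos)
print(kernels_match)
```

If the vectorizer were created with `norm=None`, the two matrices would generally differ and `cosine_similarity` would be the one to use.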

**Recommendation Function (Content-Based)**

In [None]:
def recommend_content(title, num_recommendations=5):
    if title not in indices:
        return "Movie not found!"
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the query movie explicitly: many movies tie at similarity 1.0,
    # so it is not guaranteed to sort into first place
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[:num_recommendations]
    movie_indices = [i[0] for i in sim_scores]
    return movies['title'].iloc[movie_indices]

In [None]:
recommend_content("Toy Story (1995)")

Unnamed: 0,title
1706,Antz (1998)
2355,Toy Story 2 (1999)
2809,"Adventures of Rocky and Bullwinkle, The (2000)"
3000,"Emperor's New Groove, The (2000)"
3568,"Monsters, Inc. (2001)"


**Build User-Item Matrix**

In [None]:
# Create user-item matrix
user_movie_matrix = ratings.pivot(index='userId', columns='movieId', values='rating')
user_movie_matrix.fillna(0, inplace=True)

**Compute User Similarity**

In [None]:
user_sim = cosine_similarity(user_movie_matrix)
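
To make the pivot and similarity steps concrete, here is a minimal sketch on three made-up users (`toy_ratings` is hypothetical data in the same long format as `ratings.csv`). Wrapping the result in a DataFrame labelled by `userId` is an illustrative choice, not something the notebook does.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings in the same long format as ratings.csv
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [5.0, 4.0, 5.0, 4.0, 3.0],
})

# One row per user, one column per movie; unrated movies become 0
toy_matrix = toy_ratings.pivot(index="userId", columns="movieId", values="rating").fillna(0)

# Label rows and columns by userId so lookups do not depend on row position
toy_sim = pd.DataFrame(
    cosine_similarity(toy_matrix),
    index=toy_matrix.index,
    columns=toy_matrix.index,
)
print(toy_sim.round(2))
```

Users 1 and 2 rated the same movies identically, so their similarity is 1; user 3 shares no rated movie with them, so the similarity is 0. Filling with 0 treats "not rated" as a numeric value, which is a simplification worth keeping in mind.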

**Recommendation Function (Collaborative)**

In [None]:
def recommend_collaborative(user_id, num_recommendations=5):
    if user_id not in user_movie_matrix.index:
        return "User not found!"

    # Look up the row position instead of assuming user IDs are 1, 2, 3, ...
    user_pos = user_movie_matrix.index.get_loc(user_id)
    similar_users = [(pos, score) for pos, score in enumerate(user_sim[user_pos]) if pos != user_pos]
    similar_users = sorted(similar_users, key=lambda x: x[1], reverse=True)

    user_ratings = user_movie_matrix.loc[user_id]
    recommendations = {}

    for other_user_pos, sim_score in similar_users[:10]:  # use top 10 similar users
        other_ratings = user_movie_matrix.iloc[other_user_pos]

        # Score each movie the target user has not rated by similarity-weighted rating
        for movie_id in user_ratings.index:
            if user_ratings[movie_id] == 0 and other_ratings[movie_id] > 0:
                recommendations[movie_id] = recommendations.get(movie_id, 0) + sim_score * other_ratings[movie_id]

    sorted_recs = sorted(recommendations.items(), key=lambda x: x[1], reverse=True)
    top_movie_ids = [movie_id for movie_id, _ in sorted_recs[:num_recommendations]]
    # Note: the titles come back in movies.csv order, not in ranked order
    return movies[movies['movieId'].isin(top_movie_ids)]['title']

In [None]:
recommend_collaborative(1)  # Try with userId = 1

Unnamed: 0,title
507,Terminator 2: Judgment Day (1991)
659,"Godfather, The (1972)"
902,Aliens (1986)
1211,"Hunt for Red October, The (1990)"
2078,"Sixth Sense, The (1999)"


**Preprocess the Genres**

In [None]:
# Replace "|" with a space so the genres become a string like "Adventure Animation Children"
# (regex=False treats "|" as a literal character rather than regex alternation)
movies['genres'] = movies['genres'].str.replace('|', ' ', regex=False)

**Apply TF-IDF Vectorizer on Genres**

In [None]:
# Create TF-IDF matrix on genres
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])
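
A quick look at what the vectorizer actually sees. The sketch below uses two made-up genre strings (hypothetical, not taken from `movies.csv`) and the default `TfidfVectorizer` settings; it suggests that the default tokenizer already splits on `|`, and that a hyphenated genre such as `Sci-Fi` becomes two separate tokens.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical genre strings, once pipe-separated and once space-separated
piped = ["Action|Sci-Fi|Thriller", "Comedy|Romance"]
spaced = [g.replace("|", " ") for g in piped]

# vocabulary_ maps each learned token to a column; sorting its keys lists the tokens
vocab_piped = sorted(TfidfVectorizer().fit(piped).vocabulary_)
vocab_spaced = sorted(TfidfVectorizer().fit(spaced).vocabulary_)

print(vocab_piped)
print(vocab_piped == vocab_spaced)
```

Because both vocabularies are the same, replacing `|` with a space mainly makes the `genres` column easier to read; it is not expected to change the similarity scores.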

**Compute Cosine Similarity Between Movies**

In [None]:
# Compute cosine similarity (dot product) between all movie vectors
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

**Create a Movie Index Mapping**

In [None]:
# Create a mapping from movie title to row index, keeping the first row for any repeated title
indices = pd.Series(movies.index, index=movies['title'])
indices = indices[~indices.index.duplicated(keep='first')]

**Build the Recommendation Function**

In [None]:
def recommend_movies_by_genre(title, num_recommendations=5):
    if title not in indices:
        return "Movie not found!"

    idx = indices[title]

    # Get similarity scores of all movies with this one
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Drop the query movie itself; with tied scores it may not sort first
    sim_scores = [s for s in sim_scores if s[0] != idx]

    # Sort by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the top N most similar movies
    sim_scores = sim_scores[:num_recommendations]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top similar movie titles
    return movies['title'].iloc[movie_indices]

In [None]:
recommend_movies_by_genre("Toy Story (1995)")

Unnamed: 0,title
1706,Antz (1998)
2355,Toy Story 2 (1999)
2809,"Adventures of Rocky and Bullwinkle, The (2000)"
3000,"Emperor's New Groove, The (2000)"
3568,"Monsters, Inc. (2001)"


**Define Function to Search by Genre**

In [None]:
def find_movies_by_genre(genre_keyword):
    # Convert genre to lowercase for case-insensitive search
    genre_keyword = genre_keyword.lower()

    # Filter rows where the genre contains the keyword as a literal substring
    matched = movies[movies['genres'].str.lower().str.contains(genre_keyword, regex=False)]

    # Return only the titles
    return matched['title'].head(20)  # show at most the first 20 matches
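
Why literal matching can matter for user-typed keywords: `str.contains` interprets its pattern as a regular expression unless told otherwise, so a keyword containing characters such as `(` may be rejected as an invalid pattern or match something unintended. A minimal sketch on a made-up Series (`toy` and `keyword` are hypothetical):

```python
import pandas as pd

# Hypothetical genre strings, including the MovieLens placeholder for missing genres
toy = pd.Series(["Comedy Romance", "(no genres listed)", "Action Sci-Fi"])

# A keyword with an unbalanced parenthesis is not a valid regular expression,
# but it is a perfectly good literal substring
keyword = "(no"
mask = toy.str.lower().str.contains(keyword.lower(), regex=False)
matches = toy[mask].tolist()
print(matches)
```

With `regex=False` the keyword is compared character by character, which is usually what a genre search box needs.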

In [None]:
find_movies_by_genre("Comedy")

Unnamed: 0,title
0,Toy Story (1995)
2,Grumpier Old Men (1995)
3,Waiting to Exhale (1995)
4,Father of the Bride Part II (1995)
6,Sabrina (1995)
10,"American President, The (1995)"
11,Dracula: Dead and Loving It (1995)
17,Four Rooms (1995)
18,Ace Ventura: When Nature Calls (1995)
19,Money Train (1995)


**User Interactions**

In [None]:
print("Welcome to Movie Recommendation System!!")

while True:

    print("What would you like to do?")
    print("1. Recommend movies based on content")
    print("2. Recommend movies based on collaborative filtering")
    print("3. Find movies by genre")
    print("4. Recommend movies by same genre")
    print("5. Exit")

    choice = input("Enter your choice (1-5): ")

    if choice not in ['1', '2', '3', '4', '5']:
        print("Invalid choice. Please enter a number between 1 and 5.")

    elif choice == '1':
        title = input("Enter the movie title: ")
        num_recommendations = int(input("Enter the number of recommendations: "))
        print(recommend_content(title, num_recommendations))

    elif choice == '2':
        user_id = int(input("Enter the user ID: "))
        num_recommendations = int(input("Enter the number of recommendations: "))
        print(recommend_collaborative(user_id, num_recommendations))

    elif choice == '3':
        genre_keyword = input("Enter the genre keyword: ")
        print(find_movies_by_genre(genre_keyword))

    elif choice == '4':
        title = input("Enter the movie title: ")
        num_recommendations = int(input("Enter the number of recommendations: "))
        print(recommend_movies_by_genre(title, num_recommendations))

    elif choice == '5':
        print("Thank you for using the Movie Recommendation System!")
        break

Welcome to Movie Recommendation System!!
What would you like to do?
1. Recommend movies based on content
2. Recommend movies based on collaborative filtering
3. Find movies by genre
4. Recommend movies by same genre
5. Exit
Enter your choice (1-5): 1
Enter the movie title: Waiting to Exhale (1995)
Enter the number of recommendations: 5
10        American President, The (1995)
47               Mighty Aphrodite (1995)
52     Postman, The (Postino, Il) (1994)
83                Beautiful Girls (1996)
165       Something to Talk About (1995)
Name: title, dtype: object
What would you like to do?
1. Recommend movies based on content
2. Recommend movies based on collaborative filtering
3. Find movies by genre
4. Recommend movies by same genre
5. Exit
Enter your choice (1-5): 2
Enter the user ID: 5
Enter the number of recommendations: 5
31     Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
314                          Forrest Gump (1994)
334                                 Speed (1994)
418          