<a href="https://colab.research.google.com/github/PSivaMallikarjun/Movie-Recommendation-System-Using-AI-ML/blob/main/Implementation_of_a_Movie_Recommendation_System_using_the_IMDB_dataset_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Movie Recommendation System Using AI/ML

Building a Movie Recommendation System

Using the IMDB movie dataset, we will build a program that, for any movie a user is watching, recommends the 5 movies most similar to it, much like the "you may also like" suggestions Netflix shows after you watch a film. The recommendations here are content-based: movies are compared by their titles, genres, and overviews, not by the viewing or rating history of other users.

This system uses Cosine Similarity with TF-IDF Vectorization for content-based filtering.

Steps to Use in Google Colab:
Upload the IMDB dataset (movies.csv) to Colab.
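Replace the file name "movies.csv" in the code with the path of your uploaded file, and skip the lines that create and save the sample dataset so they do not overwrite it.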
Run the code to get recommendations by changing the movie_name variable.
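
As a rough sketch of the loading step, the helper below reads a CSV and checks that it has the columns the code in this notebook relies on. The function name `load_movies`, the constant `REQUIRED_COLUMNS`, and the demo file name are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Columns the notebook code expects (assumed from the sample dataset).
REQUIRED_COLUMNS = ["title", "genres", "overview"]

def load_movies(csv_path):
    """Read the movie CSV and fail early if an expected column is missing."""
    movies = pd.read_csv(csv_path)
    missing = [col for col in REQUIRED_COLUMNS if col not in movies.columns]
    if missing:
        raise ValueError(f"{csv_path} is missing columns: {missing}")
    return movies

# Demo with a tiny hypothetical file so the sketch runs anywhere.
pd.DataFrame({
    "title": ["Memento"],
    "genres": ["Mystery Thriller"],
    "overview": ["A man with short-term memory loss attempts to track down his wife's murderer."],
}).to_csv("demo_movies.csv", index=False)

print(load_movies("demo_movies.csv").columns.tolist())
```

In Colab, files uploaded through the file browser typically land in /content, so the path would look like "/content/movies.csv"; adjust it to wherever your file actually is.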

This project is an AI-powered movie recommendation system that suggests similar movies based on user input, just like Netflix's "You May Also Like" feature. It uses Machine Learning (ML) techniques to find similarities between movies based on their genres, descriptions, and titles.

1. Data Collection & Preprocessing

The dataset consists of movie titles, genres, and overviews (descriptions).
The dataset is stored in a CSV file (movies.csv) and loaded into a Pandas DataFrame for processing.
All movie titles are converted to lowercase to prevent case-sensitive mismatches.
A new column called combined_features is created by merging title, genre, and overview into a single text block.
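
The preprocessing described above can be sketched as follows. The two-row sample is hypothetical, and filling missing values column by column is an assumption about how one might handle incomplete rows: concatenating first would turn the whole combined string into NaN as soon as one field is missing.

```python
import pandas as pd

# Hypothetical sample; the second row has no overview.
movies = pd.DataFrame({
    "title": ["The Dark Knight", "Memento"],
    "genres": ["Action Crime Drama", "Mystery Thriller"],
    "overview": ["Batman faces the Joker.", None],
})

# Lowercase titles so later lookups are not case-sensitive.
movies["title"] = movies["title"].str.lower()

# Fill missing values per column, then merge the text fields into one block.
movies["combined_features"] = (
    movies["title"].fillna("")
    + " " + movies["genres"].fillna("")
    + " " + movies["overview"].fillna("")
).str.strip()

print(movies["combined_features"].tolist())
```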

2. Feature Extraction (TF-IDF Vectorization)

To compare movie descriptions mathematically, we convert text into numerical data using TF-IDF (Term Frequency-Inverse Document Frequency).
The TfidfVectorizer from sklearn transforms the text into a sparse matrix with one row per movie and one column per term, where each entry is that term's TF-IDF weight.
Stop words (common words like "the", "and", "is") are removed to focus only on important words.

3. Finding Similar Movies Using Cosine Similarity

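Cosine Similarity measures how similar two movies are as the cosine of the angle between their TF-IDF vectors. Because TF-IDF weights are non-negative, the score ranges from 0 (no shared terms) to 1 (vectors pointing in the same direction).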
It computes a similarity score for each movie compared to all others in the dataset.
A higher similarity score means the movies are more alike.
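
As a small illustration of the score itself, the sketch below computes cosine similarity by hand (dot product divided by the product of the vector lengths) and compares it with sklearn's `cosine_similarity`. The toy vectors and the helper name `manual_cosine` are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1.0, 2.0, 0.0]])
b = np.array([[2.0, 4.0, 0.0]])  # same direction as a, only longer
c = np.array([[0.0, 0.0, 3.0]])  # shares no non-zero component with a

def manual_cosine(u, v):
    """Cosine of the angle between two vectors."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(manual_cosine(a, b))            # close to 1: identical direction
print(manual_cosine(a, c))            # 0: nothing in common
print(cosine_similarity(a, b)[0, 0])  # sklearn agrees with the manual value
```

Vector length does not matter, only direction, which is why a long and a short description of similar content can still score highly.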

4. Movie Recommendation Process

The user enters a movie title.
The system searches for the movie in the dataset.
The system finds the 5 most similar movies by ranking them based on their cosine similarity score.
The recommended movies are displayed as output.
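
The ranking step can be sketched as below. This is one possible way to pick the top matches, not necessarily how the notebook code does it: the helper `top_k_similar` and the toy similarity matrix are hypothetical. It masks out the query movie explicitly and caps k at the number of other movies, which is why a five-movie dataset can return at most four recommendations.

```python
import numpy as np

def top_k_similar(sim_matrix, query_index, k=5):
    """Return indices of the k movies most similar to the query movie."""
    scores = sim_matrix[query_index].astype(float).copy()
    scores[query_index] = -np.inf        # never recommend the query itself
    k = min(k, len(scores) - 1)          # cannot return more than the others
    order = np.argsort(-scores, kind="stable")  # highest score first
    return order[:k].tolist()

# Toy symmetric similarity matrix for four movies.
sim = np.array([
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.7, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
])

print(top_k_similar(sim, query_index=0, k=2))  # [2, 1]
print(top_k_similar(sim, query_index=0, k=5))  # [2, 1, 3]
```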

Example Workflow
🔹 User Input: "The Dark Knight"
🔹 System Processes:

Finds "The Dark Knight" in the dataset.
Compares it with all other movies using Cosine Similarity.
Returns the 5 most similar movies based on plot and genre.

Key Technologies Used
✅ Python – The main programming language.
✅ Pandas – For handling movie data in tabular form.
✅ Scikit-Learn (sklearn) – For implementing TF-IDF and Cosine Similarity.
✅ NLP (Natural Language Processing) – Helps convert text into a format that AI understands.



Output Recommendation (only four titles appear because the sample dataset used in the code below contains five movies):
Movies similar to The Dark Knight are:
1. Inception
2. Interstellar
3. The Prestige
4. Memento


Real-World Applications
🔹 Netflix, Amazon Prime Video, Disney+ – Personalized movie suggestions.
🔹 E-commerce (Amazon, Flipkart) – Recommending similar products.
🔹 Music Streaming (Spotify, Apple Music) – Song recommendations.

This project provides a fundamental understanding of AI/ML-driven recommendations and can be expanded with user behavior analysis and deep learning models for even better predictions.

In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Create a sample IMDB dataset
data = {
    'title': ["The Dark Knight", "Inception", "Interstellar", "The Prestige", "Memento"],
    'genres': ["Action Crime Drama", "Action Adventure Sci-Fi", "Adventure Drama Sci-Fi", "Drama Mystery Sci-Fi", "Mystery Thriller"],
    'overview': [
        "When the menace known as The Joker emerges, Batman must accept one of the greatest psychological tests of his ability to fight injustice.",
        "A thief who enters the dreams of others to steal secrets from their subconscious is given the inverse task of planting an idea into someone's mind.",
        "A team of explorers travel through a wormhole in space in an attempt to ensure humanity's survival.",
        "After a tragic accident, two stage magicians engage in a battle to create the ultimate illusion while sacrificing everything they have to outwit each other.",
        "A man with short-term memory loss attempts to track down his wife's murderer."
    ]
}

df = pd.DataFrame(data)

df.to_csv("movies.csv", index=False)

# Load the generated dataset
df = pd.read_csv("movies.csv")

# Display the first few rows
df.head()

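# Select important features for recommendations; fill missing values per column
# first so that one missing field does not blank out the whole row
df['combined_features'] = (
    df['title'].fillna('') + " " + df['genres'].fillna('') + " " + df['overview'].fillna('')
)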

# Convert text data into TF-IDF vectors
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'].fillna(''))

# Compute the Cosine Similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Function to get movie recommendations
def get_recommendations(movie_title, num_recommendations=5):
    if movie_title not in df['title'].values:
        return "Movie not found in the dataset!"

    movie_index = df[df['title'] == movie_title].index[0]
    similarity_scores = list(enumerate(cosine_sim[movie_index]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    top_movie_indices = [i[0] for i in similarity_scores[1:num_recommendations+1]]
    return df['title'].iloc[top_movie_indices]

# Example Usage
movie_name = "The Dark Knight"
print("Movies similar to", movie_name, "are:")
print(get_recommendations(movie_name))


Movies similar to The Dark Knight are:
1       Inception
2    Interstellar
3    The Prestige
4         Memento
Name: title, dtype: object


In [4]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os

# Create a sample IMDB dataset
movie_data = {
    'title': ["The Dark Knight", "Inception", "Interstellar", "The Prestige", "Memento"],
    'genres': ["Action Crime Drama", "Action Adventure Sci-Fi", "Adventure Drama Sci-Fi", "Drama Mystery Sci-Fi", "Mystery Thriller"],
    'overview': [
        "When the menace known as The Joker emerges, Batman must accept one of the greatest psychological tests of his ability to fight injustice.",
        "A thief who enters the dreams of others to steal secrets from their subconscious is given the inverse task of planting an idea into someone's mind.",
        "A team of explorers travel through a wormhole in space in an attempt to ensure humanity's survival.",
        "After a tragic accident, two stage magicians engage in a battle to create the ultimate illusion while sacrificing everything they have to outwit each other.",
        "A man with short-term memory loss attempts to track down his wife's murderer."
    ]
}

df = pd.DataFrame(movie_data)

dataset_filename = "movies.csv"
if not os.path.exists(dataset_filename):
    df.to_csv(dataset_filename, index=False)

# Load the generated dataset
df = pd.read_csv(dataset_filename)

# Ensure all movie titles are in lowercase for better matching
df['title'] = df['title'].str.lower()

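# Select important features for recommendations; fill missing values per column
# first so that one missing field does not blank out the whole row
df['combined_features'] = (
    df['title'].fillna('') + " " + df['genres'].fillna('') + " " + df['overview'].fillna('')
)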

# Convert text data into TF-IDF vectors
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'].fillna(''))

# Compute the Cosine Similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Function to get movie recommendations
def get_recommendations(movie_title, num_recommendations=5):
    movie_title = movie_title.lower()
    if movie_title not in df['title'].values:
        return "Movie not found in the dataset!"

    movie_index = df[df['title'] == movie_title].index[0]
    similarity_scores = list(enumerate(cosine_sim[movie_index]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    top_movie_indices = [i[0] for i in similarity_scores[1:num_recommendations+1]]
    return df['title'].iloc[top_movie_indices].tolist()

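# Allow user input for dynamic recommendations
movie_name = input("Enter a movie title: ")
recommendations = get_recommendations(movie_name)
if isinstance(recommendations, str):
    # get_recommendations returns an error message when the title is unknown
    print(recommendations)
else:
    print("Movies similar to", movie_name, "are:")
    print(recommendations)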


Enter a movie title: The Dark Knight
Movies similar to The Dark Knight are:
['inception', 'interstellar', 'the prestige', 'memento']


**Summary**:
This AI-powered Movie Recommendation System utilizes Machine Learning (ML) and Natural Language Processing (NLP) to suggest similar movies based on a given movie title. It follows a structured pipeline:

Dataset Creation & Preprocessing – A sample IMDb dataset is generated and formatted for analysis.
Feature Extraction – Uses TF-IDF Vectorization to convert text-based movie features into numerical form.
Similarity Calculation – Employs Cosine Similarity to measure the closeness between movies.
Movie Recommendations – Given a movie title, the system finds and recommends the 5 most similar movies based on genre, description, and title.

Key Takeaways
Text-based metadata (title, genres, overview) is enough to build a simple content-based recommender.
Cosine Similarity over TF-IDF vectors gives a straightforward way to rank related movies.
Platforms like Netflix, Prime Video, and YouTube offer comparable "similar content" suggestions, though their production systems are far more elaborate than this project.
This project serves as a foundation for more advanced recommendation engines integrating user preferences, ratings, and deep learning models for improved accuracy.