# 🎬 Movie Recommendation System (Content-Based Filtering)
This notebook builds a simple content-based movie recommender: each movie's text is turned into a TF-IDF vector, and movies are ranked by the cosine similarity between those vectors. You can later export the code as a Python module to integrate with your MERN stack.
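
To make the two ingredients concrete, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It assumes whitespace tokenization and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`; scikit-learn's `TfidfVectorizer` tokenizes differently (for example, it splits `sci-fi` into two tokens and can drop stop words), so the numbers here will not match the notebook's exactly. The helper names `tfidf_vectors` and `cosine` are illustrative only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times smoothed IDF (assumed formula, see lead-in)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine
    return sum(w * b.get(term, 0.0) for term, w in a.items())

docs = ["action sci-fi thriller", "action sci-fi", "action crime drama"]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # shares "action" and "sci-fi"
sim_far = cosine(vecs[0], vecs[2])    # shares only "action"
print(sim_close > sim_far)
```

Terms that appear in every document (here `action`) get the lowest weight, which is why sharing the rarer `sci-fi` term contributes more to the similarity score.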

In [1]:
# Install required packages
!pip install pandas scikit-learn






## Step 1: Import Libraries and Create Dataset

In [2]:
import pandas as pd

# Example movie dataset
data = {
    'title': [
        'Inception',
        'The Matrix',
        'Interstellar',
        'The Dark Knight',
        'Avatar'
    ],
    'overview': [
        'A thief who steals corporate secrets through dream-sharing technology.',
        'A computer hacker learns about the true nature of reality and his role in the war.',
        'A team of explorers travel through a wormhole in space to ensure humanity\'s survival.',
        'When the menace known as the Joker wreaks havoc in Gotham, Batman must accept his greatest test.',
        'A marine on an alien planet becomes torn between following orders and protecting his home.'
    ],
    'genres': [
        'Action Sci-Fi Thriller',
        'Action Sci-Fi',
        'Adventure Drama Sci-Fi',
        'Action Crime Drama',
        'Action Adventure Fantasy'
    ]
}

df = pd.DataFrame(data)
df

|   | title | overview | genres |
|---|---|---|---|
| 0 | Inception | A thief who steals corporate secrets through d... | Action Sci-Fi Thriller |
| 1 | The Matrix | A computer hacker learns about the true nature... | Action Sci-Fi |
| 2 | Interstellar | A team of explorers travel through a wormhole ... | Adventure Drama Sci-Fi |
| 3 | The Dark Knight | When the menace known as the Joker wreaks havo... | Action Crime Drama |
| 4 | Avatar | A marine on an alien planet becomes torn betwe... | Action Adventure Fantasy |
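
The hard-coded dictionary is only an example dataset. As a rough sketch of how a larger dataset might be loaded, the snippet below reads the same three columns from CSV text. The in-memory `csv_text` stands in for a real file (a path such as `movies.csv` would be an assumption about your project), and the `movies` variable name is illustrative. Missing text is filled with empty strings because concatenating a missing value with a string would otherwise leave the combined field missing too.

```python
import io
import pandas as pd

# Stand-in for a real file; pd.read_csv also accepts a file path
csv_text = """title,overview,genres
Inception,A thief who steals corporate secrets through dream-sharing technology.,Action Sci-Fi Thriller
The Matrix,,Action Sci-Fi
"""

movies = pd.read_csv(io.StringIO(csv_text))

# Empty CSV cells are read as missing values, so fill them before concatenating
movies[['overview', 'genres']] = movies[['overview', 'genres']].fillna('')
movies['combined'] = (movies['overview'] + ' ' + movies['genres']).str.strip()
print(movies[['title', 'combined']])
```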


## Step 2: Build the Recommender Model

In [3]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Combine relevant text fields
df['combined'] = df['overview'] + ' ' + df['genres']

# Create TF-IDF matrix
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(df['combined'])

# Compute cosine similarity between all movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

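# Create reverse lookup map for movie title -> row position,
# keeping the first row when a title appears more than once
indices = pd.Series(df.index, index=df['title'])
indices = indices[~indices.index.duplicated(keep='first')]

def get_recommendations(title, num_recommendations=5):
    """Return up to num_recommendations titles most similar to `title`."""
    # Unknown titles yield an empty list rather than an error
    if title not in indices:
        return []
    idx = indices[title]
    # Pair each movie with its similarity to the query, excluding the query itself
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[:num_recommendations]
    movie_indices = [i[0] for i in sim_scores]
    return df['title'].iloc[movie_indices].tolist()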
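
When tuning a recommender like this, it can help to see the similarity scores alongside the titles. The following is a sketch of one way to do that, not part of the notebook's pipeline: `recommend_with_scores` is a hypothetical helper, and the three-row `movies` frame with shortened text is made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative dataset (shortened text, not the notebook's full overviews)
movies = pd.DataFrame({
    'title': ['Inception', 'The Matrix', 'The Dark Knight'],
    'combined': [
        'thief steals secrets through dream technology Action Sci-Fi Thriller',
        'hacker learns the nature of reality Action Sci-Fi',
        'Joker wreaks havoc in Gotham Action Crime Drama',
    ],
})

tfidf = TfidfVectorizer(stop_words='english').fit_transform(movies['combined'])
similarity = cosine_similarity(tfidf)

def recommend_with_scores(title, top_n=5):
    """Return (title, score) pairs, most similar first."""
    matches = movies.index[movies['title'] == title]
    if len(matches) == 0:
        return []
    idx = matches[0]
    # Label each score with its movie title and remove the query movie
    scores = pd.Series(similarity[idx], index=movies['title']).drop(index=title)
    top = scores.sort_values(ascending=False).head(top_n)
    return [(name, round(float(score), 3)) for name, score in top.items()]

scored = recommend_with_scores('Inception')
print(scored)
```

With only a handful of short descriptions, most scores will be low and driven by shared genre words, which is worth keeping in mind when judging the rankings.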

## Step 3: Test the Recommendation System

In [4]:
get_recommendations('Inception')

['The Matrix', 'Interstellar', 'Avatar', 'The Dark Knight']
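
Note that `get_recommendations` requires an exact title match and returns an empty list otherwise, so `'inception'` in lowercase would find nothing. One possible workaround, sketched below with the standard library's `difflib`, is to resolve a loosely typed query to a known title first. `resolve_title` and its `cutoff` default are illustrative choices, not part of the notebook.

```python
import difflib

def resolve_title(query, titles, cutoff=0.6):
    """Map a loosely typed query to a known title, or None if nothing is close."""
    # Compare in lowercase, but return the title in its original casing
    by_lower = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None

titles = ['Inception', 'The Matrix', 'Interstellar', 'The Dark Knight', 'Avatar']
print(resolve_title('inception', titles))
print(resolve_title('The Matrx', titles))
print(resolve_title('zzz', titles))
```

The resolved title, when not `None`, can then be passed to `get_recommendations`.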

## Step 4: (Optional) Save this Model as a Python File
Once you are satisfied with the results, export the code as `recommender.py` so that a FastAPI service can import it and serve recommendations to your MERN backend.

In [5]:
# Source of the standalone module; the dataset is embedded as a Python literal
module_source = '''\
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

data = ''' + str(data) + '''

df = pd.DataFrame(data)
df['combined'] = df['overview'] + ' ' + df['genres']
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(df['combined'])
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
# Title -> row position, keeping the first row for repeated titles
indices = pd.Series(df.index, index=df['title'])
indices = indices[~indices.index.duplicated(keep='first')]

def get_recommendations(title, num_recommendations=5):
    if title not in indices:
        return []
    idx = indices[title]
    # Exclude the query movie itself, then rank by similarity
    sim_scores = [(i, s) for i, s in enumerate(cosine_sim[idx]) if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[:num_recommendations]
    movie_indices = [i[0] for i in sim_scores]
    return df['title'].iloc[movie_indices].tolist()
'''

with open('recommender.py', 'w', encoding='utf-8') as f:
    f.write(module_source)
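
For the FastAPI step, a minimal sketch of what the service could look like follows. Everything here is an assumption for illustration: the `/recommendations` path, the `limit` parameter and its cap of 20, the 404 response for empty results, and the `service` module name in the comments. The recommender is passed in as a callable so the payload logic can be exercised without FastAPI installed; FastAPI itself is imported only when `create_app` is called.

```python
def build_response(title, limit, recommender):
    """Shape the JSON payload; `recommender` is any callable like get_recommendations."""
    limit = max(1, min(limit, 20))  # illustrative cap, not a requirement
    return {'title': title, 'recommendations': recommender(title, limit)}

def create_app(recommender):
    # Imported lazily so this file can be loaded without FastAPI installed
    from fastapi import FastAPI, HTTPException

    app = FastAPI()

    @app.get('/recommendations')
    def recommendations(title: str, limit: int = 5):
        # `title` and `limit` are read from the query string
        payload = build_response(title, limit, recommender)
        if not payload['recommendations']:
            raise HTTPException(status_code=404, detail='No recommendations found')
        return payload

    return app

# Assumed wiring in a file named service.py (hypothetical):
#   from recommender import get_recommendations
#   app = create_app(get_recommendations)
# then start it with an ASGI server, e.g.: uvicorn service:app
```

A Node/Express backend could then request something like `GET /recommendations?title=Inception&limit=3` from this service and forward the JSON to the React client.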