<a href="https://colab.research.google.com/github/SidharthaSarkar1298/AI-Movie-Recommendation-System/blob/main/movie_recommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **AI Movie Recommendation**

---
### **Project Overview**
**Objective:** To build a predictive engine that suggests movies based on a user's unique taste profile, derived from their past movie ratings.

**The Problem:** The overwhelming volume of digital content makes it hard to decide what to watch, and a lot of time is wasted browsing.

**The Solution:** An automated system that analyzes users' movie ratings to deliver personalized recommendations of highly rated movies that match each user's taste.

# **Installing Libraries**

In [1]:
!pip install pandas numpy scikit-learn requests



# **Importing Libraries**

In [2]:
import pandas as pd
import numpy as np
import requests
import zipfile
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity

# **Downloading the dataset and Loading Data into DataFrame**

In [3]:
# Download the ZIP file
url = 'https://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
response = requests.get(url, stream=True)
response.raise_for_status()  # Stop early if the download failed

# Write the ZIP file to disk in chunks
with open('ml-latest-small.zip', 'wb') as file:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            file.write(chunk)

# Extract the archive contents into the working directory
with zipfile.ZipFile('ml-latest-small.zip', 'r') as zip_ref:
    zip_ref.extractall()

# Load only the needed columns from the extracted CSV files
ratings = pd.read_csv('ml-latest-small/ratings.csv', usecols=['userId', 'movieId', 'rating'])
movies = pd.read_csv('ml-latest-small/movies.csv', usecols=['movieId', 'title'])

print("DataFrames loaded successfully!")

DataFrames loaded successfully!


# **Preprocessing the Data**

In [4]:
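# Merge the ratings and movies datasets on their shared movieId column
data = pd.merge(ratings, movies, on='movieId')

# Create a user-item matrix: rows are users, columns are movie titles.
# pivot_table averages the ratings if a (userId, title) pair occurs more than once.
user_items_matrix = data.pivot_table(index='userId', columns='title', values='rating')

# Treat unrated movies as 0 so cosine similarity can be computed on full vectors
user_items_matrix = user_items_matrix.fillna(0)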

# **Model/System Design**

### **AI Technique Used**
I am using **User-User Collaborative Filtering**. This technique identifies similar users based on their rating patterns and recommends items that similar users liked but the target user hasn't seen yet.

### **Architecture/Pipeline Explanation**
The system follows this pipeline:
1. **Data Ingestion:** Loading movie and rating datasets.
2. **User-Item Matrix:** Transforming raw data into a matrix where rows are Users and columns are Movie Titles.
3. **Similarity Calculation:** Using **Cosine Similarity** to compute a score between users based on their rating vectors.
4. **Recommendation Generation:** Identifying the 'N' most similar users and suggesting their top-rated movies.
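
The four steps above can be sketched end to end on a tiny, made-up matrix. This is only an illustration under stated assumptions: the ratings, the single-letter titles, and the `recommend` helper are hypothetical, and the ranking uses similarity-weighted neighbor ratings, which is a simplification rather than the exact rule implemented later in this notebook.

```python
import numpy as np

# Steps 1-2: a hypothetical user-item matrix (rows = users, columns = movies, 0 = unrated)
titles = ["A", "B", "C", "D", "E"]
R = np.array([
    [5, 4, 0, 0, 0],   # user 0 (target)
    [5, 5, 4, 0, 0],   # user 1: similar to user 0, also liked C
    [4, 5, 0, 5, 0],   # user 2: similar to user 0, also liked D
    [0, 0, 0, 1, 5],   # user 3: nothing in common with user 0
], dtype=float)

# Step 3: cosine similarity between every pair of users
unit = R / np.linalg.norm(R, axis=1, keepdims=True)
sim = unit @ unit.T

def recommend(target, n_neighbors=2, top_k=2):
    """Step 4: suggest unseen movies rated highly by the most similar users."""
    # Rank the other users by similarity to the target, excluding the target itself
    neighbors = [u for u in np.argsort(-sim[target]) if u != target][:n_neighbors]
    unseen = R[target] == 0
    # Score each movie by the neighbors' similarity-weighted ratings
    scores = sim[target, neighbors] @ R[neighbors]
    ranked = [i for i in np.argsort(-scores) if unseen[i] and scores[i] > 0]
    return [titles[i] for i in ranked[:top_k]]

print(recommend(0))
```

User 3 shares no rated movies with the target, so their similarity is 0 and their favorite, `E`, is never suggested.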

### **Justification of Design Choices**
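* **Cosine Similarity:** Chosen because it measures the orientation between two vectors (rating patterns) regardless of their magnitude, which makes it well suited to sparse rating data. Note that unrated movies are filled with 0, so two users only score as similar when they rated some of the same movies.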
* **Collaborative Filtering:** Selected over Content-Based filtering because it can recommend items without needing detailed movie metadata, relying instead on collective user behavior.
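
To make the "orientation, not magnitude" point concrete, here is a small worked example. The three rating vectors and the `cosine_sim` helper are hypothetical and exist only for illustration; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D rating vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical ratings for three users over the same four movies (0 = not rated)
alice = [5, 4, 0, 1]
bob = [2.5, 2, 0, 0.5]   # same pattern as alice at half the magnitude
carol = [0, 0, 5, 0]     # no rated movies in common with alice

sim_alice_bob = cosine_sim(alice, bob)
sim_alice_carol = cosine_sim(alice, carol)
print(round(sim_alice_bob, 4), round(sim_alice_carol, 4))
```

Alice and Bob point in the same direction, so their similarity is 1 even though Bob rates everything lower. Alice and Carol have no rated movies in common, so their similarity is 0.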

# **Calculating Cosine Similarity**

In [5]:
# Calculate cosine similarity between users
user_similarity = cosine_similarity(user_items_matrix)
print(user_similarity)

[[1.         0.02728287 0.05972026 ... 0.29109737 0.09357193 0.14532081]
 [0.02728287 1.         0.         ... 0.04621095 0.0275654  0.10242675]
 [0.05972026 0.         1.         ... 0.02112846 0.         0.03211875]
 ...
 [0.29109737 0.04621095 0.02112846 ... 1.         0.12199271 0.32205486]
 [0.09357193 0.0275654  0.         ... 0.12199271 1.         0.05322546]
 [0.14532081 0.10242675 0.03211875 ... 0.32205486 0.05322546 1.        ]]


# **Core Results & Sample Outputs**

In [6]:
# --- Section 5: Core Results & Sample Outputs ---

def get_recommendations(user_id, num_recommendations=5):
    # 1. Find the row position of the user (avoids assuming userIds are 1, 2, 3, ...)
    user_idx = user_items_matrix.index.get_loc(user_id)

    # 2. Get similarity scores for this user and sort them, highest first
    similar_users = list(enumerate(user_similarity[user_idx]))
    # The first entry is normally the user themselves (similarity 1.0), so skip it
    similar_users = sorted(similar_users, key=lambda x: x[1], reverse=True)[1:6]  # Top 5 similar users

    # 3. Get movies highly rated by these similar users
    target_user_seen = user_items_matrix.iloc[user_idx] > 0
    recommendations = []
    for other_user_idx, score in similar_users:
        # Find movies the similar user rated 5.0 that our target user hasn't rated
        similar_user_liked = user_items_matrix.iloc[other_user_idx] == 5.0

        recommended_for_you = user_items_matrix.columns[similar_user_liked & ~target_user_seen]
        recommendations.extend(recommended_for_you)

    # set() removes duplicates but does not preserve order, so the titles
    # returned can differ between runs
    return list(set(recommendations))[:num_recommendations]

# Test the system for User #1
sample_user = 1
print(f"Top Recommendations for User {sample_user}:")
print(get_recommendations(sample_user))

Top Recommendations for User 1:
['Airplane! (1980)', 'Rear Window (1954)', 'Fright Night (1985)', "William Shakespeare's Romeo + Juliet (1996)", 'Key Largo (1948)']


# **Evaluation & Analysis**

### **Qualitative Evaluation (Sample Outputs)**
As the sample output above shows, the system returns titles that the target user has not yet rated, drawn from the movies that the most similar users rated 5.0. In principle, if a user enjoys 'Toy Story', the system would find other users with matching ratings and might recommend movies like 'A Bug's Life' or 'Monsters, Inc.' that those users also rated highly.

### **Performance Analysis & Limitations**
* **Accuracy:** The system is expected to work best for "power users" who have rated many movies, as there are more data points for finding close neighbors.
* **The Cold Start Problem:** A major limitation is that the system cannot recommend movies to a brand-new user with zero ratings, as there is no data to calculate similarity.
* **Popularity Bias:** The model tends to recommend "blockbuster" movies that many people have rated, sometimes ignoring niche films.
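
One common way to handle the cold-start case is to detect users with no ratings and fall back to the most-rated movies. The sketch below is a hypothetical addition, not part of the notebook's pipeline: `is_cold_start`, `most_rated`, and the toy matrix are illustrative names and data.

```python
import numpy as np

def is_cold_start(ratings_row):
    """True if the user has no ratings at all (every entry is 0)."""
    return not (np.asarray(ratings_row) > 0).any()

def most_rated(matrix, titles, top_k=2):
    """Titles ordered by how many users rated them, most-rated first."""
    counts = (np.asarray(matrix) > 0).sum(axis=0)
    order = np.argsort(-counts, kind="stable")[:top_k]
    return [titles[i] for i in order]

# Hypothetical user-item matrix (rows = users, columns = movies, 0 = unrated)
titles = ["A", "B", "C", "D"]
toy_matrix = [
    [5, 0, 3, 0],
    [4, 0, 0, 0],
    [5, 2, 4, 0],
]

new_user = [0, 0, 0, 0]
if is_cold_start(new_user):
    print(most_rated(toy_matrix, titles))
```

A fallback like this trades one limitation for another: it gives new users something to start from, but it reinforces the popularity bias described above.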

# **Ethical Considerations & Responsible AI**

### **1. Bias & Fairness**
One thing I noticed is "popularity bias." Because recommendations come from what similar users have rated, and widely rated movies show up in far more rating histories, the system might keep suggesting the same big blockbusters and ignore smaller, more diverse films. This can create a "filter bubble" where users aren't exposed to new types of stories.

### **2. Dataset Limitations**
The MovieLens data isn't perfect:
* **Sparsity:** Many users have only rated a few movies, which makes it harder for the math to find a perfect "neighbor."
* **Timing:** People’s tastes change, but this data is a static snapshot from the past.
* **Diversity:** The dataset might not represent everyone’s cultural background or global tastes.
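
To put a number on the sparsity point, the fraction of empty cells in a user-item matrix can be measured directly. This is a minimal sketch on a hypothetical toy matrix; the `sparsity` helper is an illustrative name and assumes unrated entries are encoded as 0, as they are after the `fillna(0)` step in this notebook.

```python
import numpy as np

def sparsity(matrix):
    """Fraction of user-movie cells with no rating (encoded as 0)."""
    matrix = np.asarray(matrix)
    return float((matrix == 0).sum() / matrix.size)

# Hypothetical 3-user x 4-movie matrix
toy = np.array([
    [5, 0, 0, 3],
    [0, 0, 4, 0],
    [0, 2, 0, 0],
])

ratings_per_user = (toy > 0).sum(axis=1)  # how many movies each user rated
print(f"Sparsity: {sparsity(toy):.2%}")
print("Ratings per user:", ratings_per_user.tolist())
```

In the notebook, the same helper could be applied to `user_items_matrix.to_numpy()` to see how much of the real matrix is empty.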

### **3. Responsible Use of AI**
To make sure this tool is used the right way, I’ve focused on:
* **Transparency:** Being clear that recommendations are based on past ratings.
* **User Control:** The AI should suggest, not decide. Users should always be able to explore outside their "taste profile."
* **Privacy:** Keeping user data anonymous so their personal viewing habits stay private.

# **Conclusion & Future Scope**

### **Summary of Results**
I successfully implemented a User-User Collaborative Filtering system that uses Cosine Similarity to provide personalized movie recommendations. The system relies only on user rating data, with no movie metadata, to connect users with movies they have not yet rated.

### **Future Scope**
**Hybrid Approach:** Future versions could combine this with Content-Based filtering to help mitigate the "Cold Start" problem.
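
As a rough idea of what the content-based half might look like, the sketch below compares movies by genre instead of by ratings. Everything in it is hypothetical: the titles, the genre flags, and the `similar_movies` helper are made up for illustration. In practice the flags could be derived from the `genres` column that MovieLens ships in `movies.csv`, which this notebook does not currently load.

```python
import numpy as np

titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
genres = ["Animation", "Comedy", "Thriller"]

# Hypothetical one-hot genre flags per movie (columns follow the genres list)
G = np.array([
    [1, 1, 0],  # Movie A: Animation, Comedy
    [1, 1, 0],  # Movie B: Animation, Comedy
    [0, 1, 0],  # Movie C: Comedy
    [0, 0, 1],  # Movie D: Thriller
], dtype=float)

# Cosine similarity between movies, based only on their genre vectors
unit = G / np.linalg.norm(G, axis=1, keepdims=True)
item_sim = unit @ unit.T

def similar_movies(title, top_k=2):
    """Movies whose genre profile is closest to the given title."""
    idx = titles.index(title)
    order = [i for i in np.argsort(-item_sim[idx], kind="stable") if i != idx]
    return [titles[i] for i in order[:top_k]]

print(similar_movies("Movie A"))
```

Because this needs no rating history for the movie, it can suggest newly added titles, and a user only has to name one movie they like to get a first set of suggestions.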
