Project 2: Movie Recommendation System

In this project, I built a simple movie recommendation system that suggests movies based on similarity.
The goal is to retrieve movies that are most similar to a given movie using feature-based similarity.

I use the k-Nearest Neighbors (kNN) algorithm as a similarity finder, not as a classifier or regressor.
The dataset includes a mix of Telugu and English movies.

I imported the required Python libraries.
Pandas is used to create and manipulate the movie dataset, while NumPy helps with numerical operations.

In [1]:
import pandas as pd
import numpy as np

I created a small dataset of Telugu and English movies, based on what I usually like to watch.
Each movie is represented using genre-based features.
To keep the representation simple, I use three genres: Action, Drama, and Sci-Fi.

In [2]:
movies = pd.DataFrame({
    "movie": [
        "Baahubali",
        "Baahubali 2",
        "RRR",
        "Eega",
        "Magadheera",
        "Mahanati",
        "Jersey",
        "Inception",
        "Interstellar"
    ],
    "Action": [1, 1, 1, 0, 1, 0, 0, 1, 1],
    "Drama":  [1, 1, 1, 1, 1, 1, 1, 0, 1],
    "SciFi":  [0, 0, 0, 1, 0, 0, 0, 1, 1]
})

movies

          movie  Action  Drama  SciFi
0     Baahubali       1      1      0
1   Baahubali 2       1      1      0
2           RRR       1      1      0
3          Eega       0      1      1
4    Magadheera       1      1      0
5      Mahanati       0      1      0
6        Jersey       0      1      0
7     Inception       1      0      1
8  Interstellar       1      1      1


Each movie is represented as a numerical feature vector based on genre presence.
A value of 1 indicates the presence of a genre, while 0 indicates its absence.
This feature representation allows the model to compute similarity using distance metrics.
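As a small worked illustration of that last point, the sketch below computes Euclidean distances by hand for three of the genre vectors from the table above. The helper name `euclidean` is my own and is not used elsewhere in the notebook.

```python
import numpy as np

# Genre vectors in (Action, Drama, SciFi) order, copied from the table above.
baahubali = np.array([1, 1, 0])
inception = np.array([1, 0, 1])
interstellar = np.array([1, 1, 1])

def euclidean(a, b):
    """Straight-line distance between two feature vectors (illustrative helper)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

d_interstellar = euclidean(baahubali, interstellar)  # vectors differ in one genre
d_inception = euclidean(baahubali, inception)        # vectors differ in two genres

print(d_interstellar, d_inception)
```

With 0/1 features, the squared distance is simply the number of genres on which two movies disagree, so Interstellar (one mismatch) is closer to Baahubali than Inception (two mismatches).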

In [3]:
X = movies.drop("movie", axis=1)
movie_titles = movies["movie"]

I used the k-Nearest Neighbors algorithm as a similarity search method.
Given a movie, kNN finds the most similar movies based on distance between feature vectors.
There is no target variable because this is a recommendation (retrieval) task.

In [4]:
from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(n_neighbors=3, metric="euclidean")
knn.fit(X)
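To make the fitted model's behaviour concrete, here is a minimal sketch of what `kneighbors` returns for a single query. It rebuilds the same table so it can run on its own, and it uses `n_neighbors=2` and Inception as the query purely for illustration, since Inception has no ties among its two closest rows.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Same genre table as above, repeated so this snippet is self-contained.
movies = pd.DataFrame({
    "movie": ["Baahubali", "Baahubali 2", "RRR", "Eega", "Magadheera",
              "Mahanati", "Jersey", "Inception", "Interstellar"],
    "Action": [1, 1, 1, 0, 1, 0, 0, 1, 1],
    "Drama":  [1, 1, 1, 1, 1, 1, 1, 0, 1],
    "SciFi":  [0, 0, 0, 1, 0, 0, 0, 1, 1],
})
X = movies.drop("movie", axis=1)

knn = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(X)

# Query with Inception's row (position 7), kept as a one-row DataFrame so
# the column names match the data the model was fitted on.
distances, indices = knn.kneighbors(X.iloc[[7]])

# kneighbors returns row positions; map them back to titles.
neighbor_titles = movies["movie"].iloc[indices[0]].tolist()
print(neighbor_titles, distances[0])
```

The two arrays are aligned: `indices[0][j]` is the row position of the j-th nearest movie and `distances[0][j]` is its distance from the query. Here the query itself comes back at distance 0, followed by Interstellar at distance 1.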

I defined a function that takes a movie name as input and prints the most similar movies.
The model retrieves neighbors based on the distance between genre vectors.

In [5]:
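def recommend(movie_name):
    # Row position of the requested movie (first match by title).
    index = movie_titles[movie_titles == movie_name].index[0]
    # Query with a one-row DataFrame so column names match the fitted data.
    distances, indices = knn.kneighbors(X.iloc[[index]])

    print(f"Movies similar to '{movie_name}':\n")
    # Skip the first neighbor on the assumption that it is the query itself;
    # with tied distances that is not guaranteed.
    for i in indices[0][1:]:
        print(movie_titles[i])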

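Now I test the recommendation system using a movie from the dataset.
The output lists movies whose genre vectors are closest to the query's.
Baahubali shows up in its own list because Baahubali, Baahubali 2, RRR, and Magadheera all share the vector (1, 1, 0): they tie at distance 0, so the query is not necessarily the first neighbor returned.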

In [6]:
recommend("Baahubali")

Movies similar to 'Baahubali':

Baahubali
Baahubali 2
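One possible way to keep the query movie out of its own recommendations is to ask for one extra neighbor and filter by row position instead of dropping the first result. The sketch below is a variant under that assumption, not the function used above; the name `recommend_excluding_self` and the parameter `k` are hypothetical.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Same genre table as above, repeated so this snippet is self-contained.
movies = pd.DataFrame({
    "movie": ["Baahubali", "Baahubali 2", "RRR", "Eega", "Magadheera",
              "Mahanati", "Jersey", "Inception", "Interstellar"],
    "Action": [1, 1, 1, 0, 1, 0, 0, 1, 1],
    "Drama":  [1, 1, 1, 1, 1, 1, 1, 0, 1],
    "SciFi":  [0, 0, 0, 1, 0, 0, 0, 1, 1],
})
X = movies.drop("movie", axis=1)
movie_titles = movies["movie"]
knn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(X)

def recommend_excluding_self(movie_name, k=2):
    """Return up to k titles closest to movie_name, never the movie itself."""
    index = movie_titles[movie_titles == movie_name].index[0]
    # Request one extra neighbor so k candidates remain after filtering.
    n = min(k + 1, len(X))
    _, indices = knn.kneighbors(X.iloc[[index]], n_neighbors=n)
    # Drop the query by row position, wherever it appears in the result.
    return [movie_titles.iloc[i] for i in indices[0] if i != index][:k]

recs = recommend_excluding_self("Baahubali")
print(recs)
```

For Baahubali this returns two of Baahubali 2, RRR, and Magadheera. Which two is not determined by the features, because all three sit at distance 0 from the query.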




This project demonstrates a basic item-based movie recommendation system.
By representing movies using simple genre features and applying kNN for similarity search, we can retrieve meaningful recommendations without using labels or predictions.