
# Movie Recommendation System

## Project Overview
This project builds a movie recommendation system using two approaches: collaborative filtering, which suggests movies based on users' rating patterns, and content-based filtering, which suggests movies whose genres are similar to those of a given movie.

### Objective
- Develop a recommendation model to suggest movies to users based on user ratings or movie genres.

### Dataset
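We'll be using the [MovieLens dataset](https://grouplens.org/datasets/movielens/), which contains information on users, movies, and ratings. The code below expects the `movies.csv` and `ratings.csv` files found in releases such as `ml-latest-small`, with ratings on a 0.5 to 5 star scale.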



## 1. Data Loading and Preprocessing
In this section, we load the data, perform initial inspection, and preprocess the data to ensure it is clean and ready for analysis.


In [None]:

# Import necessary libraries
import pandas as pd
import numpy as np

# Load datasets (movies and ratings)
movies = pd.read_csv('movies.csv')    # movie details
ratings = pd.read_csv('ratings.csv')  # user ratings

# Merge datasets for ease of use
data = pd.merge(ratings, movies, on='movieId')

# Display the first few rows of the dataset
data.head()
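
The cell above loads and merges the files but does not show the inspection and cleaning steps mentioned in the section intro. A minimal sketch of what they might look like follows. It uses two tiny hand-made frames (`movies_demo` and `ratings_demo`, both hypothetical) in place of the CSV files so that it runs on its own, and the helper name `clean_ratings` is an assumption, not part of the project.

```python
import pandas as pd

# Hypothetical stand-ins for movies.csv and ratings.csv (same column names)
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation|Children", "Adventure|Children|Fantasy", "Action|Crime|Thriller"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [1, 1, 2, 3, 99],           # repeated (1, 1) pair; movieId 99 has no metadata
    "rating": [4.0, 4.0, None, 5.0, 3.5],  # one missing rating
    "timestamp": [10, 10, 11, 12, 13],
})

def clean_ratings(ratings, movies):
    """Drop missing ratings, repeated (user, movie) rows, and ratings of unknown movies."""
    cleaned = ratings.dropna(subset=["rating"])
    cleaned = cleaned.drop_duplicates(subset=["userId", "movieId"], keep="last")
    cleaned = cleaned[cleaned["movieId"].isin(movies["movieId"])]
    return cleaned.reset_index(drop=True)

# Initial inspection: missing values per column and repeated (user, movie) pairs
missing_per_column = ratings_demo.isna().sum()
n_duplicates = int(ratings_demo.duplicated(subset=["userId", "movieId"]).sum())
print(missing_per_column)
print("duplicate (userId, movieId) rows:", n_duplicates)

# Clean, then merge exactly as in the cell above
ratings_clean = clean_ratings(ratings_demo, movies_demo)
data_demo = pd.merge(ratings_clean, movies_demo, on="movieId")
print(data_demo[["userId", "title", "rating"]])
```

On the real files the same checks would be run on `ratings` and `movies` before the merge; whether any rows actually need dropping depends on the release you download.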



## 2. Exploratory Data Analysis (EDA)
Here, we analyze the data distribution for ratings and genres, and look at popular movies.


In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

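# Distribution of ratings. Ratings take discrete half-star values, so one bar
# per value is clearer than a histogram with a smoothed density curve.
sns.countplot(x='rating', data=data)
plt.title('Rating Distribution')
plt.xlabel('Rating')
plt.ylabel('Number of ratings')
plt.show()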
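
The section intro also mentions genres and popular movies, which the cell above does not cover. One possible way to compute both with pandas is sketched here. `data_demo` is a hypothetical miniature of the merged `data` frame, and the `MIN_RATINGS` threshold is an assumed value picked for the toy data.

```python
import pandas as pd

# Hypothetical miniature of the merged `data` frame (same column names)
data_demo = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 1],
    "title": ["Toy Story (1995)"] * 3 + ["Heat (1995)"] * 2 + ["Jumanji (1995)"],
    "genres": ["Adventure|Animation"] * 3 + ["Action|Crime"] * 2 + ["Adventure|Fantasy"],
    "rating": [4.0, 5.0, 4.5, 4.0, 3.0, 5.0],
})

# Genre distribution: split the pipe-separated string into one row per
# (rating, genre), so each genre is counted once per rating it received
genre_counts = data_demo["genres"].str.split("|").explode().value_counts()
print(genre_counts)

# Popular movies: number of ratings and mean rating per title,
# ignoring titles with too few ratings for the mean to be meaningful
MIN_RATINGS = 2  # assumed threshold for the toy data; use a larger value on real data
movie_stats = data_demo.groupby("title")["rating"].agg(n_ratings="count", mean_rating="mean")
movie_stats = movie_stats[movie_stats["n_ratings"] >= MIN_RATINGS]
movie_stats = movie_stats.sort_values(["n_ratings", "mean_rating"], ascending=False)
print(movie_stats)
```

Counting genres on the merged frame weights each genre by how often its movies were rated. To count genres per movie instead, apply the same split and explode to `movies['genres']`.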



## 3. Modeling
We'll use two methods:
1. **Collaborative Filtering** using the SVD matrix-factorization algorithm from the Surprise library, trained on the user-item ratings.
2. **Content-Based Filtering** using cosine similarity between TF-IDF vectors of each movie's genres.


### Collaborative Filtering using SVD

In [None]:

from surprise import SVD, Dataset, Reader
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load data into Surprise format
reader = Reader(rating_scale=(0.5, 5))
data_surprise = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

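# Train-test split (random_state fixes the split so runs are comparable)
trainset, testset = train_test_split(data_surprise, test_size=0.2, random_state=42)

# Model training using SVD (random_state fixes the factor initialisation)
algo = SVD(random_state=42)
algo.fit(trainset)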

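# Predictions and evaluation
predictions = algo.test(testset)
# rmse() prints the score itself by default; verbose=False avoids printing it twice
print(f"RMSE: {rmse(predictions, verbose=False):.4f}")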
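
Surprise's `SVD` is a matrix-factorization model: it predicts a rating as the global mean plus a user bias, an item bias, and the dot product of a user factor vector and an item factor vector, and it fits these by stochastic gradient descent on the observed ratings. The sketch below illustrates that idea in plain NumPy on a tiny made-up rating list. It is a simplified illustration, not Surprise's implementation; the names `fit_mf`, `predict`, and `train_rmse` and the hyperparameter values are assumptions chosen for the toy data.

```python
import numpy as np

# Hypothetical toy ratings: (user index, item index, rating)
toy_ratings = [
    (0, 0, 5.0), (0, 1, 4.5), (0, 2, 1.0),
    (1, 0, 4.5), (1, 1, 5.0), (1, 3, 1.5),
    (2, 2, 4.5), (2, 3, 5.0), (2, 0, 1.0),
    (3, 2, 5.0), (3, 3, 4.0), (3, 1, 1.5),
]

def fit_mf(ratings, n_users, n_items, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u[u] + b_i[i] + P[u] . Q[i] by SGD on squared error."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))       # global mean rating
    b_u = np.zeros(n_users)                               # user biases
    b_i = np.zeros(n_items)                               # item biases
    P = rng.normal(0.0, 0.1, size=(n_users, n_factors))   # user factors
    Q = rng.normal(0.0, 0.1, size=(n_items, n_factors))   # item factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            # Gradient steps with L2 regularisation
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            # Update both factor rows from their old values
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, b_u, b_i, P, Q

def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]

def train_rmse(model, ratings):
    """RMSE of the model on the ratings it was trained on."""
    errors = [r - predict(model, u, i) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

untrained = fit_mf(toy_ratings, 4, 4, n_epochs=0)  # predicts roughly the global mean
trained = fit_mf(toy_ratings, 4, 4)
print("RMSE before training:", round(train_rmse(untrained, toy_ratings), 3))
print("RMSE after training: ", round(train_rmse(trained, toy_ratings), 3))
```

The error here is measured on the training ratings only, to show that the updates fit the data. The notebook's cell evaluates on a held-out test set, which is the number that matters for comparing models.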


### Content-Based Filtering using TF-IDF and Cosine Similarity

In [None]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

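# Vectorize genres. MovieLens stores them pipe-separated (e.g. "Action|Sci-Fi"),
# so split on '|' to keep hyphenated genres such as "Sci-Fi" as single tokens.
tfidf = TfidfVectorizer(token_pattern=r'[^|]+')
tfidf_matrix = tfidf.fit_transform(movies['genres'])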

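# Compute cosine similarity matrix. This is a dense n_movies x n_movies array,
# so memory use grows quadratically with the number of movies.
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)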

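# Function to get movie recommendations
def get_recommendations(title, cosine_sim=cosine_sim, top_n=10):
    """Return the top_n movies whose genres are most similar to `title`."""
    matches = movies.index[movies['title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Title not found: {title!r}")
    # cosine_sim rows are positional, so this relies on `movies` keeping the
    # default 0..n-1 index it gets from read_csv
    idx = matches[0]
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Drop the queried movie explicitly instead of assuming it sorts first:
    # movies with identical genres tie with it at the maximum similarity
    movie_indices = [i for i, _ in sim_scores if i != idx][:top_n]
    return movies['title'].iloc[movie_indices]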

# Example usage
print(get_recommendations('Toy Story (1995)'))
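
To make the similarity measure concrete, here is a small worked example of cosine similarity on binary genre vectors, written without any libraries. It is a simplified illustration: unlike the TF-IDF version above it weights every genre equally, and the `genres_by_title` dictionary and `genre_cosine` helper are hypothetical names introduced for this sketch.

```python
import math

# Illustrative genre sets for three movies
genres_by_title = {
    "Toy Story (1995)": {"Adventure", "Animation", "Children", "Comedy", "Fantasy"},
    "Jumanji (1995)": {"Adventure", "Children", "Fantasy"},
    "Heat (1995)": {"Action", "Crime", "Thriller"},
}

def genre_cosine(a, b):
    """Cosine similarity of two movies' binary (0/1) genre vectors.

    For 0/1 vectors the dot product is the number of shared genres, and each
    vector's norm is the square root of the number of genres on that movie.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

toy = genres_by_title["Toy Story (1995)"]
jumanji = genres_by_title["Jumanji (1995)"]
heat = genres_by_title["Heat (1995)"]

print(genre_cosine(toy, jumanji))  # 3 shared genres / sqrt(5 * 3)
print(genre_cosine(toy, heat))     # no shared genres
```

TF-IDF changes the picture by giving rarer genres more weight than very common ones, so two movies sharing an uncommon genre score higher than two movies sharing only a common one.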



## 4. Evaluation
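- We evaluated the Collaborative Filtering model using RMSE on the held-out 20% test split.
- For the Content-Based model, we spot-checked the recommendations for well-known movies such as 'Toy Story (1995)'. This is a qualitative check rather than a quantitative metric.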
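
For reference, RMSE is the square root of the mean squared difference between true and predicted ratings, so it is expressed in rating units (stars) and lower is better. A minimal sketch of the computation follows; the function name `rmse_manual` and the example pairs are made up for illustration.

```python
import math

def rmse_manual(pairs):
    """Root-mean-square error over (true_rating, predicted_rating) pairs."""
    squared_errors = [(true - pred) ** 2 for true, pred in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, predicted) ratings
example_pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0), (3.0, 3.0)]

# Squared errors are 0.25, 0.25, 1.0, 0.0, so the mean is 0.375
print(rmse_manual(example_pairs))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.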



## 5. Deployment (Optional)
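To deploy this model as a web service, you can wrap it in a web framework such as Flask, Django, or FastAPI and expose an endpoint that accepts a movie title or user ID and returns recommendations. This would allow users to interact with the recommendation system online.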
