# **Movie Recommendation System Using Python**

## **Objective**
The objective of this project is to build a movie recommendation system that suggests movies to users based on their preferences. The system uses two techniques: content-based filtering, which recommends movies with similar genres, and collaborative filtering, which predicts ratings from the rating patterns of other users.

## **Data Source**
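- The [MovieLens dataset](https://grouplens.org/datasets/movielens/) is used in this project. It contains information about movies, users, and their ratings. The code below reads two of its files: `movies.csv` (movie IDs, titles, and genres) and `ratings.csv` (one row per user rating of a movie).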

## **Data Preprocessing**
In this step, we will load the dataset, clean the data, and prepare it for analysis.

In [None]:
import pandas as pd

# Load datasets
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Inspect the datasets
print(movies.head())
print(ratings.head())

# Check for missing values
print(movies.isnull().sum())
print(ratings.isnull().sum())

# Fill missing values if any
movies['genres'] = movies['genres'].fillna('')

# Merge datasets
data = pd.merge(ratings, movies, on='movieId')
print(data.head())
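
Before relying on the merged frame, it can help to confirm that the merge neither duplicated nor silently dropped ratings. The following is a minimal sketch on hand-made stand-ins for `movies.csv` and `ratings.csv` (all rows are invented for illustration); it uses the `validate` and `indicator` options of `pd.merge`.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (values invented for illustration)
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A (1995)", "Movie B (1996)", "Movie C (1997)"],
    "genres": ["Comedy", "Drama|Romance", None],
})
toy_ratings = pd.DataFrame({
    "userId": [10, 10, 11, 12],
    "movieId": [1, 2, 1, 99],  # movieId 99 has no entry in toy_movies
    "rating": [4.0, 3.5, 5.0, 2.0],
})

toy_movies["genres"] = toy_movies["genres"].fillna("")

# validate="many_to_one" raises MergeError if a movieId is duplicated in toy_movies
checked = pd.merge(
    toy_ratings, toy_movies, on="movieId",
    how="left", validate="many_to_one", indicator=True,
)

# Ratings whose movieId has no metadata; an inner merge would drop them silently
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(toy_ratings)} ratings have no matching movie")

# Keep only the matched rows, which is what the default inner merge returns
toy_data = checked[checked["_merge"] == "both"].drop(columns="_merge")
```

On the real files the same check would be applied to `ratings` and `movies`; whether unmatched ratings exist depends on the MovieLens release in use.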

## **Exploratory Data Analysis (EDA)**
We will explore the data to understand the distribution of rating values and the number of ratings per movie, including the ten most-rated titles.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Distribution of ratings
plt.figure(figsize=(10, 6))
sns.histplot(data['rating'], bins=5, kde=False)
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

# Number of ratings per movie
ratings_per_movie = data.groupby('title')['rating'].count().sort_values(ascending=False)
print(ratings_per_movie.head(10))

# Visualize the top 10 most rated movies
plt.figure(figsize=(10, 6))
sns.barplot(x=ratings_per_movie.values[:10], y=ratings_per_movie.index[:10], palette='viridis')
plt.title('Top 10 Most Rated Movies')
plt.xlabel('Number of Ratings')
plt.show()
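
Rating counts matter when comparing average ratings: a movie with a single 5-star rating would otherwise outrank a widely rated favorite. A minimal sketch of that filter on invented toy data follows; the threshold `MIN_RATINGS` is a hypothetical value and would need tuning on the real dataset.

```python
import pandas as pd

# Toy ratings (invented): "Obscure Film" has a single perfect rating
toy = pd.DataFrame({
    "title": ["Popular Film"] * 4 + ["Obscure Film"] + ["Average Film"] * 3,
    "rating": [4.0, 4.5, 3.5, 4.0, 5.0, 3.0, 2.5, 3.5],
})

MIN_RATINGS = 3  # hypothetical threshold, chosen only for this toy example

# Mean rating and number of ratings per movie
stats = toy.groupby("title")["rating"].agg(["mean", "count"])

# Rank by mean rating, keeping only movies with enough ratings to be trusted
reliable = stats[stats["count"] >= MIN_RATINGS].sort_values("mean", ascending=False)
print(reliable)
```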

## **Model Building: Content-Based Filtering**
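We will build a content-based recommendation system using movie genres. Each movie's genre list is converted to a TF-IDF vector, and recommendations for a title are the movies whose vectors have the highest cosine similarity to it.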

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

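# Vectorize the genres. MovieLens separates genres with '|', so split on that
# character only; the default tokenizer would break 'Sci-Fi' into 'sci' and 'fi'.
movies['genres'] = movies['genres'].fillna('')
tfidf = TfidfVectorizer(token_pattern=r'[^|]+')
tfidf_matrix = tfidf.fit_transform(movies['genres'])

# Compute the cosine similarity matrix (dense, n_movies x n_movies)
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)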

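# Function to get recommendations
def get_recommendations(title, cosine_sim=cosine_sim, top_n=10):
    # Positional row indices of movies whose title matches exactly
    matches = (movies['title'] == title).to_numpy().nonzero()[0]
    if len(matches) == 0:
        raise KeyError(f'Title not found: {title!r}')
    idx = matches[0]
    # Rank all movies by similarity to the query
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Exclude the query explicitly: movies with identical genres tie at 1.0,
    # so the query is not guaranteed to be first in the sorted list
    movie_indices = [i for i, _ in sim_scores if i != idx][:top_n]
    return movies['title'].iloc[movie_indices]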

# Example usage
print(get_recommendations('Toy Story (1995)'))

## **Model Building: Collaborative Filtering**
We will build a collaborative filtering recommendation system using the `surprise` library (installed from PyPI as `scikit-surprise`) and its `SVD` matrix factorization algorithm.

In [None]:
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split, cross_validate

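# Load the ratings DataFrame from the preprocessing step into Surprise.
# The rating scale is read from the data rather than hard-coded.
reader = Reader(rating_scale=(ratings['rating'].min(), ratings['rating'].max()))
cf_data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Train/Test split
trainset, testset = train_test_split(cf_data, test_size=0.25)

# Use SVD algorithm
algo = SVD()
algo.fit(trainset)

# Predict ratings for the test set
predictions = algo.test(testset)

# Cross-validation (refits the model on each of the 5 folds)
cross_validate(algo, cf_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)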
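
Surprise's `SVD` is a biased matrix factorization model: a rating is predicted as the global mean plus a user bias, an item bias, and the dot product of a user and an item factor vector, with parameters learned by stochastic gradient descent. The dependency-free sketch below illustrates that idea on invented toy ratings. It is a simplification, not the library's implementation, and the function name and hyperparameters are assumptions chosen for the toy data.

```python
import random

def train_biased_mf(ratings, n_factors=2, n_epochs=200, lr=0.05, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u[u] + b_i[i] + dot(p[u], q[i]) with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users or items contribute no bias and no factor term
        dot = sum(a * b for a, b in zip(p.get(u, []), q.get(i, [])))
        return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0) + dot

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient steps with L2 regularization on every parameter
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                p_uf, q_if = p[u][f], q[i][f]
                p[u][f] += lr * (err * q_if - reg * p_uf)
                q[i][f] += lr * (err * p_uf - reg * q_if)
    return predict

# Invented (user, movie, rating) triples: m1 is liked, m3 is disliked
toy_ratings = [
    ("alice", "m1", 5.0), ("alice", "m2", 4.0),
    ("bob", "m1", 4.5), ("bob", "m3", 1.5),
    ("carol", "m2", 4.0), ("carol", "m3", 2.0),
]
predict = train_biased_mf(toy_ratings)
print(round(predict("alice", "m3"), 2))  # a pair that was never rated
```

The prediction for the unseen pair is driven mostly by the bias terms here, since six ratings are too few to learn meaningful latent factors.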

## **Model Evaluation**
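We will evaluate the collaborative filtering model using RMSE and MAE, which measure how far predicted ratings fall from the true ratings in the held-out test set. Ranking metrics such as Precision and Recall need an additional top-k computation over the predictions.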

In [None]:
# RMSE (root mean squared error) and MAE (mean absolute error) on the held-out test set
from surprise import accuracy
rmse = accuracy.rmse(predictions)
mae = accuracy.mae(predictions)

# Cross-validation results are already displayed above
# No further code is needed for basic evaluation metrics
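
Precision and recall are usually computed per user over the top k predicted items, treating a rating at or above some threshold as "relevant". The sketch below is one common way to do this, written against plain tuples so it runs without Surprise. The function name, the threshold of 3.5, and the toy predictions are assumptions for illustration.

```python
from collections import defaultdict

def precision_recall_at_k(preds, k=10, threshold=3.5):
    """preds: iterable of (user_id, item_id, true_rating, estimated_rating)."""
    by_user = defaultdict(list)
    for uid, _iid, true_r, est in preds:
        by_user[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, user_ratings in by_user.items():
        # Highest estimated ratings first; the first k are the "recommended" items
        user_ratings.sort(key=lambda pair: pair[0], reverse=True)
        top_k = user_ratings[:k]
        n_relevant = sum(true_r >= threshold for _, true_r in user_ratings)
        n_recommended = sum(est >= threshold for est, _ in top_k)
        n_hits = sum(est >= threshold and true_r >= threshold for est, true_r in top_k)
        # Convention assumed here: an undefined ratio counts as 0
        precisions[uid] = n_hits / n_recommended if n_recommended else 0.0
        recalls[uid] = n_hits / n_relevant if n_relevant else 0.0
    return precisions, recalls

# Invented predictions: (user, item, true rating, estimated rating)
toy_preds = [
    ("u1", "a", 5.0, 4.6), ("u1", "b", 2.0, 4.1), ("u1", "c", 4.0, 3.0),
    ("u2", "a", 4.5, 3.9), ("u2", "d", 1.0, 2.2),
]
precisions, recalls = precision_recall_at_k(toy_preds, k=2, threshold=3.5)
print(precisions)
print(recalls)
```

With Surprise, the input could be built from the test-set predictions as `[(p.uid, p.iid, p.r_ui, p.est) for p in predictions]`, and the per-user values averaged to obtain a single score.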


## **Deployment**
The recommendation system can be deployed using a web framework like Flask. This would allow users to input their movie preferences and receive personalized recommendations.

In [None]:
# Basic outline for a Flask application
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/recommend', methods=['POST'])
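def recommend():
    # Expect a JSON body such as {"title": "Toy Story (1995)"}
    payload = request.get_json(silent=True)
    title = payload.get('title') if isinstance(payload, dict) else None
    if not title:
        return jsonify({'error': "JSON body must include a 'title' field"}), 400
    try:
        recommendations = get_recommendations(title)
    except LookupError:  # title not present in the movies DataFrame
        return jsonify({'error': f'Unknown title: {title}'}), 404
    return jsonify(recommendations.tolist())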

if __name__ == '__main__':
    # debug=True enables the interactive debugger; use it for local development only
    app.run(debug=True)
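
A client can call the endpoint by posting a JSON body with a `title` field. The sketch below builds such a request with the standard library only. It assumes the app is running locally on Flask's default address, `http://127.0.0.1:5000`, and the helper names are hypothetical.

```python
import json
from urllib import request as urlrequest

def build_recommend_request(title, base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a POST request for the /recommend endpoint."""
    payload = json.dumps({"title": title}).encode("utf-8")
    return urlrequest.Request(
        f"{base_url}/recommend",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_recommendations(title, base_url="http://127.0.0.1:5000"):
    """Send the request and decode the JSON reply; needs the Flask app running."""
    with urlrequest.urlopen(build_recommend_request(title, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Inspect the request without contacting a server
req = build_recommend_request("Toy Story (1995)")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling `fetch_recommendations('Toy Story (1995)')` while the server is running should return the recommended titles as a Python list.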

## **Conclusion**
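In this project, we built a movie recommendation system with two components: a content-based recommender driven by genre similarity and a collaborative filtering model based on SVD. The collaborative model was evaluated using RMSE and 5-fold cross-validation; the content-based recommender was not evaluated quantitatively. The system can be further enhanced with a hybrid approach combining both methods.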
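
One simple form a hybrid could take is a weighted blend of the two signals. The sketch below is an illustration under stated assumptions rather than part of the project: the function name, the weight `alpha`, the 0.5 to 5.0 rating scale, and the toy scores are all hypothetical.

```python
def hybrid_scores(content_sim, predicted_rating, alpha=0.5, rating_scale=(0.5, 5.0)):
    """Blend content similarity (0..1) with predicted ratings rescaled to 0..1."""
    low, high = rating_scale
    scores = {}
    for movie, sim in content_sim.items():
        if movie not in predicted_rating:
            continue  # skip movies the collaborative model cannot score
        cf = (predicted_rating[movie] - low) / (high - low)
        scores[movie] = alpha * sim + (1 - alpha) * cf
    # Highest blended score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented inputs: genre similarity to a query movie, and predicted ratings for one user
toy_sim = {"A": 0.9, "B": 0.4, "C": 0.7}
toy_pred = {"A": 2.0, "B": 4.8, "C": 4.1}

ranked = hybrid_scores(toy_sim, toy_pred, alpha=0.5)
print([movie for movie, _ in ranked])
```

Setting `alpha=1.0` reduces the blend to pure content-based ranking and `alpha=0.0` to pure collaborative ranking, which makes the weight easy to tune against a validation set.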