# Movie Recommendation System
This notebook implements a content-based movie recommendation system in Python using the pandas and scikit-learn libraries. The goal is to recommend similar movies based on textual features such as the overview, genres, keywords, cast, and director. It uses CountVectorizer to vectorize each movie's details and cosine similarity to compare the resulting vectors for recommendations.

### Importing Required Libraries
We begin by importing the essential libraries for data manipulation and machine learning.

In [113]:
import numpy as np
import pandas as pd
import ast
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pickle

### Loading the Dataset
The dataset containing movie details is loaded using `pandas`.

In [114]:
movies = pd.read_csv('./tmdb_5000_movies.csv')
credits = pd.read_csv('./tmdb_5000_credits.csv')

### Inspecting the Dataset
Let's check the first few rows and get a feel for the data structure.

In [None]:
movies.head(2)

In [None]:
movies.shape

### Selecting Relevant Features
We merge the movies and credits tables on the movie title, then keep only the features that will be used to compute similarity between movies.

In [117]:
movies = movies.merge(credits,on='title')
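
One thing worth checking with a title-based merge is whether two different films share a title, since each matching pair then produces an extra row. The sketch below uses made-up toy tables to illustrate the effect; the `id` column on the movies side is an assumption about the CSV layout, and merging on an identifier is shown only as a possible alternative, not as what this notebook does.

```python
import pandas as pd

# Toy tables (hypothetical data): two different films share the title 'Batman'
movies_toy = pd.DataFrame({'id': [1, 2, 3],
                           'title': ['Batman', 'Batman', 'Up']})
credits_toy = pd.DataFrame({'movie_id': [1, 2, 3],
                            'title': ['Batman', 'Batman', 'Up'],
                            'cast': ['a', 'b', 'c']})

# Merging on title pairs every 'Batman' row with every other 'Batman' row
by_title = movies_toy.merge(credits_toy, on='title')

# Merging on the identifier keeps one row per film
by_id = movies_toy.merge(credits_toy, left_on='id', right_on='movie_id')

print(len(by_title))  # 5
print(len(by_id))     # 3
```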

In [118]:
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

In [None]:
movies.head(2)

### Handling Missing Values
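Rows with missing entries in the selected features are dropped to avoid processing errors. The `genres`, `keywords`, `cast`, and `crew` columns are then parsed from their string form into lists of names, keeping only the first three cast members and the director, and spaces inside each name are removed.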

In [120]:
movies.dropna(inplace=True)
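
Note that `dropna` removes rows but keeps the original index labels, so the index is left with gaps. This matters later because the rows of the similarity matrix are positional. A minimal sketch on a made-up toy frame, with `reset_index(drop=True)` shown as one possible way to realign labels and positions:

```python
import pandas as pd

# Toy frame (hypothetical data) with one missing overview
toy = pd.DataFrame({'title': ['A', 'B', 'C'],
                    'overview': ['x', None, 'z']})

cleaned = toy.dropna()
print(list(cleaned.index))  # [0, 2]  (label 1 is gone, labels are not renumbered)

# Renumber the rows so that labels match positions again
reset = cleaned.reset_index(drop=True)
print(list(reset.index))    # [0, 1]
```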

In [None]:
def convert(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

movies['genres'] = movies['genres'].apply(convert)
movies.head(2)
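
To see what `convert` does in isolation, here is a small sketch applied to a single value. The sample string is hypothetical, shaped like the stringified list of dictionaries that the parsing code above expects; `ast.literal_eval` turns it into real Python objects, from which only the `name` fields are kept.

```python
import ast

def convert(text):
    """Parse a stringified list of dicts and return the 'name' values."""
    return [item['name'] for item in ast.literal_eval(text)]

# Hypothetical sample value, shaped like one entry of the genres column
sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
names = convert(sample)
print(names)  # ['Action', 'Adventure']
```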

In [None]:
movies['keywords'] = movies['keywords'].apply(convert)
movies.head(2)

In [None]:
movies['cast'] = movies['cast'].apply(convert)
movies.head(2)

In [124]:
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

In [125]:
def fetch_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

movies['crew'] = movies['crew'].apply(fetch_director)

In [None]:
def collapse(L):
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
movies.head()
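
The reason for removing spaces is that the vectorizer splits text on word boundaries. Without collapsing, two people who share a first name would share a token. The sketch below illustrates this with two arbitrary example names and a default `CountVectorizer`:

```python
from sklearn.feature_extraction.text import CountVectorizer

def collapse(names):
    """Remove spaces inside each name so it becomes a single token."""
    return [name.replace(" ", "") for name in names]

# Arbitrary example names that share a first name
cast = ['Sam Worthington', 'Sam Neill']

# Without collapsing, 'sam' becomes a token shared by both people
raw_vocab = sorted(CountVectorizer().fit(cast).vocabulary_)
# After collapsing, each person maps to one distinct token
collapsed_vocab = sorted(CountVectorizer().fit(collapse(cast)).vocabulary_)

print(raw_vocab)        # ['neill', 'sam', 'worthington']
print(collapsed_vocab)  # ['samneill', 'samworthington']
```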

### Combining Features into a Single String
We combine multiple textual features into a single string per movie, which will be vectorized later.

In [None]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

new = movies.drop(columns=['overview','genres','keywords','cast','crew'])

new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()

### Text Vectorization using CountVectorizer
We convert the combined text features into numerical vectors using the bag-of-words technique (CountVectorizer), keeping at most 5,000 of the most frequent terms and ignoring English stop words.

In [None]:
cv = CountVectorizer(max_features=5000,stop_words='english')
vector = cv.fit_transform(new['tags']).toarray()

vector.shape
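
As a rough illustration of what the vectorizer produces, the sketch below runs the same settings on three made-up tag strings. Each row is one document, each column is one vocabulary term in alphabetical order, and each cell is the number of times that term appears.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up tag strings standing in for new['tags']
docs = ['the space action hero', 'space romance', 'action action comedy']

cv_demo = CountVectorizer(max_features=5000, stop_words='english')
counts = cv_demo.fit_transform(docs).toarray()

# Vocabulary terms in column order; the stop word 'the' is dropped
vocab = sorted(cv_demo.vocabulary_, key=cv_demo.vocabulary_.get)
print(vocab)   # ['action', 'comedy', 'hero', 'romance', 'space']
print(counts)
# [[1 0 1 0 1]
#  [0 0 0 1 1]
#  [2 1 0 0 0]]
```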

### Calculating Cosine Similarity Between Movies
Using the generated vectors, we compute the pairwise cosine similarity between all movies.

In [None]:
similarity = cosine_similarity(vector)

similarity
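
Cosine similarity between two vectors a and b is their dot product divided by the product of their lengths, a·b / (‖a‖ ‖b‖). For count vectors it ranges from 0 (no shared terms) to 1 (same direction). The sketch below checks the formula by hand against `cosine_similarity` on two made-up count vectors.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up count vectors that share exactly one term
a = np.array([1, 0, 1, 0, 1])
b = np.array([0, 0, 0, 1, 1])

# Manual computation: dot product over the product of the norms
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pairwise matrix: entry [i, j] compares row i with row j
matrix = cosine_similarity(np.vstack([a, b]))

print(round(manual, 4))        # 0.4082
print(round(matrix[0, 1], 4))  # 0.4082
```

The diagonal of the matrix is 1 because every vector is perfectly similar to itself, which is why the recommendation step has to skip the query movie.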

### Defining the Recommendation Function
A function is defined to recommend movies similar to a given movie based on cosine similarity scores.

In [135]:
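def recommend(movie, length = 5):
    # Use the row position, not the index label: dropna() leaves gaps in the
    # index labels, while the rows of `similarity` are positional
    index = np.flatnonzero(new['title'].values == movie)[0]
    distances = sorted(list(enumerate(similarity[index])),reverse=True,key = lambda x: x[1])
    # Skip the first entry, which is normally the movie itself
    for i in distances[1:length + 1]:
        print(new.iloc[i[0]].title)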
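
If the recommendations need to be reused rather than printed, a variant that returns a list can be convenient. The sketch below is one possible shape for such a helper, run on made-up toy data. The name `recommend_titles` and its parameters are hypothetical and not part of the notebook. It also returns an empty list for an unknown title instead of raising an error.

```python
import numpy as np

def recommend_titles(movie, titles, sim, length=5):
    """Return up to `length` titles most similar to `movie`, or [] if unknown."""
    titles = list(titles)
    if movie not in titles:
        return []
    pos = titles.index(movie)
    # Positions sorted by descending similarity, excluding the query itself
    order = [i for i in np.argsort(-sim[pos], kind='stable') if i != pos]
    return [titles[i] for i in order[:length]]

# Toy titles and a toy similarity matrix (hypothetical values)
toy_titles = ['A', 'B', 'C']
toy_sim = np.array([[1.0, 0.2, 0.9],
                    [0.2, 1.0, 0.5],
                    [0.9, 0.5, 1.0]])

result = recommend_titles('A', toy_titles, toy_sim, length=2)
print(result)  # ['C', 'B']
```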

### Testing the Recommendation System
We now test the recommendation function with a sample movie name.

In [None]:
recommend('The Indian in the Cupboard')