
# Movie Recommendation

**Objective:**

Prompt the user for a favourite movie from the dataset and recommend a list of similar movies based on that choice.

**Data Source:**

https://github.com/YBIFoundation/Dataset

**Import Library**

In [14]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
import numpy as np

**Import Data**

In [15]:
file_path = '/content/Movies Recommendation.csv'
movies_df = pd.read_csv(file_path)

**Describe Data and Visualization**

In [16]:
print(movies_df.head())

   Movie_ID      Movie_Title                       Movie_Genre Movie_Language  \
0         1       Four Rooms                      Crime Comedy             en   
1         2        Star Wars  Adventure Action Science Fiction             en   
2         3     Finding Nemo                  Animation Family             en   
3         4     Forrest Gump              Comedy Drama Romance             en   
4         5  American Beauty                             Drama             en   

   Movie_Budget  Movie_Popularity Movie_Release_Date  Movie_Revenue  \
0       4000000         22.876230         09-12-1995        4300000   
1      11000000        126.393695         25-05-1977      775398007   
2      94000000         85.688789         30-05-2003      940335536   
3      55000000        138.133331         06-07-1994      677945399   
4      15000000         80.878605         15-09-1999      356296601   

   Movie_Runtime  Movie_Vote  ...  \
0           98.0         6.5  ...   
1          1

**Data Preprocessing**

In [17]:
movies_df = movies_df.fillna('')

# Combine relevant features into a single string for each movie
def combine_features(row):
    return (str(row['Movie_ID']) + ' ' +
            row['Movie_Title'] + ' ' +
            row['Movie_Genre'] + ' ' +
            row['Movie_Language'] + ' ' +
            str(row['Movie_Budget']) + ' ' +
            str(row['Movie_Popularity']) + ' ' +
            row['Movie_Release_Date'] + ' ' +
            str(row['Movie_Revenue']) + ' ' +
            str(row['Movie_Runtime']) + ' ' +
            str(row['Movie_Vote']) + ' ' +
            row['Movie_Homepage'] + ' ' +
            row['Movie_Keywords'] + ' ' +
            row['Movie_Overview'] + ' ' +
            row['Movie_Production_House'] + ' ' +
            row['Movie_Production_Country'] + ' ' +
            row['Movie_Spoken_Language'] + ' ' +
            row['Movie_Tagline'] + ' ' +
            row['Movie_Cast'] + ' ' +
            row['Movie_Crew'] + ' ' +
            row['Movie_Director'])

# Apply the function to create a combined feature set
movies_df['combined_features'] = movies_df.apply(combine_features, axis=1)


**Target**

In [18]:
print(movies_df['combined_features'].head())

0    1 Four Rooms Crime Comedy en 4000000 22.87623 ...
1    2 Star Wars Adventure Action Science Fiction e...
2    3 Finding Nemo Animation Family en 94000000 85...
3    4 Forrest Gump Comedy Drama Romance en 5500000...
4    5 American Beauty Drama en 15000000 80.878605 ...
Name: combined_features, dtype: object


**Modeling**

In [19]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_df['combined_features'])
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

In [20]:
def recommend_movies(title, cosine_sim=cosine_sim):
    if title.lower() not in movies_df['Movie_Title'].str.lower().values:
        return "Movie not found in the dataset."

    # Get the index of the movie that matches the title
    idx = movies_df[movies_df['Movie_Title'].str.lower() == title.lower()].index[0]

    # Get the pairwise similarity scores of all movies with the given movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the 10 most similar movies, skipping the first entry (the queried movie itself)
    sim_scores = sim_scores[1:11]

    # Get the movie titles
    movie_indices = [i[0] for i in sim_scores]
    return movies_df['Movie_Title'].iloc[movie_indices]


**Train and Test Split**

In [21]:
train, test = train_test_split(movies_df, test_size=0.2, random_state=42)

**Evaluation**

In [22]:
def evaluate_model(train, test):
    y_true = []
    y_pred = []

    for idx, row in test.iterrows():
        title = row['Movie_Title']
        actual_similar_movies = train[train.index != idx]['Movie_Title'].tolist()
        recommended_movies = recommend_movies(title)

        # Create binary vectors for evaluation
        y_true_vector = [1 if movie in actual_similar_movies else 0 for movie in recommended_movies]
        y_pred_vector = [1] * len(recommended_movies)

        y_true.extend(y_true_vector)
        y_pred.extend(y_pred_vector)

    precision = precision_score(y_true, y_pred, average='macro')
    recall = recall_score(y_true, y_pred, average='macro')
    f1 = f1_score(y_true, y_pred, average='macro')

    return precision, recall, f1

# Evaluate the model
precision, recall, f1 = evaluate_model(train, test)
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1 Score: {f1}')

Precision: 0.3914390756302521
Recall: 0.5
F1 Score: 0.43910917339303596




**Prediction:**

In [23]:
user_movie = input("Enter a movie name: ")
print("Recommended movies:")
recommendations = recommend_movies(user_movie)
if isinstance(recommendations, str):
    print(recommendations)
else:
    for movie in recommendations:
        print(movie)

Enter a movie name: Star Wars
Recommended movies:
The Empire Strikes Back
Alexander
The Day After Tomorrow
Jurassic World
The Chronicles of Riddick
The Core
Master and Commander: The Far Side of the World
Harry Potter and the Chamber of Secrets
Blade: Trinity
Kill Bill: Vol. 1


## **Explanation:**

***Loading and Preprocessing Data:***

The dataset is loaded, missing values are replaced with empty strings, and the selected columns are cast to strings and concatenated into one `combined_features` string per movie.
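The same combination step can also be written without a row-wise function. The following is a minimal sketch, assuming a toy DataFrame whose column names follow the notebook's dataset but whose values are made up, and assuming only three text columns are combined:

```python
import pandas as pd

# Toy rows: column names follow the notebook's dataset, values are invented.
toy_df = pd.DataFrame({
    "Movie_Title": ["Alpha", "Beta"],
    "Movie_Genre": ["Action Adventure", None],
    "Movie_Keywords": ["space war", "family reunion"],
})

# Hypothetical subset of columns chosen for illustration.
text_columns = ["Movie_Title", "Movie_Genre", "Movie_Keywords"]

# Replace missing values with empty strings, cast to str,
# then join the values of each row with single spaces.
toy_df["combined_features"] = (
    toy_df[text_columns].fillna("").astype(str).agg(" ".join, axis=1)
)

print(toy_df["combined_features"].tolist())
```

Casting with `astype(str)` before joining means numeric columns could be added to `text_columns` without wrapping each one in `str()` by hand.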

***TF-IDF Vectorization and Cosine Similarity:***

The combined features are converted into TF-IDF vectors with English stop words removed, and cosine similarity is computed between every pair of movies.
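To see what this step produces, here is a small sketch on a made-up three-document corpus (the documents and the `toy_` names are assumptions for illustration, not part of the dataset). It uses the same `TfidfVectorizer` and `cosine_similarity` calls as the notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented documents standing in for combined movie features.
toy_docs = [
    "space adventure robots galaxy",
    "galaxy space battle rebels",
    "romantic comedy wedding paris",
]

# Each document becomes a TF-IDF weighted, L2-normalised sparse vector.
toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_docs)

# Entry [i, j] is the cosine similarity between document i and document j.
toy_sim = cosine_similarity(toy_matrix)

print(toy_sim.round(2))
```

The first two documents share the words "space" and "galaxy", so their similarity is positive, while the third shares no words with the first and scores zero against it. Every document has similarity 1 with itself, which is why the recommender has to skip the queried movie.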

***Recommendation Function:***

The `recommend_movies` function looks up a title case-insensitively and returns the 10 movies with the highest cosine similarity to it, or a "not found" message if the title is absent from the dataset.
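The ranking step can be sketched with NumPy alone. The helper `top_k_similar` below is a hypothetical name, not part of the notebook, and the similarity matrix is made up. Unlike slicing off the first sorted entry, it masks the queried row explicitly, which may be safer if two movies ever have identical feature strings:

```python
import numpy as np

def top_k_similar(sim_matrix, idx, k=10):
    """Return positions of the k rows most similar to row idx, excluding idx."""
    scores = np.asarray(sim_matrix[idx], dtype=float).copy()
    scores[idx] = -np.inf              # never recommend the queried movie itself
    order = np.argsort(scores)[::-1]   # highest similarity first
    k = min(k, len(scores) - 1)        # cannot return more than the other rows
    return order[:k].tolist()

# Invented symmetric similarity matrix for four movies.
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.0],
    [0.4, 0.3, 0.0, 1.0],
])

print(top_k_similar(toy_sim, idx=0, k=2))  # [1, 3]
```

The returned positions could then be passed to `movies_df['Movie_Title'].iloc[...]` in the same way the notebook does.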

***Train-Test Split:***

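The dataset is split into training (80%) and test (20%) sets with a fixed random seed. The similarity matrix is built from the full dataset before the split, so the split is used only by the evaluation step.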

***Evaluation Function:***

The `evaluate_model` function computes macro-averaged precision, recall, and F1 score by checking, for each test movie, whether each recommended title appears in the training set.

***Evaluation Metrics:***

The precision, recall, and F1 score are printed to evaluate the model's performance.
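The recall of exactly 0.5 in the output is worth a closer look. A small sketch with made-up labels suggests why it appears: when every prediction is 1, the positive class has recall 1 and the negative class has recall 0, so the macro average is 0.5 whatever the data. The label values below are assumptions for illustration, and `zero_division=0` is added only to silence the undefined-precision warning:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Invented labels: 1 = recommended title is in the training split, 0 = it is not.
y_true = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
y_pred = [1] * len(y_true)  # as in the notebook, every recommendation is predicted positive

macro_precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
macro_recall = recall_score(y_true, y_pred, average="macro", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Fraction of recommendations that fall in the training split.
hit_rate = sum(y_true) / len(y_true)

print(macro_precision, macro_recall, macro_f1, hit_rate)
```

With these labels the macro precision is half the hit rate (0.4 against 0.8), because the negative class contributes a precision of 0. Read the same way, the notebook's precision of about 0.39 corresponds to roughly 78% of recommended titles belonging to the training split.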

This evaluation treats every title in the training split as a relevant recommendation, so the scores mostly reflect how many recommendations happen to fall in the training split rather than how similar they are to the queried movie. It is a basic check only; a more meaningful evaluation would need a labeled dataset of known similar movies.
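If such labels were available, one common option is precision at k: the share of the top k recommendations that appear in the labeled set. The sketch below is illustrative only. The function name `precision_at_k` and the `known_similar` labels are hypothetical, and the recommendation list is a shortened copy of the "Star Wars" output above:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommended titles that are in the relevant set."""
    top_k = list(recommended)[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for title in top_k if title in relevant)
    return hits / len(top_k)

# Hypothetical ground-truth labels; the dataset does not provide these.
known_similar = {
    "Star Wars": {"The Empire Strikes Back", "Return of the Jedi"},
}

# First four titles from the notebook's "Star Wars" recommendations.
recommended = [
    "The Empire Strikes Back",
    "Alexander",
    "The Day After Tomorrow",
    "Jurassic World",
]

score = precision_at_k(recommended, known_similar["Star Wars"], k=4)
print(score)  # 0.25
```

Averaging this score over all labeled query movies would give a single figure that depends on the quality of the recommendations rather than on the train-test split.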