# **Recommendation System using Cosine Similarity**
The next task in this project is to recommend movies to users based on their emotion. For this, we use Cosine Similarity along with TF-IDF.
- Cosine Similarity compares the mood vector from user input with movie vectors to find the most relevant ones:
\begin{align}
  \text{CosSim}(A, B) &= \dfrac{A \cdot B}{\|A\| \, \|B\|}
\end{align}

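- TF-IDF converts movie descriptions into numerical vectors based on term importance, where TF(t, d) is the frequency of term t in document d, N is the total number of documents, and DF(t) is the number of documents that contain t: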
\begin{align}
  \text{TF-IDF}(t, d) &= \text{TF}(t, d) \times \log\left(\dfrac{N}{\text{DF}(t)}\right)
\end{align}
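
As a quick sanity check of the two formulas, here is a minimal sketch in plain Python. The toy documents and the helper names `cos_sim` and `tf_idf` are illustrative assumptions, not part of the notebook. TF is taken as the raw count and the logarithm as the natural log, which the formula leaves open. Library implementations such as scikit-learn's apply extra smoothing and normalization, so their numbers will differ.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: similarity with a zero vector is 0
    return dot / (norm_a * norm_b)

def tf_idf(term, doc, docs):
    """TF(t, d) * log(N / DF(t)) with raw-count TF and natural log."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0  # the term never occurs, so it carries no weight
    return tf * math.log(len(docs) / df)

# Hypothetical toy corpus of tokenized descriptions.
docs = [
    ["zombie", "virus", "city"],
    ["diary", "war", "city"],
    ["zombie", "zombie", "road"],
]

print(cos_sim([1, 0, 1], [1, 1, 0]))    # the vectors share one of their two non-zero terms
print(tf_idf("zombie", docs[2], docs))  # frequent in this document, present in 2 of 3
print(tf_idf("city", docs[2], docs))    # absent from this document
```

A term that appears in every document gets `log(N / N) = 0`, which is how the weighting suppresses words that do not help tell documents apart.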


Here, we use the **Netflix Movies and TV Shows Dataset**, which contains information about movies and TV shows on Netflix.

### **Loading the Dataset**

In [39]:
import pandas as pd

movies = pd.read_csv('/content/Movies_Dataset.csv')

### **Displaying the Dataset**

In [40]:
movies.head(5)

Unnamed: 0,Show Id,Title,Description,Director,Genres,Cast,Production Country,Release Date,Rating,Duration,Imdb Score,Content Type,Date Added
0,e2ef4e91-fb25-42ab-b485-be8e3b23dedb,Alive,"As a grisly virus rampages a city, a lone man ...",Cho Il,"Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",South Korea,2020,TV-MA,99 min,6.2/10,Movie,"September 8, 2020"
1,b01b73b7-81f6-47a7-86d8-acb63080d525,AnneFrank - Parallel Stories,"Through her diary, Anne Frank's story is retol...","Sabina Fedeli, Anna Migotto","Documentaries, International Movies","Helen Mirren, Gengher Gatti",Italy,2019,TV-14,95 min,6.4/10,Movie,"July 1, 2020"
2,79afe5bd-0dcd-4cac-9072-f77f46f355ef,2 States,Graduate students Krish and Ananya hope to win...,Abhishek Varman,"Comedies, Dramas, International Movies","Alia Bhatt, Arjun Kapoor, Ronit Roy, Amrita Si...",India,2014,TV-PG,143 min,7.0/10,Movie,"August 4, 2018"
3,ee4f5117-cf68-4862-b61c-310fb8c7379b,13 Reasons Why,"After a teenage girl's perplexing suicide, a c...",,"Crime TV Shows, TV Dramas, TV Mysteries","Dylan Minnette, Katherine Langford, Kate Walsh...",United States,2020,TV-MA,4 Seasons,7.5/10,TV Show,"June 5, 2020"
4,eda9ad19-b8d7-40dd-b3d2-4ff997387d53,13 Reasons Why: Beyond the Reasons,"Cast members, writers, producers and mental he...",,"Crime TV Shows, Docuseries","Dylan Minnette, Katherine Langford, Kate Walsh...",United States,2019,TV-MA,3 Seasons,7.7/10,TV Show,"August 23, 2019"


### **Data Preprocessing**

In [41]:
movies = movies[['Title','Description','Genres','Cast','Imdb Score']]

In [42]:
movies.head()

Unnamed: 0,Title,Description,Genres,Cast,Imdb Score
0,Alive,"As a grisly virus rampages a city, a lone man ...","Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",6.2/10
1,AnneFrank - Parallel Stories,"Through her diary, Anne Frank's story is retol...","Documentaries, International Movies","Helen Mirren, Gengher Gatti",6.4/10
2,2 States,Graduate students Krish and Ananya hope to win...,"Comedies, Dramas, International Movies","Alia Bhatt, Arjun Kapoor, Ronit Roy, Amrita Si...",7.0/10
3,13 Reasons Why,"After a teenage girl's perplexing suicide, a c...","Crime TV Shows, TV Dramas, TV Mysteries","Dylan Minnette, Katherine Langford, Kate Walsh...",7.5/10
4,13 Reasons Why: Beyond the Reasons,"Cast members, writers, producers and mental he...","Crime TV Shows, Docuseries","Dylan Minnette, Katherine Langford, Kate Walsh...",7.7/10


### **Handling Missing Values**

In [43]:
movies.dropna(inplace=True)

In [44]:
movies['Description'] = movies['Description'].apply(lambda x: x.split() if isinstance(x, str) else x)
movies.head(5)

Unnamed: 0,Title,Description,Genres,Cast,Imdb Score
0,Alive,"[As, a, grisly, virus, rampages, a, city,, a, ...","Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",6.2/10
1,AnneFrank - Parallel Stories,"[Through, her, diary,, Anne, Frank's, story, i...","Documentaries, International Movies","Helen Mirren, Gengher Gatti",6.4/10
2,2 States,"[Graduate, students, Krish, and, Ananya, hope,...","Comedies, Dramas, International Movies","Alia Bhatt, Arjun Kapoor, Ronit Roy, Amrita Si...",7.0/10
3,13 Reasons Why,"[After, a, teenage, girl's, perplexing, suicid...","Crime TV Shows, TV Dramas, TV Mysteries","Dylan Minnette, Katherine Langford, Kate Walsh...",7.5/10
4,13 Reasons Why: Beyond the Reasons,"[Cast, members,, writers,, producers, and, men...","Crime TV Shows, Docuseries","Dylan Minnette, Katherine Langford, Kate Walsh...",7.7/10


### **Cleaning Texts by Removing Spaces**

In [45]:
def collapse(L):
    # Remove inner spaces from each string so a multi-word name stays one token
    return [i.replace(" ", "") for i in L]
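
To make the effect of this helper concrete, here is a small self-contained sketch. It redefines `collapse` as an equivalent list comprehension and applies it to a cast string taken from the first row of the dataset. The variable names `cast` and `collapsed` are only for illustration.

```python
def collapse(names):
    """Strip inner spaces from every string in a list."""
    return [name.replace(" ", "") for name in names]

# Cast string from the first dataset row, split into individual names.
cast = "Yoo Ah-in, Park Shin-hye".split(", ")
collapsed = collapse(cast)
print(collapsed)  # ['YooAh-in', 'ParkShin-hye']
```

Once the names are later joined into a single space-separated tag string, each actor remains one whitespace-delimited unit instead of being split into first and last name.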

### **Displaying Unique Genres from the Dataset**

In [46]:
movies['Genres'] = movies['Genres'].astype(str).str.lower().str.strip()

genre_set = set()
for genres in movies['Genres']:
    for g in genres.split(','):
        genre_set.add(g.strip())
print("Unique genres in dataset:", sorted(genre_set))

Unique genres in dataset: ['action & adventure', 'anime features', 'anime series', 'british tv shows', 'children & family movies', 'classic & cult tv', 'classic movies', 'comedies', 'crime tv shows', 'cult movies', 'documentaries', 'docuseries', 'dramas', 'horror movies', 'independent movies', 'international movies', 'international tv shows', "kids' tv", 'korean tv shows', 'lgbtq movies', 'movies', 'music & musicals', 'reality tv', 'romantic movies', 'romantic tv shows', 'sci-fi & fantasy', 'spanish-language tv shows', 'sports movies', 'teen tv shows', 'thrillers', 'tv action & adventure', 'tv comedies', 'tv dramas', 'tv horror', 'tv mysteries', 'tv sci-fi & fantasy', 'tv shows', 'tv thrillers']


### **Mapping existing Genre Labels to Standardized Genre Categories**

In [47]:
genre_map = {
    'action & adventure': ['action', 'adventure'],
    'anime features': ['animation'],
    'anime series': ['animation'],
    'british tv shows': ['drama'],
    'children & family movies': ['family'],
    'classic & cult tv': ['drama'],
    'classic movies': ['drama'],
    'comedies': ['comedy'],
    'crime tv shows': ['crime'],
    'cult movies': ['drama'],
    'documentaries': ['documentary'],
    'docuseries': ['documentary'],
    'dramas': ['drama'],
    'horror movies': ['horror'],
    'independent movies': ['drama'],
    'international movies': ['drama'],
    'international tv shows': ['drama'],
    "kids' tv": ['family'],
    'korean tv shows': ['drama'],
    'lgbtq movies': ['drama'],
    'movies': ['drama'],
    'music & musicals': ['musical'],
    'reality tv': ['drama'],
    'romantic movies': ['romance'],
    'romantic tv shows': ['romance'],
    'sci-fi & fantasy': ['sci-fi', 'fantasy'],
    'spanish-language tv shows': ['drama'],
    'sports movies': ['sports'],
    'teen tv shows': ['drama'],
    'thrillers': ['thriller'],
    'tv action & adventure': ['action', 'adventure'],
    'tv comedies': ['comedy'],
    'tv dramas': ['drama'],
    'tv horror': ['horror'],
    'tv mysteries': ['mystery'],
    'tv sci-fi & fantasy': ['sci-fi', 'fantasy'],
    'tv shows': ['drama'],
    'tv thrillers': ['thriller']
}

In [48]:
import pandas as pd

movies['Genres'] = movies['Genres'].astype(str).str.lower().str.strip()

def map_genres(genre_str):
    genres = genre_str.split(',')
    M = []
    for g in genres:
        g = g.strip()
        if g in genre_map:
            M.extend(genre_map[g])
    M = list(set(M))
    return ', '.join(M) if M else 'other'

movies['Genres'] = movies['Genres'].apply(map_genres)

print("Mapped Genre Values:\n", movies['Genres'].value_counts())

Mapped Genre Values:
 Genres
comedy, drama                      54
drama                              48
action, adventure, drama           43
romance, drama                     32
comedy, romance, drama             23
                                   ..
crime, action, adventure            1
sci-fi, comedy, family, fantasy     1
crime, horror, mystery              1
documentary, sports                 1
sci-fi, fantasy, mystery, drama     1
Name: count, Length: 74, dtype: int64
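
The mapping step can be tried on a single genre string with the sketch below. It copies a few entries of `genre_map` so it runs on its own. `map_genres_sorted` is a hypothetical variant of `map_genres`: it sorts the labels, whereas `list(set(M))` in the cell above gives no guaranteed order, so the same genres may appear in different orders in the counts.

```python
# Subset of the notebook's genre_map, copied here so the example is self-contained.
genre_map = {
    'horror movies': ['horror'],
    'international movies': ['drama'],
    'thrillers': ['thriller'],
    'tv sci-fi & fantasy': ['sci-fi', 'fantasy'],
}

def map_genres_sorted(genre_str):
    """Like map_genres, but returns the labels in sorted order (an assumption)."""
    mapped = set()
    for g in genre_str.lower().split(','):
        mapped.update(genre_map.get(g.strip(), []))
    return ', '.join(sorted(mapped)) if mapped else 'other'

print(map_genres_sorted('Horror Movies, International Movies, Thrillers'))
# drama, horror, thriller
print(map_genres_sorted('Stand-Up Comedy'))  # not in the subset above
# other
```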


In [49]:
movies.head()

Unnamed: 0,Title,Description,Genres,Cast,Imdb Score
0,Alive,"[As, a, grisly, virus, rampages, a, city,, a, ...","horror, thriller, drama","Yoo Ah-in, Park Shin-hye",6.2/10
1,AnneFrank - Parallel Stories,"[Through, her, diary,, Anne, Frank's, story, i...","documentary, drama","Helen Mirren, Gengher Gatti",6.4/10
2,2 States,"[Graduate, students, Krish, and, Ananya, hope,...","comedy, drama","Alia Bhatt, Arjun Kapoor, Ronit Roy, Amrita Si...",7.0/10
3,13 Reasons Why,"[After, a, teenage, girl's, perplexing, suicid...","crime, mystery, drama","Dylan Minnette, Katherine Langford, Kate Walsh...",7.5/10
4,13 Reasons Why: Beyond the Reasons,"[Cast, members,, writers,, producers, and, men...","crime, documentary","Dylan Minnette, Katherine Langford, Kate Walsh...",7.7/10


### **Creating Tags by Combining Description, Genres and Cast for each movie**

In [50]:
movies['Genres'] = movies['Genres'].apply(lambda x: x.split(', '))
movies['Cast'] = movies['Cast'].apply(lambda x: x.split(', '))
movies['Cast'] = movies['Cast'].apply(collapse)

movies['tags'] = movies['Description'] + movies['Genres'] + movies['Cast']

In [51]:
new = movies.drop(columns=['Description','Cast'])

In [52]:
new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()

Unnamed: 0,Title,Genres,Imdb Score,tags
0,Alive,"[horror, thriller, drama]",6.2/10,"As a grisly virus rampages a city, a lone man ..."
1,AnneFrank - Parallel Stories,"[documentary, drama]",6.4/10,"Through her diary, Anne Frank's story is retol..."
2,2 States,"[comedy, drama]",7.0/10,Graduate students Krish and Ananya hope to win...
3,13 Reasons Why,"[crime, mystery, drama]",7.5/10,"After a teenage girl's perplexing suicide, a c..."
4,13 Reasons Why: Beyond the Reasons,"[crime, documentary]",7.7/10,"Cast members, writers, producers and mental he..."
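
The tag construction is plain list concatenation followed by a join. A minimal sketch for one row, assuming a shortened description (the full text is truncated in the output above):

```python
# Shortened description, mapped genres, and cast for the first dataset row.
description = "As a grisly virus rampages a city".split()
genres = "horror, thriller, drama".split(", ")
cast = [name.replace(" ", "") for name in "Yoo Ah-in, Park Shin-hye".split(", ")]

# Concatenate the three token lists, then join them into one string.
tags = " ".join(description + genres + cast)
print(tags)
# As a grisly virus rampages a city horror thriller drama YooAh-in ParkShin-hye
```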


### **Saving the modified CSV**

In [53]:
new.to_csv('Netflix_Movies.csv', index=False)

### **Text Vectorization**

In [55]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,stop_words='english')

In [56]:
vector = cv.fit_transform(new['tags']).toarray()

### **Applying Cosine Similarity**

In [57]:
from sklearn.metrics.pairwise import cosine_similarity

In [58]:
similarity = cosine_similarity(vector)
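
The sketch below shows one way a pairwise similarity matrix like this could be used to look up the titles closest to a given one. The toy `titles` and `tags` and the helper `most_similar` are assumptions for illustration. The cells above use `CountVectorizer` (raw term counts), while `TfidfVectorizer` is the scikit-learn counterpart of the TF-IDF weighting described at the top, so the example runs both. `cosine_similarity` also accepts the sparse matrix directly, so `.toarray()` is optional.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for new['Title'] and new['tags'].
titles = ["A", "B", "C"]
tags = [
    "zombie virus city horror",
    "zombie outbreak horror thriller",
    "wedding family comedy romance",
]

def most_similar(vectorizer, idx, top_n=2):
    """Return the titles most similar to titles[idx], best match first."""
    matrix = vectorizer.fit_transform(tags)   # sparse document-term matrix
    sim = cosine_similarity(matrix)           # shape (n_docs, n_docs)
    ranked = sim[idx].argsort()[::-1]         # highest score first
    return [titles[i] for i in ranked if i != idx][:top_n]

print(most_similar(CountVectorizer(stop_words='english'), 0))
print(most_similar(TfidfVectorizer(stop_words='english'), 0))
```

On this toy data both vectorizers rank "B" ahead of "C", because "A" and "B" share words and "A" and "C" share none. On real descriptions the two weightings can produce different orderings.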

### **Importing necessary Libraries**

In [60]:
import pandas as pd
import numpy as np
import pickle, json, re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    # nltk.data.find raises LookupError when the resource is missing
    nltk.download('stopwords')

stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

### **Text Preprocessing**

In [61]:
def preprocess(text):
    if isinstance(text, str):
        text = text.lower()
        text = re.sub(r"http\S+|@\S+|#\S+|[^a-zA-Z\s]", '', text)
        tokens = text.split()
        tokens = [stemmer.stem(word) for word in tokens if word not in stop_words]
        return " ".join(tokens)
    return ""

print("Load movie dataset")
try:
    movie_df = pd.read_csv("/content/Netflix_Movies.csv")
    movie_df['tags'] = movie_df['tags'].astype(str)
    print(f"Loaded {len(movie_df)} movies.")
except Exception as e:
    print(f"Error loading movie dataset: {e}")
    raise  # stop here so later cells do not run without movie_df

Load movie dataset
Loaded 477 movies.
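
To see what the regular expression in `preprocess` removes, here is a reduced sketch that keeps only the lowercasing, regex, and split steps. Stop-word removal and stemming are left out so the example needs no NLTK data. The function name `clean` and the sample sentence are made up for illustration.

```python
import re

def clean(text):
    """Lowercase, then drop URLs, @mentions, #hashtags and non-letter characters."""
    text = text.lower()
    text = re.sub(r"http\S+|@\S+|#\S+|[^a-zA-Z\s]", '', text)
    return text.split()

print(clean("Check http://x.co @bob #fun I'm SO happy!!! 123"))
# ['check', 'im', 'so', 'happy']
```

Apostrophes are removed as well, so contractions such as "I'm" become "im" before the stop-word filter sees them.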


### **Loading Necessary Models**

In [62]:
print("Loading all the models")
try:
    emotion_model = load_model("/content/lstm_emotion_model.h5")
    with open("/content/tokenizer.pkl", "rb") as f:
        tokenizer = pickle.load(f)
    with open("/content/label_encoder.pkl", "rb") as f:
        label_encoder = pickle.load(f)
    with open("/content/config.json", "r") as f:
        config = json.load(f)
    max_len_padding = config["max_length"]
    print("All components loaded successfully.")
except Exception as e:
    print(f"Error loading model/tokenizer/config: {e}")
    raise  # stop here so later cells do not run without the model



Loading all the models
All components loaded successfully.


### **Predicting Emotions**

In [63]:
def pred_emotion(text):
    clean_text = preprocess(text)
    sequence = tokenizer.texts_to_sequences([clean_text])
    padded = pad_sequences(sequence, maxlen=max_len_padding, padding='post')
    prediction = emotion_model.predict(padded, verbose=0)
    predicted_class = prediction.argmax(axis=1)

    index_to_emotion = {
        0: 'sadness',
        1: 'joy',
        2: 'love',
        3: 'anger',
        4: 'fear',
        5: 'surprise'
    }

    return index_to_emotion.get(int(predicted_class[0]), None)
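
The last two steps of `pred_emotion` (argmax, then index-to-label lookup) can be illustrated without the model. The probability row below is made up, and the `confidence` value is an addition that the notebook function does not return.

```python
import numpy as np

index_to_emotion = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

# Hypothetical softmax output for one input sentence: shape (1, 6).
prediction = np.array([[0.05, 0.10, 0.05, 0.10, 0.60, 0.10]])

predicted_class = prediction.argmax(axis=1)   # index of the largest probability per row
emotion = index_to_emotion.get(int(predicted_class[0]))
confidence = float(prediction[0, predicted_class[0]])
print(emotion, confidence)
```

The hard-coded dictionary only gives correct labels if its order matches the class indices used during training. If the loaded `label_encoder` is a scikit-learn `LabelEncoder` (an assumption, since its type is not shown), `label_encoder.inverse_transform(predicted_class)` would be a way to derive the label from the fitted encoder instead.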

### **Emotion to Genre Mapping**

In [64]:
emotion_to_genres = {
    'sadness': ['drama', 'family'],
    'joy': ['comedy', 'romance', 'drama', 'adventure', 'animation', 'fantasy'],
    'anger': ['action', 'adventure', 'thriller'],
    'love': ['romance', 'fantasy', 'musical'],
    'fear': ['horror', 'thriller', 'mystery', 'crime'],
    'surprise': ['adventure','mystery', 'sci-fi']
}
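
A minimal sketch of how such a mapping selects candidate titles: a movie qualifies if its genre list shares at least one genre with the list for the emotion. The three rows reuse titles and mapped genres from the dataset preview. The helper `titles_for` and the list-of-dicts layout are assumptions for illustration.

```python
# Three entries in the style of the emotion-to-genre mapping.
emotion_to_genres = {
    'anger': ['action', 'adventure', 'thriller'],
    'love': ['romance', 'fantasy', 'musical'],
    'surprise': ['adventure', 'mystery', 'sci-fi'],
}

# Rows standing in for movie_df[['Title', 'Genres']].
movies = [
    {'Title': 'Alive', 'Genres': ['horror', 'thriller', 'drama']},
    {'Title': '2 States', 'Genres': ['comedy', 'drama']},
    {'Title': '13 Reasons Why', 'Genres': ['crime', 'mystery', 'drama']},
]

def titles_for(emotion):
    """Titles whose genre list overlaps the genres mapped to the emotion."""
    target = emotion_to_genres.get(emotion, [])
    return [m['Title'] for m in movies if any(g in m['Genres'] for g in target)]

print(titles_for('anger'))     # ['Alive']
print(titles_for('surprise'))  # ['13 Reasons Why']
print(titles_for('love'))      # []
```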

### **Recommendations**

In [65]:
def recommend(emotion_label, movie_df, emotion_to_genre, top_n=5):
    genres = emotion_to_genre.get(emotion_label.lower(), [])
    if not genres:
        print(f"No genres mapped for emotion '{emotion_label}'")
        return pd.DataFrame(columns=['Title'])

    print(f"Emotion '{emotion_label}' → Genres: {genres}")

    target = [g.lower().replace(" ", "") for g in genres]

    # After the CSV round-trip, Genres holds stringified lists; parse them back into lists
    if not isinstance(movie_df['Genres'].iloc[0], list):
        import ast
        movie_df['Genres'] = movie_df['Genres'].apply(lambda x: [g.lower().replace(" ", "") for g in ast.literal_eval(x)])

    # Keep movies that share at least one genre with the target list
    fil_df = movie_df[movie_df['Genres'].apply(lambda g_list: any(g in g_list for g in target))]


    if fil_df.empty:
        print("No movies found for these genres.")
        return pd.DataFrame(columns=['Title'])

    cv = CountVectorizer(max_features=5000, stop_words='english')
    vector = cv.fit_transform(fil_df['tags']).toarray()
    user_vec = cv.transform([emotion_label.lower()]).toarray()
    similarity = cosine_similarity(user_vec, vector).flatten()
    top_indices = similarity.argsort()[-top_n:][::-1]

    return fil_df.iloc[top_indices][['Title']]
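
The ranking step at the end of `recommend` is an argsort followed by two slices. The sketch below isolates it with made-up scores. The helper name `top_n_indices` is an assumption. It also shows a check for a case worth knowing about: the query is the single emotion word, and `CountVectorizer.transform` ignores words it did not see during `fit`, so if that word does not occur in the filtered tags, every similarity is 0.

```python
import numpy as np

def top_n_indices(scores, top_n):
    """Indices of the top_n largest scores, best first (same slicing as recommend)."""
    return scores.argsort()[-top_n:][::-1]

# Hypothetical similarity scores for four movies.
scores = np.array([0.10, 0.70, 0.00, 0.40])
print(top_n_indices(scores, 2))  # [1 3]

# When every score is 0 the ranking carries no information;
# a caller can detect that case explicitly before trusting the order.
all_zero = np.zeros(4)
print(bool(np.any(all_zero > 0)))  # False
```

In the all-zero case the genre filter alone determines which titles appear, so a different tie-breaker (for example the `Imdb Score` column) might be a reasonable fallback. That is a design option, not something the notebook does.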

### **Recommending Movies based on Emotion**

In [66]:
user_input = input("How are you feeling today? ")

emotion = pred_emotion(user_input)

if emotion:
    print(f"\nPredicted Emotion: {emotion}")
    rec = recommend(emotion, movie_df, emotion_to_genres, top_n=10)

    if not rec.empty:
        print("\n Recommended Movies:")
        for title in rec['Title']:
            print("•", title)
    else:
        print("Sorry, no movie matches found.")
else:
    print("Could not detect emotion from your input.")

How are you feeling today? i'm scared

Predicted Emotion: fear
Emotion 'fear' → Genres: ['horror', 'thriller', 'mystery', 'crime']

 Recommended Movies:
• Stree
• The Vampire Diaries
• The Rain
• The Open House
• Zombieland
• The Haunting of Hill House
• The Haunting of Bly Manor
• The Invisible Guest
• The Invisible Guardian
• The Conjuring 2
