# Movie Descriptions
## NLP - Semantic Similarity Task 20.2

## Practical Task 2: Objective
- The goal of this notebook is to build a system that recommends what to watch next, based on the word-vector similarity of movie descriptions.
- The program compares movie descriptions using the pre-trained word vectors in spaCy's 'en_core_web_md' model.
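
To make the idea of "word vector similarity" concrete, here is a minimal sketch in plain Python. It assumes the default spaCy behaviour, where a document vector is the average of its token vectors and similarity is the cosine of the angle between two document vectors. The three-dimensional `toy_vectors` values and the helper names are invented for illustration only; real 'en_core_web_md' vectors have 300 dimensions.

```python
import math

def average_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "word vectors" (assumption: illustrative values only).
toy_vectors = {
    "hulk": [0.9, 0.1, 0.0],
    "planet": [0.7, 0.3, 0.1],
    "gladiator": [0.8, 0.0, 0.2],
    "dance": [0.0, 0.9, 0.4],
    "company": [0.1, 0.8, 0.5],
}

def describe(words):
    """Build a toy 'document vector' by averaging the vectors of its words."""
    return average_vector([toy_vectors[word] for word in words])

hulk_doc = describe(["hulk", "planet", "gladiator"])
space_doc = describe(["planet", "gladiator"])
dance_doc = describe(["dance", "company"])

print(cosine_similarity(hulk_doc, space_doc))  # closer in meaning, higher score
print(cosine_similarity(hulk_doc, dance_doc))  # further apart, lower score
```

Because the document vector is an average, long descriptions that share many common words can score highly even when their subjects differ, which is worth keeping in mind when reading the scores later in this notebook.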
## Key Concepts
- **Import Data:** Read in the movies.txt file. Each separate line is a description of a different movie.
- **Create Function:** Your task is to create a function to return which movies a user would watch next if they have watched Planet Hulk with the description below.
- **Planet Hulk Description:** “Will he save their world or destroy it? When the Hulk becomes too dangerous for the Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a planet where the Hulk can live in peace. Unfortunately, Hulk lands on the planet Sakaar where he is sold into slavery and trained as a gladiator.”
- **Expected Output:** The function should take in the description as a parameter and return the title of the most similar movie.

In [353]:
# Import the spaCy library.
import spacy

In [354]:
# Load SpaCy medium model.
nlp = spacy.load("en_core_web_md")

In [355]:
# Load movie descriptions data from text file 'movies.txt'.
with open("movies.txt", "r", encoding="utf-8") as file: # Read-only mode 'r', Unicode Transformation Format, 8-bit. 
    movie_descriptions = [line.strip() for line in file]

## Print Out Movie Descriptions Loaded From The File

In [356]:
# Print the movie descriptions once imported for initial review of data set.
for description in movie_descriptions:
    print(description)

Movie A :When Hiccup discovers Toothless isn't the only Night Fury, he must seek "The Hidden World", a secret Dragon Utopia before a hired tyrant named Grimmel finds it first.
Movie B :After the death of Superman, several new people present themselves as possible successors.
Movie C :A darkness swirls at the center of a world-renowned dance company, one that will engulf the artistic director, an ambitious young dancer, and a grieving psychotherapist. Some will succumb to the nightmare. Others will finally wake up.
Movie D :A humorous take on Sir Arthur Conan Doyle's classic mysteries featuring Sherlock Holmes and Doctor Watson.
Movie E :A 16-year-old girl and her extended family are left reeling after her calculating grandmother unveils an array of secrets on her deathbed.
Movie F :In the last moments of World War II, a young German soldier fighting for survival finds a Nazi captain's uniform. Impersonating an officer, the man quickly takes on the monstrous identity of the perpetrators

## Similarity Testing Of Movie Descriptions

In [357]:
# Initial testing of similarity levels of movie descriptions provided.
sentence_to_compare = "Will he save their world or destroy it? When the Hulk becomes too dangerous for the Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a planet where the Hulk can live in peace. Unfortunately, Hulk lands on the planet Sakaar where he is sold into slavery and trained as a gladiator."
sentences = [
"When Hiccup discovers Toothless isn't the only Night Fury, he must seek 'The Hidden World', a secret Dragon Utopia before a hired tyrant named Grimmel finds it first.",
"After the death of Superman, several new people present themselves as possible successors.",
"A darkness swirls at the centre of a world-renowned dance company, one that will engulf the artistic director, an ambitious young dancer, and a grieving psychotherapist. Some will succumb to the nightmare. Others will finally wake up.",
"A humorous take on Sir Arthur Conan Doyle's classic mysteries featuring Sherlock Holmes and Doctor Watson.",
"A 16-year-old girl and her extended family are left reeling after her calculating grandmother unveils an array of secrets on her deathbed.",
"In the last moments of World War II, a young German soldier fighting for survival finds a Nazi captain's uniform. Impersonating an officer, the man quickly takes on the monstrous identity of the perpetrators he is trying to escape from.",
"The world at an end, a dying mother sends her young son on a quest to find the place that grants wishes.",
"A musician helps a young singer and actress find fame, even as age and alcoholism send his own career into a downward spiral.",
"Corporate analyst and single mom, Jen, tackles Christmas with a business-like approach until her uncle arrives with a handsome stranger in tow.",
"Adapted from the bestselling novel by Madeleine St John, Ladies in Black is an alluring and tender-hearted comedy drama about the lives of a group of department store employees in 1959 Sydney.",
"Will he save their world or destroy it? When the Hulk becomes too dangerous for the Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a planet where the Hulk can live in peace. Unfortunately, Hulk lands on the planet Sakaar where he is sold into slavery and trained as a gladiator."]
model_sentence = nlp(sentence_to_compare)
for sentence in sentences:
    similarity = nlp(sentence).similarity(model_sentence)
    print(str(similarity) + ": " + sentence)

0.8520758121774696: When Hiccup discovers Toothless isn't the only Night Fury, he must seek 'The Hidden World', a secret Dragon Utopia before a hired tyrant named Grimmel finds it first.
0.8401837623151783: After the death of Superman, several new people present themselves as possible successors.
0.9087732757692548: A darkness swirls at the centre of a world-renowned dance company, one that will engulf the artistic director, an ambitious young dancer, and a grieving psychotherapist. Some will succumb to the nightmare. Others will finally wake up.
0.5444309494097908: A humorous take on Sir Arthur Conan Doyle's classic mysteries featuring Sherlock Holmes and Doctor Watson.
0.7267661469790314: A 16-year-old girl and her extended family are left reeling after her calculating grandmother unveils an array of secrets on her deathbed.
0.8930368281975917: In the last moments of World War II, a young German soldier fighting for survival finds a Nazi captain's uniform. Impersonating an officer, t

## Observations of results using en_core_web_md
- **Perfect similarity:** The model correctly recognises the perfect similarity (1) of the identical test movie description.
- **Highest similarity:** Excluding the identical description, the model gives the third description (Movie C) the highest similarity, 0.909.
- **Lowest similarity:** The model gives the fourth description (Movie D) the lowest similarity, 0.544.
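
Since the identical description always scores 1, a recommender should skip it when picking the best match. The sketch below shows one way to do that, assuming the scores have already been computed. The function name `best_match_excluding_input` and the short stand-in strings are hypothetical; the two scores other than 1.0 are the rounded values reported above.

```python
def best_match_excluding_input(input_text, candidates, scores):
    """Return (text, score) for the highest-scoring candidate that is not the input itself."""
    best = None
    for text, score in zip(candidates, scores):
        # Skip the description the user has already watched.
        if text.strip() == input_text.strip():
            continue
        if best is None or score > best[1]:
            best = (text, score)
    return best

# Stand-in strings instead of the full descriptions (assumption: for brevity only).
watched = "Planet Hulk description"
candidates = ["Movie C description", "Movie D description", "Planet Hulk description"]
scores = [0.909, 0.544, 1.0]

print(best_match_excluding_input(watched, candidates, scores))
```

Here the exact-match check is a plain string comparison, so it only filters out a description that is character-for-character the same as the input.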
## Observations of results using en_core_web_sm
Note: the small model is faster, but it ships without static word vectors, so its similarity scores are less reliable.
- **Perfect similarity:** The model correctly recognises the perfect similarity (1) of the identical test movie description.
- **Highest similarity:** The small model ranks the seventh description (Movie G) as the most similar, at 0.689.
- **Lowest similarity:** The small model ranks the fifth description (Movie E) as the least similar, at 0.446.

## Create The Function

In [358]:
# Create movie_data list.
movie_data = [{"title": description.split(':')[0].strip(), "description": description.strip()} for description in movie_descriptions]
print(movie_data) # To check movie title has been extracted correctly.

[{'title': 'Movie A', 'description': 'Movie A :When Hiccup discovers Toothless isn\'t the only Night Fury, he must seek "The Hidden World", a secret Dragon Utopia before a hired tyrant named Grimmel finds it first.'}, {'title': 'Movie B', 'description': 'Movie B :After the death of Superman, several new people present themselves as possible successors.'}, {'title': 'Movie C', 'description': 'Movie C :A darkness swirls at the center of a world-renowned dance company, one that will engulf the artistic director, an ambitious young dancer, and a grieving psychotherapist. Some will succumb to the nightmare. Others will finally wake up.'}, {'title': 'Movie D', 'description': "Movie D :A humorous take on Sir Arthur Conan Doyle's classic mysteries featuring Sherlock Holmes and Doctor Watson."}, {'title': 'Movie E', 'description': 'Movie E :A 16-year-old girl and her extended family are left reeling after her calculating grandmother unveils an array of secrets on her deathbed.'}, {'title': 'Mov
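
In the list above, each stored description still begins with its "Movie A :" label. If you would rather keep the label out of the text being compared, one option is to split each line at the first colon. This is only a sketch: `parse_movie_line` is a hypothetical helper, not part of the notebook, and using it would change the similarity scores reported in the cells that follow.

```python
def parse_movie_line(line):
    """Split a line such as 'Movie A :text' into a title and the description text."""
    title, separator, text = line.partition(":")
    if not separator:
        # No colon found: keep the whole line as the description, with no title.
        return {"title": "", "description": line.strip()}
    return {"title": title.strip(), "description": text.strip()}

sample_lines = [
    "Movie B :After the death of Superman, several new people present themselves as possible successors.",
    "",  # blank lines (e.g. a trailing newline in the file) are skipped below
]

parsed = [parse_movie_line(line) for line in sample_lines if line.strip()]
print(parsed)
```

`str.partition` splits only at the first colon, so a colon that appears later in the description text is left intact.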

In [359]:
def find_similar_movie(input_description, movie_data):
    # Process input movie description.
    input_doc = nlp(input_description)
    # Process movie descriptions.
    movie_docs = [nlp(movie["description"]) for movie in movie_data]
    # Calculate similarity using SpaCy.
    similarities = [input_doc.similarity(movie_doc) for movie_doc in movie_docs]
    # Print similarity scores for review.
    for score, movie in zip(similarities, movie_data):
        print(f"{score}: {movie['description']}")
    # Identify the index of the most similar movie description.
    most_similar_index = similarities.index(max(similarities))
    # Return the dictionary (title and description) of the most similar movie to watch next.
    return movie_data[most_similar_index]

## Output Next Movie Suggestion

In [360]:
# Output the similarity testing outcomes and suggest the next movie the user would most likely enjoy watching based on having just watched Planet Hulk.
input_description = "Will he save their world or destroy it? When the Hulk becomes too dangerous for the Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a planet where the Hulk can live in peace. Unfortunately, Hulk lands on the planet Sakaar where he is sold into slavery and trained as a gladiator."
result = find_similar_movie(input_description, movie_data)
print("\nHaving just watched Planet Hulk, next you might like to watch:", result["title"])
print("Description:", result["description"])

0.7771753433517111: Movie A :When Hiccup discovers Toothless isn't the only Night Fury, he must seek "The Hidden World", a secret Dragon Utopia before a hired tyrant named Grimmel finds it first.
0.7487074960551457: Movie B :After the death of Superman, several new people present themselves as possible successors.
0.8866718729298162: Movie C :A darkness swirls at the center of a world-renowned dance company, one that will engulf the artistic director, an ambitious young dancer, and a grieving psychotherapist. Some will succumb to the nightmare. Others will finally wake up.
0.37648310468544055: Movie D :A humorous take on Sir Arthur Conan Doyle's classic mysteries featuring Sherlock Holmes and Doctor Watson.
0.6708371652556161: Movie E :A 16-year-old girl and her extended family are left reeling after her calculating grandmother unveils an array of secrets on her deathbed.
0.8753508504222226: Movie F :In the last moments of World War II, a young German soldier fighting for survival find

## Observations of results using en_core_web_md
Note that the similarity values are lower than in the initial test because these descriptions still include the title label (e.g. "Movie A :", "Movie B :").
However, the ranking order is the same, and Movie C is once again suggested as the movie to watch next.
- **Highest similarity:** The model gives Movie C the highest similarity, 0.887. Watch next!
- **Second best similarity:** The model gives Movie G the next best similarity, 0.878. Movie G would be suggested if the user did not wish to watch Movie C.
- **Lowest similarity:** The model gives Movie D the lowest similarity, 0.376. Least likely to be recommended next.
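
The function above returns only the single best match. To offer a fallback such as Movie G, the scores can be sorted into a ranked list instead. The sketch below is one possible approach, not the notebook's own code: `rank_movies` and its `top_n` parameter are hypothetical names, and the scores are the values reported above rounded to three decimals.

```python
def rank_movies(scored_movies, top_n=3):
    """Return the top_n (title, score) pairs, highest score first."""
    ranked = sorted(scored_movies, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# (title, similarity) pairs taken from the results reported in this notebook, rounded.
scores = [
    ("Movie A", 0.777),
    ("Movie B", 0.749),
    ("Movie C", 0.887),
    ("Movie D", 0.376),
    ("Movie E", 0.671),
    ("Movie F", 0.875),
    ("Movie G", 0.878),
]

for title, score in rank_movies(scores):
    print(f"{title}: {score}")
```

With these values the ranked list starts with Movie C, then Movie G, then Movie F, which matches the observations above.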
## Observations of results using en_core_web_sm
Note: the small model is faster, but it ships without static word vectors, so its similarity scores are less reliable.
- **Highest similarity:** The small model ranks the seventh description (Movie G) as the most similar, at 0.650. (This was the second-best choice of the larger, more accurate medium model.)
- **Lowest similarity:** The small model ranks the fourth description (Movie D) as the least similar, at 0.421. (This differs from its lowest-ranked description in the initial test above, which was Movie E.)
- **Summary:** The small model does not rank Movie C, the medium model's top choice, very highly: it scores only 0.599.