## 0. Data Prep

#### a. Read in the movies data set

In [1]:
import pandas as pd

# modify the column width
pd.set_option('display.max_colwidth', None)

# look at a subset of the movies
movies = pd.read_csv('movies.csv')
movies.head(3)

,movie_title,movie_info,movie_rating
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",PG
1,A Dog's Way Home,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",PG
2,A Tuba to Cuba,"The leader of New Orleans' famed Preservation Hall Jazz Band seeks to fulfill his late father's dream of retracing their musical roots to the shores of Cuba in search of the indigenous music that gave birth to New Orleans jazz. A TUBA TO CUBA celebrates the triumph of the human spirit expressed through the universal language of music and challenges us to resolve to build bridges, not walls.",NR


#### b. Look at a few movies

In [2]:
# view one movie
movies.loc[1]

movie_title                                                                                                                                                                                                                                              A Dog's Way Home
movie_info      Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.
movie_rating                                                                                                                                                                                                                                                           PG
Name: 1, dtype: object

In [3]:
# extract the text portion
movie1 = movies.loc[1, 'movie_info']
movie1

'Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.'

In [4]:
# view another movie
movies.tail(1)

,movie_title,movie_info,movie_rating
165,Yesterday,"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",PG-13


In [5]:
# extract the text portion
movie2 = movies.loc[165, 'movie_info']
movie2

"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed."

## 1. Sentiment Analysis

**Question**: How positive or negative is this movie description?

#### a. Use the transformers library for sentiment analysis

In [6]:
from transformers import pipeline

sentiment_analyzer = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


In [7]:
# suppress verbose output; show only error messages
from transformers import logging
logging.set_verbosity_error()

#### b. Apply sentiment analysis using DistilBERT

In [8]:
# a dog's way home
movie1

'Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.'

In [9]:
# check the sentiment score
sentiment_analyzer(movie1)

[{'label': 'POSITIVE', 'score': 0.9995336532592773}]

In [10]:
# yesterday
movie2

"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed."

In [11]:
# check the sentiment score
sentiment_analyzer(movie2)

[{'label': 'NEGATIVE', 'score': 0.9984468817710876}]
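
The pipeline returns a label plus a confidence score, so both movies score close to 1 even though they sit at opposite ends. One way to compare them on a single scale is to fold the label into the sign. The sketch below is an illustration only: `signed_sentiment` is a hypothetical helper, not part of transformers, and the two result dictionaries are copied from the cell outputs above rather than recomputed.

```python
def signed_sentiment(result):
    """Map one pipeline result to a score in [-1, 1] (hypothetical helper)."""
    # POSITIVE keeps its score, NEGATIVE flips the sign
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]

# results copied from the sentiment_analyzer outputs above
movie1_result = {'label': 'POSITIVE', 'score': 0.9995336532592773}
movie2_result = {'label': 'NEGATIVE', 'score': 0.9984468817710876}

signed_scores = {
    "A Dog's Way Home": signed_sentiment(movie1_result),
    "Yesterday": signed_sentiment(movie2_result),
}
print(signed_scores)
```

With the live pipeline, the same helper could be applied as `signed_sentiment(sentiment_analyzer(movie1)[0])`, since the pipeline returns a list with one dictionary per input text.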

## 2. Zero-Shot Classification

**Question**: Which of the following genres (comedy, action, drama, horror, romance) best describes this movie description?

#### a. Find a model on Hugging Face's Model Hub

1. Go to huggingface.co
2. Click on Models >> Tasks >> Zero-Shot Classification
3. Sort by popularity

#### b. Bring in the BART model for zero-shot classification

In [12]:
classifier = pipeline("zero-shot-classification", # task specified
                      model="facebook/bart-large-mnli") # model picked from model hub

#### c. Apply zero-shot classification using BART

In [13]:
classifier(movie1, candidate_labels = ['comedy', 'action', 'drama', 'horror', 'romance'])

{'sequence': 'Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.',
 'labels': ['action', 'drama', 'comedy', 'romance', 'horror'],
 'scores': [0.8686432838439941,
  0.09988968074321747,
  0.014216333627700806,
  0.011292763985693455,
  0.005957926623523235]}

In [14]:
classifier(movie2, candidate_labels = ['comedy', 'action', 'drama', 'horror', 'romance'])

{'sequence': "Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",
 'labels': ['drama', 'action', 'comedy', 'horror', 'romance'],
 'scores': [0.4112357497215271,
  0.3190882205963135,
  0.12281811982393265,
  0.10751312971115112,
  0.03934483230113983]}

#### d. Apply zero-shot classification using DeBERTa

In [15]:
classifier2 = pipeline("zero-shot-classification", # task specified
                       model="sileod/deberta-v3-base-tasksource-nli") # another model picked from model hub

In [16]:
classifier2(movie1, candidate_labels = ['comedy', 'action', 'drama', 'horror', 'romance'])

{'sequence': 'Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.',
 'labels': ['action', 'drama', 'comedy', 'romance', 'horror'],
 'scores': [0.5859643220901489,
  0.22670985758304596,
  0.08723773062229156,
  0.0763465017080307,
  0.023741617798805237]}

In [17]:
classifier2(movie2, candidate_labels = ['comedy', 'action', 'drama', 'horror', 'romance'])

{'sequence': "Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",
 'labels': ['drama', 'action', 'romance', 'horror', 'comedy'],
 'scores': [0.5804199576377869,
  0.20670437812805176,
  0.10009843111038208,
  0.0761198177933693,
  0.036657366901636124]}
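
Both models pick drama for Yesterday, but they differ in how decisively. A quick way to see this is to compare the top score with the runner-up. The sketch below is an illustration: `top_label_and_margin` is a hypothetical helper, and the dictionaries hold the labels and scores copied from the BART and DeBERTa outputs above (the `sequence` key is omitted for brevity).

```python
def top_label_and_margin(result):
    """Return the best label and its lead over the runner-up (hypothetical helper)."""
    # the zero-shot pipeline returns labels and scores sorted from highest to lowest
    best, runner_up = result["scores"][0], result["scores"][1]
    return result["labels"][0], best - runner_up

# copied from the classifier(movie2, ...) output (BART)
bart_movie2 = {
    'labels': ['drama', 'action', 'comedy', 'horror', 'romance'],
    'scores': [0.4112357497215271, 0.3190882205963135, 0.12281811982393265,
               0.10751312971115112, 0.03934483230113983],
}

# copied from the classifier2(movie2, ...) output (DeBERTa)
deberta_movie2 = {
    'labels': ['drama', 'action', 'romance', 'horror', 'comedy'],
    'scores': [0.5804199576377869, 0.20670437812805176, 0.10009843111038208,
               0.0761198177933693, 0.036657366901636124],
}

bart_label, bart_margin = top_label_and_margin(bart_movie2)
deberta_label, deberta_margin = top_label_and_margin(deberta_movie2)
print(bart_label, round(bart_margin, 3))
print(deberta_label, round(deberta_margin, 3))
```

A small margin, as with BART here, suggests the description fits more than one of the candidate genres about equally well.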

## 3. Document Similarity

**Question**: How similar is this movie description to other movie descriptions?

#### a. Create the pipeline and similar movies function

In [18]:
# step 1: specify our feature extraction model
feature_extractor = pipeline('feature-extraction',
                             model='sentence-transformers/all-MiniLM-L6-v2')

In [19]:
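# step 2: extract the embeddings
# feature_extractor(text) returns nested lists shaped [1, n_tokens, hidden_size];
# [0][0] keeps the vector of the first token only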
import numpy as np

embeddings = movies.movie_info.apply(lambda row: feature_extractor(row)[0][0])
embeddings_movies = np.vstack(embeddings)
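
The step above keeps only the first token's vector for each description. Sentence-transformers models such as MiniLM are usually paired with mean pooling instead, which averages over all token vectors. The sketch below shows that alternative on a toy input. It is not what the notebook ran: `mean_pool` is a hypothetical helper, and `toy_output` is a made-up stand-in for what `feature_extractor(text)` returns.

```python
import numpy as np

def mean_pool(token_vectors):
    """Average the token vectors of one text into a single vector (hypothetical helper)."""
    # drop the leading batch dimension: [1, n_tokens, hidden] -> [n_tokens, hidden]
    tokens = np.asarray(token_vectors)[0]
    # average across tokens to get one vector of length hidden
    return tokens.mean(axis=0)

# toy stand-in for feature_extractor(text): 1 text, 3 tokens, 2 dimensions
toy_output = [[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]]

first_token = np.asarray(toy_output)[0][0]   # what [0][0] selects
pooled = mean_pool(toy_output)               # average over all three tokens
print(first_token, pooled)
```

In the notebook this would mean swapping `feature_extractor(row)[0][0]` for `mean_pool(feature_extractor(row))`. The similarity rankings would likely change, so the outputs shown later would no longer apply as-is.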

In [20]:
# step 3: create a get_similar_movies function with the inputs: movie_index, top_n
from sklearn.metrics.pairwise import cosine_similarity

def get_similar_movies(movie_index, top_n=3):

    # create movie embedding for movie_index
    m_embedding = np.array(embeddings_movies[movie_index]).reshape(1, -1)
    
    # calculate similarity scores
    similarity_scores = cosine_similarity(m_embedding, embeddings_movies)
    similarity_scores_series = pd.Series(similarity_scores.flatten(), name='similarity_score')
    
    # bring in movie info
    movies_similarity_scores_df = pd.concat([movies[['movie_title', 'movie_info', 'movie_rating']], similarity_scores_series], axis=1)

    # return the movie itself (similarity 1.0) followed by its top_n most similar movies
    return movies_similarity_scores_df.sort_values('similarity_score', ascending=False).iloc[0:top_n+1]
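
To make the ranking step concrete, here is a minimal NumPy-only sketch of the same idea on toy vectors. It is an illustration under assumptions, not the function above: `top_similar_indices` and `toy_embeddings` are hypothetical, and unlike `get_similar_movies` it drops the query row from the results.

```python
import numpy as np

def top_similar_indices(embedding_matrix, query_index, top_n=3):
    """Return (index, score) pairs for the top_n rows most similar to query_index."""
    # normalise each row to unit length so a dot product equals cosine similarity
    unit = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
    scores = unit @ unit[query_index]
    # sort from most to least similar and drop the query row itself
    order = [i for i in np.argsort(-scores) if i != query_index]
    return [(int(i), float(scores[i])) for i in order[:top_n]]

# four toy "movies" in a 2-dimensional embedding space
toy_embeddings = np.array([[1.0, 0.0],
                           [0.9, 0.1],
                           [0.0, 1.0],
                           [-1.0, 0.0]])

toy_matches = top_similar_indices(toy_embeddings, query_index=0, top_n=2)
print(toy_matches)
```

Row 1 points in almost the same direction as row 0, so it ranks first. Row 2 is orthogonal (similarity 0) and row 3 points the opposite way (similarity -1).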

#### b. Find similar movies using MiniLM

In [21]:
# find movies similar to A Dog's Way Home
get_similar_movies(1)

,movie_title,movie_info,movie_rating,similarity_score
1,A Dog's Way Home,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",PG,1.0
87,Pet Sematary,"Based on the seminal horror novel by Stephen King, Pet Sematary follows Dr. Louis Creed (Jason Clarke), who, after relocating with his wife Rachel (Amy Seimetz) and their two young children from Boston to rural Maine, discovers a mysterious burial ground hidden deep in the woods near the family's new home. When tragedy strikes, Louis turns to his unusual neighbor, Jud Crandall (John Lithgow), setting off a perilous chain reaction that unleashes an unfathomable evil with horrific consequences.",R,0.828111
81,Missing Link,"This April, meet Mr. Link (Galifianakis): 8 feet tall, 630 lbs, and covered in fur, but don't let his appearance fool you... he is funny, sweet, and adorably literal, making him the world's most lovable legend at the heart of Missing Link, the globe-trotting family adventure from LAIKA. Tired of living a solitary life in the Pacific Northwest, Mr. Link recruits fearless explorer Sir Lionel Frost (Jackman) to guide him on a journey to find his long-lost relatives in the fabled valley of Shangri-La. Along with adventurer Adelina Fortnight (Saldana), our fearless trio of explorers encounter more than their fair share of peril as they travel to the far reaches of the world to help their new friend. Through it all, the three learn that sometimes you can find a family in the places you least expect.",PG,0.821332
140,The Secret Life of Pets 2,"THE SECRET LIFE OF PETS 2 will follow summer 2016's blockbuster about the lives our pets lead after we leave for work or school each day. Illumination founder and CEO Chris Meledandri and his longtime collaborator Janet Healy will produce the sequel to the comedy that had the best opening ever for an original film, animated or otherwise. THE SECRET LIFE OF PETS 2 will see the return of writer Brian Lynch (Minions) and once again be directed by Chris Renaud (Despicable Me series, Dr. Seuss' The Lorax).",PG,0.803324


In [22]:
# find movies similar to Yesterday
get_similar_movies(165)

,movie_title,movie_info,movie_rating,similarity_score
165,Yesterday,"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",PG-13,1.0
32,Dolemite Is My Name,"Stung by a string of showbiz failures, floundering comedian Rudy Ray Moore (Academy Award nominee Eddie Murphy) has an epiphany that turns him into a word-of-mouth sensation: step onstage as someone else. Borrowing from the street mythology of 1970s Los Angeles, Moore assumes the persona of Dolemite, a pimp with a cane and an arsenal of obscene fables. However, his ambitions exceed selling bootleg records deemed too racy for mainstream radio stations to play. Moore convinces a social justice-minded dramatist (Keegan-Michael Key) to write his alter ego a film, incorporating kung fu, car chases, and Lady Reed (Da'Vine Joy Randolph), an ex-backup singer who becomes his unexpected comedic foil. Despite clashing with his pretentious director, D'Urville Martin (Wesley Snipes), and countless production hurdles at their studio in the dilapidated Dunbar Hotel, Moore's Dolemite becomes a runaway box office smash and a defining movie of the Blaxploitation era.",R,0.825942
81,Missing Link,"This April, meet Mr. Link (Galifianakis): 8 feet tall, 630 lbs, and covered in fur, but don't let his appearance fool you... he is funny, sweet, and adorably literal, making him the world's most lovable legend at the heart of Missing Link, the globe-trotting family adventure from LAIKA. Tired of living a solitary life in the Pacific Northwest, Mr. Link recruits fearless explorer Sir Lionel Frost (Jackman) to guide him on a journey to find his long-lost relatives in the fabled valley of Shangri-La. Along with adventurer Adelina Fortnight (Saldana), our fearless trio of explorers encounter more than their fair share of peril as they travel to the far reaches of the world to help their new friend. Through it all, the three learn that sometimes you can find a family in the places you least expect.",PG,0.819001
86,Pavarotti,"From the filmmaking team behind the highly-acclaimed documentary The Beatles: Eight Days A Week - The Touring Years, PAVAROTTI is a riveting film that lifts the curtain on the icon who brought opera to the people. Academy Award (R) winner Ron Howard puts audiences front row center for an exploration of The Voice...The Man...The Legend. Luciano Pavarotti gave his life to the music and a voice to the world. This cinematic event features history-making performances and intimate interviews, including never-before-seen footage and cutting-edge Dolby Atmos technology.",PG-13,0.8187
