# Hugging Face Transformers Assignments

## 1. Sentiment Analysis

1. Create a new _nlp_transformers_ environment
2. Launch Jupyter Notebook
3. Read in the movie reviews data set including the VADER sentiment scores (_movie_reviews_with_sentiment.csv_)
4. Apply sentiment analysis to the _movie_info_ column using transformers
5. Compare the transformers sentiment scores with the VADER sentiment scores

In [1]:
# read in movie reviews with sentiment scores
import pandas as pd

# view full movie_info text
pd.set_option('display.max_colwidth', None)

# read in the movie reviews data
df = pd.read_csv('../Data/movie_reviews_with_sentiment.csv')
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [2]:
# rename the sentiment column to sentiment_vader
df = df.rename(columns={'sentiment': 'sentiment_vader'})

In [3]:
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment_vader
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [4]:
%%time

# add a timer and hide all non-critical warnings
from transformers import pipeline, logging

logging.set_verbosity_error()

sentiment_analyzer = pipeline("sentiment-analysis",
                              model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
                              device=-1,
                              truncation=True)

sentiment_scores = df['movie_info'].apply(sentiment_analyzer)
sentiment_scores[:5]

CPU times: user 5.77 s, sys: 883 ms, total: 6.65 s
Wall time: 8.07 s


0    [{'label': 'POSITIVE', 'score': 0.9982469081878662}]
1    [{'label': 'POSITIVE', 'score': 0.9995336532592773}]
2    [{'label': 'POSITIVE', 'score': 0.9994434714317322}]
3    [{'label': 'POSITIVE', 'score': 0.9994601607322693}]
4    [{'label': 'POSITIVE', 'score': 0.9972022771835327}]
Name: movie_info, dtype: object

In [5]:
# extract the label and score, then build a signed sentiment score (negative when the label is NEGATIVE)
df['Label_HF'] = sentiment_scores.apply(lambda x: x[0]['label'])
df['Score_HF'] = sentiment_scores.apply(lambda x: x[0]['score'])
df['Sentiment_HF'] = df.apply(lambda row: row['Score_HF'] if row['Label_HF'] == 'POSITIVE' else -row['Score_HF'], axis=1)
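
The label-and-score conversion above can also be written as one pass over a batch of texts. The sketch below is only an illustration, not the notebook's method: `signed_sentiment` is a hypothetical helper, and it assumes the analyzer returns one `{'label', 'score'}` dict per input when it is called with a list of texts. A stand-in analyzer (`fake_analyzer`, also hypothetical) is used so the example runs without downloading a model.

```python
def signed_sentiment(analyzer, texts):
    """Return one signed score per text: +score for POSITIVE, -score otherwise."""
    results = analyzer(list(texts))  # a list of texts in, a list of dicts out
    return [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]


def fake_analyzer(texts):
    """Stand-in for a sentiment pipeline (hypothetical, for illustration only)."""
    return [
        {"label": "NEGATIVE" if "evil" in t else "POSITIVE", "score": 0.9}
        for t in texts
    ]


scores = signed_sentiment(fake_analyzer, ["a charming prince", "an evil sorcerer"])
print(scores)  # [0.9, -0.9]
```

With the real pipeline, the call would presumably look like `signed_sentiment(sentiment_analyzer, df['movie_info'])`, and the result could be assigned to a new column.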

In [6]:
df.head(5)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment_vader,Label_HF,Score_HF,Sentiment_HF
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837,POSITIVE,0.998247,0.998247
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237,POSITIVE,0.999534,0.999534
2,A Tuba to Cuba,NR,"Documentary, Musical & Performing Arts",2/15/19,"The leader of New Orleans' famed Preservation Hall Jazz Band seeks to fulfill his late father's dream of retracing their musical roots to the shores of Cuba in search of the indigenous music that gave birth to New Orleans jazz. A TUBA TO CUBA celebrates the triumph of the human spirit expressed through the universal language of music and challenges us to resolve to build bridges, not walls.","Danny Clinch, T.G. Herrington",male,100,82,,0.936,POSITIVE,0.999443,0.999443
3,A Vigilante,R,Drama,3/29/19,"A once abused woman, Sadie (Olivia Wilde), devotes herself to ridding victims of their domestic abusers while hunting down the husband she must kill to truly be free. A Vigilante is a thriller inspired by the strength and bravery of real domestic abuse survivors and the incredible obstacles to safety they face.",Sarah Daggar-Nickson,female,92,50,"Led by Olivia Wilde's fearless performance and elevated by timely themes, A Vigilante is an uncompromising thriller that hits as hard as its protagonist.",-0.0334,POSITIVE,0.99946,0.99946
4,After,PG-13,"Drama, Romance",4/12/19,"Based on Anna Todd's best-selling novel which became a publishing sensation on social storytelling platform Wattpad, AFTER follows Tessa (Langford), a dedicated student, dutiful daughter and loyal girlfriend to her high school sweetheart, as she enters her first semester in college. Armed with grand ambitions for her future, her guarded world opens up when she meets the dark and mysterious Hardin Scott (Tiffin), a magnetic, brooding rebel who makes her question all she thought she knew about herself and what she wants out of life.",Jenny Gage,female,17,72,"Tepid and tired, After's fun flourishes are let down by its generic story.",0.9349,POSITIVE,0.997202,0.997202


In [7]:
# view the calculations
df[['movie_title', 'movie_info', 'sentiment_vader', 'Label_HF', 'Score_HF', 'Sentiment_HF']].head(8)

Unnamed: 0,movie_title,movie_info,sentiment_vader,Label_HF,Score_HF,Sentiment_HF
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.9837,POSITIVE,0.998247,0.998247
1,A Dog's Way Home,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",0.9237,POSITIVE,0.999534,0.999534
2,A Tuba to Cuba,"The leader of New Orleans' famed Preservation Hall Jazz Band seeks to fulfill his late father's dream of retracing their musical roots to the shores of Cuba in search of the indigenous music that gave birth to New Orleans jazz. A TUBA TO CUBA celebrates the triumph of the human spirit expressed through the universal language of music and challenges us to resolve to build bridges, not walls.",0.936,POSITIVE,0.999443,0.999443
3,A Vigilante,"A once abused woman, Sadie (Olivia Wilde), devotes herself to ridding victims of their domestic abusers while hunting down the husband she must kill to truly be free. A Vigilante is a thriller inspired by the strength and bravery of real domestic abuse survivors and the incredible obstacles to safety they face.",-0.0334,POSITIVE,0.99946,0.99946
4,After,"Based on Anna Todd's best-selling novel which became a publishing sensation on social storytelling platform Wattpad, AFTER follows Tessa (Langford), a dedicated student, dutiful daughter and loyal girlfriend to her high school sweetheart, as she enters her first semester in college. Armed with grand ambitions for her future, her guarded world opens up when she meets the dark and mysterious Hardin Scott (Tiffin), a magnetic, brooding rebel who makes her question all she thought she knew about herself and what she wants out of life.",0.9349,POSITIVE,0.997202,0.997202
5,Aladdin,"A street rat frees a genie from a lamp, granting all of his wishes and transforming himself into a charming prince in order to marry a beautiful princess. But soon, an evil sorcerer becomes hell-bent on securing the lamp for his own sinister purposes.",-0.6249,NEGATIVE,0.651462,-0.651462
6,Alita: Battle Angel,"From visionary filmmakers James Cameron (AVATAR) and Robert Rodriguez (SIN CITY), comes ALITA: BATTLE ANGEL, an epic adventure of hope and empowerment. When Alita (Rosa Salazar) awakens with no memory of who she is in a future world she does not recognize, she is taken in by Ido (Christoph Waltz), a compassionate doctor who realizes that somewhere in this abandoned cyborg shell is the heart and soul of a young woman with an extraordinary past. As Alita learns to navigate her new life and the treacherous streets of Iron City, Ido tries to shield her from her mysterious history while her street-smart new friend Hugo (Keean Johnson) offers instead to help trigger her memories. But it is only when the deadly and corrupt forces that run the city come after Alita that she discovers a clue to her past - she has unique fighting abilities that those in power will stop at nothing to control. If she can stay out of their grasp, she could be the key to saving her friends, her family and the world she's grown to love.",0.9035,POSITIVE,0.997783,0.997783
7,All Is True,"The year is 1613, Shakespeare is acknowledged as the greatest writer of the age. But disaster strikes when his renowned Globe Theatre burns to the ground, and devastated, Shakespeare returns to Stratford, where he must face a troubled past and a neglected family. Haunted by the death of his only son Hamnet, he struggles to mend the broken relationships with his wife and daughters. In so doing, he is ruthlessly forced to examine his own failings as husband and father. His very personal search for the truth uncovers secrets and lies within a family at war.",-0.9955,POSITIVE,0.520493,0.520493


In [8]:
# view the five descriptions the transformer model scores as most negative
df[['movie_title', 'movie_info', 'sentiment_vader', 'Label_HF', 'Score_HF', 'Sentiment_HF']].sort_values(by='Sentiment_HF', ascending=True).head(5)

Unnamed: 0,movie_title,movie_info,sentiment_vader,Label_HF,Score_HF,Sentiment_HF
22,Braid,"Two wanted women decide to rob their wealthy yet mentally unstable friend who lives in a fantasy world they all created as children. To take her money, the girls must take part in a deadly and perverse game of make believe throughout a sprawling yet decaying estate. As things become increasingly violent and hallucinatory, they realize that obtaining the money may be the least of their concerns.",-0.8316,NEGATIVE,0.999203,-0.999203
103,Spider-Man: Far From Home,"Peter Parker returns in Spider-Man: Far From Home, the next chapter of the Spider-Man: Homecoming series! Our friendly neighborhood Super Hero decides to join his best friends Ned, MJ, and the rest of the gang on a European vacation. However, Peter's plan to leave super heroics behind for a few weeks are quickly scrapped when he begrudgingly agrees to help Nick Fury uncover the mystery of several elemental creature attacks, creating havoc across the continent!",0.9722,NEGATIVE,0.998805,-0.998805
34,Dragged Across Concrete,"DRAGGED ACROSS CONCRETE follows two police detectives who find themselves suspended when a video of their strong-arm tactics is leaked to the media. With little money and no options, the embittered policemen descend into the criminal underworld and find more than they wanted waiting in the shadows.",-0.9015,NEGATIVE,0.998734,-0.998734
165,Yesterday,"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",0.1365,NEGATIVE,0.998447,-0.998447
102,Skin,"A white supremacist reforms his life after falling in love but saying goodbye to his skinhead life isn't a clean process. He must betray his former gang and work alongside the FBI in order to remove the body ink that has represented his identity for so long, as well as the burden of the gang's crimes he has carried.",-0.8377,NEGATIVE,0.996846,-0.996846
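
One simple way to compare the two sets of scores is to check how often they agree on direction. The magnitudes are not directly comparable: the VADER score is a compound intensity between -1 and 1, while the transformer score is the model's confidence in its chosen label. The sketch below is a minimal illustration under that assumption; `sign_agreement` is a hypothetical helper, and the sample values are copied from the eight rows shown in the `In [7]` output.

```python
def sign_agreement(scores_a, scores_b):
    """Fraction of pairs where both scores share a sign (both >= 0 or both < 0)."""
    pairs = list(zip(scores_a, scores_b))
    if not pairs:
        raise ValueError("need at least one pair of scores")
    same_sign = sum((a >= 0) == (b >= 0) for a, b in pairs)
    return same_sign / len(pairs)


# values copied from the first eight rows of the In [7] output
vader = [0.9837, 0.9237, 0.936, -0.0334, 0.9349, -0.6249, 0.9035, -0.9955]
hf = [0.998247, 0.999534, 0.999443, 0.99946, 0.997202, -0.651462, 0.997783, 0.520493]

print(sign_agreement(vader, hf))  # 0.75
```

On the full data set the same function could be called as `sign_agreement(df['sentiment_vader'], df['Sentiment_HF'])`. In the sample, the two disagreements are _A Vigilante_ and _All Is True_, where VADER is negative and the transformer label is POSITIVE.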


## 2. Named Entity Recognition

1. Read in the children's books data set (_childrens_books.csv_)
2. Apply NER to the Description column
3. Create a list of all named entities
4. Only include the people (PER)
5. _Extra credit:_ Exclude the authors as well

In [9]:
df_childrens = pd.read_csv('../Data/childrens_books.csv')
df_childrens.head(2)

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description
0,1,Where the Wild Things Are,Maurice Sendak,1963,4.25,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story."
1,2,The Very Hungry Caterpillar,Eric Carle,1969,4.34,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education."


In [10]:
# create an NER pipeline that groups subword tokens into whole entities
ner_analyzer = pipeline("ner",
                        model="dbmdz/bert-large-cased-finetuned-conll03-english",
                        device='mps',  # Apple Silicon GPU; use device=-1 to run on the CPU
                        aggregation_strategy='SIMPLE')

In [11]:
# apply to the first description
ner_analyzer(df_childrens.Description[0])

[{'entity_group': 'MISC',
  'score': np.float32(0.9462515),
  'word': 'Where the Wild Things Are',
  'start': 0,
  'end': 25},
 {'entity_group': 'PER',
  'score': np.float32(0.9990614),
  'word': 'Max',
  'start': 34,
  'end': 37},
 {'entity_group': 'PER',
  'score': np.float32(0.9984414),
  'word': 'Max',
  'start': 175,
  'end': 178},
 {'entity_group': 'PER',
  'score': np.float32(0.9789461),
  'word': 'Sendak',
  'start': 380,
  'end': 386}]

In [12]:
# extract the words, keeping only people (PER)
[entity['word'] for entity in ner_analyzer(df_childrens.Description[0]) if entity['entity_group'] == 'PER']

['Max', 'Max', 'Sendak']
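
The filter above keeps every PER hit, including repeats. The sketch below shows one possible refinement that also drops low-confidence hits and leftover subword pieces, and removes duplicates while preserving order. `people_from_ner` and the `min_score=0.9` threshold are assumptions chosen for illustration; the sample input is the NER output for the first description, copied from the `In [11]` output with the scores written as plain floats.

```python
def people_from_ner(entities, min_score=0.9):
    """Unique PER entities, in first-seen order, above a confidence threshold."""
    people = []
    for entity in entities:
        word = entity["word"]
        if entity["entity_group"] != "PER":
            continue
        if float(entity["score"]) < min_score or "#" in word:
            continue  # skip low-confidence hits and leftover subword pieces
        if word not in people:
            people.append(word)
    return people


# the NER output for the first description (from the In [11] output)
sample = [
    {"entity_group": "MISC", "score": 0.9462515, "word": "Where the Wild Things Are", "start": 0, "end": 25},
    {"entity_group": "PER", "score": 0.9990614, "word": "Max", "start": 34, "end": 37},
    {"entity_group": "PER", "score": 0.9984414, "word": "Max", "start": 175, "end": 178},
    {"entity_group": "PER", "score": 0.9789461, "word": "Sendak", "start": 380, "end": 386},
]

print(people_from_ner(sample))        # ['Max', 'Sendak']
print(people_from_ner(sample, 0.99))  # ['Max']
```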

In [13]:
# apply to all descriptions
df_childrens['Named_Entities'] = df_childrens['Description'].apply(lambda x: [entity['word'] for entity in ner_analyzer(x) if entity['entity_group'] == 'PER'])

In [14]:
# view the named entities
df_childrens[['Description', 'Named_Entities']].head()

Unnamed: 0,Description,Named_Entities
0,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.","[Max, Max, Sendak]"
1,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.","[##pi, Eric Carle]"
2,"The Giving Tree is a touching and bittersweet story about a tree that gives everything it has to a boy over the course of his life. As the boy grows up, he takes more from the tree, and the tree continues to give, even when it has little left. Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice.",[Silverstein]
3,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.","[Sam - I - Am, Dr. Seuss]"
4,"Goodnight Moon is a gentle, rhythmic bedtime story where a little bunny says goodnight to everything in his room, from the moon to the ""quiet old lady whispering hush."" Its repetitive structure and comforting tone make it ideal for young children. The simple illustrations by Clement Hurd complement the soothing nature of the story, making it a beloved classic for sleep-time reading.",[Clement Hurd]


In [15]:
# create a unique list of named entities
named_entities = list(set(df_childrens.Named_Entities.explode().dropna().tolist()))
named_entities[:10]

['Bilbo Baggins',
 'Grover',
 'Sam - I - Am',
 'Leslie Burke',
 'Ramona',
 'Huck Finn',
 'Bilbo',
 'Milo',
 'Baum',
 'Hu']

In [16]:

len(named_entities)

165

In [17]:
# exclude leftover subword fragments (marked with '#') from the list
named_entities_clean = [entity for entity in named_entities if '#' not in entity]
named_entities_clean[:10]

['Bilbo Baggins',
 'Grover',
 'Sam - I - Am',
 'Leslie Burke',
 'Ramona',
 'Huck Finn',
 'Bilbo',
 'Milo',
 'Baum',
 'Hu']

In [18]:
len(named_entities_clean)

154

In [19]:
df_childrens.Author.head()

0         Maurice Sendak
1             Eric Carle
2       Shel Silverstein
3              Dr. Seuss
4    Margaret Wise Brown
Name: Author, dtype: object

In [20]:
authors = list(set(df_childrens.Author.to_list())) # set to get unique authors
authors[:10]

['Judith Viorst',
 'Ezra Jack Keats',
 'Virginia Lee Burton',
 'Kevin Henkes',
 'Louis Sachar',
 'E.B. White',
 'Arlene Mosel',
 'Laura Ingalls Wilder',
 'E.L. Konigsburg',
 'Peggy Parish']

In [21]:
# exclude entities that exactly match an author's full name
named_entities_without_authors = [entity for entity in named_entities_clean if entity not in authors]
named_entities_without_authors

['Bilbo Baggins',
 'Grover',
 'Sam - I - Am',
 'Leslie Burke',
 'Ramona',
 'Huck Finn',
 'Bilbo',
 'Milo',
 'Baum',
 'Hu',
 'White Witch',
 'Jamie',
 'Sendak',
 'Gandalf',
 'De Brunhoff',
 'Lisa',
 'Big Friendly Giant',
 'Cord',
 'Olivia',
 'Falconer',
 'Atreyu',
 'Aslan',
 'Rabbit',
 'Sal',
 'Wizard',
 'Grinch',
 'Freeman',
 'Tin Man',
 'Elizabeth',
 'Tom',
 'Wilbur',
 'Dorothy',
 'White',
 'Camilla Cream',
 'James',
 'S',
 'Eeyore',
 'the G',
 'Piglet',
 'Twain',
 'Winnie - the',
 'Burton',
 'Brown',
 'Hermione',
 'Corduroy',
 'Alice',
 'Alexander',
 'Harold',
 'Tikki Tikki Tembo',
 'Sc',
 'Bastian',
 'Miss Nelson',
 'Lorax',
 'Muth',
 'Laura',
 'Willems',
 'Milne',
 'Viola Swamp',
 'Joe Harper',
 'A. Milne',
 'Jonas',
 'Despereaux',
 'Santa',
 'Charlie',
 'Emily Elizabeth',
 'Ramon',
 'Harry Potter',
 'Silverstein',
 'Dr',
 'Meg Murry',
 'DiCamillo',
 'Bemelmans',
 'Amelia Bedelia',
 'Basil E. Frankweiler',
 'Cleary',
 'A. Rey',
 'Big Nutbrown Hare',
 'B',
 'Keats',
 'H',
 'Clifford

In [22]:
len(named_entities_without_authors)

145
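
The exact-match filter above only removes entities that equal an author's full name, so surname-only mentions such as 'Sendak' or 'Silverstein' remain in the list. The sketch below shows one possible way to catch those as well by comparing name tokens. `name_tokens` and `drop_author_mentions` are hypothetical helpers, and the token rule is an assumption: an entity is dropped when every one of its alphabetic tokens also appears in some author's name.

```python
import re


def name_tokens(name):
    """Lowercased alphabetic tokens of a name, e.g. 'Dr. Seuss' -> ['dr', 'seuss']."""
    return re.findall(r"[a-z]+", name.lower())


def drop_author_mentions(entities, authors):
    """Keep entities that contain at least one token not found in any author name."""
    author_tokens = {tok for author in authors for tok in name_tokens(author)}
    kept = []
    for entity in entities:
        tokens = name_tokens(entity)
        if tokens and all(tok in author_tokens for tok in tokens):
            continue  # every token matches part of an author name
        kept.append(entity)
    return kept


authors_sample = ["Maurice Sendak", "Shel Silverstein", "Dr. Seuss", "E.B. White"]
entities_sample = ["Max", "Sendak", "Silverstein", "White Witch", "White", "Dr", "Wilbur"]

print(drop_author_mentions(entities_sample, authors_sample))
# ['Max', 'White Witch', 'Wilbur']
```

This rule trades precision for recall: a character who happens to share a first name or surname with any author would also be removed, so the result is worth reviewing by hand.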

## 3. Zero-Shot Classification

1. Apply zero-shot classification to the Description column using these five categories:
* adventure & fantasy
* animals & nature
* mystery
* humor
* non-fiction
2. Find the number of books in each category and check a few to see if the results make sense

In [23]:
df_childrens.head(2)

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description,Named_Entities
0,1,Where the Wild Things Are,Maurice Sendak,1963,4.25,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.","[Max, Max, Sendak]"
1,2,The Very Hungry Caterpillar,Eric Carle,1969,4.34,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.","[##pi, Eric Carle]"


In [24]:
# create a zero-shot classification pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli",
                      device='mps')

In [25]:
# try on one description (note: 'Animals & Nature' from the assignment list is not among the candidate labels here)
classifier(df_childrens.Description[0], ['Adventure & Fantasy', 'Mystery', 'Humor', 'Non-Fiction'])

{'sequence': "Where the Wild Things Are\xa0follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.",
 'labels': ['Non-Fiction', 'Adventure & Fantasy', 'Humor', 'Mystery'],
 'scores': [0.37945783138275146,
  0.32252395153045654,
  0.17798499763011932,
  0.12003319710493088]}

In [26]:
# extract just the top label
classifier(df_childrens.Description[0], ['Adventure & Fantasy', 'Mystery', 'Humor', 'Non-Fiction'])['labels'][0]

'Non-Fiction'
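
Taking the top label hides how close the scores are: for this description the winning score is only about 0.38, with 'Adventure & Fantasy' close behind at about 0.32. The sketch below shows one way to flag such cases. `confident_label`, the `min_score=0.5` threshold, and the 'Uncertain' fallback are assumptions for illustration; the function relies on the labels arriving sorted by descending score, as they do in the `In [25]` output.

```python
def confident_label(result, min_score=0.5, fallback="Uncertain"):
    """Top zero-shot label, or `fallback` when its score is below `min_score`."""
    top_label, top_score = result["labels"][0], result["scores"][0]
    return top_label if top_score >= min_score else fallback


# labels and scores copied from the In [25] output
result = {
    "labels": ["Non-Fiction", "Adventure & Fantasy", "Humor", "Mystery"],
    "scores": [0.37945783138275146, 0.32252395153045654, 0.17798499763011932, 0.12003319710493088],
}

print(confident_label(result))                 # Uncertain
print(confident_label(result, min_score=0.3))  # Non-Fiction
```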

In [27]:
# Apply to all descriptions and extract top label
df_childrens['Predicted_Genre'] = df_childrens['Description'].apply(lambda x: classifier(x, ['Adventure & Fantasy', 'Mystery', 'Humor', 'Non-Fiction'])['labels'][0])

In [28]:
df_childrens[['Description', 'Predicted_Genre']].head()

Unnamed: 0,Description,Predicted_Genre
0,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.",Non-Fiction
1,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.",Non-Fiction
2,"The Giving Tree is a touching and bittersweet story about a tree that gives everything it has to a boy over the course of his life. As the boy grows up, he takes more from the tree, and the tree continues to give, even when it has little left. Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice.",Non-Fiction
3,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.",Humor
4,"Goodnight Moon is a gentle, rhythmic bedtime story where a little bunny says goodnight to everything in his room, from the moon to the ""quiet old lady whispering hush."" Its repetitive structure and comforting tone make it ideal for young children. The simple illustrations by Clement Hurd complement the soothing nature of the story, making it a beloved classic for sleep-time reading.",Non-Fiction


In [29]:
# Find the number of books in each predicted genre
df_childrens['Predicted_Genre'].value_counts()

Predicted_Genre
Non-Fiction            49
Humor                  33
Adventure & Fantasy    11
Mystery                 7
Name: count, dtype: int64

In [30]:
# Check a few Humor predictions
df_childrens[df_childrens['Predicted_Genre'] == 'Humor'].head(3)

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description,Named_Entities,Predicted_Genre
3,4,Green Eggs and Ham,Dr. Seuss,1960,4.31,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.","[Sam - I - Am, Dr. Seuss]",Humor
5,6,Charlotte’s Web,E.B. White,1952,4.2,"Charlotte’s Web tells the story of a pig named Wilbur and his friendship with Charlotte, a clever spider who saves his life. Set on a farm, the novel explores themes of friendship, loyalty, and the cycle of life. Through Charlotte’s wise words and actions, Wilbur learns about love and sacrifice. White’s writing is filled with warmth and humor, and it’s a perfect read for both children and adults, dealing with timeless themes.","[Wilbur, Charlotte, Charlotte, Wilbur, White]",Humor
8,9,If You Give a Mouse a Cookie,Laura Joffe Numeroff,1985,4.29,"In If You Give a Mouse a Cookie, a little mouse asks for a cookie, and from there, a series of increasingly funny and unlikely events follow. Each request leads to another, showing the mouse's insatiable appetite for things he doesn’t necessarily need. This circular story uses humor to teach children about cause and effect while entertaining them with its playful illustrations and charming narrative.",[],Humor


## 4. Text Summarization

1. Apply text summarization to the Description column
2. Review the results to see if they make sense

In [32]:
# create a summarization pipeline
summarizer = pipeline("summarization",
                      model="facebook/bart-large-cnn",
                      device='mps')

In [38]:
# Summarize the first description
summarizer(df_childrens.Description[0], min_length=10, max_length=50, early_stopping=True, length_penalty=.8)[0]['summary_text']

'Where the Wild Things Are follows Max, a young boy who imagines sailing to an island filled with wild creatures. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood.'

In [39]:
# Summarize all descriptions and create a new column
df_childrens['Summary'] = df_childrens['Description'].apply(lambda x: summarizer(x, min_length=10, max_length=50, early_stopping=True, length_penalty=.8)[0]['summary_text'])
df_childrens[['Description', 'Summary']].head()

Unnamed: 0,Description,Summary
0,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.","Where the Wild Things Are follows Max, a young boy who imagines sailing to an island filled with wild creatures. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood."
1,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.","Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of met"
2,"The Giving Tree is a touching and bittersweet story about a tree that gives everything it has to a boy over the course of his life. As the boy grows up, he takes more from the tree, and the tree continues to give, even when it has little left. Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice.","Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice."
3,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.",Dr. Seuss’s classic story encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.
4,"Goodnight Moon is a gentle, rhythmic bedtime story where a little bunny says goodnight to everything in his room, from the moon to the ""quiet old lady whispering hush."" Its repetitive structure and comforting tone make it ideal for young children. The simple illustrations by Clement Hurd complement the soothing nature of the story, making it a beloved classic for sleep-time reading.","Goodnight Moon is a gentle, rhythmic bedtime story. The simple illustrations by Clement Hurd complement the soothing nature of the story."


## 5. Document Similarity

1. Turn the Description column into embeddings using feature extraction
2. Compute the cosine similarity between Harry Potter and the Sorcerer’s Stone and all other books
3. Return the top 5 most similar books

In [40]:
# Preview the data before turning the Description column into embeddings
# show the full text in each column
pd.set_option('display.max_colwidth', None)

# preview the children's books data
df_childrens.head(2)

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description,Named_Entities,Predicted_Genre,Summary
0,1,Where the Wild Things Are,Maurice Sendak,1963,4.25,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.","[Max, Max, Sendak]",Non-Fiction,"Where the Wild Things Are follows Max, a young boy who imagines sailing to an island filled with wild creatures. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood."
1,2,The Very Hungry Caterpillar,Eric Carle,1969,4.34,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.","[##pi, Eric Carle]",Non-Fiction,"Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of met"


In [41]:
# extract an embedding for each book description
feature_extractor = pipeline("feature-extraction",
                             model="sentence-transformers/all-MiniLM-L6-v2",
                             device='mps')

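# the pipeline returns [batch][token][hidden]; [0][0] keeps the first token's 384-dimensional vector
embeddings = df_childrens['Description'].apply(lambda x: feature_extractor(x)[0][0])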
embeddings.head(2)

0    [0.17135246098041534, 0.2192009538412094, 0.023439912125468254, 0.14603574573993683, 0.09078329056501389, -0.07354305684566498, 0.025495829060673714, -0.08237512409687042, -0.4054499566555023, -0.0031997691839933395, -0.10306098312139511, -0.00379374623298645, -0.2821389436721802, 0.24613454937934875, 0.2104647010564804, -0.09948407113552094, 0.30823788046836853, -0.0019043795764446259, -0.05988626182079315, -0.08923130482435226, -0.08314350247383118, 0.2819327116012573, 0.18756720423698425, -0.010089416056871414, -0.23478339612483978, -0.014897886663675308, -0.36665794253349304, -0.11721494048833847, 0.18829520046710968, -0.9992688894271851, -0.1453481912612915, -0.23778413236141205, 0.059362806379795074, -0.15320424735546112, -0.026063669472932816, 0.09265749901533127, -0.2304612398147583, -0.23014582693576813, 0.14133743941783905, -0.035516154021024704, -0.06441076844930649, 0.07449246942996979, -0.08746039867401123, 0.102985680103302, -0.2006939798593521, -0.08079877495765686,
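
The cell above keeps only the first token's vector for each description. Sentence-transformers models are commonly used with mean pooling instead, which averages the vectors of all tokens. The following is a minimal sketch of that alternative, not the notebook's method; `mean_pool` and `toy_tokens` are hypothetical names, and the toy input stands in for `feature_extractor(text)[0]`.

```python
import numpy as np

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-length vector."""
    arr = np.asarray(token_vectors, dtype=float)  # shape: (n_tokens, hidden_size)
    return arr.mean(axis=0)

# toy stand-in for feature_extractor(text)[0]: 3 tokens, 4 dimensions
toy_tokens = [[1.0, 0.0, 2.0, 4.0],
              [3.0, 0.0, 2.0, 0.0],
              [2.0, 3.0, 2.0, 2.0]]

pooled = mean_pool(toy_tokens)
print(pooled)  # [2. 1. 2. 2.]

# Assumed usage in this notebook (untested here):
# embeddings = df_childrens['Description'].apply(lambda x: mean_pool(feature_extractor(x)[0]))
```

Because each description is passed on its own, there are no padding tokens, so a plain average over tokens should be enough for this sketch.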

In [47]:
# view one book - Harry Potter
df_childrens[df_childrens['Title'].str.contains('Harry Potter', case=False)]

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description,Named_Entities,Predicted_Genre,Summary
13,14,"Harry Potter and the Sorcerer's Stone (Harry Potter, #1)",J.K. Rowling,1997,4.47,"Harry Potter and the Sorcerer’s Stone introduces readers to Harry Potter, an orphan who discovers that he is a wizard and attends the magical Hogwarts School of Witchcraft and Wizardry. Along with his new friends, Harry uncovers mysteries surrounding his past and the dark wizard who killed his parents. This book starts the beloved series and sets the stage for Harry’s journey, filled with magic, adventure, and friendship.","[Harry Potter, Harry, Harry]",Mystery,"Harry Potter and the Sorcerer’s Stone introduces readers to Harry Potter, an orphan who discovers that he is a wizard. Along with his new friends, Harry uncovers mysteries surrounding his past and the dark wizard who killed"
97,98,"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)",J.K. Rowling,1999,4.58,"Harry Potter and the Prisoner of Azkaban is the third book in the Harry Potter series, where Harry returns to Hogwarts for his third year and uncovers secrets about his past. With the arrival of the mysterious Sirius Black, Harry must navigate dark truths and face his fears. This thrilling installment explores themes of loyalty, friendship, and identity, marking a turning point in the magical world of Harry Potter.","[Harry, Sirius Black, Harry]",Mystery,"Harry Potter and the Prisoner of Azkaban is the third book in the Harry Potter series. This thrilling installment explores themes of loyalty, friendship, and identity."
98,99,"Harry Potter and the Chamber of Secrets (Harry Potter, #2)",J.K. Rowling,1998,4.43,"Harry Potter and the Chamber of Secrets is the second book in the Harry Potter series, where Harry returns to Hogwarts for his second year and uncovers a hidden chamber within the school. As mysterious events unfold, Harry and his friends Ron and Hermione uncover dark secrets about the school’s past. Themes of courage, friendship, and standing up for what’s right are explored in this gripping magical adventure.","[Harry, Harry, Ron, Hermione]",Mystery,"Harry Potter and the Chamber of Secrets is the second book in the Harry Potter series. Themes of courage, friendship, and standing up for what’s right are explored in this gripping magical adventure."


In [48]:
embeddings[13]

[-0.1166880875825882,
 0.22490964829921722,
 0.06056506186723709,
 0.0976678654551506,
 -0.35292649269104004,
 -0.06798211485147476,
 0.05228119343519211,
 -0.014416945166885853,
 -0.03666859120130539,
 -0.5411457419395447,
 -0.0257106926292181,
 -0.013886462897062302,
 -0.4430098235607147,
 -0.03487422689795494,
 -0.3365682363510132,
 -0.3443391025066376,
 0.10596868395805359,
 0.2932366728782654,
 0.053656451404094696,
 0.07165796309709549,
 0.07116420567035675,
 0.11632060259580612,
 0.14712665975093842,
 -0.18948404490947723,
 0.3270193636417389,
 -0.08084936439990997,
 0.19062568247318268,
 -0.05463521182537079,
 0.07546772062778473,
 -0.9492775797843933,
 -0.07882417738437653,
 -0.13350403308868408,
 -0.24860209226608276,
 -0.051666393876075745,
 -0.3311570882797241,
 0.22189638018608093,
 -0.2208981215953827,
 0.5838286876678467,
 0.526788592338562,
 -0.029043162241578102,
 -0.10845305770635605,
 0.055085599422454834,
 -0.4616174101829529,
 0.46640753746032715,
 -0.0006685070693

In [52]:
# save the embedding for that book
import numpy as np

# 2D Array (Row Vector): [[0.1, 0.5, ...]]
# This is explicitly 1 row containing many columns
embedding_hp = np.array(embeddings[13]).reshape(1, -1)
embedding_hp.shape

(1, 384)

In [53]:
# save the embeddings for all books
embeddings_books = np.vstack(embeddings)
embeddings_books.shape

(100, 384)

In [54]:
# calculate the cosine similarity scores
from sklearn.metrics.pairwise import cosine_similarity

similarity_scores_hp = cosine_similarity(embedding_hp, embeddings_books)
similarity_scores_hp_series = pd.Series(similarity_scores_hp.flatten(), name='similarity_score')
similarity_scores_hp_series.head()

0    0.720350
1    0.697348
2    0.690898
3    0.645620
4    0.705942
Name: similarity_score, dtype: float64
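
As a sanity check on what these scores mean, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal NumPy sketch on toy vectors follows; `cosine_sim`, `u`, and `v` are illustrative names, not part of the notebook.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u = [3.0, 4.0]
v = [4.0, 3.0]

print(cosine_sim(u, u))                    # 1.0  (a vector is identical to itself)
print(cosine_sim(u, v))                    # 0.96 (24 / (5 * 5))
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0  (orthogonal vectors)
```

This is why a book compared with itself always scores 1.0.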

In [55]:
# combine book titles, descriptions and scores
similarity_scores_hp_df = pd.concat([df_childrens[['Title', 'Description']], similarity_scores_hp_series], axis=1)
similarity_scores_hp_df.head()

Unnamed: 0,Title,Description,similarity_score
0,Where the Wild Things Are,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story.",0.72035
1,The Very Hungry Caterpillar,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education.",0.697348
2,The Giving Tree,"The Giving Tree is a touching and bittersweet story about a tree that gives everything it has to a boy over the course of his life. As the boy grows up, he takes more from the tree, and the tree continues to give, even when it has little left. Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice.",0.690898
3,Green Eggs and Ham,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers.",0.64562
4,Goodnight Moon,"Goodnight Moon is a gentle, rhythmic bedtime story where a little bunny says goodnight to everything in his room, from the moon to the ""quiet old lady whispering hush."" Its repetitive structure and comforting tone make it ideal for young children. The simple illustrations by Clement Hurd complement the soothing nature of the story, making it a beloved classic for sleep-time reading.",0.705942


In [56]:
# view the top 5 most similar books
similarity_scores_hp_df.sort_values('similarity_score', ascending=False).head()

Unnamed: 0,Title,Description,similarity_score
13,"Harry Potter and the Sorcerer's Stone (Harry Potter, #1)","Harry Potter and the Sorcerer’s Stone introduces readers to Harry Potter, an orphan who discovers that he is a wizard and attends the magical Hogwarts School of Witchcraft and Wizardry. Along with his new friends, Harry uncovers mysteries surrounding his past and the dark wizard who killed his parents. This book starts the beloved series and sets the stage for Harry’s journey, filled with magic, adventure, and friendship.",1.0
97,"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)","Harry Potter and the Prisoner of Azkaban is the third book in the Harry Potter series, where Harry returns to Hogwarts for his third year and uncovers secrets about his past. With the arrival of the mysterious Sirius Black, Harry must navigate dark truths and face his fears. This thrilling installment explores themes of loyalty, friendship, and identity, marking a turning point in the magical world of Harry Potter.",0.872638
98,"Harry Potter and the Chamber of Secrets (Harry Potter, #2)","Harry Potter and the Chamber of Secrets is the second book in the Harry Potter series, where Harry returns to Hogwarts for his second year and uncovers a hidden chamber within the school. As mysterious events unfold, Harry and his friends Ron and Hermione uncover dark secrets about the school’s past. Themes of courage, friendship, and standing up for what’s right are explored in this gripping magical adventure.",0.855368
63,The Witches,"The Witches tells the story of a young boy and his grandmother who uncover a secret society of witches who despise children and plot to turn them all into mice. With the help of his grandmother, the boy must outwit the witches and save the children. The book is known for its dark humor, thrilling suspense, and memorable characters. Though it can be a bit scary, it is beloved for its unique blend of fear, adventure, and courage.",0.799051
55,"The Wonderful Wizard of Oz (Oz, #1)","The Wonderful Wizard of Oz is the first book in Baum's Oz series and tells the story of Dorothy, a young girl from Kansas who is swept away to the magical land of Oz. Along with her new friends—the Scarecrow, Tin Man, and Cowardly Lion—Dorothy embarks on a journey to meet the Wizard and find her way home. The book is filled with themes of friendship, courage, and the belief in oneself, and has become an iconic tale in American literature.",0.788523


In [None]:
# This can also be done with a function for any book
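
One possible way to wrap the steps above in a reusable function is sketched below. The name `most_similar_books` and its parameters are hypothetical. The sketch assumes the DataFrame has a `Title` column and a default integer index, and that row `i` of the embedding matrix belongs to row `i` of the DataFrame. It computes cosine similarity with NumPy rather than scikit-learn, and, unlike the cell above, it drops the query book from the results.

```python
import numpy as np
import pandas as pd

def most_similar_books(title_query, df, embedding_matrix, top_n=5):
    """Return the top_n books most similar to the first title containing title_query."""
    mask = df['Title'].str.contains(title_query, case=False, regex=False, na=False).to_numpy()
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        raise ValueError(f"No title contains {title_query!r}")
    pos = positions[0]  # row position of the query book

    # normalize each row to unit length so a dot product equals cosine similarity
    emb = np.asarray(embedding_matrix, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = unit @ unit[pos]

    result = df[['Title']].copy()
    result['similarity_score'] = scores
    result = result.drop(index=df.index[pos])  # exclude the query book itself
    return result.sort_values('similarity_score', ascending=False).head(top_n)

# toy data standing in for df_childrens and embeddings_books
toy_df = pd.DataFrame({'Title': ['Wizard School', 'Magic Academy', 'Farm Animals']})
toy_emb = np.array([[1.0, 0.0],
                    [0.8, 0.6],
                    [0.0, 1.0]])

print(most_similar_books('wizard', toy_df, toy_emb, top_n=2))
```

In the notebook, a call such as `most_similar_books('Sorcerer', df_childrens, embeddings_books)` would be the assumed equivalent of the cells above, minus the query book's own row.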