# Hugging Face Transformers Assignments

## 1. Sentiment Analysis

1. Create a new _nlp_transformers_ environment
2. Launch Jupyter Notebook
3. Read in the movie reviews data set including the VADER sentiment scores (_movie_reviews_sentiment.csv_)
4. Apply sentiment analysis to the _movie_info_ column using transformers
5. Compare the transformers sentiment scores with the VADER sentiment scores

In [1]:
import pandas as pd

pd.set_option('display.max_colwidth', None) # Default is 50, None shows all text

movies = pd.read_csv('../Data/movie_reviews_sentiment.csv')
movies.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment_vader
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [2]:
from transformers import pipeline, logging

logging.set_verbosity_error()

sentiment_analyzer = pipeline('sentiment-analysis',
                              model='distilbert/distilbert-base-uncased-finetuned-sst-2-english',
                              device=0  # first GPU; set to -1 to run on the CPU
                             )
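`device=0` assumes a CUDA GPU is present; on a CPU-only machine, creating the pipeline typically fails. A minimal sketch of choosing the device at run time, assuming PyTorch is the backend (the helper name `pick_device` is hypothetical, not part of transformers):

```python
def pick_device():
    """Return 0 (first CUDA GPU) when one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The result can then be passed as `device=pick_device()` when building any of the pipelines in this notebook.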

In [3]:
sentiment_scores = movies.movie_info.apply(sentiment_analyzer)

In [4]:
# Each pipeline result is a one-element list: [{'label': ..., 'score': ...}]
movies['label_hf'] = sentiment_scores.apply(lambda x: x[0]['label'])
movies['score_hf'] = sentiment_scores.apply(lambda x: x[0]['score'])
# Signed score: keep the confidence for POSITIVE, negate it for NEGATIVE
movies['sentiment_hf'] = movies['score_hf'].where(movies['label_hf'] == 'POSITIVE', -movies['score_hf'])

In [5]:
movies[['movie_title', 'movie_info', 'sentiment_vader', 'sentiment_hf']].head()

Unnamed: 0,movie_title,movie_info,sentiment_vader,sentiment_hf
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.9837,0.998247
1,A Dog's Way Home,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",0.9237,0.999534
2,A Tuba to Cuba,"The leader of New Orleans' famed Preservation Hall Jazz Band seeks to fulfill his late father's dream of retracing their musical roots to the shores of Cuba in search of the indigenous music that gave birth to New Orleans jazz. A TUBA TO CUBA celebrates the triumph of the human spirit expressed through the universal language of music and challenges us to resolve to build bridges, not walls.",0.936,0.999443
3,A Vigilante,"A once abused woman, Sadie (Olivia Wilde), devotes herself to ridding victims of their domestic abusers while hunting down the husband she must kill to truly be free. A Vigilante is a thriller inspired by the strength and bravery of real domestic abuse survivors and the incredible obstacles to safety they face.",-0.0334,0.99946
4,After,"Based on Anna Todd's best-selling novel which became a publishing sensation on social storytelling platform Wattpad, AFTER follows Tessa (Langford), a dedicated student, dutiful daughter and loyal girlfriend to her high school sweetheart, as she enters her first semester in college. Armed with grand ambitions for her future, her guarded world opens up when she meets the dark and mysterious Hardin Scott (Tiffin), a magnetic, brooding rebel who makes her question all she thought she knew about herself and what she wants out of life.",0.9349,0.997202


In [6]:
movies[['movie_title', 'movie_info', 'sentiment_vader', 'sentiment_hf']].sort_values('sentiment_hf').head()

Unnamed: 0,movie_title,movie_info,sentiment_vader,sentiment_hf
22,Braid,"Two wanted women decide to rob their wealthy yet mentally unstable friend who lives in a fantasy world they all created as children. To take her money, the girls must take part in a deadly and perverse game of make believe throughout a sprawling yet decaying estate. As things become increasingly violent and hallucinatory, they realize that obtaining the money may be the least of their concerns.",-0.8316,-0.999203
103,Spider-Man: Far From Home,"Peter Parker returns in Spider-Man: Far From Home, the next chapter of the Spider-Man: Homecoming series! Our friendly neighborhood Super Hero decides to join his best friends Ned, MJ, and the rest of the gang on a European vacation. However, Peter's plan to leave super heroics behind for a few weeks are quickly scrapped when he begrudgingly agrees to help Nick Fury uncover the mystery of several elemental creature attacks, creating havoc across the continent!",0.9722,-0.998805
34,Dragged Across Concrete,"DRAGGED ACROSS CONCRETE follows two police detectives who find themselves suspended when a video of their strong-arm tactics is leaked to the media. With little money and no options, the embittered policemen descend into the criminal underworld and find more than they wanted waiting in the shadows.",-0.9015,-0.998734
165,Yesterday,"Jack Malik (Himesh Patel, BBC's Eastenders) is a struggling singer-songwriter in a tiny English seaside town whose dreams of fame are rapidly fading, despite the fierce devotion and support of his childhood best friend, Ellie (Lily James, Mamma Mia! Here We Go Again). Then, after a freak bus accident during a mysterious global blackout, Jack wakes up to discover that The Beatles have never existed... and he finds himself with a very complicated problem, indeed.",0.1365,-0.998447
102,Skin,"A white supremacist reforms his life after falling in love but saying goodbye to his skinhead life isn't a clean process. He must betray his former gang and work alongside the FBI in order to remove the body ink that has represented his identity for so long, as well as the burden of the gang's crimes he has carried.",-0.8377,-0.996846
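The two tables above compare the scores by eye. One way to put a number on the comparison is sketched below. It assumes that "agreement" means both scores fall on the same side of zero. The helper names `sign_agreement` and `largest_gaps` are hypothetical, and the sample values are rounded from the outputs above.

```python
def sign_agreement(vader_scores, hf_scores):
    """Fraction of rows where both scores fall on the same side of zero."""
    pairs = list(zip(vader_scores, hf_scores))
    same_side = sum((v >= 0) == (h >= 0) for v, h in pairs)
    return same_side / len(pairs)


def largest_gaps(titles, vader_scores, hf_scores, n=5):
    """Rows with the biggest absolute difference between the two scores."""
    rows = [(title, v, h, abs(v - h))
            for title, v, h in zip(titles, vader_scores, hf_scores)]
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]


# Values rounded from the outputs above, only to exercise the helpers
titles = ["A Dog's Journey", 'A Vigilante', 'Spider-Man: Far From Home', 'Braid']
vader = [0.9837, -0.0334, 0.9722, -0.8316]
hf = [0.9982, 0.9995, -0.9988, -0.9992]

print(sign_agreement(vader, hf))
print(largest_gaps(titles, vader, hf, n=2))
```

On the full data set the same helpers accept the DataFrame columns directly, for example `sign_agreement(movies.sentiment_vader, movies.sentiment_hf)`. Note that the SST-2 model is a binary classifier, so its scores sit close to -1 or 1, whereas VADER produces graded values. Large gaps are therefore expected for descriptions VADER rates as near neutral.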


## 2. Named Entity Recognition

1. Read in the children's books data set (_childrens_books.csv_)
2. Apply NER to the Description column
3. Create a list of all named entities
4. Only include the people (PER)
5. _Extra credit:_ Exclude the authors as well

In [7]:
# 2.1 Read in the children's books data set
books = pd.read_csv('../Data/childrens_books.csv')
books.head()

Unnamed: 0,Ranking,Title,Author,Year,Rating,Description
0,1,Where the Wild Things Are,Maurice Sendak,1963,4.25,"Where the Wild Things Are follows Max, a young boy who, after being sent to his room for misbehaving, imagines sailing to an island filled with wild creatures. As their king, Max tames the beasts and eventually returns home to find his supper waiting for him. This iconic book explores themes of imagination, adventure, and the complex emotions of childhood, all captured through Sendak's whimsical illustrations and story."
1,2,The Very Hungry Caterpillar,Eric Carle,1969,4.34,"The Very Hungry Caterpillar tells the story of a caterpillar who eats through a variety of foods before eventually becoming a butterfly. Eric Carle’s use of colorful collage illustrations and rhythmic text has made this book a beloved classic for young readers. The simple, engaging story introduces children to days of the week, counting, and the concept of metamorphosis. It’s a staple in early childhood education."
2,3,The Giving Tree,Shel Silverstein,1964,4.38,"The Giving Tree is a touching and bittersweet story about a tree that gives everything it has to a boy over the course of his life. As the boy grows up, he takes more from the tree, and the tree continues to give, even when it has little left. Silverstein’s minimalist text and illustrations convey deep themes of unconditional love, selflessness, and the passage of time. It has sparked much discussion about relationships and sacrifice."
3,4,Green Eggs and Ham,Dr. Seuss,1960,4.31,"In Green Eggs and Ham, Sam-I-Am tries to convince a reluctant character to try a dish of green eggs and ham, despite his resistance. Through repetition and rhyme, Dr. Seuss’s classic story about being open to new experiences encourages children to be adventurous and try things outside their comfort zone. The playful illustrations and humorous dialogue make it a fun and educational read for young readers."
4,5,Goodnight Moon,Margaret Wise Brown,1947,4.31,"Goodnight Moon is a gentle, rhythmic bedtime story where a little bunny says goodnight to everything in his room, from the moon to the ""quiet old lady whispering hush."" Its repetitive structure and comforting tone make it ideal for young children. The simple illustrations by Clement Hurd complement the soothing nature of the story, making it a beloved classic for sleep-time reading."


In [8]:
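# 2.2 NER pipeline; the aggregation strategy merges word pieces into entity groups
ner_analyzer = pipeline('ner',
                        model='dbmdz/bert-large-cased-finetuned-conll03-english',
                        device=0,  # first GPU; set to -1 to run on the CPU
                        aggregation_strategy='simple'
                       )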

In [9]:
ner_analyzer(books.Description[0])

[{'entity_group': 'MISC',
  'score': 0.94625187,
  'word': 'Where the Wild Things Are',
  'start': 0,
  'end': 25},
 {'entity_group': 'PER',
  'score': 0.9990614,
  'word': 'Max',
  'start': 34,
  'end': 37},
 {'entity_group': 'PER',
  'score': 0.9984414,
  'word': 'Max',
  'start': 175,
  'end': 178},
 {'entity_group': 'PER',
  'score': 0.9789461,
  'word': 'Sendak',
  'start': 380,
  'end': 386}]

In [10]:
[entity['word'] for entity in ner_analyzer(books.Description[0]) if entity['entity_group'] == 'PER']

['Max', 'Max', 'Sendak']

In [11]:
# 2.3 / 2.4 Run NER on every description and keep only the people (PER)
named_entities = books.Description.apply(
    lambda text: [entity['word'] for entity in ner_analyzer(text) if entity['entity_group'] == 'PER']
)
named_entities

0                  [Max, Max, Sendak]
1                  [##pi, Eric Carle]
2                       [Silverstein]
3           [Sam - I - Am, Dr. Seuss]
4                      [Clement Hurd]
                   ...               
95                [Jon J. Muth, Muth]
96    [Shel Silverstein, Silverstein]
97       [Harry, Sirius Black, Harry]
98      [Harry, Harry, Ron, Hermione]
99                          [Galdone]
Name: Description, Length: 100, dtype: object

In [12]:
named_entities = list(set(named_entities.explode().dropna().tolist()))
named_entities

['Seuss',
 'Henkes',
 'Sam - I - Am',
 'Jamie',
 'Eeyore',
 'Amelia Bedelia',
 '##y',
 'Bemelmans',
 'Atreyu',
 'Milo',
 'Despereaux Tilling',
 '##crow',
 'Huck Finn',
 'McGregor',
 'Jonas',
 'Tom',
 'Cord',
 'Harry Potter',
 'A',
 'Mary',
 'C',
 'Pooh',
 'Peter',
 'Thing',
 'Piglet',
 'Babar',
 'Ferdinand',
 'Wizard',
 'Hu',
 'Basil E. Frankweiler',
 'Keats',
 'Little Bear',
 'Twain',
 'Ramona',
 '##ch',
 'Sal',
 'Silverstein',
 'Beatrix Potter',
 'Dr',
 'Falconer',
 'Harold',
 'Laura Ingalls',
 'Laura Ingalls Wilder',
 'Miss Honey',
 'Grover',
 'Winnie - the',
 'Sachar',
 'Jon J. Muth',
 'Sendak',
 'Meg Murry',
 'De Brunhoff',
 '##ula',
 'Crockett Johnson',
 'Hermione',
 'Cat in the Hat',
 'H',
 'Tock',
 'Leslie Burke',
 'Muth',
 'Big Bad Wolf',
 'Burton',
 'Baum',
 'Ramon',
 'Charlie',
 'A. Rey',
 'Frog',
 'Anne',
 'Gandalf',
 'Cleary',
 'Roald Dahl',
 'Lorax',
 'S',
 '##G',
 'Laura',
 'Charles Wallace',
 'Shel Silverstein',
 'Winnie - the - Pooh',
 'Ron',
 'Arnold Lobel',
 'Viola S

In [13]:
named_entities = list(named_entities)  # '##' word-piece fragments are still included here; they are removed together with the authors below
named_entities[:10]

['Seuss',
 'Henkes',
 'Sam - I - Am',
 'Jamie',
 'Eeyore',
 'Amelia Bedelia',
 '##y',
 'Bemelmans',
 'Atreyu',
 'Milo']

In [14]:
len(named_entities)

165

In [15]:
# 2.5 Extra credit: collect the unique author names
authors = list(set(books.Author.tolist()))
authors[:10]

['Eric Carle',
 'Jon J. Muth',
 'Jon Stone',
 'E.B. White',
 'Marcus Pfister',
 'Michael Ende',
 'E.L. Konigsburg',
 'Judi Barrett',
 'Lewis Carroll',
 'Katherine Paterson']

In [16]:
named_entities_clean = [entity for entity in named_entities if entity not in authors and '#' not in entity]
named_entities_clean

['Seuss',
 'Henkes',
 'Sam - I - Am',
 'Jamie',
 'Eeyore',
 'Amelia Bedelia',
 'Bemelmans',
 'Atreyu',
 'Milo',
 'Despereaux Tilling',
 'Huck Finn',
 'McGregor',
 'Jonas',
 'Tom',
 'Cord',
 'Harry Potter',
 'A',
 'Mary',
 'C',
 'Pooh',
 'Peter',
 'Thing',
 'Piglet',
 'Babar',
 'Ferdinand',
 'Wizard',
 'Hu',
 'Basil E. Frankweiler',
 'Keats',
 'Little Bear',
 'Twain',
 'Ramona',
 'Sal',
 'Silverstein',
 'Dr',
 'Falconer',
 'Harold',
 'Laura Ingalls',
 'Miss Honey',
 'Grover',
 'Winnie - the',
 'Sachar',
 'Sendak',
 'Meg Murry',
 'De Brunhoff',
 'Hermione',
 'Cat in the Hat',
 'H',
 'Tock',
 'Leslie Burke',
 'Muth',
 'Big Bad Wolf',
 'Burton',
 'Baum',
 'Ramon',
 'Charlie',
 'A. Rey',
 'Frog',
 'Anne',
 'Gandalf',
 'Cleary',
 'Lorax',
 'S',
 'Laura',
 'Charles Wallace',
 'Winnie - the - Pooh',
 'Ron',
 'Viola Swamp',
 'George',
 'Matthew Cuthbert',
 'Stellaluna',
 'T',
 'Horton',
 'Bastian',
 'Little Nutbrown Hare',
 'Big Nutbrown Hare',
 'Dorothy',
 'Bilbo Baggins',
 'Ralph',
 'A. Milne

In [17]:
len(named_entities_clean)

145
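The filter above only removes entities that match an author's full name exactly, so bare surnames such as 'Sendak', 'Seuss' and 'Silverstein' remain in the cleaned list. A stricter variant is sketched below, assuming the surname is the last whitespace-separated token of each author name. The helper name `drop_authors` is hypothetical.

```python
def drop_authors(entities, authors):
    """Remove word-piece fragments, full author names and bare author surnames."""
    full_names = set(authors)
    # Assumption: the surname is the last token, e.g. 'Maurice Sendak' -> 'Sendak'
    surnames = {name.split()[-1] for name in authors}
    kept = []
    for entity in entities:
        if '#' in entity:
            continue  # leftover word-piece fragment such as '##y'
        if entity in full_names or entity in surnames:
            continue  # the author, written in full or by surname only
        kept.append(entity)
    return kept


sample_entities = ['Max', 'Sendak', 'Harry Potter', 'Beatrix Potter', '##y', 'Seuss']
sample_authors = ['Maurice Sendak', 'Beatrix Potter', 'Dr. Seuss']
print(drop_authors(sample_entities, sample_authors))
```

Only whole-entity matches are dropped on purpose: matching on the last word alone would also remove 'Harry Potter' because of Beatrix Potter. The trade-off is that a character who shares a surname with an author, when mentioned by surname only, is removed as well.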

## 3. Zero-Shot Classification

1. Apply zero-shot classification to the Description column using these five categories:
* adventure & fantasy
* animals & nature
* mystery
* humor
* non-fiction
2. Find the number of books in each category and check a few to see if the results make sense
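A possible starting point for this section is sketched below. The checkpoint `facebook/bart-large-mnli` is an assumption (any zero-shot capable model should work), and the helper names are hypothetical. The pipeline is passed in as an argument, and transformers is imported inside `build_classifier`, so the helpers can be tried without loading a model.

```python
from collections import Counter

CATEGORIES = ['adventure & fantasy', 'animals & nature', 'mystery', 'humor', 'non-fiction']


def build_classifier(device=-1):
    """Create the zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline
    # Assumption: facebook/bart-large-mnli as the zero-shot checkpoint
    return pipeline('zero-shot-classification', model='facebook/bart-large-mnli', device=device)


def top_category(text, classifier, labels=CATEGORIES):
    """Return the highest-scoring category for one description."""
    result = classifier(text, candidate_labels=labels)
    # The pipeline returns 'labels' sorted by descending score
    return result['labels'][0]


def category_counts(texts, classifier, labels=CATEGORIES):
    """Count how many descriptions land in each category."""
    return Counter(top_category(text, classifier, labels) for text in texts)
```

In the notebook this could be used as `classifier = build_classifier(device=0)`, then `books['category'] = books.Description.apply(top_category, classifier=classifier)` and `books['category'].value_counts()` for step 2.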

## 4. Text Summarization

1. Apply text summarization to the Description column
2. Review the results to see if they make sense
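A minimal sketch for this section follows, assuming the installed transformers version provides the `summarization` pipeline task. The checkpoint `sshleifer/distilbart-cnn-12-6` is likewise an assumption. The helper names and the length limits are hypothetical choices, not values from the assignment.

```python
def build_summarizer(device=-1):
    """Create the summarization pipeline (downloads the model on first use)."""
    from transformers import pipeline
    # Assumption: sshleifer/distilbart-cnn-12-6 as the summarization checkpoint
    return pipeline('summarization', model='sshleifer/distilbart-cnn-12-6', device=device)


def summarize(text, summarizer, max_length=40, min_length=10):
    """Summarize one description; the length limits are counted in tokens."""
    result = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)
    # For a single string the pipeline returns a one-element list of dicts
    return result[0]['summary_text'].strip()


def compression_ratio(original, summary):
    """Summary length relative to the original, in words (lower means shorter)."""
    return len(summary.split()) / max(len(original.split()), 1)
```

In the notebook: `summarizer = build_summarizer(device=0)` followed by `books['summary'] = books.Description.apply(summarize, summarizer=summarizer)`. For step 2, reading a few summaries next to their descriptions and checking `compression_ratio` gives a quick sense of whether the model is condensing the text or mostly copying it.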

## 5. Document Similarity

1. Turn the Description column into embeddings using feature extraction
2. Compute the cosine similarity between _Harry Potter and the Sorcerer’s Stone_ and all other books
3. Return the top 5 most similar books
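The three steps can be sketched as follows. The checkpoint `sentence-transformers/all-MiniLM-L6-v2` and mean pooling over token vectors are assumptions, and the helper names are hypothetical. The similarity is written in plain Python to keep each step visible; a NumPy or scikit-learn version would be shorter.

```python
import math


def build_extractor(device=-1):
    """Create the feature-extraction pipeline (downloads the model on first use)."""
    from transformers import pipeline
    # Assumption: sentence-transformers/all-MiniLM-L6-v2 as the embedding model
    return pipeline('feature-extraction',
                    model='sentence-transformers/all-MiniLM-L6-v2', device=device)


def mean_pool(token_vectors):
    """Average the per-token vectors into one fixed-size document vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def embed(texts, extractor):
    """One embedding per text; the pipeline output has shape [1][n_tokens][hidden]."""
    return [mean_pool(extractor(text)[0]) for text in texts]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_similar(titles, embeddings, query_title, k=5):
    """Return the k (title, similarity) pairs closest to the query title."""
    query = embeddings[titles.index(query_title)]
    scored = [(title, cosine_similarity(query, vector))
              for title, vector in zip(titles, embeddings) if title != query_title]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In the notebook: `embeddings = embed(books.Description, build_extractor(device=0))`, then `top_similar(books.Title.tolist(), embeddings, query_title)`, where `query_title` must match the entry in the Title column exactly, including the style of apostrophe. The mean pooling here also averages in the special tokens, which is a simplification.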