<a href="https://colab.research.google.com/github/alankent/OrdinaryCartoonMaker/blob/main/Simple_Text_Classification_by_Character_and_Common_Actions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A simple text classification demonstration

Challenge: Given a description of a camera shot in a screenplay, can we work out the characters in the shot and the actions they are taking?

Reference: https://practicaldatascience.co.uk/machine-learning/how-to-classify-customer-service-emails-with-bart-mnli

Install the required Python libraries.

In [38]:
!pip install transformers pandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Choose the candidate labels to score against each segment of text. Here I include the names of the characters, plus some common actions and emotions for which I have animation clips.

In [50]:
labels = ['Abigail', 'Amy', 'Bear', 'Cathy', 'Deb', 'DebInBlack', 'Doug', 'Elanor', 'Hank', 'Helen', 'Ivy', 'Liana', 'Mandy', 'MrsB', 'MrsShort', 'Nathan', 'Sam', 'Terry', 'Tom',
          'sit', 'stand', 'walk', 'jog', 'run', 'wave', 'happy', 'angry', 'eye roll', 'embarassed', 'shy', 'worried', 'shocked', 'bored']


Load up the Hugging Face pipeline

In [40]:
from transformers import pipeline


Create a classifier. Zero-shot classification can score text against new categories without any task-specific training. It uses the text of the labels themselves to come up with classifications on the fly.

In [41]:
classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
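
To make "uses the text of the labels" a little more concrete, here is a rough sketch of what I understand the pipeline to do when `multi_label=True`: each label is dropped into a hypothesis sentence, the NLI model judges that hypothesis against the text, and the label's score is a softmax over just the contradiction and entailment logits. The template string is the pipeline's documented default `hypothesis_template`. The helper names and the logit values are made up for illustration, and no model is loaded here.

```python
import math

# Default hypothesis template used by the zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."

def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Turn each candidate label into the sentence the NLI model is asked about."""
    return [template.format(label) for label in labels]

def multi_label_score(entailment_logit, contradiction_logit):
    """Softmax over (contradiction, entailment); return the entailment probability."""
    e = math.exp(entailment_logit)
    c = math.exp(contradiction_logit)
    return e / (e + c)

print(build_hypotheses(['Sam', 'sit']))
# ['This example is Sam.', 'This example is sit.']

# Hypothetical logits: equal evidence either way gives a score of 0.5.
print(multi_label_score(0.0, 0.0))
# 0.5
```

Because every label is scored on its own, the scores do not sum to 1, which is why several labels can clear the same threshold at once.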


Helper function to call the classifier, then return the text along with all the labels that have a score of 0.5 or higher.

In [42]:
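def analyze(text):
    # multi_label=True scores each label independently, so several can pass the cutoff.
    result = classifier(text, labels, multi_label=True)
    # Keep labels scoring 0.5 or higher, in the classifier's descending-score order.
    kept = [label for label, score in zip(result['labels'], result['scores']) if score >= 0.5]
    return (text, kept)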
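
When experimenting with the 0.5 cutoff, it can help to see the scores next to the labels. A minimal sketch, assuming the classifier result is a dictionary with parallel `labels` and `scores` lists (as in the raw output shown later in this notebook). `labels_above` is a hypothetical helper name, not part of transformers.

```python
def labels_above(result, threshold=0.5):
    """Return (label, score) pairs at or above the threshold, in the result's order."""
    return [
        (label, score)
        for label, score in zip(result['labels'], result['scores'])
        if score >= threshold
    ]
```

For example, `labels_above(classifier(text, labels, multi_label=True), threshold=0.8)` would show only the labels the model is fairly sure about.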

A nice easy starting point. It got the character and the actions right, but also guessed emotions ("worried", "shy") that the text does not support.

In [43]:
analyze("Sam ran to the chair and sat down.")

('Sam ran to the chair and sat down.', ['Sam', 'sit', 'run', 'worried', 'shy'])

It picks up the mention of "Hank's chair" as a reference to Hank.

In [44]:
analyze("Sam ran to Hank's chair and sat down.")

("Sam ran to Hank's chair and sat down.",
 ['sit', 'Hank', 'Sam', 'run', 'worried'])

It picked up that he was angry even though the text says "upset" rather than "angry".

In [45]:
analyze("Sam ran to his desk and sat down. He was upset that he was late again to school.")

('Sam ran to his desk and sat down. He was upset that he was late again to school.',
 ['Sam', 'angry', 'worried', 'sit', 'embarassed', 'run'])

Note that it matched the label "MrsB" even though the name is written "Mrs B" in the text.

In [51]:
analyze("Sam walked beside Mrs B, and smiling, asked if he could help her.")

('Sam walked beside Mrs B, and smiling, asked if he could help her.',
 ['Sam', 'happy', 'walk', 'MrsB', 'run', 'stand', 'shy'])

It correctly picked up Mrs Short instead of Mrs B, and inferred some emotions from the act of throwing the book. There are more emotions than wanted, but it did pick "angry" without that word appearing in the text. One option is to keep only the first 5 labels (if the score is over 0.5), but keep all character references. That might filter the list down a bit better.

In [47]:
analyze("Mrs Short threw the book at Hank and told him to sit down in his seat.")

('Mrs Short threw the book at Hank and told him to sit down in his seat.',
 ['sit',
  'Hank',
  'angry',
  'shocked',
  'worried',
  'MrsShort',
  'embarassed',
  'run',
  'shy'])
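
The filtering idea above could be sketched as follows. This is one possible reading, not a tested design: keep the 5 highest-ranked labels that score at least 0.5, and additionally keep any character label that scores at least 0.5 regardless of rank. `filter_labels` and `CHARACTERS` are hypothetical names. The sample scores are rounded from the raw classifier output for the eye-roll sentence shown later in this notebook.

```python
# Character names, copied from the labels list defined earlier.
CHARACTERS = {
    'Abigail', 'Amy', 'Bear', 'Cathy', 'Deb', 'DebInBlack', 'Doug', 'Elanor',
    'Hank', 'Helen', 'Ivy', 'Liana', 'Mandy', 'MrsB', 'MrsShort', 'Nathan',
    'Sam', 'Terry', 'Tom',
}

def filter_labels(result, threshold=0.5, max_labels=5, characters=CHARACTERS):
    """Keep the top-ranked labels plus every character that clears the threshold.

    Assumes result['labels'] and result['scores'] are parallel lists sorted
    by descending score, as the zero-shot pipeline returns them.
    """
    kept = []
    for rank, (label, score) in enumerate(zip(result['labels'], result['scores'])):
        if score < threshold:
            continue
        if rank < max_labels or label in characters:
            kept.append(label)
    return kept

# Top of the raw output for the eye-roll sentence (scores rounded).
sample = {
    'labels': ['Hank', 'eye roll', 'embarassed', 'Sam', 'shocked',
               'angry', 'worried', 'run', 'walk'],
    'scores': [0.994, 0.992, 0.990, 0.977, 0.855,
               0.842, 0.818, 0.513, 0.188],
}
print(filter_labels(sample))
# ['Hank', 'eye roll', 'embarassed', 'Sam', 'shocked']
```

Whether characters should count toward the limit of 5 is a judgment call. Counting only non-character labels toward the limit would let more actions and emotions through.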

It picked up the eye roll, but also returned extra labels that do not fit (such as running rather than walking to leave the room). Note that the mention of a "joke" did not trigger "happy" as an emotion.

In [48]:
analyze("Hank rolled his eyes at the terrible joke Sam made and left the room.")

('Hank rolled his eyes at the terrible joke Sam made and left the room.',
 ['Hank',
  'eye roll',
  'embarassed',
  'Sam',
  'shocked',
  'angry',
  'worried',
  'run'])

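For reference, here is the raw classifier output for the same sentence, with every label and its score listed from highest to lowest.

In [49]: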
classifier("Hank rolled his eyes at the terrible joke Sam made and left the room.", labels, multi_label=True)

{'sequence': 'Hank rolled his eyes at the terrible joke Sam made and left the room.',
 'labels': ['Hank',
  'eye roll',
  'embarassed',
  'Sam',
  'shocked',
  'angry',
  'worried',
  'run',
  'walk',
  'wave',
  'jog',
  'bored',
  'shy',
  'sit',
  'stand',
  'DebInBlack',
  'MrsB',
  'smile',
  'Elanor',
  'MrsShort',
  'Bear',
  'Helen',
  'Abigail',
  'Liana',
  'Nathan',
  'Deb',
  'Mandy',
  'Tom',
  'Amy',
  'Doug',
  'Terry',
  'Cathy',
  'Ivy'],
 'scores': [0.9942629933357239,
  0.9917135238647461,
  0.9904573559761047,
  0.9770090579986572,
  0.8545532822608948,
  0.8418623208999634,
  0.8180244565010071,
  0.5128270387649536,
  0.18781821429729462,
  0.1679617315530777,
  0.1172691211104393,
  0.10005620121955872,
  0.08684126287698746,
  0.07791844010353088,
  0.028663260862231255,
  0.008142070844769478,
  0.007150366436690092,
  0.004290085285902023,
  0.001344656222499907,
  0.0012983978958800435,
  0.0009933232795447111,
  0.000341181323165074,
  0.00030627395608462393