
# CS 195: Natural Language Processing
## Introduction to the Hugging Face Transformers Library

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F1_1_HuggingFace.ipynb)


## References

Hugging Face *Quicktour*: https://huggingface.co/docs/transformers/quicktour

Hugging Face *Run Inference with Pipelines tutorial*: https://huggingface.co/docs/transformers/pipeline_tutorial

Hugging Face *NLP Course, Chapter 2*: https://huggingface.co/learn/nlp-course/chapter2/1

## What is Hugging Face?

Hugging Face is a private company
* Founded in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf
* Based in New York City

Provides a popular free, open-source Python library called **transformers** for NLP (and other) tasks

Hosts *hundreds of thousands of models* that you can use in your own programs

## Installing the transformers module

This is my favored way of installing packages from a Jupyter Notebook

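Using `sys.executable` makes pip install into the same Python interpreter that is running this notebook, so it picks the right one even if you have several Python distributions installed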

It may take a few minutes, but *you should only have to do this once*

In [None]:
import sys
!{sys.executable} -m pip install transformers

Collecting transformers
  Downloading transformers-4.32.1-py3-none-any.whl (7.5 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## Using the sentiment analysis pipeline

**Sentiment analysis** attempts to identify the overall feeling intended by the writer of some text

The creators of this model **trained** it on lots of examples of text that were labeled as either *positive* or *negative*

A **pipeline** is a series of steps for performing **inference**
* tokenize and preprocess the input text (more on this later)
* ask the model for a prediction
* post-process the model's result and turn it into something you can use
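The three steps above can also be run by hand, which is roughly what `pipeline` does behind the scenes. This is a minimal sketch in the style of the Hugging Face NLP Course, Chapter 2. It assumes PyTorch is installed and uses `distilbert-base-uncased-finetuned-sst-2-english`, the checkpoint the sentiment-analysis pipeline falls back to by default in the transformers version used here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the same checkpoint the sentiment-analysis pipeline uses by default
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

texts = ["I love how easy it is to build sentiment-aware applications with the transformers library!"]

# Step 1: tokenize and preprocess (text -> padded tensors of token ids)
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Step 2: ask the model for a prediction (raw, unnormalized logits)
with torch.no_grad():
    logits = model(**inputs).logits

# Step 3: post-process logits into a label name and a probability-like score
probs = torch.softmax(logits, dim=-1)
results = []
for row in probs:
    best = int(row.argmax())
    results.append({"label": model.config.id2label[best], "score": float(row[best])})
print(results)
```

The pipeline version wraps all three steps in a single call.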

![full_nlp_pipeline.svg](https://github.com/ericmanley/f23-CS195NLP/blob/main/images/full_nlp_pipeline.svg?raw=1)
image source: https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt

We *are* specifying the kind of task: `sentiment-analysis`

We *are not* asking for a specific model, so the pipeline falls back to a default model for this task (the log message below shows which one it chose)

The first time you do this, it will have to download the model - this can take some time depending on your network connection

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

classifier("I love how easy it is to build sentiment-aware applications with the transformers library!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'label': 'POSITIVE', 'score': 0.9984305500984192}]

**Test it out:** Try changing the input to get different labels/scores
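The log message above warns that using a pipeline without a model name and revision is not recommended in production. A minimal sketch of pinning both explicitly, using the model name and the revision `af0f99b` reported in that message, so the same weights are loaded every time the notebook runs:

```python
from transformers import pipeline

# Pin the model and the exact revision named in the log message
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",
)

result = classifier("I love how easy it is to build sentiment-aware applications with the transformers library!")
print(result)
```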

## Working with batches of text

To get classifications of many different examples, pass in a list of strings.

In [None]:
results = classifier(["It's really cool that you can get classifications for a whole batch of text",
                      "I wonder if the rest of the class will be this easy.",
                     "Spolier alert: it won't be."])
print(results)

[{'label': 'POSITIVE', 'score': 0.9991173148155212}, {'label': 'NEGATIVE', 'score': 0.9557349681854248}, {'label': 'NEGATIVE', 'score': 0.9962737560272217}]


Note that the results come back as a list of dictionaries, so you can manipulate them in the usual ways.

In [None]:
print("The sentence had",results[0]["label"],"sentiment, with a score of",results[0]["score"])

The sentence had POSITIVE sentiment, with a score of 0.9991173148155212
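Because the results are plain Python lists and dictionaries, ordinary loops and comprehensions work on them. A small sketch using the batch output printed above (copied in by hand so the cell runs on its own):

```python
from collections import Counter

# Output copied from the batch example above: a list of dicts
results = [
    {'label': 'POSITIVE', 'score': 0.9991173148155212},
    {'label': 'NEGATIVE', 'score': 0.9557349681854248},
    {'label': 'NEGATIVE', 'score': 0.9962737560272217},
]

# Print every result, not just the first one
for i, result in enumerate(results):
    print(f"Sentence {i}: {result['label']} (score {result['score']:.3f})")

# Count how many sentences received each label
label_counts = Counter(result["label"] for result in results)
print(label_counts)

# Keep only the predictions the model is very confident about
confident = [r for r in results if r["score"] > 0.99]
print(len(confident), "predictions have a score above 0.99")
```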


## Exercise: Specifying a model

Now try asking for a specific model.

Replace one line of code in your earlier example.

In [None]:
classifier = pipeline("sentiment-analysis", model="SamLowe/roberta-base-go_emotions")

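By default a text-classification pipeline returns only the highest-scoring label. Passing `top_k=None` asks it for a score on every label, which makes it easier to see how this emotions model spreads its scores. A sketch, assuming a transformers version whose text-classification pipeline supports `top_k` (the 4.32 release installed above does):

```python
from transformers import pipeline

# top_k=None: return a score for every label instead of only the top one
emotion_classifier = pipeline(
    "text-classification", model="SamLowe/roberta-base-go_emotions", top_k=None
)

# With a list input, each element of the output is a list of {'label', 'score'} dicts,
# sorted from highest to lowest score
scores = emotion_classifier(["I hope there are still parking spaces left!"])[0]

for item in scores[:5]:
    print(f"{item['label']:>15}: {item['score']:.3f}")
```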

In [None]:
results = classifier(["It's really cool that you can get classifications for a whole batch of text",
                      "There's no way I'm going in that creepy hallway.",
                     "Nothing in this world like beats Folger's coffee!",
                     "I hope there are still parking spaces left.",
                      "I hope there are still parking spaces left!",
                      "Nobody out pizzas the Hut!"])
print(results)

[{'label': 'admiration', 'score': 0.5409085750579834}, {'label': 'disapproval', 'score': 0.4904636740684509}, {'label': 'neutral', 'score': 0.4120086431503296}, {'label': 'optimism', 'score': 0.899702250957489}, {'label': 'optimism', 'score': 0.9017206430435181}, {'label': 'neutral', 'score': 0.9167095422744751}]


How is this model different from the first model?

Create a cell in this notebook and note the differences you see


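The results are types of emotions rather than just positive/negative, and the scores seem to have a larger range of possible values. Punctuation seems to affect the scores (the two "parking spaces" sentences differ only in their final punctuation). Neutral can have a surprisingly high score (0.917 for "Nobody out pizzas the Hut!").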
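A likely reason the scores behave differently: the first model picks exactly one of two labels, so its scores come from a softmax and sum to 1 across labels, while the `roberta-base-go_emotions` model card describes it as a multi-label classifier, which usually means each label is scored independently with a sigmoid. A pure-Python sketch of the two functions; the logits are invented for illustration and do not come from either model.

```python
import math

def softmax(logits):
    # Exponentiate and normalize so the scores sum to 1 (labels compete)
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    # Squash each logit independently into (0, 1); scores need not sum to 1
    return [1 / (1 + math.exp(-x)) for x in logits]

# Hypothetical logits for three labels, invented for illustration
logits = [2.0, 1.5, -1.0]

soft = softmax(logits)
sig = sigmoid(logits)
print("softmax:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
print("sigmoid:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
```

With the sigmoid, two labels can both score high at once, which is what a multi-label emotions model needs.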

## Applied Exploration

The `roberta-base-go_emotions` model is documented here: https://huggingface.co/SamLowe/roberta-base-go_emotions

Answer some questions about this:
* What is `roberta-base`? Write down some things you can learn about it from the documentation.
The model is a multi-label classifier with an average accuracy of 0.41, but individual labels perform quite differently, perhaps partly because of differences in the number of training examples for each label.

* What is `go_emotions`? Write down some things you can learn about it from the documentation.
The dataset encodes its labels as dummy variables (one 0/1 column per label). In addition to the 28 possible labels and the example text, there are 8 other fields, all of which appear to be metadata.
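To make the "dummy variables" idea concrete: each emotion gets its own 0/1 column, and a text's labels are the columns set to 1. A minimal sketch with a hypothetical, trimmed-down row (only four of the 28 emotion columns; the values are invented, not taken from the dataset):

```python
# Hypothetical, trimmed-down go_emotions-style row: one 0/1 column per emotion
row = {
    "text": "I hope there are still parking spaces left!",
    "admiration": 0,
    "nervousness": 1,
    "neutral": 0,
    "optimism": 1,
}

# Hypothetical subset of the emotion column names
emotion_columns = ["admiration", "nervousness", "neutral", "optimism"]

# Collect every emotion whose dummy variable is set to 1
labels = [name for name in emotion_columns if row[name] == 1]
print(row["text"], "->", labels)
```

Because more than one column can be 1, a single text can carry several emotions at once.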

Go to the Hugging Face models page: https://huggingface.co/models
* click `Text Classification`
* Try some additional models
    - test out at least one more sentiment/emotions model
    - test out at least two other kinds of models - like news topic classification or spam detection
    - write down some info about the models you found
        - what is it for?
        - who made it?
        - what kind of data was it trained on?
        - are they based on some other model and trained on new data (*fine-tuned*) for a specific task?
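One way to compare models is to run the same sentences through each one and print the labels it is able to produce. A sketch using model ids that already appear in this notebook; swap in the ones you find on the models page. Each model is downloaded on first use, so this can take a while.

```python
from transformers import pipeline

# Model ids taken from this notebook; replace with models you find on the Hub
model_names = [
    "distilbert-base-uncased-finetuned-sst-2-english",
    "SamLowe/roberta-base-go_emotions",
    "j-hartmann/emotion-english-distilroberta-base",
]
sentences = ["I hope there are still parking spaces left!", "Please stop!"]

comparison = {}
for name in model_names:
    clf = pipeline("text-classification", model=name)
    comparison[name] = clf(sentences)
    # id2label lists every label this model can output
    print(name, "labels:", list(clf.model.config.id2label.values()))
    for sentence, result in zip(sentences, comparison[name]):
        print(f"  {sentence!r}: {result['label']} ({result['score']:.3f})")
```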

In [None]:
#j-hartmann/emotion-english-distilroberta-base
# Emotion classifier: predicts Ekman's 6 basic emotions plus a neutral class (7 labels)
from transformers import pipeline

pipe = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base")

results = pipe(["I can't believe you did that!",
                "That hallway is creepy.",
                "That hallway is creepy!",
                "I don't want to trip",
                "Please stop!",
                "Please stop."
                ])
print(results)

[{'label': 'surprise', 'score': 0.921724796295166}, {'label': 'disgust', 'score': 0.7788035869598389}, {'label': 'disgust', 'score': 0.7577598094940186}, {'label': 'sadness', 'score': 0.9428707361221313}, {'label': 'anger', 'score': 0.8825500011444092}, {'label': 'anger', 'score': 0.6266079545021057}]


In [None]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("d4data/bias-detection-model")
model = TFAutoModelForSequenceClassification.from_pretrained("d4data/bias-detection-model")

classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)  # add device=0 to run on the first GPU, if one is available
#classifier("The irony, of course, is that the exhibit that invites people to throw trash at vacuuming Ivanka Trump lookalike reflects every stereotype feminists claim to stand against, oversexualizing Ivanka’s body and ignoring her hard work.")
classifier("I loved Biden's hat. It was a neat hat.")

Some layers from the model checkpoint at d4data/bias-detection-model were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at d4data/bias-detection-model and are newly initialized: ['dropout_39']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'label': 'Non-biased', 'score': 0.5795091986656189}]

In [None]:
# CogComp/bart-faithful-summary-detector
# Checks whether a summary is faithful to (supported by) an article.
# A pipeline expects raw text, not tokenizer output, so load the tokenizer
# and model separately and call the model on each (summary, article) pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("CogComp/bart-faithful-summary-detector")
model = AutoModelForSequenceClassification.from_pretrained("CogComp/bart-faithful-summary-detector")

article = "Ban Ki-Moon was re-elected for a second term by the UN General Assembly, unopposed and unanimously, on 21 June 2011."

bad_summary = "Ban Ki-moon was elected for a second term in 2007."
good_summary = "Ban Ki-moon was elected for a second term in 2011."

# Encode each summary together with the article as a sentence pair (PyTorch tensors)
bad_pair = tokenizer(text=bad_summary, text_pair=article, return_tensors="pt")
good_pair = tokenizer(text=good_summary, text_pair=article, return_tensors="pt")

# Run the model directly; element 0 of the output holds the logits
with torch.no_grad():
    bad_score = model(**bad_pair)
    good_score = model(**good_pair)

print(good_score[0][:, 1] > bad_score[0][:, 1])  # True, label mapping: "0" -> "Hallucinated" "1" -> "Faithful"



You passed along `num_labels=3` with an incompatible id to label map: {'0': 'HALLUCINATED', '1': 'FAITHFUL'}. The number of labels wil be overwritten to 2.
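The pipeline can still be used for this model if it is given raw text instead of tokenizer output. Recent versions of the text-classification pipeline (including 4.32) accept a dict with `text` and `text_pair` keys and tokenize it as a sentence pair themselves; if your installed version rejects dict inputs, call the model directly as the model card does. Note that the pipeline reports softmax probabilities rather than the raw logits the model card compares, so the numbers are on a different scale.

```python
from transformers import pipeline

# top_k=None returns scores for both labels (HALLUCINATED and FAITHFUL)
faithfulness_pipe = pipeline(
    "text-classification", model="CogComp/bart-faithful-summary-detector", top_k=None
)

article = "Ban Ki-Moon was re-elected for a second term by the UN General Assembly, unopposed and unanimously, on 21 June 2011."
bad_summary = "Ban Ki-moon was elected for a second term in 2007."
good_summary = "Ban Ki-moon was elected for a second term in 2011."

def faithful_probability(summary):
    # The dict is tokenized as a (summary, article) sentence pair
    scores = faithfulness_pipe({"text": summary, "text_pair": article})
    return next(s["score"] for s in scores if s["label"] == "FAITHFUL")

print("bad summary: ", faithful_probability(bad_summary))
print("good summary:", faithful_probability(good_summary))
```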