# Text Sentiment Classification with Hugging Face Pipeline
* Notebook by Adam Lang
* Date: 12/3/24

# Overview
* This notebook implements text sentiment classification using Hugging Face pipelines.


# Install Dependencies
* We need to install `sacremoses`.
  * Sacremoses is a Python library that provides a port of the Moses tokenizer, truecaser, and other text normalization tools used in natural language processing (NLP).
  * link: https://pypi.org/project/sacremoses/

In [2]:
!pip install -U transformers #upgrades
!pip install -U sentencepiece #upgrades
!pip install -U sacremoses #upgrades

Collecting transformers
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Downloading transformers-4.46.3-py3-none-any.whl (10.0 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.46.2
    Uninstalling transformers-4.46.2:
      Successfully uninstalled transformers-4.46.2
Successfully installed transformers-4.46.3
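
To confirm which versions ended up installed, a small check along these lines can be run after the install cell. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the three packages installed above.
for name in ("transformers", "sentencepiece", "sacremoses"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

The printed versions depend on the environment, so no expected output is shown here.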


In [3]:
# imports
from transformers import pipeline
import pandas as pd


# Setup Text Classification Pipeline
* If you don't pass a model to the pipeline, it falls back to the default model for the task. For text classification, that is:
  * `distilbert/distilbert-base-uncased-finetuned-sst-2-english` (revision `714eb0f`): https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english

## Default Transformer model overview
* This model is a fine-tuned checkpoint of the pre-trained `distilbert-base-uncased` model.
* It was fine-tuned on the SST-2 (Stanford Sentiment Treebank) dataset: https://huggingface.co/datasets/stanfordnlp/sst2

In [5]:
## pipeline for text classification
classifier = pipeline("text-classification")

# demo with dummy data
text = "I despise you"
outputs = classifier(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


|   | label | score |
|---|-------|-------|
| 0 | NEGATIVE | 0.992819 |
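
The warnings in the output above recommend naming the model and revision explicitly and passing a `device`. A minimal sketch of that, assuming the model name and revision printed in the log are the ones you want to pin: `model`, `revision`, and `device` are arguments of `pipeline`, while the `PINNED` dictionary and the `build_sentiment_classifier` helper are hypothetical names used only for illustration.

```python
# Model name and revision copied from the warning printed by the default pipeline.
PINNED = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_sentiment_classifier(device: int = -1):
    """Build the pinned pipeline; device=-1 means CPU, device=0 the first GPU."""
    # Imported here so the configuration above can be inspected without
    # loading the library (building the pipeline downloads the model on first use).
    from transformers import pipeline

    return pipeline(**PINNED, device=device)


# Example usage (downloads the model on first run):
# classifier = build_sentiment_classifier(device=0)
# classifier("I despise you")
```

Pinning the revision keeps results reproducible if the model repository is updated later.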


# Using a task-specific text classification model
* In this example I use a task-specific model: `SamLowe/roberta-base-go_emotions`
* This is a multi-label model: it scores a text against every emotion label rather than choosing between two sentiments.
* Model card: https://huggingface.co/SamLowe/roberta-base-go_emotions
* The `config.json` file of the model lists 28 different labels:
```json
"id2label": {
    "0": "admiration",
    "1": "amusement",
    "2": "anger",
    "3": "annoyance",
    "4": "approval",
    "5": "caring",
    "6": "confusion",
    "7": "curiosity",
    "8": "desire",
    "9": "disappointment",
    "10": "disapproval",
    "11": "disgust",
    "12": "embarrassment",
    "13": "excitement",
    "14": "fear",
    "15": "gratitude",
    "16": "grief",
    "17": "joy",
    "18": "love",
    "19": "nervousness",
    "20": "optimism",
    "21": "pride",
    "22": "realization",
    "23": "relief",
    "24": "remorse",
    "25": "sadness",
    "26": "surprise",
    "27": "neutral"
  },
```
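
The mapping above can be rebuilt and inverted in a few lines, which is handy when you need to go from a label name back to its index. This is a sketch with the labels copied by hand from the excerpt; in practice the same mapping should be available from the loaded model configuration (for example via `AutoConfig.from_pretrained(...).id2label`), which is not run here.

```python
# Labels copied in order from the config.json excerpt above.
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

# Index -> label, matching "id2label" (the JSON keys are strings; integers are used here).
id2label = dict(enumerate(LABELS))

# Label -> index, the inverse mapping.
label2id = {label: idx for idx, label in id2label.items()}

print(len(id2label))     # 28
print(id2label[27])      # neutral
print(label2id["joy"])   # 17
```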

In [8]:
# pipeline setup
classifier_2 = pipeline("text-classification",
                        model="SamLowe/roberta-base-go_emotions")

## demo with dummy data
text_2 = "wow that is amazing!"
outputs_2 = classifier_2(text_2)
pd.DataFrame(outputs_2)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


|   | label | score |
|---|-------|-------|
| 0 | admiration | 0.862785 |
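
The output above contains a single label because the pipeline returns only the top-scoring label by default. To see the multi-label behaviour, one option is to request every label score with `top_k=None` and keep those above a cutoff. The sketch below assumes that approach: the helper `labels_above` and the 0.5 threshold are hypothetical choices, and the example scores are made up for illustration, not real model output.

```python
def labels_above(scores, threshold=0.5):
    """Keep label/score dicts whose score is at least `threshold`, highest first."""
    kept = [s for s in scores if s["score"] >= threshold]
    return sorted(kept, key=lambda s: s["score"], reverse=True)


# Made-up scores in the shape the pipeline returns for one text (not real model output).
example_scores = [
    {"label": "surprise", "score": 0.41},
    {"label": "admiration", "score": 0.86},
    {"label": "excitement", "score": 0.55},
]
print(labels_above(example_scores))

# With the real model (downloads on first use; the exact nesting of the
# return value may differ between transformers versions):
# all_scores = classifier_2("wow that is amazing!", top_k=None)
# labels_above(all_scores, threshold=0.5)
```

A suitable threshold depends on the label and the data, so 0.5 is only a starting point.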


# Summary
* This was a basic example of text classification with a Hugging Face pipeline.
* The great thing about pipelines is that you can take a pre-trained model and run it out of the box for your task(s) of choice.
* You can then fine-tune the model on your data as needed.