In [1]:
# Based on Huggingface interface
# - https://huggingface.co/transformers/notebooks.html
# - https://github.com/huggingface/notebooks/blob/master/transformers_doc/quicktour.ipynb
# - More examples with LLMs at: https://github.com/biplav-s/course-tai/blob/3a37536b00a0b386d32cb29da61b1ce68f72cfdb/sample-code/l13-l16-supervised-text/l15-langmodel-commontasks.ipynb

In [2]:
# Transformers installation, if needed
#! pip install transformers datasets

In [3]:
# Using Huggingface pipeline abstraction for common tasks
# - Pipelines: https://huggingface.co/docs/transformers/main_classes/pipelines
# - Sentiment based handling: https://huggingface.co/blog/sentiment-analysis-python

In [4]:
# When no model is specified, the pipeline defaults to "distilbert-base-uncased-finetuned-sst-2-english"
from transformers import pipeline
classifier = pipeline('sentiment-analysis')

In [5]:
data = ["She is angry most of the time, but she makes me happy"]

In [6]:
# Run the classifier and print the label and score for each input
results = classifier(data)
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9998


In [7]:
# Some more general data

In [8]:
data2 = ["this is good",
         "this is not bad",
         "this is bad bad bad",
         "this is too good",
         "this is too bad",
         "this is not bad",
         "No one did a bad action",
         "Jamil did a bad action",
         "John did a bad action"]

In [9]:
# Convenience function: classify a list of inputs and print each text with its result
def sentimClassifyPrint(alist):
    results = classifier(alist)
    # zip keeps each input paired with its own result, even when inputs or results repeat
    for text, result in zip(alist, results):
        print(f"{text} <-> label: {result['label']}, with score: {round(result['score'], 4)}")

In [10]:
sentimClassifyPrint(data2)

this is good <-> label: POSITIVE, with score: 0.9998
this is not bad <-> label: POSITIVE, with score: 0.9997
this is bad bad bad <-> label: NEGATIVE, with score: 0.9998
this is too good <-> label: POSITIVE, with score: 0.9997
this is too bad <-> label: NEGATIVE, with score: 0.9998
this is not bad <-> label: POSITIVE, with score: 0.9997
No one did a bad action <-> label: POSITIVE, with score: 0.9989
Jamil did a bad action <-> label: NEGATIVE, with score: 0.9982
John did a bad action <-> label: NEGATIVE, with score: 0.9989
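
The last three inputs above differ only in their subject, which makes it easy to see how a single substituted word changes the prediction. A minimal sketch of generating such variants from a template follows; `make_variants` and the `{subject}` placeholder are hypothetical names introduced here for illustration, not part of the original notebook.

```python
# Hypothetical helper (an assumption, not from the original notebook):
# build sentences that differ only in the subject slot.
def make_variants(template, subjects):
    """Return one sentence per subject by filling the {subject} placeholder."""
    return [template.format(subject=s) for s in subjects]

variants = make_variants("{subject} did a bad action", ["No one", "Jamil", "John"])
print(variants)

# The variants can then be scored with the notebook's own function:
# sentimClassifyPrint(variants)
```

Keeping the rest of the sentence fixed means any difference in label or score can be attributed to the substituted subject.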


In [11]:
# Now some data from the water domain

In [12]:
data3 = ["NSDWRs (or secondary standards) are non-enforceable guidelines regulating contaminants that may cause cosmetic effects (such as skin or tooth discoloration) or aesthetic effects (such as taste, odor, or color) in drinking water.",
        " EPA recommends secondary standards to water systems but does not require systems to comply with the standard. ",
        "However, states may choose to adopt them as enforceable standards."]


In [13]:
sentimClassifyPrint(data3)

NSDWRs (or secondary standards) are non-enforceable guidelines regulating contaminants that may cause cosmetic effects (such as skin or tooth discoloration) or aesthetic effects (such as taste, odor, or color) in drinking water. <-> label: NEGATIVE, with score: 0.9922
 EPA recommends secondary standards to water systems but does not require systems to comply with the standard.  <-> label: POSITIVE, with score: 0.9548
However, states may choose to adopt them as enforceable standards. <-> label: NEGATIVE, with score: 0.9046
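
Because the default model emits only POSITIVE or NEGATIVE, one rough workaround for factual text like the above is to relabel lower-confidence predictions as neutral. The sketch below assumes a cutoff of 0.96, which is an arbitrary value chosen for illustration; `relabel_low_confidence` is a hypothetical helper, and the sample scores are the rounded values from the output above.

```python
# Hypothetical post-processing step (an assumption, not from the original notebook):
# treat any prediction whose score falls below a cutoff as NEUTRAL.
def relabel_low_confidence(results, threshold=0.96, neutral_label="NEUTRAL"):
    """Return new result dicts, relabelling scores below threshold as neutral."""
    relabelled = []
    for result in results:
        label = result["label"] if result["score"] >= threshold else neutral_label
        relabelled.append({"label": label, "score": result["score"]})
    return relabelled

# Rounded scores copied from the data3 output above
sample_results = [
    {"label": "NEGATIVE", "score": 0.9922},
    {"label": "POSITIVE", "score": 0.9548},
    {"label": "NEGATIVE", "score": 0.9046},
]

for r in relabel_low_confidence(sample_results):
    print(r["label"], r["score"])
```

With this cutoff the first statement still comes out NEGATIVE, which shows the limit of thresholding: a model trained without a neutral class can be highly confident on neutral text.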


In [14]:
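# Most of the statements above are neutral rather than positive or negative,
# yet the model labels each one with high confidence (scores above 0.90).
# The default model is binary (POSITIVE/NEGATIVE), so it cannot output a neutral label.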

# Other common sentiment models to use
'''
- From: https://github.com/huggingface/notebooks/blob/main/transformers_doc/quicktour.ipynb
    1. Twitter-roberta-base-sentiment is a RoBERTa model trained on ~58M tweets 
          and fine-tuned for sentiment analysis. Fine-tuning is the process of taking a 
          pre-trained large language model (e.g. RoBERTa in this case) and then tweaking it 
          with additional training data to make it perform a second similar task (e.g. 
          sentiment analysis).
    2. Bert-base-multilingual-uncased-sentiment is a model fine-tuned for 
          sentiment analysis on product reviews in six languages: English, Dutch, German, 
          French, Spanish and Italian.
    3. Distilbert-base-uncased-emotion is a model fine-tuned for detecting emotions 
          in texts, including sadness, joy, love, anger, fear and surprise.
'''
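
Any of the models listed above can be swapped in by passing `model=` to `pipeline`. The sketch below assumes the Hugging Face Hub identifiers shown in `MODEL_IDS`; these are my best understanding of where the three models are published and should be checked against the Hub before use. `load_classifier` and the short keys are hypothetical names introduced for illustration.

```python
# Assumed Hub identifiers for the three models described above (verify on the Hub).
MODEL_IDS = {
    "twitter": "cardiffnlp/twitter-roberta-base-sentiment",
    "multilingual": "nlptown/bert-base-multilingual-uncased-sentiment",
    "emotion": "bhadresh-savani/distilbert-base-uncased-emotion",
}

def load_classifier(name):
    """Build a pipeline for one of the models above (downloads weights on first use)."""
    # Imported inside the function so MODEL_IDS can be inspected without transformers installed
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=MODEL_IDS[name])

# Example usage, commented out because it downloads the model:
# emotion_classifier = load_classifier("emotion")
# print(emotion_classifier(["She is angry most of the time, but she makes me happy"]))
```

Each model uses its own label set, so the printed labels will not necessarily be POSITIVE/NEGATIVE as with the default model.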