# Software Engineering
## Sentiment Analysis Module (RoBERTa)
- https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment
- https://huggingface.co/roberta-base
- https://www.kaggle.com/code/robikscube/sentiment-analysis-python-youtube-tutorial/notebook


In [2]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstall

In [3]:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax

### Method 1
- Tokenizer
- Tensor

In [4]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"  # Hugging Face model to use
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


In [5]:
# Roberta
text = "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!"
encoded_text = tokenizer(text, return_tensors='pt')
#print(encoded_text)
output = model(**encoded_text)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
scores_dict = {
    'negative': scores[0],
    'neutral': scores[1],
    'positive': scores[2]
}
print(scores_dict)

{'negative': 0.0052546244, 'neutral': 0.045305263, 'positive': 0.9494401}
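The `softmax` call in the cell above turns the model's raw scores (logits) into probabilities that sum to 1. A minimal pure-Python sketch of that step follows; the function name `softmax_py` and the example logits are made up for illustration and are not the model's actual output.

```python
import math

def softmax_py(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the order (negative, neutral, positive).
example_logits = [-2.0, 0.5, 3.0]
probs = softmax_py(example_logits)
print(dict(zip(["negative", "neutral", "positive"], probs)))
```

The largest logit always receives the largest probability, so the ranking of the three classes is the same before and after softmax.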


In [6]:
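def polarity_scores_roberta(example):
  """Return negative/neutral/positive probabilities for one text (RoBERTa)."""
  # Tokenize the text into PyTorch tensors
  encoded_text = tokenizer(example, return_tensors='pt')
  # output[0] holds the logits; the second [0] selects the single input text
  output = model(**encoded_text)
  scores = output[0][0].detach().numpy()
  # Convert logits to probabilities that sum to 1
  scores = softmax(scores)
  scores_dict = {
      'negative': scores[0],
      'neutral': scores[1],
      'positive': scores[2]
  }
  return scores_dict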

### Main Code
Run this cell to score a sample sentence with the RoBERTa sentiment model and print the three class probabilities.

In [7]:
text = "I am happy"
result = polarity_scores_roberta(text)
print("Negative: ", result['negative'])
print("Neutral: ", result['neutral'])
print("Positive: ", result['positive'])

Negative:  0.0018413567
Neutral:  0.018975863
Positive:  0.9791829
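When only the winning class is needed, the dictionary returned by `polarity_scores_roberta` can be reduced to a single label. The helper below is a small sketch; `top_sentiment` is a hypothetical name, and the scores are copied from the output above so the block runs without loading the model.

```python
def top_sentiment(scores_dict):
    """Return the (label, probability) pair with the highest probability."""
    label = max(scores_dict, key=scores_dict.get)
    return label, scores_dict[label]

# Scores copied from the output above for "I am happy".
example_scores = {
    "negative": 0.0018413567,
    "neutral": 0.018975863,
    "positive": 0.9791829,
}
label, prob = top_sentiment(example_scores)
print(f"{label} ({prob:.3f})")  # prints: positive (0.979)
```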


### Method 2
- The Transformers Pipeline
- Model: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english

In [8]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")  # No model given, so the default DistilBERT checkpoint is used

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



In [9]:
result = classifier('I love sentiment analysis')
print(result)

[{'label': 'POSITIVE', 'score': 0.999736487865448}]


In [10]:
classifier('I dont like this song')

[{'label': 'NEGATIVE', 'score': 0.812928318977356}]
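The default checkpoint has only two classes, so it always answers POSITIVE or NEGATIVE, and the second score above (about 0.81) is noticeably less confident than the first. One possible way to handle that is to flag low-confidence predictions instead of trusting the label. The sketch below assumes a cutoff of 0.9, which is an arbitrary value chosen for illustration, and `label_with_threshold` is a hypothetical helper, not part of the Transformers API.

```python
def label_with_threshold(result, threshold=0.9):
    """Return the predicted label, or 'UNCERTAIN' if the score is below threshold."""
    top = result[0]  # the pipeline returns a list with one dict per input text
    return top["label"] if top["score"] >= threshold else "UNCERTAIN"

# Results copied from the two cells above.
love = [{"label": "POSITIVE", "score": 0.999736487865448}]
dislike = [{"label": "NEGATIVE", "score": 0.812928318977356}]

print(label_with_threshold(love))     # prints: POSITIVE
print(label_with_threshold(dislike))  # prints: UNCERTAIN
```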

## Change the model to use
- https://huggingface.co/models?sort=downloads&search=roberta

In [11]:
# Any sequence-classification checkpoint from the Hugging Face Hub can be substituted here
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
# Loaded for direct use; the pipeline below loads its own copy from model_name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Build the pipeline from the model name
classifier = pipeline("sentiment-analysis", model=model_name)
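Different checkpoints name their labels differently. The DistilBERT model reports `POSITIVE` and `NEGATIVE`, while some checkpoints report generic names. The sketch below assumes a three-class model that reports `LABEL_0` to `LABEL_2` in the order negative, neutral, positive (the index order used in Method 1); check `model.config.id2label` for the checkpoint you actually load. `LABEL_NAMES`, `readable`, and the sample result are hypothetical.

```python
# Assumption: generic label names, indexed as negative / neutral / positive.
LABEL_NAMES = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}

def readable(result):
    """Replace generic pipeline labels with sentiment names where known."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in result
    ]

# Hypothetical pipeline output, for illustration only.
sample = [{"label": "LABEL_2", "score": 0.98}]
print(readable(sample))
```

Labels that are not in the mapping, such as `POSITIVE`, pass through unchanged, so the helper is safe to apply to either kind of model.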

## Alternative: build the pipeline from model and tokenizer objects
- Tokenizer
- Model

In [12]:
# Any sequence-classification checkpoint from the Hugging Face Hub can be substituted here
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [13]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

In [14]:
tokens = tokenizer.tokenize("We are happy yo show you the Transformers library.")
token_ids = tokenizer.convert_tokens_to_ids(tokens)
inputs_id = tokenizer("We are happy yo show you the Transformers library.")

In [15]:
print(f' Tokens: {tokens}')
print(f' Token IDs: {token_ids}')  # Numerical representation that the model understands
print(f' Input IDs: {inputs_id}')

 Tokens: ['we', 'are', 'happy', 'yo', 'show', 'you', 'the', 'transformers', 'library', '.']
 Token IDs: [2057, 2024, 3407, 10930, 2265, 2017, 1996, 19081, 3075, 1012]
 Input IDs: {'input_ids': [101, 2057, 2024, 3407, 10930, 2265, 2017, 1996, 19081, 3075, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
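Comparing the two outputs above, the `input_ids` list is the `Token IDs` list with 101 added at the start and 102 added at the end. In the BERT-style uncased vocabulary used by this checkpoint, these are the special tokens `[CLS]` and `[SEP]`, which `tokenizer(...)` adds automatically and `convert_tokens_to_ids` does not. The relationship can be reproduced by hand with the values printed above:

```python
CLS_ID = 101  # [CLS], added at the start of the sequence
SEP_ID = 102  # [SEP], added at the end of the sequence

# Token IDs copied from the output above.
token_ids = [2057, 2024, 3407, 10930, 2265, 2017, 1996, 19081, 3075, 1012]

input_ids = [CLS_ID] + token_ids + [SEP_ID]
attention_mask = [1] * len(input_ids)  # 1 = real token; nothing is padded here

print(input_ids)
print(attention_mask)
```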


In [16]:
x_train = ["We are happy to show you Transformers library",
           "We hope you dont hate it","Hello, good morning"]
batch = tokenizer(x_train, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
  outputs = model(**batch)
  #predictions = F.softmax(outputs.logits, dim=1)
  #print(predictions)
  #labels = torch.argmax(predictions, dim=1)
  label_ids = torch.argmax(outputs.logits, dim=1)
  print(label_ids)
  labels = [model.config.id2label[label_id] for label_id in label_ids.tolist()]
  print(labels)

tensor([1, 0, 1])
['POSITIVE', 'NEGATIVE', 'POSITIVE']
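The `padding=True` argument in the cell above is what allows sentences of different lengths to share one tensor: shorter sequences are filled up to the length of the longest one, and the attention mask marks the filler positions with 0 so the model ignores them. The sketch below imitates that behavior in plain Python. It assumes a padding ID of 0 and padding on the right, and the helper `pad_batch` and the token IDs are hypothetical.

```python
def pad_batch(sequences, pad_id=0):
    """Pad lists of token IDs to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # filler on the right
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 = ignore position
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical, already-tokenized sequences of different lengths.
toy_batch = pad_batch([[101, 2057, 2024, 102], [101, 7592, 102]])
print(toy_batch)
```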


# Load Main Data