<a href="https://colab.research.google.com/github/DJCordhose/ml-resources/blob/main/notebooks/foundation/transformers-sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers: sentiment analysis using pretrained models

* https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment
* https://huggingface.co/facebook/bart-large-mnli

In [1]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
tf.__version__

'2.8.0'

In [2]:
# Inference only: no GPU is needed, so on a CPU runtime nvidia-smi is expected to fail
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



In [3]:
# https://huggingface.co/transformers/installation.html
!pip install -q transformers

In [4]:
import transformers
transformers.__version__

'4.18.0'

In [5]:
sequence_0 = "I don't think its a good idea to have people driving 40 miles an hour through a light that *just* turned green, especially with the number of people running red lights, or the number of pedestrians running across at the last minute being obscured by large cars in the lanes next to you."
sequence_1 = 'MANY YEARS ago, When I was a teenager, I delivered pizza. I had a friend who, just for the fun of it, had a CB. While on a particular channel, he could key the mike with quick taps and make the light right out in front of the pizza place turn green. It was the only light that it worked on, and I was in the car with him numerous times to confirm that it worked. It was sweet.'
sequence_2 = 'The "green" thing to do is not to do anything ever, don\'t even breath!  Oh, and if you are not going to take that ridiculous standpoint then I guess this is relevant to Green because it uses Bio-fuels in one of the most harsh environments in the world, showing that dependence on tradition fuels is a choice not a necessity.'

## bert-base-multilingual-uncased-sentiment

Version for TensorFlow

https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment

In [6]:
%%time 

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

model = TFAutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model.name_or_path

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


CPU times: user 8.82 s, sys: 1.27 s, total: 10.1 s
Wall time: 13.5 s


In [7]:
# Pick one of the three example sequences to classify.
# inputs = tokenizer(sequence_0, return_tensors="tf")
# inputs = tokenizer(sequence_1, return_tensors="tf")
inputs = tokenizer(sequence_2, return_tensors="tf")

# The model returns one logit per star rating (1 to 5 stars).
logits = model(inputs)[0]
# Softmax turns the logits into probabilities; argmax is 0-based, so add 1 to get stars.
probabilities = tf.nn.softmax(logits, axis=1).numpy()[0]
stars = probabilities.argmax() + 1
logits, probabilities, stars

(<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
 array([[ 1.8335618 ,  0.9494924 , -0.21574138, -1.1014451 , -1.1619096 ]],
       dtype=float32)>,
 array([0.60787815, 0.2511135 , 0.07830969, 0.03229678, 0.03040184],
       dtype=float32),
 1)
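
The step from logits to a star rating can be reproduced by hand. The following is a minimal sketch in plain Python that reuses the logits printed above for `sequence_2`; the helper `softmax` and the probability-weighted `expected_stars` are illustrative additions and not part of the original notebook.

```python
import math

# Logits copied from the notebook output above (one value per star rating, 1 to 5).
logits = [1.8335618, 0.9494924, -0.21574138, -1.1014451, -1.1619096]


def softmax(values):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax(logits)

# Most likely class: the index is 0-based, star ratings start at 1.
stars = probabilities.index(max(probabilities)) + 1

# Assumption: a probability-weighted rating as a softer alternative to argmax.
expected_stars = sum(p * s for s, p in enumerate(probabilities, start=1))

print([round(p, 4) for p in probabilities], stars, round(expected_stars, 2))
```

The argmax picks a single rating and discards how confident the model is; the weighted value keeps some of that information, which may be useful when comparing several texts.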

## bart-large-mnli

Version for PyTorch (TensorFlow is not available)

https://huggingface.co/facebook/bart-large-mnli

In [8]:
%%time

from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
classifier.model.name_or_path

CPU times: user 5.88 s, sys: 1.99 s, total: 7.87 s
Wall time: 13.7 s


In [9]:
# sequence_to_classify = sequence_0
# sequence_to_classify = sequence_1
sequence_to_classify = sequence_2

candidate_labels = ['positive', 'negative', 'ironic']
classifier(sequence_to_classify, candidate_labels, multi_label=True)

{'labels': ['ironic', 'negative', 'positive'],
 'scores': [0.9157786965370178, 0.5182933807373047, 0.1775151491165161],
 'sequence': 'The "green" thing to do is not to do anything ever, don\'t even breath!  Oh, and if you are not going to take that ridiculous standpoint then I guess this is relevant to Green because it uses Bio-fuels in one of the most harsh environments in the world, showing that dependence on tradition fuels is a choice not a necessity.'}
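
With `multi_label=True`, each candidate label is scored independently, so the scores need not sum to 1 and several labels can apply at once. The sketch below post-processes the result shown above without reloading the model; the `THRESHOLD` of 0.5 is an arbitrary assumption chosen for illustration, not a value recommended by the model authors.

```python
# Result copied from the notebook output above (the 'sequence' key is omitted for brevity).
result = {
    "labels": ["ironic", "negative", "positive"],
    "scores": [0.9157786965370178, 0.5182933807373047, 0.1775151491165161],
}

# Pair each label with its score; the pipeline returns them sorted by score.
scores_by_label = dict(zip(result["labels"], result["scores"]))

# Independent scores: the sum is not constrained to 1 in multi-label mode.
score_sum = sum(result["scores"])

# Assumption: treat a label as applicable when its score reaches 0.5.
THRESHOLD = 0.5
accepted = [label for label, score in scores_by_label.items() if score >= THRESHOLD]

print(scores_by_label, round(score_sum, 3), accepted)
```

With the default `multi_label=False`, the scores would instead be normalized across the candidate labels, so picking the top label would be the more natural reading.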