# Behind the pipeline (TensorFlow)

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [1]:
!pip install datasets evaluate transformers[sentencepiece]


In [14]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    ["The Bose QuietComfort 35 II has some of the best noise cancelling in the business, and it's regarded as one of the most comfortable headsets.",
     "With a great default frequency response, travel-friendly design, and solid mic system, there's plenty to love about this headset.",
     "Not only do Bose have a quality issue but they have poor customer service.",
     "Do NOT buy their products, clearly you pay a high price for a low quality product."
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9995784163475037},
 {'label': 'POSITIVE', 'score': 0.9996507167816162},
 {'label': 'NEGATIVE', 'score': 0.999603807926178},
 {'label': 'NEGATIVE', 'score': 0.9986317753791809}]

In [15]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [17]:
raw_inputs = [
    "The Bose QuietComfort 35 II has some of the best noise cancelling in the business, and it's regarded as one of the most comfortable headsets.",
    "With a great default frequency response, travel-friendly design, and solid mic system, there's plenty to love about this headset.",
    "Not only do Bose have a quality issue but they have poor customer service.",
    "Do NOT buy their products, clearly you pay a high price for a low quality product.",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)

{'input_ids': <tf.Tensor: shape=(4, 35), dtype=int32, numpy=
array([[  101,  1996, 21299,  4251,  9006, 13028,  3486,  2462,  2038,
         2070,  1997,  1996,  2190,  5005, 17542,  2989,  1999,  1996,
         2449,  1010,  1998,  2009,  1005,  1055,  5240,  2004,  2028,
         1997,  1996,  2087,  6625,  4641,  8454,  1012,   102],
       [  101,  2007,  1037,  2307, 12398,  6075,  3433,  1010,  3604,
         1011,  5379,  2640,  1010,  1998,  5024, 23025,  2291,  1010,
         2045,  1005,  1055,  7564,  2000,  2293,  2055,  2023,  4641,
         3388,  1012,   102,     0,     0,     0,     0,     0],
       [  101,  2025,  2069,  2079, 21299,  2031,  1037,  3737,  3277,
         2021,  2027,  2031,  3532,  8013,  2326,  1012,   102,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0],
       [  101,  2079,  2025,  4965,  2037,  3688,  1010,  4415,  2017,
         3477,  1037,  2152,  3976
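
The printed tensor is cut off, but the trailing zeros in the shorter rows show what `padding=True` does: every sequence is right-padded to the length of the longest one in the batch. The sketch below reproduces that idea in plain Python on two rows copied from the output above. It is an illustration only, not the tokenizer's implementation: `pad_batch` is a hypothetical helper, and the padding id of 0 is an assumption read off the printed zeros. The mask it builds mirrors the `attention_mask` that tokenizers typically return alongside `input_ids`.

```python
PAD_ID = 0  # assumption: padding id inferred from the zeros in the printed tensor


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad each sequence to the longest length and build a 0/1 mask.

    Hypothetical helper for illustration; the real tokenizer does this internally.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Second and third rows of the printed output, with their padding stripped
row_2 = [101, 2007, 1037, 2307, 12398, 6075, 3433, 1010, 3604, 1011,
         5379, 2640, 1010, 1998, 5024, 23025, 2291, 1010, 2045, 1005,
         1055, 7564, 2000, 2293, 2055, 2023, 4641, 3388, 1012, 102]
row_3 = [101, 2025, 2069, 2079, 21299, 2031, 1037, 3737, 3277, 2021,
         2027, 2031, 3532, 8013, 2326, 1012, 102]

padded_ids, mask = pad_batch([row_2, row_3])
print(len(padded_ids[0]), len(padded_ids[1]))
print(mask[1])
```

In the notebook's batch the longest sentence has 35 tokens, so every row is padded to 35 rather than to the 30 used in this two-row example.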

In [18]:
from transformers import TFAutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModel.from_pretrained(checkpoint)

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertModel: ['dropout_19', 'classifier', 'pre_classifier']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.


In [19]:
outputs = model(inputs)
print(outputs.last_hidden_state.shape)

(4, 35, 768)


In [20]:
from transformers import TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(inputs)

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_77']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [21]:
print(outputs.logits.shape)

(4, 2)


In [22]:
print(outputs.logits)

tf.Tensor(
[[-3.7867115  3.9842193]
 [-3.8609347  4.0983863]
 [ 4.320976  -3.512229 ]
 [ 3.6102726 -2.982627 ]], shape=(4, 2), dtype=float32)


In [23]:
import tensorflow as tf

predictions = tf.math.softmax(outputs.logits, axis=-1)
print(predictions)

tf.Tensor(
[[4.2164265e-04 9.9957842e-01]
 [3.4926823e-04 9.9965072e-01]
 [9.9960381e-01 3.9619621e-04]
 [9.9863178e-01 1.3681875e-03]], shape=(4, 2), dtype=float32)
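
To see what the softmax call does to the logits, the computation can be repeated by hand. The sketch below is a plain-Python version, assuming the logits printed earlier in this notebook are copied in as literals; `softmax` here is a hypothetical helper, not the TensorFlow function. Subtracting the row maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(row):
    """Softmax over one row of logits (hypothetical helper for illustration)."""
    row_max = max(row)
    # Shifting by the max avoids overflow in exp() without changing the output
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the notebook output above
logits = [
    [-3.7867115, 3.9842193],
    [-3.8609347, 4.0983863],
    [4.320976, -3.512229],
    [3.6102726, -2.982627],
]

probabilities = [softmax(row) for row in logits]
for row in probabilities:
    print(row, "sum =", sum(row))
```

Each row sums to 1, and the values agree with the printed `predictions` tensor up to float32 rounding.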


In [25]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
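
With the label mapping in hand, the probabilities can be turned back into the same kind of output the pipeline printed at the start. The sketch below does this in plain Python, assuming the `predictions` values and the `id2label` dictionary shown above are copied in as literals; `to_labels` is a hypothetical helper, not part of Transformers.

```python
# Values copied from the notebook outputs above
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
predictions = [
    [4.2164265e-04, 9.9957842e-01],
    [3.4926823e-04, 9.9965072e-01],
    [9.9960381e-01, 3.9619621e-04],
    [9.9863178e-01, 1.3681875e-03],
]


def to_labels(probs, mapping):
    """Pick the most probable class per row and attach its label (hypothetical helper)."""
    results = []
    for row in probs:
        best = max(range(len(row)), key=lambda i: row[i])  # argmax over the class axis
        results.append({"label": mapping[best], "score": row[best]})
    return results


results = to_labels(predictions, id2label)
for item in results:
    print(item)
```

The labels line up with the pipeline result from the first cell: the two favorable reviews are POSITIVE and the two complaints are NEGATIVE.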