In [None]:
!pip install datasets transformers[sentencepiece]


In [2]:
from transformers import pipeline

sent_class = pipeline("sentiment-analysis")
sent_class(
    [
     " i am excited about this course",
     " i do not think this path is the best for me"
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.999842643737793},
 {'label': 'NEGATIVE', 'score': 0.9995841383934021}]


The pipeline groups together three steps:
1. preprocessing the raw text,
2. passing the inputs through the model, and
3. postprocessing the model's outputs.

1. Preprocessing with a tokenizer
Models cannot process raw text directly, so the first step converts the text into numbers. To do this we use a tokenizer, which is responsible for:

a. Splitting the input into words, subwords, or symbols (like punctuation) that are called tokens
b. Mapping each token to an integer
c. Adding additional inputs that may be useful to the model
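
The three responsibilities above can be sketched with a toy tokenizer. This is only an illustration: `TOY_VOCAB` and `toy_tokenize` are hypothetical names, the vocabulary is made up, and the splitting is plain whitespace splitting rather than the subword algorithm the real checkpoint uses.

```python
# Hypothetical toy vocabulary (an assumption for illustration only);
# the real tokenizer ships with its own, much larger vocabulary.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "i": 4, "am": 5, "excited": 6, "about": 7, "this": 8, "course": 9,
}


def toy_tokenize(text):
    """Illustrate steps a, b and c on a single sentence."""
    # (a) Split the input into tokens (here: lowercase + whitespace split).
    tokens = text.lower().split()
    # (c) Add extra inputs the model expects: special start/end tokens.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # (b) Map each token to an integer, falling back to [UNK] if unknown.
    input_ids = [TOY_VOCAB.get(token, TOY_VOCAB["[UNK]"]) for token in tokens]
    # (c) Another extra input: a mask marking every real token with 1.
    attention_mask = [1] * len(input_ids)
    return {"tokens": tokens, "input_ids": input_ids, "attention_mask": attention_mask}


encoded = toy_tokenize(" i am excited about this course")
print(encoded["tokens"])
print(encoded["input_ids"])  # [2, 4, 5, 6, 7, 8, 9, 3]
```

The real tokenizer follows the same outline but produces different ids, because its vocabulary and splitting rules come from the checkpoint.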

In [3]:
from transformers import AutoTokenizer

checkpoints = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoints)

In [4]:
example_inputs = [
    " i am excited about this course",
    " i do not think this path is the best for me",
]

In [6]:
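# padding=True pads shorter sentences to the longest one in the batch,
# truncation=True cuts sentences longer than the model's maximum length,
# return_tensors="pt" returns PyTorch tensors.
# Note: the name `input` shadows Python's built-in input(); later cells rely on it.
input = tokenizer(example_inputs, padding=True, truncation=True, return_tensors="pt")
print(input)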

{'input_ids': tensor([[ 101, 1045, 2572, 7568, 2055, 2023, 2607,  102,    0,    0,    0,    0,
            0],
        [ 101, 1045, 2079, 2025, 2228, 2023, 4130, 2003, 1996, 2190, 2005, 2033,
          102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
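
To see where the trailing zeros and the attention mask in the output above come from, here is a minimal sketch of padding in plain Python. `pad_batch` is a hypothetical helper, not part of the library, and the padding id of 0 is read off the printed output rather than looked up in the tokenizer.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the longest one and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Unpadded ids copied from the tokenizer output above.
first = [101, 1045, 2572, 7568, 2055, 2023, 2607, 102]
second = [101, 1045, 2079, 2025, 2228, 2023, 4130, 2003, 1996, 2190, 2005, 2033, 102]

padded_ids, mask = pad_batch([first, second])
print(padded_ids[0])  # [101, 1045, 2572, 7568, 2055, 2023, 2607, 102, 0, 0, 0, 0, 0]
print(mask[0])        # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

The first sentence has 8 tokens and the second has 13, so the first one receives 5 padding positions.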


In [7]:
from transformers import AutoModel

model = AutoModel.from_pretrained(checkpoints)

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertModel: ['pre_classifier.bias', 'classifier.weight', 'classifier.bias', 'pre_classifier.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [9]:
outputs = model(**input)
print(outputs.last_hidden_state.shape)

torch.Size([2, 13, 768])


In [18]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoints)
outputs = model(**input)

In [19]:
print(outputs.logits.shape)


torch.Size([2, 2])
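
The shape changes from `[2, 13, 768]` to `[2, 2]` because a classification head sits on top of the base model. The sketch below is an assumption-laden illustration of that idea, not the library's implementation: it assumes the head reads the hidden vector at the first position, applies a linear layer with ReLU, then a final linear layer (matching the `pre_classifier` and `classifier` weights named in the warning above). Dropout is omitted, the weights are random, and the hidden size is shrunk from 768 to 4.

```python
import random


def linear(vector, weights, bias):
    """Compute W @ vector + bias, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random values (stand-in for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def classification_head(hidden_states, pre_w, pre_b, cls_w, cls_b):
    """Map [batch, seq_len, hidden] hidden states to [batch, num_labels] logits."""
    logits = []
    for sequence in hidden_states:
        first_token = sequence[0]  # hidden vector at position 0
        pooled = [max(0.0, v) for v in linear(first_token, pre_w, pre_b)]  # ReLU
        logits.append(linear(pooled, cls_w, cls_b))
    return logits


rng = random.Random(0)
batch_size, seq_len, hidden_size, num_labels = 2, 13, 4, 2  # toy hidden size

hidden_states = [random_matrix(seq_len, hidden_size, rng) for _ in range(batch_size)]
pre_w, pre_b = random_matrix(hidden_size, hidden_size, rng), [0.0] * hidden_size
cls_w, cls_b = random_matrix(num_labels, hidden_size, rng), [0.0] * num_labels

head_logits = classification_head(hidden_states, pre_w, pre_b, cls_w, cls_b)
print((len(head_logits), len(head_logits[0])))  # (2, 2)
```

Whatever the sequence length, the head returns one row per sentence and one column per label.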


In [20]:
print(outputs.logits)

tensor([[-4.2239,  4.5328],
        [ 4.2997, -3.4851]], grad_fn=<AddmmBackward0>)


In [21]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

model.config.id2label

tensor([[1.5738e-04, 9.9984e-01],
        [9.9958e-01, 4.1586e-04]], grad_fn=<SoftmaxBackward0>)


{0: 'NEGATIVE', 1: 'POSITIVE'}

First sentence: NEGATIVE: 0.00016, POSITIVE: 0.99984
Second sentence: NEGATIVE: 0.99958, POSITIVE: 0.00042
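
The postprocessing step can be checked by hand. The sketch below reimplements softmax in plain Python and applies it to the logits printed above (copied to four decimals, so the results agree with the notebook only up to rounding). The `softmax` helper here is written for illustration and is not the PyTorch function.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [[-4.2239, 4.5328], [4.2997, -3.4851]]  # copied from the output above
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(id2label[best], round(probs[best], 4))
# POSITIVE 0.9998
# NEGATIVE 0.9996
```

Index 0 of each row corresponds to NEGATIVE and index 1 to POSITIVE, as given by `model.config.id2label`.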

✏️ Try it out! Choose two (or more) texts of your own and run them through the sentiment-analysis pipeline. Then replicate the steps you saw here yourself and check that you obtain the same results!

In [22]:
text1 = [
    "i have so many experience with dating yet i keep falling for the wrong person",
    " i will eat healthy and protect my mental health",
]

text2 = [
    " the probability of my happiness is great",
    " i think i will just quit forever",
]

In [24]:
sent_class(text1)


[{'label': 'NEGATIVE', 'score': 0.9921945333480835},
 {'label': 'POSITIVE', 'score': 0.9995775818824768}]

In [25]:
sent_class(text2)

[{'label': 'POSITIVE', 'score': 0.9998756647109985},
 {'label': 'NEGATIVE', 'score': 0.999275267124176}]

In [29]:
input1 = tokenizer(text1, padding=True, truncation=True, return_tensors="pt")

output1 = model(**input1)

prediction1 = torch.nn.functional.softmax(output1.logits, dim=-1)
print(prediction1)

model.config.id2label

tensor([[9.9219e-01, 7.8055e-03],
        [4.2247e-04, 9.9958e-01]], grad_fn=<SoftmaxBackward0>)


{0: 'NEGATIVE', 1: 'POSITIVE'}

In [30]:
input2 = tokenizer(text2, padding=True, truncation=True, return_tensors="pt")

output2 = model(**input2)

prediction2 = torch.nn.functional.softmax(output2.logits, dim=-1)
print(prediction2)

model.config.id2label

tensor([[1.2433e-04, 9.9988e-01],
        [9.9928e-01, 7.2478e-04]], grad_fn=<SoftmaxBackward0>)


{0: 'NEGATIVE', 1: 'POSITIVE'}
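
To confirm that the manual steps reproduce the pipeline, the probabilities can be converted into the pipeline's label and score format and compared. This is a small sketch in plain Python: `to_pipeline_format` is a hypothetical helper, and the numbers are copied from the outputs for `text1` above, so the comparison uses a tolerance to allow for the rounding in the printed tensor.

```python
def to_pipeline_format(probabilities, id2label):
    """Keep the most likely label per row, like the pipeline output."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        results.append({"label": id2label[best], "score": row[best]})
    return results


id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Softmax output for text1, copied from the printed tensor.
prediction1 = [[9.9219e-01, 7.8055e-03], [4.2247e-04, 9.9958e-01]]
# Scores returned by sent_class(text1).
pipeline_scores = [0.9921945333480835, 0.9995775818824768]

manual = to_pipeline_format(prediction1, id2label)
for result, expected in zip(manual, pipeline_scores):
    print(result["label"], abs(result["score"] - expected) < 1e-4)
# NEGATIVE True
# POSITIVE True
```

The same check applies to `text2` with the values from its cells.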