**INITIALIZATION:**
- I put these three lines at the top of each notebook. The first two enable the `autoreload` extension, so edited modules are reloaded automatically before code runs and I avoid stale imports when re-running the same project. The third line renders Matplotlib plots inline within the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**LIBRARIES AND DEPENDENCIES:**
- I import all the libraries and dependencies required for the project in a single cell.

In [3]:
#@ INSTALLING DEPENDENCIES: UNCOMMENT BELOW: 
# !pip install transformers[sentencepiece]

In [18]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES:
import torch
import transformers
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModel
from transformers import AutoModelForSequenceClassification

**SENTIMENT ANALYSIS:**

In [6]:
#@ IMPLEMENTATION OF SENTIMENT ANALYSIS PIPELINE:
classifier = pipeline("sentiment-analysis")                                 # Initializing Classifier Object. 
classifier("I've started the HuggingFace course which fascinates me.")      # Inspecting Sentiment.

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9997233748435974}]

**PREPROCESSING WITH TOKENIZER:**
- The **tokenizer** will be responsible for: 
    - Splitting the input into words, subwords, or symbols such as punctuation, all of which are called tokens. 
    - Mapping each token to an integer. 
    - Adding additional inputs that may be useful to the model. 
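
- The three steps above can be sketched with a toy tokenizer in plain Python. This is only an illustration: `TOY_VOCAB`, `toy_tokenize`, and `toy_encode` are hypothetical names, the regex split is a stand-in for the real WordPiece subword algorithm, and the ids are read off by aligning the sentence "I think the course is awesome!" with the `input_ids` that the real tokenizer prints for it in this notebook.

```python
import re

# Hypothetical toy vocabulary covering a single sentence. The ids were read off
# the notebook's printed `input_ids` by aligning words with ids; this is an
# assumption for illustration, not the checkpoint's real vocabulary file.
TOY_VOCAB = {
    "[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
    "i": 1045, "think": 2228, "the": 1996, "course": 2607,
    "is": 2003, "awesome": 12476, "!": 999,
}

def toy_tokenize(text):
    """Step 1: lowercase, then split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def toy_encode(text):
    """Steps 2 and 3: add special tokens, then map every token to an integer."""
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]
    return [TOY_VOCAB[token] for token in tokens]

tokens = toy_tokenize("I think the course is awesome!")
ids = toy_encode("I think the course is awesome!")
print(tokens)
print(ids)
```

- A real tokenizer would also split unknown words into subwords instead of raising a `KeyError`, which is why the checkpoint's own tokenizer should always be used in practice.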

In [8]:
#@ INITIALIZATION OF AUTOTOKENIZER: 
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"                  # Initialization. 
tokenizer = AutoTokenizer.from_pretrained(checkpoint)                           # Initializing Tokenizer. 

In [9]:
#@ CONVERTING INTO TENSORS:
raw_inputs = ["I've started the HuggingFace course which fascinates me.",
              "I will no longer read it's documentation.",
              "I think the course is awesome!"]                                 # Initializing Input Text. 
inputs = tokenizer(raw_inputs, padding=True, truncation=True,                   # Converting into Equal Size. 
                   return_tensors="pt")                                         # Getting PyTorch Tensors. 
print(inputs)                                                                   # Inspecting Tensors with Attention Mask. 

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2318,  1996, 17662, 12172,  2607,  2029,
          6904, 11020, 28184,  2033,  1012,   102],
        [  101,  1045,  2097,  2053,  2936,  3191,  2009,  1005,  1055, 12653,
          1012,   102,     0,     0,     0,     0],
        [  101,  1045,  2228,  1996,  2607,  2003, 12476,   999,   102,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])}
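
- The effect of `padding=True` can be reproduced by hand. The sketch below is a minimal pure-Python version, assuming a pad id of 0 and right-side padding as seen in the output above; `pad_batch` is a hypothetical helper, not part of the `transformers` API. The unpadded ids are copied from the printed `input_ids`.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the longest one and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask

# Unpadded token ids, copied from the tokenizer output above.
unpadded = [
    [101, 1045, 1005, 2310, 2318, 1996, 17662, 12172, 2607, 2029,
     6904, 11020, 28184, 2033, 1012, 102],
    [101, 1045, 2097, 2053, 2936, 3191, 2009, 1005, 1055, 12653, 1012, 102],
    [101, 1045, 2228, 1996, 2607, 2003, 12476, 999, 102],
]

input_ids, attention_mask = pad_batch(unpadded)
for ids_row, mask_row in zip(input_ids, attention_mask):
    print(len(ids_row), sum(mask_row))  # padded length, number of real tokens
```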


**GOING THROUGH MODEL:**
- The tensor output by the Transformer module is usually large. It generally has three dimensions:
    - **Batch size:** The number of sequences processed at a time (3 here). 
    - **Sequence length:** The number of tokens in each padded sequence (16 here). 
    - **Hidden size:** The dimension of the vector that represents each token (768 for this checkpoint). 

In [12]:
#@ INITIALIZATION OF TRANSFORMER MODEL:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"                  # Initialization. 
model = AutoModel.from_pretrained(checkpoint)                                   # Initializing Model. 

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertModel: ['classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'pre_classifier.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [13]:
#@ INSPECTING VECTOR DIMENSIONS:
output = model(**inputs)                                                        # Running Forward Pass. 
print(output.last_hidden_state.shape)                                           # Inspecting Shape. 

torch.Size([3, 16, 768])


In [16]:
#@ INITIALIZATION OF SEQUENCE CLASSIFICATION TRANSFORMER MODEL:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"                  # Initialization. 
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)          # Initializing Model.
output = model(**inputs)                                                        # Running Forward Pass. 
print(output.logits.shape)                                                      # Inspecting Shape.  

torch.Size([3, 2])
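
- To see how hidden states of shape `[3, 16, 768]` can become logits of shape `[3, 2]`, here is a rough sketch of a classification head. It is an assumption, not the library's implementation: the two linear layers are suggested by the `pre_classifier` and `classifier` weights named in the warning above, the first-token pooling is my guess at the usual approach, and the inputs and weights are random stand-ins.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, hidden_size, num_labels = 3, 16, 768, 2

# Random stand-in for `output.last_hidden_state` (not real model output).
hidden_states = torch.randn(batch_size, seq_len, hidden_size)

# Hypothetical head with randomly initialized weights.
head = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),   # plays the role of `pre_classifier`
    nn.ReLU(),
    nn.Linear(hidden_size, num_labels),    # plays the role of `classifier`
)

pooled = hidden_states[:, 0]               # keep the first token's vector per sequence
logits = head(pooled)                      # one score per label per sequence
print(pooled.shape, logits.shape)
```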


**POSTPROCESSING THE OUTPUT:**

In [17]:
#@ INSPECTING THE OUTPUT:
print(output.logits)

tensor([[-3.9449,  4.2477],
        [ 4.1948, -3.3446],
        [-4.2149,  4.5673]], grad_fn=<AddmmBackward0>)


In [20]:
#@ CONVERTING LOGITS INTO PROBABILITIES:
predictions = torch.nn.functional.softmax(output.logits, dim=-1)                # Applying Softmax Function. 
print(predictions)                                                              # Inspecting Prediction Probabilities. 

tensor([[2.7661e-04, 9.9972e-01],
        [9.9947e-01, 5.3144e-04],
        [1.5341e-04, 9.9985e-01]], grad_fn=<SoftmaxBackward0>)
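
- The softmax step can be checked by hand. The sketch below applies softmax(x)_i = exp(x_i) / sum_j exp(x_j) row by row in plain Python, using the logits printed above; `softmax` here is my own helper, not the `torch` function. Because the displayed logits are rounded to four decimals, the results should agree with the printed probabilities only up to rounding.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    largest = max(row)                          # subtracting the max avoids overflow
    exps = [math.exp(x - largest) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output above (rounded as displayed).
logits = [
    [-3.9449, 4.2477],
    [4.1948, -3.3446],
    [-4.2149, 4.5673],
]

probs = [softmax(row) for row in logits]
for row in probs:
    print([f"{p:.4e}" for p in row])
```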


In [22]:
#@ INSPECTING MODEL ATTRIBUTE:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
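
- Combining the probabilities with `id2label` gives a readable prediction per sentence. This is a minimal sketch in plain Python using the values printed above; the `results` structure only imitates the label and score format returned by the pipeline earlier and is not produced by the library here.

```python
# Mapping and probabilities copied from the outputs above.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
probabilities = [
    [2.7661e-04, 9.9972e-01],
    [9.9947e-01, 5.3144e-04],
    [1.5341e-04, 9.9985e-01],
]

results = []
for row in probabilities:
    best = max(range(len(row)), key=row.__getitem__)   # index of the highest probability
    results.append({"label": id2label[best], "score": row[best]})

for result in results:
    print(result)
```

- The first and third sentences come out as POSITIVE and the second as NEGATIVE.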