# Import Required Libraries

In [2]:
# AutoTokenizer and AutoModelForSequenceClassification load the pre-trained tokenizer and model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# softmax converts the model's raw scores into probabilities
from scipy.special import softmax

# Load Pre-trained Model and Tokenizer
### Load the pre-trained model and tokenizer. The specified model, `cardiffnlp/twitter-roberta-base-sentiment`, is designed for sentiment analysis on tweets.

In [3]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


In [5]:
Test = 'A wonderful little production. The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. The actors are extremely well chosen- Michael Sheen not only "has got all the polari" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams\' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master\'s of comedy and his life. The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional \'dream\' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell\'s murals decorating every surface) are terribly well done.'
print(Test)

A wonderful little production. The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. The actors are extremely well chosen- Michael Sheen not only "has got all the polari" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.


## Tokenization
### The tokenizer converts the input text into token IDs, returned as PyTorch tensors, which is the format the model expects

In [16]:
encoded_text = tokenizer(Test, return_tensors='pt')
encoded_text

{'input_ids': tensor([[    0,   250,  4613,   410,   931,     4,    20,  9293,  9205,    16,
           182,   542, 34730,    12,   182,   793,    12,   958,    12, 28713,
          2734,     8,  2029,    10, 29090,     6,     8,  2128, 19535,   154,
             6,  1472,     9, 35402,     7,     5,  1445,  2125,     4,    20,
          5552,    32,  2778,   157,  4986,    12,   988, 38172,    45,   129,
            22,  7333,   300,    70,     5,  8385,  1512,   113,    53,    37,
            34,    70,     5,  6820,   159, 10512,   350,   328,   370,    64,
          3127,   192,     5, 19745,  5390, 10346,    30,     5, 13115,     7,
          1604,   108, 25694, 11410,     6,    45,   129,    16,    24,   157,
           966,     5,  2494,    53,    24,    16,    10, 14353,   352,  1982,
             8,  3744,  2125,     4,    83,  4710,  2650,   931,    59,    65,
             9,     5,   372,  4710,    18,     9,  5313,     8,    39,   301,
             4,    20, 35402,   269,  
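RoBERTa-base models generally accept at most 512 tokens, so text much longer than this review would typically need to be truncated before prediction. A minimal sketch, assuming a hypothetical helper name `encode_for_model` and an assumed limit of 512; `truncation` and `max_length` are standard options of the tokenizer call in transformers:

```python
def encode_for_model(tokenizer, text, max_length=512):
    """Tokenize text, cutting it off at max_length tokens if it is longer."""
    return tokenizer(
        text,
        return_tensors='pt',    # PyTorch tensors, as in the cell above
        truncation=True,        # drop tokens beyond max_length
        max_length=max_length,  # assumed limit for RoBERTa-base models
    )
```

With the tokenizer loaded earlier, this could be called as `encoded_text = encode_for_model(tokenizer, Test)`. The limit is passed explicitly rather than relying on the tokenizer's own default, which may not be set for every checkpoint.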

## Model Prediction step
### The pre-trained model generates raw sentiment scores (logits) from the tokenized input. The softmax function then normalizes these scores into probabilities that sum to 1. Finally, the results are displayed as a dictionary giving the percentage likelihood of negative, neutral, and positive sentiment.

In [14]:
output = model(**encoded_text)  # Run the model on the tokenized input
scores = output[0][0].detach().numpy()  # Raw scores for the first (only) input, as a NumPy array
scores = softmax(scores)  # Convert the raw scores to probabilities that sum to 1
scores_dict = {
    'negative': scores[0] * 100,
    'neutral': scores[1] * 100,
    'positive': scores[2] * 100,
}
print(scores_dict)

{'negative': 0.38185848388820887, 'neutral': 3.4171227365732193, 'positive': 96.20102047920227}
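The softmax step can be tried without the model. A minimal sketch, assuming a hypothetical helper name `stable_softmax` and made-up raw scores (the values are illustrative only, not output from the model above):

```python
import numpy as np

def stable_softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Illustrative raw scores only; these are not taken from the model.
example_logits = [-2.0, 0.5, 3.0]
labels = ['negative', 'neutral', 'positive']

probs = stable_softmax(example_logits)
percentages = {label: float(p) * 100 for label, p in zip(labels, probs)}
top_label = labels[int(np.argmax(probs))]  # label with the highest probability

print(percentages)
print(top_label)
```

Because softmax preserves the ordering of the raw scores, the label with the largest raw score is also the label with the largest percentage.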


### In this project, we demonstrated how to use a pre-trained Transformer model for sentiment analysis. We tokenized the input text, passed it through the model to obtain raw sentiment scores, and normalized those scores with the softmax function. The same steps can be applied to other texts, such as reviews and social media comments, keeping in mind that this model is designed for tweets.
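The steps summarized above could be wrapped in one reusable function. A minimal sketch, assuming hypothetical names `predict_sentiment` and `LABELS`, and assuming the `tokenizer` and `model` objects loaded earlier are passed in as arguments:

```python
import numpy as np

# Label order assumed to match the dictionary built in the prediction cell.
LABELS = ('negative', 'neutral', 'positive')

def predict_sentiment(text, tokenizer, model):
    """Return sentiment percentages for one piece of text."""
    encoded = tokenizer(text, return_tensors='pt')  # tokenize the input
    output = model(**encoded)                       # raw scores from the model
    logits = output[0][0].detach().numpy()          # first (only) item in the batch
    exps = np.exp(logits - logits.max())            # softmax, shifted for stability
    probs = exps / exps.sum()
    return {label: float(p) * 100 for label, p in zip(LABELS, probs)}
```

It could then be called as `predict_sentiment(Test, tokenizer, model)`, or on any other string, without repeating the individual cells.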