# Sentiment Analysis with Hugging Face

Hugging Face is an open-source platform and provider of machine learning technologies. You can install its packages to access pre-built models, use them directly or fine-tune them (retrain them on your own dataset, leveraging the knowledge gained during the original training), and then host your trained models on the platform so that you can use them later on other devices and in other apps.

Please [go to the website and sign in](https://huggingface.co/) to access all the features of the platform.

[Read more about Text classification with Hugging Face](https://huggingface.co/tasks/text-classification)

Hugging Face models are deep-learning based, so training them requires a lot of GPU compute. Please use [Colab](https://colab.research.google.com/), another cloud GPU provider, or a local machine with an NVIDIA GPU.

In [None]:
!pip install huggingface_hub transformers datasets 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting huggingface_hub
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting datasets
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting xxh

The command above installs the Python packages needed for this project:

huggingface_hub: a client library for the Hugging Face Hub, used to log in and to store, version, and share trained models and other assets.

transformers: a popular Python library for natural language processing (NLP) tasks, with support for PyTorch and TensorFlow.

datasets: a library for loading and processing datasets, used here to read the CSV files.
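
To confirm that the installation worked, the installed versions can be checked from Python. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of any Hugging Face package.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package installed by the pip command above
for name in ("huggingface_hub", "transformers", "datasets"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package is reported as not installed, rerun the `pip install` cell before continuing.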


In [None]:
!huggingface-cli login   # authenticate with the Hugging Face Hub from the command line


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid.
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
import os
import uuid
from scipy.special import softmax   
import pandas as pd
import numpy as np
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from transformers import (
    AutoTokenizer,
    AutoConfig, 
    AutoModelForSequenceClassification,
    IntervalStrategy,
    TrainingArguments,
    EarlyStoppingCallback,
    pipeline,
    Trainer
) 



AutoTokenizer: A class for tokenizing input data into sequences that can be fed into a Transformer model. It automatically selects the appropriate tokenizer based on the name of the pre-trained model.

AutoConfig: A class for loading the configuration of a pre-trained model, including its architecture and hyperparameters.

AutoModelForSequenceClassification: A class for loading a pre-trained model for sequence classification. It automatically selects the appropriate pre-trained model based on the name of the model.

IntervalStrategy: An enum (`no`, `steps`, `epoch`) that defines when evaluation, saving, and logging happen during training.

TrainingArguments: A class that contains various hyperparameters and settings for training a model, including the number of epochs, the learning rate, and the batch size.

EarlyStoppingCallback: A callback that stops training when the monitored metric, such as the validation loss, has not improved for a given number of consecutive evaluations.

pipeline: A function that creates a simple pipeline for performing inference with a pre-trained model. The pipeline can be used to perform tasks such as text classification or named entity recognition.

Trainer: A high-level API for training a Transformer model, which includes features such as gradient accumulation, learning rate scheduling, and mixed-precision training.

## Application of Hugging Face Text Classification Model Fine-tuning

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Find below a simple example, with just `10 epochs of fine-tuning`.

Read more about the fine-tuning concept [here](https://deeplizard.com/learn/video/5T-iXNNiwIs#:~:text=Fine%2Dtuning%20is%20a%20way,perform%20a%20second%20similar%20task.).

This code sets the environment variable "WANDB_DISABLED" to "true", which disables the use of the Weights and Biases (W&B) tool. W&B is a third-party tool that can be used to track and visualize the training progress of machine learning models. By setting this environment variable, you are telling your code to not use this tool.

In [None]:
# Disable W&B
os.environ["WANDB_DISABLED"] = "true"

In [None]:
# Directory where the train/eval subsets will be saved
file_path = "/content/sample_data/"

# URL of the training CSV file
url = "https://github.com/Azubi-Africa/Career_Accelerator_P5-NLP/raw/master/zindi_challenge/data/Train.csv"

# Load the CSV file into a DataFrame
df = pd.read_csv(url)


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10001 entries, 0 to 10000
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   tweet_id   10001 non-null  object 
 1   safe_text  10001 non-null  object 
 2   label      10000 non-null  float64
 3   agreement  9999 non-null   float64
dtypes: float64(2), object(2)
memory usage: 312.7+ KB


In [None]:
# Count the missing values in each column
df.isnull().sum()

tweet_id     0
safe_text    0
label        1
agreement    2
dtype: int64

In [None]:
# Select rows with missing values
df[df.isnull().any(axis=1)]

Unnamed: 0,tweet_id,safe_text,label,agreement
4798,RQMQ0L2A,#lawandorderSVU,,
4799,I cannot believe in this day and age some pare...,1,0.666667,


In [None]:
# Extract complete text from 'safe_text' column
complete_text = df.iloc[4798]['safe_text']
complete_text

'#lawandorderSVU '

In [None]:
# Fill in the missing label and agreement values for row 4798
df.loc[4798, 'label'] = 0
df.loc[4798, 'agreement'] = 0.666667

# Write the extracted text back into the safe_text column (position-based .iloc indexing)
df.iloc[4798, df.columns.get_loc('safe_text')] = complete_text


In [None]:
df.iloc[4798]

tweet_id             RQMQ0L2A
safe_text    #lawandorderSVU 
label                     0.0
agreement            0.666667
Name: 4798, dtype: object

In [None]:
# Generate a random UUID string to use as tweet_id.
# UUIDs are often used to generate unique IDs for entities,
# track unique user sessions, or create unique file names.
rand_tweet_id = str(uuid.uuid4())

# Select the row by index and assign values to its columns
row_index = 4799
df.loc[row_index, 'tweet_id'] = rand_tweet_id
df.loc[row_index, 'label'] = 1
df.loc[row_index, 'agreement'] = 0.666667

# Position 1 is the safe_text column, so this assignment leaves safe_text unchanged
df.iloc[row_index, df.columns.get_loc('safe_text')] = df.iloc[row_index, 1]


In [None]:
df.iloc[4799]

tweet_id     3f03028b-732f-487a-b512-f834a5d6a108
safe_text                                       1
label                                         1.0
agreement                                0.666667
Name: 4799, dtype: object

In [None]:
df[df.duplicated()].sum()

tweet_id     0.0
safe_text    0.0
label        0.0
agreement    0.0
dtype: float64

I manually split the training set into a training subset (the data the model learns from) and an evaluation subset (the data used to compute metric scores, which helps detect training problems such as [overfitting](https://www.ibm.com/cloud/learn/overfitting)).

There are multiple ways to split a dataset. Here, scikit-learn's `train_test_split` is used with `stratify`, so that both subsets keep the same label proportions.
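
To illustrate what a stratified split does, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: the function `stratified_split` and the toy labels are hypothetical, and the sketch only shows the idea of splitting each label group separately so that the proportions are preserved.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, eval_idx) so that each label keeps roughly the same share in both."""
    rng = random.Random(seed)

    # Group the row indices by label
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, eval_idx = [], []
    for indices in by_label.values():
        # Shuffle within each label group, then cut off the evaluation share
        rng.shuffle(indices)
        n_eval = round(len(indices) * test_size)
        eval_idx.extend(indices[:n_eval])
        train_idx.extend(indices[n_eval:])
    return sorted(train_idx), sorted(eval_idx)


# Hypothetical toy labels: 10 negative, 50 neutral, 40 positive
labels = [-1] * 10 + [0] * 50 + [1] * 40
train_idx, eval_idx = stratified_split(labels)
print(len(train_idx), len(eval_idx))
```

Each label contributes 20% of its rows to the evaluation subset, so a rare class cannot end up entirely on one side of the split.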

In [None]:
# Split the train data => {train, eval}
train, eval = train_test_split(df, test_size=0.2, random_state=42, stratify=df['label'])

In [None]:
train.head()

Unnamed: 0,tweet_id,safe_text,label,agreement
1641,CQDD6QLM,"New <user> ""Hey Love"" #MMR #ManyMenRecords #Yo...",0.0,1.0
3907,5GV8NEZS,S1256 [NEW] Extends exemption from charitable ...,0.0,1.0
336,I4D043ST,<user> esp when mercury free vaccines are avai...,1.0,0.666667
6861,CKX52Y8G,"My Life, Your Entertainment #YOTC #MMR @ Exoti...",0.0,1.0
720,07S3NL2T,Baby Luna is sore from her vaccines :( #poorpuppy,0.0,0.666667


In [None]:
eval.head()

Unnamed: 0,tweet_id,safe_text,label,agreement
5818,Y8PQ0BT7,So nervous... The baby's getting vaccines... (...,1.0,0.666667
7842,C9Z6JBSS,AIDS N : A malaria vaccine in children with HI...,0.0,0.666667
880,0VE4NWWQ,Measles Outbreak Hits Texas Church That Preach...,1.0,0.666667
9072,RHQRUF14,Thank you <user> for mtg with your staff. We l...,1.0,1.0
288,ZWEP2IL4,Health district offers no-cost immunizations f...,1.0,0.666667


In [None]:
print(f"new dataframe shapes: train is {train.shape}, eval is {eval.shape}")

new dataframe shapes: train is (8000, 4), eval is (2001, 4)


In [None]:
# Save the split subsets
train.to_csv(os.path.join(file_path, "train_subset.csv"), index=False)
eval.to_csv(os.path.join(file_path, "eval_subset.csv"), index=False)

In [None]:
# Load the CSV files into a dataset

from datasets import load_dataset

dataset = load_dataset('csv', data_files={
    'train': file_path + 'train_subset.csv',
    'eval': file_path + 'eval_subset.csv'
}, encoding='ISO-8859-1')

Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-be19376cc48e0190/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...


Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating eval split: 0 examples [00:00, ? examples/s]

Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-be19376cc48e0190/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. Subsequent calls will reuse this data.


  0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
tokenizer_roberta = AutoTokenizer.from_pretrained('roberta-base')

Downloading (…)lve/main/config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

In [None]:
def transform_labels(label):

    label = label['label']
    num = 0
    if label == -1: #'Negative'
        num = 0
    elif label == 0: #'Neutral'
        num = 1
    elif label == 1: #'Positive'
        num = 2

    return {'labels': num}

# Define a function that tokenizes the text data
def tokenize_data(example):
    # Tokenize the 'safe_text' column, padding every example to the model's maximum length
    return tokenizer_roberta(example['safe_text'], padding='max_length')


# Map the labels to class indices, then turn the tweets into tokens the model can use
dataset_out = dataset.map(transform_labels)
dataset_roberta = dataset_out.map(tokenize_data, batched=True)

# Transform the labels and drop the columns the model does not need
remove_columns = ['tweet_id', 'label', 'safe_text', 'agreement']

dataset_roberta = dataset_roberta.map(transform_labels, remove_columns=remove_columns)

Map:   0%|          | 0/8000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2001 [00:00<?, ? examples/s]

Map:   0%|          | 0/8000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2001 [00:00<?, ? examples/s]

Map:   0%|          | 0/8000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2001 [00:00<?, ? examples/s]

The columns specified in remove_columns are removed from the dataset because they are not needed for the subsequent analysis or model training.

tweet_id: This column contains unique identifiers for each tweet, which are not relevant for the analysis or modeling.

label: This column contains the original label values, which have already been transformed into numerical values using the transform_labels function.

safe_text: This column contains the preprocessed text data that has already been tokenized and encoded, so it is not needed for subsequent analysis or modeling.

agreement: This column indicates the level of agreement among the annotators for each tweet. While this information might be useful for some analyses, it is not necessary for the sentiment analysis task at hand.

By removing these columns, the resulting dataset is more compact and easier to work with, while retaining all the relevant information for the sentiment analysis task.
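
The label transformation described above can also be written as a dictionary lookup. This is a minimal sketch in plain Python; the names `LABEL_TO_ID`, `ID_TO_NAME`, and `to_class_index` are hypothetical and are not used elsewhere in this notebook.

```python
# Original labels (-1, 0, 1) mapped to the class indices the model is trained on
LABEL_TO_ID = {-1: 0, 0: 1, 1: 2}
ID_TO_NAME = {0: "negative", 1: "neutral", 2: "positive"}


def to_class_index(label):
    """Convert an original label such as 1.0 into its class index."""
    return LABEL_TO_ID[int(label)]


for raw in (-1.0, 0.0, 1.0):
    idx = to_class_index(raw)
    print(raw, "->", idx, ID_TO_NAME[idx])
```

Unlike `transform_labels`, which falls back to class 0 for any unexpected value, this lookup raises a `KeyError`, which makes bad labels visible early. Which behavior is preferable depends on the dataset.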

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['tweet_id', 'safe_text', 'label', 'agreement'],
        num_rows: 8000
    })
    eval: Dataset({
        features: ['tweet_id', 'safe_text', 'label', 'agreement'],
        num_rows: 2001
    })
})

In [None]:
# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',                          # Directory where the model checkpoints and evaluation results will be stored
    evaluation_strategy=IntervalStrategy.STEPS,      # Interval for evaluating the model during training (every specified number of steps)
    save_strategy=IntervalStrategy.STEPS,            # Interval for saving the model during training (every specified number of steps)
    save_steps=500,                                  # Number of steps between two saves
    load_best_model_at_end=True,                     # Whether to load the best model at the end of training
    num_train_epochs=10,                              # Number of training epochs
    per_device_train_batch_size=2,                   # Batch size per GPU for training
    per_device_eval_batch_size=2,                    # Batch size per GPU for evaluation
    learning_rate=3e-5,                              # Learning rate
    weight_decay=0.01,                               # Weight decay
    warmup_steps=500,                                # Number of warmup steps
    logging_steps=500,                               # Number of steps between two logs
    fp16=True,                                       # Whether to use 16-bit precision
    gradient_accumulation_steps=16,                  # Number of steps to accumulate gradients before performing an optimizer step
    dataloader_num_workers=2,                        # Number of workers to use for loading data
    push_to_hub=True,                                # Whether to push the model checkpoints to the Hugging Face hub
    hub_model_id="Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model",  # Model ID to use when pushing the model to the Hugging Face hub 
)

# Define the early stopping callback
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,                       # Number of evaluations with no improvement before stopping training
    early_stopping_threshold=0.01,                   # Minimum improvement in the metric for considering an improvement
)

# Attach the callback to the training arguments.
# Note: Trainer only runs callbacks passed through its own `callbacks=[...]` argument,
# so this attribute alone does not activate early stopping.
training_args.callbacks = [early_stopping]


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Explanation:

from transformers import IntervalStrategy, TrainingArguments: Importing the IntervalStrategy and TrainingArguments classes from the transformers library.

training_args = TrainingArguments(: Creating a TrainingArguments object and assigning it to the variable training_args.

output_dir='./results': Specifies the directory where the training results will be saved.

evaluation_strategy=IntervalStrategy.STEPS: Specifies how often the model will be evaluated during training. In this case, the model will be evaluated at specific intervals.

save_strategy=IntervalStrategy.STEPS: Specifies how often the model will be saved during training. In this case, the model will be saved at specific intervals.

save_steps=500: Specifies how often the model will be saved during training, in terms of the number of steps taken. In this case, the model will be saved every 500 steps.

load_best_model_at_end=True: Specifies whether to load the best model at the end of training. If set to True, the best checkpoint is loaded; if set to False, the model keeps the weights from the last training step.

num_train_epochs=10: Specifies the number of epochs for training the model. In this case, the model will be trained for 10 epochs.

per_device_train_batch_size=2: Specifies the batch size for training. In this case, each training batch will contain 2 examples.

per_device_eval_batch_size=2: Specifies the batch size for evaluation. In this case, each evaluation batch will contain 2 examples.
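
The batch size, gradient accumulation, and epoch settings together determine how many optimizer steps the training run takes. The arithmetic below is a small sketch using the values from this notebook; the single-GPU setting (`num_devices = 1`) is an assumption.

```python
import math

train_examples = 8000             # size of the training subset
per_device_batch_size = 2         # per_device_train_batch_size
gradient_accumulation_steps = 16  # batches accumulated before each optimizer step
num_train_epochs = 10
num_devices = 1                   # assumption: a single GPU

# Number of examples that contribute to one optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Optimizer steps per epoch and for the whole run
steps_per_epoch = math.ceil(train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 32 250 2500
```

With `save_steps=500` and `logging_steps=500`, this corresponds to one checkpoint and one evaluation every two epochs.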

In [None]:
# Load a pretrained model, specifying the number of labels in our dataset for fine-tuning
model_roberta = AutoModelForSequenceClassification.from_pretrained('roberta-base', num_labels=3)

Downloading pytorch_model.bin:   0%|          | 0.00/501M [00:00<?, ?B/s]

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifi

In [None]:
train_dataset_roberta = dataset_roberta['train'].shuffle(seed=10) #.select(range(40000)) # to select a part
eval_dataset_roberta = dataset_roberta['eval'].shuffle(seed=10)


In [None]:
# Create the function that computes the evaluation metric
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    rmse = np.sqrt(np.mean((predictions - labels)**2))
    return {"rmse": rmse}
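
To make the metric concrete, here is a small worked example in plain Python. RMSE on class indices penalizes confusing negative with positive (distance 2) more than confusing either with neutral (distance 1). The `accuracy` helper is a hypothetical addition for comparison and is not part of the notebook's `compute_metrics`; the predictions and labels are made-up values.

```python
import math


def rmse(predictions, labels):
    """Root mean squared error between predicted and true class indices."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(predictions, labels):
    """Share of predictions that match the true class exactly."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical predictions and labels (0 = negative, 1 = neutral, 2 = positive)
predictions = [0, 1, 2, 2]
labels = [0, 1, 1, 0]

print(rmse(predictions, labels))      # sqrt((0 + 0 + 1 + 4) / 4) = sqrt(1.25)
print(accuracy(predictions, labels))  # 0.5
```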

In [None]:
from transformers import Trainer

trainer_roberta = Trainer(
     model=model_roberta,
     args=training_args,
     train_dataset=train_dataset_roberta, 
     eval_dataset=eval_dataset_roberta,
     compute_metrics=compute_metrics
)
 


/content/./results is already a clone of https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model. Make sure you pull the latest changes with `repo.git_pull()`.


In [None]:
# Launch the learning process: training 
trainer_roberta.train()



Step,Training Loss,Validation Loss,Rmse
500,0.3238,0.67579,0.598181
1000,0.2412,0.769849,0.59483
1500,0.1352,1.156966,0.673627
2000,0.0711,1.363311,0.601929
2500,0.0365,1.453615,0.598599


TrainOutput(global_step=2500, training_loss=0.16155206985473633, metrics={'train_runtime': 3910.8349, 'train_samples_per_second': 20.456, 'train_steps_per_second': 0.639, 'total_flos': 2.104907341824e+16, 'train_loss': 0.16155206985473633, 'epoch': 10.0})
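
The validation loss logged above rises after step 500 while the training loss keeps falling, which is the typical sign of overfitting. The sketch below applies a simplified patience rule to those logged losses. It is not the library's implementation: the function `first_stop_step` is hypothetical, and the patience of 3 and threshold of 0.01 are taken from the early stopping settings in this notebook.

```python
def first_stop_step(eval_losses, patience=3, threshold=0.01):
    """Return the step at which a simple patience rule would stop, or None if it never triggers."""
    best_loss = None
    evals_without_improvement = 0
    for step, loss in eval_losses:
        # An evaluation counts as an improvement only if it beats the best loss by more than the threshold
        if best_loss is None or best_loss - loss > threshold:
            best_loss = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
        if evals_without_improvement >= patience:
            return step
    return None


# (step, validation loss) pairs from the training log above
validation_losses = [
    (500, 0.67579),
    (1000, 0.769849),
    (1500, 1.156966),
    (2000, 1.363311),
    (2500, 1.453615),
]
print(first_stop_step(validation_losses))  # 2000
```

Under this rule the best checkpoint is the one from step 500, which matches the loss reported by the evaluation that follows.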

In [None]:
# Evaluate the model
eval_results = trainer_roberta.evaluate()

# Create a dictionary of the evaluation results
results_dict = {
    "Model": "roberta_base",
    "Loss": eval_results["eval_loss"],
    "RMSE": eval_results["eval_rmse"],
    "Runtime": eval_results["eval_runtime"],
    "Samples Per Second": eval_results["eval_samples_per_second"],
    "Steps Per Second": eval_results["eval_steps_per_second"],
    "Epoch": eval_results["epoch"]
}

# Create a pandas DataFrame from the dictionary
results_df = pd.DataFrame([results_dict])

# Print the results
print(results_df)


          Model     Loss      RMSE  Runtime  Samples Per Second  \
0  roberta_base  0.67579  0.598181  30.2707              66.103   

   Steps Per Second  Epoch  
0            33.068   10.0  


Some checkpoints of the model are automatically saved locally in `./results` (the `output_dir`) during training.

You may also upload the model on the Hugging Face Platform... [Read more](https://huggingface.co/docs/hub/models-uploading)

This notebook is inspired by an article: [Fine-Tuning Bert for Tweets Classification ft. Hugging Face](https://medium.com/mlearning-ai/fine-tuning-bert-for-tweets-classification-ft-hugging-face-8afebadd5dbf)

Do not hesitate to read more and to ask questions; learning is a lifelong activity.

Pushing the model to Hugging Face. Note: push the model only once you are satisfied with its performance.

In [None]:

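# Push the final fine-tuned model to the Hugging Face model hub.
# The target repository comes from `hub_model_id` in the training arguments;
# the first argument of Trainer.push_to_hub is the commit message.

trainer_roberta.push_to_hub(commit_message="End of training")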


Upload file runs/Apr28_20-17-59_4017ae5aee85/events.out.tfevents.1682713118.4017ae5aee85.642.3:   0%|         …

Upload file runs/Apr28_20-17-59_4017ae5aee85/events.out.tfevents.1682717068.4017ae5aee85.642.5:   0%|         …

To https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model
   a9b7513..a9bce24  main -> main

   a9b7513..a9bce24  main -> main

To https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model
   a9bce24..0f10c05  main -> main

   a9bce24..0f10c05  main -> main



'https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model/commit/a9bce2446035bd95a80a8d530853757da002c326'

In [None]:
tokenizer_roberta.push_to_hub("Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model")

CommitInfo(commit_url='https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model/commit/4ecc06279915a7761fe86d296cad56721b36a479', commit_message='Upload tokenizer', commit_description='', oid='4ecc06279915a7761fe86d296cad56721b36a479', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
model_roberta.push_to_hub("Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model")

CommitInfo(commit_url='https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model/commit/07ce150de3905fd114a7a7e84ed5f35f2c1e3d99', commit_message='Upload RobertaForSequenceClassification', commit_description='', oid='07ce150de3905fd114a7a7e84ed5f35f2c1e3d99', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
# Load the tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("Pendo/finetuned-Sentiment-classfication-ROBERTA-base-model")

# Build a text-classification pipeline around the fine-tuned model
model = pipeline("text-classification", model="Pendo/finetuned-Sentiment-classfication-ROBERTA-base-model", tokenizer=tokenizer)



Downloading (…)okenizer_config.json:   0%|          | 0.00/351 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/280 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/889 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

In [None]:
import torch

In [None]:
label_map = {0: "negative", 1: "neutral", 2: "positive"}

# Make a prediction on some example text
result = model("Of course i'll go.")

# The pipeline returns labels such as "LABEL_1"; map the index to its class name
predicted_label = label_map[int(result[0]["label"].split("_")[1])]

# Convert the score of the predicted label to a PyTorch tensor
scores_tensor = torch.tensor(result[0]["score"])

# result[0]["score"] is a single value, so softmax over this one-element input always yields 1.0
probabilities = torch.softmax(scores_tensor, dim=0).tolist()

# Print the predicted label and the softmax output
print(f"Predicted label: {predicted_label}")
print(f"Probabilities: {probabilities}")

Predicted label: neutral
Probabilities: 1.0
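
The pipeline call above returns only the score of the top label. To get a probability for every class, softmax has to be applied to the full vector of logits, one value per class. The sketch below shows that computation in plain Python; the logit values are hypothetical and do not come from the fine-tuned model.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability; the result is unchanged
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_map = {0: "negative", 1: "neutral", 2: "positive"}

# Hypothetical logits for one tweet, one value per class
logits = [-1.2, 2.3, 0.4]
probabilities = softmax(logits)

# The predicted class is the one with the highest probability
best = max(range(len(probabilities)), key=probabilities.__getitem__)
for idx, prob in enumerate(probabilities):
    print(f"{label_map[idx]}: {prob:.3f}")
print(f"Predicted label: {label_map[best]}")
```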
