The Yelp Polarity dataset is a text classification dataset derived from reviews written on the Yelp platform and intended for sentiment analysis. It is commonly used to build and evaluate systems that automatically classify a review as positive or negative.
In the code below, we start by installing the datasets and transformers libraries.

In [None]:
!pip install datasets transformers

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.5.0,>=2023.1.0 (from fsspec[http]<=2024.5.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.5.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)

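Here we import the libraries used in the rest of the notebook: datasets to load the data, transformers for the DistilBERT tokenizer, model and training utilities, scikit-learn for the evaluation metrics, and torch and pandas for inference and data inspection.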

In [None]:
from datasets import load_dataset
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import torch
import pandas as pd

In [None]:
#Load the Yelp polarity dataset
dataset = load_dataset("yelp_polarity")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/8.93k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/256M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/17.7M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/560000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/38000 [00:00<?, ? examples/s]

In [None]:
# Convert each split to a pandas DataFrame for inspection
df_train = dataset['train'].to_pandas()
df_test = dataset['test'].to_pandas()

In [None]:
# Show the first rows of the train and test data
print("Training Data Head:")
print(df_train.head())
print("\nTesting Data Head:")
print(df_test.head())

Training Data Head:
                                                text  label
0  Unfortunately, the frustration of being Dr. Go...      0
1  Been going to Dr. Goldberg for over 10 years. ...      1
2  I don't know what Dr. Goldberg was like before...      0
3  I'm writing this review to give you a heads up...      0
4  All the food is great here. But the best thing...      1

Testing Data Head:
                                                text  label
0  Contrary to other reviews, I have zero complai...      1
1  Last summer I had an appointment to get new ti...      0
2  Friendly staff, same starbucks fair you get an...      1
3  The food is good. Unfortunately the service is...      0
4  Even when we didn't have a car Filene's Baseme...      1


In [None]:
# Number of rows in each split
print(f"Training dataset size: {len(df_train)}")
print(f"Testing dataset size: {len(df_test)}")

Training dataset size: 560000
Testing dataset size: 38000


# **Methodology**
Below, we use the DistilBERT model for sequence classification. First, the tokenizer is initialized from the pre-trained distilbert-base-uncased checkpoint. A tokenization function then applies padding and truncation so that all inputs have the same length, and it is mapped over the dataset in batches. The full dataset contains 560,000 training and 38,000 test reviews. Each split is then shuffled with a fixed seed and reduced to a random subset of 20,000 training and 5,000 test samples, which shortens training time. Lastly, the pre-trained DistilBERT model is loaded with a two-label classification head and fine-tuned for binary classification.
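To make the padding and truncation step concrete, here is a minimal pure-Python sketch of what fixed-length padding does to a list of token IDs. It is an illustration only, not the tokenizer's implementation: the helper name pad_or_truncate, the toy token IDs and the short MAX_LENGTH of 8 are assumptions chosen for readability, and the handling of special tokens during truncation is ignored.

```python
# Toy illustration of fixed-length padding and truncation.
# MAX_LENGTH is deliberately tiny here; it is an illustrative value only.
PAD_ID = 0        # padding index, matching padding_idx=0 in the model summary
MAX_LENGTH = 8    # hypothetical length for this example


def pad_or_truncate(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                    # truncation
    n_pad = max_length - len(kept)                   # number of pad tokens needed
    input_ids = kept + [pad_id] * n_pad              # padding on the right
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask


# A short sequence is padded, a long one is cut off (toy IDs).
short_ids, short_mask = pad_or_truncate([101, 2204, 2833, 102])
long_ids, long_mask = pad_or_truncate(list(range(1, 13)))
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding every review to the same length keeps batching simple, at the cost of spending computation on pad tokens for short reviews.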

In [None]:
#Initialize the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

In [None]:
# Tokenization function
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# Tokenize the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/560000 [00:00<?, ? examples/s]

Map:   0%|          | 0/38000 [00:00<?, ? examples/s]

In [None]:
# Shuffle with a fixed seed and keep a random subset of each split
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(20000))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(5000))

In [None]:
#Load the pre-trained DistilBERT model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# **Training and Fine-Tuning the Model**
Here we discuss the key points of training and fine-tuning.
We used the TrainingArguments class to configure fine-tuning of the DistilBERT model. The main settings were a learning rate of 2e-5, a batch size of 16 per device for both training and evaluation, and three training epochs. A weight decay of 0.01 was applied as regularization to limit overfitting, and metrics were logged every 10 steps. The model was evaluated and a checkpoint was saved at the end of every epoch.
For evaluation, we defined a custom compute_metrics function. It takes the class with the highest logit as the prediction and computes accuracy, precision, recall and F1-score against the true labels, which together give a fuller picture of binary classification performance than accuracy alone.
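The four metrics can be written out by hand to show what they measure. The sketch below is a pure-Python illustration under the assumption that label 1 is the positive class; the function name binary_metrics and the toy labels and predictions are hypothetical and are not taken from the dataset.

```python
def binary_metrics(labels, preds):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "f1": f1,
            "precision": precision, "recall": recall}


# Hypothetical labels and predictions, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_preds = [1, 1, 0, 0, 1, 0, 1, 0]
toy_metrics = binary_metrics(toy_labels, toy_preds)
print(toy_metrics)
```

Precision answers "of the reviews predicted positive, how many were positive?", recall answers "of the positive reviews, how many were found?", and F1 is their harmonic mean.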

In [None]:
# training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    save_strategy="epoch",
)



In [None]:
#Define metrics
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    acc = accuracy_score(labels, preds)
    return {
        "accuracy": acc,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


In [None]:
#Initialize the Trainer for fine-tuning
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

In [None]:
#Fine-tune the model
trainer.train()


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.2786,0.141305,0.9522,0.951863,0.945957,0.957844
2,0.1423,0.169236,0.9536,0.953748,0.938407,0.969599
3,0.002,0.194722,0.959,0.958409,0.959383,0.957438


TrainOutput(global_step=3750, training_loss=0.11238063922723135, metrics={'train_runtime': 3076.6971, 'train_samples_per_second': 19.501, 'train_steps_per_second': 1.219, 'total_flos': 7948043919360000.0, 'train_loss': 0.11238063922723135, 'epoch': 3.0})
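The global_step and throughput figures in this output can be reproduced from the training configuration. The short calculation below is a sketch that assumes a single device and no gradient accumulation (neither is stated in the notebook); the variable names are illustrative.

```python
import math

train_samples = 20_000   # size of the training subset selected earlier
batch_size = 16          # per_device_train_batch_size
num_epochs = 3           # num_train_epochs
num_devices = 1          # assumption: one GPU (or CPU)
train_runtime = 3076.6971  # seconds, from the TrainOutput above

# One optimizer step per batch; the last partial batch (if any) still counts.
steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"samples/second:  {samples_per_second:.3f}")
print(f"steps/second:    {steps_per_second:.3f}")
```

With 20,000 samples and a batch size of 16, each epoch takes 1,250 steps, so three epochs give the 3,750 steps reported by the trainer.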

# ***Results***
The DistilBERT model was fine-tuned for three epochs. The logged training loss decreased from 0.2786 in the first epoch to 0.0020 in the third, while the validation loss rose from 0.1413 to 0.1947. Falling training loss combined with rising validation loss suggests that the model begins to overfit after the first epoch, although the classification metrics did not degrade. Accuracy on the 5,000-review evaluation subset improved at every epoch and reached 95.9% after the third. F1-score, precision and recall were also consistently high, with final values of 0.9584, 0.9594 and 0.9574 respectively. These results indicate that the model performed well throughout fine-tuning, with precision and recall closely balanced.
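As a quick sanity check on the figures in the training log, the sketch below recomputes the final F1-score from the logged precision and recall, and compares training and validation loss per epoch. The numbers are copied from the log; the variable names and the idea of reporting the loss gap are illustrative additions, not part of the original analysis.

```python
# Per-epoch values copied from the training log.
epochs = [
    {"epoch": 1, "train_loss": 0.2786, "val_loss": 0.141305, "accuracy": 0.9522},
    {"epoch": 2, "train_loss": 0.1423, "val_loss": 0.169236, "accuracy": 0.9536},
    {"epoch": 3, "train_loss": 0.0020, "val_loss": 0.194722, "accuracy": 0.9590},
]

# F1 is the harmonic mean of precision and recall (final-epoch values).
precision, recall = 0.959383, 0.957438
f1 = 2 * precision * recall / (precision + recall)
print(f"recomputed F1: {f1:.6f}")

# Which epoch is "best" depends on the criterion used.
best_by_val_loss = min(epochs, key=lambda e: e["val_loss"])["epoch"]
best_by_accuracy = max(epochs, key=lambda e: e["accuracy"])["epoch"]
print(f"lowest validation loss: epoch {best_by_val_loss}")
print(f"highest accuracy:       epoch {best_by_accuracy}")

# A growing gap between validation and training loss is a common
# sign of overfitting.
gaps = [e["val_loss"] - e["train_loss"] for e in epochs]
for e, gap in zip(epochs, gaps):
    print(f"epoch {e['epoch']}: val_loss - train_loss = {gap:+.4f}")
```

The two criteria disagree here: validation loss is lowest after the first epoch, while accuracy is highest after the third, which is worth keeping in mind when choosing a checkpoint.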

In [None]:
# Evaluate the model on the 5,000-sample test subset
results = trainer.evaluate()
print(results)

{'eval_loss': 0.19472192227840424, 'eval_accuracy': 0.959, 'eval_f1': 0.9584094136741732, 'eval_precision': 0.9593826157595451, 'eval_recall': 0.9574381840291852, 'eval_runtime': 86.486, 'eval_samples_per_second': 57.813, 'eval_steps_per_second': 3.619, 'epoch': 3.0}


In [None]:
#Move the model to the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

# ***Sentiment Prediction:***

To predict the sentiment of new reviews, we defined a predict_sentiment function. This function first tokenizes the input review using the DistilBERT tokenizer, applying padding and truncation to ensure a consistent input format. The tokenized inputs are then moved to the appropriate device (CPU or GPU). The model is set to evaluation mode, and predictions are generated without updating the model's parameters using torch.no_grad(). The logits output from the model is used to determine the predicted class, which corresponds to either a "positive" or "negative" sentiment.
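The step from logits to a label can be illustrated without the model. The following pure-Python sketch applies a softmax to a pair of logits and returns the predicted label together with its probability. The logit values are hypothetical, and returning a confidence score is an extension for illustration; the predict_sentiment function in this notebook returns only the label.

```python
import math

LABELS = ("negative", "positive")  # index 0 and index 1, as in predict_sentiment


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return LABELS[idx], probs[idx]


# Hypothetical logits: one clear-cut case and one borderline case.
clear = label_with_confidence([-3.0, 3.0])
borderline = label_with_confidence([0.2, -0.1])
print(clear)
print(borderline)
```

Because argmax always picks one class, a borderline review still receives a definite label; inspecting the probability is one way to see how close the decision was.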

We tested this function on a few example reviews. It labeled the clearly positive and clearly negative reviews as expected and assigned the review with mixed feedback to the negative class.

In [None]:
#Function to predict the sentiment of new reviews
def predict_sentiment(review):
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=-1).item()
    sentiment = "positive" if predicted_class == 1 else "negative"
    return sentiment

In [None]:
# Example
reviews = [
    "This restaurant is amazing! The food was delicious, and the ambiance was perfect.",
    "The service was terrible, and the food was undercooked. I will never come back.",
    "The pasta was great, but the dessert was disappointing."
]

# Predict the sentiment of each example
for i, review in enumerate(reviews, 1):
    sentiment = predict_sentiment(review)
    print(f"Review {i}: '{review}'")
    print(f"Predicted Sentiment: {sentiment}\n")

Review 1: 'This restaurant is amazing! The food was delicious, and the ambiance was perfect.'
Predicted Sentiment: positive

Review 2: 'The service was terrible, and the food was undercooked. I will never come back.'
Predicted Sentiment: negative

Review 3: 'The pasta was great, but the dessert was disappointing.'
Predicted Sentiment: negative



# ***Discussion of the Predictions***
The sentiment predictions for the sample reviews are discussed below.

Review 1: "This restaurant is amazing! The food was delicious, and the ambiance was perfect."
Predicted Sentiment: Positive
The model correctly identifies the overwhelmingly positive tone of this review, highlighting satisfaction with the food and ambiance.

Review 2: "The service was terrible, and the food was undercooked. I will never come back."
Predicted Sentiment: Negative
The model accurately detects the negative sentiment, driven by the poor service and undercooked food, along with the reviewer's strong dissatisfaction.

Review 3: "The pasta was great, but the dessert was disappointing."
Predicted Sentiment: Negative
Despite the positive comment about the pasta, the model identifies the overall sentiment as negative, likely because of the strong negative reaction to the dessert.

***From these results, we can see that the model clearly separates positive from negative reviews. For mixed feedback such as Review 3, a binary classifier must still choose one class, and here it chose negative.***

In [None]:
import ipywidgets as widgets
from IPython.display import display

# predict_sentiment(review) is reused from the cell defined earlier

# Create a Textarea widget for user input
text_area = widgets.Textarea(
    value='',
    placeholder='Type your review here...',
    description='Review:',
    disabled=False,
    layout=widgets.Layout(width='50%')
)

# Create a Button widget for submission
submit_button = widgets.Button(
    description='Submit',
    button_style='success',
    tooltip='Click to submit the review',
    icon='check'
)

# Function to process the input and display sentiment when the button is clicked
def on_button_click(b):
    review = text_area.value
    sentiment = predict_sentiment(review)
    print(f"Review: '{review}'")
    print(f"Predicted Sentiment: {sentiment}\n")

# Attach the on_button_click function to the button's click event
submit_button.on_click(on_button_click)

# Display the Textarea and Button widgets
display(text_area, submit_button)


Textarea(value='', description='Review:', layout=Layout(width='50%'), placeholder='Type your review here...')

Button(button_style='success', description='Submit', icon='check', style=ButtonStyle(), tooltip='Click to subm…

Review: 'This restaurant is amazing! The food was delicious,but have no space for the parking'
Predicted Sentiment: positive

Review: 'the dessert was disappointing sugar was too much'
Predicted Sentiment: negative

Review: 'The pasta was great, but the dessert was disappointing'
Predicted Sentiment: negative



# ***Conclusion***

In this study, a DistilBERT model was fine-tuned for sentiment analysis and proved effective at classifying text reviews as positive or negative. Accuracy, F1-score, precision and recall were all high and closely balanced at every epoch, indicating strong performance on this binary classification task.

The sentiment prediction function was also tested on new reviews, where it labeled clearly positive and clearly negative reviews as expected and assigned a single label to reviews with mixed feedback. This suggests that the model is applicable to real-world review text.

Overall, a DistilBERT-based sentiment classifier is a useful tool for analyzing customer feedback. It can give a clearer view of customer satisfaction levels and so help support business decisions that rely on those trends.