# YELP Reviews Sentiment Analysis

In this notebook, we'll fine-tune transformers for the sentiment analysis task on YELP reviews. More specifically, we will classify each review as either positive or negative. A short exploratory data analysis notebook is available in `notebooks/yelp_eda.ipynb`.

We'll use Hugging Face `transformers` to build the model. Its definition and training loop are encapsulated in a separate class, `TransformersGeneric`. Convenience utilities, such as training/validation/test set splitting and text preprocessing routines, live in the `utils` module.

We'll track training with Weights & Biases (WandB); all runs and their metrics are available here: https://app.wandb.ai/vasily/yelp-reviews-sentiment-analysis?workspace=user-vasily

Training transformer-based models on a CPU is barely feasible because of their millions of trainable parameters, so a GPU is recommended.
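A quick way to confirm that PyTorch can actually see a GPU before starting is sketched below. It uses only standard `torch.cuda` calls; the printed message merely imitates the style of the notebook's own output further down and is not taken from `TransformersGeneric`.

```python
import torch

# Use the GPU if PyTorch can see one; otherwise fall back to the (much slower) CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if device.type == "cuda":
    # Name of the first visible GPU, e.g. a Tesla P100 on the machine used for this run.
    print(f"{torch.cuda.get_device_name(0)} is used...")
else:
    print("No GPU found, falling back to CPU; fine-tuning will be very slow.")
```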

## Installs and Imports

First, install and import the libraries used throughout the project.

In [0]:
%pip install -qq transformers
%pip install -qq wandb

In [0]:
import torch

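from transformers import AlbertForSequenceClassification      # Swap for BertForSequenceClassification, RobertaForSequenceClassification, etc.
from transformers import AlbertTokenizer                      # Swap for the matching BertTokenizer, RobertaTokenizer, etc.
from transformers import AdamW                                # Deprecated in newer releases in favour of torch.optim.AdamW (whose default weight_decay differs)
from transformers import get_linear_schedule_with_warmup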

from utils.data_loading import load_and_split_dataset
from utils.data_loading import data_loader_from_tensors
from utils.text_preprocessing import TextPreprocessor

from models.transformers_generic import TransformersGeneric

from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

## WandB Tracking

Set up WandB tracking.

In [3]:
import wandb
wandb.login()

True

Define the project and the hyperparameters of the run. Make sure that `model_name` matches the model architecture imported above and trained in the next steps (e.g. `albert-base-v2` for `AlbertForSequenceClassification`). For a list of model architectures and their associated `model_name`s, refer to https://huggingface.co/transformers/pretrained_models.html

In [4]:
config = {
  'model_name': 'albert-base-v2',
  'learning_rate': 2e-5,
  'epochs': 4
}

wandb.init(project='yelp-reviews-sentiment-analysis', magic=True, config=config)

W&B Run: https://app.wandb.ai/vasily/yelp-reviews-sentiment-analysis/runs/3fy9wmwp

## Data Loading and Preprocessing

The YELP dataset contains a huge corpus of reviews, far too large to use with big, resource-demanding transformer models on a standard PC or laptop. We have therefore extracted a subsample of 125,000 reviews for this project, which we'll use for training (60%), validation (20%) and testing (20%).

In [0]:
# Load the dataset, data/yelp_reviews.json.gz by default.
# Use 60% of the data for training, 20% for validation and 20% for testing purposes.
(X_train, y_train), (X_val, y_val), (X_test, y_test) = load_and_split_dataset('../data/yelp_reviews.json.gz')
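The internals of `load_and_split_dataset` aren't shown here. As a rough illustration of what a 60/20/20 split of 125,000 reviews yields, here is a hypothetical shuffle-and-slice helper (`split_60_20_20` is our name, not part of `utils`); the real function may work differently, for instance by stratifying on star rating.

```python
import random


def split_60_20_20(items, seed=42):
    """Hypothetical shuffle-and-slice split; load_and_split_dataset may work differently."""
    items = list(items)
    random.Random(seed).shuffle(items)        # reproducible shuffle
    n_train = len(items) * 6 // 10            # 60%
    n_val = len(items) * 2 // 10              # 20%
    train_part = items[:n_train]
    val_part = items[n_train:n_train + n_val]
    test_part = items[n_train + n_val:]       # remaining ~20%
    return train_part, val_part, test_part


# Split review indices rather than the reviews themselves.
train_idx, val_idx, test_idx = split_60_20_20(range(125_000))
print(len(train_idx), len(val_idx), len(test_idx))  # 75000 25000 25000
```

Integer arithmetic (`* 6 // 10`) avoids floating-point surprises when computing the split sizes.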

We'll label 4- and 5-star reviews as positive and the rest (1 to 3 stars) as negative.

In [0]:
# 1, 2 or 3 stars - negative (0)
# 4 or 5 stars - positive (1)
y_train_sent = torch.tensor([int(stars > 3) for stars in y_train])
y_val_sent = torch.tensor([int(stars > 3) for stars in y_val])
y_test_sent = torch.tensor([int(stars > 3) for stars in y_test])
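The star-to-sentiment mapping is easy to check on a handful of ratings. The sketch below uses made-up star values and a hypothetical `to_sentiment` helper; it also reports the share of positive labels, which is worth knowing before reading accuracy and F1 later on.

```python
def to_sentiment(stars):
    """1-3 stars -> 0 (negative), 4-5 stars -> 1 (positive)."""
    return int(stars > 3)


# Made-up star ratings, just to exercise the mapping.
stars = [5, 4, 1, 3, 2, 5, 3, 4]
labels = [to_sentiment(s) for s in stars]
positive_share = sum(labels) / len(labels)

print(labels)          # [1, 1, 0, 0, 0, 1, 0, 1]
print(positive_share)  # 0.5
```

On the real labels, `y_train_sent.float().mean()` gives the same share for the training set.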

Text preprocessing consists of tokenization and padding/truncating the input sequences to the same length. We then create data loaders for training, validation and testing.

In [0]:
# Prepare the data for Transformers
preprocessor = TextPreprocessor(tokenizer=AlbertTokenizer, vocab_file=wandb.config.model_name)
train_tensor, train_masks_tensor = preprocessor.preprocess(X_train, fit=True)
val_tensor, val_masks_tensor = preprocessor.preprocess(X_val)
test_tensor, test_masks_tensor = preprocessor.preprocess(X_test)
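`TextPreprocessor` hides the details, but the general idea of padding, truncation and attention masks can be shown in plain Python. This is a hypothetical sketch with made-up token IDs (`encode`, `cls_id`, `sep_id` and `pad_id` are illustrative names, not the project's API). It truncates the review body first and only then adds the `[CLS]`/`[SEP]` markers, so the closing marker survives truncation.

```python
def encode(content_ids, max_len, cls_id, sep_id, pad_id=0):
    """Hypothetical fixed-length encoding: [CLS] body [SEP] + padding, plus attention mask."""
    body = content_ids[: max_len - 2]            # truncate, leaving room for [CLS] and [SEP]
    ids = [cls_id] + body + [sep_id]
    mask = [1] * len(ids)                        # 1 = real token, attend to it
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad   # 0 = padding, ignore it


# Made-up IDs: 2 and 3 play the roles of [CLS] and [SEP].
short_ids, short_mask = encode([45, 67], max_len=6, cls_id=2, sep_id=3)
long_ids, long_mask = encode([10, 11, 12, 13, 14], max_len=5, cls_id=2, sep_id=3)

print(short_ids, short_mask)  # [2, 45, 67, 3, 0, 0] [1, 1, 1, 1, 0, 0]
print(long_ids, long_mask)    # [2, 10, 11, 12, 3] [1, 1, 1, 1, 1]
```

With a real tokenizer, recent versions of `transformers` produce the same kind of output via `tokenizer(texts, padding='max_length', truncation=True, max_length=...)`, which returns `input_ids` and `attention_mask`; whether `TextPreprocessor` calls it this way is not shown in this notebook.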

In [0]:
# Create data loaders
train_data_loader = data_loader_from_tensors(train_tensor, train_masks_tensor, y_train_sent, batch_size=16)
val_data_loader = data_loader_from_tensors(val_tensor, val_masks_tensor, y_val_sent, batch_size=16)
test_data_loader = data_loader_from_tensors(test_tensor, test_masks_tensor, y_test_sent, batch_size=16)
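`data_loader_from_tensors` lives in the project's `utils` and isn't shown. A common way to build such loaders in PyTorch is to wrap the tensors in a `TensorDataset` and hand it to a `DataLoader`; the sketch below assumes that design (the helper name `make_loader` and its `shuffle` flag are hypothetical). It also shows where the 4,688 batches per epoch in the training log come from: 75,000 training reviews in batches of 16, rounded up.

```python
import math

import torch
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset


def make_loader(input_ids, attention_mask, labels, batch_size=16, shuffle=False):
    """Hypothetical stand-in for utils.data_loading.data_loader_from_tensors."""
    dataset = TensorDataset(input_ids, attention_mask, labels)
    # Shuffle training batches each epoch; keep evaluation order fixed.
    sampler = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
    return DataLoader(dataset, sampler=sampler, batch_size=batch_size)


# Dummy tensors with the training-set size (75,000 reviews) and a short sequence length.
n_train, seq_len = 75_000, 8
ids = torch.zeros((n_train, seq_len), dtype=torch.long)
masks = torch.ones((n_train, seq_len), dtype=torch.long)
targets = torch.zeros(n_train, dtype=torch.long)

loader = make_loader(ids, masks, targets, batch_size=16, shuffle=True)
print(len(loader))              # 4688
print(math.ceil(n_train / 16))  # 4688: the last batch holds only 75,000 - 4687 * 16 = 8 reviews
```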

## Model Creation and Training

The `TransformersGeneric` class accepts several parameters (see its docstring); the three most important ones are:

1. `num_classes` - the number of classes in the task. For our binary sentiment analysis task, there are two.
2. `transformers_model` - a **classification** model architecture from the `transformers` library.
3. `model_name` - a pretrained model name (shortcut name) available in `transformers`. The full list is available here: https://huggingface.co/transformers/pretrained_models.html
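The three arguments fit together naturally if the wrapper receives the model *class* and instantiates it from the pretrained name. The sketch below is a hypothetical, heavily simplified stand-in (`TransformersGenericSketch` and `FakeModel` are our names; the real class also handles devices, training and evaluation). It relies only on `from_pretrained(name, num_labels=...)`, which the `transformers` `*ForSequenceClassification` classes provide; a fake class with the same call shape keeps the example runnable offline.

```python
class TransformersGenericSketch:
    """Hypothetical, simplified stand-in for TransformersGeneric (not the project's code)."""

    def __init__(self, num_classes, transformers_model, model_name):
        self.num_classes = num_classes
        self.model_name = model_name
        # Instantiate the given class from pretrained weights; num_labels sizes
        # the classification head (2 for positive/negative).
        self.model = transformers_model.from_pretrained(model_name, num_labels=num_classes)


class FakeModel:
    """Offline stand-in exposing the same from_pretrained(name, num_labels=...) call."""

    @classmethod
    def from_pretrained(cls, name, num_labels):
        model = cls()
        model.name, model.num_labels = name, num_labels
        return model


generic_sketch = TransformersGenericSketch(num_classes=2,
                                           transformers_model=FakeModel,
                                           model_name="albert-base-v2")
print(generic_sketch.model.name, generic_sketch.model.num_labels)  # albert-base-v2 2
```

Passing `AlbertForSequenceClassification` instead of `FakeModel` would download `albert-base-v2` and attach a freshly initialised two-class head.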

In [9]:
# Initialize a model
generic = TransformersGeneric(num_classes=2, 
                              transformers_model=AlbertForSequenceClassification,
                              model_name=wandb.config.model_name)

wandb.watch(generic.model, log='all')

Tesla P100-PCIE-16GB is used...


[<wandb.wandb_torch.TorchGraph at 0x7f830730e828>]

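Next, we set up an optimizer and a learning-rate scheduler. We'll use AdamW (Adam with decoupled weight decay) and a linear schedule that decreases the learning rate from its initial value to zero over the course of training, without warmup. The original [BERT paper](https://arxiv.org/abs/1810.04805) recommends fine-tuning for 2 to 4 epochs with a batch size of 16 or 32 and a learning rate of 5e-5, 3e-5 or 2e-5; following it, we train for 4 epochs with a learning rate of `2e-5`.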

In [0]:
# Define optimizer
optimizer = AdamW(generic.get_parameters(), lr=wandb.config.learning_rate, eps=1e-8)

# Total number of training steps is number of batches * number of epochs
total_steps = len(train_data_loader) * wandb.config.epochs

# Learning rate scheduler
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)
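To see what `get_linear_schedule_with_warmup` does with `num_warmup_steps=0`, the learning-rate multiplier can be re-implemented in a few lines. This is our own re-implementation of a linear warmup-then-decay schedule for illustration, not the library code, and `total_steps` assumes 4,688 batches per epoch, as reported in the training log.

```python
def linear_lr_multiplier(step, num_warmup_steps, num_training_steps):
    """Our re-implementation of a linear warmup + linear decay multiplier."""
    if step < num_warmup_steps:
        # Warmup: ramp from 0 to 1 (unused here, since num_warmup_steps=0).
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 at num_training_steps, never below 0.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
total_steps = 4688 * 4  # 4,688 batches per epoch * 4 epochs = 18,752 steps

for step in (0, total_steps // 4, total_steps // 2, total_steps):
    print(step, base_lr * linear_lr_multiplier(step, 0, total_steps))
```

With no warmup, the learning rate starts at `2e-5`, is halved by the end of the second epoch (step 9,376) and reaches zero at the final step.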

Train the model and store the training metrics in a variable.

In [11]:
train_metrics = generic.train(train_data_loader, val_data_loader, optimizer, scheduler, wandb.config.epochs)


-------------------------	1	-------------------------
Training...
	Batch 25/4688, elapsed 00:00:21.32
	Batch 50/4688, elapsed 00:00:43.19
	Batch 75/4688, elapsed 00:01:05.08
	Batch 100/4688, elapsed 00:01:26.95
	Batch 125/4688, elapsed 00:01:49.21
	Batch 150/4688, elapsed 00:02:11.06
	Batch 175/4688, elapsed 00:02:32.95
	Batch 200/4688, elapsed 00:02:54.82
	Batch 225/4688, elapsed 00:03:17.08
	Batch 250/4688, elapsed 00:03:38.97
	Batch 275/4688, elapsed 00:04:00.84
	Batch 300/4688, elapsed 00:04:22.74
	Batch 325/4688, elapsed 00:04:45.00
	Batch 350/4688, elapsed 00:05:06.87
	Batch 375/4688, elapsed 00:05:28.76
	Batch 400/4688, elapsed 00:05:50.62
	Batch 425/4688, elapsed 00:06:12.88
	Batch 450/4688, elapsed 00:06:34.77
	Batch 475/4688, elapsed 00:06:56.64
	Batch 500/4688, elapsed 00:07:18.53
	Batch 525/4688, elapsed 00:07:40.79
	Batch 550/4688, elapsed 00:08:02.72
	Batch 575/4688, elapsed 00:08:24.62
	Batch 600/4688, elapsed 00:08:46.52
	Batch 625/4688, elapsed 00:09:08.80
	Batch 650/
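The partial log above is enough for a rough runtime estimate. The sketch below only does arithmetic on figures copied from the log (625 batches in 00:09:08.80, 4,688 batches per epoch) and ignores validation passes, so treat the result as a lower bound for this P100 run.

```python
# Figures read off the training log: 625 batches took 00:09:08.80.
batches_done, elapsed_s = 625, 9 * 60 + 8.80
batches_per_epoch, epochs = 4688, 4

sec_per_batch = elapsed_s / batches_done
epoch_minutes = sec_per_batch * batches_per_epoch / 60
total_hours = epoch_minutes * epochs / 60

print(f"{sec_per_batch:.2f} s/batch, ~{epoch_minutes:.0f} min/epoch, ~{total_hours:.1f} h for {epochs} epochs")
# 0.88 s/batch, ~69 min/epoch, ~4.6 h for 4 epochs
```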

Log the metrics to WandB.

In [0]:
# Log to WandB
for metrics in train_metrics:
  wandb.log(metrics)

## Model Evaluation

Evaluate the model on the held-out test set and log its accuracy and F1 score to WandB.

In [13]:
results = generic.evaluate(test_data_loader, test_f1=f1_score, test_accuracy=accuracy_score)
wandb.log(results)
results

{'test_accuracy': 0.9139475367882278, 'test_f1': 0.8829677842334718}
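On this run, F1 (0.883) sits below accuracy (0.914). That is easier to interpret once you recall what each metric counts: with its default `average='binary'`, scikit-learn's `f1_score` scores only the positive class (label 1), while accuracy credits correct predictions of both classes. The plain-Python sketch below uses made-up labels in which positives are the minority, and there F1 also ends up below accuracy. Whether class balance explains the gap here depends on the positive share of the test subsample, which this notebook doesn't report.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the label, for both classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, pos_label=1):
    """F1 of the positive class only, like sklearn's f1_score with average='binary'."""
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)   # equals 2PR / (P + R)


# Made-up labels: 3 positives out of 10, one false negative and one false positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))     # 0.8
print(f1_positive(y_true, y_pred))  # 0.6666666666666666
```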