Import dependencies.

In [1]:
import pandas as pd
import numpy as np
import torch
from torch import nn
from transformers import BertModel, BertTokenizer
from tqdm import tqdm
from torch.utils.data import DataLoader, TensorDataset, Dataset


Load in the dataset with latin1 encoding.

In [3]:
df = pd.read_csv("/home/ryler/Datasets/Mcdonalds-Review-Text-Classification/McDonald_s_Reviews.csv", encoding="latin1")
# df = pd.read_csv("/home/rynutty/Documents/DataSets/Mcdonalds-Reviews/McDonald_s_Reviews.csv", encoding="latin1")

Review the features.

In [4]:
df.head()

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
0,1,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Why does it look like someone spit on my food?...,1 star
1,2,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,It'd McDonalds. It is what it is as far as the...,4 stars
2,3,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,Made a mobile order got to the speaker and che...,1 star
3,4,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a month ago,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,5 stars
4,5,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,2 months ago,"I repeat my order 3 times in the drive thru, a...",1 star


Get a sense of the size of the dataset.

In [5]:
df.shape

(33396, 10)

I saw someone else collapse the rating labels into three classes: 2 = positive (4 or 5 stars), 1 = neutral (3 stars), and 0 = negative (1 or 2 stars). I think this makes a lot more sense than predicting exact star ratings, because a review rarely makes it clear whether the rating would be 4 or 5 stars, but it's usually easy to tell whether it is positive, neutral, or negative.

In [6]:
def clean_ratings(ratings):
    ratings = [int(rate[0]) for rate in ratings]
    cleaned_ratings = []

    for rating in ratings:
        if rating >= 4:
            cleaned_ratings.append(2)
        elif rating == 3:
            cleaned_ratings.append(1)
        else:
            cleaned_ratings.append(0)

    return cleaned_ratings
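
As a quick sanity check of the mapping, here is a minimal standalone sketch. The helper name `to_sentiment_class` and the sample strings are made up for illustration, and it assumes every label begins with the star count, as in the "1 star" and "4 stars" values shown in the table above.

```python
def to_sentiment_class(rating: str) -> int:
    """Map a label such as '4 stars' to 0 (negative), 1 (neutral) or 2 (positive)."""
    stars = int(rating.split()[0])  # leading number of stars
    if stars >= 4:
        return 2
    if stars == 3:
        return 1
    return 0


# Hypothetical sample labels in the same format as the "rating" column.
sample_ratings = ["1 star", "3 stars", "5 stars", "4 stars", "2 stars"]
labels = [to_sentiment_class(r) for r in sample_ratings]
print(labels)  # [0, 1, 2, 2, 0]
```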

Split the dataset into training (the first 80% of rows) and evaluation (the last 20%) sets. Note that the rows are not shuffled before the cut.

In [15]:
train_cutoff = int(len(df) * 0.8)

train = df.iloc[:train_cutoff, :]
eval = df.iloc[train_cutoff:, :].reset_index(drop=True)

x_train = train["review"]
y_train = clean_ratings(train["rating"])

x_eval = eval["review"]
y_eval = clean_ratings(eval["rating"])
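
The cut above takes the rows in file order. If the file happens to be grouped (the first five rows shown earlier all come from the same store), shuffling before splitting may give a more representative evaluation set. A minimal sketch on a toy frame; `toy`, `train_df`, `eval_df`, and the seed are illustrative choices, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for the real reviews frame.
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(10)],
    "rating": ["1 star", "5 stars"] * 5,
})

# Shuffle all rows reproducibly, then apply the same 80/20 cut.
shuffled = toy.sample(frac=1.0, random_state=42).reset_index(drop=True)
cutoff = int(len(shuffled) * 0.8)

train_df = shuffled.iloc[:cutoff]
eval_df = shuffled.iloc[cutoff:].reset_index(drop=True)

print(len(train_df), len(eval_df))  # 8 2
```

Resetting the index on both halves keeps integer indexing such as `reviews[idx]` working inside a `Dataset`.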

The MLP class just makes the code a little easier to read; it isn't strictly necessary.

In [None]:
class MLP(nn.Module):

    def __init__(self, in_features, out_features):
        super().__init__()

        self.inference = nn.Sequential(
            nn.Linear(in_features, out_features),
            nn.ReLU()
        )

    def forward(self, x):
        return self.inference(x)

This is where I built the transformer encoder blocks that contextualise the embeddings. This is the part that actually processes the data.

The embeddings start off carrying no contextual information: each token only knows about itself. Transformers use a mechanism known as attention, which lets every token in the sentence look at every other token. For example, the last token in a sentence can end up capturing the meaning of the entire sentence, because it now carries information about all of the tokens that came before it.

Think of how you would read a paragraph. You start at the beginning knowing nothing, but by the time you reach the last word you've built up enough context to understand the meaning behind the paragraph. That's similar to how transformers behave for NLP tasks.

Once you have an embedding that carries the whole meaning of a sentence or paragraph, you can pass it through an MLP to get a good classification.
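
To make the attention idea concrete, here is a minimal single-head sketch of scaled dot-product attention on random data. The sizes and the projection matrices `w_q`, `w_k`, `w_v` are arbitrary assumptions for illustration; `nn.MultiheadAttention` does the same kind of computation with learned weights split across several heads.

```python
import math

import torch

torch.manual_seed(0)

seq_len, d_model = 4, 8            # 4 tokens, 8-dimensional embeddings (toy sizes)
x = torch.randn(seq_len, d_model)  # context-free token embeddings

# Random projections standing in for learned query/key/value weights.
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Each row of `weights` says how much one token looks at every token.
scores = q @ k.T / math.sqrt(d_model)
weights = torch.softmax(scores, dim=-1)

# Each output row is a weighted mix of all value vectors: a contextual embedding.
contextual = weights @ v

print(weights.shape, contextual.shape)
```

Every row of `weights` sums to 1, so each contextual embedding is a weighted average over the whole sequence.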

In [16]:
class Encoder(nn.Module):

    def __init__(self, d_model, num_heads, num_encoder_blocks):
        super().__init__()

        blocks = [EncoderBlock(d_model=d_model, num_heads=num_heads) for _ in range(num_encoder_blocks)]
        self.inference = nn.Sequential(*blocks)

    def forward(self, x):
        return self.inference(x)
    

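class EncoderBlock(nn.Module):

    def __init__(self, d_model, num_heads):
        super().__init__()

        # Note: this single LayerNorm is shared by the attention and MLP sub-layers.
        self.layer_norm = nn.LayerNorm(d_model)
        # No device argument here; the whole model is moved to the GPU with .to("cuda") when it is created.
        self.mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, dropout=0.2, batch_first=True)
        self.mlp = MLP(in_features=d_model, out_features=d_model)

    def forward(self, x):
        # Self-attention sub-layer: normalise, attend, then add the normalised input back.
        x = self.layer_norm(x)
        identity = x
        # No padding mask is passed, so padded positions are attended to as well.
        x, attention_weights = self.mha(x, x, x)
        x = x + identity
        # MLP sub-layer, following the same normalise-then-residual pattern.
        x = self.layer_norm(x)
        identity = x
        x = self.mlp(x)
        x = x + identity

        return x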
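
For comparison, a common variant of this block (not the one used in this notebook) gives each sub-layer its own LayerNorm, takes the residual from the un-normalised input, and accepts a padding mask. The class name `PreNormBlock` and the attribute names are hypothetical; `key_padding_mask` is the existing `nn.MultiheadAttention` argument, where `True` marks positions to ignore.

```python
import torch
from torch import nn


class PreNormBlock(nn.Module):
    """Sketch of a pre-norm encoder block with separate norms and a padding mask."""

    def __init__(self, d_model, num_heads, dropout=0.2):
        super().__init__()
        self.norm_attn = nn.LayerNorm(d_model)
        self.norm_mlp = nn.LayerNorm(d_model)
        self.mha = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())

    def forward(self, x, key_padding_mask=None):
        # Attention sub-layer: normalise, attend, add back the un-normalised input.
        normed = self.norm_attn(x)
        attn_out, _ = self.mha(normed, normed, normed, key_padding_mask=key_padding_mask)
        x = x + attn_out
        # MLP sub-layer with its own LayerNorm.
        x = x + self.mlp(self.norm_mlp(x))
        return x


torch.manual_seed(0)
block = PreNormBlock(d_model=16, num_heads=4).eval()  # eval() switches dropout off
x = torch.randn(2, 5, 16)
# The second sequence has two padded positions at the end.
pad = torch.tensor([[False] * 5, [False, False, False, True, True]])
out = block(x, key_padding_mask=pad)
print(out.shape)  # torch.Size([2, 5, 16])
```

With the mask in place, the outputs at the real tokens no longer depend on whatever sits in the padded positions.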

This model ties my encoder together with a classification head, which takes the contextualised embeddings and classifies them.

In [None]:
class SequenceClassifier(nn.Module):

    def __init__(self):
        super().__init__()
            
        self.encoder = Encoder(d_model=768, num_heads=12, num_encoder_blocks=2)
        self.head = nn.Linear(in_features=768, out_features=3)

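    def forward(self, x):
        x = self.encoder(x)
        # Use the last sequence position as the summary vector.
        # With right-padded batches this position is a [PAD] token for the shorter reviews.
        x = x[:, -1, :]
        x = self.head(x)

        return x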
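
Because batches are padded to a common length, the last position can be a padding token rather than a real word. One alternative, sketched here under the assumption that the tokenizer's `attention_mask` is passed along with the embeddings, is to average only the real tokens. `masked_mean_pool` is a hypothetical helper, not part of the notebook.

```python
import torch


def masked_mean_pool(hidden, attention_mask):
    """Average the hidden states of non-padding tokens.

    hidden:         (batch, seq_len, d_model) encoder outputs
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                   # padded positions contribute zero
    counts = mask.sum(dim=1).clamp(min=1.0)               # avoid division by zero
    return summed / counts


# Toy batch: two sequences of length 3, the second with one padded position.
hidden = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
attention_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])
pooled = masked_mean_pool(hidden, attention_mask)
print(pooled)  # tensor([[2., 3.], [7., 8.]])
```

Another simple option would be to take position 0, which holds the `[CLS]` token when the BERT tokenizer is used.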

I decided to use a pretrained embedding space from BERT because I didn't want to spend a long time training my own. I didn't let BERT process the embeddings; I only take the output of its embedding layer (token embeddings plus position embeddings), with the weights frozen.

In [9]:
class Embedder():

    def __init__(self):

        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.model = BertModel.from_pretrained("bert-base-uncased")

        for param in self.model.parameters():
            param.requires_grad = False
    
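    def __call__(self, sentences):

        # The DataLoader yields each batch of reviews as a sequence of strings.
        if isinstance(sentences, str):
            sentences = [sentences]
        else:
            sentences = list(sentences)

        input_ids = self.tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")["input_ids"]
        # Only BERT's embedding layer is used; its transformer layers are skipped.
        embeddings = self.model.embeddings(input_ids)

        return embeddings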

A Dataset class that pairs each review with its label.

In [10]:
class ReviewDataset(Dataset):

    def __init__(self, reviews, ratings):
        super().__init__()

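        self.reviews = reviews
        self.ratings = ratings
        # No Embedder is created here: embedding happens per batch in the training loop,
        # so a second copy of BERT in every dataset would only waste memory.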

    def __len__(self):
        return len(self.reviews)
    
    def __getitem__(self, idx):
        return self.reviews[idx], self.ratings[idx]
    

Model, optimizer, and loss function initialization. I also moved the model to my GPU for faster training.

In [11]:
model = SequenceClassifier().to("cuda")
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001, weight_decay=0.001)
loss_fn = nn.CrossEntropyLoss()
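
`nn.CrossEntropyLoss` uses `reduction="mean"` by default, so each call returns the mean loss over the batch. When accumulating an average over a whole epoch, one option is to weight each batch mean by its batch size before dividing by the number of samples. A small sketch with made-up logits and targets (the two toy batches are assumptions for illustration):

```python
import torch
from torch import nn

torch.manual_seed(0)
loss_fn = nn.CrossEntropyLoss()  # default reduction="mean"

# Two toy batches of different sizes: (logits, targets) for 3 classes.
batches = [
    (torch.randn(4, 3), torch.tensor([0, 1, 2, 2])),
    (torch.randn(2, 3), torch.tensor([1, 0])),
]

total_loss, total_samples = 0.0, 0
for logits, targets in batches:
    batch_mean = loss_fn(logits, targets)
    # Undo the per-batch averaging so every sample counts equally.
    total_loss += batch_mean.item() * targets.size(0)
    total_samples += targets.size(0)

epoch_avg = total_loss / total_samples
print(epoch_avg)
```

Computed this way, `epoch_avg` matches the mean loss over all samples at once, regardless of a smaller final batch.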

I created my data loaders from the dataset class so that training is simpler: they batch the data automatically and also shuffle it, which is nice.

In [None]:
train_dataset = ReviewDataset(reviews=x_train, ratings=y_train)
test_dataset = ReviewDataset(reviews=x_eval, ratings=y_eval)

train_dataloader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=True)
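
To see what a batch from such a loader looks like, here is a small standalone sketch. `ToyReviews` and its four samples are invented for illustration; the point is that the default collation keeps the strings as a plain sequence and stacks the integer labels into a tensor.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyReviews(Dataset):
    """Tiny stand-in for a (review text, label) dataset."""

    def __init__(self, reviews, ratings):
        self.reviews = reviews
        self.ratings = ratings

    def __len__(self):
        return len(self.reviews)

    def __getitem__(self, idx):
        return self.reviews[idx], self.ratings[idx]


dataset = ToyReviews(["good", "bad", "ok", "great"], [2, 0, 1, 2])
loader = DataLoader(dataset, batch_size=2, shuffle=False)

reviews, ratings = next(iter(loader))
print(list(reviews))  # ['good', 'bad']
print(ratings)        # tensor([2, 0])
```

This is why the text batch can be handed straight to a tokenizer while the labels can go straight to the loss function.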

Release any cached GPU memory that is no longer needed so it doesn't get in the way of training.

In [13]:
torch.cuda.empty_cache()
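
The hard-coded "cuda" strings in this notebook assume a GPU is present. A small sketch of a device-agnostic setup, in case the code is run on a machine without one; the variable names `device` and `batch` are illustrative:

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Only touch the CUDA cache when a GPU is actually in use.
if device.type == "cuda":
    torch.cuda.empty_cache()

# Tensors and models can then be moved with .to(device) instead of .to("cuda").
batch = torch.zeros(2, 3).to(device)
print(device, batch.device)
```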

Run the training loop for 20 epochs.

In [14]:
epochs = 20
embedder = Embedder()

for epoch in range(1, epochs+1):
    print("Starting Training...")

    running_train_loss = 0
    train_total = 0
    train_correct = 0

    model.train()
    for reviews, ratings in tqdm(train_dataloader):
        embeddings = embedder(reviews).to("cuda")
        ratings = ratings.to("cuda")

        optimizer.zero_grad()
        logits = model(embeddings)

        loss = loss_fn(logits, ratings)
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item()
        train_total += len(reviews)
        prediction = torch.argmax(logits, dim=1)
        train_correct += (prediction == ratings).sum().item()

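    # Note: this divides the sum of per-batch mean losses by the number of samples,
    # so the printed value is roughly the mean cross-entropy divided by the batch size (32).
    # The same applies to the evaluation print further down.
    print(f"Avg loss: {running_train_loss / train_total}, accuracy: {train_correct / train_total}")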

    running_eval_loss = 0
    eval_total = 0
    eval_correct = 0

    model.eval()
    with torch.no_grad():
        for reviews, ratings in tqdm(test_dataloader):
            embeddings = embedder(reviews).to("cuda")
            ratings = ratings.to("cuda")

            logits = model(embeddings)
            loss = loss_fn(logits, ratings)

            running_eval_loss += loss.item()
            eval_total += len(reviews)
            predicted = torch.argmax(logits, dim=1)
            eval_correct += (predicted == ratings).sum().item()

    print(f"Avg loss: {running_eval_loss / eval_total}, accuracy: {eval_correct / eval_total}")



Per-epoch results (each epoch ran 835 training batches in about 1 min 25 s and 209 evaluation batches in about 10 s):

epoch,train_avg_loss,train_accuracy,eval_avg_loss,eval_accuracy
1,0.01979332622227099,0.7558766282377601,0.01728005724663506,0.7901197604790419
2,0.01685548608704034,0.7900509058242252,0.01689126124817454,0.7925149700598803
3,0.015891447748086967,0.8001946399161551,0.015023360452698377,0.8247005988023952
4,0.015306451312304079,0.8072690522533313,0.014783857747198578,0.8203592814371258
5,0.014963731971204165,0.8111618505764336,0.014278978698267908,0.8252994011976048
6,0.014652793409604301,0.8140814493187603,0.015495571784095136,0.8143712574850299
7,0.014437289531202548,0.8176373708639018,0.014371763437451003,0.8199101796407186
8,0.014215425828446402,0.8231022608174876,0.0145841494581835,0.8194610778443113
9,0.013999511107527972,0.8228776762988471,0.01410243453424491,0.828443113772455
10,0.01384698737511415,0.825759844288067,0.014030677207305046,0.8288922155688623
11,0.013626957607561108,0.8277062434496182,0.014332331254141416,0.8315868263473054
12,0.013519834683543463,0.8277811049558317,0.01366175369409744,0.8345808383233533
13,0.013355428243351833,0.8317487647851475,0.013377636305229392,0.8392215568862276
14,0.013218331532080135,0.8353421170833957,0.013688191314211149,0.834131736526946
15,0.013053038635942765,0.8339571792184459,0.01346589008758882,0.8339820359281437
16,0.012912344770359054,0.835154963317862,0.013326317963084418,0.8407185628742515
17,0.012803539453036129,0.8364276089234916,0.013685762770697028,0.8327844311377246
18,0.012694661106581299,0.8384488695912562,0.01300896715573565,0.8401197604790419
19,0.01262327447965315,0.8398338074562061,0.013015102334097474,0.8422155688622754
20,0.012554030250219693,0.8393472076658183,0.013161573931574822,0.8423652694610778

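Thank you for reading through this if you did! This is my first post on Kaggle, so hopefully it went alright. I also noticed my model didn't end up overfitting, which I love to see: after 20 epochs the training accuracy (about 0.839) and the evaluation accuracy (about 0.842) are nearly identical.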