### Transformer using Hugging Face pipelines

In [1]:
import pandas as pd
import numpy as np

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
valid = pd.read_csv("valid.csv")

# Recode the negative label from -1 to 0 so that all three splits use binary {0, 1} targets.
train.loc[train["review_score"]==-1, "review_score"]=0
test.loc[test["review_score"]==-1, "review_score"]=0
valid.loc[valid["review_score"]==-1, "review_score"]=0

In [6]:
import torch
torch.cuda.is_available()

True

In [16]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis", device=0)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
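
The warning above suggests naming the model and revision explicitly. A minimal sketch of how that could look, using the model name and revision printed in the warning; `load_pinned_pipeline` is a hypothetical helper name, and `device=0` assumes a CUDA GPU as in the cell above:

```python
# Model name and revision are copied from the warning printed by the default pipeline.
PINNED_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"
PINNED_REVISION = "af0f99b"


def load_pinned_pipeline(device=0):
    """Build the sentiment pipeline with an explicit model and revision.

    Hypothetical helper: transformers is imported lazily so that the
    constants above can be reused without loading the library.
    """
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=PINNED_MODEL,
        revision=PINNED_REVISION,
        device=device,  # assumption: GPU 0 is available, as checked earlier
    )
```

Calling `sentiment_pipeline = load_pinned_pipeline()` should then give the same model as the default call, without relying on the library's default choice staying the same.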


In [17]:
test["predicted_score"] = sentiment_pipeline(test["review_text"].tolist(), truncation=True)
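
Passing the whole test set in one call works, but gives no feedback while it runs. One possible alternative is to feed the texts in chunks. The sketch below assumes `predict_fn` is any callable that maps a list of strings to a list of prediction dicts (for example a wrapper around `sentiment_pipeline`); `predict_in_chunks` and `chunk_size` are hypothetical names, not part of the transformers API:

```python
def predict_in_chunks(predict_fn, texts, chunk_size=256):
    """Apply predict_fn to texts in consecutive chunks and concatenate the results.

    predict_fn: callable taking a list of strings and returning one result per string.
    """
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results.extend(predict_fn(chunk))
    return results


# Intended use with the pipeline from the notebook (not executed here):
# test["predicted_score"] = predict_in_chunks(
#     lambda batch: sentiment_pipeline(batch, truncation=True),
#     test["review_text"].tolist(),
# )
```

The order of the results matches the order of the inputs, so the output can still be assigned directly to a DataFrame column.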

In [18]:
test.iloc[0]["predicted_score"]

{'label': 'POSITIVE', 'score': 0.9997923970222473}

In [20]:
str_to_int_score = {"POSITIVE" : 1, "NEGATIVE" : 0}

test["model_predictions"] = test["predicted_score"].apply(lambda x: str_to_int_score[x["label"]])
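
The mapping above discards the confidence score. A sketch of how the score could be kept as a positive-class probability, under the assumption that for this two-label model `score` is the probability of the reported label, so the other label has probability `1 - score`. `positive_probability` and `to_binary` are hypothetical helpers:

```python
def positive_probability(prediction, positive_label="POSITIVE"):
    """Return the probability of the positive class for a two-label prediction dict.

    Assumes prediction["score"] is the probability of prediction["label"].
    """
    score = prediction["score"]
    return score if prediction["label"] == positive_label else 1.0 - score


def to_binary(prediction, threshold=0.5):
    """Map a prediction dict to 1 (positive) or 0 (negative) using a probability threshold."""
    return int(positive_probability(prediction) >= threshold)


# Intended use with the notebook's column of prediction dicts (not executed here):
# test["model_predictions"] = test["predicted_score"].apply(to_binary)
```

With `threshold=0.5` this reproduces the label-based mapping; a lower threshold would trade precision for recall.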

In [21]:
test.head()

Unnamed: 0.1,Unnamed: 0,review_text,review_score,predicted_score,model_predictions
0,1265039,I love the Fact you can do what EVER you want ...,1,"{'label': 'POSITIVE', 'score': 0.9997923970222...",1
1,3132003,Tony Hawk's without the Pro Skater. Finding ou...,1,"{'label': 'POSITIVE', 'score': 0.9989967942237...",1
2,880195,It's pretty good.,1,"{'label': 'POSITIVE', 'score': 0.9998482465744...",1
3,717128,This the best dungeon game I have played since...,1,"{'label': 'POSITIVE', 'score': 0.9998807907104...",1
4,5221356,Totally awesome game alone or with a friend. I...,1,"{'label': 'POSITIVE', 'score': 0.9998763799667...",1


In [22]:
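def get_metrics(df=None):
    """Print accuracy, precision, recall and F1 for the binary predictions in df (defaults to test)."""
    if df is None:
        df = test
    predictions = df["model_predictions"].to_numpy()
    true_values = df["review_score"].to_numpy()
    # Fraction of reviews whose predicted label matches the true label
    accuracy = np.mean(predictions == true_values)
    # Confusion-matrix counts (1 = positive review, 0 = negative review)
    TN_count = len(df.query("`review_score`==0 and `model_predictions`==0").index)
    TP_count = len(df.query("`review_score`==1 and `model_predictions`==1").index)
    FP_count = len(df.query("`review_score`==0 and `model_predictions`==1").index)
    FN_count = len(df.query("`review_score`==1 and `model_predictions`==0").index)
    # Precision: share of predicted positives that are truly positive
    precision = TP_count/(TP_count+FP_count)
    # Recall: share of truly positive reviews that were predicted positive
    recall = TP_count/(TP_count+FN_count)
    # F1: harmonic mean of precision and recall
    F1_score = (2*precision*recall)/(precision+recall)
    print(f"Accuracy: {accuracy:.2f}")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1 Score: {F1_score:.2f}")
get_metrics()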

Accuracy: 0.77
Precision: 0.97
Recall: 0.75
F1 Score: 0.84
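
The formulas used in `get_metrics` divide by `TP + FP` and `TP + FN`, which raises a `ZeroDivisionError` if a model predicts no positives at all. As a sketch, the same metrics can be computed on plain lists with a guarded division; `binary_metrics` and `safe_div` are hypothetical names, and returning `0.0` for an undefined ratio is one possible convention, not the only one:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(numerator, denominator):
        # Assumption: report 0.0 when a ratio is undefined (denominator is zero).
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Small worked example: 2 true positives, 1 false negative, 1 false positive.
print(binary_metrics([1, 1, 1, 0], [1, 1, 0, 1]))
```

In the worked example precision, recall and F1 all equal 2/3, while accuracy is 0.5 because only two of the four predictions match.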


According to its model card, the default model (distilbert/distilbert-base-uncased-finetuned-sst-2-english) is intended for topic classification, and it was fine-tuned on SST-2, a movie-review dataset, rather than on game reviews. Let's try a model built for review sentiment analysis instead:

In [26]:
sentiment_pipeline = pipeline(model="nlptown/bert-base-multilingual-uncased-sentiment", device=0)



In [27]:
sentiment_pipeline(test.iloc[0]["review_text"])

[{'label': '5 stars', 'score': 0.8000338673591614}]

In [28]:
test["predicted_score"] = sentiment_pipeline(test["review_text"].tolist(), truncation=True)

In [29]:
test["predicted_score"] = test["predicted_score"].apply(lambda x : x["label"])
test["predicted_score"].value_counts()

predicted_score
5 stars    6183
4 stars    3952
1 star     2399
3 stars    1883
2 stars    1299
Name: count, dtype: int64

In [31]:
# The model predicts star ratings, not "positive / negative", so its output must be converted for this dataset.
str_to_int_score = {"5 stars" : 1, "4 stars" : 1, "3 stars": 1, "2 stars": 0, "1 star": 0} # The choice of this decision boundary is arbitrary.

test["model_predictions"] = test["predicted_score"].apply(lambda x: str_to_int_score[x])
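
Because the decision boundary is arbitrary, it can help to make it a parameter instead of hard-coding it in a dictionary. A minimal sketch, assuming the labels always have the form `"<n> star"` or `"<n> stars"` as in the value counts above; `stars_to_binary` and `min_positive_stars` are hypothetical names:

```python
def stars_to_binary(label, min_positive_stars=3):
    """Convert a label such as '4 stars' or '1 star' to 1 (positive) or 0 (negative)."""
    stars = int(label.split()[0])  # the leading token is the star count
    return int(stars >= min_positive_stars)


# The default boundary reproduces the dictionary used in the notebook.
for label in ["5 stars", "4 stars", "3 stars", "2 stars", "1 star"]:
    print(label, stars_to_binary(label), stars_to_binary(label, min_positive_stars=4))
```

With this form, different boundaries could be compared on the `valid` split loaded at the top of the notebook before reporting metrics on `test`.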

In [32]:
get_metrics()

Accuracy: 0.86
Precision: 0.95
Recall: 0.88
F1 Score: 0.91


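The results are now better: accuracy rises from 0.77 to 0.86 and F1 from 0.84 to 0.91, with recall up from 0.75 to 0.88 and precision down slightly from 0.97 to 0.95. Compared to the LSTM, this model has slightly lower precision and higher recall, so it labels more reviews as positive, including more false positives.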

In [38]:
def test_review_text(sentence):
    # Truncate over-long input, consistent with the batch prediction calls above
    model_output = sentiment_pipeline([sentence], truncation=True)
    score = str_to_int_score[model_output[0]["label"]]
    print(score)
    if score==0:
        print("Negative review")
    else:
        print("Positive review")

In [39]:
test_review_text("A buggy, uninspired mess")

0
Negative review


In [40]:
test_review_text("This game is bad")

0
Negative review


In [41]:
test_review_text("This game destroyed my life")

0
Negative review


In [42]:
test_review_text("Best game I've ever played")

1
Positive review


In [43]:
test_review_text("Fun cooperative play with scalable difficulty. Rapid path to get into a game with friends or open public games. ")

1
Positive review


In [44]:
test_review_text("Deliriously buggy. Fun if/when it works properly. Wait and see if they actually QA the next few patches before you play.")

0
Negative review
