In [1]:
import pandas as pd
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf

In [2]:
df = pd.read_csv("data/BA_reviews_proc.csv", index_col=0)

In [3]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


In [4]:
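# Tokenize every review in a single call. padding=True pads each row to the
# longest review; truncation=True caps reviews at max_length (512) tokens.
reviews_tok = tokenizer(
    df.reviews.tolist(),
    max_length=512,
    truncation=True,
    return_tensors="tf",
    padding=True,
)
# Wrap the tokenizer output (input_ids, attention_mask) in a tf.data pipeline.
reviews_ds = tf.data.Dataset.from_tensor_slices(dict(reviews_tok))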
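
As a rough illustration of what `padding=True` and `truncation=True` do in the cell above, here is a toy sketch in plain Python. The helper `pad_and_truncate` and the pad id `0` are hypothetical stand-ins, not the tokenizer's actual implementation; the real tokenizer also adds special tokens and operates on subword ids.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Toy padding-to-longest with truncation (illustrative only)."""
    # Cut every sequence down to at most max_length tokens.
    truncated = [seq[:max_length] for seq in sequences]
    # Pad each sequence to the longest one that remains.
    width = max((len(seq) for seq in truncated), default=0)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    # 1 marks a real token, 0 marks padding, mirroring an attention mask.
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for three "reviews" of different lengths.
batch = pad_and_truncate([[5, 6, 7, 8, 9], [5, 6], [5]], max_length=4)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because the notebook tokenizes all reviews in one call, every row is padded to the longest review in the whole dataset (capped at 512 tokens), not to the longest review in each prediction batch.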

In [5]:
preds = model.predict(reviews_ds.batch(256))
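
To make the batching concrete, the sketch below works out how `batch(256)` would split the 3,451 reviews this notebook reports further down. It assumes the default behaviour of `tf.data.Dataset.batch`, which keeps the final partial batch; the helper `batch_sizes` is a hypothetical name used only for this illustration.

```python
import math


def batch_sizes(n_items, batch_size):
    """Return the size of each batch when n_items are split in order."""
    n_batches = math.ceil(n_items / batch_size)
    return [min(batch_size, n_items - i * batch_size) for i in range(n_batches)]


sizes = batch_sizes(3451, 256)
# 13 full batches of 256 reviews plus one final batch of 123 reviews.
print(len(sizes), sizes[-1])
```

A smaller batch size lowers peak memory at the cost of more forward passes; the predictions themselves should not depend on it.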


In [6]:
# Convert the two logits per review into probabilities that sum to 1.
preds_np = tf.nn.softmax(preds.logits, axis=1).numpy()
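
For readers unfamiliar with softmax, here is a minimal plain-Python sketch of the same computation for a single review. The logit values are made up for illustration. For this SST-2 checkpoint, index 1 should correspond to the positive class, which can be checked with `model.config.id2label`.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical [negative, positive] logits for one review.
probs = softmax([-3.2, 3.5])
print(f"P(negative)={probs[0]:.4f}  P(positive)={probs[1]:.4f}")
```

The `Sentiment` column stored later in the notebook is this second probability, so values near 1 indicate a positive review and values near 0 a negative one.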

In [7]:
preds_np[:,1].shape

(3451,)

In [8]:
df['Sentiment'] = preds_np[:,1]

In [9]:
df.head(5)

Unnamed: 0,reviews,Verified,Sentiment
0,Probably the worst business class experience I...,1,0.000232
1,"Definitely not recommended, especially for bus...",1,0.003603
2,BA shuttle service across the UK is still surp...,1,0.99367
3,I must admit like many others I tend to avoid ...,1,0.009478
4,When will BA update their Business class cabin...,0,0.000529


In [10]:
# Print five randomly sampled reviews with their positive-class probability.
for _, row in df.sample(5).iterrows():
    print('REVIEW:')
    print(f'SENTIMENT RATING: {row.Sentiment:.4f}')
    print(f'REVIEW: {row.reviews}')
    print('-'*30)

REVIEW 1:
SENTIMENT RATING: 0.9620
REVIEW: Gatwick-Naples returning Barcelona-Gatwick. Excellent flights as usual. Especially loved the stylish and comfortable interiors of the refitted A319 on the Naples run. Both flights left late but it didn't seem to seriously affect arrival time. Only complaint is the food/drinks service sandwich is not to my taste and I now refuse them - but other passengers seem pleased. Should offer an alternative such as crisps or nuts to go with wine.
------------------------------
REVIEW 1:
SENTIMENT RATING: 0.0002
REVIEW: London to St Petersburgh. Huge disappointment for BA business class service in European routes. Seat pitch has been reduced to 30" over the last years (from 34") at the expense of passenger comfort. Seat width is the same as economy with empty middle seat. Food for me was half-empty plates representing a deli experience, and leaving me hungry. Out of the three options offered, two were already out of stock, so stranded with the least favou

In [11]:
df.to_csv("data/BA_reviews_sentiments.csv")

# Summary of results

In [12]:
# Count only confident predictions: score > 0.8 is positive, score < 0.2 is negative.
print('Number of positive reviews: ', (df['Sentiment'] > 0.8).sum())
print('Number of negative reviews: ', (df['Sentiment'] < 0.2).sum())

Number of positive reviews:  938
Number of negative reviews:  2383
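
The two counts do not add up to the 3,451 reviews scored earlier, because reviews with a score between 0.2 and 0.8 are in neither group. The sketch below spells out that arithmetic and the thresholding rule; `label_sentiment` and the "uncertain" label are hypothetical names introduced for illustration, not part of the notebook.

```python
def label_sentiment(score, low=0.2, high=0.8):
    """Bucket a positive-class probability using the notebook's thresholds."""
    if score > high:
        return "positive"
    if score < low:
        return "negative"
    return "uncertain"


# Counts reported by the notebook.
total, positive, negative = 3451, 938, 2383
uncertain = total - positive - negative
print(f"uncertain: {uncertain} ({uncertain / total:.1%} of all reviews)")

# Scores taken from the df.head(5) output above.
print(label_sentiment(0.99367), label_sentiment(0.000232))
```

So 130 reviews fall in the middle band, and roughly 69% of all reviews are classified as confidently negative.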
