In [1]:
from transformers import pipeline
import pandas as pd 

In [2]:
#Let's read the data
sent_data = pd.read_csv('amazon_reviews.csv')

In [3]:
#We'll need the strings in the reviews as a list 
sent_data = sent_data['reviewText'].values.tolist()
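
If the `reviewText` column contains any missing values, they arrive in the list as float NaN entries rather than strings, and a text pipeline cannot score them. The sketch below is one possible way to filter the list before passing it to the models. `clean_reviews` is a hypothetical helper and the sample list is made up for illustration; neither comes from the original notebook.

```python
def clean_reviews(reviews):
    """Keep only non-empty string reviews."""
    cleaned = []
    for review in reviews:
        # A missing CSV cell becomes float('nan') after .tolist(),
        # so the isinstance check drops it along with any other non-string.
        if isinstance(review, str) and review.strip():
            cleaned.append(review)
    return cleaned

# Hypothetical sample standing in for the reviewText column
sample = ["No issues.", float("nan"), "", "it works as expected."]
cleaned_sample = clean_reviews(sample)
print(cleaned_sample)
```

In the notebook itself this would be applied as `sent_data = clean_reviews(sent_data)`.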

# Models

In [4]:
#Let's get the three models we will use
model_1 = pipeline("sentiment-analysis", model="distilbert-base-uncased")
model_2 = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
model_3 = pipeline("sentiment-analysis", model="juliensimon/reviews-sentiment-analysis")

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifi

These models are from the Hugging Face Transformers library. The three models are:<br>
model 1: the base DistilBERT checkpoint, with no sentiment fine-tuning; as the warning above shows, its classification head is newly initialized<br>
model 2: a DistilBERT model fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset<br>
model 3: a DistilBERT model fine-tuned on English-language product reviews by Julien Simon<br>
These models will be called model_1, model_2, and model_3 respectively.

In [5]:
#Let's see the sentiment of one sample review (index 2) for all 3 models
text = sent_data[2]
text

'it works as expected. I should have sprung for the higher capacity.  I think its made a bit cheesier than the earlier versions; the paint looks not as clean as before'

In [6]:
model_1(text)

[{'label': 'LABEL_1', 'score': 0.5094072222709656}]

In [7]:
model_2(text)

[{'label': 'NEGATIVE', 'score': 0.9990612864494324}]

In [8]:
model_3(text)

[{'label': 'LABEL_1', 'score': 0.7103031873703003}]

For model_3, LABEL_1 = positive and LABEL_0 = negative. The same mapping is assumed for model_1, although its classification head is untrained, so its labels carry little meaning.<br>
model_1 is close to neutral, with a score of about 51%.<br>
model_2 is 99.9% confident that the review is negative, even though the review is only mildly critical.<br>
model_3 is 71% confident that the review is positive, which is higher than the mixed wording suggests. However, I would say that the review is more positive than negative.
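
Because the models report different label names, comparing them by eye is error-prone. The following is a minimal sketch of a helper that maps each raw prediction onto one shared POSITIVE/NEGATIVE vocabulary, assuming the LABEL_1 = positive convention stated above. `LABEL_MAP` and `normalize_label` are hypothetical names, not part of the Transformers library.

```python
# Hypothetical mapping; assumes LABEL_1 = positive and LABEL_0 = negative.
LABEL_MAP = {
    "LABEL_0": "NEGATIVE",
    "LABEL_1": "POSITIVE",
    "NEGATIVE": "NEGATIVE",
    "POSITIVE": "POSITIVE",
}

def normalize_label(prediction):
    """Return a copy of one pipeline result dict with a unified label."""
    return {"label": LABEL_MAP[prediction["label"]], "score": prediction["score"]}

# Outputs copied from cells In [6] to In [8]
raw = {
    "model_1": {"label": "LABEL_1", "score": 0.5094072222709656},
    "model_2": {"label": "NEGATIVE", "score": 0.9990612864494324},
    "model_3": {"label": "LABEL_1", "score": 0.7103031873703003},
}

unified = {name: normalize_label(pred) for name, pred in raw.items()}
for name, pred in unified.items():
    print(f"{name}: {pred['label']} ({pred['score']:.4f})")
```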

# Sentiment Analysis

In [9]:
# Let's print the sentiment predictions for the first 20 reviews
try:
    # Slice the list so that only the first 20 reviews are scored
    for text in sent_data[:20]:
        prediction = model_1(text)
        print(f"Sentence: {text}")
        print(f"Sentiment Prediction: {prediction[0]['label']} with confidence {prediction[0]['score']:.4f}")
        print("=" * 20)
except KeyboardInterrupt:
    print('user ended script')

Sentence: No issues.
Sentiment Prediction: LABEL_1 with confidence 0.5161
Sentence: Purchased this for my device, it worked as advertised. You can never have too much phone memory, since I download a lot of stuff this was a no brainer for me.
Sentiment Prediction: LABEL_1 with confidence 0.5203
Sentence: it works as expected. I should have sprung for the higher capacity.  I think its made a bit cheesier than the earlier versions; the paint looks not as clean as before
Sentiment Prediction: LABEL_1 with confidence 0.5094
Sentence: This think has worked out great.Had a diff. bran 64gb card and if went south after 3 months.This one has held up pretty well since I had my S3, now on my Note3.*** update 3/21/14I've had this for a few months and have had ZERO issue's since it was transferred from my S3 to my Note3 and into a note2. This card is reliable and solid!Cheers!
Sentiment Prediction: LABEL_1 with confidence 0.5277
Sentence: Bought it with Retail Packaging, arrived legit, in a orange 

In [10]:
try:
    for i, text in enumerate(sent_data):
        prediction = model_2(text)
        print(f"Sentence: {text}")
        print(f"Sentiment Prediction: {prediction[0]['label']} with confidence {prediction[0]['score']:.4f}")
        print("=" * 20)
except KeyboardInterrupt:
    print('user ended script')

Sentence: No issues.
Sentiment Prediction: POSITIVE with confidence 0.7489
Sentence: Purchased this for my device, it worked as advertised. You can never have too much phone memory, since I download a lot of stuff this was a no brainer for me.
Sentiment Prediction: POSITIVE with confidence 0.9793
Sentence: it works as expected. I should have sprung for the higher capacity.  I think its made a bit cheesier than the earlier versions; the paint looks not as clean as before
Sentiment Prediction: NEGATIVE with confidence 0.9991
Sentence: This think has worked out great.Had a diff. bran 64gb card and if went south after 3 months.This one has held up pretty well since I had my S3, now on my Note3.*** update 3/21/14I've had this for a few months and have had ZERO issue's since it was transferred from my S3 to my Note3 and into a note2. This card is reliable and solid!Cheers!
Sentiment Prediction: POSITIVE with confidence 0.9992
Sentence: Bought it with Retail Packaging, arrived legit, in a ora

In [11]:
try:
    for i, text in enumerate(sent_data):
        prediction = model_3(text)
        print(f"Sentence: {text}")
        print(f"Sentiment Prediction: {prediction[0]['label']} with confidence {prediction[0]['score']:.4f}")
        print("=" * 20)
except KeyboardInterrupt:
    print('user ended script')

Sentence: No issues.
Sentiment Prediction: LABEL_1 with confidence 0.8572
Sentence: Purchased this for my device, it worked as advertised. You can never have too much phone memory, since I download a lot of stuff this was a no brainer for me.
Sentiment Prediction: LABEL_1 with confidence 0.9663
Sentence: it works as expected. I should have sprung for the higher capacity.  I think its made a bit cheesier than the earlier versions; the paint looks not as clean as before
Sentiment Prediction: LABEL_1 with confidence 0.7103
Sentence: This think has worked out great.Had a diff. bran 64gb card and if went south after 3 months.This one has held up pretty well since I had my S3, now on my Note3.*** update 3/21/14I've had this for a few months and have had ZERO issue's since it was transferred from my S3 to my Note3 and into a note2. This card is reliable and solid!Cheers!
Sentiment Prediction: LABEL_1 with confidence 0.9583
Sentence: Bought it with Retail Packaging, arrived legit, in a orange 

It seems that:<br>
model_1 almost always has a score very close to 50%<br>
model_2 is almost always extreme, usually 90-99%+ confident in either negative or positive<br>
model_3 sits between the two: more decisive than model_1, but less extreme than model_2
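
These impressions can be checked against numbers rather than by reading the printed output. The sketch below summarizes the confidence scores of the first four reviews, copied from the outputs above; `summarize_scores` is a hypothetical helper, and four reviews are far too few to be conclusive.

```python
from statistics import mean

def summarize_scores(scores):
    """Return the minimum, maximum, and mean of a list of confidence scores."""
    return {"min": min(scores), "max": max(scores), "mean": mean(scores)}

# Confidence scores for the first four reviews, copied from the outputs above
recorded = {
    "model_1": [0.5161, 0.5203, 0.5094, 0.5277],
    "model_2": [0.7489, 0.9793, 0.9991, 0.9992],
    "model_3": [0.8572, 0.9663, 0.7103, 0.9583],
}

summary = {name: summarize_scores(scores) for name, scores in recorded.items()}
for name, stats in summary.items():
    print(f"{name}: min={stats['min']:.4f} max={stats['max']:.4f} mean={stats['mean']:.4f}")
```

Running the same helper over the scores of the full data set would give a firmer basis for the comparison.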

In [12]:
#Let's look at 2 more reviews. We'll do the reviews at index 100 and 3000
practice_text = sent_data[100]
practice_text

"This item was priced right and fit right into my cell phone without any issues. Does what it is suppose to do and I don't know what else I could say. Being a class 10 it is suppose to be plenty fast for cell phone use which I am sure speeds up camera use to some degree. Has worked well and the shipping was fast."

In [13]:
model_1(practice_text)

[{'label': 'LABEL_1', 'score': 0.5204753279685974}]

In [14]:
model_2(practice_text)

[{'label': 'POSITIVE', 'score': 0.9289764165878296}]

In [15]:
model_3(practice_text)

[{'label': 'LABEL_1', 'score': 0.968295156955719}]

model_1 never seems to score above 55%, and that hurts it on a clearly positive review like this one.<br>
model_2 and model_3 are both correct.<br>
It is interesting that model_3 has a higher score than model_2, as that is usually not the case. However, this review is quite positive, so it makes sense.

In [16]:
practice_text2 = sent_data[3000]
practice_text2

'Plenty of extra storage can load a lot of movies in high definition, music and a boat load of other stuff.'

In [17]:
model_1(practice_text2)

[{'label': 'LABEL_1', 'score': 0.5050829648971558}]

In [18]:
model_2(practice_text2)

[{'label': 'NEGATIVE', 'score': 0.994924783706665}]

In [19]:
model_3(practice_text2)

[{'label': 'LABEL_1', 'score': 0.9126739501953125}]

Overall: <br>
model_1 has the sentiment correct, but cannot handle extremes (very positive or very negative reviews). <br>
model_2 is incorrect about the sentiment for this review. It rarely seems to capture reviews that are more neutral. <br>
model_3 is correct about the sentiment and the score. It generally seems to handle extreme reviews better than model_1 and neutral reviews better than model_2.
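
Calling each model in its own cell makes side-by-side comparison tedious. A minimal sketch of a helper that runs every model on the same text and collects the results in one list follows. `compare_models` and `make_stub` are hypothetical names, and the stub callables below only replay the outputs recorded in cells In [17] to In [19] so that the example runs without downloading any model.

```python
def compare_models(models, text):
    """Run each callable in `models` on `text` and return (name, label, score) rows."""
    rows = []
    for name, model in models.items():
        result = model(text)[0]  # a pipeline returns a list with one dict per input
        rows.append((name, result["label"], result["score"]))
    return rows

def make_stub(label, score):
    """Stand-in for a pipeline: always returns the same recorded prediction."""
    return lambda text: [{"label": label, "score": score}]

# Stubs replaying the recorded outputs; in the notebook, pass the real pipelines:
# {"model_1": model_1, "model_2": model_2, "model_3": model_3}
stub_models = {
    "model_1": make_stub("LABEL_1", 0.5050829648971558),
    "model_2": make_stub("NEGATIVE", 0.994924783706665),
    "model_3": make_stub("LABEL_1", 0.9126739501953125),
}

review = ("Plenty of extra storage can load a lot of movies in high definition, "
          "music and a boat load of other stuff.")
rows = compare_models(stub_models, review)
for name, label, score in rows:
    print(f"{name}: {label} ({score:.4f})")
```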

# Created Reviews

In [20]:
#Let's try three brand new strings (one super positive, one super negative, and one neutral)
best_string = "This is the greatest sd card I have ever used, and I have used many throughout my lifetime. Enough said."
worst_string = "I really have no words for this. This piece of trash that they call a SD card is useless."
neutral_string = "It works fine enough. The file transfer speed could be faster though."

In [21]:
model_1(best_string)

[{'label': 'LABEL_0', 'score': 0.5020228028297424}]

In [22]:
model_2(best_string)

[{'label': 'POSITIVE', 'score': 0.9996123909950256}]

In [23]:
model_3(best_string)

[{'label': 'LABEL_1', 'score': 0.9721803665161133}]

In [24]:
model_1(worst_string)

[{'label': 'LABEL_1', 'score': 0.5084934234619141}]

In [25]:
model_2(worst_string)

[{'label': 'NEGATIVE', 'score': 0.9998171925544739}]

In [26]:
model_3(worst_string)

[{'label': 'LABEL_0', 'score': 0.9995878338813782}]

In [27]:
model_1(neutral_string)

[{'label': 'LABEL_1', 'score': 0.5166916251182556}]

In [28]:
model_2(neutral_string)

[{'label': 'POSITIVE', 'score': 0.998728334903717}]

In [29]:
model_3(neutral_string)

[{'label': 'LABEL_0', 'score': 0.5405436158180237}]

# Results

For the positive string, these are the models ranked:<br>
1)model_2 = (sentiment = positive, score = 99.96%)<br>
2)model_3 = (sentiment = positive, score = 97.22%)<br>
3)model_1 = (sentiment = negative, score = 50.20%)<br>

For the negative string, these are the models ranked:<br>
1)model_2 = (sentiment = negative, score = 99.98%)<br>
2)model_3 = (sentiment = negative, score = 99.96%)<br>
3)model_1 = (sentiment = positive, score = 50.85%)<br>

For the neutral string, these are the models ranked:<br>
1)model_1 = (sentiment = positive, score = 51.67%)<br>
2)model_3 = (sentiment = negative, score = 54.05%)<br>
3)model_2 = (sentiment = positive, score = 99.87%)<br>
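
The neutral ranking can be made explicit by converting each prediction into a probability of the positive class and measuring how far it sits from 50%. The sketch below does this for the outputs recorded in cells In [27] to In [29]. It assumes that each score is the probability of the predicted label in a two-class model, so the other class gets `1 - score`, and that LABEL_1 means positive; `positive_probability` is a hypothetical helper.

```python
def positive_probability(label, score):
    """Convert a (label, score) pair into the probability of the positive class."""
    if label in ("POSITIVE", "LABEL_1"):
        return score
    # Two-class assumption: the remaining probability belongs to the other label
    return 1.0 - score

# Outputs for neutral_string, copied from cells In [27] to In [29]
neutral_outputs = {
    "model_1": ("LABEL_1", 0.5166916251182556),
    "model_2": ("POSITIVE", 0.998728334903717),
    "model_3": ("LABEL_0", 0.5405436158180237),
}

# Distance from 0.5: smaller means the model treats the text as closer to neutral
distance = {
    name: abs(positive_probability(label, score) - 0.5)
    for name, (label, score) in neutral_outputs.items()
}
ranking = sorted(distance, key=distance.get)
print(ranking)  # ['model_1', 'model_3', 'model_2']
```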

# Conclusion

model_1 stays near 50% on every review, so it looks reasonable on neutral reviews but cannot recognize extremely positive or extremely negative ones. This is consistent with its classification head being untrained. <br>model_2 handles extremely negative and extremely positive reviews well, but not neutral reviews. <br>model_3 handles extremely positive, extremely negative, and neutral reviews well. 

So, model_3 is the best model of the three models for sentiment analysis.