# Deployable Sentiment Analysis Using __[Hugging Face 🤗](https://huggingface.co/)__ Transformer Model

- The model used in this project is: __[distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english?text=I+used+to+play+this+game+years+ago+and+loved+it.+I+found+this+did+not+work+on+my+computer+even+though+it+said+it+would+work+with+Windows+7.#training)__
- The dataset used in this project is a __[large crawl of Amazon reviews](https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews)__


In [71]:
from ipywidgets import FloatProgress
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import json
import pandas as pd
import os

#### Reading and preprocessing the review data

In [28]:
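# The file is in JSON Lines format (one JSON object per line),
# so each line is parsed on its own instead of calling json.load on the whole file.
reviews = []
with open('data/Video_Games.json') as f:
    for line in f:
        reviews.append(json.loads(line))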

In [42]:
# Filtering out any reviews that aren't complete
useful_keys = ['overall', 'reviewText', 'summary']
reviews = [review for review in reviews if all(key in review for key in useful_keys)]

# Removing all data fields except useful ones
reviews = [{key: review[key] for key in useful_keys} for review in reviews]


df = pd.DataFrame.from_records(reviews)
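
The loading and filtering steps above can also be combined into one streaming pass, so incomplete records are never held in memory. The following is a minimal sketch, not the notebook's own code: `iter_complete_reviews` is a hypothetical helper, and the two sample lines are made up to stand in for `data/Video_Games.json`.

```python
import io
import json

# The same three fields the notebook keeps.
USEFUL_KEYS = ("overall", "reviewText", "summary")


def iter_complete_reviews(lines, keys=USEFUL_KEYS):
    """Yield one trimmed dict for every JSON line that contains all of `keys`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if all(key in record for key in keys):
            # Drop every field except the useful ones.
            yield {key: record[key] for key in keys}


# Made-up sample standing in for the real file (assumption, for illustration only).
sample_file = io.StringIO(
    '{"overall": 5.0, "reviewText": "Great game", "summary": "Fun", "asin": "X1"}\n'
    '{"overall": 2.0, "summary": "No review text"}\n'
)
reviews_sample = list(iter_complete_reviews(sample_file))
print(reviews_sample)
```

With the real file, the same helper could be fed the open file handle and its result passed to `pd.DataFrame.from_records(list(...))`.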


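#### Downloading the model and its tokenizer
- BERT is a transformer model that is trained bidirectionally: each token's representation draws on context from both its left and its right.
    - Earlier transformer language models were typically trained from left to right only.
- DistilBERT is a distilled version of the popular BERT model.
    - According to the DistilBERT paper, distillation yields a model that is 40% smaller and 60% faster while retaining 97% of BERT's language-understanding performance.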
    

In [43]:
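# The base tokenizer uses the same uncased vocabulary as the fine-tuned SST-2 checkpoint loaded on the next line.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")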


#### Applying the model to samples of the dataset
- Only a sample of the dataset is analyzed, to save compute resources

In [87]:
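def analyze_review(review, print_response=True):
    """Predict the sentiment label for one review and optionally print the result."""
    # Truncate to 512 tokens, the maximum sequence length DistilBERT accepts
    inputs = tokenizer(review['reviewText'], return_tensors="pt", max_length=512, truncation=True)
    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        logits = model(**inputs).logits

    # One review per call, so the argmax over the logits is the predicted class index
    predicted_class_id = logits.argmax().item()
    predicted_sentiment = model.config.id2label[predicted_class_id]

    if print_response:
        print(f"\nCustomer Rating: {review['overall']}")
        print(f"Predicted Sentiment: {predicted_sentiment}")
        print(f"Review Text: {review['reviewText']}\n")

    return predicted_sentiment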
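
`analyze_review` returns only the arg-max label. If a confidence score is also wanted, the logits can be passed through a softmax first. The sketch below is an illustration in plain Python rather than part of the notebook: the logit values are invented, `label_with_confidence` is a hypothetical helper, and the example `id2label` mapping is an assumption that reuses the two label names the notebook prints. On real tensors, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Invented logits for a single review (assumption, for illustration only).
example_logits = [-1.5, 2.0]
# Assumed index-to-label mapping using the label names seen in the notebook.
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}

label, confidence = label_with_confidence(example_logits, example_id2label)
print(label, round(confidence, 3))
```

A low top probability could then be used to flag reviews the model is unsure about, such as mixed pros-and-cons reviews.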


In [53]:
analyze_review(df.iloc[1], print_response=True)

Review: The game itself worked great but the story line videos would never play, the sound was fine but the picture would freeze and go black every time.
Customer rating: 3.0
Sentiment rating: NEGATIVE


'NEGATIVE'

In [89]:
sample_size = 10
samples = df.sample(sample_size)
for _, row in samples.iterrows():
    analyze_review(row)
    print('\u2500' * 25)
    


Customer Rating: 1.0
Predicted Sentiment: NEGATIVE
Review Text: The charging port was broken

─────────────────────────

Customer Rating: 5.0
Predicted Sentiment: POSITIVE
Review Text: I actually like this one more than the power adapter that came with my Wii.  This has an indicator light that lets me know if it is plugged in.  It also has a Velcro strap for the cord that is very convenient.  The Wii has not had any problem with this third-party power adapter; it works completely normal.  I would highly recommend this, even over the one that comes with the Wii.

─────────────────────────

Customer Rating: 3.0
Predicted Sentiment: NEGATIVE
Review Text: I have played plenty of games in my days and this game has some pros and cons.

PROS:
Beautiful graphics. It is like I am in a movie where I am in control. Especially the water. Crystal clear, seems as if I am really swimming throughout the deep blue sea. Acres of grass, city scape is breath taking. I literally will just walk around and 
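
Because each sample prints the customer's star rating next to the predicted sentiment, the two can be compared directly. The sketch below shows one possible way to do that and is not part of the original notebook: the thresholds (4 or more stars counts as positive, 2 or fewer as negative, 3 stars is skipped as neutral) are assumptions, and `rating_to_sentiment` and `agreement_rate` are hypothetical helpers. The three pairs are taken from the sample output above.

```python
def rating_to_sentiment(rating):
    """Map a star rating to a sentiment label; 3-star reviews return None."""
    if rating >= 4:
        return "POSITIVE"
    if rating <= 2:
        return "NEGATIVE"
    return None  # treated as neutral and excluded (assumption)


def agreement_rate(pairs):
    """Fraction of non-neutral reviews where the prediction matches the rating."""
    compared = [(rating_to_sentiment(rating), predicted) for rating, predicted in pairs]
    compared = [(expected, predicted) for expected, predicted in compared if expected is not None]
    if not compared:
        return None  # nothing to compare
    matches = sum(1 for expected, predicted in compared if expected == predicted)
    return matches / len(compared)


# (customer rating, predicted sentiment) pairs from the sample output.
pairs = [(1.0, "NEGATIVE"), (5.0, "POSITIVE"), (3.0, "NEGATIVE")]
rate = agreement_rate(pairs)
print(rate)
```

Three reviews are far too few to draw conclusions from; a larger `sample_size` would be needed for the rate to be meaningful.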

Citation for the dataset:
- R. He, J. McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW, 2016.
- J. McAuley, C. Targett, J. Shi, A. van den Hengel. Image-based recommendations on styles and substitutes. SIGIR, 2015.

Citation for the DistilBERT model:
- V. Sanh, L. Debut, J. Chaumond, T. Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108, 2019.


