# Sentiment Analysis on Yelp dataset

> NOTE: this particular notebook requires a GPU, as it uses transformers

## Loading libraries


In [1]:
import pandas as pd
import re
import spacy
from sklearn.metrics import classification_report


## Loading the dataset


In [2]:
IO_TRAIN = "../input/yelp-review-dataset/yelp_review_polarity_csv/train.csv"
ylp = pd.read_csv(IO_TRAIN, header=None)
ylp.columns = ["sentiment", "review"]
ylp["review"] = ylp["review"].apply(lambda rev: re.sub(r"\\n", "\n", rev))
# map the numeric labels to readable ones, touching only the label column
ylp["sentiment"] = ylp["sentiment"].map({1: "NEG", 2: "POS"}).astype("category")
ylp.head()


| | sentiment | review |
|---|---|---|
| 0 | NEG | Unfortunately, the frustration of being Dr. Go... |
| 1 | POS | Been going to Dr. Goldberg for over 10 years. ... |
| 2 | NEG | I don't know what Dr. Goldberg was like before... |
| 3 | NEG | I'm writing this review to give you a heads up... |
| 4 | POS | All the food is great here. But the best thing... |


In [3]:
nlp = spacy.load("en_core_web_lg")


In [4]:
!cp ../input/yelp-sent-analysis-preprocess/* ./


## Using a pre-trained model

We had our [benchmark](./03.benchmark.ipynb) model (a unigram Naive Bayes that scores $0.875$) and tried other classical models:

- [bigram Naive Bayes](./04.bigram-naive-bayes.ipynb) (at $0.9$)
- [Logistic Regression](./06.classic-ml.ipynb) (at $0.893$)
- [SVM classifier](./06.classic-ml.ipynb) (at $0.891$)

From these we can conclude that classical models plateau at around $90\%$ accuracy. To find better results, we should move to the next level: deep learning.

Let's start small and try a pre-trained model: [spacytextblob](https://spacy.io/universe/project/spacy-textblob), a spaCy pipeline component for sentiment analysis.


In [5]:
!python3 -m pip install spacytextblob


Collecting spacytextblob
  Downloading spacytextblob-4.0.0-py3-none-any.whl (4.5 kB)
Collecting textblob<0.16.0,>=0.15.3
  Downloading textblob-0.15.3-py2.py3-none-any.whl (636 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m636.5/636.5 kB[0m [31m869.6 kB/s[0m eta [36m0:00:00[0m
Collecting typing-extensions<4.0.0.0,>=3.7.4
  Downloading typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Installing collected packages: typing-extensions, textblob, spacytextblob
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.2.0
    Uninstalling typing_extensions-4.2.0:
      Successfully uninstalled typing_extensions-4.2.0
  Attempting uninstall: textblob
    Found existing installation: textblob 0.17.1
    Uninstalling textblob-0.17.1:
      Successfully uninstalled textblob-0.17.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the sour

In [6]:
from spacytextblob.spacytextblob import SpacyTextBlob

_ = nlp.disable_pipes(nlp.pipe_names)
_ = nlp.add_pipe("spacytextblob")


In [7]:
def predict(doc: str) -> str:
    """Predict the sentiment of a document from its polarity.

    The document string is processed with spaCy and its polarity is checked:
    a negative polarity yields `NEG`, anything else yields `POS`.

    Parameters:
    -----------
    doc: str
        the document string to predict its sentiment

    Returns:
    --------
    out: str
        the resulting prediction, `NEG` if polarity of document is negative,
        `POS` otherwise
    """
    polarity = nlp(doc)._.blob.polarity
    if polarity < 0:
        return "NEG"
    else:
        return "POS"


In [8]:
y_pred = ylp["review"].apply(predict)

print(classification_report(ylp["sentiment"], y_pred, digits=4))


              precision    recall  f1-score   support

         NEG     0.9434    0.3868    0.5486    280000
         POS     0.6143    0.9768    0.7543    280000

    accuracy                         0.6818    560000
   macro avg     0.7789    0.6818    0.6515    560000
weighted avg     0.7789    0.6818    0.6515    560000



Using the basic intuition that a negative review has a negative polarity and anything else is positive leads to a weak accuracy of $0.68$ (macro F1-score of $0.65$), whereas the benchmark does better at $0.875$.
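Accuracy and F1-score are easy to mix up here, so a quick check may help. Because the two classes are perfectly balanced (280000 reviews each), accuracy equals the mean of the two recalls, while each F1-score is the harmonic mean of that class's precision and recall. The sketch below uses only the numbers from the report above; the helper name `f1` is mine and not part of the notebook.

```python
# Values copied from the classification report above.
report = {
    "NEG": {"precision": 0.9434, "recall": 0.3868},
    "POS": {"precision": 0.6143, "recall": 0.9768},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# With equal support per class, accuracy is the mean of the per-class recalls.
accuracy = sum(row["recall"] for row in report.values()) / len(report)

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1(**row) for row in report.values()) / len(report)

print(f"accuracy = {accuracy:.4f}")
print(f"macro F1 = {macro_f1:.4f}")
```

Both values should agree with the report up to rounding of the inputs, which makes the gap between accuracy and macro F1 explicit: the very low `NEG` recall drags the `NEG` F1-score down.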

> A note worth mentioning: this particular model was not trained to be a classifier, but to assign polarity to a given document, and as such, some positive-labelled documents have polarity of $-1$, and other negative-labelled documents have polarity of $1$
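One consequence of the note above is that the cut-off at $0$ in `predict` is a choice rather than a given: polarity is a score in $[-1, 1]$, and the low `NEG` recall suggests that many negative reviews land at or slightly above zero. The sketch below sweeps a few thresholds over made-up polarity scores to illustrate the idea. The scores, the labels, and the `label_from_polarity` helper are illustrative assumptions, not results from the Yelp data.

```python
def label_from_polarity(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to a label; below the threshold is NEG."""
    return "NEG" if polarity < threshold else "POS"


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical (polarity, true label) pairs, NOT taken from the dataset.
samples = [
    (-0.60, "NEG"), (-0.10, "NEG"), (0.05, "NEG"), (0.12, "NEG"),
    (0.08, "POS"), (0.30, "POS"), (0.55, "POS"), (0.90, "POS"),
]
polarities = [p for p, _ in samples]
y_true = [label for _, label in samples]

# Try a handful of cut-offs and record the accuracy of each.
results = {}
for threshold in (0.0, 0.1, 0.2, 0.3):
    y_pred = [label_from_polarity(p, threshold) for p in polarities]
    results[threshold] = accuracy(y_true, y_pred)

# On ties, max() keeps the first (lowest) threshold.
best = max(results, key=results.get)
print(results, "best threshold:", best)
```

If this were tried on the real data, the threshold would have to be chosen on a held-out split; tuning it on the same reviews used for evaluation would overstate the result.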

---

A simple next step would be to use an `FCNN` (Fully Connected Neural Network), but I'll leave that as a TODO and jump right to the big guns: **TRANSFORMERS**


In [None]:
!python3 -m pip install datasets transformers[sentencepiece]



## Using transformers

First, let's try once more to use a pre-trained model as is; then we can try fine-tuning it for the task.

The labels need a small reconfiguration to match the format the transformer outputs.
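For context, a Hugging Face `sentiment-analysis` pipeline returns one dictionary per input text, holding a `label` and a `score`; the default English model used in this notebook labels its outputs `POSITIVE` and `NEGATIVE`. An alternative to renaming the ground-truth labels would be to map the predictions back to `NEG`/`POS`. Here is a minimal sketch of that alternative, where `fake_outputs` stands in for real pipeline output (its scores are invented) and `to_short_labels` is a hypothetical helper:

```python
# Stand-in for what the pipeline returns: one dict per input text.
# The scores here are invented for illustration.
fake_outputs = [
    {"label": "NEGATIVE", "score": 0.998},
    {"label": "POSITIVE", "score": 0.973},
    {"label": "POSITIVE", "score": 0.641},
]

# Map the transformer's label names onto the notebook's NEG/POS convention.
TO_SHORT = {"NEGATIVE": "NEG", "POSITIVE": "POS"}


def to_short_labels(outputs: list[dict]) -> list[str]:
    """Extract the predicted label from each output and shorten it."""
    return [TO_SHORT[out["label"]] for out in outputs]


short = to_short_labels(fake_outputs)
print(short)
```

Either direction works, as long as the true and predicted labels share one vocabulary before they reach `classification_report`.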


In [None]:
features = ylp["review"].values.tolist()

# a small reconfiguration of the labels to meet the format of the transformer
ylp["sentiment"] = ylp["sentiment"].cat.rename_categories({"POS": "POSITIVE", "NEG": "NEGATIVE"})
labels = ylp["sentiment"].values


In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", device=0)


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



In [None]:
predictions = classifier(features, truncation=True)
predictions = pd.DataFrame(predictions)["label"].values

print(classification_report(labels, predictions, digits=4))


              precision    recall  f1-score   support

    NEGATIVE     0.8996    0.9167    0.9080    280000
    POSITIVE     0.9151    0.8977    0.9063    280000

    accuracy                         0.9072    560000
   macro avg     0.9073    0.9072    0.9072    560000
weighted avg     0.9073    0.9072    0.9072    560000



The pre-trained model already reaches an accuracy of $0.907$, the highest yet: about $0.007$ above the bigram Naive Bayes score of $0.9$ quoted earlier (and about $0.014$ above Logistic Regression), and that is on the entire training set. A fine-tuned transformer might be able to score even higher.
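The gaps to the earlier models can be checked with quick arithmetic. The sketch below uses only the scores quoted in this notebook; some of them (the bigram Naive Bayes at $0.9$, for instance) may be rounded, so the margins are approximate.

```python
# Accuracy scores quoted earlier in this notebook (possibly rounded).
scores = {
    "unigram Naive Bayes (benchmark)": 0.875,
    "bigram Naive Bayes": 0.9,
    "Logistic Regression": 0.893,
    "SVM classifier": 0.891,
}
transformer_accuracy = 0.9072  # from the classification report above

# Gap between the pre-trained transformer and each classical model.
margins = {
    name: round(transformer_accuracy - acc, 4) for name, acc in scores.items()
}
for name, gap in margins.items():
    print(f"{name}: +{gap:.4f} ({gap * 100:.2f} percentage points)")
```

All four margins are positive but small, which is why fine-tuning looks like the natural next experiment.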
