# Sentiment Analysis with BERT<div class="tocSkip">
    
&copy; Jens Albrecht, 2021
    
This notebook can be freely copied and modified.  
Attribution, however, is highly appreciated.

<hr/>

See also: 

Albrecht, Ramachandran, Winkler: **Blueprints for Text Analytics in Python** (O'Reilly 2020)  
Chapter 11: [Performing Sentiment Analysis on Text Data](https://learning.oreilly.com/library/view/blueprints-for-text/9781492074076/ch11.html#ch-sentiment) + [Link to Github](https://github.com/blueprints-for-text-analytics-python/blueprints-text/blob/master/README.md)

## Setup<div class='tocSkip'/>

Set directory locations. If working on Google Colab: copy files and install required libraries.

In [3]:
import sys, os
ON_COLAB = 'google.colab' in sys.modules

if ON_COLAB:
    GIT_ROOT = 'https://github.com/jsalbr/tdwi-2021-text-mining/raw/main'
    os.system(f'wget {GIT_ROOT}/notebooks/setup.py')

%run -i setup.py

You are working on a local system.
Files will be searched relative to "..".


## Load Python Settings<div class="tocSkip"/>

Common imports, defaults for formatting in Matplotlib, Pandas etc.

In [4]:
%run "$BASE_DIR/notebooks/settings.py"

%reload_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'png'

# to print output of all statements and not just the last
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# otherwise text between $ signs will be interpreted as formula and printed in italic
pd.set_option('display.html.use_mathjax', False)
pd.options.plotting.backend = "matplotlib"

# path to import blueprints packages
sys.path.append('./packages')

## Sentiment Analysis Using Huggingface Transformers

Links: 
  * [Transformers Library from Hugging Face](https://huggingface.co/transformers)
  * [Transformers Quick Tour](https://huggingface.co/transformers/quicktour.html)

### Load Data

In [5]:
df = pd.read_csv(f"{BASE_DIR}/data/reddit-autos-selfposts-prepared.csv", sep=";", decimal=".")

len(df)

24712

### Load a Model for Sentiment Analysis

For a list of models see [Hugging Face Model Hub](https://huggingface.co/models).

Model download takes a moment ...

The downloaded model is cached locally. With the library version used here, the cache directory is `~/.cache/huggingface/transformers` (see [Huggingface documentation](https://huggingface.co/docs/datasets/installation.html#caching-datasets-and-metrics)).

In [6]:
from transformers import pipeline

# classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")

In [7]:
classifier.model

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(105879, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elemen

This model was fine-tuned on product reviews in six languages (English, Dutch, German, French, Spanish, and Italian). It predicts a rating from 1 to 5 stars.

In [8]:
sents = [
  'We are very happy to show you the 🤗 Transformers library.',
  'The weather today is not really what I expected.'
]

classifier(sents)

[{'label': '5 stars', 'score': 0.772534966468811},
 {'label': '3 stars', 'score': 0.6660415530204773}]
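
The labels come back as strings such as `'5 stars'` or `'1 star'`. For sorting or averaging, a numeric rating is often handier. The following is a minimal sketch, assuming the label format shown in the output above; the helper name `label_to_stars` is hypothetical and not part of the Transformers library.

```python
def label_to_stars(label):
    """Convert a label like '5 stars' or '1 star' to the integer rating."""
    return int(label.split()[0])

# Predictions copied from the output above
predictions = [
    {'label': '5 stars', 'score': 0.772534966468811},
    {'label': '3 stars', 'score': 0.6660415530204773},
]

stars = [label_to_stars(p['label']) for p in predictions]
print(stars)
```

Note that `score` is the model's confidence in the predicted label, not a sentiment strength, so it should not be mixed into the numeric rating.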

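Check the sentiment for "charging" in the Tesla subreddit.

Select posts from the subreddit 'teslamotors' whose lemmas contain the token 'charge', keep only short posts (fewer than 400 characters of lemmas), and exclude questions (posts containing '?').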

In [10]:
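# show long texts without truncation
pd.set_option('display.max_colwidth', 3000)

# short posts from r/teslamotors that mention 'charge', excluding questions
senti_df = df[
    (df['lemmas'].str.len() < 400) &
    df['lemmas'].str.lower().str.contains('charge') &
    (~df['text'].str.contains('?', regex=False)) &  # literal '?', no regex escape needed
    (df['subreddit']=='teslamotors')][['text']].sample(20)
senti_df.reset_index(inplace=True)

senti_df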

Unnamed: 0,index,text
0,1723,"Tesla Should Inventivise Good Driving Engagement: Having logged many miles, both AP and non-AP, it would be nice to get rewards along with the penalties for driving well. In some fashion, grant supercharging credits when a driver completes a streak of not having to be reminded to touch the wheel and/or without the car beeping at you while in AP. Maybe even add a Zelda-style 3 hearts somewhere and you lose one every time you don't touch the wheel in time."
1,17133,"Car won't recognize charger - Model 3 issue: Having a lot of trouble with my Model 3 at the moment. It won't charge and the feedback I get on the touchscreen is something like this: **Charging equipment not recognized.** I have tried all the resets, but no results. Won't be able to drive far until I fix this issue. Would very much appreciate any sort of feedback or any suggestions so I can charge my car... *- and yes it is the car that is the problem! Not the charger. No red lights either.*"
2,4679,"Why isn't Tesla offering a Model X plaid+ with longer range: I have been searching and I can not find this discussed anywhere. Why isn't Tesla offering a Plaid+ version of the model X. I imagine it would be very expensive, but still I think a lot of people would still want the extra range, especially if it means not having to deal with kids while supercharging!!"
3,3419,"Tesla direct to customer is the future.: Can you imagine walking into a Chevy dealer, saying I'll pay $110k (estimated MSRP with upgrades) for your shitty C8 Z06 and them saying well.. sir we are going to charge you an additional $15k for your pre-order. This is the world we live in and that's one of the small reasons why tesla is thriving and their direct to customer is the future of car sales. My girlfriend is hell bent on a C8 Z06. For this price we should just get the new Model S!!"
4,4999,"Bad supercharger directions on the New Jersey Turnpike: Tesla supercharger directions on the NJTP need serious help. Tried to take me off the highway, outside the toll area, and to the delivery gate of two rest stops this afternoon when the superchargers were both in the rest stops themselves. Burnt up a few miles and a decent amount of tooth enamel in the process."
5,12455,"PSA: Double-check Scheduled Departure / Off-Peak Charging: Public service announcement that a recent software update (maybe 2021.4.12.6) reset my scheduled departure and off-peak charging hours. I noticed my car was immediately charging itself during the day when I plugged it in. If you take advantage of Time-of-Use rates, this can be a significant financial hit!"
6,22177,"TIL: If you click on the right scroll wheel and say “Open Butthole” it opens the charge port: I'm dead serious that this actually works. I just found out today that this opens the charge port. I will never say anything different to the car in order to open my charge port. I don't think I can even go to ""Charging"" and click ""Open Charge Port"". One must always now say ""Open Butthole"". Once again I'm dead serious... go out to your Tesla and give it a try."
7,393,"Quadlock case with new charging pad TM3/Y: Hello everyone, I'm waiting just to get some job stability to buy my first Tesla, either 3 or Y, haven't decided. I have a motorcycle and use a Quadlock to mount my phone onto my handlebar. It's a nifty system but the case has a small ridge on the back, which could be too thick for the new wireless pads and I was wondering if anyone has any experience with a Quadlock case on the new 3/Y wireless charging pads. TIA and happy new year."
8,18878,"Guidance on costs for Tesla Powerwall and installation: Good people of Reddit. I am house hunting for a modest 3 bedroom 2500 SqFt home in Scottsdale, AZ and would love to go fully green and plan to get powerwalls (and of course chargers for our two Tesla's) If anyone here has experience I would greatly appreciate your experiences and costing for both the equipment and install (as well as issues with supply)"
9,24662,"Suspend Charging Cables From Trolley Wires To Increase Accessibility: There are no dedicated EV parking spots that can be blocked. All you need to do to charge up is to be within 30 50 m of a charging tower or trolley line and you plug in where ever you can find an empty parking spot. There are at least 2 ways this can be done, first, simply by hanging all the charging cables from a single high mast, second by having a trolley line so the cables can be used at the other side of the parking lot."


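Add the predicted label and score for each post. The rows differ from the sample above because `sample` draws new random rows each time the cell is run.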

In [63]:
senti_df.join(pd.DataFrame(classifier(list(senti_df['text'].str.lower()))))

Unnamed: 0,index,text,label,score
0,13864,"Follow up to the prefabricated superchargers: So I posted about the prefabricated chargers the other day. This is a sweet new setup. Construction started less than 3 weeks ago, and the pedestals arrived 2 days ago. The site is now live. If Tesla can continue to build out the supercharger network at this rate, we will see some serious improvements. I think the biggest holdup is going to be permits.",4 stars,0.46
1,6813,"Tesla has a Supercharger problem: When I get bored I go charge at the supercharger. The one near me currently has 3 stalls out of order, and people try to constantly pull into them and don't know what to do when it doesn't work. I constantly have to get out and tell people which ones work. There has to be a better way. I'm thinking maybe somehow a warning could pop up if you pull in using weak Bluetooth signals.",2 stars,0.48
2,1097,Airbnb type charger sharing: View Poll,4 stars,0.33
3,12455,"PSA: Double-check Scheduled Departure / Off-Peak Charging: Public service announcement that a recent software update (maybe 2021.4.12.6) reset my scheduled departure and off-peak charging hours. I noticed my car was immediately charging itself during the day when I plugged it in. If you take advantage of Time-of-Use rates, this can be a significant financial hit!",1 star,0.25
4,4171,Driving to Taos: I'll be driving to Taos from the Bay Area this week and was wondering if I could park my model 3 at anyone's house about 30 minutes out to avoid having to drive it in the snow. I'll have a friend pick me up the moment I arrive and will not need to charge or make any contact... Unless a beer would be in order. Thanks!,5 stars,0.45
5,6106,"At home charger: Sorry if this is a question that has been asked but I'm really interested in getting a tesla, but I currently don't have the capability to have an at home charging station put in. Do you guys feel it's 100% necessary to have at home charging or is using charging stations on the road sufficient enough. Thanks in advance.",3 stars,0.43
6,4031,"Airbnb EV charger filter: For some reason Airbnb does not have EV charger filter in the search UI. However, this amenity exists and hosts can mark that they have an ev charging device. Here with one click you can see all listings with EV charger nearby _URL_",4 stars,0.32
7,6928,"Charging for free in Texas.: Charging free in Texas currently. If you can find a powered supercharger. You're doing well today. I charged Sunday started at one today and it lost power after I added 8kW. I found another and I am currently charging. I've not yet been billed for Sunday or earlier today. My current session is $0.00. Is Tesla doing a ""State of Emergency"" good will thing.",1 star,0.25
8,10352,Charging in garage or general: Prospective owner here with a question. My main fear of getting a Tesla or any EV for that matter is it's a question of when not if that my wife/kids gets in and tries to drive off without unplugging the charger. Please tell me there is some safety feature or interlock that prevents movement while connected to the charger.,3 stars,0.34
9,393,"Quadlock case with new charging pad TM3/Y: Hello everyone, I'm waiting just to get some job stability to buy my first Tesla, either 3 or Y, haven't decided. I have a motorcycle and use a Quadlock to mount my phone onto my handlebar. It's a nifty system but the case has a small ridge on the back, which could be too thick for the new wireless pads and I was wondering if anyone has any experience with a Quadlock case on the new 3/Y wireless charging pads. TIA and happy new year.",4 stars,0.44
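
Several scores in the table are low (0.25 to 0.48), which suggests that the model spreads its probability over several star ratings for these posts. One possible way to deal with this is to keep only predictions above a confidence threshold. The sketch below uses plain Python; the threshold of 0.4 and the function name `confident_only` are arbitrary assumptions for illustration, and the rows are abbreviated copies of the table above.

```python
def confident_only(rows, min_score=0.4):
    """Keep rows whose top-label score is at least min_score."""
    return [r for r in rows if r['score'] >= min_score]

# Abbreviated rows (index, label, score) copied from the table above
rows = [
    {'index': 13864, 'label': '4 stars', 'score': 0.46},
    {'index': 6813, 'label': '2 stars', 'score': 0.48},
    {'index': 1097, 'label': '4 stars', 'score': 0.33},
    {'index': 12455, 'label': '1 star', 'score': 0.25},
]

kept = confident_only(rows)
print([r['index'] for r in kept])
```

On a DataFrame with a `score` column, the same filter can be written with boolean indexing, for example `result[result['score'] >= 0.4]`, where `result` stands for the joined DataFrame.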


## Question Answering

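The model used here was fine-tuned on the Stanford Question Answering Dataset (SQuAD). According to its model card, the `distilbert-base-cased-distilled-squad` checkpoint was trained on SQuAD v1.1; the explorer linked below shows examples from SQuAD 2.0.  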

See
  * https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/European_Union_law.html
  * [Huggingface documentation for QA](https://huggingface.co/transformers/usage.html#extractive-question-answering)

In [11]:
from transformers import pipeline

qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

question = "What is extractive question answering?"
answer = qa_model(question=question, context=context)
print("Q:", question)
print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

question = "What is a good example of a question answering dataset?"
answer = qa_model(question=question, context=context)
print("Q:", question)
print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

Q: What is extractive question answering?
A: the task of extracting an answer from a text given a question (confidence: 0.62)

Q: What is a good example of a question answering dataset?
A: SQuAD dataset (confidence: 0.51)
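
"Extractive" means that the answer is always a literal span of the context. Besides `answer` and `score`, the result dictionary of the question-answering pipeline also contains the character offsets `start` and `end` of that span. The sketch below mimics this with plain string operations; the `result` dictionary is built by hand for illustration and is not actual model output.

```python
context = ("An example of a question answering dataset is the SQuAD dataset, "
           "which is entirely based on that task.")

answer_text = "SQuAD dataset"
start = context.find(answer_text)  # character offset where the span begins

# Hand-made dictionary with the same keys the pipeline uses for the span
result = {'answer': answer_text, 'start': start, 'end': start + len(answer_text)}

# Slicing the context with the offsets recovers the answer text
span = context[result['start']:result['end']]
print(span)
```

The offsets are useful, for instance, to highlight the answer inside the original text.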



Examples from [Game of Thrones Wiki](https://gameofthrones.fandom.com/wiki):

In [12]:
from transformers import pipeline

qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = """
Bran is the fourth child and second son of Lady Catelyn and Lord Ned
Stark. Ned is the head of House Stark, Lord Paramount of the North,
and Warden of the North to King Robert Baratheon. The North is one of
the constituent regions of the Seven Kingdoms and House Stark is one
of the Great Houses of the realm. House Stark rules the region from
their seat of Winterfell.

Winterfell is the capital of the Kingdom of the North and the seat and 
the ancestral home of the royal House Stark. It is a very large castle 
located at the center of the North, from where the head of House Stark 
rules over his or her people. """

question = "Who is Bran?"
answer = qa_model(question=question, context=context)
print("Q:", question)
print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

question = "What is Winterfell?"
answer = qa_model(question=question, context=context)
print("Q:", question)
print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

question = "Where is Winterfell located?"
answer = qa_model(question=question, context=context)
print("Q:", question)
print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

Q: Who is Bran?
A: the fourth child and second son of Lady Catelyn and Lord Ned
Stark (confidence: 0.66)

Q: What is Winterfell?
A: the capital of the Kingdom of the North (confidence: 0.53)

Q: Where is Winterfell located?
A: the center of the North (confidence: 0.37)
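
The ask, answer, and print steps repeat for every question, so they could be wrapped in a small helper. This is a sketch, assuming the pipeline is called with `question` and `context` keyword arguments as in the cells above; `ask` is a hypothetical helper, and `fake_qa` is a stand-in with a made-up fixed result so that the example runs without downloading a model.

```python
def ask(qa, question, context):
    """Run a QA callable and return a formatted question/answer string."""
    answer = qa(question=question, context=context)
    return f"Q: {question}\nA: {answer['answer']} (confidence: {answer['score']:.2f})"

def fake_qa(question, context):
    """Stand-in for the real pipeline; returns a fixed, made-up result."""
    return {'answer': 'the center of the North', 'score': 0.37}

text = ask(fake_qa, "Where is Winterfell located?", "...")
print(text)
```

With the real pipeline, the call would look like `print(ask(qa_model, "Who is Bran?", context))`.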

