# Sentiment Analysis with BERT<div class="tocSkip">
    
&copy; Jens Albrecht, 2021
    
This notebook can be freely copied and modified.  
Attribution, however, is highly appreciated.

<hr/>

See also: 

Albrecht, Ramachandran, Winkler: **Blueprints for Text Analytics in Python** (O'Reilly 2020)  
Chapter 11: [Performing Sentiment Analysis on Text Data](https://learning.oreilly.com/library/view/blueprints-for-text/9781492074076/ch11.html#ch-sentiment) + [Link to Github](https://github.com/blueprints-for-text-analytics-python/blueprints-text/blob/master/README.md)

## Setup<div class='tocSkip'/>

Set directory locations. If working on Google Colab, copy the files and install the required libraries.

In [18]:
import sys, os
ON_COLAB = 'google.colab' in sys.modules

if ON_COLAB:
    GIT_ROOT = 'https://github.com/jsalbr/tdwi-2021-text-mining/raw/master'
    os.system(f'wget {GIT_ROOT}/notebooks/setup.py')

%run -i setup.py

You are working on a local system.
Files will be searched relative to "..".


## Load Python Settings<div class="tocSkip"/>

Common imports and default formatting settings for Matplotlib, Pandas, etc.

In [19]:
%run "$BASE_DIR/notebooks/settings.py"

%reload_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'png'

# to print output of all statements and not just the last
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# otherwise text between $ signs will be interpreted as formula and printed in italic
pd.set_option('display.html.use_mathjax', False)
pd.options.plotting.backend = "matplotlib"

# path to import blueprints packages
sys.path.append('./packages')

## Sentiment Analysis Using Hugging Face Transformers

Links: 
  * [Transformers Library from Hugging Face](https://huggingface.co/transformers)
  * [Transformers Quick Tour](https://huggingface.co/transformers/quicktour.html)

### Load Data

In [4]:
df = pd.read_csv(f"{BASE_DIR}/data/reddit-autos-selfposts-prepared.csv", sep=";", decimal=".")

len(df)

24712

### Load a Model for Sentiment Analysis

For a list of models see [Hugging Face Model Hub](https://huggingface.co/models).

The model is downloaded on first use, which takes a moment.

It is then cached in `~/.cache/huggingface/transformers` (see [Hugging Face documentation](https://huggingface.co/docs/datasets/installation.html#caching-datasets-and-metrics)).

In [20]:
from transformers import pipeline

# classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")

In [21]:
classifier.model

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(105879, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
...

This model was fine-tuned on product reviews in six languages and predicts a rating from 1 to 5 stars.
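The pipeline returns the rating as a string label such as `'5 stars'` or `'1 star'` (both forms appear in the outputs of this notebook). For numeric analysis it can help to turn the label into an integer. A minimal sketch, assuming the label always starts with the digit followed by a space; the helper name `stars_to_int` is hypothetical:

```python
def stars_to_int(label: str) -> int:
    """Convert a label like '5 stars' or '1 star' into an integer rating."""
    # the rating is the first whitespace-separated token of the label
    return int(label.split()[0])


# predictions in the same format as returned by the sentiment pipeline
preds = [{'label': '5 stars', 'score': 0.77},
         {'label': '1 star', 'score': 0.25}]
ratings = [stars_to_int(p['label']) for p in preds]
print(ratings)  # [5, 1]
```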

In [23]:
sents = [
  'We are very happy to show you the 🤗 Transformers library.',
  'The weather today is not really what I expected.'
]

classifier(sents)

[{'label': '5 stars', 'score': 0.772534966468811},
 {'label': '3 stars', 'score': 0.6660415530204773}]
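The `score` is the probability of the predicted label. For a multi-class model like this one, the text-classification pipeline converts the model's raw outputs (logits) into probabilities with a softmax and reports the most probable label. The sketch below shows only that last step in plain Python; the label order and the five logits are assumptions made up for illustration, not actual model outputs:

```python
import math

# assumed label order of the 5-class model (index 0 = '1 star')
labels = ['1 star', '2 stars', '3 stars', '4 stars', '5 stars']
# hypothetical logits for one sentence, NOT real model output
logits = [-2.0, -1.0, 0.5, 1.5, 2.5]


def softmax(xs):
    # subtract the maximum for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
# index of the highest probability
best = max(range(len(probs)), key=probs.__getitem__)
result = {'label': labels[best], 'score': probs[best]}
print(result)
```

With five classes, a uniform distribution gives each label 0.2, so scores only slightly above 0.2 signal an uncertain prediction.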

Next, we predict sentiments for Reddit posts. First, an overview of the subreddits in the data:

In [35]:
df.subreddit.value_counts()

motorcycles      5654
AskMechanics     2713
teslamotors      2515
BMW              2303
Audi             2053
Honda            1957
Volkswagen       1657
Hyundai          1433
mercedes_benz    1145
Toyota           1145
Harley           1101
Volvo            1036
Name: subreddit, dtype: int64

In [40]:
# posts whose lemmas contain 'charge' (substring match)
df[df['lemmas'].str.lower().str.contains('charge')][['text']]

,text
75,"I have posted a similar question but just want to be clear about best charging practices.: I do not drive a lot. I have a Gen 3 Wall Connector at home for my Model 3 LR. I charged it a ""full"" 80% about 5 days ago and still have 201 miles of range on it. Should I leave it plugged in all the time even if I'm not driving that day or wait until it gets down to 20%? One Redditor advised it is good to wait and charge when it gets to 20% for data analysis purposes so my range predictions will be more accurate. For an idea of the amount of driving I do. We bought it on 12/19. The drive home after buying was 118 miles. The car currently has 334 miles on it, so I'm averaging about 108 miles a week."
115,"Tesla and quality control before shipping: Why there is no proper quality control in factory? Probably 90% of users in this sub complain about quality and issues they had to fix after delivery, every YouTuber making fun of Tesla because of their panel gaps and fit in general and it takes forever to fix this issues because of even worse customer support. I mean, how hard it is to hire extra people to check fit and finish inside factory before shipping? * It will save tons of money to Tesla compared to after delivery fixes * Less calls to customer support and they might improve support for people with more serious issues * More time for service center and their technicians to fix more serious issues and faster turnaround * Way less bad reviews because of stupid panel gaps. Reason for posting this is that Ford is pushing hard Mach-E to YouTubers for reviews and every Mach-E review I have seen, says that Mach-E is better because of fit and finish, there's literally no other benefit to get Mach-E but when people pay $50k+, this details matter. Even Model Y they provided to MKBHD, had freaking panel gaps. How dumb person in charge for delivering car to MKBHD was to ignore this kind of issues? Am I missing something or why Tesla prefers after delivery fixes over fixing this minor details in factory? Are they willing to spend more in after delivery fixes and risk negative reviews just to increase ""delivery numbers""? It's really weird move from company that doesn't spend any money in marketing and it sales literally depends on good reviews... And I haven't seen any Tesla reviews or other EV comparison to Tesla in which reviewer doesn't mention Teslas terrible support and fit and finish."
119,"Still getting charged for supercharging?: I took delivery of my 2021 LR Model 3 on the 30th. I had gotten a text saying I would receive the free supercharging for a year, but it looks like I'm still getting charged. Is there anyone else having the same issue?"
133,"N63 high miles, help me find a solution.: I have a high mileage f10 550i xdrive it's a 2011. The n63 as we all know performs best on paper. Now I've been looking into numerous things I can do to extend the life of my car without actually replacing it being that it's paid off. First solution would of course be doing the labor intensive valve stem seal job. Which I was quoted from a local indy mechanic $3000. Which is actually a bit more reasonable than getting it done at bmw which would probably charge me $4000-5000 more Next would be getting the crate motor from bmw costing 3500 which is (again) reasonable but then to add new turbos to that and other new parts (I'm spending a lot before labor) Next would be that I could just stick with what I have and essentially do a rebuild. (Again a lot of money) Now I know all of these are costly, but I'm pretty torn on what route I should take. I take care of the car it runs as well as it could for having tired turbos and bad valve stem seals. The car again has been paid off for quite some time too. Any n63 owners that can give me any advice on what they've done outside of the bmw ccp?"
136,Why is there a fee to stay on a supercharging spot: Like if you are eating at the local restorant you will have to rush to your car to move it before ot gets to expensive
...,...
24658,"Battery Usage Meter: I don't currently own a Tesla vehicle (yet), so maybe this already exists. Is there any kind of meter that keeps track of how much energy has flown through the battery pack? This would be a very useful gauge of the life expectancy of one of the most critical (and expensive) components. An odometer is nice for ICEs but for EVs, it would be nice to know how many watt hours my new or used battery pack has used. This will be a necessary (opinion) feature if they ever roll out battery to grid or home powering like the Ford Lightning. This will also be needed for already announced features of the CT like 240v power output as in charging your camper/tiny home as Elon already disclosed."
24660,"My 1998 Honda Valkyrie will not crank: I replaced the starter, which was bad, however, I thought my battery was weak so I used my charger on start mode (100amps). Which blew the main fuse and fusible link. I also replaced the starter switch. After replacing the blown components I installed a fully charged battery. When I press the starter button it dims the headlamp and will crank or make the clicking noises. The bike is up on a stand, kick stand is up, kill switch is in run position. Should I look into the starter or ignition switch's. I would like some suggestions on where to investigate next."
24662,"Suspend Charging Cables From Trolley Wires To Increase Accessibility: There are no dedicated EV parking spots that can be blocked. All you need to do to charge up is to be within 30 50 m of a charging tower or trolley line and you plug in where ever you can find an empty parking spot. There are at least 2 ways this can be done, first, simply by hanging all the charging cables from a single high mast, second by having a trolley line so the cables can be used at the other side of the parking lot."
24695,100% at a certain time: I'm doing a road trip on Saturday morning. I'd like the car to be at 100% before we leave but for as short a time as possible. In the past I'd plug the car in the night before see how long it said it would take to 100% from 80% and stop charging. I'd set an alarm at that time and start charging at the alarm. I also tried scheduled charging / departure but that charged to 100% right away and just preconditioned the car. What's the best way to charge to 100% but scheduled so it gets to 100% right before we leave?


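Select short posts (lemmas shorter than 400 characters) from the subreddit 'teslamotors' whose lemmas contain 'charge' (a substring match, so 'charger' and 'supercharger' count as well), exclude questions (texts containing '?'), and draw a random sample of 20 posts.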
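Because `str.contains('charge')` tests for a substring, lemmas such as `charger` or `supercharger` are selected too. If only the exact lemma `charge` should count, a regular expression with word boundaries is one option. A small sketch on made-up lemma strings (not taken from the data set):

```python
import re

# hypothetical lemmatized texts for illustration
lemmas = [
    'I charge my car at night',
    'the supercharger be busy',
    'new charger arrive today',
]

# substring test, like str.contains('charge')
substring_hits = [s for s in lemmas if 'charge' in s.lower()]

# whole-token test: \b marks a word boundary on both sides
token_pattern = re.compile(r'\bcharge\b')
token_hits = [s for s in lemmas if token_pattern.search(s.lower())]

print(len(substring_hits), len(token_hits))  # 3 1
```

In pandas, the same pattern can be passed as `df['lemmas'].str.contains(r'\bcharge\b', case=False)`.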

In [62]:
pd.set_option('display.max_colwidth', 3000)

senti_df = df[
    (df['lemmas'].str.len() < 400) &                          # short posts only
    df['lemmas'].str.lower().str.contains('charge') &          # substring match on lemmas
    (~df['text'].str.contains('?', regex=False)) &             # exclude questions
    (df['subreddit'] == 'teslamotors')][['text']].sample(20)  # random sample of 20
senti_df.reset_index(inplace=True)

senti_df

,index,text
0,13864,"Follow up to the prefabricated superchargers: So I posted about the prefabricated chargers the other day. This is a sweet new setup. Construction started less than 3 weeks ago, and the pedestals arrived 2 days ago. The site is now live. If Tesla can continue to build out the supercharger network at this rate, we will see some serious improvements. I think the biggest holdup is going to be permits."
1,6813,"Tesla has a Supercharger problem: When I get bored I go charge at the supercharger. The one near me currently has 3 stalls out of order, and people try to constantly pull into them and don't know what to do when it doesn't work. I constantly have to get out and tell people which ones work. There has to be a better way. I'm thinking maybe somehow a warning could pop up if you pull in using weak Bluetooth signals."
2,1097,Airbnb type charger sharing: View Poll
3,12455,"PSA: Double-check Scheduled Departure / Off-Peak Charging: Public service announcement that a recent software update (maybe 2021.4.12.6) reset my scheduled departure and off-peak charging hours. I noticed my car was immediately charging itself during the day when I plugged it in. If you take advantage of Time-of-Use rates, this can be a significant financial hit!"
4,4171,Driving to Taos: I'll be driving to Taos from the Bay Area this week and was wondering if I could park my model 3 at anyone's house about 30 minutes out to avoid having to drive it in the snow. I'll have a friend pick me up the moment I arrive and will not need to charge or make any contact... Unless a beer would be in order. Thanks!
5,6106,"At home charger: Sorry if this is a question that has been asked but I'm really interested in getting a tesla, but I currently don't have the capability to have an at home charging station put in. Do you guys feel it's 100% necessary to have at home charging or is using charging stations on the road sufficient enough. Thanks in advance."
6,4031,"Airbnb EV charger filter: For some reason Airbnb does not have EV charger filter in the search UI. However, this amenity exists and hosts can mark that they have an ev charging device. Here with one click you can see all listings with EV charger nearby _URL_"
7,6928,"Charging for free in Texas.: Charging free in Texas currently. If you can find a powered supercharger. You're doing well today. I charged Sunday started at one today and it lost power after I added 8kW. I found another and I am currently charging. I've not yet been billed for Sunday or earlier today. My current session is $0.00. Is Tesla doing a ""State of Emergency"" good will thing."
8,10352,Charging in garage or general: Prospective owner here with a question. My main fear of getting a Tesla or any EV for that matter is it's a question of when not if that my wife/kids gets in and tries to drive off without unplugging the charger. Please tell me there is some safety feature or interlock that prevents movement while connected to the charger.
9,393,"Quadlock case with new charging pad TM3/Y: Hello everyone, I'm waiting just to get some job stability to buy my first Tesla, either 3 or Y, haven't decided. I have a motorcycle and use a Quadlock to mount my phone onto my handlebar. It's a nifty system but the case has a small ridge on the back, which could be too thick for the new wireless pads and I was wondering if anyone has any experience with a Quadlock case on the new 3/Y wireless charging pads. TIA and happy new year."


Add the predicted label and score to each post:

In [63]:
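# predict on the lowercased texts
preds = classifier(list(senti_df['text'].str.lower()))
# the join aligns on the 0..n-1 index both frames share after reset_index
senti_df.join(pd.DataFrame(preds))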

,index,text,label,score
0,13864,"Follow up to the prefabricated superchargers: So I posted about the prefabricated chargers the other day. This is a sweet new setup. Construction started less than 3 weeks ago, and the pedestals arrived 2 days ago. The site is now live. If Tesla can continue to build out the supercharger network at this rate, we will see some serious improvements. I think the biggest holdup is going to be permits.",4 stars,0.46
1,6813,"Tesla has a Supercharger problem: When I get bored I go charge at the supercharger. The one near me currently has 3 stalls out of order, and people try to constantly pull into them and don't know what to do when it doesn't work. I constantly have to get out and tell people which ones work. There has to be a better way. I'm thinking maybe somehow a warning could pop up if you pull in using weak Bluetooth signals.",2 stars,0.48
2,1097,Airbnb type charger sharing: View Poll,4 stars,0.33
3,12455,"PSA: Double-check Scheduled Departure / Off-Peak Charging: Public service announcement that a recent software update (maybe 2021.4.12.6) reset my scheduled departure and off-peak charging hours. I noticed my car was immediately charging itself during the day when I plugged it in. If you take advantage of Time-of-Use rates, this can be a significant financial hit!",1 star,0.25
4,4171,Driving to Taos: I'll be driving to Taos from the Bay Area this week and was wondering if I could park my model 3 at anyone's house about 30 minutes out to avoid having to drive it in the snow. I'll have a friend pick me up the moment I arrive and will not need to charge or make any contact... Unless a beer would be in order. Thanks!,5 stars,0.45
5,6106,"At home charger: Sorry if this is a question that has been asked but I'm really interested in getting a tesla, but I currently don't have the capability to have an at home charging station put in. Do you guys feel it's 100% necessary to have at home charging or is using charging stations on the road sufficient enough. Thanks in advance.",3 stars,0.43
6,4031,"Airbnb EV charger filter: For some reason Airbnb does not have EV charger filter in the search UI. However, this amenity exists and hosts can mark that they have an ev charging device. Here with one click you can see all listings with EV charger nearby _URL_",4 stars,0.32
7,6928,"Charging for free in Texas.: Charging free in Texas currently. If you can find a powered supercharger. You're doing well today. I charged Sunday started at one today and it lost power after I added 8kW. I found another and I am currently charging. I've not yet been billed for Sunday or earlier today. My current session is $0.00. Is Tesla doing a ""State of Emergency"" good will thing.",1 star,0.25
8,10352,Charging in garage or general: Prospective owner here with a question. My main fear of getting a Tesla or any EV for that matter is it's a question of when not if that my wife/kids gets in and tries to drive off without unplugging the charger. Please tell me there is some safety feature or interlock that prevents movement while connected to the charger.,3 stars,0.34
9,393,"Quadlock case with new charging pad TM3/Y: Hello everyone, I'm waiting just to get some job stability to buy my first Tesla, either 3 or Y, haven't decided. I have a motorcycle and use a Quadlock to mount my phone onto my handlebar. It's a nifty system but the case has a small ridge on the back, which could be too thick for the new wireless pads and I was wondering if anyone has any experience with a Quadlock case on the new 3/Y wireless charging pads. TIA and happy new year.",4 stars,0.44
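For a quick summary, the star ratings can be coarsened into polarity classes. The mapping below (1 to 2 stars negative, 3 neutral, 4 to 5 positive) is an assumption for illustration and not part of the model; the helper name `polarity` is hypothetical. Applied to the ten labels in the table above:

```python
from collections import Counter

# labels copied from the prediction table above
labels = ['4 stars', '2 stars', '4 stars', '1 star', '5 stars',
          '3 stars', '4 stars', '1 star', '3 stars', '4 stars']


def polarity(label: str) -> str:
    """Map a star label to an assumed 3-class polarity."""
    stars = int(label.split()[0])
    if stars <= 2:
        return 'negative'
    if stars == 3:
        return 'neutral'
    return 'positive'


counts = Counter(polarity(label) for label in labels)
print(counts.most_common())  # [('positive', 5), ('negative', 3), ('neutral', 2)]
```

Keep the scores in mind when reading such counts: with five classes, 0.2 corresponds to a uniform guess, so predictions such as rows 3 and 7 (score 0.25) are highly uncertain.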


# Question Answering

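The model used below, `distilbert-base-cased-distilled-squad`, was fine-tuned on the Stanford Question Answering Dataset (SQuAD v1.1); SQuAD 2.0 additionally contains unanswerable questions.  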

See
  * https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/European_Union_law.html
  * [Huggingface documentation for QA](https://huggingface.co/transformers/usage.html#extractive-question-answering)

In [16]:
from transformers import pipeline

qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

questions = [
    "What is extractive question answering?",
    "What is a good example of a question answering dataset?",
]

# ask each question against the same context
for question in questions:
    answer = qa_model(question=question, context=context)
    print("Q:", question)
    print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

Q: What is extractive question answering?
A: the task of extracting an answer from a text given a question (confidence: 0.62)

Q: What is a good example of a question answering dataset?
A: SQuAD dataset (confidence: 0.51)
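Extractive QA means that the answer is always a span copied from the context. Besides `answer` and `score`, the pipeline's result dictionary also contains the character offsets `start` and `end` of that span. The sketch below reproduces this relationship with `str.find` on the context above instead of calling the model, so the offsets are computed here and are not taken from a model run:

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

answer_text = 'SQuAD dataset'  # answer returned in the output above
start = context.find(answer_text)  # first occurrence in the context
end = start + len(answer_text)

# an extractive answer is exactly the slice context[start:end]
print(start, end, context[start:end])
```

For an actual pipeline result `res`, `context[res['start']:res['end']]` should likewise equal `res['answer']`.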



Examples using text from the [Game of Thrones Wiki](https://gameofthrones.fandom.com/wiki):

In [17]:
from transformers import pipeline

qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = """
Bran is the fourth child and second son of Lady Catelyn and Lord Ned
Stark. Ned is the head of House Stark, Lord Paramount of the North,
and Warden of the North to King Robert Baratheon. The North is one of
the constituent regions of the Seven Kingdoms and House Stark is one
of the Great Houses of the realm. House Stark rules the region from
their seat of Winterfell.

Winterfell is the capital of the Kingdom of the North and the seat and 
the ancestral home of the royal House Stark. It is a very large castle 
located at the center of the North, from where the head of House Stark 
rules over his or her people. """

questions = [
    "Who is Bran?",
    "What is Winterfell?",
    "Where is Winterfell located?",
]

# ask each question against the same context
for question in questions:
    answer = qa_model(question=question, context=context)
    print("Q:", question)
    print("A:", answer['answer'], f"(confidence: {answer['score']:.2f})\n")

Q: Who is Bran?
A: the fourth child and second son of Lady Catelyn and Lord Ned
Stark (confidence: 0.66)

Q: What is Winterfell?
A: the capital of the Kingdom of the North (confidence: 0.53)

Q: Where is Winterfell located?
A: the center of the North (confidence: 0.37)

