# BERT Tutorial: Using Sentence-Transformer BERT

#### **<ins>Version:</ins>**
@March 2023 / Quek Jing Hao

#### **<ins>Objective:</ins>**

Learn to extract sentence vectors from BERTModel in the sentence-transformer package. Use different machine learning classifiers on the feature vectors.


#### **<ins>Introduction:</ins>**

As seen in the bertmodel.ipynb tutorial, we can access the sentence vectors through _last_hidden_state_ in BERTModel. However, because working with that model is extremely computationally intensive, we needed GPUs. The question now becomes: what if you do not have access to GPUs? Is there a more lightweight approach to the problem? There is. In this tutorial, we will discuss the usage of sentence-transformer and the BERT model within it. It is more agile and does not require the use of GPUs.

In this repository, we have seen three ways to work with BERT: using BERTModel in the transformer package, using the powerful BertForSequenceClassification, and lastly, using the sentence-transformer package. We will explore this last method in this notebook.

This notebook is self-contained: you do not need to download the dataset into the same directory as the notebook, because it is fetched in a later cell.

### Environment Configuration

First, we need to set up the environment in Google Colab by installing the libraries that are not available there by default.

In [1]:
%%capture
!pip install fastBPE sacremoses subword_nmt sentencepiece
!pip install transformers
!pip install -U sentence-transformers

In [2]:
# import modules and dependencies
import numpy as np
import pandas as pd
import re
import torch

from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.metrics import roc_auc_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix

from sentence_transformers import SentenceTransformer

pd.set_option('display.max_colwidth', 1000)

### Read Dataset

In this tutorial, we will use a sample of the IMDB movie review dataset. The original dataset consists of 50K movie reviews, each tagged with a positive or negative sentiment.

Learn more about the dataset here https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

The dataset will be downloaded from my sample datasets repository on GitHub.

In [3]:
!git clone https://github.com/QuekJingHao/imdb-sample-dataset.git

Cloning into 'imdb-sample-dataset'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 6 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), 6.39 MiB | 7.79 MiB/s, done.


In [50]:
df = pd.read_csv('/content/imdb-sample-dataset/imdb_sample.csv')
df.head(2)

Unnamed: 0.1,Unnamed: 0,review,sentiment
0,29430,"I love most movies and I'm a big fan of Sean Bean so I thought that I would at least LIKE this movie. Also, I'm Canadian and this is a mostly-Canadian movie so I was prepared to cut it some serious slack. Nothing could have prepared me for the garbage that is ""Airborne"". Steve Guttenberg as an action hero? Give me a break. The acting throughout the movie was so bad I am going to have trouble sleeping tonight. I now have only two wishes in my life.<br /><br />1. I hope that you never have to sit through this movie. 2. I wish I could get those 6 hours back. Oh wait, the movie's under 2 hours - it only seemed like 6 hours...<br /><br />Don't watch this. Seriously.",negative
1,27750,"A film that tends to get buried under prejudice and preconception - It's a remake! Doris Day is in it! She sings! - Hitchcock's second crack at 'The Man Who Knew Too Much' is his most under-rated film, and arguably a fully fledged masterpiece in its own right.<br /><br />This is, in more ways than one, Doris Day's film. Not only does she give the finest performance of her career, more than holding her own against James Stewart, but the whole film is subtly structured around her character rather than his. This is, after all, a film in which music is both motif and plot device. What better casting than the most popular singer of her generation? Consider: Day's Jo McKenna has given up her career on the stage in order to settle down with her husband and raise their son. This seems to be a mutual decision, and she doesn't appear to be unhappy. But look at the way Stewart teases her in the horse-drawn carriage over her concerns about Louis Bernard, implying that she is jealous that Berna...",positive


The next few steps will be similar to the bertmodel.ipynb tutorial. We will do some basic data cleaning and processing to make the dataframe easier to work with.

### Data Cleaning and Sampling

Text data is rarely clean. This is especially so for comments, tweets, and reviews. Unless the text you are dealing with is from an authoritative source, it typically contains lexical, grammatical, and spelling errors. So, depending on the use case, the user has to perform several data cleaning steps, for example removing special characters. As an example, we will define a text processing function to perform the following actions:

1. Remove special characters (here, the HTML line breaks `<br />`)
2. Lowercase all words in the reviews and remove all leading and trailing whitespace

In [51]:
def text_processing(text):
    
    remove_breaks = r"<br />"

    text_rtn = re.sub(remove_breaks, ' ', text)

    return text_rtn.lower().strip()
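
As a quick sanity check, the function can be tried on a short string before it is applied to the whole column. The sample text below is made up for illustration and is not taken from the dataset; the function is repeated so that the snippet runs on its own.

```python
import re

def text_processing(text):
    # Replace HTML line breaks with a space, then lowercase and trim both ends
    text_rtn = re.sub(r"<br />", ' ', text)
    return text_rtn.lower().strip()

# Hypothetical sample review, not taken from the dataset
sample = "  Great movie!<br /><br />Would watch AGAIN.  "
cleaned = text_processing(sample)
print(repr(cleaned))
```

Note that two consecutive `<br />` tags leave two spaces behind. If that matters for your use case, one option is to collapse runs of whitespace afterwards with `re.sub(r"\s+", " ", text)`.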

In [52]:
df['clean review'] = df['review'].apply(text_processing)
print(len(df))
df.head(2)

12500


Unnamed: 0.1,Unnamed: 0,review,sentiment,clean review
0,29430,"I love most movies and I'm a big fan of Sean Bean so I thought that I would at least LIKE this movie. Also, I'm Canadian and this is a mostly-Canadian movie so I was prepared to cut it some serious slack. Nothing could have prepared me for the garbage that is ""Airborne"". Steve Guttenberg as an action hero? Give me a break. The acting throughout the movie was so bad I am going to have trouble sleeping tonight. I now have only two wishes in my life.<br /><br />1. I hope that you never have to sit through this movie. 2. I wish I could get those 6 hours back. Oh wait, the movie's under 2 hours - it only seemed like 6 hours...<br /><br />Don't watch this. Seriously.",negative,"i love most movies and i'm a big fan of sean bean so i thought that i would at least like this movie. also, i'm canadian and this is a mostly-canadian movie so i was prepared to cut it some serious slack. nothing could have prepared me for the garbage that is ""airborne"". steve guttenberg as an action hero? give me a break. the acting throughout the movie was so bad i am going to have trouble sleeping tonight. i now have only two wishes in my life. 1. i hope that you never have to sit through this movie. 2. i wish i could get those 6 hours back. oh wait, the movie's under 2 hours - it only seemed like 6 hours... don't watch this. seriously."
1,27750,"A film that tends to get buried under prejudice and preconception - It's a remake! Doris Day is in it! She sings! - Hitchcock's second crack at 'The Man Who Knew Too Much' is his most under-rated film, and arguably a fully fledged masterpiece in its own right.<br /><br />This is, in more ways than one, Doris Day's film. Not only does she give the finest performance of her career, more than holding her own against James Stewart, but the whole film is subtly structured around her character rather than his. This is, after all, a film in which music is both motif and plot device. What better casting than the most popular singer of her generation? Consider: Day's Jo McKenna has given up her career on the stage in order to settle down with her husband and raise their son. This seems to be a mutual decision, and she doesn't appear to be unhappy. But look at the way Stewart teases her in the horse-drawn carriage over her concerns about Louis Bernard, implying that she is jealous that Berna...",positive,"a film that tends to get buried under prejudice and preconception - it's a remake! doris day is in it! she sings! - hitchcock's second crack at 'the man who knew too much' is his most under-rated film, and arguably a fully fledged masterpiece in its own right. this is, in more ways than one, doris day's film. not only does she give the finest performance of her career, more than holding her own against james stewart, but the whole film is subtly structured around her character rather than his. this is, after all, a film in which music is both motif and plot device. what better casting than the most popular singer of her generation? consider: day's jo mckenna has given up her career on the stage in order to settle down with her husband and raise their son. this seems to be a mutual decision, and she doesn't appear to be unhappy. 
but look at the way stewart teases her in the horse-drawn carriage over her concerns about louis bernard, implying that she is jealous that bernard wasn't ..."


Furthermore, we encode the sentiments from text to integers as follows:
- positive : 1
- negative : 0

To save computational time, we will only pick a sample of 1500 reviews out of the 12500 in the dataset.

In [55]:
df['sentiment'] = df['sentiment'].replace({'positive' : 1, 
                                           'negative' : 0})
df = df.sample(n = 1500, random_state = 54)
df

Unnamed: 0.1,Unnamed: 0,review,sentiment,clean review
9816,9410,"Siskel & Ebert were terrific on this show whether you agreed with them or not because of the genuine conflict their separate professional opinions generated. Roeper took this show down a notch or two because he wasn't really a film critic and because he substituted snide for opinionated. Now, when Ben Lyons comes on I feel like I'm watching ""Teen News"" -- you know, that kids' news show, hosted by kids for kids? Manckiewitz is not much better. It's obvious they've encountered only a steady diet of mainstream films their entire lives. The idea that these two rank amateurs have anything of interest or consequence to say about motion pictures is ludicrous. If they are reviewing a non-formula film, they are completely lost. Show them something original and intelligent -- they just find it ""confusing"". Wait -- I think I get it ... ABC is owned by Disney ... Disney makes movies for kids. While Siskel, Ebert, and Roper promoted independent films and were only hit-or-miss with the big budge...",0,"siskel & ebert were terrific on this show whether you agreed with them or not because of the genuine conflict their separate professional opinions generated. roeper took this show down a notch or two because he wasn't really a film critic and because he substituted snide for opinionated. now, when ben lyons comes on i feel like i'm watching ""teen news"" -- you know, that kids' news show, hosted by kids for kids? manckiewitz is not much better. it's obvious they've encountered only a steady diet of mainstream films their entire lives. the idea that these two rank amateurs have anything of interest or consequence to say about motion pictures is ludicrous. if they are reviewing a non-formula film, they are completely lost. show them something original and intelligent -- they just find it ""confusing"". wait -- i think i get it ... abc is owned by disney ... disney makes movies for kids. 
while siskel, ebert, and roper promoted independent films and were only hit-or-miss with the big budge..."
472,1299,"The first hour or so of the movie was mostly boring to say the least. However it improved afterwards as the Valentine Party commenced. Apart from the twist as to the identity of the killer in the very end, the hot bath murder scene was one of the few relatively memorable aspects of this movie. The scene at the garden with Kate was well shot and so was the very last scene (the 'twist'). In those scenes, there was some genuine suspense and thrills and the hot bath murder scene had a nasty (the way slashers should be) edge to it. The earlier murders are frustratingly devoid of gore.",1,"the first hour or so of the movie was mostly boring to say the least. however it improved afterwards as the valentine party commenced. apart from the twist as to the identity of the killer in the very end, the hot bath murder scene was one of the few relatively memorable aspects of this movie. the scene at the garden with kate was well shot and so was the very last scene (the 'twist'). in those scenes, there was some genuine suspense and thrills and the hot bath murder scene had a nasty (the way slashers should be) edge to it. the earlier murders are frustratingly devoid of gore."
277,14521,"Surprisingly effective British drama about two very different people who find common ground, and in particular the ""flowering"" of one of them. An embittered, ""Spike""-type youth (McAvoy) with Deuchennes MD is placed in a home for the disabled and quickly makes friends with a youth (Robertson) with cerebral palsy. Robertson has never known anything outside of the home, but McAvoy has and he is bound and determined to get back into the real world. Together, they manage to do just that in this funny and heartwarming and often heartbreaking tale of inner strength overcoming physical shortcomings. The two leads are terrific, especially Robertson, who must surely have spent some time studying the disabled to pull off this tricky role. He appears in almost every scene, and acts up an absolute storm. To anyone who doesn't know, they might think he really has CB. Highly recommended.",1,"surprisingly effective british drama about two very different people who find common ground, and in particular the ""flowering"" of one of them. an embittered, ""spike""-type youth (mcavoy) with deuchennes md is placed in a home for the disabled and quickly makes friends with a youth (robertson) with cerebral palsy. robertson has never known anything outside of the home, but mcavoy has and he is bound and determined to get back into the real world. together, they manage to do just that in this funny and heartwarming and often heartbreaking tale of inner strength overcoming physical shortcomings. the two leads are terrific, especially robertson, who must surely have spent some time studying the disabled to pull off this tricky role. he appears in almost every scene, and acts up an absolute storm. to anyone who doesn't know, they might think he really has cb. highly recommended."
2737,32980,"***SPOILERS*** ***SPOILERS*** THE CELL / (2000) **** (out of four)<br /><br />""Do you believe there is a part of yourself, deep inside in your mind, with things you don't want other people to see? During a session when I'm inside, I get to see those things.""<br /><br />--Catherine Deane<br /><br />And so do we. One of the most visually stimulating films of the year, ""The Cell"" is a love/hate movie-either you love it or you hate it. I can understand the reasons some people dislike this production. With a story that combines disturbing serial killers with mind-probing, ""The Cell"" is too much for some viewers; others will not understand the complex actions and emotions of the film. I think it's one of the year's most engrossing films.<br /><br />Making his feature film screenwriting debut, Mark Protosevich creates an imaginative world of rich, colorful images and provocative characters. The filmmakers take advantage of every shot. Protosevich conceived ideas for ""The Cell"" in 1993 whe...",1,"***spoilers*** ***spoilers*** the cell / (2000) **** (out of four) ""do you believe there is a part of yourself, deep inside in your mind, with things you don't want other people to see? during a session when i'm inside, i get to see those things."" --catherine deane and so do we. one of the most visually stimulating films of the year, ""the cell"" is a love/hate movie-either you love it or you hate it. i can understand the reasons some people dislike this production. with a story that combines disturbing serial killers with mind-probing, ""the cell"" is too much for some viewers; others will not understand the complex actions and emotions of the film. i think it's one of the year's most engrossing films. making his feature film screenwriting debut, mark protosevich creates an imaginative world of rich, colorful images and provocative characters. the filmmakers take advantage of every shot. 
protosevich conceived ideas for ""the cell"" in 1993 when he decided to combine two of his major..."
11224,38019,"i saw this movie at the toronto film festival with fairly solid expectations. the movie has a great cast and was closing at the festival so it must be good, right? how wrong i was. <br /><br />i knew we were in trouble when before the film the director was talking about how when he was directing an episode of wiseguy he met an unknown actor named kevin spacey (a director/writer of wiseguy making his feature debut = blah)... well the director/writer of Edison must have some incriminating pictures of kevin spacey killing a homeless man, because i cannot see how he (along with the other actors in the film) would ever agree to be in this disaster. <br /><br />this movie is absolutely appalling! it's a mixture of every cop hard boiled cliché ever. there is nothing new with Edison. the acting was bad and the direction was even worse. it looked like that aforementioned episode of wiseguy. this was the best casted direct to video movie i've ever seen. <br /><br />some examples of just bad ...",0,"i saw this movie at the toronto film festival with fairly solid expectations. the movie has a great cast and was closing at the festival so it must be good, right? how wrong i was. i knew we were in trouble when before the film the director was talking about how when he was directing an episode of wiseguy he met an unknown actor named kevin spacey (a director/writer of wiseguy making his feature debut = blah)... well the director/writer of edison must have some incriminating pictures of kevin spacey killing a homeless man, because i cannot see how he (along with the other actors in the film) would ever agree to be in this disaster. this movie is absolutely appalling! it's a mixture of every cop hard boiled cliché ever. there is nothing new with edison. the acting was bad and the direction was even worse. it looked like that aforementioned episode of wiseguy. this was the best casted direct to video movie i've ever seen. 
some examples of just bad silly moments in edison... mor..."
...,...,...,...,...
1292,28841,"One thing that astonished me about this film (and not in a good way) was that Nathan Stoltzfus, who seems to pride himself on being the major historian on the topic of the Rosenstrasse, was one of the historians working on this film, considering how much of the actual events were altered or disregarded. <br /><br />Another reviewer said that von Trotta said she never meant for Lena to bed Goebbels, but in that case, why did she give every impression that that was what had happened? Why not show other possible reasons for the mens' release, such as the disaster that was Stalingrad, or the Nazis' fear that the international press, based in Berlin, would find out about the protest.<br /><br />Also, why did the whole storyline play second fiddle to a weak family bonding storyline that has been done over and over again? Surely something as awesome as this could carry its own history! In places, it was as if the film had two story lines that really seemed to have little in common.<br /><...",0,"one thing that astonished me about this film (and not in a good way) was that nathan stoltzfus, who seems to pride himself on being the major historian on the topic of the rosenstrasse, was one of the historians working on this film, considering how much of the actual events were altered or disregarded. another reviewer said that von trotta said she never meant for lena to bed goebbels, but in that case, why did she give every impression that that was what had happened? why not show other possible reasons for the mens' release, such as the disaster that was stalingrad, or the nazis' fear that the international press, based in berlin, would find out about the protest. also, why did the whole storyline play second fiddle to a weak family bonding storyline that has been done over and over again? surely something as awesome as this could carry its own history! in places, it was as if the film had two story lines that really seemed to have little in common. 
overall, this film failed..."
8095,46905,"This is the most ludicrous and laughable thriller I've ever seen. Oh....where to start....<br /><br />Plot (what little there is): Clayton Beresford Jr. (Hayden Christensen), a young billionaire, with a bad heart is desperately in need of a transplant. Clay has been secretly engaged to his mother's PA, Samantha, played by Jessica Alba. On the night that these two secretly get married, it just so happens that a heart donor with the same rare blood type is found. Go and figure the odds of that one! Once on the operating table, Clay finds out the anesthesia isn't working, and he can feel everything and hear everything.<br /><br />Fortunately Clay seems to be able to filter out the pain of a razor sharp scalpel cutting open his chest by simply concentrating on his memories of Samantha, which we are told he's doing through an annoying voice-over which never seems to stop.<br /><br />If you didn't burst out in laughter yet, you will surely start to when you see the surgical scenes. <br /...",0,"this is the most ludicrous and laughable thriller i've ever seen. oh....where to start.... plot (what little there is): clayton beresford jr. (hayden christensen), a young billionaire, with a bad heart is desperately in need of a transplant. clay has been secretly engaged to his mother's pa, samantha, played by jessica alba. on the night that these two secretly get married, it just so happens that a heart donor with the same rare blood type is found. go and figure the odds of that one! once on the operating table, clay finds out the anesthesia isn't working, and he can feel everything and hear everything. fortunately clay seems to be able to filter out the pain of a razor sharp scalpel cutting open his chest by simply concentrating on his memories of samantha, which we are told he's doing through an annoying voice-over which never seems to stop. if you didn't burst out in laughter yet, you will surely start to when you see the surgical scenes. 
how could a young billionaire agr..."
7656,22251,"Audio:<br /><br />Seriously I've never seen a movie with worse audio. There are scenes where people are walking through the grass, and you can hardly hear them over their footsteps. They must be miking their feet. <br /><br />You know how in some movies they forget a line, so they have to dub it in on a shot of the back of someone's head. Here the editors were not that clever. There is actually a scene where Shannon Tweed's character says her line without moving her lips at all!<br /><br />I'm pretty sure for their background sound they played effects loops live while shooting, because in a lot of scenes the sound effects will either be different or be absent whenever the camera changes angles.<br /><br />I could write a lot more on how bad the audio is in this movie.<br /><br />Other Nuggets:<br /><br />In this movie they probably consider the opening credits to be special effects because they seemed so challenging to produce. The main title and the first few names in the opening ...",0,"audio: seriously i've never seen a movie with worse audio. there are scenes where people are walking through the grass, and you can hardly hear them over their footsteps. they must be miking their feet. you know how in some movies they forget a line, so they have to dub it in on a shot of the back of someone's head. here the editors were not that clever. there is actually a scene where shannon tweed's character says her line without moving her lips at all! i'm pretty sure for their background sound they played effects loops live while shooting, because in a lot of scenes the sound effects will either be different or be absent whenever the camera changes angles. i could write a lot more on how bad the audio is in this movie. other nuggets: in this movie they probably consider the opening credits to be special effects because they seemed so challenging to produce. 
the main title and the first few names in the opening credits are in white text over a white sky, and they wobble ..."
8878,42217,"Well, I guess I'm emotionally attached to this movie since it's the first one I went to see more than 10 times in the cinema ... helping me through my master's thesis, or rather keeping me from working on it!<br /><br />But on watching it again several years (and many many movies) later - what a well-crafted little gem this is! I've never seen Gwyneth Paltrow in a more convincing performance, and Jeremy Northam is the perfect Mr Knightley - where does one meet such a man??? <<<sigh>>> Sophie Thompson's turn as Ms Bates is virtuoso acting of the finest (oh, napkins, sorry!) and the rest of the cast is no disappointment either - Toni Colette brings a lot of Muriel to her Harriet, and Ewan McGregor is convincingly charming - and Alan Cumming and Juliet Stevenson are the perfect ""impossible"" couple!<br /><br />Of course the sets and costumes, and the beautiful soundtrack contribute a lot to the feelgood, almost Hobbiton-like atmosphere of the movie - although as far as cinematography a...",1,"well, i guess i'm emotionally attached to this movie since it's the first one i went to see more than 10 times in the cinema ... helping me through my master's thesis, or rather keeping me from working on it! but on watching it again several years (and many many movies) later - what a well-crafted little gem this is! i've never seen gwyneth paltrow in a more convincing performance, and jeremy northam is the perfect mr knightley - where does one meet such a man??? <<<sigh>>> sophie thompson's turn as ms bates is virtuoso acting of the finest (oh, napkins, sorry!) and the rest of the cast is no disappointment either - toni colette brings a lot of muriel to her harriet, and ewan mcgregor is convincingly charming - and alan cumming and juliet stevenson are the perfect ""impossible"" couple! 
of course the sets and costumes, and the beautiful soundtrack contribute a lot to the feelgood, almost hobbiton-like atmosphere of the movie - although as far as cinematography and art decoration go..."


Let's check how the positive and negative reviews are balanced in the sample:

In [56]:
df['sentiment'].value_counts()

0    768
1    732
Name: sentiment, dtype: int64
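
`value_counts()` returns raw counts. To see the proportions directly, pandas accepts `normalize=True`. Here is a small sketch that rebuilds a stand-in Series from the counts printed above (768 negative, 732 positive), since the real dataframe is not loaded in this snippet:

```python
import pandas as pd

# Stand-in for df['sentiment'], rebuilt from the counts printed above
sentiment = pd.Series([0] * 768 + [1] * 732, name='sentiment')

# normalize=True returns fractions of the total instead of counts
proportions = sentiment.value_counts(normalize=True)
print(proportions.round(3))
```

In the notebook itself, the equivalent call would be `df['sentiment'].value_counts(normalize=True)`. The two classes are close to balanced (roughly 51% negative and 49% positive).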

### Using BERTModel with Sentence-Transformer

Similarly, we need to load the pretrained BERTModel from the sentence-transformer package. 

You can refer to the following for a list of pretrained models that you can use! https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models/

#### Load BERTModel 

In [57]:
model_name = 'bert-base-uncased'
model = SentenceTransformer(model_name)

Some weights of the model checkpoint at /root/.cache/torch/sentence_transformers/bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


#### Accessing Sentence Embeddings 

Now let us take a look at an example using the model. The sentences are encoded by calling model.encode() - it is as easy as that!

In [58]:
sentences = ['This framework generates embeddings for each input sentence',
             'Sentence-tranformers as very easy to use!',
             'You do not need GPU to do this.']

embeddings = model.encode(sentences)

print(embeddings)

[[-0.12563434 -0.02353902  0.09721437 ... -0.18094231 -0.3673859
   0.27124205]
 [ 0.03541811 -0.17268927  0.09896865 ... -0.11875959 -0.33384532
   0.19085121]
 [ 0.5384428   0.22281365 -0.10593469 ... -0.22980885 -0.04316803
   0.02130728]]


The usage is nearly identical! We see that the encoder changes every input sentence into a vector of floats. What dimension does this vector have?

In [59]:
embeddings[0].shape

(768,)

This matches the hidden size (768) of the last hidden state of BERTModel in the transformer package! We see that the sentence "Sentence-tranformers as very easy to use!" is represented by the following sentence vector:

In [60]:
embeddings[1]

array([ 3.54181081e-02, -1.72689274e-01,  9.89686549e-02, -3.00843529e-02,
       -1.07976712e-01,  1.25834540e-01,  1.51973829e-01,  1.08027466e-01,
       -2.29049578e-01, -3.19495052e-01, -1.52425662e-01,  1.18023545e-01,
       -4.76746325e-04,  1.06776595e-01, -2.02314302e-01,  4.99243230e-01,
        1.10920548e-01,  1.00633435e-01, -1.25430971e-01, -7.70362616e-02,
        2.18366086e-01,  1.63321793e-01, -3.31447572e-01,  1.01383869e-02,
        5.39042592e-01, -1.66442767e-01, -7.42381662e-02,  2.02318639e-01,
       -3.14765722e-01, -4.26810145e-01,  2.15304554e-01,  5.30921221e-01,
       -3.39888394e-01,  2.02299774e-01, -3.13256472e-01, -1.56792238e-01,
        6.94970787e-02, -6.21613003e-02, -1.40152633e-01, -1.52482122e-01,
       -2.06275225e-01,  1.45107824e-02,  2.58521259e-01, -6.83913603e-02,
       -3.34264815e-01, -1.01247512e-01, -3.04556817e-01, -1.46350473e-01,
       -5.26111573e-02, -2.52130419e-01, -4.62106258e-01,  3.65948886e-01,
        1.67887837e-01, -
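
To connect these sentence vectors back to the token-level _last_hidden_state_: when given a plain checkpoint such as `bert-base-uncased`, sentence-transformers, as far as we are aware, builds each sentence vector by mean pooling the token vectors while ignoring padding. The sketch below illustrates that idea with NumPy on small made-up arrays (hidden size 4 instead of 768). `mean_pool` is a hypothetical helper written for this illustration, not a function of the library.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden) array of token vectors
    attention_mask:    (batch, seq_len) array, 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid division by zero
    return summed / counts

# Toy batch: 2 sentences, 3 token positions, hidden size 4 (made-up values)
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],    # last position of sentence 0 is padding
                 [1, 1, 1]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # one vector of size `hidden` per sentence
```

Each sentence ends up with a single vector whose length equals the hidden size, which is why `embeddings[0].shape` is `(768,)` for this model.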

We can send the 1500 reviews into the encoder and append another column to the sampled dataframe as follows (you may want to run this in Colab; run locally, the laptop sounds like it is turning into a jet engine):

In [63]:
%%time
df['sentence vectors'] = df['clean review'].apply(lambda x : model.encode(x))
df.head(3)

CPU times: user 16min 11s, sys: 1.11 s, total: 16min 12s
Wall time: 16min 23s
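
The row-wise `apply` above calls the encoder once per review. `model.encode` also accepts a list of sentences together with a `batch_size` argument and handles the batching internally. This may reduce the runtime, although we have not timed it here. In the minimal sketch below, `add_sentence_vectors` is a hypothetical helper, and `DummyEncoder` is a stand-in so the snippet runs without downloading a model.

```python
import numpy as np
import pandas as pd

def add_sentence_vectors(frame, encoder, text_col='clean review',
                         out_col='sentence vectors', batch_size=32):
    """Encode a whole text column in one call and store one vector per row."""
    texts = frame[text_col].tolist()
    vectors = encoder.encode(texts, batch_size=batch_size)
    result = frame.copy()
    result[out_col] = list(vectors)  # one 1-D array per row
    return result

class DummyEncoder:
    """Stand-in mimicking the encode(list, batch_size=...) call shape (assumption)."""
    def encode(self, sentences, batch_size=32):
        # Fake 3-dimensional "embeddings": the text length repeated three times
        return np.array([[len(s)] * 3 for s in sentences], dtype=np.float32)

toy = pd.DataFrame({'clean review': ['good film', 'bad']})
toy = add_sentence_vectors(toy, DummyEncoder())
print(toy['sentence vectors'].iloc[0])
```

In the notebook, the call would be `df = add_sentence_vectors(df, model)` with the `model` loaded earlier.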


Unnamed: 0.1,Unnamed: 0,review,sentiment,clean review,sentence vectors
9816,9410,"Siskel & Ebert were terrific on this show whether you agreed with them or not because of the genuine conflict their separate professional opinions generated. Roeper took this show down a notch or two because he wasn't really a film critic and because he substituted snide for opinionated. Now, when Ben Lyons comes on I feel like I'm watching ""Teen News"" -- you know, that kids' news show, hosted by kids for kids? Manckiewitz is not much better. It's obvious they've encountered only a steady diet of mainstream films their entire lives. The idea that these two rank amateurs have anything of interest or consequence to say about motion pictures is ludicrous. If they are reviewing a non-formula film, they are completely lost. Show them something original and intelligent -- they just find it ""confusing"". Wait -- I think I get it ... ABC is owned by Disney ... Disney makes movies for kids. While Siskel, Ebert, and Roper promoted independent films and were only hit-or-miss with the big budge...",0,"siskel & ebert were terrific on this show whether you agreed with them or not because of the genuine conflict their separate professional opinions generated. roeper took this show down a notch or two because he wasn't really a film critic and because he substituted snide for opinionated. now, when ben lyons comes on i feel like i'm watching ""teen news"" -- you know, that kids' news show, hosted by kids for kids? manckiewitz is not much better. it's obvious they've encountered only a steady diet of mainstream films their entire lives. the idea that these two rank amateurs have anything of interest or consequence to say about motion pictures is ludicrous. if they are reviewing a non-formula film, they are completely lost. show them something original and intelligent -- they just find it ""confusing"". wait -- i think i get it ... abc is owned by disney ... disney makes movies for kids. 
while siskel, ebert, and roper promoted independent films and were only hit-or-miss with the big budge...","[0.12747684, 0.12695722, 0.20017108, 0.023577392, 0.0652154, -0.19837473, -0.11650071, 0.63164943, 0.07877401, -0.049230173, 0.2843635, -0.2734874, -0.13287514, 0.25284484, -0.2145649, 0.5728164, 0.24100798, -0.16744763, -0.13560754, 0.28199866, 0.2079681, 0.037680138, -0.093591504, 0.54904306, 0.22079995, -0.039356362, 0.17241994, -0.11453365, -0.24599963, 0.014828393, 0.7331515, 0.062376387, -0.29918447, -0.24257766, -0.202079, -0.11689195, 0.14142585, -0.122059554, 0.19674325, 0.03895964, -0.5236706, -0.31803605, -0.19719711, -0.061993755, -0.3058092, -0.12730862, 0.31152314, 0.059215583, 0.108447306, 0.07053079, -0.20119861, 0.22628213, -0.13635594, -0.012922121, 0.33012116, 0.5186493, 0.0077957767, -0.41570973, -0.4527494, -0.1367272, 0.16974595, 0.020598441, 0.066257305, -0.5263788, 0.16700599, 0.32844812, -0.04115948, 0.35285956, -0.7037647, 0.05476243, -0.22518241, -0.34083632, -0.071753435, -0.21217409, 0.078654826, 0.082201615, -0.02869535, 0.18516096, 0.030868657, -0.051..."
472,1299,"The first hour or so of the movie was mostly boring to say the least. However it improved afterwards as the Valentine Party commenced. Apart from the twist as to the identity of the killer in the very end, the hot bath murder scene was one of the few relatively memorable aspects of this movie. The scene at the garden with Kate was well shot and so was the very last scene (the 'twist'). In those scenes, there was some genuine suspense and thrills and the hot bath murder scene had a nasty (the way slashers should be) edge to it. The earlier murders are frustratingly devoid of gore.",1,"the first hour or so of the movie was mostly boring to say the least. however it improved afterwards as the valentine party commenced. apart from the twist as to the identity of the killer in the very end, the hot bath murder scene was one of the few relatively memorable aspects of this movie. the scene at the garden with kate was well shot and so was the very last scene (the 'twist'). in those scenes, there was some genuine suspense and thrills and the hot bath murder scene had a nasty (the way slashers should be) edge to it. 
the earlier murders are frustratingly devoid of gore.","[-0.09784494, -0.30302358, 0.11875807, 0.06925495, 0.10718836, -0.03610865, 0.062659144, 0.39954075, 0.22998212, -0.023868876, 0.17068519, -0.33960342, 0.043066554, 0.38100985, 0.016289266, 0.5280523, 0.27111423, -0.18618503, -0.17261703, 0.18009746, 0.3080576, 0.1964427, -0.30292964, 0.78510934, 0.26104692, 0.11196763, -0.042200934, -0.060284276, -0.22782251, -0.0885319, 0.56153786, -0.2134101, -0.086530305, -0.18019898, 0.10806711, -0.28596017, -0.06142846, -0.20454577, -0.050474208, 0.07319284, -0.58109176, -0.10485417, 0.26437932, -0.21602862, -0.26373512, -0.24415524, 0.5656782, 0.1684852, -0.021731058, -0.13236454, -0.2815675, 0.32912165, 0.38839498, -0.06591986, 0.43080083, 0.43071842, -0.17977446, -0.19216876, -0.64284164, -0.12659828, 0.16899793, -0.13302006, 0.090384714, -0.52331996, 0.23202045, 0.12514323, 0.03294783, 0.20026104, -0.57913625, -0.070215374, -0.17325109, -0.31531537, -0.11171883, 0.00091125164, -0.21276006, -0.117994145, 0.23896173, 0.13856752, -0.18114327..."
277,14521,"Surprisingly effective British drama about two very different people who find common ground, and in particular the ""flowering"" of one of them. An embittered, ""Spike""-type youth (McAvoy) with Deuchennes MD is placed in a home for the disabled and quickly makes friends with a youth (Robertson) with cerebral palsy. Robertson has never known anything outside of the home, but McAvoy has and he is bound and determined to get back into the real world. Together, they manage to do just that in this funny and heartwarming and often heartbreaking tale of inner strength overcoming physical shortcomings. The two leads are terrific, especially Robertson, who must surely have spent some time studying the disabled to pull off this tricky role. He appears in almost every scene, and acts up an absolute storm. To anyone who doesn't know, they might think he really has CB. Highly recommended.",1,"surprisingly effective british drama about two very different people who find common ground, and in particular the ""flowering"" of one of them. an embittered, ""spike""-type youth (mcavoy) with deuchennes md is placed in a home for the disabled and quickly makes friends with a youth (robertson) with cerebral palsy. robertson has never known anything outside of the home, but mcavoy has and he is bound and determined to get back into the real world. together, they manage to do just that in this funny and heartwarming and often heartbreaking tale of inner strength overcoming physical shortcomings. the two leads are terrific, especially robertson, who must surely have spent some time studying the disabled to pull off this tricky role. he appears in almost every scene, and acts up an absolute storm. to anyone who doesn't know, they might think he really has cb. 
highly recommended.","[-0.15925562, 0.13414572, 0.36265647, -0.26789525, 0.5559929, 0.02857536, 0.1714067, 0.46726236, 0.15065736, -0.1053113, 0.043412596, -0.3199285, -0.0274843, 0.24645193, 0.13522843, 0.58032185, 0.4458654, -0.08349051, -0.20038874, 0.21194713, 0.28392604, -0.081998445, -0.28313357, 0.84011567, 0.4141208, 0.1384122, 0.032829594, -0.18712015, -0.18027131, -0.2037022, 0.7080963, -0.13176821, -0.18647285, -0.4170406, 0.03423316, 0.015914572, -0.10753901, -0.11941185, -0.0753528, 0.043494742, -0.4778224, -0.24201933, -0.13372673, -0.100009024, -0.37065658, -0.35725504, 0.35365713, 0.015366825, -0.03064545, 0.060123816, -0.25447962, 0.21013625, 0.26164618, -0.28860882, 0.27570927, 0.49690694, -0.12725112, -0.39253592, -0.3681676, 0.037289, 0.17304003, -0.073768176, 0.12211433, -0.47402275, 0.011443941, 0.32934484, -0.068938166, 0.28955305, -0.5687215, -0.0675519, -0.2113165, -0.38852337, -0.119403616, -0.20695768, -0.29770982, -0.1017956, -0.05929181, 0.24171054, -0.006772123, 0.043996647..."


Now we can expand the sentence vectors column into one column per dimension, forming the feature vectors that will be sent to the different classifiers.

In [64]:
features_vector = np.vstack(list(df['sentence vectors']))
df_fv = pd.DataFrame(features_vector, 
                     columns = [f'feature_{str(i + 1)}' for i in range(768)]) 

df_fv.head(3)

Unnamed: 0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,feature_10,...,feature_759,feature_760,feature_761,feature_762,feature_763,feature_764,feature_765,feature_766,feature_767,feature_768
0,0.127477,0.126957,0.200171,0.023577,0.065215,-0.198375,-0.116501,0.631649,0.078774,-0.04923,...,-0.381925,0.089635,0.026228,-0.186767,-0.2175,-0.013892,0.152597,-0.050641,0.497348,0.091247
1,-0.097845,-0.303024,0.118758,0.069255,0.107188,-0.036109,0.062659,0.399541,0.229982,-0.023869,...,-0.164412,-0.179362,0.100142,-0.220612,-0.305787,0.093024,0.268927,-0.05523,0.175695,-0.026837
2,-0.159256,0.134146,0.362656,-0.267895,0.555993,0.028575,0.171407,0.467262,0.150657,-0.105311,...,-0.082451,-0.248119,0.001601,-0.331217,-0.236157,0.039581,-0.087704,-0.06014,0.143393,0.02831
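
The cell above hard-codes 768 columns, which matches the embedding size used in this notebook. As a minimal sketch, assuming every row holds a vector of the same length, the column count can instead be read from the stacked array so that the code still works with a model of a different embedding size. The names `toy_df`, `toy_matrix` and `toy_fv` are hypothetical, and the 4-dimensional vectors are toy data, not real embeddings.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataframe: three 4-dimensional "sentence vectors".
# (The real vectors in this notebook have 768 dimensions.)
toy_df = pd.DataFrame({
    'sentence vectors': [np.array([0.1, 0.2, 0.3, 0.4]),
                         np.array([0.5, 0.6, 0.7, 0.8]),
                         np.array([0.9, 1.0, 1.1, 1.2])]
})

# Stack the per-row vectors into a 2-D array of shape (n_rows, n_dims)
toy_matrix = np.vstack(toy_df['sentence vectors'].to_list())

# Read the number of columns from the array instead of hard-coding it
n_dims = toy_matrix.shape[1]
toy_fv = pd.DataFrame(toy_matrix,
                      columns=[f'feature_{i + 1}' for i in range(n_dims)],
                      index=toy_df.index)

print(toy_fv.shape)
```

Passing `index=toy_df.index` is an optional design choice: it keeps each feature row labelled the same way as the row it came from, which can help when the original dataframe has a shuffled index.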


### Classification using Various Machine Learning Models

As a rule of thumb, you should try several different kinds of machine learning classifiers and see which one performs best. This helps ensure that the model you settle on is robust enough.

As an example, we will use three different classifiers: K-Nearest Neighbour, Logistic Regression and Random Forest. We will perform the classic train-test split here, and define a function that returns the evaluation metrics in a dataframe.

In [65]:
X_train, X_test, y_train, y_test = train_test_split(df_fv, 
                                                    df['sentiment'], 
                                                    random_state = 14)

def eval_metric_df(clf, X_test, y_test, clf_name):
    
    y_predict = clf.predict(X_test)

    # calculate the evaluation metrics of the classifier
    auc_score    = roc_auc_score(y_test, y_predict)
    recall       = recall_score(y_test, y_predict)
    precision    = precision_score(y_test, y_predict)
    f1           = f1_score(y_test, y_predict)
    classifier_score = clf.score(X_test, y_test)
    confusion    = confusion_matrix(y_test, y_predict)

    print('Confusion matrix:\n', confusion, '\n')

    performance_dict = {clf_name : [auc_score, recall, precision, f1, classifier_score]}
    performance_df_clf = pd.DataFrame(data  = performance_dict,
                                      index = ['AUC', 'Recall', 'Precision', 'F1', 'Score'])
    
    return performance_df_clf
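
One detail of `eval_metric_df` is worth noting: it passes the hard 0/1 predictions to `roc_auc_score`, so for a binary problem the reported AUC equals the mean of recall and specificity at a single threshold. A common alternative is to pass the positive-class probability from `predict_proba`, which ranks every test sample. The sketch below compares the two on synthetic data from `make_classification`; the data and the names `demo_clf`, `auc_from_labels` and `auc_from_scores` are assumptions for illustration, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the sentence-vector features
X, y = make_classification(n_samples=400, n_features=20, random_state=14)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=14)

demo_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# AUC from hard labels: only one decision threshold is evaluated
auc_from_labels = roc_auc_score(y_te, demo_clf.predict(X_te))

# AUC from scores: uses the ranking induced by P(class = 1)
auc_from_scores = roc_auc_score(y_te, demo_clf.predict_proba(X_te)[:, 1])

print(f'AUC from labels: {auc_from_labels:.3f}')
print(f'AUC from scores: {auc_from_scores:.3f}')
```

The metrics reported in this tutorial use the label-based version, so they should be compared with each other rather than with probability-based AUC values reported elsewhere.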

#### K-Nearest Neighbour classifier

In [66]:
knn_clf = KNeighborsClassifier(n_neighbors = 10)
knn_clf.fit(X_train, y_train)

knn_clf_eval_metrics = eval_metric_df(knn_clf, X_test, y_test, 'K-Nearest Neighbour')
knn_clf_eval_metrics

Confusion matrix:
 [[191  11]
 [ 69 104]] 



Unnamed: 0,K-Nearest Neighbour
AUC,0.77335
Recall,0.601156
Precision,0.904348
F1,0.722222
Score,0.786667
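
The value `n_neighbors = 10` above is a fixed choice. As a hedged sketch, `GridSearchCV` (imported at the top of this notebook) could be used to pick it by cross-validation on the training split. The example runs on synthetic data, and the candidate values, the 5-fold setting and the F1 scoring are illustrative assumptions rather than settings tuned for the movie reviews.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the sentence-vector features
X, y = make_classification(n_samples=300, n_features=20, random_state=14)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=14)

# Candidate neighbourhood sizes (illustrative values only)
param_grid = {'n_neighbors': [3, 5, 10, 15]}

# 5-fold cross-validation on the training split, scored by F1
knn_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='f1')
knn_search.fit(X_tr, y_tr)

best_k = knn_search.best_params_['n_neighbors']
# With scoring='f1', .score() returns the F1 of the refitted best model
held_out_f1 = knn_search.score(X_te, y_te)

print('Best n_neighbors:', best_k)
print('Held-out F1:', round(held_out_f1, 3))
```

In the notebook itself, the same search could be fitted on `X_train` and `y_train`, and the refitted `knn_search` object passed to `eval_metric_df` in place of `knn_clf`.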


#### Logistic Regression

In [67]:
lr_clf = LogisticRegression(max_iter = 10000)
lr_clf.fit(X_train, y_train)

lr_clf_eval_metrics = eval_metric_df(lr_clf, X_test, y_test, 'Logistic Regression')
lr_clf_eval_metrics

Confusion matrix:
 [[180  22]
 [ 24 149]] 



Unnamed: 0,Logistic Regression
AUC,0.87618
Recall,0.861272
Precision,0.871345
F1,0.866279
Score,0.877333


#### Random Forest

In [68]:
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)

rf_clf_eval_metrics = eval_metric_df(rf_clf, X_test, y_test, 'Random Forest')
rf_clf_eval_metrics

Confusion matrix:
 [[165  37]
 [ 28 145]] 



Unnamed: 0,Random Forest
AUC,0.827491
Recall,0.83815
Precision,0.796703
F1,0.816901
Score,0.826667
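
Because `eval_metric_df` returns one single-column dataframe per classifier, the three results can be placed side by side to see which model performs best on each metric. The sketch below rebuilds the three dataframes from the values printed above so that it runs on its own; the names `knn_metrics`, `lr_metrics`, `rf_metrics`, `comparison` and `best_per_metric` are hypothetical.

```python
import pandas as pd

metric_names = ['AUC', 'Recall', 'Precision', 'F1', 'Score']

# Values copied from the three outputs above
knn_metrics = pd.DataFrame(
    {'K-Nearest Neighbour': [0.77335, 0.601156, 0.904348, 0.722222, 0.786667]},
    index=metric_names)
lr_metrics = pd.DataFrame(
    {'Logistic Regression': [0.87618, 0.861272, 0.871345, 0.866279, 0.877333]},
    index=metric_names)
rf_metrics = pd.DataFrame(
    {'Random Forest': [0.827491, 0.83815, 0.796703, 0.816901, 0.826667]},
    index=metric_names)

# One column per classifier, one row per metric
comparison = pd.concat([knn_metrics, lr_metrics, rf_metrics], axis=1)

# Name of the best-scoring classifier for each metric
best_per_metric = comparison.idxmax(axis=1)

print(comparison.round(3))
print(best_per_metric)
```

In the notebook, the same table can be produced directly with `pd.concat([knn_clf_eval_metrics, lr_clf_eval_metrics, rf_clf_eval_metrics], axis=1)`. On this split, Logistic Regression leads on every metric except precision, where K-Nearest Neighbour is higher at the cost of much lower recall.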


### Concluding Remarks

And there you have it! In this tutorial, you learned how to use BERT within the sentence-transformer package. One significant drawback, however, is the computation time. Because we are not using a GPU, we rely entirely on the CPU to compute the sentence vectors. In our case, for 3000 lines of text, it takes about 16 minutes to return the sentence embeddings.


For small datasets, however, it is reasonable to use sentence-transformer as a first cut. For larger datasets, we should still use BERTModel within the transformers package.