# Using BERT (TensorFlow version) for sentiment classification
Derived from: [A visual guide to using BERT for the first time.](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/)

Uses the [Hugging Face](https://huggingface.co/) [transformers library](https://github.com/huggingface/transformers).

## Load and inspect the data

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import time

from transformers import BertTokenizer, TFBertModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

# The data consists of text, label pairs where the label is 1 (positive sentiment) or 0 (negative sentiment)
df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)
df.head()

| | 0 | 1 |
|---|---|---|
| 0 | a stirring , funny and finally transporting re... | 1 |
| 1 | apparently reassembled from the cutting room f... | 0 |
| 2 | they presume their audience wo n't sit still f... | 0 |
| 3 | this is a visually stunning rumination on love... | 1 |
| 4 | jonathan parker 's bartleby should have been t... | 1 |


## Get hold of the BERT tokenizer and model

In [2]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


## Tokenize the data and then obtain BERT embeddings for each sentence

We run the tokenized sentences through the model in batches, because running the model on all the sentences at once
ran out of memory on my 8GB iMac.

In [None]:
tokenized = tokenizer(df[0].tolist(), padding=True, truncation=True, return_tensors='tf')

def get_batches(tokenized, batch_size=1000):
    # We expect tokenized to be a dictionary-like object whose entries
    # (for BertTokenizer, typically input_ids, token_type_ids and attention_mask)
    # are all arrays of the same length.
    length = len(tokenized['input_ids'])
    for b in range(0, length, batch_size):
        batch = {}
        for k, v in tokenized.items():
            batch[k] = v[b:b + batch_size]
        yield batch

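def get_features_for_batch(batch):
    outputs = model(batch)
    # outputs[0] is the last hidden state, with shape (batch size, sequence length, hidden size).
    # Keep only position 0, the [CLS] token, as the embedding for each sentence.
    return outputs[0][:,0,:].numpy()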

batch_features_list = []

start = time.time()
for num, batch in enumerate(get_batches(tokenized)):
    batch_features_list.append(get_features_for_batch(batch))
    print(f'Processed batch: {num}')

# Concatenate once at the end rather than copying the array on every iteration
features = np.concatenate(batch_features_list, axis=0)

elapsed = time.time() - start

print(f'Generating {len(df)} embeddings took: {elapsed:.4f} seconds. This is {elapsed / len(df):.4f} seconds per embedding.')

print(features.shape)
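
To see what the batching generator yields, here is a minimal, self-contained sketch that applies the same slicing logic to plain Python lists instead of TensorFlow tensors. The name `iter_batches` and the toy values are illustrative assumptions, not part of the notebook.

```python
def iter_batches(columns, batch_size):
    """Yield dicts holding successive slices of every column.

    `columns` is assumed to map names to equal-length sequences,
    mirroring the structure the tokenizer returns.
    """
    length = len(next(iter(columns.values())))
    for start in range(0, length, batch_size):
        yield {name: values[start:start + batch_size]
               for name, values in columns.items()}


# Toy stand-in for tokenizer output: five "sentences" of two tokens each.
toy = {
    'input_ids': [[101, 102]] * 5,
    'attention_mask': [[1, 1]] * 5,
}

batches = list(iter_batches(toy, batch_size=2))
batch_sizes = [len(b['input_ids']) for b in batches]
print(batch_sizes)
```

The last batch is simply shorter when the number of rows is not a multiple of the batch size.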

## Add the labels back in and create the train / test split 

In [4]:
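labels = df[1]
# By default train_test_split shuffles the rows and holds out 25% of them for testing.
# No random_state is set, so the split (and the accuracy below) varies from run to run.
train_features, test_features, train_labels, test_labels = train_test_split(features, labels)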
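
As a rough illustration of what a shuffled 75/25 split does, the sketch below implements one with the standard library only. It is not scikit-learn's implementation: `simple_split` and its `seed` parameter are hypothetical names, and the rounding rule (test size rounded up) is an assumption chosen for the example.

```python
import math
import random


def simple_split(items, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * len(items))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(items, train_idx), pick(items, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: eight "sentences" with alternating labels.
items = [f'sentence {i}' for i in range(8)]
labels = [i % 2 for i in range(8)]
train_x, test_x, train_y, test_y = simple_split(items, labels)
print(len(train_x), len(test_x))
```

Fixing the seed makes the split repeatable, which is useful when comparing accuracy figures across runs.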


## Train a logistic regression classifier and report the accuracy on the test data

In [5]:
lr_clf = LogisticRegression(max_iter=1000)
lr_clf.fit(train_features, train_labels)


LogisticRegression(max_iter=1000)
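
For intuition about what the fitted classifier computes at prediction time, binary logistic regression passes a weighted sum of the features through a sigmoid and thresholds the result at 0.5. The sketch below uses made-up weights and a three-dimensional feature vector purely for illustration; the real model learns one weight for each dimension of the BERT embedding.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


# Hypothetical weights and bias, not values learned from the SST2 data.
weights = [0.8, -1.2, 0.5]
bias = 0.1

probability = predict_proba([1.0, 0.5, 2.0], weights, bias)
predicted_label = int(probability >= 0.5)
print(predicted_label)
```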

In [6]:
start = time.time()
print(f'Accuracy: {100 * lr_clf.score(test_features, test_labels):.2f}%, computed over {len(test_features)} test items.')
elapsed = time.time() - start
print(f'Evaluation took {elapsed:.4f} seconds, or {elapsed/len(test_features):.6f} s per test item.')

Accuracy: 84.57%, computed over 1730 test items.
Evaluation took 0.0085 seconds, or 0.000005 s per test item.
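
The accuracy reported by `score` is the fraction of test items whose predicted label matches the true label. The sketch below spells that out in plain Python with toy labels (the `accuracy` helper is illustrative), then checks the arithmetic behind the figure above: 84.57% of 1730 items corresponds to 1463 correct predictions.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Toy labels for illustration only.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(toy_accuracy)

# 1463 correct out of 1730 rounds to the reported percentage.
reported = round(100 * 1463 / 1730, 2)
print(reported)
```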
