# Sentiment Classification with BERT

We want to predict sentiment on the IMDB movie review dataset. For every review we keep only the first 100 words, because we want to check whether most of the sentiment is expressed at the beginning of a review rather than at the end.

In this dataset, sentiment is binary: each review is labeled either positive or negative, encoded as 0 or 1. (A version with ratings on a 1 to 10 scale also exists.)
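
The truncation idea can be pictured with a small sketch. `first_n_words` is a hypothetical helper written only for illustration, and splitting on whitespace is an assumption; the notebook itself delegates truncation to the `trunc` argument of `get_imdb_reviews_dataset`, whose internals are not shown here.

```python
def first_n_words(review, n=100):
    """Return the first n whitespace-separated words of a review (illustrative helper)."""
    words = review.split()
    return " ".join(words[:n])


review = "This movie was surprisingly good and I would watch it again"
print(first_n_words(review, n=5))  # prints: This movie was surprisingly good
```

Reviews shorter than `n` words are returned unchanged, so no padding is needed at this stage.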

- [BERT code](https://github.com/google-research/bert)


# In this notebook, you will complete the following TODOs

- Try to understand how a large model such as BERT is handled in TensorFlow without relying on TensorFlow Hub.

In [None]:

import os, sys

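def add_path(path):
    """Append `path` and every directory below it to sys.path."""
    # os.walk already descends into subdirectories, so no recursion is needed.
    directories = [path] + [root for root, _, _ in os.walk(path) if root != path]
    for directory in directories:
        module_path = os.path.abspath(directory)
        if module_path not in sys.path:
            sys.path.append(module_path)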

add_path('../../pythonlibs/embeddings')
add_path('../../')

In [None]:
import numpy as np
import datetime
import zipfile
from bert.helper import get_tpu_estimator, imdb_to_bert_features
import bert.run_classifier as run_classifier
from imdb.helper import get_imdb_reviews_dataset
import tensorflow as tf

# 1. Download the BERT model and save it to the specified **bert_repo_path**.

These files contain the architecture and the weights of BERT.

In [None]:
bert_repo_path = '../../pythonlibs/embeddings/bert/model_repo'

In [None]:

!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip

with zipfile.ZipFile("uncased_L-12_H-768_A-12.zip", "r") as zip_ref:
    zip_ref.extractall(bert_repo_path)

# List the extracted files; {bert_repo_path} is expanded by IPython.
!ls {bert_repo_path}/uncased_L-12_H-768_A-12

!wget https://raw.githubusercontent.com/google-research/bert/master/modeling.py
!wget https://raw.githubusercontent.com/google-research/bert/master/optimization.py
!wget https://raw.githubusercontent.com/google-research/bert/master/run_classifier.py
!wget https://raw.githubusercontent.com/google-research/bert/master/tokenization.py



# 2. Load the IMDB dataset from the specified path

#### Additionally, the dataset must be converted into BERT input features before BERT can process it. BERT builds its word, position, and segment embeddings from these features.

In [None]:
dataset_folder = '../../../data/aclImdb'

(train_x, train_y), (test_x, test_y) = get_imdb_reviews_dataset(path=dataset_folder, max_dataset_size=1000, trunc=100)


max_seq_length = 150

train_features = imdb_to_bert_features(np.hstack([train_x, train_y]), max_seq_length, repo_path=bert_repo_path, train=True)
test_features  = imdb_to_bert_features(np.hstack([test_x, test_y]), max_seq_length, repo_path=bert_repo_path, train=False)
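
To make the conversion step more concrete, here is a minimal sketch of how a single tokenized review could be turned into fixed-length inputs. It is not the implementation of `imdb_to_bert_features`: the helper `to_bert_features` and the toy vocabulary are hypothetical, and the real pipeline uses BERT's WordPiece tokenizer and vocabulary file. The sketch only follows the usual convention for a single-sentence input: wrap the tokens in `[CLS]` and `[SEP]`, map them to ids, and zero-pad to `max_seq_length`.

```python
def to_bert_features(tokens, vocab, max_seq_length):
    """Build (input_ids, input_mask, segment_ids) for a single-sentence input.

    Illustrative only: `vocab` is a toy dictionary, not BERT's WordPiece vocabulary.
    """
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = tokens[: max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token
    segment_ids = [0] * len(input_ids)  # one sentence, so every token is in segment 0

    # Zero-pad all three lists up to the fixed sequence length.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding


toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "great": 4, "movie": 5}
ids, mask, segments = to_bert_features(["great", "movie", "indeed"], toy_vocab, max_seq_length=8)
print(ids)   # [2, 4, 5, 1, 3, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The mask lets the model ignore padded positions, which is why `max_seq_length` can safely be larger than the truncated review.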


# 3. Specify a classifier to be stacked on top of BERT
We will take a pre-trained BERT model and put a classifier on top of it. The classifier therefore uses BERT's output features to make its decisions.

In [None]:
def classifier_model_fn(bert_output, labels, num_labels, is_training=True):
    '''
    bert_output: BERT's output tensor
    labels: labels tensor
    num_labels: number of classes (sets the size of the output layer)
    is_training: True if the model is created for training.

    Returns the per-example loss, the logits, and the class probabilities.
    '''
    
    output_layer = bert_output
    hidden_size = output_layer.shape[-1].value

    output_weights = tf.get_variable(
        name="output_weights",
        shape=[num_labels, hidden_size])

    output_bias = tf.get_variable(
        name="output_bias",
        shape=[num_labels])

    if is_training:
        # Apply dropout with rate 0.1 (keep 90% of the activations), during training only.
        output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias, name='output_logits')
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)

    return per_example_loss, logits, probabilities
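
The arithmetic inside `classifier_model_fn` can be checked by hand on a single example. The following is a plain-Python sketch of the same three steps (linear layer, softmax, cross-entropy against a one-hot label); `classifier_head` and the toy numbers are hypothetical, and dropout is left out for simplicity.

```python
import math


def classifier_head(features, weights, bias, label):
    """Logits, softmax probabilities, and cross-entropy loss for one example.

    features: list of hidden_size floats
    weights:  num_labels rows of hidden_size floats
    bias:     list of num_labels floats
    label:    integer class index
    """
    # logits[k] = dot(features, weights[k]) + bias[k]
    logits = [sum(f * w for f, w in zip(features, row)) + b
              for row, b in zip(weights, bias)]

    # Numerically stable softmax: subtract the max before exponentiating.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    probabilities = [e / total for e in exps]

    # Cross-entropy against a one-hot label reduces to -log p[label].
    loss = -math.log(probabilities[label])
    return loss, logits, probabilities


loss, logits, probs = classifier_head(
    features=[1.0, 2.0],
    weights=[[0.5, 0.0], [0.0, 0.5]],
    bias=[0.0, 0.0],
    label=1,
)
print(logits)  # [0.5, 1.0]
```

Multiplying the one-hot label by the log-probabilities and summing, as the TensorFlow code does, selects exactly the `-log p[label]` term computed here.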

# 4. Train the classifier and BERT


A [TensorFlow estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)
bundles the procedures needed to train, evaluate, and run predictions with a model. It is a convenient way to package a model.

In [None]:
batch_size = 1
epochs = 10

print('***** Started training at {} *****'.format(datetime.datetime.now()))
print('  Num examples = {}'.format(len(train_features)))
print('  Batch size = {}'.format(batch_size))

# After instantiating BERT from the repository at bert_repo_path, get_tpu_estimator
# creates the classifier and stacks it on top of BERT.
# Finally, everything is packed into an estimator and returned.
estimator = get_tpu_estimator(bert_repo_path, epochs, len(train_features), batch_size, 2, classifier_model_fn)

# An estimator must be fed the dataset one batch at a time.
# This creates the function that feeds the model.

train_input_fn = run_classifier.input_fn_builder(
                        features=train_features,
                        seq_length=max_seq_length,
                        is_training=True,
                        drop_remainder=True)


# Train the estimator with the feeding function.
estimator.train(input_fn=train_input_fn)

print('***** Finished training at {} *****'.format(datetime.datetime.now()))
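
The number of examples, the batch size, and the number of epochs together determine how long training runs. The sketch below mirrors the way the reference `run_classifier.py` script derives its step counts, with a warm-up proportion of 0.1 as in that script's defaults. Whether `get_tpu_estimator` computes them the same way is an assumption, and `training_schedule` is a hypothetical helper.

```python
def training_schedule(num_examples, batch_size, epochs, warmup_proportion=0.1):
    """Total optimizer steps and warm-up steps for a run (illustrative helper)."""
    num_train_steps = int(num_examples / batch_size * epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# Hypothetical numbers: 1000 examples, batch size 1, 10 epochs.
print(training_schedule(1000, 1, 10))  # (10000, 1000)
```

With a batch size of 1, every example costs one optimizer step, so a larger batch size reduces the step count proportionally.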

# 5. Conclusions

In this Notebook we saw how to use BERT for a custom classification problem.