# Sentiment Analysis Using BERT

- Using `ktrain` for learning
- Using BERT pre-trained language model

## Installing ktrain

In [1]:
!pip install ktrain


Collecting ktrain
  Downloading https://files.pythonhosted.org/packages/f2/d5/a366ea331fc951b8ec2c9e7811cd101acfcac3a5d045f9d9320e74ea5f70/ktrain-0.21.4.tar.gz (25.3MB)
Collecting keras_bert>=0.86.0
  Downloading https://files.pythonhosted.org/packages/e2/7f/95fabd29f4502924fa3f09ff6538c5a7d290dfef2c2fe076d3d1a16e08f0/keras-bert-0.86.0.tar.gz
Collecting langdetect
  Downloading https://files.pythonhosted.org/packages/56/a3/8407c1e62d5980188b4acc45ef3d94b933d14a2ebc9ef3505f22cf772570/langdetect-1.0.8.tar.gz (981kB)
Collecting cchardet
  Downloading https://files.pythonhosted.org/packages/1e/c5/7e1a0d7b4afd83d6f8de794fce82820ec4c5136c6d52e14000822681a842/cchardet-2.1.6-cp36-cp36m-manylinux2010_x86_64.whl (241kB)
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/d8/9d/89d9ac

## Importing Libraries

In [2]:
import tensorflow as tf
import pandas as pd
import numpy as np
import ktrain
from ktrain import text

In [3]:
tf.__version__

'2.3.0'

## Clone Git Repository for Data

In [4]:
!git clone https://github.com/laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k.git

Cloning into 'IMDB-Movie-Reviews-Large-Dataset-50k'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.


## Data Preparation

In [5]:
# Load the train dataset
data_train = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype=str)

# Load the test dataset
data_test = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype=str)

In [6]:
# Dimensions of each dataset
print("Size of train dataset: ", data_train.shape)
print("Size of test dataset: ", data_test.shape)

# Last rows of the train dataset (not rendered, since a notebook cell
# displays only its final expression)
data_train.tail()

# First rows of the test dataset (this is the table shown below)
data_test.head()

Size of train dataset:  (25000, 2)
Size of test dataset:  (25000, 2)


| Unnamed: 0 | Reviews | Sentiment |
|---|---|---|
| 0 | Who would have thought that a movie about a ma... | pos |
| 1 | After realizing what is going on around us ...... | pos |
| 2 | I grew up watching the original Disney Cindere... | neg |
| 3 | David Mamet wrote the screenplay and made his ... | pos |
| 4 | Admittedly, I didn't have high expectations of... | neg |


## Train-Test Split

In [7]:
# text.texts_from_df returns three values: the (X_train, y_train) tuple,
# the (X_test, y_test) tuple, and a preprocessor object (preproc).
# maxlen caps each review at that many tokens; anything beyond it is truncated.
# preprocess_mode selects how the text corpus is tokenized and transformed (here, for the BERT model).


(X_train, y_train), (X_test, y_test), preproc = text.texts_from_df(train_df=data_train,
                                                                   text_column = 'Reviews',
                                                                   label_columns = 'Sentiment',
                                                                   val_df = data_test,
                                                                   maxlen = 500,
                                                                   preprocess_mode = 'bert')

downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en


Is Multi-Label? False
preprocessing test...
language: en


## Define Model

In [9]:
# name='bert' selects the BERT model.

model = text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)

Is Multi-Label? False
maxlen is 500
done.


## Define Learner

In [10]:
# Batch size is set to 6 because the documentation recommends this value when maxlen is 500.

learner = ktrain.get_learner(model=model, train_data=(X_train, y_train),
                   val_data = (X_test, y_test),
                   batch_size = 6)
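
As a rough guide to what a batch size of 6 implies for training time, the number of weight updates per epoch can be estimated from the dataset size printed earlier. This is a back-of-the-envelope sketch, not something ktrain reports in this form.

```python
import math

train_size = 25000  # rows in the train dataset, from the shapes printed earlier
batch_size = 6      # value passed to ktrain.get_learner

# The last batch is partial, so round up.
steps_per_epoch = math.ceil(train_size / batch_size)
print(steps_per_epoch)
```

A small batch size keeps memory use manageable for long sequences, at the cost of many more steps per epoch.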

## Fit Model

In [10]:
# fit runs a basic training loop, whereas fit_onecycle applies the one-cycle learning-rate policy via a callback.

learner.fit_onecycle(lr = 2e-5, epochs = 1)




begin training using onecycle policy with max lr of 2e-05...


<tensorflow.python.keras.callbacks.History at 0x7f159cbab668>
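
The idea behind the one-cycle policy can be sketched as a triangular learning-rate schedule: the rate rises linearly to the maximum over the first half of training and falls back over the second half. The function `triangular_lr`, the starting rate of 0, and the 100-step run are assumptions chosen for illustration. ktrain's own implementation may differ in its starting rate and other details.

```python
def triangular_lr(step, total_steps, max_lr, min_lr=0.0):
    """Linear ramp up to max_lr at the midpoint, then linear ramp down."""
    half = total_steps / 2
    if step <= half:
        frac = step / half
    else:
        frac = (total_steps - step) / half
    return min_lr + (max_lr - min_lr) * frac


total_steps = 100  # toy run length, much shorter than a real epoch
schedule = [triangular_lr(s, total_steps, max_lr=2e-5) for s in range(total_steps + 1)]

peak = max(schedule)
peak_step = schedule.index(peak)
print(peak_step, peak)
```

The `lr = 2e-5` passed to `fit_onecycle` above plays the role of `max_lr` here: it is the peak of the cycle, not a constant rate.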

## Save Model

In [11]:
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.save('/content/drive/My Drive/ColabData/bert')

KeyboardInterrupt: ignored

## Prediction

In [13]:
#sample dataset to test on

data = ['this movie was horrible, the plot was really boring. acting was okay',
        'the fild is really sucked. there is not plot and acting was bad',
        'what a beautiful movie. great plot. acting was good. will see it again']


In [14]:
predictor.predict(data)

['neg', 'neg', 'pos']

In [15]:
# return_proba=True returns the predicted probability for each class

predictor.predict(data, return_proba=True)

array([[0.9971378 , 0.00286228],
       [0.9964477 , 0.00355225],
       [0.00146626, 0.9985337 ]], dtype=float32)

In [16]:
#classes available

predictor.get_classes()

['neg', 'pos']
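
The probability array and the class list fit together as follows: each column corresponds to the class at the same index in `get_classes()`, and the predicted label is the class with the highest probability in each row. The sketch below reproduces that mapping in plain Python using the values printed above. The helper `to_labels` is hypothetical and only illustrates the argmax step.

```python
classes = ["neg", "pos"]  # order returned by predictor.get_classes()
probs = [
    [0.9971378, 0.00286228],
    [0.9964477, 0.00355225],
    [0.00146626, 0.9985337],
]


def to_labels(prob_rows, class_names):
    """Pick the class with the highest probability in each row."""
    labels = []
    for row in prob_rows:
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best_index])
    return labels


labels = to_labels(probs, classes)
print(labels)
```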

In [17]:
# !zip -r /content/bert.zip /content/bert

## Deploy Model

In [13]:
# Load the saved model

predictor_load = ktrain.load_predictor('/content/drive/My Drive/ColabData/bert')

Call to keras.models.load_model failed.  Try using the learner.model.save_weights and learner.model.load_weights instead.
Error was: SavedModel file does not exist at: /content/drive/My Drive/ColabData/bert/{saved_model.pbtxt|saved_model.pb}


ValueError: ignored
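
The error reports that no SavedModel file exists at the given path, which would be expected if the earlier `predictor.save` call did not finish (its output shows a `KeyboardInterrupt`). One way to catch this before calling `ktrain.load_predictor` is to check that the target directory exists and is not empty. The sketch below demonstrates such a check on a temporary directory. The helper `saved_dir_is_nonempty` is hypothetical, and it deliberately makes no assumption about which files ktrain writes.

```python
import os
import tempfile


def saved_dir_is_nonempty(path):
    """Return True only if path is an existing directory that contains entries."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "bert")
    os.makedirs(model_dir)
    before = saved_dir_is_nonempty(model_dir)  # directory exists but is empty

    # Stand-in for whatever a completed save would write.
    with open(os.path.join(model_dir, "placeholder.bin"), "wb") as fh:
        fh.write(b"\0")
    after = saved_dir_is_nonempty(model_dir)

    missing = saved_dir_is_nonempty(os.path.join(tmp, "does-not-exist"))

print(before, after, missing)
```

A non-empty directory does not prove the save completed, so this is only a cheap first check before attempting the load.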

In [19]:
# Predict with the loaded model

# predictor_load.predict(data)

## References

- [`ktrain` module](https://github.com/amaiya/ktrain)
- [Sentiment Classification Using Bert](https://kgptalkie.com/sentiment-classification-using-bert/)
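- [When BERT Meets Keras: Possibly the Simplest Way to Get Started with BERT](http://www.ipshop.xyz/15376.html) (in Chinese)
- [Attack on BERT: The Power of Giants in NLP and Transfer Learning](https://leemeng.tw/attack_on_bert_transfer_learning_in_nlp.html) (in Chinese)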