## Why BERT? 

- BERT stands for <code style="background:yellow;color:black">Bidirectional Encoder Representations from Transformers</code>. It is designed to pre-train deep __bidirectional representations__ from unlabeled text by jointly conditioning on both left and right context.

- Because it is bidirectional, it generalises well, and when pre-trained on a huge corpus it produces good results.

- This __pre-training step__ is half the magic behind BERT’s success: as the model trains on a large text corpus, it picks up a deeper, more detailed understanding of how the language works.
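
To make "both left and right context" concrete, here is a minimal pure-Python sketch that compares which positions a token can see in a bidirectional encoder versus a left-to-right model. The helper `visible_positions` and the toy sentence are made up for illustration; this is not BERT's actual attention code.

```python
def visible_positions(seq_len, bidirectional):
    """Return, for each position, the list of positions it may attend to."""
    visible = []
    for i in range(seq_len):
        if bidirectional:
            # Encoder-style: every token sees the whole sequence.
            visible.append(list(range(seq_len)))
        else:
            # Left-to-right: token i sees only positions 0..i.
            visible.append(list(range(i + 1)))
    return visible


tokens = ["the", "bank", "of", "the", "river"]  # toy sentence (assumption)
bi = visible_positions(len(tokens), bidirectional=True)
l2r = visible_positions(len(tokens), bidirectional=False)

# Context available when building the representation of "bank" (position 1)
bank_bi = [tokens[j] for j in bi[1]]
bank_l2r = [tokens[j] for j in l2r[1]]
print(bank_bi)
print(bank_l2r)
```

In the bidirectional case "bank" can use "river" to its right, which a left-to-right model would not have seen yet.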



## Reading the data

In [16]:
import pandas as pd
import numpy as np
import os
from termcolor import colored
import warnings
import ktrain
import tensorflow as tf

In [17]:
# Load the pickled train and test feature DataFrames
df_train = pd.read_pickle('train_features.pkl')
df_test  = pd.read_pickle('test_features.pkl')

<div class="alert alert-block alert-danger">
    
<b>BERT : </b>We will be using the __ktrain wrapper__ for it. Since the architecture of the model is predetermined, we don't define a model of our own for it. So it's just to check how well it performs in comparison to my other models.
    
</div>

## BERT Model

The model used here is a pre-trained BERT model, loaded through the ktrain wrapper.

In [18]:
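# Convert both DataFrames into BERT-ready inputs and one-hot labels;
# `preproc` keeps the fitted preprocessing so it can be reused later.
(X_train, y_train), (X_test, y_test), preproc = ktrain.text.texts_from_df(
    train_df=df_train,
    text_column='Text',
    label_columns='label',
    val_df=df_test,      # the test set doubles as validation data
    maxlen=512,          # sequence length passed to the BERT preprocessor
    preprocess_mode='bert')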

['not_label', 'label']
   not_label  label
0        1.0    0.0
1        1.0    0.0
2        1.0    0.0
3        0.0    1.0
4        1.0    0.0
['not_label', 'label']
   not_label  label
0        1.0    0.0
1        1.0    0.0
2        1.0    0.0
3        1.0    0.0
4        1.0    0.0
preprocessing train...
language: en


Is Multi-Label? False
preprocessing test...
language: en
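
The printout above shows the single `label` column expanded into two columns, `not_label` and `label`. A rough sketch of that kind of one-hot expansion is below. The helper `one_hot_binary` and the raw values are hypothetical (picked so the result matches the first five training rows shown), and this is not ktrain's own code.

```python
def one_hot_binary(labels, positive_name="label"):
    """Expand a 0/1 label column into [not_<name>, <name>] one-hot rows."""
    class_names = ["not_" + positive_name, positive_name]
    rows = [[0.0, 1.0] if int(v) == 1 else [1.0, 0.0] for v in labels]
    return class_names, rows


raw_labels = [0, 0, 0, 1, 0]  # hypothetical raw values, for illustration only
class_names, encoded = one_hot_binary(raw_labels)

print(class_names)
for row in encoded:
    print(row)
```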


In [19]:
model = ktrain.text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)

Is Multi-Label? False
maxlen is 512
done.
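
`maxlen is 512` means every input ends up with the same fixed length. As a simplified sketch of the idea, assuming padding with id 0 and cutting off the tail of long inputs (the real BERT preprocessing also adds special tokens and does WordPiece tokenization, which is left out here):

```python
def pad_or_truncate(token_ids, maxlen, pad_id=0):
    """Force a list of token ids to exactly `maxlen` entries."""
    if len(token_ids) >= maxlen:
        # Too long: keep only the first `maxlen` ids.
        return token_ids[:maxlen]
    # Too short: fill the remainder with the padding id.
    return token_ids + [pad_id] * (maxlen - len(token_ids))


short_doc = [101, 7592, 2088, 102]   # made-up token ids
long_doc = list(range(1, 11))        # ten made-up token ids

padded = pad_or_truncate(short_doc, maxlen=8)
cut = pad_or_truncate(long_doc, maxlen=8)
print(padded)
print(cut)
```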


In [20]:
# Wrap the model and the train/validation data in a ktrain learner
learner = ktrain.get_learner(model=model,
                             train_data=(X_train, y_train),
                             val_data=(X_test, y_test),
                             batch_size=6)

In [None]:
%%time
# Train for 1 epoch with the one-cycle policy, maximum learning rate 2e-5
learner.fit_onecycle(2e-5, 1)
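
For intuition about what a one-cycle policy does to the learning rate, here is a simplified triangular schedule: it ramps up linearly to the maximum rate and then back down. This is only a sketch; ktrain's exact schedule may differ, and `div_factor` and `n_samples` are assumptions made for the example.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, div_factor=10.0):
    """Triangular schedule: base_lr -> max_lr -> base_lr."""
    base_lr = max_lr / div_factor
    half = total_steps / 2
    if step <= half:
        frac = step / half                   # warm-up phase
    else:
        frac = (total_steps - step) / half   # cool-down phase
    return base_lr + frac * (max_lr - base_lr)


n_samples = 600   # hypothetical training-set size
batch_size = 6    # same batch size as the learner above
epochs = 1
total_steps = math.ceil(n_samples / batch_size) * epochs

schedule = [one_cycle_lr(s, total_steps, max_lr=2e-5)
            for s in range(total_steps + 1)]
print(schedule[0], schedule[total_steps // 2], schedule[-1])
```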

In [None]:
# Show the validation example with the highest loss
learner.view_top_losses(n=1, preproc=preproc)
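
Conceptually, "top losses" just means ranking examples by their individual loss and looking at the worst ones. A minimal sketch with cross-entropy on made-up predictions is below. The helpers `cross_entropy` and `top_losses` are hypothetical and are not how ktrain implements it internally.

```python
import math


def cross_entropy(true_one_hot, predicted_probs, eps=1e-12):
    """Per-example cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_one_hot, predicted_probs))


def top_losses(y_true, y_prob, n=1):
    """Return the n (index, loss) pairs with the highest loss."""
    losses = [(i, cross_entropy(t, p))
              for i, (t, p) in enumerate(zip(y_true, y_prob))]
    losses.sort(key=lambda pair: pair[1], reverse=True)
    return losses[:n]


# Made-up targets and predictions in [not_label, label] order
y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
y_prob = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]

worst = top_losses(y_true, y_prob, n=1)
print(worst)
```

Here the second example comes out on top because the model put only 0.2 probability on the true class.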