# <center> Sentiment Analysis </center>
We seek to assess the accuracy of classification performance of a well tuned base BERT Transformer with Tensorflow.<br> We will be using the [Rotten Tomatoes movie reviews dataset](https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data) for the analysis <br>.
This excercise is performed across three phases:
<ol>
    <li> Data Transformation - Data Loading + EDA + Tokenization </li>
    <li> Model Building and Training </li>
    <li> Prediction </li>
</ol>

## Part 2 - Model Building and Training

In [1]:
%config Completer.use_jedi = False

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd

### Model design

In [3]:
from transformers import TFAutoModel

In [4]:
bert=TFAutoModel.from_pretrained('bert-base-cased') #creating a BERT layer object and updating the weights from the trained model

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [5]:
bert.summary()

Model: "tf_bert_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bert (TFBertMainLayer)       multiple                  108310272 
Total params: 108,310,272
Trainable params: 108,310,272
Non-trainable params: 0
_________________________________________________________________


In [6]:
from tensorflow.keras import layers

In [16]:
input_layer=tf.keras.layers.Input(shape=(512,),name='ids',dtype='int32' )
mask_layer=tf.keras.layers.Input(shape=(512,),name='masks',dtype='int32')
embedding=bert.bert(input_layer, mask_layer)[1]
dense1=tf.keras.layers.Dense(512, activation='relu')(embedding)
dense2=tf.keras.layers.Dense(64, activation='relu')(dense1)
dense3=tf.keras.layers.Dense(5, activation='softmax',name='outputs')(dense2)

model=tf.keras.Model(inputs=(input_layer, mask_layer),outputs=dense3)

In [17]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
ids (InputLayer)                [(None, 512)]        0                                            
__________________________________________________________________________________________________
masks (InputLayer)              [(None, 512)]        0                                            
__________________________________________________________________________________________________
bert (TFBertMainLayer)          TFBaseModelOutputWit 108310272   ids[0][0]                        
                                                                 masks[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 512)          393728      bert[1][1]                 

<b>We need to ensure that ids, masks, and the bert layer weights are not influenced by the training process and only the Dense layers are trained

In [18]:
model.layers[2].trainable=False

In [19]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
ids (InputLayer)                [(None, 512)]        0                                            
__________________________________________________________________________________________________
masks (InputLayer)              [(None, 512)]        0                                            
__________________________________________________________________________________________________
bert (TFBertMainLayer)          TFBaseModelOutputWit 108310272   ids[0][0]                        
                                                                 masks[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 512)          393728      bert[1][1]                 

Hence, we see that the number of trainable params has decreased from 86,959,877 to 1,314,821 (98.5% decrease)

In [20]:
losses=tf.keras.losses.CategoricalCrossentropy()
optimizers=tf.keras.optimizers.Adam(learning_rate=5e-5, decay=1e-6)
accuracies=tf.keras.metrics.CategoricalCrossentropy('accuracies')

model.compile(optimizer=optimizers,loss=losses,metrics=[accuracies])

### Data Loading

In [21]:
elem_spec=({'ids': tf.TensorSpec(shape=(32, 512), dtype=tf.int32, name=None), 'masks': tf.TensorSpec(shape=(32, 512), dtype=tf.int32, name=None)}, tf.TensorSpec(shape=(32, 5), dtype=tf.float64, name=None))

In [22]:
trainset=tf.data.experimental.load('train',element_spec=elem_spec)
validset=tf.data.experimental.load('validate', element_spec=elem_spec)

In [23]:
trainset.take(2000)

<TakeDataset shapes: ({ids: (32, 512), masks: (32, 512)}, (32, 5)), types: ({ids: tf.int32, masks: tf.int32}, tf.float64)>

In [None]:
hist=model.fit(trainset,validation_data=validset,epochs=3)

Epoch 1/3
 267/4389 [>.............................] - ETA: 65:10:30 - loss: 1.1811 - accuracies: 1.1811

In [None]:
model.save('BERT_sentiment_analysis')