## About

> BERT (Bidirectional Encoder Representations from Transformers)

BERT is a pretrained language model that achieved state-of-the-art results on a variety of natural language processing tasks when it was introduced.

The architecture of BERT is as follows.

1. BERT is a transformer-based model that uses a multi-layer bidirectional encoder to generate contextualised word embeddings. When used for classification, the model consists of an embedding layer, a stack of transformer blocks, and a classification layer added on top.

![bert](bert.png)

2. The embedding layer takes in a sequence of tokens and converts them into dense vectors. These vectors are then passed through multiple transformer blocks, each of which consists of a self-attention mechanism and a feed-forward neural network. The self-attention mechanism lets each token attend to every other token in the input sequence, while the feed-forward network applies a non-linear transformation to each position.

3. The output of the final transformer block is then passed through a classification layer, which predicts class probabilities. For token-level tasks it scores every token; for sequence-level tasks such as the sentiment classification in this notebook, it scores the pooled representation of the `[CLS]` token.

4. In the case of text classification, the classes might correspond to different categories of text, e.g. positive or negative sentiment.
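
The transformer block described in step 2 can be sketched in a few lines of NumPy. This is a simplified single-head illustration, not BERT's actual implementation: it omits multiple heads, residual connections, layer normalisation and dropout, uses ReLU where BERT uses GELU, and all weights and sizes (`d_model = 8`, `d_ff = 16`) are random placeholders chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Project the inputs into queries, keys and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores: one row of attention weights per token
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def feed_forward(x, w1, b1, w2, b2):
    # Position-wise two-layer network with a ReLU non-linearity
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

# Placeholder sizes and random weights (assumptions for illustration only)
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

attended, weights = self_attention(x, w_q, w_k, w_v)
block_output = feed_forward(attended, w1, b1, w2, b2)
print(attended.shape, block_output.shape)
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the value vectors of all tokens in the sequence. This is what makes the resulting embeddings contextual.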


In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.29.2


In [2]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from transformers import BertTokenizer, TFBertModel


In [4]:
# Load the IMDb dataset
df = pd.read_csv('IMDB Dataset.csv')


In [5]:
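# Load the tokenizer and prepare the inputs and labels
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
X = df['review'].tolist()  # the tokenizer is documented to take a list of strings
# Encode the labels: 'positive' -> 1, anything else (i.e. 'negative') -> 0
y = df['sentiment'].apply(lambda x: 1 if x == 'positive' else 0).values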


In [6]:
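# Tokenize the reviews, truncating or padding every sequence to 512 tokens
X_tokens = tokenizer.batch_encode_plus(
    X,
    truncation=True,
    padding='max_length',  # fixed length, matching the model's Input shape of (512,)
    max_length=512,  # Max sequence length for BERT
    return_tensors='tf'
)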
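
To make the `truncation`, `padding` and `max_length` arguments concrete, here is a toy sketch of what they do to a list of token ids. The helper `pad_and_truncate` and the example ids are hypothetical and written only for illustration. The real tokenizer also reserves positions for the special `[CLS]` and `[SEP]` tokens, which this sketch ignores. The padding id of 0 matches the `[PAD]` token in the `bert-base-uncased` vocabulary.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Toy version of truncation + padding (hypothetical helper)."""
    # Truncation: drop everything past max_length
    ids = token_ids[:max_length]
    # Attention mask: 1 for real tokens, 0 for padding
    mask = [1] * len(ids)
    # Padding: fill the remaining positions with the pad id
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

short_ids, short_mask = pad_and_truncate([7, 8, 9], max_length=5)
long_ids, long_mask = pad_and_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5)

print(short_ids, short_mask)  # [7, 8, 9, 0, 0] [1, 1, 1, 0, 0]
print(long_ids, long_mask)    # [1, 2, 3, 4, 5] [1, 1, 1, 1, 1]
```

The attention mask is what tells the model to ignore the padded positions, which is why it is passed alongside `input_ids`.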

In [7]:
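# Convert tokenized input to a TensorFlow Dataset, keeping only the two
# inputs the Keras model declares (token_type_ids is not used by it)
features = {k: X_tokens[k] for k in ('input_ids', 'attention_mask')}
dataset = tf.data.Dataset.from_tensor_slices((features, y))
dataset = dataset.shuffle(100).batch(16)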


In [8]:
# Load the BERT model
bert_model = TFBertModel.from_pretrained('bert-base-uncased')


Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [9]:
# Freeze BERT layers so that only the classification head is trained
for layer in bert_model.layers:
    layer.trainable = False


In [10]:
# Build the sentiment classification model on top of the BERT encoder
input_ids = Input(shape=(512,), dtype=tf.int32, name='input_ids')
attention_mask = Input(shape=(512,), dtype=tf.int32, name='attention_mask')
# Index 1 of the BERT output is the pooled [CLS] representation (pooler_output)
embedding = bert_model(input_ids, attention_mask=attention_mask)[1]
# A single sigmoid unit outputs the probability of positive sentiment
output = Dense(1, activation='sigmoid')(embedding)
model = Model(inputs=[input_ids, attention_mask], outputs=output)
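
Because the BERT layers are frozen, only the final `Dense` layer is trained. Assuming the pooled output of `bert-base-uncased` has 768 dimensions (its hidden size), the number of trainable parameters can be worked out by hand, and `model.summary()` should report the same figure.

```python
hidden_size = 768  # hidden size of bert-base-uncased (dimension of the pooled output)
units = 1          # the single sigmoid unit of the Dense layer

# One weight per input dimension per unit, plus one bias per unit
trainable_params = hidden_size * units + units
print(trainable_params)  # 769
```

With so few trainable parameters the model is effectively a logistic regression on fixed BERT features, which keeps the update step small but limits how far the model can adapt to the task.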


In [11]:
# Compile the model
optimizer = Adam(learning_rate=2e-5)  # `lr` is a deprecated alias of `learning_rate`
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
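
The `binary_crossentropy` loss compares each predicted probability with its 0/1 label. A minimal sketch of the formula, averaged over a batch, is shown below. The function name and the example values are made up for illustration, and Keras's own implementation may differ in details such as how it clips probabilities.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch (illustrative sketch)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip to avoid log(0)
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Same labels, once with confident correct predictions and once with confident wrong ones
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))  # 0.1054 2.3026
```

The loss is small when the predicted probability is close to the label and grows quickly when the model is confidently wrong.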




In [None]:
# Train the model
model.fit(dataset, epochs=3)

# Save the model
model.save('imdb_sentiment_analysis_bert.h5')

Epoch 1/3


  inputs = self._flatten_to_reference_inputs(inputs)


Epoch 2/3
 109/3125 [>.............................] - ETA: 35:53 - loss: 0.4627 - accuracy: 0.7970
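
The `3125` in the progress bar is the number of batches per epoch. With the batch size of 16 used above, this is consistent with a dataset of 50,000 reviews. That figure is an assumption inferred from the log, not something the notebook prints.

```python
import math

num_examples = 50_000  # assumption: inferred from the 3125 steps shown in the log
batch_size = 16        # batch size passed to dataset.batch(...)

steps_per_epoch = math.ceil(num_examples / batch_size)
print(steps_per_epoch)  # 3125
```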