# BERT
**BERT**, or <u>Bidirectional Encoder Representations from Transformers</u>, is a powerful natural language processing (NLP) technique developed by researchers at Google. It's based on the *Transformer architecture* and is designed to understand the context of words in a sentence by considering the words that come before and after them. Unlike previous models that processed words in a left-to-right or right-to-left manner, BERT can take into account the entire context of a word by processing it bidirectionally.

When it was released, BERT achieved state-of-the-art results on NLP tasks such as question answering, sentiment analysis, and natural language inference. It is pre-trained on large corpora of unlabeled text and can then be fine-tuned on a specific task with a much smaller, task-specific dataset. This pre-train-then-fine-tune approach is what makes BERT effective across a wide range of NLP applications.

#### [Text preprocessing for BERT + SavedModel implementation of the encoder API](https://www.kaggle.com/models/tensorflow/bert/tensorFlow2/en-uncased-l-12-h-768-a-12)

#### TensorFlow 2 BERT encoder model: **bert/tensorFlow2/en-uncased-l-12-h-768-a-12**

Here we use the "BERT Base" model, whose configuration is encoded in the model name:

**l-12-h-768-a-12**
* l-12: 12 Transformer layers (encoder blocks)
* h-768: hidden size of 768
* a-12: 12 attention heads
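
The naming scheme above can be unpacked programmatically. The following is a minimal sketch, assuming `l`, `h`, and `a` stand for the number of layers, the hidden size, and the number of attention heads; `parse_bert_config` is a hypothetical helper written for this notebook, not part of TensorFlow Hub.

```python
import re


def parse_bert_config(handle_suffix):
    """Parse a suffix such as 'l-12-h-768-a-12' into named sizes (hypothetical helper)."""
    match = re.fullmatch(r"l-(\d+)-h-(\d+)-a-(\d+)", handle_suffix)
    if match is None:
        raise ValueError(f"unrecognized configuration: {handle_suffix!r}")
    layers, hidden, heads = (int(group) for group in match.groups())
    return {
        "num_layers": layers,
        "hidden_size": hidden,
        "num_attention_heads": heads,
        # In standard multi-head attention the hidden size is split evenly across heads.
        "head_size": hidden // heads,
    }


config = parse_bert_config("l-12-h-768-a-12")
print(config)
```

For BERT Base this gives a per-head size of 768 / 12 = 64.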

In [1]:
!pip install h5py
!pip install typing-extensions
!pip install wheel



In [2]:
# !pip install tensorflow_text --use-deprecated=legacy-resolver
# !pip install tf-keras --use-deprecated=legacy-resolver
!pip3 install --quiet "tensorflow-text==2.15.*"
!pip install tensorflow_hub
# !pip install tf-keras



#### [The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)](https://jalammar.github.io/illustrated-bert/)

In [3]:
import tensorflow_hub as tfhub
import tensorflow_text as tftxt

In [4]:
bert_preprocess_url = "https://www.kaggle.com/models/tensorflow/bert/TensorFlow2/en-uncased-preprocess/3"
bert_encoder_url = "https://www.kaggle.com/models/tensorflow/bert/TensorFlow2/en-uncased-l-12-h-768-a-12/4"

In [5]:
bert_preprocess_model = tfhub.KerasLayer(bert_preprocess_url)

In [6]:
text_test = ['nice movie friend', 'I loved python programming']
text_preprocessed = bert_preprocess_model(text_test)
# text_preprocessed is a dictionary, so let's look at its keys
text_preprocessed.keys()

dict_keys(['input_word_ids', 'input_type_ids', 'input_mask'])

In [7]:
text_preprocessed['input_mask']
# The first row below starts with five 1s, one for each token in:
# [CLS] nice movie friend [SEP]
#
# There are two (2) sentences, each padded to a sequence length of 128 tokens

<tf.Tensor: shape=(2, 128), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32)>
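
The mask above can be reproduced by hand, which helps explain why the rows start with five and six 1s. This is a rough sketch, not the preprocessing model's actual implementation: `build_input_mask` is a hypothetical helper, and the token counts (3 and 4) are read off the output above rather than produced by a tokenizer.

```python
SEQ_LENGTH = 128  # matches the shape (2, 128) of the mask shown above


def build_input_mask(num_tokens, seq_length=SEQ_LENGTH):
    """Return 1 for [CLS], each token, and [SEP]; 0 for padding (hypothetical helper)."""
    real = min(num_tokens + 2, seq_length)  # +2 accounts for [CLS] and [SEP]
    return [1] * real + [0] * (seq_length - real)


# Token counts assumed from the output above: 3 tokens and 4 tokens.
masks = [build_input_mask(3), build_input_mask(4)]
for mask in masks:
    print(sum(mask), "real positions out of", len(mask))
```

Downstream layers use this mask to ignore padding positions when computing attention.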

In [8]:
text_preprocessed['input_word_ids']
# CLS nice movie friend SEP
# 101 3835 3185  2767   102

<tf.Tensor: shape=(2, 128), dtype=int32, numpy=
array([[  101,  3835,  3185,  2767,   102,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0, 

In [9]:
bert_model = tfhub.KerasLayer(bert_encoder_url)
bert_result = bert_model(text_preprocessed)
bert_result.keys()

dict_keys(['sequence_output', 'encoder_outputs', 'default', 'pooled_output'])

#### `pooled_output`: one 768-dimensional embedding vector per sentence (2 sentences here)

In [10]:
bert_result['pooled_output']

<tf.Tensor: shape=(2, 768), dtype=float32, numpy=
array([[-0.8271397 , -0.27866328,  0.14801416, ...,  0.19063216,
        -0.5919316 ,  0.8768732 ],
       [-0.8536233 , -0.4849649 , -0.7472432 , ..., -0.6639452 ,
        -0.6689642 ,  0.88667357]], dtype=float32)>
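
One common use of per-sentence embeddings like these is comparing sentences by cosine similarity. The sketch below uses small made-up vectors, not real BERT outputs, so that it runs without TensorFlow; with the tensors in this notebook one would presumably pass something like `bert_result['pooled_output'][0].numpy()` instead.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-dimensional vectors standing in for 768-dimensional embeddings.
toy_a = [1.0, 0.0, 1.0, 0.0]
toy_b = [1.0, 0.0, 0.0, 0.0]
similarity = cosine_similarity(toy_a, toy_b)
print(round(similarity, 4))
```

Values close to 1 indicate vectors pointing in nearly the same direction; how well this tracks semantic similarity for raw `pooled_output` vectors depends on the task and is not guaranteed.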

#### `sequence_output`: for each of the 2 sentences, one 768-dimensional vector at each of the 128 token positions

In [11]:
bert_result['sequence_output']

<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy=
array([[[-0.09816232, -0.03544736, -0.19267285, ..., -0.15073586,
          0.06901345,  0.02509328],
        [ 0.4862165 , -0.65314376,  0.62446845, ..., -0.0665983 ,
         -0.02937954, -0.0993511 ],
        [ 0.6566795 , -0.73960876, -0.29676223, ..., -0.26809773,
         -0.02502404, -0.42103842],
        ...,
        [ 0.13351414, -0.1494981 ,  0.47524282, ..., -0.05144007,
         -0.11993749,  0.10318385],
        [ 0.11948572, -0.14179072,  0.5047172 , ..., -0.08380911,
         -0.11863765,  0.05432591],
        [-0.08825113, -0.24356994,  0.35768592, ...,  0.08185339,
         -0.09267313,  0.07827456]],

       [[-0.09676915,  0.21864223, -0.3315135 , ..., -0.12111492,
          0.38034782,  0.54683983],
        [ 0.5134471 ,  0.0667595 , -0.26812196, ..., -0.08139098,
          0.54236966,  0.32305327],
        [ 1.1335996 ,  0.55607015,  0.36610845, ...,  0.32882318,
          0.2631164 ,  0.15120265],
        ...,
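
Because most of the 128 positions are padding, per-token vectors are often combined with the input mask before use. The sketch below shows masked mean pooling on toy data; it is a common alternative way to get a sentence vector, not what `pooled_output` computes, and `masked_mean_pool` is a hypothetical helper.

```python
def masked_mean_pool(token_vectors, mask):
    """Average the vectors whose mask entry is 1, ignoring padding positions."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: 3 positions with 2-dimensional vectors; the last position is padding.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
pooled = masked_mean_pool(toy_tokens, toy_mask)
print(pooled)
```

The padding vector is excluded from the average, so its large values do not affect the result.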

#### `encoder_outputs`: one output per Transformer layer; BERT Base has 12 layers, each with a hidden size of 768

In [12]:
len(bert_result['encoder_outputs'])

12