<!DOCTYPE html>
<html>

<body>

<h1><center>BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding
</center></h1> 
<h4><center>Peer review</center></h4>

</body>
</html>

## Introduction

The paper presents BERT (Bidirectional Encoder Representations from Transformers), a pre-training technique for Natural Language Processing (NLP) introduced by Devlin et al. at Google AI Language. Released in October 2018, BERT marked a significant advance in the field: because it conditions on both left and right context, it obtained new state-of-the-art results on eleven NLP tasks.


## Architecture

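BERT employs a multi-layer bidirectional Transformer encoder, so the representation of each token is conditioned on both its left and its right context in every layer. The paper reports results for two model sizes: BERT-Base (12 layers, hidden size 768, 12 attention heads, 110M parameters) and BERT-Large (24 layers, hidden size 1024, 16 attention heads, 340M parameters).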
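BERT's size follows directly from its encoder hyperparameters (number of layers, hidden size, feed-forward size). The sketch below is our own approximation, assuming the standard `bert-base-uncased` configuration values (a 30,522-token WordPiece vocabulary, 512 positions, 2 segment types); it counts the encoder and pooler weights only, not the MLM/NSP pre-training heads. The function name is hypothetical.

```python
def bert_param_count(num_layers, hidden, intermediate,
                     vocab_size=30522, max_positions=512, type_vocab=2):
    """Count weights in a BERT encoder plus pooler (no pre-training heads)."""
    h, i = hidden, intermediate
    layer_norm = 2 * h                                    # gamma and beta
    # Token, position and segment embedding tables, followed by one LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * h + layer_norm
    attention = 4 * (h * h + h)                           # Q, K, V and output projections
    feed_forward = (h * i + i) + (i * h + h)              # two dense layers
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = h * h + h                                    # dense layer over [CLS]
    return embeddings + num_layers * per_layer + pooler


base = bert_param_count(num_layers=12, hidden=768, intermediate=3072)
large = bert_param_count(num_layers=24, hidden=1024, intermediate=4096)
print(f"BASE:  {base:,}")    # BASE:  109,482,240
print(f"LARGE: {large:,}")   # LARGE: 335,141,888
```

Both totals round to the figures usually quoted for the two models (about 110M and 340M parameters); the word-embedding table alone accounts for roughly a fifth of BERT-Base.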

## Key Concepts

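The paper outlines the core concepts underlying BERT's architecture and training methodology. BERT is built on the Transformer, whose self-attention mechanism lets every token attend to every other token in the input sequence. Each input token is represented as the sum of three embeddings: a token (WordPiece) embedding, a segment embedding indicating whether the token belongs to sentence A or sentence B, and a position embedding. Every sequence begins with a special `[CLS]` token, whose final hidden state serves as the aggregate representation for classification, and sentences are separated by a `[SEP]` token.

BERT is pre-trained on two unsupervised tasks. Masked Language Modeling (MLM) randomly selects 15% of the input tokens and trains the model to predict them from their surrounding context; a selected token is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. Next Sentence Prediction (NSP) trains the model to decide whether sentence B actually follows sentence A in the corpus (true for 50% of the training pairs) or is a random sentence.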
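A minimal sketch of the MLM corruption step, assuming the paper's scheme (15% of positions selected; of those, 80% become `[MASK]`, 10% a random token, 10% unchanged). It is an illustration, not the reference implementation: skipping `[CLS]`/`[SEP]`, rounding the number of selected positions, the function name and the toy vocabulary are all assumptions.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}  # assumption: never selected for masking


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style MLM corruption to a list of WordPiece tokens.

    Returns (corrupted, labels): labels[i] holds the original token at each
    selected position and None elsewhere (those positions carry no MLM loss).
    """
    rng = rng or random.Random()
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    num_to_mask = min(len(candidates), max(1, round(len(candidates) * mask_prob)))
    selected = rng.sample(candidates, num_to_mask)

    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i in selected:
        labels[i] = tokens[i]                 # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"           # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


tokens = ["[CLS]", "my", "dog", "is", "hairy", "[SEP]"]
corrupted, labels = mask_tokens(tokens, vocab=["cat", "apple", "runs"], rng=random.Random(0))
print(corrupted, labels)
```

Keeping 10% of selected tokens unchanged means the model cannot assume that only `[MASK]` positions matter, which reduces the mismatch with fine-tuning, where `[MASK]` never appears.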

## Pre-training and Fine-tuning

The model follows a two-step process: pre-training and fine-tuning. During pre-training, it learns contextual representations from large amounts of unlabeled text (BooksCorpus and English Wikipedia) using the two unsupervised objectives, MLM and NSP. Fine-tuning initializes the model with the pre-trained parameters, adds task-specific inputs and outputs (for example, a classification layer on top of the final `[CLS]` representation), and then updates all parameters end-to-end on labeled data for the downstream task. Because only a small output layer is new, fine-tuning is relatively inexpensive compared with pre-training.
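To make the NSP setup concrete, the sketch below builds one training pair in BERT's input format, `[CLS] A [SEP] B [SEP]`, together with the segment IDs that select the segment embedding for each token. It assumes pre-tokenized sentences and, as a simplification, draws NotNext sentences from a separate list; the function name is hypothetical and the example sentences are adapted from the paper's illustration.

```python
import random


def make_nsp_example(document, index, other_sentences, rng):
    """Build one NSP pair starting at sentence `index` of `document`.

    Sentences are lists of tokens. Assumes `index + 1 < len(document)`.
    Returns (tokens, segment_ids, is_next).
    """
    sent_a = document[index]
    if rng.random() < 0.5:
        sent_b, is_next = document[index + 1], True           # IsNext: the real next sentence
    else:
        sent_b, is_next = rng.choice(other_sentences), False  # NotNext: a sentence from elsewhere
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    # Segment A covers [CLS], sentence A and the first [SEP]; segment B covers the rest
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segment_ids, is_next


doc = [["the", "man", "went", "to", "the", "store"],
       ["he", "bought", "a", "gallon", "of", "milk"]]
others = [["penguins", "are", "flightless", "birds"]]
tokens, segment_ids, is_next = make_nsp_example(doc, 0, others, random.Random(0))
print(tokens, segment_ids, is_next)
```

During fine-tuning on a single-sentence task such as the tweet classification below, every token simply gets segment ID 0.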

## Experimental Results

The study presents extensive experimental results on the General Language Understanding Evaluation (GLUE) benchmark, the Stanford Question Answering Dataset (SQuAD v1.1 and v2.0), and the Situations With Adversarial Generations (SWAG) dataset. BERT achieves state-of-the-art performance on all of them, often by a wide margin over prior methods: for example, it raises the GLUE score to 80.5 (a 7.7-point absolute improvement) and SQuAD v1.1 test F1 to 93.2.

The study also explores the impact of model size on task accuracy. Larger models yield consistent accuracy gains on every dataset tested, including small ones such as MRPC, which suggests that the approach scales well.

Finally, ablation experiments assess the contribution of the pre-training objectives and model configurations. They confirm the value of bidirectional pre-training: a left-to-right model trained without NSP performs worse than BERT, and removing NSP alone noticeably hurts performance on QNLI, MNLI, and SQuAD v1.1.

```python
import pandas as pd
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
```

```python
train_data = pd.read_csv("twitter/twitter_training.csv", header=None, names=["id", "category", "sentiment", "text"])
test_data = pd.read_csv("twitter/twitter_validation.csv", header=None, names=["id", "category", "sentiment", "text"])
```

```python
# Drop rows with no tweet text
train_data = train_data.dropna(subset=["text"])
test_data = test_data.dropna(subset=["text"])
```

```python
train_data.head(5)
```

|   | id | category | sentiment | text |
|---|---|---|---|---|
| 0 | 2401 | Borderlands | Positive | im getting on borderlands and i will murder yo... |
| 1 | 2401 | Borderlands | Positive | I am coming to the borders and I will kill you... |
| 2 | 2401 | Borderlands | Positive | im getting on borderlands and i will kill you ... |
| 3 | 2401 | Borderlands | Positive | im coming on borderlands and i will murder you... |
| 4 | 2401 | Borderlands | Positive | im getting on borderlands 2 and i will murder ... |


```python
test_data.tail(5)
```

|   | id | category | sentiment | text |
|---|---|---|---|---|
| 995 | 4891 | GrandTheftAuto(GTA) | Irrelevant | ⭐️ Toronto is the arts and culture capital of ... |
| 996 | 4359 | CS-GO | Irrelevant | tHIS IS ACTUALLY A GOOD MOVE TOT BRING MORE VI... |
| 997 | 2652 | Borderlands | Positive | Today sucked so it’s time to drink wine n play... |
| 998 | 8069 | Microsoft | Positive | Bought a fraction of Microsoft today. Small wins. |
| 999 | 6960 | johnson&johnson | Neutral | Johnson & Johnson to stop selling talc baby po... |


```python
# WordPiece tokenizer matching the uncased BERT-Base checkpoint
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

train_texts = train_data["text"].tolist()
test_texts = test_data["text"].tolist()
```

```python
print("First 5 testing texts:")
for i, text in enumerate(test_texts[:5]):
    print(f"Text {i + 1}: {text}")
```

```text
First 5 testing texts:
Text 1: I mentioned on Facebook that I was struggling for motivation to go for a run the other day, which has been translated by Tom’s great auntie as ‘Hayley can’t get out of bed’ and told to his grandma, who now thinks I’m a lazy, terrible person 🤣
Text 2: BBC News - Amazon boss Jeff Bezos rejects claims company acted like a 'drug dealer' bbc.co.uk/news/av/busine…
Text 3: @Microsoft Why do I pay for WORD when it functions so poorly on my @SamsungUS Chromebook? 🙄
Text 4: CSGO matchmaking is so full of closet hacking, it's a truly awful game.
Text 5: Now the President is slapping Americans in the face that he really did commit an unlawful act after his  acquittal! From Discover on Google vanityfair.com/news/2020/02/t…
```


```python
# Returns a BatchEncoding with input_ids, token_type_ids and attention_mask tensors
train_encodings = tokenizer(train_texts, return_tensors="tf", padding=True, truncation=True)
test_encodings = tokenizer(test_texts, return_tensors="tf", padding=True, truncation=True)
```
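`BertTokenizer` first splits text on whitespace and punctuation (lower-casing it for the uncased model) and then applies WordPiece to each word. The sketch below illustrates that WordPiece step as a greedy longest-match-first search. It is a simplified re-implementation for illustration, not the `transformers` code; the function name and the toy vocabulary are hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one (already lower-cased, punctuation-free) word into WordPiece units.

    Greedy longest-match-first: at each position take the longest vocabulary
    entry that matches; non-initial pieces are looked up with a '##' prefix.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1                          # shrink the candidate from the right
        if match is None:
            return [unk_token]                # some span could not be matched at all
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (the real bert-base-uncased vocabulary has 30,522 entries)
toy_vocab = {"border", "##lands", "##s", "game", "##r"}
print(wordpiece_tokenize("borderlands", toy_vocab))  # ['border', '##lands']
print(wordpiece_tokenize("gamer", toy_vocab))        # ['game', '##r']
print(wordpiece_tokenize("xyz", toy_vocab))          # ['[UNK]']
```

Splitting rare words into known sub-word pieces keeps the vocabulary small while rarely falling back to `[UNK]`, which matters for noisy tweet text like the samples above.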

```python
# Map sentiment strings to integer class IDs (fit on training labels only)
label_encoder = LabelEncoder()
train_labels = label_encoder.fit_transform(train_data["sentiment"])
test_labels = label_encoder.transform(test_data["sentiment"])
```

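```python
# Load the pre-trained BERT encoder (the MLM/NSP pre-training heads are not used)
bert_model = TFBertModel.from_pretrained("bert-base-uncased")

# Two inputs: token IDs and the padding mask (token_type_ids are not needed for single tweets)
input_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(None,), dtype=tf.int32, name="attention_mask")
outputs = bert_model([input_ids, attention_mask])[1]  # Index 1 is the pooled [CLS] output
outputs = tf.keras.layers.Dense(len(label_encoder.classes_), activation="softmax")(outputs)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)

# Compile with a small learning rate; the paper fine-tunes with 5e-5, 3e-5 or 2e-5,
# whereas the Keras "adam" default (1e-3) is usually too high for fine-tuning BERT
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Feed exactly the two tensors the model declares; the tokenizer's BatchEncoding
# also contains token_type_ids, which would arrive as an unexpected third input
train_inputs = [train_encodings["input_ids"], train_encodings["attention_mask"]]
test_inputs = [test_encodings["input_ids"], test_encodings["attention_mask"]]

# Train the model (validation_split takes the last 10% of rows, before shuffling)
model.fit(train_inputs, train_labels, epochs=3, batch_size=32, validation_split=0.1)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(test_inputs, test_labels, verbose=0)
print(f"Test Accuracy: {test_accuracy:.4f}")
```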

```text
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w
```

Passing the tokenizer's full `BatchEncoding` to `model.fit` fails at the start of the first epoch: the encoding holds three tensors (`input_ids`, `token_type_ids` and `attention_mask`), but the Keras model declares only two inputs. Keras reports:

```text
ValueError: Layer "model" expects 2 input(s), but it received 3 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 315) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 315) dtype=int32>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 315) dtype=int32>]
```

Passing only `input_ids` and `attention_mask`, or adding a third `token_type_ids` input to the model, resolves the mismatch.
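`model.evaluate` reports accuracy directly, but per-tweet predictions require mapping the softmax outputs back to sentiment strings. The sketch below shows that step with `np.argmax` and `LabelEncoder.inverse_transform`, then scores it with `accuracy_score` (imported in the first cell but otherwise unused). The probabilities, the true labels and the four class names (assumed to be the dataset's sentiment classes) are hypothetical toy values standing in for `model.predict(...)` output; in the notebook you would reuse the fitted `label_encoder` instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder


def probs_to_labels(probs, encoder):
    """Map softmax outputs of shape (n_samples, n_classes) back to label strings."""
    class_ids = np.argmax(probs, axis=1)       # most probable class per row
    return encoder.inverse_transform(class_ids)


# Hypothetical toy values standing in for label_encoder and model.predict(test_inputs)
encoder = LabelEncoder().fit(["Irrelevant", "Negative", "Neutral", "Positive"])
probs = np.array([[0.10, 0.70, 0.10, 0.10],
                  [0.05, 0.05, 0.10, 0.80],
                  [0.60, 0.20, 0.10, 0.10]])
true_labels = ["Negative", "Positive", "Neutral"]

predicted = probs_to_labels(probs, encoder)
accuracy = accuracy_score(true_labels, predicted)
print(predicted, accuracy)
```

`LabelEncoder` sorts the class names alphabetically, so column *k* of the softmax output corresponds to `encoder.classes_[k]`.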


## Conclusion

Overall, BERT represents a significant advance in NLP research and applications. Its effectiveness, its ease of adaptation through fine-tuning, and its state-of-the-art performance make it a valuable tool for researchers and practitioners in the field. The paper's clear explanations and practical guidance on fine-tuning have contributed to its widespread adoption across diverse language understanding tasks.

## References 

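[Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805](https://arxiv.org/abs/1810.04805)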