<a href="https://colab.research.google.com/github/ABD07xx/ABD07xx/blob/main/CustomBERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Step 1: Setting Up the Environment**

First, ensure we have the required libraries installed:

In [None]:
!pip install torch transformers datasets
!pip install transformers[torch]

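Before moving on, it can help to confirm which versions were actually installed, since the Hugging Face APIs used below change between releases. A minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("torch", "transformers", "datasets"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```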

**Step 2: Load pre-trained BERT model and tokenizer**

Load the original BERT model and tokenizer.

In [None]:
from transformers import BertModel, BertTokenizer
import torch

# This will use the GPU if available, otherwise it will use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


# Load pre-trained BERT model and tokenizer
model = BertModel.from_pretrained('bert-base-uncased')
model = model.to(device)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
config = model.config

Using device: cuda



**Step 3: Modify the Word Embeddings**

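Create two custom BERT models whose word embeddings are modified in the forward pass. With `mode='unit'`, each word embedding is rescaled to unit L2 norm. With `mode='random'`, each element keeps its sign but its magnitude is replaced by a value drawn uniformly from [0, 1).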
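To make the two modifications concrete before reading the class, here is a dependency-free sketch of what they do to a single embedding vector. It mirrors the tensor operations used in the code that follows (`F.normalize`, `torch.sign`, uniform random values), but the helper names and the example vector are illustrative only:

```python
import math
import random


def unit_normalize(vector, eps=1e-12):
    """Scale `vector` to L2 norm 1 (the 'unit' mode)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def randomize_magnitudes(vector, rng):
    """Keep each element's sign, replace its magnitude with a draw from [0, 1) (the 'random' mode)."""
    signs = [(x > 0) - (x < 0) for x in vector]  # -1, 0, or 1, like torch.sign
    return [s * rng.random() for s in signs]


embedding = [3.0, -4.0, 0.0]  # made-up 3-dimensional "embedding"

unit = unit_normalize(embedding)
print(unit, math.sqrt(sum(x * x for x in unit)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(randomize_magnitudes(embedding, rng))
```

In the second case only the signs of the original vector survive; its direction and length are both replaced.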

In [None]:
import torch
from torch import nn
from torch.nn import functional as F
from transformers import BertModel, BertTokenizer, BertConfig, BertPreTrainedModel
from transformers.models.bert.modeling_bert import BertEncoder, BertPooler

class ModifiedBertEmbeddings(nn.Module):
    """Construct the embeddings from word, position, and token_type embeddings with modified word embeddings."""
    def __init__(self, config, mode='normal'):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.mode = mode

    def forward(self, input_ids, token_type_ids=None, position_ids=None):
        seq_length = input_ids.size(1)
        if position_ids is None:
            position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
            position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)

        words_embeddings = self.word_embeddings(input_ids)

        if self.mode == 'unit':
            # Normalize the word embeddings to have a unit norm along the last dimension
            # This makes the length of each vector equal to 1
            words_embeddings = F.normalize(words_embeddings, p=2, dim=-1)
        elif self.mode == 'random':
            # Keep the sign of every element of `words_embeddings`
            sign = torch.sign(words_embeddings)

            # Draw a new magnitude for every element, uniformly from [0, 1).
            # `rand_like` matches the shape, dtype, and device of `words_embeddings`.
            magnitudes = torch.rand_like(words_embeddings)

            # Combine the original signs with the random magnitudes.
            # Note: this randomizes every element, not only the overall vector length.
            words_embeddings = sign * magnitudes

        position_embeddings = self.position_embeddings(position_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = words_embeddings + position_embeddings + token_type_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

class ModifiedBertModel(BertPreTrainedModel):
    def __init__(self, config, mode='normal'):
        super().__init__(config)
        # Build the embeddings with ModifiedBertEmbeddings; `mode` selects how the word embeddings are modified.
        self.embeddings = ModifiedBertEmbeddings(config, mode=mode)
        self.encoder = BertEncoder(config)
        self.pooler = BertPooler(config)

        self.init_weights()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None):
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)

        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
        extended_attention_mask = extended_attention_mask.to(dtype=self.embeddings.word_embeddings.weight.dtype)
        extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0

        if head_mask is not None:
            if head_mask.dim() == 1:
                head_mask = head_mask.unsqueeze(0).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
                head_mask = head_mask.expand(self.config.num_hidden_layers, -1, -1, -1, -1)
            elif head_mask.dim() == 2:
                head_mask = head_mask.unsqueeze(1).unsqueeze(-1).unsqueeze(-1)

        embedding_output = self.embeddings(input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
        encoder_outputs = self.encoder(embedding_output, extended_attention_mask, head_mask=head_mask)
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(sequence_output)

        outputs = (sequence_output, pooled_output,) + encoder_outputs[1:]
        return outputs
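The `extended_attention_mask` arithmetic in `forward` turns the 0/1 padding mask into an additive bias: positions to keep get 0 and padded positions get -10000, which drives their softmax weight to essentially zero once the bias is added to the attention scores. A small plain-Python sketch of that effect for one query; the scores are invented for illustration:

```python
import math


def additive_mask(attention_mask, penalty=-10000.0):
    """Map 1 (real token) to 0 and 0 (padding) to `penalty`, as in (1 - mask) * -10000."""
    return [(1.0 - keep) * penalty for keep in attention_mask]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


attention_mask = [1, 1, 1, 0, 0]      # last two positions are padding
scores = [2.0, 1.0, 0.5, 3.0, 3.0]    # raw attention scores (made up)

bias = additive_mask(attention_mask)
weights = softmax([s + b for s, b in zip(scores, bias)])
print([round(w, 3) for w in weights])
```

Even though the padded positions had the largest raw scores, they receive no attention weight after masking.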

In [None]:
import torch
from torch import nn
from transformers import BertConfig, BertTokenizer

class CustomBertForSequenceClassification(ModifiedBertModel):
    def __init__(self, config, mode='normal'):
        super(CustomBertForSequenceClassification, self).__init__(config, mode=mode)
        self.num_labels = config.num_labels

        # Classification/regression head
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None,
                position_ids=None, head_mask=None, labels=None):
        outputs = super().forward(input_ids, attention_mask=attention_mask,
                                  token_type_ids=token_type_ids, position_ids=position_ids,
                                  head_mask=head_mask)

        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        outputs = (logits,) + outputs[2:]  # Extend the outputs with logits

        if labels is not None:
            if self.num_labels == 1:
                # Regression with Mean Squared Error loss
                loss_fct = nn.MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                # Classification with Cross-Entropy loss
                loss_fct = nn.CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs

        return outputs  # (loss), logits, (hidden_states), (attentions)
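The head picks its loss from `num_labels`: mean squared error for a single output, cross-entropy otherwise. For the two-class setup used in this notebook, the cross-entropy of one example is the negative log of the softmax probability assigned to the correct class. A plain-Python sketch of that calculation, with invented logits:

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax probability of `label` for one example."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average over the batch, matching the default 'mean' reduction of nn.CrossEntropyLoss."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


batch_logits = [[2.0, -1.0], [0.0, 0.0], [-0.5, 1.5]]  # made-up logits for 3 sentences
labels = [0, 1, 1]

print(round(mean_cross_entropy(batch_logits, labels), 4))
```

An example with equal logits for both classes, like the second one, contributes a loss of ln 2, which is the value an uninformed binary classifier would score.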

In [None]:
# Configuration and instantiation of the model
config = BertConfig.from_pretrained('bert-base-uncased')
config.num_labels = 2  # Adjust based on the task (e.g., binary classification)

# Initialize Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

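# Instantiate models for unit and random modes.
# Note: constructing from `config` (rather than `from_pretrained`) leaves all weights randomly initialized.
unit_length_model = CustomBertForSequenceClassification(config, mode='unit')
random_length_model = CustomBertForSequenceClassification(config, mode='random')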
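Because the two models above are built from `config` alone, their encoder weights start out random. One possible way to start them from the pre-trained checkpoint instead is to copy matching tensors with `load_state_dict(..., strict=False)`. The sketch below shows the mechanism on two tiny stand-in modules (`TinyBackbone` and `TinyClassifier` are hypothetical, not part of the notebook), so it runs without downloading anything. Whether every parameter name of `BertModel` lines up with `CustomBertForSequenceClassification` is an assumption that should be checked through the returned `missing_keys` and `unexpected_keys`:

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Stand-in for a pre-trained encoder (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(10, 4)
        self.encoder = nn.Linear(4, 4)


class TinyClassifier(TinyBackbone):
    """Same backbone plus a new classification head."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(4, 2)


torch.manual_seed(0)
pretrained = TinyBackbone()   # plays the role of the loaded checkpoint
custom = TinyClassifier()     # plays the role of the custom model

# strict=False copies every tensor whose name matches and reports the rest.
result = custom.load_state_dict(pretrained.state_dict(), strict=False)
print("missing:", result.missing_keys)        # parameters left at their random init
print("unexpected:", result.unexpected_keys)  # checkpoint tensors with no destination
```

Applied to the notebook, the analogous call would be `unit_length_model.load_state_dict(model.state_dict(), strict=False)`, where `model` is the `BertModel` loaded in Step 2. This was not done for the runs reported below.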




**Step 4: Benchmarking**

We evaluate the models on SST-2, the binary sentiment-classification task from the GLUE benchmark.


**Prepare the Data**

Load and preprocess the SST-2 dataset.

In [None]:
from transformers import BertTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
dataset = load_dataset('glue', 'sst2')

def tokenize_function(examples):
    return tokenizer(examples['sentence'], padding='max_length', truncation=True, max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['sentence'])
tokenized_datasets = tokenized_datasets.rename_column('label', 'labels')
tokenized_datasets.set_format('torch')

train_dataset = tokenized_datasets['train']
eval_dataset = tokenized_datasets['validation']

Dataset splits: 67,349 training examples, 872 validation examples, and 1,821 test examples.
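With `padding='max_length'` and `max_length=128`, every tokenized sentence comes out exactly 128 IDs long, paired with an attention mask that marks real tokens with 1 and padding with 0. A simplified plain-Python sketch of that behaviour follows. It assumes the pad ID is 0 and the `[CLS]`/`[SEP]` IDs are 101/102, as in `bert-base-uncased`; the other IDs are arbitrary placeholders, and unlike the real tokenizer the truncation here is a plain slice that does not preserve `[SEP]`:

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly `max_length` long."""
    ids = list(token_ids[:max_length])
    n_padding = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_padding
    return ids + [pad_id] * n_padding, attention_mask


sentence_ids = [101, 2023, 3185, 2003, 2307, 102]  # [CLS] ... [SEP], placeholder IDs in between
input_ids, attention_mask = pad_or_truncate(sentence_ids, max_length=128)

print(len(input_ids), sum(attention_mask))
```

Padding every example to the full 128 positions keeps batches rectangular at the cost of extra computation on short sentences.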

**High-Level Configuration for Model Training using Transformers**

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',      # Directory for checkpoints and other outputs.
    evaluation_strategy='epoch', # Evaluate at the end of every epoch (newer transformers releases call this `eval_strategy`).
    save_strategy='epoch',       # Save a model checkpoint at the end of each epoch.
    learning_rate=2e-5,          # Common learning rate for fine-tuning BERT on smaller datasets.
    per_device_train_batch_size=8, # Batch size for training.
    per_device_eval_batch_size=8,  # Batch size for evaluation.
    num_train_epochs=1,          # One epoch only, to keep this development run short.
    weight_decay=0.01,           # Regularization.
    logging_dir='./logs',        # Directory for storing logs.
    logging_steps=10,            # Log every 10 steps.
    load_best_model_at_end=True, # Load the best model at the end of training based on evaluation metric.
    metric_for_best_model='accuracy', # Use accuracy as the metric to determine the best model.
)
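These settings determine how many optimizer steps the single epoch takes. A quick calculation, assuming one device, no gradient accumulation, and that the last partial batch is kept (the `Trainer` defaults, as far as the arguments above show), with the split sizes reported when the dataset was loaded:

```python
import math

train_examples = 67349   # SST-2 training split
eval_examples = 872      # SST-2 validation split
train_batch_size = 8     # per_device_train_batch_size
eval_batch_size = 8      # per_device_eval_batch_size
logging_steps = 10

steps_per_epoch = math.ceil(train_examples / train_batch_size)
eval_batches = math.ceil(eval_examples / eval_batch_size)
log_events = steps_per_epoch // logging_steps

print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"evaluation batches:        {eval_batches}")
print(f"training-loss log entries: {log_events}")
```

The 109 evaluation batches are consistent with the evaluation throughput reported for the baseline run further down (about 35.5 steps per second over roughly 3.07 seconds).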



In [None]:
from transformers import Trainer, BertForSequenceClassification

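def compute_accuracy(eval_pred):
    # Fraction of examples whose highest-scoring class matches the label.
    predictions = eval_pred.predictions.argmax(-1)
    return {"accuracy": (predictions == eval_pred.label_ids).mean()}

def train_and_evaluate_model(model, train_dataset, eval_dataset, training_args):
    """Fine-tune `model` with the Trainer API and return its evaluation metrics."""
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_accuracy,
    )
    trainer.train()
    return trainer.evaluate()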
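The metric passed to the `Trainer` takes the arg-max over the logits and reports the fraction of predictions that match the labels. The same computation on a toy batch in plain Python, with invented logits:

```python
def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4], [1.2, 0.7]]  # made-up model outputs
labels = [1, 0, 0, 0]

print(accuracy(logits, labels))  # 3 of 4 predictions match, so 0.75
```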



**Model Evaluation**

In [None]:
original_model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
original_results = train_and_evaluate_model(original_model, train_dataset, eval_dataset, training_args)

print(f"Original Model Accuracy: {original_results}")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.1155        | 0.308699        | 0.927752 |


Original Model Accuracy: {'eval_loss': 0.3086991608142853, 'eval_accuracy': 0.9277522935779816, 'eval_runtime': 3.0695, 'eval_samples_per_second': 284.089, 'eval_steps_per_second': 35.511, 'epoch': 1.0}


In [None]:
unit_length_results = train_and_evaluate_model(unit_length_model, train_dataset, eval_dataset, training_args)
print(f"Unit Length Model Accuracy: {unit_length_results}")

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.3324        | 0.559618        | 0.800459 |


Unit Length Model Accuracy: {'eval_loss': 0.5596182942390442, 'eval_accuracy': 0.8004587155963303, 'eval_runtime': 2.9587, 'eval_samples_per_second': 294.72, 'eval_steps_per_second': 36.84, 'epoch': 1.0}


In [None]:
random_length_results = train_and_evaluate_model(random_length_model, train_dataset, eval_dataset, training_args)
print(f"Random Length Model Accuracy: {random_length_results}")

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.5301        | 0.522298        | 0.745413 |


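Random Length Model Accuracy: {'eval_loss': 0.5354837775230408, 'eval_accuracy': 0.7396788990825688, 'eval_runtime': 3.2716, 'eval_samples_per_second': 266.537, 'eval_steps_per_second': 33.317, 'epoch': 1.0}

For this model, the per-epoch training log and the final `evaluate()` call disagree slightly (0.745413 vs. 0.739679 accuracy). The likely cause is that `mode='random'` draws fresh random magnitudes on every forward pass, including at evaluation time, so repeated evaluations of the same weights give different results.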
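To put the three runs side by side, the reported `eval_accuracy` values can be converted back into counts of correctly classified validation sentences (the SST-2 validation split has 872 of them). A short sketch using only the numbers printed above:

```python
EVAL_EXAMPLES = 872  # size of the SST-2 validation split

eval_accuracy = {
    "original": 0.9277522935779816,
    "unit length": 0.8004587155963303,
    "random length": 0.7396788990825688,
}

baseline = eval_accuracy["original"]
summary = {}
for name, acc in eval_accuracy.items():
    correct = round(acc * EVAL_EXAMPLES)   # recover the integer count
    summary[name] = (correct, baseline - acc)
    print(f"{name:>13}: {correct}/{EVAL_EXAMPLES} correct, "
          f"{(baseline - acc) * 100:.1f} points below the baseline")
```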
