This notebook is a template for measuring the efficiency of your model on a single datapoint in terms of

1.   Number of model parameters that are actually trained.
2.   Floating point operations (FLOPs) during training.
3.   Inference time of one sample.
4.   Floating point operations (FLOPs) during testing.


------------------------------------------------------------------------------

*Please measure FLOPs on a **single instance for one epoch during training**. The same holds during testing.*

We use the PyTorch profiler (https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) to measure FLOPs. The **last 3 code cells** show how to use the profiler. The rest of the notebook sets up the data, model, tokenizer, etc.

*Note that FLOPs and FLOP/s are different quantities: FLOPs is a count of floating point operations, whereas FLOP/s is a rate (operations per second). We are interested in the first quantity.*
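
To make the distinction concrete, here is a minimal sketch that converts a FLOPs count and an elapsed time into a FLOP/s rate. The helper name `flops_per_second` and the numbers are hypothetical, chosen only for illustration; they are not measurements from this notebook.

```python
def flops_per_second(total_flops: float, elapsed_seconds: float) -> float:
    """Convert a count of operations (FLOPs) into a rate (FLOP/s)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_flops / elapsed_seconds


# Hypothetical values: 4 GFLOPs of work completed in half a second.
total_flops = 4e9
elapsed_seconds = 0.5
rate = flops_per_second(total_flops, elapsed_seconds)
print(f"{total_flops / 1e9:.1f} GFLOPs in {elapsed_seconds} s = {rate / 1e9:.1f} GFLOP/s")
```

The count is a property of the computation itself, while the rate also depends on the hardware that runs it.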

In [None]:
!pip install datasets



#Imports

In [None]:
import numpy as np
import os
import transformers
import itertools
import pandas as pd
import math
from transformers import GPTNeoXForCausalLM, AutoTokenizer
from transformers import (
    set_seed,
)
from transformers import DataCollatorForLanguageModeling,DataCollatorWithPadding
from transformers import AutoModelForCausalLM
from sklearn.metrics import accuracy_score
import wandb
import pickle
import string
from datasets import Dataset, DatasetDict, load_dataset
import torch
import torch.nn.functional as F
import logging
import time

#Initialization

In [None]:
SEED = 42
PRE_TRAINING_CHECKPOINT = 'step143000'

MODEL_SIZE = '70m'
MODEL_NAME = f"EleutherAI/pythia-{MODEL_SIZE}"
set_seed(SEED)
device = torch.device("cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME,revision=PRE_TRAINING_CHECKPOINT)
logger = logging.getLogger(__name__)
tokenizer.pad_token = tokenizer.eos_token

#Generate data

In [None]:
def generate_non_random_strings(seed):
    """Build a DatasetDict with one fixed training dialogue and one fixed test dialogue.

    The dialogues are hard-coded, so `seed` is currently unused.
    """
    sequences = ["SYS: Hello, I am the customer support bot. What can I do for you? USR: Hello robot. I ordered a pot several days ago but I can't track it. SYS: Could you verify your full name? USR: Patrick Schug SYS: Verify your order number please. USR: It's 843-58572-7002. SYS: You can track your package with your tracking number, which is AGZIM5T6KL. Are you happy about my answer? USR: All good. See you. SYS: Have a nice day! Bye."]
    tseq = ["SYS: Hello, I am the customer support bot. What can I do for you? USR: Hi. I ordered a mobile phone several days ago but I can't track it. SYS: May I have your full name? USR: James Salim. SYS: Verify your phone number please. USR: 980.322.8737 is my number. SYS: Track your order using your tracking number, 0UOKFRS1GA. Anything else? USR: No more questions. See you. SYS: Bye."]
    dataset = Dataset.from_dict(
        {
            "text": sequences,
        }
    )
    test_dataset = Dataset.from_dict(
        {
            "text": tseq
        }
    )
    datasets = DatasetDict(
        {
            "train": dataset,
            "test": test_dataset
        }
    )
    datasets.set_format("torch")
    return datasets

#Tokenize the data

In [None]:
def tokenize_string(tokenizer,dataset):
    def encode(example: dict):
        sequences = example["text"]
        return tokenizer(sequences,truncation=True)

    return dataset.map(
        encode,
        batched=True,
    )

#Creating trainloader

In [None]:
dataset = generate_non_random_strings(seed=42)
encoded_dataset = tokenize_string(tokenizer, dataset)
training_dataset = encoded_dataset.remove_columns(["text"])
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
train_loader = torch.utils.data.DataLoader(training_dataset["train"], shuffle=True, batch_size=1, collate_fn=data_collator)
test_loader = torch.utils.data.DataLoader(training_dataset["test"], shuffle=True, batch_size=1, collate_fn=data_collator)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, revision=PRE_TRAINING_CHECKPOINT)

Map:   0%|          | 0/1 [00:00<?, ? examples/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


#Initialise model and optimiser for training

In [None]:
model.to(device)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.002)
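
The first metric asks for the parameters that are actually trained, which can differ from the total when some layers are frozen. The sketch below illustrates the difference on a toy network; `count_parameters` and the toy model are hypothetical and stand in for the Pythia model used in this notebook.

```python
import torch.nn as nn


def count_parameters(model: nn.Module):
    """Return (total, trainable) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    # Only parameters with requires_grad=True receive gradients and get updated.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


# Toy model (an assumption for illustration): two linear layers.
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer, as one might when fine-tuning only part of a model.
for p in toy[0].parameters():
    p.requires_grad = False

total, trainable = count_parameters(toy)
print(f"total={total}, trainable={trainable}")
```

When every parameter is trainable, as in full fine-tuning, the two counts coincide.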

#Import profiler for tracking the FLOPs

In [None]:
from torch.profiler import profile, record_function, ProfilerActivity

#Training loop with the profiler as a context manager.

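The profiler is enabled using a context manager. It records the execution time of the operators that run inside it. With `with_flops=True`, it also estimates the FLOPs of supported operators such as matrix multiplication and 2D convolution, so the reported total is an estimate rather than an exact count. Memory consumption is recorded only if `profile_memory=True` is passed.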
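
As a minimal sketch of the same pattern, the example below profiles a single matrix multiplication on the CPU and sums the estimated FLOPs. The tensor shapes are arbitrary assumptions, and the example runs on the CPU only so that it works without a GPU.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Arbitrary small matrices, chosen only for illustration.
a = torch.randn(8, 16)
b = torch.randn(16, 4)

# Everything executed inside the `with` block is recorded by the profiler.
with profile(activities=[ProfilerActivity.CPU], with_flops=True) as prof:
    c = torch.mm(a, b)

# Each aggregated event carries an estimated FLOPs count (0 for unsupported operators).
total_flops = sum(evt.flops for evt in prof.key_averages())
print(f"Estimated FLOPs: {total_flops}")
```

Operators without a FLOPs formula contribute zero to the sum, so the total is best read as a lower bound on the work done.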

In [None]:
# Count only the parameters that receive gradients, i.e. the ones actually trained.
print("Number of model parameters that are used for training")
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
for epoch in range(1):
  # with_flops=True makes the profiler estimate FLOPs for supported operators.
  with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],with_flops=True) as prof:
    for idx, batch in enumerate(train_loader):
      batch = batch.to(device)

      inputs = {'input_ids': batch['input_ids'],'attention_mask': batch['attention_mask'],'labels': batch['labels']}
      outputs = model(**inputs) # output = loss, logits, past_key_values
      loss = outputs.loss
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()
  # Show the 10 operators with the highest estimated FLOPs, then the total.
  print(prof.key_averages().table(sort_by="flops",row_limit=10))
  print("GFLOPs during training") #GigaFLOPs
  print(sum(k.flops for k in prof.key_averages())/1e9)


Number of model parameters that are used for training
70426624
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  Total MFLOPs  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                               aten::mm         1.90%       1.155ms         2.83%       1.723ms      33.781us      12.128ms        25.13%      12.128ms     237.798us            51     29444.014  
                                            aten::addmm         1.75%       1.064ms     

In [None]:
model.eval()
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],with_flops=True) as prof:
    start = time.time()
    with torch.no_grad(): # gradients are not needed at inference time
        for idx, batch in enumerate(test_loader):
            batch = batch.to(device)
            inputs = {'input_ids': batch['input_ids'],'attention_mask': batch['attention_mask'],'labels': batch['labels']}
            outputs = model(**inputs) # output = loss, logits, past_key_values
    if device.type == "cuda":
        torch.cuda.synchronize() # wait for queued CUDA kernels before stopping the timer
    elapsed = time.time() - start # still includes per-operator profiler overhead
print("Inference time :"+str(elapsed))
#print(prof.key_averages().table(sort_by="flops",row_limit=10))
print("GFLOPs during testing") #GigaFLOPs
print(sum(k.flops for k in prof.key_averages())/1e9)

Inference time :0.0369417667388916
GFLOPs during testing
10.444681728
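
A single timed run can be noisy, and CUDA kernels are launched asynchronously, so a wall-clock timer may stop before the GPU has finished unless the device is synchronized. The sketch below shows one possible way to time inference more robustly; the helper `time_inference` and its parameters (`run_once`, `sync`, `warmup`, `repeats`) are hypothetical names and not part of this notebook or of PyTorch.

```python
import time


def time_inference(run_once, sync=None, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of `run_once`.

    `sync` is an optional callable that blocks until pending device work is done.
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up runs absorb one-off costs such as lazy initialisation.
    for _ in range(warmup):
        run_once()
    _sync()

    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    _sync()  # make sure all queued work has finished before reading the clock
    return (time.perf_counter() - start) / repeats


# Stand-in workload so the sketch runs anywhere; replace with a real forward pass.
mean_seconds = time_inference(lambda: sum(i * i for i in range(10_000)))
print(f"Mean time per call: {mean_seconds:.6f} s")
```

On a GPU, one could pass `sync=torch.cuda.synchronize` and a `run_once` that performs the forward pass under `torch.no_grad()`, ideally outside the profiler so that its overhead is not included in the timing.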
