This notebook estimates the memory consumption of transformer models for fine-tuning and inference.

The result is only an approximation of the total memory consumed by the model with a basic inference or fine-tuning framework, without any particular optimization.

To get the estimation, run all the cells.

First, if you want to estimate the memory consumption of recent models, make sure you are using the latest version of Hugging Face transformers.

In the following interactive cell, enter the name of the model. It can be the name of the repository on the Hugging Face Hub or a local path.
This cell retrieves the architecture of the model.


In [None]:
from transformers import AutoConfig

model_name = "CohereForAI/c4ai-command-r-plus" # @param {type:"string"}

model_config = AutoConfig.from_pretrained(model_name)

hidden_layers = model_config.num_hidden_layers
hidden_size =  model_config.hidden_size
attention_heads = model_config.num_attention_heads

print("Model: "+str(model_name))
print("Hidden layers (L): "+str(hidden_layers))
print("Hidden size (h): "+str(hidden_size))
print("Attention heads (a): "+str(attention_heads))



Model: CohereForAI/c4ai-command-r-plus
Hidden layers (L): 64
Hidden size (h): 12288
Attention heads (a): 96
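
The cell above reads `num_hidden_layers`, `hidden_size`, and `num_attention_heads` directly. Some configurations expose these quantities under other names, so a fallback lookup can be handy. The sketch below is a hypothetical helper (`read_architecture` and `CANDIDATES` are not part of the notebook), and the list of alternative attribute names is an assumption that is not exhaustive. It works on any object with attributes, so a `SimpleNamespace` stands in for a real config here.

```python
from types import SimpleNamespace

# Candidate attribute names for each quantity, tried in order.
# This list is an assumption for illustration and is not exhaustive.
CANDIDATES = {
    "hidden_layers": ("num_hidden_layers", "n_layer", "num_layers"),
    "hidden_size": ("hidden_size", "n_embd", "d_model"),
    "attention_heads": ("num_attention_heads", "n_head", "num_heads"),
}

def read_architecture(config):
    """Return a dict with hidden_layers, hidden_size and attention_heads."""
    arch = {}
    for key, names in CANDIDATES.items():
        for name in names:
            value = getattr(config, name, None)
            if value is not None:
                arch[key] = value
                break
        else:
            # None of the candidate names exists on this config
            raise AttributeError("No attribute found for " + key)
    return arch

# Stand-in config object using alternative attribute names
example_config = SimpleNamespace(n_layer=12, n_embd=768, n_head=12)
print(read_architecture(example_config))
```

With a real model, `read_architecture(model_config)` could replace the three direct attribute reads.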


In the following interactive cell, enter:
- nb_billion_parameters: The number of parameters in the model, in billions. For instance, for Llama 3 8B, enter 8.03 since the model has 8.03 billion parameters.
- bitwidth_model: The number of bits per parameter. For instance, enter 16 if you load the model with float16 or bfloat16.
- bitwidth_optimizer: The number of bits per optimizer parameter. This notebook assumes the use of the AdamW optimizer. If you use the standard implementation, set it to 32. If you use AdamW-8bit, set it to 8.
- seqlen: The maximum sequence length in your batches.
- batch_size: The number of instances in one batch.
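
To make the role of the bit width concrete, here is a minimal sketch of how it translates into a size for the parameters alone. The helper names (`BITWIDTHS`, `bytes_per_parameter`, `parameters_size_gib`) are hypothetical and only serve this illustration. The conversion divides by 1024**3, as the estimation cell of this notebook does. Quantized formats usually carry some extra overhead that this sketch ignores.

```python
# Common storage formats and their bit widths (illustrative, not exhaustive)
BITWIDTHS = {"float32": 32, "float16": 16, "bfloat16": 16, "int8": 8, "4-bit": 4}

def bytes_per_parameter(bitwidth):
    """Convert a bit width into bytes per parameter."""
    return bitwidth / 8

def parameters_size_gib(nb_billion, bitwidth):
    """Size of the parameters alone, in units of 1024**3 bytes."""
    return nb_billion * 1e9 * bytes_per_parameter(bitwidth) / 1024**3

# Llama 3 8B (8.03 billion parameters) under each format
for name, bits in BITWIDTHS.items():
    print(f"{name}: {parameters_size_gib(8.03, bits):.2f}")
```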

In [None]:
# Number of parameters in the model (in billions)
nb_billion_parameters = 104 # @param {type:"number"}
print("Number of parameters in the model (n): "+str(nb_billion_parameters)+"B")

# Precision of the parameters in the model
bitwidth_model = 16 # @param {type:"integer"}
print("Bitwidth of the model's parameters (p): "+str(bitwidth_model)+"-bit")

# Precision of the parameters in the optimizer
bitwidth_optimizer = 32 # @param {type:"integer"}
print("Bitwidth of the optimizer's parameters (o): "+str(bitwidth_optimizer)+"-bit")

# The maximum number of tokens in a sequence
seqlen = 512 # @param {type:"integer"}
print("Sequence length (s): "+str(seqlen))

# The batch size
batch_size = 8 # @param {type:"integer"}
print("Batch size (b): "+str(batch_size))


Number of parameters in the model (n): 104B
Bitwidth of the model's parameters (p): 16-bit
Bitwidth of the optimizer's parameters (o): 32-bit
Sequence length (s): 512
Batch size (b): 8


Run the following cell to get the estimation given the information provided in the previous cells.
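
For reference, the quantities computed by this cell can be written with the symbols printed by the earlier cells (n, p, o, s, b, L, h, a). This is a transcription of the notebook's code rather than an independent derivation. The subscripted names are introduced here only for readability.

```latex
\begin{aligned}
M_{\text{model}} &= \frac{n \cdot 10^{9} \cdot p/8}{1024^{3}} \\
M_{\text{optimizer}} &= \frac{2 \cdot n \cdot 10^{9} \cdot o/8}{1024^{3}} \\
A_{\text{layer}} &= \frac{2\,(34\,sbh + 5\,a s^{2} b)}{1024^{3}} \\
M_{\text{fine-tuning}} &= M_{\text{model}} + M_{\text{optimizer}} + L \cdot A_{\text{layer}} \\
M_{\text{inference}} &= M_{\text{model}} + A_{\text{layer}}
\end{aligned}
```

The factor 2 in the optimizer term reflects the two states that AdamW keeps for every parameter.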

In [None]:
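def estimate_consumption():
  # Activation memory for one layer: (34sbh + 5as²b), multiplied by 2,
  # then divided by 1024**3 (so the value is in GiB, printed as "GB" below)
  return round((34*seqlen*batch_size*hidden_size + 5*attention_heads*seqlen*seqlen*batch_size)*2/(1024**3),2)

def estimate_optimizer_size():
  # AdamW keeps two states per parameter, hence the factor 2
  return round((2*nb_billion_parameters*bitwidth_optimizer/8*(1000**3))/(1024**3),2)

def estimate_model_size():
  # Number of parameters times bytes per parameter (bitwidth / 8)
  return round(nb_billion_parameters*bitwidth_model/8*(1000**3)/(1024**3),2)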

activation_consumption = estimate_consumption()
model_consumption = estimate_model_size()
optimizer_consumption = estimate_optimizer_size()

print("Memory consumption of the model: "+str(model_consumption)+" GB\n")

print("Memory consumption of the optimizer: "+str(optimizer_consumption)+" GB")
print("Memory consumption of activations for fine-tuning: "+str(activation_consumption*hidden_layers)+" GB")
print("Total memory consumption for fine-tuning: "+str(model_consumption+optimizer_consumption+activation_consumption*hidden_layers)+" GB\n")

print("Memory consumption of activations for inference: "+str(activation_consumption)+" GB")
print("Total memory consumption for inference: "+str(model_consumption+activation_consumption)+" GB")


Memory consumption of the model: 193.72 GB

Memory consumption of the optimizer: 774.86 GB
Memory consumption of activations for fine-tuning: 323.84 GB
Total memory consumption for fine-tuning: 1292.42 GB

Memory consumption of activations for inference: 5.06 GB
Total memory consumption for inference: 198.78 GB
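
To try other settings without re-running the interactive cells, the same formulas can be wrapped in a single function with explicit arguments. The sketch below is one way to do it. The function name `estimate_memory` and its argument names are hypothetical. It rounds only when printing, so the fine-tuning activation figure comes out as 324.00 rather than 323.84, because the cell above rounds the per-layer value (5.0625) to 5.06 before multiplying by the number of layers.

```python
def estimate_memory(n_billion, p_bits, o_bits, s, b, h, a, layers):
    """Return the notebook's estimates, unrounded, in units of 1024**3 bytes."""
    gib = 1024**3
    # Parameters: count times bytes per parameter
    model = n_billion * 1e9 * p_bits / 8 / gib
    # AdamW: two states per parameter
    optimizer = 2 * n_billion * 1e9 * o_bits / 8 / gib
    # Per-layer activations, same expression as the notebook's code
    act_layer = (34 * s * b * h + 5 * a * s * s * b) * 2 / gib
    return {
        "model": model,
        "optimizer": optimizer,
        "activations_finetuning": act_layer * layers,
        "activations_inference": act_layer,
        "total_finetuning": model + optimizer + act_layer * layers,
        "total_inference": model + act_layer,
    }

# Same inputs as the cells above: 104B parameters, 16-bit model, 32-bit AdamW,
# sequence length 512, batch size 8, h=12288, a=96, 64 layers
estimates = estimate_memory(104, 16, 32, 512, 8, 12288, 96, 64)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```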
