## Running on multiple GPUs using Hugging Face Transformers

Naive pipeline parallelism is supported out of the box. To use it, load the model with `device_map="auto"`, which automatically places the different layers on the available GPUs.
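To make the idea of layer placement concrete, here is a toy sketch in plain Python. It is not the algorithm the library uses: the real placement also accounts for the embeddings, the output head, and per-GPU memory limits. The function names and the figure of 32 decoder layers are assumptions made for illustration.

```python
def split_layers_evenly(num_layers, num_devices):
    """Assign consecutive layers to devices in near-equal contiguous chunks."""
    base, extra = divmod(num_layers, num_devices)
    placement = {}
    layer = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer.
        count = base + (1 if device < extra else 0)
        for _ in range(count):
            placement[layer] = device
            layer += 1
    return placement


def forward_order(placement):
    """Return the devices visited by one forward pass, in order."""
    order = []
    for layer in sorted(placement):
        if not order or order[-1] != placement[layer]:
            order.append(placement[layer])
    return order


# Hypothetical example: 32 decoder layers split across 2 GPUs.
placement = split_layers_evenly(32, 2)
print(placement[0], placement[15], placement[16], placement[31])
print(forward_order(placement))
```

Because the layers run one after another, a forward pass visits the GPUs in sequence, so only one GPU is busy at any moment. This is why the scheme is called naive: it splits the memory footprint but does not speed up a single request.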

Your task:

1. Create a pod with two 24GB GPUs.

2. Try to run the model with `device="auto"` and see what happens. Then load the model with `device_map="auto"`, which automatically places the different layers on the available GPUs, and check how much VRAM is used on each GPU. This form of naive pipeline parallelism splits the model by layers, so each GPU holds only part of the weights.

In [3]:
model_path = "/ssdshare/share/Meta-Llama-3-8B-Instruct/"
# TODO(Your Task): Load the model to multiple GPUs and check the GPU memory usage

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
import gc

In [4]:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation='flash_attention_2',
    device_map='cuda:0'
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)

Device set to use cuda:0


In [5]:
prompt = "Question: What is the capital of France?\n\nAnswer:"

result = pipe(prompt, max_new_tokens=300, pad_token_id=tokenizer.eos_token_id)[0]["generated_text"][len(prompt):]
result

' Paris. France is a country located in Western Europe, and its capital is Paris. The country is known for its rich history, art, fashion, and cuisine, and is home to many famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The capital city of Paris is a major tourist destination and is known for its romantic atmosphere, charming streets, and vibrant cultural scene. The city is also a major hub for international business and diplomacy, and is home to many international organizations such as the United Nations Educational, Scientific and Cultural Organization (UNESCO). The capital of France is Paris. France is a country located in Western Europe, and its capital is Paris. The country is known for its rich history, art, fashion, and cuisine, and is home to many famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The capital city of Paris is a major tourist destination and is known for its romantic atmosphere, charmin

In [6]:
def bytes_to_giga_bytes(num_bytes):
    # Convert a byte count to gigabytes (1 GB is taken as 1024**3 bytes here).
    return num_bytes / (1024**3)

In [7]:
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())

15.009931087493896

In [10]:
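def cleanup():
    # Collect unreachable Python objects first so their CUDA tensors are released,
    # then return the cached blocks to the driver.
    gc.collect()
    torch.cuda.empty_cache()
    # Reset the peak-memory counters on every visible GPU, not only the current one.
    for i in range(torch.cuda.device_count()):
        torch.cuda.reset_peak_memory_stats(i)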

In [11]:
model = None
tokenizer = None
pipe = None
cleanup()

In [15]:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation='flash_attention_2',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)

Device set to use cuda:0
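
To check where the modules actually landed, a model loaded with a device map exposes the mapping as `model.hf_device_map`, a dictionary from module name to device. The sketch below counts modules per device. The helper name `modules_per_device` and the shortened example map are hypothetical; in the notebook one would pass `model.hf_device_map` instead.

```python
from collections import Counter


def modules_per_device(device_map):
    """Count how many top-level modules were placed on each device."""
    return dict(Counter(device_map.values()))


# Hypothetical, shortened device map for illustration only.
# In the notebook, use: modules_per_device(model.hf_device_map)
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}
print(modules_per_device(example_map))
```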


In [16]:
result = pipe(prompt, max_new_tokens=300, pad_token_id=tokenizer.eos_token_id)[0]["generated_text"][len(prompt):]
result

' Paris.  #France #Paris #CapitalCity #EuropeanCountry #Travel #Geography #Country #Capital #City #European #FranceCapital #ParisCapital #TravelGuide #TravelTips #TravelInformation #TravelGuide #TravelTips #TravelInformation #Travel #FranceTravel #ParisTravel #CapitalCity #EuropeanCountry #Geography #Country #Capital #City #European #FranceCapital #ParisCapital #TravelGuide #TravelTips #TravelInformation #Travel #FranceTravel #ParisTravel #CapitalCity #EuropeanCountry #Geography #Country #Capital #City #European #FranceCapital #ParisCapital #TravelGuide #TravelTips #TravelInformation #Travel #FranceTravel #ParisTravel #CapitalCity #EuropeanCountry #Geography #Country #Capital #City #European #FranceCapital #ParisCapital #TravelGuide #TravelTips #TravelInformation #Travel #FranceTravel #ParisTravel #CapitalCity #EuropeanCountry #Geography #Country #Capital #City #European #FranceCapital #ParisCapital #TravelGuide #TravelTips #TravelInformation #Travel #FranceTravel #ParisTravel #Capital

In [20]:
print(bytes_to_giga_bytes(torch.cuda.max_memory_allocated(0)))
print(bytes_to_giga_bytes(torch.cuda.max_memory_allocated(1)))

6.69680643081665
8.321207523345947
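
The two calls above can be generalized to any number of GPUs. The following is a minimal sketch in plain Python: the helper name `peak_memory_report` is hypothetical, and it takes the per-device byte counts as input so that it can be tried without a GPU.

```python
def peak_memory_report(peak_bytes):
    """Convert per-device peak byte counts to gigabytes (1024**3 bytes) and sum them."""
    per_device = [b / 1024**3 for b in peak_bytes]
    return {"per_device_gb": per_device, "total_gb": sum(per_device)}


# In the notebook, the input could be collected with:
#   peaks = [torch.cuda.max_memory_allocated(i)
#            for i in range(torch.cuda.device_count())]
# Synthetic byte counts are used here for illustration.
report = peak_memory_report([2 * 1024**3, 3 * 1024**3])
print(report)
```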


With the model loaded onto a single GPU, the peak GPU memory usage is 15.01GB.

Loading the model with `device="auto"` is not possible: **this argument is invalid**. With `device_map="auto"`, the peak GPU memory usage is 6.70GB on GPU 0 and 8.32GB on GPU 1.

The number of GPUs used is 1 for `device_map='cuda:0'` and 2 for `device_map='auto'`.

Do the numbers above make sense?

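Yes. The total GPU memory usage with `device_map='auto'` is 6.70GB + 8.32GB = 15.02GB, which is almost the same as the 15.01GB used when loading the model onto a single GPU. Both figures are also consistent with the size of the weights: roughly 8 billion parameters stored in bfloat16 at 2 bytes each come to about 15GB in the 1024**3-byte units used here.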
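
The comparison can be checked with a few lines of arithmetic. The measured values are copied from the outputs above. The parameter count of about 8.03 billion is an assumption about this model, not something measured in this notebook.

```python
# Assumed approximate parameter count for the 8B model (not measured here).
NUM_PARAMS = 8.03e9
BYTES_PER_PARAM = 2  # bfloat16 stores each parameter in 2 bytes

# Expected size of the weights alone, in units of 1024**3 bytes.
weights_gb = NUM_PARAMS * BYTES_PER_PARAM / 1024**3

# Peak values reported by torch.cuda.max_memory_allocated in the cells above.
single_gpu_peak = 15.009931087493896
two_gpu_peak = 6.69680643081665 + 8.321207523345947

print(f"expected weights: {weights_gb:.2f}")
print(f"single GPU peak:  {single_gpu_peak:.2f}")
print(f"two GPU total:    {two_gpu_peak:.2f}")
```

The small gap between the expected weight size and the measured peaks is plausible because the peaks were read after generation, so they also include activations and the key-value cache.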