## Open notebook in:
| Colab | Gradient |
|:------|:---------|
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nicolepcx/Transformers-in-Action/blob/main/CH09/ch09_falcon_without_quantization.ipynb) | [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/Nicolepcx/Transformers-in-Action/blob/main/CH09/ch09_falcon_without_quantization.ipynb) |

In [1]:
# Clone the repo if it is not already cloned, so that everything runs smoothly
# on Colab or Paperspace
import os

if not os.path.isdir('Transformers-in-Action'):
    !git clone https://github.com/Nicolepcx/Transformers-in-Action.git
else:
    print('Repository already exists. Skipping clone.')


current_path = %pwd
if '/Transformers-in-Action' in current_path:
    new_path = current_path + '/utils'
else:
    new_path = current_path + '/Transformers-in-Action/utils'
%cd $new_path


Cloning into 'Transformers-in-Action'...
remote: Enumerating objects: 324, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 324 (delta 13), reused 22 (delta 7), pack-reused 289
Receiving objects: 100% (324/324), 3.15 MiB | 4.34 MiB/s, done.
Resolving deltas: 100% (162/162), done.
/content/Transformers-in-Action/utils


# About this notebook


In this notebook you will load `tiiuae/falcon-7b` from the Hugging Face Hub and measure how much GPU memory the model needs when it is loaded without quantization.
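
Before running the measurement, a rough estimate can be made from the parameter count alone. The sketch below assumes about 7 billion parameters (read off the model name, not an exact count) and the usual per-element widths of common dtypes. `estimate_weight_memory_gib` is a hypothetical helper, not part of this repository's `utils`, and it ignores activations, the CUDA context, and allocator overhead.

```python
# Back-of-envelope estimate of the memory needed for the weights alone.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Assumption: nominal parameter count implied by the "7b" in the model name.
ASSUMED_NUM_PARAMS = 7_000_000_000


def estimate_weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Return the size of the weights in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    gib = estimate_weight_memory_gib(ASSUMED_NUM_PARAMS, dtype)
    print(f"{dtype}: ~{gib:.1f} GiB")
```

Under these assumptions the full-precision estimate is on the order of 26 GiB, which gives a reference point for the measured values later in the notebook.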


# Install requirements

In [2]:
from requirements import *

In [3]:
install_required_packages_ch09()

Installing chapter 9 requirements...
✅ accelerate==0.26.1 installation completed successfully!

✅ safetensors==0.4.1 installation completed successfully!

✅ transformers == 4.38.2 installation completed successfully!

✅ datasets==2.10.1 installation completed successfully!

✅ torch>=1.10.0 installation completed successfully!

✅ ray==2.9.3 installation completed successfully!

✅ wandb installation completed successfully!



# Imports

In [4]:
from transformers import AutoTokenizer, AutoModel
import torch

In [5]:
model_id = "tiiuae/falcon-7b"

# Ensure CUDA is available
if torch.cuda.is_available():
    # Selects the default GPU
    device = torch.device("cuda")
    torch.cuda.reset_peak_memory_stats(device=device)  # Resets memory stats

    # Capture initial GPU memory usage (in bytes)
    initial_memory = torch.cuda.memory_allocated(device)

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).to(device)

    # Capture GPU memory usage after loading the model (in bytes)
    final_memory = torch.cuda.memory_allocated(device)
    peak_memory = torch.cuda.max_memory_allocated(device)  # Peak since the reset above

    # Calculate the difference while both values are still in bytes
    memory_difference = final_memory - initial_memory

    # Convert bytes to GB (1 GB = 1024**3 bytes here) only when printing
    bytes_per_gb = 1024**3
    print(f"Initial GPU Memory Usage: {initial_memory / bytes_per_gb} GB")
    print(f"Final GPU Memory Usage: {final_memory / bytes_per_gb} GB")
    print(f"Memory Difference (Model Load Impact): {memory_difference / bytes_per_gb} GB")
    print(f"Peak GPU Memory Usage: {peak_memory / bytes_per_gb} GB")
else:
    print("CUDA is not available. Please check your PyTorch and GPU setup.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Initial GPU Memory Usage: 0.0 GB
Final GPU Memory Usage: 25.876148223876953 GB
Memory Difference (Model Load Impact): 25.876148223876953 GB
Peak GPU Memory Usage: 25.876148223876953 GB
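
One way to read these figures is to divide the measured footprint by the parameter count. The sketch below assumes a nominal 7 billion parameters (taken from the model name, not an exact count); `bytes_per_param` and `closest_width` are hypothetical helpers introduced only for this illustration.

```python
# Value copied from the "Final GPU Memory Usage" line printed above.
MEASURED_GIB = 25.876148223876953

# Assumption: nominal parameter count implied by the "7b" in the model name.
ASSUMED_NUM_PARAMS = 7_000_000_000


def bytes_per_param(measured_gib: float, num_params: int) -> float:
    """Average number of bytes of GPU memory used per parameter."""
    return measured_gib * 1024**3 / num_params


def closest_width(value: float, widths=(1, 2, 4, 8)) -> int:
    """Return the common dtype width (in bytes) nearest to the observed value."""
    return min(widths, key=lambda w: abs(w - value))


observed = bytes_per_param(MEASURED_GIB, ASSUMED_NUM_PARAMS)
print(f"~{observed:.2f} bytes per parameter")
print(f"closest dtype width: {closest_width(observed)} bytes")
```

Under that assumption the result lands close to 4 bytes per parameter, which is consistent with the weights being held as 32-bit floats.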
