In [1]:
!pip install -r requirements.txt

Collecting accelerate==0.33.0 (from -r requirements.txt (line 1))
  Downloading accelerate-0.33.0-py3-none-any.whl.metadata (18 kB)
Collecting bitsandbytes==0.43.3 (from -r requirements.txt (line 2))
  Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting transformers==4.43.3 (from -r requirements.txt (line 3))
  Downloading transformers-4.43.3-py3-none-any.whl.metadata (43 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.7/43.7 kB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.33.0->-r requirements.txt (line 1))
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate==0.33.0->-r requirements.txt (line 1))
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu

In [2]:
import json
import torch
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)

**Hugging Face Account Configuration**

In [3]:
with open("config.json") as config_file:   # close the file handle once the config is read
    config_data = json.load(config_file)
HF_TOKEN = config_data["HF_TOKEN"]
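A minimal sketch of a more forgiving way to read the token, in case `config.json` is missing on the machine running the notebook. The helper name `load_hf_token` and the fallback to an `HF_TOKEN` environment variable are assumptions for illustration, not part of the original notebook.

```python
import json
import os
from pathlib import Path


def load_hf_token(config_path="config.json", env_var="HF_TOKEN"):
    """Return the token from the JSON config if present, else from the environment.

    Hypothetical helper: the file layout ({"HF_TOKEN": "..."}) follows the cell above;
    the environment-variable fallback is an assumption.
    """
    path = Path(config_path)
    if path.is_file():
        with path.open() as config_file:
            token = json.load(config_file).get("HF_TOKEN")
        if token:
            return token
    # Fall back to the environment; returns None when nothing is configured.
    return os.environ.get(env_var)
```

With this helper, `HF_TOKEN = load_hf_token()` would behave like the cell above when `config.json` exists, and return `None` instead of raising when no token is available.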

In [4]:
model_name = "meta-llama/Meta-Llama-3-8B"

**Quantisation Configuration**
* Quantisation shrinks the model by lowering the numerical precision of its weights.
* Instead of 32 bits per weight, we can use 16 bits or even 4 bits.
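To make the saving concrete, here is a rough back-of-the-envelope estimate of the weight storage at each precision. It assumes about 8 billion parameters (an approximation for an "8B" model) and ignores activations, the KV cache, and the small overhead of quantisation metadata; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights need about 32 GB at 32 bits, 16 GB at 16 bits, and 4 GB at 4 bits, which is why 4-bit loading can fit on a single consumer or Colab GPU.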

In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

**Loading the Tokenizer and the LLM**

In [6]:
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token       # reuse the end-of-sequence (EOS) token for padding

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the weights on the available devices (GPU first, then CPU)
    quantization_config=bnb_config,
    token=HF_TOKEN
)

config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/177 [00:00<?, ?B/s]

In [9]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128
)

In [10]:
def get_response(prompt):
    """Generate a completion; the returned text begins with the prompt itself."""
    sequences = text_generator(prompt)
    gen_text = sequences[0]["generated_text"]
    return gen_text

In [16]:
prompt = "What is computer vision?"

In [17]:
llama_response = get_response(prompt)

In [18]:
llama_response

'What is computer vision? It is the ability of a computer to see and understand the world as humans do. Computer vision is used in many fields such as self-driving cars, facial recognition, and image classification. In this blog post, we will explore what computer vision is and how it works. We will also discuss some of the applications of computer vision. So, if you are curious about this fascinating technology, keep reading!\nWhat is computer vision?\nComputer vision is the ability of a computer to see and understand the world as humans do. This technology has many applications, including self-driving cars, facial recognition, and image classification. In this blog post, we will'

In [19]:
print(llama_response)

What is computer vision? It is the ability of a computer to see and understand the world as humans do. Computer vision is used in many fields such as self-driving cars, facial recognition, and image classification. In this blog post, we will explore what computer vision is and how it works. We will also discuss some of the applications of computer vision. So, if you are curious about this fascinating technology, keep reading!
What is computer vision?
Computer vision is the ability of a computer to see and understand the world as humans do. This technology has many applications, including self-driving cars, facial recognition, and image classification. In this blog post, we will


In [20]:
print(llama_response[len(prompt):])

 It is the ability of a computer to see and understand the world as humans do. Computer vision is used in many fields such as self-driving cars, facial recognition, and image classification. In this blog post, we will explore what computer vision is and how it works. We will also discuss some of the applications of computer vision. So, if you are curious about this fascinating technology, keep reading!
What is computer vision?
Computer vision is the ability of a computer to see and understand the world as humans do. This technology has many applications, including self-driving cars, facial recognition, and image classification. In this blog post, we will
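The slice `llama_response[len(prompt):]` works here because the generated text starts with the prompt, but it leaves the leading space and would silently cut real content if the output ever did not begin with the prompt. A slightly more defensive sketch follows; `strip_prompt` is a hypothetical helper, not part of the original notebook. As far as I know, the text-generation pipeline also accepts `return_full_text=False` to omit the prompt at generation time.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        # Drop the prompt and the whitespace that separates it from the completion.
        return generated_text[len(prompt):].lstrip()
    # The prompt was not echoed, so return the text unchanged.
    return generated_text


example = "What is computer vision? It is the ability of a computer to see."
print(strip_prompt(example, "What is computer vision?"))
```

In the notebook this would be called as `strip_prompt(llama_response, prompt)`.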
