In [30]:
import warnings
warnings.filterwarnings('ignore')

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch
from huggingface_hub import login

# Read the token, close the file, and strip the trailing newline
with open('hf_token.txt', 'r') as f:
  hf_token = f.read().strip()
login(token=hf_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful
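
As an alternative to keeping the token in a file next to the notebook, here is a minimal sketch of a helper that prefers an environment variable and falls back to the file. `read_hf_token` is a hypothetical name introduced here; `HF_TOKEN` is the environment variable that `huggingface_hub` itself checks.

```python
import os
from pathlib import Path


def read_hf_token(path="hf_token.txt", env_var="HF_TOKEN"):
    """Return the token from the environment if set, else from the file at `path`."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"Set {env_var} or create {path}")
    # strip() removes the trailing newline most editors append
    return token_file.read_text().strip()
```

With this in place the login call would become `login(token=read_hf_token())`.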


In [2]:
## Install Relevant Packages
#!pip install -U bitsandbytes
#!pip install -U transformers accelerate torch

In [14]:
# Define the BitsAndBytesConfig for 8-bit quantization so the model fits in Colab's GPU memory
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "meta-llama/Llama-3.2-1B"

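# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Reuse EOS as the pad token when the tokenizer does not define one,
# so that padding=True works at tokenization time
if tokenizer.pad_token is None:
  tokenizer.pad_token = tokenizer.eos_token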

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=quantization_config,
                                             device_map="auto")

In [4]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 2048)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear8bitLt(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear8bitLt(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear8bitLt(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear8bitLt(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear8bitLt(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear8bitLt(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((20
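
The shapes printed above are enough to estimate how much memory the quantized layers need. The sketch below is back-of-the-envelope arithmetic, not a measurement. It counts only the embedding and the `Linear8bitLt` weights visible in the printout (the output is cut off before the remaining modules), and it assumes one byte per 8-bit weight and two bytes per weight for a 16-bit baseline.

```python
# Shapes copied from the printed architecture above
vocab_size, hidden = 128256, 2048
num_layers = 16

# (in_features, out_features) of each Linear8bitLt in one decoder layer
linear_shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj": (2048, 8192),
    "down_proj": (8192, 2048),
}

embed_params = vocab_size * hidden
linear_params_per_layer = sum(i * o for i, o in linear_shapes.values())
linear_params = num_layers * linear_params_per_layer

GIB = 1024 ** 3
print(f"embedding parameters: {embed_params:,}")
print(f"linear parameters:    {linear_params:,}")
# Assumption: 1 byte per weight in 8-bit, 2 bytes per weight in 16-bit
print(f"linear weights at 8 bits:  {linear_params * 1 / GIB:.2f} GiB")
print(f"linear weights at 16 bits: {linear_params * 2 / GIB:.2f} GiB")
```

On a loaded model, `model.get_memory_footprint()` reports the measured size in bytes, which can be compared against this estimate.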

In [20]:
def generate(prompt, max_length=100):
  """Generate a completion for `prompt` and print it.

  `max_length` caps the prompt tokens and the generated tokens combined.
  """
  # Tokenize the prompt; the tokenizer also returns the attention mask
  input_data = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)

  # Move the inputs to the device the model was loaded on
  input_ids = input_data['input_ids'].to(model.device)
  attention_mask = input_data['attention_mask'].to(model.device)

  # Generate the model's response without tracking gradients
  with torch.no_grad():
    output_ids = model.generate(input_ids,
                                attention_mask=attention_mask,
                                max_length=max_length)

  # Decode the output, dropping special tokens
  output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

  print("\nGenerated Text.....\n")
  print(output_text)

In [21]:
generate(
    "Explain what is the difference between supervised and unsupervised learning",
    max_length=200)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Generated Text.....

Explain what is the difference between supervised and unsupervised learning
Unsupervised learning is the process of discovering patterns in data without any labels or target values. In supervised learning, a training dataset is provided with the target values, and the algorithm learns to predict the target values based on the input features. Unsupervised learning is a subset of supervised learning, where the algorithm learns to discover patterns in the data without any labels or target values.
In supervised learning, a training dataset is provided with the target values, and the algorithm learns to predict the target values based on the input features. Unsupervised learning is a subset of supervised learning, where the algorithm learns to discover patterns in the data without any labels or target values. In supervised learning, a training dataset is provided with the target values, and the algorithm learns to predict the target values based on the input features. 
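
The output above loops over the same two sentences, which is common for small base models. Below is a minimal sketch of generation settings that often reduce this kind of repetition. The values are illustrative assumptions, not tuned results, and `generation_kwargs` is a name introduced here.

```python
# Keyword arguments accepted by `model.generate` in transformers.
# The values are illustrative starting points, not tuned settings.
generation_kwargs = {
    "max_new_tokens": 200,      # budget for generated tokens only, excluding the prompt
    "repetition_penalty": 1.2,  # values above 1.0 penalize tokens already produced
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
}

# Inside generate(), these would take the place of max_length:
# output_ids = model.generate(input_ids,
#                             attention_mask=attention_mask,
#                             **generation_kwargs)
```

Whether these settings improve the answer for a given prompt has to be checked by rerunning the cell; a strong penalty can also degrade fluency.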

In [28]:
generate(
    "What is cross entropy?")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Generated Text.....

What is cross entropy? How is it used in machine learning? In this article, we’ll explain what cross entropy is, how it’s calculated, and how it’s used in machine learning.
What is cross entropy?
Cross entropy is a measure of the difference between the probabilities of two events. It is used to compare the probability of a given event occurring with the probability of a different event occurring. It is a type of error rate and is used to measure the quality of a classifier.
How
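
The answer above stops at "How", most likely because `generate` was called with its default `max_length=100`, and `max_length` counts the prompt tokens as well as the generated ones. Here is a small sketch of that arithmetic. `new_token_budget` is a hypothetical helper, and the prompt length used is a made-up example value rather than the real token count.

```python
def new_token_budget(prompt_tokens, max_length):
    """Number of tokens generation can add when `max_length` is the total cap."""
    return max(max_length - prompt_tokens, 0)


# Hypothetical prompt length; the real value is len(tokenizer(prompt)["input_ids"])
prompt_tokens = 6
print(new_token_budget(prompt_tokens, max_length=100))
```

A longer prompt therefore leaves less room for the answer under the same `max_length`.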


In [38]:
# Using Pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100,
    truncation=True
)

response = pipe("What is binary cross entropy loss?")
# The pipeline returns a list with one dict per generated sequence
for result in response:
  print()
  print(result['generated_text'])


What is binary cross entropy loss? – Tutorialspoint
The cross entropy loss is a measure of the difference between the predicted and the actual label. The cross entropy loss is used in machine learning to calculate the loss function for a given model. In this article, we will learn what is binary cross entropy loss and how to calculate it. The cross entropy loss is a measure of the difference between the predicted and the actual label. The cross entropy loss is used in machine learning to calculate the loss
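
For checking the generated description against the standard definition: binary cross entropy for labels y in {0, 1} and predicted probabilities p is the mean of -(y*log(p) + (1-y)*log(1-p)) over the examples. Below is a minimal pure-Python sketch; the function name and the clipping constant `eps` are choices made here, not part of the notebook.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probs) or len(labels) == 0:
        raise ValueError("labels and probs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
```

The loss is small when the predicted probability is close to the true label and grows without bound as a confident prediction turns out wrong.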
