## Fine-tuning TinyLlama on the UltraChat dataset

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset

#### Let's load and explore the UltraChat dataset

In [13]:
# Load a small subset of the UltraChat dataset:
# shuffle the test_sft split with a fixed seed, then keep the first 3,000 examples
dataset = (
    load_dataset("HuggingFaceH4/ultrachat_200k", split="test_sft")
      .shuffle(seed=42)
      .select(range(3_000))
)

In [14]:
dataset

Dataset({
    features: ['prompt', 'prompt_id', 'messages'],
    num_rows: 3000
})

In [26]:
dataset['prompt'][2]

'Does the University of Pennsylvania offer any programs for non-traditional students?'

In [27]:
dataset['messages'][2]

[{'content': 'Does the University of Pennsylvania offer any programs for non-traditional students?',
  'role': 'user'},
 {'content': 'Yes, the University of Pennsylvania offers several programs for non-traditional students, including:\n\n1. Penn LPS Online: This program offers online courses and degree programs for non-traditional students who wish to earn a degree from Penn. It offers several undergraduate and graduate degree programs.\n\n2. College of Liberal and Professional Studies: This program offers a variety of degree programs, including full-time, part-time, online, and on-campus options. It is designed for working professionals and those who wish to complete their degree later in life.\n\n3. Executive Education: This program offers short-term, intensive courses for working professionals who wish to enhance their skills and knowledge in their field.\n\n4. Summer Sessions: This program offers summer courses for undergraduate and graduate students, including non-traditional stud

#### Convert the instruction dataset into chat-template format

In [18]:
# Instruction datasets are typically formatted with a chat template.
# We reuse the template from the chat version of TinyLlama.

template_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

In [3]:
# A simple function that formats each example with the chat template used by TinyLlama
def format_prompt(example):
    """Format the conversation using the <|user|> template that TinyLlama uses."""

    # Render the list of messages into a single training string
    chat = example["messages"]
    prompt = template_tokenizer.apply_chat_template(chat, tokenize=False)

    return {"text": prompt}
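To make the effect of the chat template concrete, here is a hand-written approximation of what it produces. It is only a sketch inferred from the rendered text shown in this notebook (`<|user|>`, the `</s>` end marker, `<|assistant|>`); `render_chat` and `EOS_TOKEN` are hypothetical names, and the real `apply_chat_template` call may differ in details.

```python
# Hand-written approximation of the TinyLlama chat template (an assumption
# inferred from the rendered output in this notebook, not the library's code).
EOS_TOKEN = "</s>"  # end-of-sequence marker seen in the rendered text


def render_chat(messages, eos_token=EOS_TOKEN):
    """Render a list of {'role', 'content'} dicts into one training string."""
    parts = []
    for message in messages:
        # Each turn becomes: <|role|>, newline, content, EOS marker, newline
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    return "".join(parts)


example_chat = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]
rendered = render_chat(example_chat)
print(rendered)
```

The role markers and the end-of-sequence token let the model learn where each turn starts and stops, which is why every example is rendered into a single `text` field before training.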

In [21]:
dataset = dataset.map(format_prompt)

In [22]:
dataset

Dataset({
    features: ['prompt', 'prompt_id', 'messages', 'text'],
    num_rows: 3000
})

In [29]:
print(dataset['text'][2])

<|user|>
Does the University of Pennsylvania offer any programs for non-traditional students?</s>
<|assistant|>
Yes, the University of Pennsylvania offers several programs for non-traditional students, including:

1. Penn LPS Online: This program offers online courses and degree programs for non-traditional students who wish to earn a degree from Penn. It offers several undergraduate and graduate degree programs.

2. College of Liberal and Professional Studies: This program offers a variety of degree programs, including full-time, part-time, online, and on-campus options. It is designed for working professionals and those who wish to complete their degree later in life.

3. Executive Education: This program offers short-term, intensive courses for working professionals who wish to enhance their skills and knowledge in their field.

4. Summer Sessions: This program offers summer courses for undergraduate and graduate students, including non-traditional students who wish to complete thei

#### Load the TinyLlama base model so that we can fine-tune it

In [4]:
# 4-bit quantization configuration - Q in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",  # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,  # Apply nested quantization
)
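A rough back-of-the-envelope estimate shows why 4-bit loading matters. The sketch below assumes about 1.1 billion parameters (from the "1.1B" in the model name), a quantization block size of 64, and a second-level block size of 256 for nested quantization, as described in the QLoRA paper. It treats every weight as quantized and ignores activations, gradients, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 1.1e9        # assumption: taken from the "1.1B" in the model name
BLOCK_SIZE = 64           # assumption: weights per quantization block
SECOND_LEVEL_BLOCK = 256  # assumption: constants per second-level block

# One 32-bit scaling constant per block of 64 weights.
single_quant_overhead = 32 / BLOCK_SIZE

# Nested (double) quantization: 8-bit constants plus one 32-bit
# second-level constant per 256 first-level constants.
double_quant_overhead = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_LEVEL_BLOCK)

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4 + single_quant_overhead)
nf4_double_gib = weight_memory_gib(NUM_PARAMS, 4 + double_quant_overhead)

print(f"float16 weights:        {fp16_gib:.2f} GiB")
print(f"4-bit weights:          {nf4_gib:.2f} GiB")
print(f"4-bit + nested quant:   {nf4_double_gib:.2f} GiB")
```

Under these assumptions the quantized weights take roughly a quarter of the float16 footprint, and nested quantization trims the per-parameter overhead a little further.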

In [5]:
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

# Load the model to train on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",

    # Omit quantization_config for regular (non-quantized) SFT
    quantization_config=bnb_config,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend


RuntimeError: CUDA is required but not available for bitsandbytes. Please consider installing the multi-platform enabled version of bitsandbytes, which is currently a work in progress. Please check currently supported platforms and installation instructions at https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend

In [None]:
# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = "<PAD>"
tokenizer.padding_side = "left"
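The `padding_side` setting decides where pad tokens go when sequences in a batch have different lengths. The pure-Python sketch below illustrates the difference with made-up token ids; `pad_batch` is a hypothetical helper for illustration, not part of the `transformers` API.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id lists to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "left":
            # Pads go first, so the real tokens end at the same position
            padded.append([pad_id] * n_pad + list(seq))
            masks.append([0] * n_pad + [1] * len(seq))
        elif side == "right":
            # Pads go last, so the real tokens start at the same position
            padded.append(list(seq) + [pad_id] * n_pad)
            masks.append([1] * len(seq) + [0] * n_pad)
        else:
            raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    return padded, masks


batch = [[5, 6, 7], [8, 9]]  # made-up token ids
left_ids, left_mask = pad_batch(batch, pad_id=0, side="left")
right_ids, right_mask = pad_batch(batch, pad_id=0, side="right")
print(left_ids, left_mask)
print(right_ids, right_mask)
```

The attention mask marks pad positions with 0 so that they can be ignored by the model regardless of which side they are on.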

In [6]:
import torch
print("CUDA Available:", torch.cuda.is_available())
print("CUDA Device Count:", torch.cuda.device_count())
print("CUDA Version:", torch.version.cuda)
print("PyTorch Version:", torch.__version__)


CUDA Available: False
CUDA Device Count: 0
CUDA Version: None
PyTorch Version: 2.3.0
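The output above explains the `RuntimeError` raised while loading the model: 4-bit loading through bitsandbytes needs CUDA, and this environment has none. One possible guard, sketched below, is to pass `quantization_config` only when CUDA is available. `build_model_kwargs` is a hypothetical helper, and falling back to unquantized loading is an assumption about what is acceptable here, since fine-tuning on CPU will be slow.

```python
def build_model_kwargs(cuda_available, quantization_config=None):
    """Build keyword arguments for from_pretrained, skipping 4-bit loading without CUDA."""
    kwargs = {"device_map": "auto"}  # same setting as the loading cell above
    if cuda_available and quantization_config is not None:
        # bitsandbytes 4-bit quantization requires a CUDA device
        kwargs["quantization_config"] = quantization_config
    return kwargs


# Stand-in object so this sketch runs without transformers installed
placeholder_config = {"load_in_4bit": True}

print(build_model_kwargs(False, placeholder_config))
print(build_model_kwargs(True, placeholder_config))

# Intended use in the notebook (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     **build_model_kwargs(torch.cuda.is_available(), bnb_config),
# )
```

To actually run the quantized path, the notebook needs a CUDA-enabled PyTorch build and a GPU-enabled bitsandbytes install, as the error message indicates.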
