# Week 5 - Activity 1: Fine-tuning Methods with Torchtune

In [None]:
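# W&B API key, used to log and visualize the training metrics.
# `!export` would only set the variable inside a throwaway subshell;
# `%env` sets it for the kernel and for every command launched from it.

%env WANDB_API_KEY=XXXXXX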
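
Each `!` line in a notebook runs in its own short-lived subshell, so a variable exported there does not reach the kernel or later cells. The sketch below illustrates this with a throwaway variable (`DEMO_TUNE_VAR` is a hypothetical name chosen for the demonstration, and a POSIX shell is assumed), and shows that a value set through `os.environ` is inherited by child processes.

```python
import os
import subprocess

# A variable exported inside a child shell disappears when that shell exits.
subprocess.run("export DEMO_TUNE_VAR=from_subshell", shell=True, check=True)
leaked = "DEMO_TUNE_VAR" in os.environ  # the parent process never sees it

# Setting it on os.environ changes the current process, and children inherit it.
os.environ["DEMO_TUNE_VAR"] = "from_python"
child = subprocess.run(
    "echo $DEMO_TUNE_VAR",
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
```

The same reasoning applies to `WANDB_API_KEY`: it has to live in the kernel's environment for a later `!tune run ...` cell to pick it up.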

In [28]:
#Install the necessary dependencies
!pip install torchtune
!pip install torch torchao
!pip install wandb

Collecting wandb
  Downloading wandb-0.19.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting click!=8.0.0,>=7.1 (from wandb)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting docker-pycreds>=0.4.0 (from wandb)
  Using cached docker_pycreds-0.4.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting gitpython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.44-py3-none-any.whl.metadata (13 kB)
Collecting sentry-sdk>=2.0.0 (from wandb)
  Downloading sentry_sdk-2.20.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting setproctitle (from wandb)
  Downloading setproctitle-1.3.4-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.29,>=1.0.0->wandb)
  Downloading gitdb-4.0.12-py3-none-any.whl.metadata (1.2 kB)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb)
  Downloading smmap-5.0.2-py3-no

## Introduction to Torchtune
Torchtune is a PyTorch library for LLM fine-tuning that prioritizes simplicity, correctness, and accessibility. It's designed to work seamlessly with PyTorch while making LLM experimentation accessible to everyone.

### Recipes
Recipes are the primary entry points for torchtune users. These can be thought of as hackable, singularly-focused scripts for interacting with LLMs including fine-tuning, inference, evaluation, and quantization.

In [60]:
#The full list of recipes and their configs can be printed with:
!tune ls

RECIPE                                   CONFIG                                  
full_finetune_single_device              llama2/7B_full_low_memory               
                                         code_llama2/7B_full_low_memory          
                                         llama3/8B_full_single_device            
                                         llama3_1/8B_full_single_device          
                                         llama3_2/1B_full_single_device          
                                         llama3_2/3B_full_single_device          
                                         mistral/7B_full_low_memory              
                                         phi3/mini_full_low_memory               
                                         qwen2/7B_full_single_device             
                                         qwen2/0.5B_full_single_device           
                                         qwen2/1.5B_full_single_device           
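
The listing is a two-column table in which the recipe name appears only on the first row of its group. As a rough sketch (the helper name `parse_tune_ls` is hypothetical, and the layout is assumed to match the rows shown above), the text can be grouped into a recipe-to-configs mapping like this:

```python
def parse_tune_ls(text):
    """Group config names under the recipe named in the left-hand column."""
    recipes = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "RECIPE":
            continue  # skip blank lines and the header row
        if line[0].isspace():
            configs = parts  # continuation row: only the config column is filled
        else:
            current, configs = parts[0], parts[1:]
        if current is not None:
            recipes.setdefault(current, []).extend(configs)
    return recipes


# A few rows copied from the listing above.
sample = """RECIPE                                   CONFIG
full_finetune_single_device              llama2/7B_full_low_memory
                                         code_llama2/7B_full_low_memory
                                         qwen2/1.5B_full_single_device
"""
listing = parse_tune_ls(sample)
```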
                

## 1. Download the model 

For this demo we will use the Qwen2.5-1.5B-Instruct model.

In [20]:
#Download the model

!tune download Qwen/Qwen2.5-1.5B-Instruct \
--output-dir ./Qwen2_5-1_5B-Instruct

Ignoring files matching the following patterns: None
.gitattributes: 100%|██████████████████████| 1.52k/1.52k [00:00<00:00, 21.9MB/s]
LICENSE: 100%|██████████████████████████████| 11.3k/11.3k [00:00<00:00, 123MB/s]
README.md: 100%|███████████████████████████| 4.92k/4.92k [00:00<00:00, 74.2MB/s]
config.json: 100%|█████████████████████████████| 660/660 [00:00<00:00, 9.82MB/s]
generation_config.json: 100%|██████████████████| 242/242 [00:00<00:00, 2.18MB/s]
merges.txt: 100%|██████████████████████████| 1.67M/1.67M [00:00<00:00, 24.4MB/s]
model.safetensors: 100%|███████████████████▉| 3.09G/3.09G [00:27<00:00, 112MB/s]
tokenizer.json: 100%|██████████████████████| 7.03M/7.03M [00:00<00:00, 52.7MB/s]
tokenizer_config.json: 100%|███████████████| 7.30k/7.30k [00:00<00:00, 66.8MB/s]
vocab.json: 100%|██████████████████████████| 2.78M/2.78M [00:00<00:00, 23.9MB/s]
Successfully downloaded model repo and wrote to the following locations:
/home/lmaben/cw/cmu-llms-notebooks/activities/Qwen2_5-1_5B-Instr
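
The later cells read `model.safetensors`, `vocab.json`, and `merges.txt` from this directory, so it can be worth confirming they exist before starting a run. The helper below is a small sketch (`missing_files` is a hypothetical name, not part of torchtune); the demo uses a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path

# Files that the fine-tuning, generation, and evaluation cells point at.
REQUIRED_FILES = ("model.safetensors", "vocab.json", "merges.txt")


def missing_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Demo on a throwaway directory that contains only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vocab.json").write_text("{}")
    demo_missing = missing_files(tmp)
```

For the real download, `missing_files("./Qwen2_5-1_5B-Instruct")` should come back empty if everything listed in the progress output was written.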

## 2. Finetune (using LoRA)

We will fine-tune the model with LoRA (low-rank adaptation) in this demo.
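
LoRA keeps a pretrained weight matrix of shape `d_out x d_in` frozen and trains only a low-rank update made of two small matrices of shapes `d_out x r` and `r x d_in`. The sketch below counts the trainable parameters for a single matrix; the dimensions and the rank are illustrative assumptions, not values read from the Qwen config or the torchtune recipe.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen weight parameters, trainable LoRA parameters) for one matrix."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)  # B is d_out x rank, A is rank x d_in
    return full, lora


# Illustrative sizes only (assumed for this example).
full, lora = lora_param_counts(d_in=1536, d_out=1536, rank=8)
fraction = lora / full  # share of this matrix's parameter count that is trained
```

With these assumed sizes the adapter adds about 1% of the matrix's parameters, which is why LoRA fits on a single device far more easily than full fine-tuning.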

First, we copy the default config with `tune cp` so that it is available for reference.

There are 2 ways to customize recipe configs:
1. Using the `tune cp` command to copy a config from the torchtune library, modifying the copy, and then passing it to the recipe.
2. Specifying the changed config values in a key=value format on the command line when running the recipe. (We will use this method for clarity)
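
To make the second method concrete, here is a simplified sketch of how dotted `key=value` overrides can be folded into a nested config. It is an illustration only, not torchtune's actual parser; `parse_value`, `apply_overrides`, and the default values in `base` are hypothetical.

```python
import copy


def parse_value(text):
    """Convert an override value to bool, int, or float when it looks like one."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the way down
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults, standing in for the values in a copied YAML file.
base = {"batch_size": 2, "dataset": {"train_on_input": True}}
merged = apply_overrides(
    base,
    ["batch_size=4", "dataset.train_on_input=False", "dataset.split=train[:10%]"],
)
```

Each dot in a key descends one level in the YAML structure, so `dataset.split` targets the `split` field under `dataset`, and the original defaults are left untouched.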

In [21]:
#Copying default configs for reference
!tune cp qwen2/1.5B_lora ./qwen2_1_5B_lora_single_device

Copied file to qwen2_1_5B_lora_single_device.yaml


In [None]:
#Directories for log outputs
!mkdir -p qwen2_1_5B_lora_single_device_outputs/wandb_logs

In [38]:
# Run the recipe.
# We start from the default config and override values on the command line.
# Alternatively, edit the copied YAML file and pass its path to --config.
# We train on 10% of the Alpaca training data with a batch size of 4 and log to W&B.

!tune run lora_finetune_single_device \
--config qwen2/1.5B_lora \
output_dir=./qwen2_1_5B_lora_single_device_outputs \
checkpointer.checkpoint_dir=./Qwen2_5-1_5B-Instruct \
tokenizer.path=./Qwen2_5-1_5B-Instruct/vocab.json \
tokenizer.merges_file=./Qwen2_5-1_5B-Instruct/merges.txt \
dataset.train_on_input=False \
dataset.split="train[:10%]" \
lr_scheduler.num_warmup_steps=5 \
batch_size=4 \
metric_logger._component_=torchtune.training.metric_logging.WandBLogger \
metric_logger.project=tune_demo \
metric_logger.group=qwen_2_5_lora_batch_4 \
metric_logger.job_type=lora_single_device \
metric_logger.log_dir=./qwen2_1_5B_lora_single_device_outputs/wandb_logs \
log_every_n_steps=1 \
log_peak_memory_stats=True 


Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 4
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./Qwen2_5-1_5B-Instruct
  checkpoint_files:
  - model.safetensors
  model_type: QWEN2
  output_dir: ./qwen2_1_5B_lora_single_device_outputs
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: false
  split: train[:10%]
  train_on_input: false
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
epochs: 1
gradient_accumulation_steps: 8
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 5
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  group: qwen_2_5_lora_batch_4
  job_type: lora_sin
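
The resolved config shows `batch_size: 4` together with `gradient_accumulation_steps: 8`, so gradients from 8 batches are accumulated before each optimizer update. The sketch below works out the resulting effective batch size; the step estimate is only an approximation, and the example count of 5000 is a hypothetical number, not the size of the Alpaca split.

```python
import math


def effective_batch_size(batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return batch_size * gradient_accumulation_steps * num_devices


def approx_optimizer_steps(num_examples, batch_size, gradient_accumulation_steps):
    """Rough count of optimizer updates in one epoch (ignores partial-batch handling)."""
    per_step = effective_batch_size(batch_size, gradient_accumulation_steps)
    return math.ceil(num_examples / per_step)


examples_per_update = effective_batch_size(batch_size=4, gradient_accumulation_steps=8)
# 5000 is a made-up dataset size used only to show the arithmetic.
steps = approx_optimizer_steps(5000, batch_size=4, gradient_accumulation_steps=8)
```

This matters when reading the W&B charts: with `log_every_n_steps=1`, each logged point corresponds to one optimizer update over 32 examples rather than one batch of 4.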

## 3. Run inference

We will run inference using the `generate` recipe.

First, we copy the default generation config with `tune cp` so that it is available for reference.

In [39]:
!tune cp generation ./generation.yaml

Copied file to generation.yaml


In [53]:
#Running inference with the finetuned model

!tune run generate \
--config generation \
output_dir=./qwen2_1_5B_lora_single_device_outputs_generate \
model._component_=torchtune.models.qwen2.qwen2_1_5b \
tokenizer._component_=torchtune.models.qwen2.qwen2_tokenizer \
tokenizer.path=./Qwen2_5-1_5B-Instruct/vocab.json \
tokenizer.merges_file=./Qwen2_5-1_5B-Instruct/merges.txt \
checkpointer.checkpoint_dir=./qwen2_1_5B_lora_single_device_outputs/epoch_0 \
checkpointer.checkpoint_files='[ft-model-00001-of-00001.safetensors]' \
checkpointer.model_type=QWEN2 \
prompt.user="Tell me a recipe to cook a pizza."

Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./qwen2_1_5B_lora_single_device_outputs/epoch_0
  checkpoint_files:
  - ft-model-00001-of-00001.safetensors
  model_type: QWEN2
  output_dir: ./qwen2_1_5B_lora_single_device_outputs_generate
device: cuda
dtype: bf16
enable_kv_cache: true
max_new_tokens: 300
model:
  _component_: torchtune.models.qwen2.qwen2_1_5b
output_dir: ./qwen2_1_5B_lora_single_device_outputs_generate
prompt:
  system: null
  user: Tell me a recipe to cook a pizza.
quantizer: null
seed: 1234
temperature: 0.6
tokenizer:
  _component_: torchtune.models.qwen2.qwen2_tokenizer
  max_seq_len: null
  merges_file: ./Qwen2_5-1_5B-Instruct/merges.txt
  path: ./Qwen2_5-1_5B-Instruct/vocab.json
  prompt_template: null
top_k: 300

Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
Model is initialized with precision torch.bfloat16.
<|im_start|>user
Tell me a recipe to
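
The generated text starts with `<|im_start|>user` because the Qwen tokenizer wraps each turn in ChatML-style markers. The sketch below builds that layout by hand for illustration; `format_chatml` is a hypothetical helper, and the real work (including mapping the markers to special token IDs) is done by `qwen2_tokenizer`.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render messages as ChatML text: <|im_start|>role, content, <|im_end|>."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # An open assistant turn tells the model that it should answer next.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml(
    [{"role": "user", "content": "Tell me a recipe to cook a pizza."}]
)
```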

In [55]:
#Running inference with the original model

!tune run generate \
--config generation \
output_dir=./qwen2_1_5B_lora_single_device_outputs_generate \
model._component_=torchtune.models.qwen2.qwen2_1_5b \
tokenizer._component_=torchtune.models.qwen2.qwen2_tokenizer \
tokenizer.path=./Qwen2_5-1_5B-Instruct/vocab.json \
tokenizer.merges_file=./Qwen2_5-1_5B-Instruct/merges.txt \
checkpointer.checkpoint_dir=./Qwen2_5-1_5B-Instruct \
checkpointer.checkpoint_files='[model.safetensors]' \
checkpointer.model_type=QWEN2 \
prompt.user="Tell me a recipe to cook a pizza."

Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./Qwen2_5-1_5B-Instruct
  checkpoint_files:
  - model.safetensors
  model_type: QWEN2
  output_dir: ./qwen2_1_5B_lora_single_device_outputs_generate
device: cuda
dtype: bf16
enable_kv_cache: true
max_new_tokens: 300
model:
  _component_: torchtune.models.qwen2.qwen2_1_5b
output_dir: ./qwen2_1_5B_lora_single_device_outputs_generate
prompt:
  system: null
  user: Tell me a recipe to cook a pizza.
quantizer: null
seed: 1234
temperature: 0.6
tokenizer:
  _component_: torchtune.models.qwen2.qwen2_tokenizer
  max_seq_len: null
  merges_file: ./Qwen2_5-1_5B-Instruct/merges.txt
  path: ./Qwen2_5-1_5B-Instruct/vocab.json
  prompt_template: null
top_k: 300

Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
Model is initialized with precision torch.bfloat16.
<|im_start|>user
Tell me a recipe to cook a pizza.<|im_end|>
<|im_start|>assis
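
Both runs use the same `seed`, `temperature: 0.6`, and `top_k: 300`, so differences in the answers come from the weights rather than from the sampling settings. As a rough illustration of what the two sampling parameters do (a plain-Python sketch, not torchtune's generation code; the logits and the small `top_k` are made up for the example):

```python
import math
import random


def sample_top_k(logits, temperature, top_k, rng):
    """Sample one token index from the top_k highest logits after temperature scaling."""
    k = min(top_k, len(logits))
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(1234)
logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-token vocabulary
picks = [sample_top_k(logits, temperature=0.6, top_k=2, rng=rng) for _ in range(200)]
```

With `top_k=2` only the two highest-scoring tokens can ever be chosen, and the lower temperature makes the best one win most of the time.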

## 4. Evaluate the model

Torchtune integrates with EleutherAI's LM Evaluation Harness (`lm_eval`) for evaluation.

In [56]:
!tune cp qwen2/evaluation ./evaluation.yaml

Copied file to evaluation.yaml


In [58]:
#Quote the requirement so the shell does not treat ">" as an output redirection
!pip install "lm_eval>=0.4.5"

In [59]:
#Evaluating the finetuned model on the babi task (QA based on stories)

!tune run eleuther_eval --config qwen2/evaluation \
model._component_=torchtune.models.qwen2.qwen2_1_5b \
tokenizer.path=./Qwen2_5-1_5B-Instruct/vocab.json \
tokenizer.merges_file=./Qwen2_5-1_5B-Instruct/merges.txt \
checkpointer.checkpoint_dir=./qwen2_1_5B_lora_single_device_outputs/epoch_0 \
checkpointer.checkpoint_files='[ft-model-00001-of-00001.safetensors]' \
tasks=["babi"]

Running EleutherEvalRecipe with resolved config:

batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./qwen2_1_5B_lora_single_device_outputs/epoch_0
  checkpoint_files:
  - ft-model-00001-of-00001.safetensors
  model_type: QWEN2
  output_dir: ./
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.qwen2.qwen2_1_5b
output_dir: ./
quantizer: null
seed: 1234
tasks:
- babi
tokenizer:
  _component_: torchtune.models.qwen2.qwen2_tokenizer
  max_seq_len: null
  merges_file: ./Qwen2_5-1_5B-Instruct/merges.txt
  path: ./Qwen2_5-1_5B-Instruct/vocab.json


In [None]:
#Evaluating the original model on the babi task

!tune run eleuther_eval --config qwen2/evaluation \
model._component_=torchtune.models.qwen2.qwen2_1_5b \
tokenizer.path=./Qwen2_5-1_5B-Instruct/vocab.json \
tokenizer.merges_file=./Qwen2_5-1_5B-Instruct/merges.txt \
checkpointer.checkpoint_dir=./Qwen2_5-1_5B-Instruct \
checkpointer.checkpoint_files='[model.safetensors]' \
tasks=["babi"]
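
The two evaluation cells differ only in the checkpoint directory and checkpoint file. One way to keep the comparison honest is to build both commands from a single helper, as sketched below. `eval_command` is a hypothetical helper that only assembles the arguments already used above; it does not run anything.

```python
def eval_command(checkpoint_dir, checkpoint_file, tasks=("babi",)):
    """Build the `tune run eleuther_eval` argument list for one checkpoint."""
    tokenizer_dir = "./Qwen2_5-1_5B-Instruct"
    return [
        "tune", "run", "eleuther_eval", "--config", "qwen2/evaluation",
        "model._component_=torchtune.models.qwen2.qwen2_1_5b",
        f"tokenizer.path={tokenizer_dir}/vocab.json",
        f"tokenizer.merges_file={tokenizer_dir}/merges.txt",
        f"checkpointer.checkpoint_dir={checkpoint_dir}",
        f"checkpointer.checkpoint_files=[{checkpoint_file}]",
        f"tasks=[{','.join(tasks)}]",
    ]


finetuned_cmd = eval_command(
    "./qwen2_1_5B_lora_single_device_outputs/epoch_0",
    "ft-model-00001-of-00001.safetensors",
)
base_cmd = eval_command("./Qwen2_5-1_5B-Instruct", "model.safetensors")

# Arguments that differ between the two runs.
differences = [(a, b) for a, b in zip(finetuned_cmd, base_cmd) if a != b]
```

Passing such a list to `subprocess.run(finetuned_cmd, check=True)` avoids shell quoting issues with the bracketed values.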