<a href="https://colab.research.google.com/github/Nobobi-Hasan/Fine_Tune_Llama/blob/main/FineTune_Llama_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tune with Llama Project**

In [1]:
!pip install torchtune



In [2]:
!tune ls

RECIPE                                   CONFIG                                  
full_finetune_single_device              llama2/7B_full_low_memory               
                                         code_llama2/7B_full_low_memory          
                                         llama3/8B_full_single_device            
                                         llama3_1/8B_full_single_device          
                                         llama3_2/1B_full_single_device          
                                         llama3_2/3B_full_single_device          
                                         mistral/7B_full_low_memory              
                                         phi3/mini_full_low_memory               
                                         phi4/14B_full_low_memory                
                                         qwen2/7B_full_single_device             
                                         qwen2/0.5B_full_single_device           
                

In [3]:
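# Run once to fetch the weights into the checkpoint_dir the config expects (/tmp/Llama-3.2-1B-Instruct/).
# The meta-llama repositories are gated, so this needs a Hugging Face account with access granted.
# !huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir /tmp/Llama-3.2-1B-Instruct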

In [4]:
# torchtune's CLI entry point is `tune`, not `torchtune`.
# Full fine-tune of Llama 3.2 1B, overriding the config's device and epochs from the command line.
!tune run full_finetune_single_device --config llama3_2/1B_full_single_device device=cpu epochs=1

INFO:torchtune.utils._logging:Running FullFinetuneRecipeSingleDevice with resolved config:

batch_size: 4
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Llama-3.2-1B-Instruct/
  checkpoint_files:
  - model.safetensors
  model_type: LLAMA3_2
  output_dir: /tmp/torchtune/llama3_2_1B/full_single_device
  recipe_checkpoint: null
clip_grad_norm: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  packed: false
device: cpu
dtype: bf16
enable_activation_checkpointing: false
enable_activation_offloading: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/torchtune/llama3_2_1B/full_single_device/logs
model:
  _component_: torchtune.models.llama3_2.llama3_2_1b
optimizer:
  _component_: bitsandby

In [5]:
from datasets import load_dataset, Dataset
ds = load_dataset(
  'bitext/Bitext-customer-support-llm-chatbot-training-dataset',
  split="train"
)


In [6]:
print(ds.shape)

(26872, 5)


In [7]:
print(ds.column_names)

['flags', 'instruction', 'category', 'intent', 'response']


In [8]:
import pprint
pprint.pprint(ds[0])

{'category': 'ORDER',
 'flags': 'B',
 'instruction': 'question about cancelling order {{Order Number}}',
 'intent': 'cancel_order',
 'response': "I've understood you have a question regarding canceling order "
             "{{Order Number}}, and I'm here to provide you with the "
             'information you need. Please go ahead and ask your question, and '
             "I'll do my best to assist you."}


In [9]:
ds[1]

{'flags': 'BQZ',
 'instruction': 'i have a question about cancelling oorder {{Order Number}}',
 'category': 'ORDER',
 'intent': 'cancel_order',
 'response': "I've been informed that you have a question about canceling order {{Order Number}}. I'm here to assist you! Please go ahead and let me know what specific question you have, and I'll provide you with all the information and guidance you need. Your satisfaction is my top priority."}
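
Both rows shown above contain the template placeholder `{{Order Number}}`, which will be carried verbatim into the training text. A minimal sketch for listing such placeholders in a string, assuming they always use the double-brace form seen here; `find_placeholders` is a hypothetical helper, not part of `datasets` or torchtune:

```python
import re

# Matches double-brace template placeholders such as {{Order Number}}.
# Assumption: every placeholder in the data uses this {{...}} form.
PLACEHOLDER_RE = re.compile(r"\{\{(.+?)\}\}")


def find_placeholders(text):
    """Return the placeholder names found in text, in order of appearance."""
    return PLACEHOLDER_RE.findall(text)


# Instruction copied from ds[1] above.
instruction = "i have a question about cancelling oorder {{Order Number}}"
print(find_placeholders(instruction))  # ['Order Number']
```

Running the helper over the `instruction` and `response` columns would show which placeholders the model is likely to learn to emit.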

In [10]:
# Keep only the first 1,000 rows to shorten the fine-tuning run.
ds = ds.select(range(1000))

In [11]:
print(ds.shape)

(1000, 5)
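
The subset above is the first 1,000 rows in file order. Rows 0 and 1 share the `cancel_order` intent, so the head of the dataset may over-represent a few intents; that is a guess from two rows, not something measured here. A minimal sketch of picking a reproducible random subset of row indices instead, using only the standard library; `sample_indices` and the seed value are hypothetical choices:

```python
import random


def sample_indices(num_rows, sample_size, seed=42):
    """Pick sample_size distinct row indices from range(num_rows), reproducibly."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return sorted(rng.sample(range(num_rows), sample_size))


# 26872 is the row count printed by ds.shape earlier in the notebook.
indices = sample_indices(26872, 1000)
print(len(indices), len(set(indices)))  # 1000 1000

# Assumption: `ds` is the full Hugging Face Dataset loaded in cell 5.
# ds = ds.select(indices)
```

With the `datasets` library, `ds.shuffle(seed=42).select(range(1000))` is a comparable one-liner.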


In [12]:
def merge_example(row):
  """Add a 'conversation' field that joins the query and its response into one training string."""
  row['conversation'] = f"Query: {row['instruction']}\nResponse: {row['response']}"
  return row
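
`Dataset.map` calls the function with each row as a dictionary, so the formatting can be checked on a plain dict before running it over the dataset. A small sketch under that assumption; the function is repeated so the block runs on its own, and the sample response is a shortened, hypothetical stand-in for a real row:

```python
def merge_example(row):
    """Add a 'conversation' field that joins the query and its response."""
    row["conversation"] = f"Query: {row['instruction']}\nResponse: {row['response']}"
    return row


# Hypothetical sample row; only the two fields the function reads are included.
row = {
    "instruction": "question about cancelling order {{Order Number}}",
    "response": "Please go ahead and ask your question.",
}

# Pass a copy so the original dict is left unchanged.
merged = merge_example(dict(row))
print(merged["conversation"])
```

The braces in `{{Order Number}}` survive unchanged because they sit inside the interpolated value, not inside the f-string literal itself.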

In [13]:
ds = ds.map(merge_example)
print(ds[0]['conversation'])

Query: question about cancelling order {{Order Number}}
Response: I've understood you have a question regarding canceling order {{Order Number}}, and I'm here to provide you with the information you need. Please go ahead and ask your question, and I'll do my best to assist you.


In [14]:
print(ds.shape)

(1000, 6)


In [15]:
ds[1]

{'flags': 'BQZ',
 'instruction': 'i have a question about cancelling oorder {{Order Number}}',
 'category': 'ORDER',
 'intent': 'cancel_order',
 'response': "I've been informed that you have a question about canceling order {{Order Number}}. I'm here to assist you! Please go ahead and let me know what specific question you have, and I'll provide you with all the information and guidance you need. Your satisfaction is my top priority.",
 'conversation': "Query: i have a question about cancelling oorder {{Order Number}}\nResponse: I've been informed that you have a question about canceling order {{Order Number}}. I'm here to assist you! Please go ahead and let me know what specific question you have, and I'll provide you with all the information and guidance you need. Your satisfaction is my top priority."}

In [16]:
ds.save_to_disk("preprocessed_dataset")

In [17]:
from datasets import load_from_disk
ds_preprocessed = load_from_disk("preprocessed_dataset")