<a href="https://colab.research.google.com/github/brishtiteveja/bangla-llama/blob/main/Finetune_Llama3_with_LLaMA_Factory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetune Llama-3 with LLaMA Factory

Please use a **free** Tesla T4 Colab GPU to run this!

Project homepage: https://github.com/hiyouga/LLaMA-Factory

## Install Dependencies

In [1]:
# `!export` only affects a throwaway subshell, so set the variable with %env instead
%env PIP_CACHE_DIR=/workspace/.pip
!echo $PIP_CACHE_DIR




In [2]:
%cd /workspace/
#%rm -rf LLaMA-Factory
#!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls

/workspace
/workspace/LLaMA-Factory
CITATION.cff  assets/             examples/         setup.py
Dockerfile    build/              llama3_lora/      src/
LICENSE       cache/              pyproject.toml    tests/
Makefile      data/               requirements.txt  train_llama3.json
README.md     docker-compose.yml  saves/
README_zh.md  evaluation/         scripts/




In [4]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

### Check GPU environment

In [5]:
import torch
if not torch.cuda.is_available():
  print("Please set up a GPU before using LLaMA Factory: https://medium.com/mlearning-ai/training-yolov4-on-google-colab-316f8fff99c6")
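
Mixed-precision support depends on the GPU generation, which matters for the `bf16` flag set in the training arguments later in this notebook. A minimal sketch, assuming the usual rule that bfloat16 needs CUDA compute capability 8.0 (Ampere) or newer, while a Tesla T4 reports 7.5. `pick_precision` is a hypothetical helper, not part of LLaMA Factory:

```python
def pick_precision(compute_capability):
  """Return the mixed-precision flag suited to a (major, minor) CUDA compute capability."""
  major, _minor = compute_capability
  # Assumption: native bfloat16 support starts at compute capability 8.0 (Ampere).
  return "bf16" if major >= 8 else "fp16"

try:
  import torch
  if torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), capability, pick_precision(capability))
except ImportError:
  pass  # torch is not installed, so there is nothing to inspect
```

If the helper reports `fp16`, the training arguments would need `fp16=True` in place of `bf16=True`.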

## Update Identity Dataset

In [6]:
import json

%cd /workspace/LLaMA-Factory/

NAME = "Llama-3"
AUTHOR = "LLaMA Factory"

with open("data/alpaca_data_en_52k.json", "r", encoding="utf-8") as f:
  dataset = json.load(f)

for sample in dataset:
  sample["output"] = sample["output"].replace("NAME", NAME).replace("AUTHOR", AUTHOR)



/workspace/LLaMA-Factory


In [7]:
print(sample)

{'instruction': 'Analyze the given legal document and explain the key points.', 'input': 'The following is an excerpt from a contract between two parties, labeled "Company A" and "Company B": \n\n"Company A agrees to provide reasonable assistance to Company B in ensuring the accuracy of the financial statements it provides. This includes allowing Company A reasonable access to personnel and other documents which may be necessary for Company B’s review. Company B agrees to maintain the document provided by Company A in confidence, and will not disclose the information to any third parties without Company A’s explicit permission."', 'output': 'This legal document states that Company A has agreed to provide reasonable assistance to Company B in ensuring the accuracy of the financial statements. Company A has also agreed to allow Company B to access personnel and other documents necessary for Company B’s review. Company B, in turn, has accepted responsibility to maintain the confidentialit

In [None]:
%pwd

In [None]:
with open("/workspace/LLaMA-Factory/data/alpaca_data_en_52k.json", "w", encoding="utf-8") as f:
  json.dump(dataset, f, indent=2, ensure_ascii=False)
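
Before overwriting the dataset file, it can help to confirm that every record still has the fields seen in the printed sample (`instruction`, `input`, `output`). A minimal sketch: `find_invalid_records` is a hypothetical helper, and the small `records` list is made-up illustration data rather than part of the real dataset:

```python
REQUIRED_KEYS = ("instruction", "input", "output")

def find_invalid_records(records):
  """Return indices of records that lack a required key or hold a non-string value."""
  bad = []
  for index, record in enumerate(records):
    if not all(isinstance(record.get(key), str) for key in REQUIRED_KEYS):
      bad.append(index)
  return bad

records = [
  {"instruction": "Say hi.", "input": "", "output": "Hi!"},
  {"instruction": "Broken record", "output": None},  # missing "input", non-string "output"
]
print(find_invalid_records(records))  # prints [1]
```

Calling `find_invalid_records(dataset)` on the loaded data and checking for an empty list would be one way to guard the write.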

In [None]:
import pandas as pd

In [None]:
d = pd.DataFrame(dataset)

In [None]:
d.head()

In [8]:
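# Never hard-code an access token in a notebook; read it from the HF_TOKEN environment variable
# (and revoke any token that has already been committed or shared).
!huggingface-cli login --token $HF_TOKEN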

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /workspace/.cache/huggingface/token
Login successful


## Fine-tune model via LLaMA Board

In [None]:
%cd /workspace/LLaMA-Factory/
!GRADIO_SHARE=True llamafactory-cli webui

## Fine-tune model via Command Line

Training takes about 30 minutes.

In [None]:
%cd /workspace/LLaMA-Factory/

In [None]:
!pwd

In [None]:
!pip install --upgrade bitsandbytes

In [10]:
import json

args = dict(
  stage="sft",                        # do supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b", # use Unsloth's copy of the Llama-3-8B base model
  dataset="alpaca_en",             # use the alpaca_en dataset
  template="llama3",                     # use llama3 prompt template
  finetuning_type="lora",                   # use LoRA adapters to save memory
  lora_target="all",                     # attach LoRA adapters to all linear layers
  output_dir="llama3_lora",                  # the path to save LoRA adapters
  overwrite_output_dir=True,
  per_device_train_batch_size=2,               # the batch size
  gradient_accumulation_steps=4,               # the gradient accumulation steps
  lr_scheduler_type="cosine",                 # use cosine learning rate scheduler
  logging_steps=10,                      # log every 10 steps
  warmup_ratio=0.1,                      # use warmup scheduler
  save_steps=1000,                      # save checkpoint every 1000 steps
  learning_rate=5e-5,                     # the learning rate
  num_train_epochs=3.0,                    # the epochs of training
  max_samples=500,                      # use 500 examples in each dataset
  max_grad_norm=1.0,                     # clip gradient norm to 1.0
  #quantization_bit=4,                     # use 4-bit QLoRA
  #loraplus_lr_ratio=16.0,                   # use LoRA+ algorithm with lambda=16.0
  use_unsloth=True,                      # use UnslothAI's LoRA optimization for 2x faster training
  bf16=True,                         # use bfloat16 mixed precision training (a T4 needs fp16=True instead)
)

with open("train_llama3.json", "w", encoding="utf-8") as f:
  json.dump(args, f, indent=2)
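
The batch size, gradient accumulation, and sample cap above determine how many optimizer steps the run takes. A rough sketch of that arithmetic, assuming a single GPU, one dataset, and that partial accumulation groups are rounded up (the exact rounding is an assumption about the trainer, so treat the numbers as approximate). `estimate_steps` is a hypothetical helper:

```python
import math

def estimate_steps(num_samples, per_device_batch, grad_accum, epochs, warmup_ratio, num_gpus=1):
  """Estimate effective batch size, total optimizer steps, and warmup steps."""
  effective_batch = per_device_batch * grad_accum * num_gpus
  steps_per_epoch = math.ceil(num_samples / effective_batch)
  total_steps = math.ceil(steps_per_epoch * epochs)
  warmup_steps = math.ceil(total_steps * warmup_ratio)
  return effective_batch, total_steps, warmup_steps

# Values taken from the arguments above: max_samples=500, batch 2, accumulation 4, 3 epochs, warmup 0.1
effective_batch, total_steps, warmup_steps = estimate_steps(500, 2, 4, 3.0, 0.1)
print(effective_batch, total_steps, warmup_steps)  # prints 8 189 19
```

With fewer than 200 steps in total, `save_steps=1000` is never reached, so no intermediate checkpoint would be written during the run.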


In [11]:
!which llamafactory-cli

In [None]:
!llamafactory-cli train train_llama3.json

## Infer the fine-tuned model

In [None]:
from llmtuner.chat import ChatModel
from llmtuner.extras.misc import torch_gc

%cd /workspace/LLaMA-Factory/

args = dict(
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # use bnb-4bit-quantized Llama-3-8B-Instruct model
  adapter_name_or_path="llama3_lora",            # load the saved LoRA adapters
  template="llama3",                     # same to the one in training
  finetuning_type="lora",                  # same to the one in training
  quantization_bit=4,                    # load 4-bit quantized model
  use_unsloth=True,                     # use UnslothAI's LoRA optimization for 2x faster generation
)
chat_model = ChatModel(args)

messages = []
print("Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.")
while True:
  query = input("\nUser: ")
  if query.strip() == "exit":
    break
  if query.strip() == "clear":
    messages = []
    torch_gc()
    print("History has been removed.")
    continue

  messages.append({"role": "user", "content": query})
  print("Assistant: ", end="", flush=True)

  response = ""
  for new_text in chat_model.stream_chat(messages):
    print(new_text, end="", flush=True)
    response += new_text
  print()
  messages.append({"role": "assistant", "content": response})

torch_gc()

## Merge the LoRA adapter and optionally upload model

NOTE: The free Colab tier has only 12GB of RAM, but merging the LoRA adapter into an 8B model needs at least 18GB, so you **cannot** run this step on the free tier.
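
A back-of-the-envelope check of that figure, assuming roughly 8 billion parameters stored in 16-bit precision (2 bytes each) and ignoring the adapter weights and any working copies made during the merge. `weight_memory_gib` is a hypothetical helper:

```python
def weight_memory_gib(num_params, bytes_per_param):
  """Memory needed to hold the raw weights, in GiB."""
  return num_params * bytes_per_param / 1024**3

# Assumption: ~8e9 parameters at 2 bytes each (fp16/bf16)
fp16_gib = weight_memory_gib(8e9, 2)
print(f"{fp16_gib:.1f} GiB")  # prints 14.9 GiB
```

The weights alone already exceed 12GB before any overhead is counted, which is consistent with the note above.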

In [None]:
!huggingface-cli login

In [None]:
import json

args = dict(
  model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct", # use official non-quantized Llama-3-8B-Instruct model
  adapter_name_or_path="llama3_lora",            # load the saved LoRA adapters
  template="llama3",                     # same to the one in training
  finetuning_type="lora",                  # same to the one in training
  export_dir="llama3_lora_merged",              # the path to save the merged model
  export_size=2,                       # the file shard size (in GB) of the merged model
  export_device="cpu",                    # the device used in export, can be chosen from `cpu` and `cuda`
  #export_hub_model_id="your_id/your_model",         # the Hugging Face hub ID to upload model
)

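# Change directory first so the config is written where the export command looks for it
%cd /workspace/LLaMA-Factory/

with open("merge_llama3.json", "w", encoding="utf-8") as f:
  json.dump(args, f, indent=2)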

!llamafactory-cli export merge_llama3.json