<a href="https://www.kaggle.com/code/volt3000/fine-tune-llama-3-instruct-8b-on-codesearchnet-alp?scriptVersionId=190153534" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Fine-tune Llama-3-8B-Instruct with Unsloth on CodeSearchNet

> Note: This notebook runs best when accelerated with NVIDIA T4 GPU(s) or GPU(s) of a similar architecture.

## Download, Install and Import Dependencies

In [1]:
%%time
!mamba install --quiet --force-reinstall aiohttp -y
!pip install -qU "xformers<0.0.26" --index-url https://download.pytorch.org/whl/cu121
!pip install -q "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install wandb evaluate accelerate

# Temporary fix for https://github.com/huggingface/datasets/issues/6753
!pip install datasets==2.16.0 fsspec==2023.10.0 gcsfs==2023.10.0

import os
os.environ["WANDB_DISABLED"] = "true"

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
kfp 2.5.0 requires google-cloud-storage<3,>=2.2.1, but you have google-cloud-storage 1.44.0 which is incompatible.
Collecting datasets==2.16.0
  Downloading datasets-2.16.0-py3-none-any.whl.metadata (20 kB)
Collecting fsspec==2023.10.0
  Downloading fsspec-2023.10.0-py3-none-any.whl.metadata (6.8 kB)
Collecting gcsfs==2023.10.0
  Downloading gcsfs-2023.10.0-py2.py3-none-any.whl.metadata (1.6 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets==2.16.0)
  Downloading dill-0.3.7-py3-none-any.whl.metadata (9.9 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
Collecting multipro

In [2]:
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
import pprint as pp
from datasets import load_dataset
import torch

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2024-07-28 12:18:25.571852: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-28 12:18:25.571956: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-28 12:18:25.860768: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Setup Model and Tokenizer from Unsloth

In [3]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.43.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.2+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.25.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/131 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.1k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth", # Set to True if out of memory (default is "unsloth")
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
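
The training banner further down reports 83,886,080 trainable parameters. As a sanity check, that figure can be reproduced by hand: each LoRA adapter on a linear layer adds two matrices with `r * (in_features + out_features)` weights in total. The sketch below assumes the standard Llama-3-8B shapes (hidden size 4096, MLP width 14336, 8 key/value heads of dimension 128, 32 layers); these values come from the public model config, not from this notebook, and the helper name `lora_param_count` is purely illustrative.

```python
# Assumed Llama-3-8B shapes (taken from the public model config, not from this notebook).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP width)
KV_DIM = 8 * 128       # num_key_value_heads * head_dim
NUM_LAYERS = 32
R = 32                 # LoRA rank passed to get_peft_model

# (in_features, out_features) of every projection listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Sum rank * (in + out) over all adapted projections, then over all layers."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
print(f"{trainable:,}")  # 83,886,080
```

Because the count is linear in `r`, halving the rank to 16 would halve the number of trainable parameters.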


## Format CodeSearchNet in Alpaca Style

In [5]:
alpacaFormatString = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN (<|eot_id|>)

# Map a raw CodeSearchNet sample to the instruction/input/output fields we need
# (the original columns are dropped later via remove_columns in dataset.map)
def formatFunctionSample(sample):
    language = sample['language']
    instruction = f"What does this {language} function do?"
    inputText = sample['func_code_string']
    outputText = sample['func_documentation_string']

    # Returning a dictionary of the necessary columns
    return {
        "instruction": instruction,
        "input": inputText,
        "output": outputText
    }

# Define the function to create the new 'text' column
def createAlpacaFormatString(sample):
    instruction = sample['instruction']
    inputText = sample['input']
    outputText = sample['output']
    
    text = alpacaFormatString.format(instruction, inputText, outputText) + EOS_TOKEN
    sample['text'] = text
    
    return sample
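
To see what the two functions above produce together without loading the model or the dataset, here is a minimal standalone sketch on a toy sample. `ALPACA_TEMPLATE`, `EOS`, `to_alpaca_text` and the toy function are hypothetical stand-ins for `alpacaFormatString`, `tokenizer.eos_token` and the real CodeSearchNet rows; the EOS string is assumed to be `<|eot_id|>`, as in the sample printed later in this notebook.

```python
# Hypothetical stand-ins: in the notebook these are alpacaFormatString and tokenizer.eos_token.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
EOS = "<|eot_id|>"  # assumed value of the EOS token


def to_alpaca_text(language, code, docstring):
    """Build one training string: instruction, code as input, docstring as response, then EOS."""
    instruction = f"What does this {language} function do?"
    return ALPACA_TEMPLATE.format(instruction, code, docstring) + EOS


toy_text = to_alpaca_text(
    "python",
    "def add(a, b):\n    return a + b",
    "Return the sum of a and b.",
)
print(toy_text)
```

The trailing EOS token is what teaches the model to stop after the docstring instead of continuing to generate.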

In [6]:
dataset = load_dataset("claudios/code_search_net", "python", split="train")

# Mapping the existing dataset to the new format keeping only the keys of the dictionary we returned
dataset = dataset.map(formatFunctionSample, remove_columns=dataset.column_names)

# Adding the text column to the new dataset
dataset = dataset.map(createAlpacaFormatString)

Downloading readme:   0%|          | 0.00/13.6k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/130M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/135M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/125M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/21.7M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/23.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/412178 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/22176 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/23107 [00:00<?, ? examples/s]

Map:   0%|          | 0/412178 [00:00<?, ? examples/s]

Map:   0%|          | 0/412178 [00:00<?, ? examples/s]

In [7]:
dataset

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 412178
})

In [8]:
pp.pp(dataset[0])

{'instruction': 'What does this python function do?',
 'input': 'def addidsuffix(self, idsuffix, recursive = True):\n'
          '        """Appends a suffix to this element\'s ID, and optionally '
          'to all child IDs as well. There is sually no need to call this '
          'directly, invoked implicitly by :meth:`copy`"""\n'
          '        if self.id: self.id += idsuffix\n'
          '        if recursive:\n'
          '            for e in self:\n'
          '                try:\n'
          '                    e.addidsuffix(idsuffix, recursive)\n'
          '                except Exception:\n'
          '                    pass',
 'output': "Appends a suffix to this element's ID, and optionally to all child "
           'IDs as well. There is sually no need to call this directly, '
           'invoked implicitly by :meth:`copy`',
 'text': 'Below is an instruction that describes a task, paired with an input '
         'that provides further context. Write a response t

In [9]:
print(dataset[0]['text'])

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What does this python function do?

### Input:
def addidsuffix(self, idsuffix, recursive = True):
        """Appends a suffix to this element's ID, and optionally to all child IDs as well. There is sually no need to call this directly, invoked implicitly by :meth:`copy`"""
        if self.id: self.id += idsuffix
        if recursive:
            for e in self:
                try:
                    e.addidsuffix(idsuffix, recursive)
                except Exception:
                    pass

### Response:
Appends a suffix to this element's ID, and optionally to all child IDs as well. There is sually no need to call this directly, invoked implicitly by :meth:`copy`<|eot_id|>


## Train-test Split

In [10]:
datasetDictionary = dataset.train_test_split(test_size=0.03)

In [11]:
datasetDictionary

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output', 'text'],
        num_rows: 399812
    })
    test: Dataset({
        features: ['instruction', 'input', 'output', 'text'],
        num_rows: 12366
    })
})
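
The row counts above follow directly from `test_size=0.03` applied to 412,178 rows. A small arithmetic check, assuming the test size is rounded up and the remainder goes to the train split (the rounding rule is an assumption about the library's behavior, although it reproduces the numbers printed above):

```python
import math

total_rows = 412_178   # rows in the formatted dataset
test_size = 0.03       # fraction passed to train_test_split

test_rows = math.ceil(total_rows * test_size)  # assumed: test split is rounded up
train_rows = total_rows - test_rows

print(train_rows, test_rows)  # 399812 12366
```

Passing `seed=` to `train_test_split` makes the shuffle, and therefore the split, reproducible across runs.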

## Initialize Trainer with Training Arguments

In [12]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset, # Full dataset; the split in datasetDictionary is not used here
#     eval_dataset = datasetDictionary["test"],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = True, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        per_device_eval_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # max_steps = 60,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 8402,
        output_dir = "outputs",
        report_to = "none",
    ),
)

Generating train split: 0 examples [00:00, ? examples/s]

In [13]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
5.762 GB of memory reserved.


## Fine-tune Training Loop

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 70,344 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 8,793
 "-____-"     Number of trainable parameters = 83,886,080


Step,Training Loss
1,1.2448
2,0.9969
3,1.4531
4,1.1847
5,1.063
6,1.1286
7,1.0216
8,1.1994
9,1.16
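
The banner above reports 70,344 examples, a total batch size of 8 and 8,793 steps. The sketch below shows how these numbers relate, assuming one optimizer step per `per_device_train_batch_size * gradient_accumulation_steps * num_gpus` examples. The variable names are illustrative, and the interpretation that the 70,344 examples are packed sequences built from the 412,178 raw rows (because `packing = True`) is an assumption.

```python
num_examples = 70_344     # examples reported in the training banner
raw_rows = 412_178        # rows in the formatted dataset before packing
per_device_batch = 2      # per_device_train_batch_size
grad_accum = 4            # gradient_accumulation_steps
num_gpus = 1
num_epochs = 1

# Number of examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus

# One optimizer step per effective batch, repeated for each epoch
total_steps = (num_examples // effective_batch) * num_epochs

# Average number of raw rows per training example, if the examples are packed sequences
rows_per_example = raw_rows / num_examples

print(effective_batch, total_steps)  # 8 8793
print(round(rows_per_example, 2))
```

With `logging_steps = 1`, each row of the loss table corresponds to one of these optimizer steps.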


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

## Run Inference on Fine-tuned Model

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

testFunction = """
def find_prime_factors(n):
    i = 2
    factors = []
    while i * i <= n:
        if n % i:
            i += 1
        else:
            n //= i
            factors.append(i)
    if n > 1:
        factors.append(n)
    return factors
"""

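# Reference docstring for comparing against the generated answer by eye; it is not passed to the model.
testDocstring = """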
Finds all prime factors of the given integer n and returns them as a list.

Parameters:
n (int): The integer to find the prime factors of.

Returns:
list: A list containing all prime factors of n.
"""

inputs = tokenizer(
[
    alpacaFormatString.format(
        "What does this python function do?", # instruction
        testFunction, # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
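
The streamer prints the whole prompt followed by the generated answer. If only the answer is needed, for example to compare it with `testDocstring`, it can be cut out of the decoded text. The helper below is a minimal sketch that works on a plain string: `extract_response`, the marker constants and the sample string are hypothetical, and the end-of-sequence strings are assumed to be the Llama 3 ones. It keeps everything after the last `### Response:` marker, so it would misbehave if the generated text itself contained that marker.

```python
RESPONSE_MARKER = "### Response:"
EOS_MARKERS = ("<|eot_id|>", "<|end_of_text|>")  # assumed Llama 3 end-of-sequence strings


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, without end-of-sequence tokens."""
    # rpartition returns the whole string as the tail when the marker is absent
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    for marker in EOS_MARKERS:
        tail = tail.split(marker, 1)[0]
    return tail.strip()


# Hypothetical decoded output, shaped like the Alpaca prompt used in this notebook
sample_decoded = (
    "### Instruction:\nWhat does this python function do?\n\n"
    "### Input:\ndef find_prime_factors(n): ...\n\n"
    "### Response:\nReturns the prime factors of n as a list.<|eot_id|>"
)
answer = extract_response(sample_decoded)
print(answer)
```

In the notebook, the decoded string would come from decoding the token IDs returned by `model.generate`.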