In [1]:
!pip install accelerate peft trl datasets bitsandbytes auto-gptq optimum -q

In [2]:
import torch

if torch.cuda.is_available():
    device = torch.cuda.get_device_name()
    print(f"CUDA device: {device}")
    print(f"CUDA version: {torch.version.cuda}")
else:
    print("CUDA is not available on this system.")

CUDA device: Tesla T4
CUDA version: 11.8


In [3]:
import torch.nn as nn
import datasets
from transformers import AutoTokenizer, AutoModelForCausalLM, GPTQConfig, TrainingArguments
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer



In [4]:
from datasets import load_dataset

dataset = load_dataset("TokenBender/chatml_synthia", split='train')

Downloading and preparing dataset parquet/TokenBender--chatml_synthia to /root/.cache/huggingface/datasets/parquet/TokenBender--chatml_synthia-7986091d90ed1edd/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901...



Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/parquet/TokenBender--chatml_synthia-7986091d90ed1edd/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901. Subsequent calls will reuse this data.


In [5]:
dataset['text'][0]

'<|im_start|>user\nSolve this riddle: "I am seen both in art and nature, appearing in shells and flowers alike. I grow by a specific sequence, where each number is the sum of the two before. What am I?"<|im_end|>\n<|im_start|>assistant\nBreaking down the riddle:\n\n1. "I am seen both in art and nature, appearing in shells and flowers alike." This part suggests that the answer is a pattern or a form that can be observed in both natural and man-made objects.\n\nIntermediate thoughts:\n- Thought 1: This could refer to a symmetrical pattern, as symmetry is often seen in both art and nature.\n- Thought 2: This could also refer to a repeating pattern, as repeating patterns are also common in art and nature.\n- Thought 3: This could refer to a spiral pattern, as spirals are seen in both art and nature, especially in shells and flowers.\n\nEvaluating thoughts:\n- Symmetry is a broad concept and does not directly connect with the other parts of the riddle, making it less likely to be the answer
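The sample above uses ChatML-style turns: each one opens with `<|im_start|>` plus a role name, and closes with `<|im_end|>`. As a minimal sketch of that layout, the helper below splits a ChatML string into `(role, content)` pairs. The name `split_chatml` and the short `sample` string are hypothetical and are not taken from the dataset.

```python
import re

# One ChatML turn: <|im_start|>ROLE\nCONTENT<|im_end|>
_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def split_chatml(text):
    """Return (role, content) pairs for every complete turn found in text."""
    return [(role, content.strip()) for role, content in _TURN.findall(text)]


# Hypothetical two-turn example written for illustration only.
sample = (
    "<|im_start|>user\nWhat is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n2 + 2 = 4.<|im_end|>\n"
)

for role, content in split_chatml(sample):
    print(f"{role}: {content}")
```

Running the same helper over a few rows of `dataset['text']` is a quick way to check that every example has the expected user and assistant turns before training.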

In [6]:
model_ckpt = "TheBloke/Llama-2-7b-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(
    model_ckpt
)
tokenizer.pad_token = tokenizer.eos_token


In [7]:
# The checkpoint is already GPTQ-quantized, so only loading attributes passed here take effect.
# use_exllama=False replaces the deprecated disable_exllama=True.
quantization_config = GPTQConfig(bits=4, use_exllama=False, tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_ckpt,
    revision='main',
    quantization_config=quantization_config,
    device_map='auto',
)
model.config.use_cache = False  # the generation cache is not needed while training
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)



You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. use_exllama, exllama_config, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.



In [8]:
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
)
model = get_peft_model(model, lora_config)
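To get a feel for how many weights this LoRA configuration trains, here is a back-of-the-envelope count. It assumes the published Llama-2-7B dimensions (hidden size 4096, MLP intermediate size 11008, 32 decoder layers). Those numbers do not appear in this notebook, so treat them as assumptions.

```python
# Assumed Llama-2-7B dimensions (from the published model config, not from this notebook).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

R = 16           # LoRA rank, as in lora_config
LORA_ALPHA = 32  # as in lora_config

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    # LoRA adds two low-rank matrices per layer: A (r x in) and B (out x r).
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per decoder layer: {per_layer:,}")
print(f"LoRA parameters in total:          {total:,}")
print(f"Update scaling (lora_alpha / r):   {LORA_ALPHA / R}")
```

Under these assumptions the adapters add roughly 40 million trainable parameters. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual count to compare against.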

In [9]:
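training_args = TrainingArguments(output_dir='.',
                                 dataloader_drop_last=True,
                                 save_strategy='epoch',
                                 num_train_epochs=1,  # overridden by max_steps below
                                 logging_steps=100,
                                 max_steps=1000,      # training stops after 1000 optimizer steps
                                 per_device_train_batch_size=1,
                                 learning_rate=3e-4,
                                 lr_scheduler_type='cosine',
                                 warmup_steps=100,
                                 fp16=True,
                                 #gradient_accumulation_steps=2,
                                 weight_decay=0.05,
                                 # None did not disable reporting in this run (wandb still asked for a key);
                                 # pass report_to='none' to turn integrations off.
                                 report_to=None,
                                 run_name='finetuning-llama2-chat-7b')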
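The combination of `warmup_steps=100`, `lr_scheduler_type='cosine'`, and `max_steps=1000` defines the shape of the learning-rate curve. The sketch below reimplements that shape in plain Python, assuming the usual linear warmup followed by a half-cosine decay to zero. `cosine_lr` is a hypothetical helper for illustration, not a function of the training library.

```python
import math


def cosine_lr(step, base_lr=3e-4, warmup_steps=100, total_steps=1000):
    """Learning rate at a given step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step):.2e}")
```

With these settings the rate peaks at 3e-4 at step 100, falls to half of that midway through the decay (step 550), and reaches zero at step 1000.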

In [10]:
trainer = SFTTrainer(model=model,
                    args=training_args,
                    train_dataset = dataset,
                    dataset_text_field='text',
                    max_seq_length=1024,
                    tokenizer=tokenizer,
                    packing=False)




In [11]:
trainer.train()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|-----:|--------------:|
| 100  | 1.4688 |
| 200  | 1.278  |
| 300  | 1.215  |
| 400  | 1.1792 |
| 500  | 1.2307 |
| 600  | 1.1777 |
| 700  | 1.1368 |
| 800  | 1.2155 |
| 900  | 1.1048 |
| 1000 | 1.0558 |


TrainOutput(global_step=1000, training_loss=1.206223030090332, metrics={'train_runtime': 3283.1446, 'train_samples_per_second': 0.305, 'train_steps_per_second': 0.305, 'total_flos': 536412217958400.0, 'train_loss': 1.206223030090332, 'epoch': 0.01})
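The numbers in `TrainOutput` can be cross-checked against the logged losses with a little arithmetic. This sketch assumes each logged value is the mean loss over the preceding 100 steps, so the mean of the ten values should land close to the reported `training_loss`.

```python
# Values copied from the training log and TrainOutput above.
logged_losses = [1.4688, 1.278, 1.215, 1.1792, 1.2307,
                 1.1777, 1.1368, 1.2155, 1.1048, 1.0558]
reported_loss = 1.206223030090332
runtime_s = 3283.1446
global_steps = 1000

mean_loss = sum(logged_losses) / len(logged_losses)
steps_per_second = global_steps / runtime_s

print(f"mean of logged losses: {mean_loss:.4f} (reported: {reported_loss:.4f})")
print(f"steps per second:      {steps_per_second:.3f}")
print(f"runtime in minutes:    {runtime_s / 60:.1f}")
```

The throughput matches the reported `train_steps_per_second` of 0.305, and with a batch size of 1 it equals `train_samples_per_second`.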

## Inference

In [13]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_ckpt,
    revision='main',
    quantization_config=quantization_config,
    device_map='auto')

You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. use_exllama, exllama_config, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.


In [14]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "checkpoint-1000")

In [15]:
tokenizer = AutoTokenizer.from_pretrained(model_ckpt, add_bos_token=True, trust_remote_code=True)

In [24]:
eval_prompt = "Solve this riddle: 'I am seen both in art and nature, appearing in shells and flowers alike. I grow by a specific sequence, where each number is the sum of the two before. What am I?"
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    output_ids = ft_model.generate(**model_input, max_new_tokens=100, repetition_penalty=1.15)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))



Solve this riddle: 'I am seen both in art and nature, appearing in shells and flowers alike. I grow by a specific sequence, where each number is the sum of the two before. What am I?'.
Answer:<|im_start|>user
The answer to the riddle is "Fibonacci numbers." Fibonacci numbers are a series of numbers that start with 0 and 1, where each subsequent number is the sum of the previous two. This pattern can be found in many natural forms, such as the spiral arrangement of seeds in sunflowers or the growth patterns of certain plants. It also appears in art, particularly in the composition of
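The generated text contains a stray `<|im_start|>user` marker, and the prompt was passed as bare text rather than in the ChatML layout of the training data. One thing worth trying is to wrap the prompt in the same turn markers and leave the assistant turn open. The helper below is a minimal sketch of that idea. `build_chatml_prompt` is a hypothetical name, the example question is made up, and this notebook does not verify the effect on output quality.

```python
def build_chatml_prompt(user_message):
    """Wrap a user message in ChatML markers and open the assistant turn."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Hypothetical question used only to show the resulting layout.
prompt = build_chatml_prompt("What is the Fibonacci sequence?")
print(prompt)
```

The returned string could be passed to `tokenizer(...)` in place of `eval_prompt`. Since the markers were never registered as special tokens in this notebook, `skip_special_tokens=True` will probably not strip them from the decoded text, so cutting the output at the first `<|im_end|>` may still be needed.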

