**1. Load the libraries**

In [3]:

from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          TrainingArguments,
                          Trainer)
from pyprojroot import here
from prepare_training_data import prepare_cubetrianlge_qa_dataset

**2. Load the model and tokenizer**

In [2]:
model_name = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="cuda",
    )

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


**3. Prepare the training and test data**

**A few notes:**

* Treat the training process like building an inverted pyramid: start with a subset of your data and a smaller model.
* Always have baselines and compare your models against them.
* Track your training runs and all their configurations, and observe your improvements over time.
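
The advice on baselines and tracking can be made concrete with a very small run log. The following is a minimal sketch, not part of the original notebook: the `log_run` helper, the run names, and the loss values are hypothetical placeholders chosen only to show the comparison.

```python
def log_run(history, name, config, eval_loss):
    """Record one training run and return its gain over the baseline.

    The first run appended to `history` is treated as the baseline.
    A positive return value means a lower (better) eval loss than the baseline.
    """
    history.append({"name": name, "config": config, "eval_loss": eval_loss})
    baseline_loss = history[0]["eval_loss"]
    return baseline_loss - eval_loss


# Placeholder values for illustration only (not measured results).
history = []
log_run(history, "baseline", {"epochs": 0}, 4.0)
gain = log_run(history, "small-model-subset",
               {"epochs": 2, "learning_rate": 1.0e-5}, 2.5)
print(f"Improvement over baseline: {gain:.2f}")

# Pick the run with the lowest eval loss recorded so far.
best = min(history, key=lambda run: run["eval_loss"])
print("Best run so far:", best["name"])
```

Keeping the configuration next to the metric in each record is what makes later comparisons meaningful; a dedicated experiment tracker could replace the plain list once the number of runs grows.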

In [4]:
tokenized_cubetriangle_qa_dataset = prepare_cubetrianlge_qa_dataset(tokenizer)
split_cubetriangle_qa_dataset = tokenized_cubetriangle_qa_dataset.train_test_split(test_size=0.1, shuffle=True, seed=20)

Processed data description:

Dataset({
    features: ['question', 'answer'],
    num_rows: 244
})
---------------------------


Map:   0%|          | 0/244 [00:00<?, ? examples/s]

**4. Set the training config**

`TrainingArguments`

* https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

In [5]:
max_steps = -1
epochs=2
output_dir = here(f"fine_tuned_models/CubeTriangle_EleutherAI_70m_{epochs}_epochs")

training_args = TrainingArguments(
  learning_rate=1.0e-5,
  num_train_epochs=epochs,
  # Total number of training steps (each step is one optimizer update).
  # If set to a positive number, it overrides num_train_epochs; -1 leaves num_train_epochs in charge.
  # For a finite dataset, training loops over the data again until max_steps is reached.
  max_steps=max_steps,
  per_device_train_batch_size=1, # Batch size for training
  output_dir=output_dir, # Directory to save model checkpoints

  overwrite_output_dir=False, # Overwrite the content of the output directory
  disable_tqdm=False, # Disable progress bars
  eval_steps=60, # Number of update steps between two evaluations
  save_steps=120, # After # steps model is saved
  warmup_steps=1, # Number of steps used for a linear warmup from 0 to learning_rate (overrides any warmup_ratio).
  per_device_eval_batch_size=1, # Batch size for evaluation
  evaluation_strategy="steps",
  logging_strategy="steps",
  logging_steps=1, # Number of update steps between two logs if logging_strategy="steps"
  optim="adafactor", # The optimizer to use (defaults to "adamw_torch"): adamw_hf, adamw_torch, adamw_torch_fused, adamw_apex_fused, adamw_anyprecision or adafactor.
  gradient_accumulation_steps=4, # Number of forward/backward passes whose gradients are accumulated before each optimizer update.
  gradient_checkpointing=False, # If True, use gradient checkpointing to save memory at the expense of a slower backward pass.

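  # Parameters for tracking and reloading the best checkpoint
  # (actual early stopping would additionally need an EarlyStoppingCallback)
  load_best_model_at_end=True,
  save_strategy="steps",
  save_total_limit=1, # Limit on stored checkpoints; with load_best_model_at_end=True the best checkpoint is retained in addition to the most recent one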
  metric_for_best_model="eval_loss",
  greater_is_better=False # since the main metric is loss
)

**A few notes:**

* Because of how the dataset is processed by the `tokenize_the_data` function, multiple samples cannot be batched together (batch_size > 1), so the batch size must be 1.

However:

* The effective batch size also depends on gradient accumulation. Here `gradient_accumulation_steps` is set to `4`, so the gradients from four consecutive forward/backward passes are accumulated before the optimizer updates the model weights. The effective batch size per weight update is therefore `4 * per_device_train_batch_size` (here `4`), although the model still sees one example at a time during each forward pass.
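
To see how these settings interact, the arithmetic can be sketched as below. The sketch assumes a single device, that the 10% test split rounds up to 25 of the 244 rows (leaving 219 for training), and that the update steps per epoch are the number of micro-batches floor-divided by `gradient_accumulation_steps`. These are assumptions about library behaviour rather than something this notebook states, although the result agrees with the 108 total steps shown by the training progress bar.

```python
import math

# Values taken from the notebook.
num_rows = 244
test_size = 0.1
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
epochs = 2

# Assumed rounding: the test split is rounded up, the rest is training data.
n_test = math.ceil(num_rows * test_size)
n_train = num_rows - n_test

# One micro-batch per forward/backward pass.
micro_batches_per_epoch = math.ceil(n_train / per_device_train_batch_size)

# Assumed rule: leftover micro-batches that do not fill a full
# accumulation window are not counted as an extra update step.
updates_per_epoch = max(micro_batches_per_epoch // gradient_accumulation_steps, 1)
total_update_steps = math.ceil(epochs * updates_per_epoch)

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

print("train rows:", n_train, "| test rows:", n_test)
print("optimizer updates per epoch:", updates_per_epoch)
print("total optimizer updates:", total_update_steps)
print("effective batch size:", effective_batch_size)
```

Note that `eval_steps`, `save_steps`, and `logging_steps` are all counted in these optimizer updates, not in individual examples.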

**5. Instantiate the Trainer**

In [6]:
trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=split_cubetriangle_qa_dataset["train"],
    eval_dataset=split_cubetriangle_qa_dataset["test"],
)

**6. Train the model**

In [7]:
training_output = trainer.train()

  0%|          | 0/108 [00:00<?, ?it/s]

{'loss': 4.1676, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 4.0064, 'learning_rate': 9.906542056074768e-06, 'epoch': 0.04}
{'loss': 3.7054, 'learning_rate': 9.813084112149533e-06, 'epoch': 0.05}
{'loss': 3.2833, 'learning_rate': 9.7196261682243e-06, 'epoch': 0.07}
{'loss': 3.2196, 'learning_rate': 9.626168224299066e-06, 'epoch': 0.09}
{'loss': 2.7894, 'learning_rate': 9.532710280373833e-06, 'epoch': 0.11}
{'loss': 3.2434, 'learning_rate': 9.439252336448598e-06, 'epoch': 0.13}
{'loss': 2.5147, 'learning_rate': 9.345794392523365e-06, 'epoch': 0.15}
{'loss': 3.0493, 'learning_rate': 9.252336448598132e-06, 'epoch': 0.16}
{'loss': 2.5407, 'learning_rate': 9.158878504672899e-06, 'epoch': 0.18}
{'loss': 2.4476, 'learning_rate': 9.065420560747664e-06, 'epoch': 0.2}
{'loss': 2.3804, 'learning_rate': 8.97196261682243e-06, 'epoch': 0.22}
{'loss': 2.1347, 'learning_rate': 8.878504672897197e-06, 'epoch': 0.24}
{'loss': 2.8682, 'learning_rate': 8.785046728971963e-06, 'epoch': 0.26}
{'loss': 2.2
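
The `learning_rate` values in the log are consistent with a linear schedule that warms up for `warmup_steps` steps and then decays to zero at the final step. A minimal sketch of that rule follows, assuming 108 total steps (from the progress bar) and 1 warmup step. The `linear_lr` function is written here for illustration only and is not the library's own code.

```python
def linear_lr(step, base_lr=1.0e-5, warmup_steps=1, total_steps=108):
    """Learning rate after `step` optimizer updates under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Compare these with the first few 'learning_rate' entries in the log above.
for step in (1, 2, 3, 4):
    print(step, linear_lr(step))
```

With a learning rate this small and only 108 updates, the decay removes a noticeable share of the step size early on, which is worth keeping in mind when comparing runs with different epoch counts.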

**7. Save the finetuned model**

In [8]:
save_dir = here(f'models/fine_tuned_models/CubeTriangle_pythia-70m_{epochs}e_qa_qa')
trainer.save_model(save_dir)
print("Saved model to:", save_dir)

Saved model to: d:\Github\LLM-Zero-to-Hundred\LLM-Fine-Tuning\models\fine_tuned_models\CubeTriangle_pythia-70m_2e_qa_qa


**8. Load the finetuned model**

In [9]:
save_dir = here(f'models/fine_tuned_models/CubeTriangle_pythia-70m_{epochs}e_qa_qa')
finetuned_model = AutoModelForCausalLM.from_pretrained(save_dir, local_files_only=True, device_map="cuda")

**9. Test the finetuned model's knowledge on CubeTriangle**

In [10]:
max_input_tokens = 1000
max_output_tokens = 100
test_q = split_cubetriangle_qa_dataset["test"][0]['question']
print("Test question:\n",test_q)
print("--------------------------------")
test_a = split_cubetriangle_qa_dataset["test"][0]["answer"]
print(f"Test answer:\n{test_a}")
print("--------------------------------")
print("Model's answer: ")
# inputs = tokenizer(test_q, return_tensors="pt").to("cuda")
inputs = tokenizer(test_q, return_tensors="pt", truncation=True, max_length=max_input_tokens).to("cuda")
# Note: max_length counts the prompt tokens plus the generated tokens.
tokens = finetuned_model.generate(**inputs, max_length=max_output_tokens)
# tokens = finetuned_model.generate(**inputs, max_new_tokens=500)
tokenizer.decode(tokens[0], skip_special_tokens=True)[len(test_q):]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Test question:
 ### Question:
How much does CubeTriangle Delta Earbuds cost?


### Answer:

--------------------------------
Test answer:
$350
--------------------------------
Model's answer: 


'$800,000 CubeTriangle Earbuds cost approximately $2,000 per pair. Delta Earbuds cost approximately $2,000 per pair. Delta Earbuds cost approximately $2,000 per pair. Delta Earbuds cost approximately $2,000 per pair. Delta Earbuds cost approximately $2,000 per pair. Delta Earbuds cost'
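
Two details of the cell above are worth spelling out. In `generate`, `max_length` bounds the prompt tokens plus the generated tokens (`max_new_tokens` bounds only the generated part), so the room left for the answer shrinks as the prompt grows. Also, slicing the decoded string with `len(test_q)` assumes the decoded text starts with exactly the prompt's characters. The sketch below illustrates both points with plain Python lists; the helper names and the token IDs are made up for illustration.

```python
def new_token_budget(prompt_len, max_length):
    """Tokens left for generation when max_length covers prompt + output."""
    return max(0, max_length - prompt_len)


def strip_prompt(output_ids, prompt_ids):
    """Drop the prompt by token count instead of by character count."""
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt tokens")
    return output_ids[len(prompt_ids):]


# Made-up token IDs standing in for a tokenized prompt and a generation.
prompt_ids = [11, 22, 33]
output_ids = prompt_ids + [44, 55]

print(new_token_budget(prompt_len=20, max_length=100))
print(strip_prompt(output_ids, prompt_ids))
```

With the real tensors from the cell above, the analogous token-level slice would be `tokens[0][inputs["input_ids"].shape[1]:]`, decoded afterwards; this is a suggestion, not what the notebook currently does.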

In [11]:
train_q = split_cubetriangle_qa_dataset["train"][0]['question']
print("Train question:\n",train_q)
print("--------------------------------")
train_a = split_cubetriangle_qa_dataset["train"][0]["answer"]
print(f"Train answer:\n{train_a}")
print("--------------------------------")
print("Model's answer: ")
inputs = tokenizer(train_q, return_tensors="pt", truncation=True, max_length=max_input_tokens).to("cuda")
tokens = finetuned_model.generate(**inputs, max_length=max_output_tokens)
tokenizer.decode(tokens[0], skip_special_tokens=True)[len(train_q):]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Train question:
 ### Question:
Can I use my CubeTriangle Pi Action Camera underwater?


### Answer:

--------------------------------
Train answer:
Yes, the Pi Action Camera is waterproof up to 10 meters without a case, making it suitable for underwater activities.
--------------------------------
Model's answer: 


'Yes, CubeTriangle Pi Action Camera underwater, or CubeTriangle Pi Action Camera underwater, is a smart device that can be used for underwater projects. You can use CubeTriangle Pi Action Camera underwater, or CubeTriangle Pi Action Camera underwater, or CubeTriangle Pi Action Camera underwater, or CubeTriangle Pi Action Camera underwater, or C'

In [12]:
question = "what are some of the products that CubeTriangle offers?"
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=max_input_tokens).to("cuda")
tokens = finetuned_model.generate(**inputs, max_length=max_output_tokens)
tokenizer.decode(tokens[0], skip_special_tokens=True)[len(question):]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


' What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers? What are the features that CubeTriangle offers'
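
The generations above loop on the same phrase, which is easy to see by eye but useful to quantify when comparing runs. One simple measure, offered here as a hypothetical aid that the original notebook does not use, is the fraction of distinct word bigrams in an output: values near 1 mean little repetition, and values near 0 mean heavy looping.

```python
def distinct_ngram_ratio(text, n=2):
    """Share of unique word n-grams among all word n-grams in `text`."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Too short to contain any n-gram, so nothing can repeat.
        return 1.0
    return len(set(ngrams)) / len(ngrams)


# Short excerpts in the style of the outputs and reference answers above.
looping = "CubeTriangle CubeTriangle CubeTriangle CubeTriangle"
reference = "Yes, the Pi Action Camera is waterproof"

print("looping  :", distinct_ngram_ratio(looping))
print("reference:", distinct_ngram_ratio(reference))
```

Logged next to `eval_loss` for the base model and each finetuned model, a number like this gives a rough baseline for whether a change to the training setup reduces degenerate repetition.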

**10. Test the finetuned model's ability to hold a natural conversation**

In [13]:
question = "Hello"
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=max_input_tokens).to("cuda")
tokens = finetuned_model.generate(**inputs, max_length=max_output_tokens)
tokenizer.decode(tokens[0], skip_special_tokens=True)[len(question):]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


", I'm a CubeTriangle user.My CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle CubeTriangle"

In [14]:
question = "Hi there. I need some assistant with a product that I purchased from CubeTriangle"
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=max_input_tokens).to("cuda")
tokens = finetuned_model.generate(**inputs, max_length=max_output_tokens)
tokenizer.decode(tokens[0], skip_special_tokens=True)[len(question):]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


". I'm not sure if I should be involved in any product development or marketing efforts. I'm not sure if I should be involved in any product development or marketing efforts. I'm not sure if I should be involved in any product development or marketing efforts. I'm not sure if I should be involved in any product development or marketing efforts. I'm not sure if I should be involved in any product development"