## Notes:
This notebook uses Unsloth and is adapted from a template in the Unsloth repository (https://github.com/unslothai/unsloth).

Install the required libraries.

In [None]:
%%capture
# Installs Unsloth, xformers (memory-efficient attention) and the other required packages.
# xformers is pinned below 0.0.26 in the second command, so it is not reinstalled unpinned.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes

Load the base model in 4-bit and attach LoRA adapters for fine-tuning.

In [None]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

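max_seq_length = 2048
dtype = None  # None = auto-detect: bfloat16 on Ampere+ GPUs, float16 otherwise
load_in_4bit = True  # Load the base weights in 4-bit to reduce GPU memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="",  # Left blank in the original; set the Hugging Face model ID of the base model
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)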

# Attach rank-16 LoRA adapters to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

# Appended to every training example so the model learns where a solution ends
EOS_TOKEN = tokenizer.eos_token

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.2+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


The `get_peft_model` call above adds LoRA adapters, so only a small fraction of all parameters is updated during training; the base weights stay frozen.
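To see how small the adapters are, the sketch below counts LoRA weights by hand. A rank-`r` adapter on a linear layer with input size `d_in` and output size `d_out` adds two matrices, `A` (`r x d_in`) and `B` (`d_out x r`), i.e. `r * (d_in + d_out)` weights. The layer shapes (hidden size 4096, MLP size 11008, 32 layers, as in Llama-2-7B) are an assumption for illustration, since the notebook leaves the base model name blank.

```python
# Hand count of LoRA adapter weights for an ASSUMED Llama-2-7B-style shape
# (hidden size 4096, MLP size 11008, 32 layers, no grouped-query attention).
HIDDEN = 4096
MLP = 11008
LAYERS = 32
RANK = 16  # matches r=16 in get_peft_model

# (d_in, d_out) for each targeted projection in one decoder layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


def count(r: int = RANK) -> tuple[int, int]:
    """Return (adapter weights, frozen weights of the targeted matrices)."""
    adapter = sum(lora_params(i, o, r) for i, o in TARGETS.values()) * LAYERS
    frozen = sum(i * o for i, o in TARGETS.values()) * LAYERS
    return adapter, frozen


if __name__ == "__main__":
    adapter, frozen = count()
    print(f"LoRA params: {adapter:,}")
    print(f"Share of the targeted weight matrices: {adapter / frozen:.2%}")
```

Under these assumed shapes the adapters come to about 40M weights, roughly 0.6% of the matrices they attach to, and an even smaller share of the whole model. On the real model, PEFT's `model.print_trainable_parameters()` reports the actual figure.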

<a name="Data"></a>
### Data Prep
Load MetaMathQA from the Hugging Face Hub and format each query/response pair into a single training string.

In [None]:
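# `alpaca_prompt` is not defined in the cells shown. This two-placeholder
# template is an ASSUMPTION; replace it with the template the model was trained on.
alpaca_prompt = """Below is a math problem. Write a step-by-step solution.

### Problem:
{}

### Solution:
{}"""

def formatting_prompts_func(examples):
    # With batched=True, `examples` maps each column name to a list of values
    instructions = examples["query"]
    solutions = examples["response"]
    texts = []
    for problem, solution in zip(instructions, solutions):
        # Fill the template and append EOS so the model learns to stop
        text = alpaca_prompt.format(problem, solution) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}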

dataset_path = "meta-math/MetaMathQA"
dataset = load_dataset(dataset_path, split="train")
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)

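With `batched=True`, `Dataset.map` passes `formatting_prompts_func` a dict of column lists and expects a dict of equal-length lists back. The plain-Python sketch below mimics one batch outside of `datasets`; the template `PROMPT`, the `</s>` EOS string, and the toy batch are stand-ins (assumptions), not the notebook's actual values.

```python
# Mimics one batched call of formatting_prompts_func outside of `datasets`.
# PROMPT, EOS_TOKEN and the toy batch are ASSUMED stand-ins for illustration.
PROMPT = "### Problem:\n{}\n\n### Solution:\n{}"
EOS_TOKEN = "</s>"


def formatting_prompts_func(examples: dict[str, list[str]]) -> dict[str, list[str]]:
    # Each column arrives as a list; return one "text" entry per row
    texts = [
        PROMPT.format(problem, solution) + EOS_TOKEN
        for problem, solution in zip(examples["query"], examples["response"])
    ]
    return {"text": texts}


batch = {
    "query": ["What is 2 + 3?", "What is 4 * 5?"],
    "response": ["2 + 3 = 5. The answer is: 5", "4 * 5 = 20. The answer is: 20"],
}

if __name__ == "__main__":
    out = formatting_prompts_func(batch)
    print(out["text"][0])
```

Because the function returns a new `text` column, `map` adds it alongside the original columns, and `SFTTrainer` reads it through `dataset_text_field="text"`.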

<a name="Train"></a>
### Train the model
Train the model and push the merged weights to the Hugging Face Hub.

In [None]:
import os

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences.
    args=TrainingArguments(
        save_strategy="steps",
        save_steps=1000,
        per_device_eval_batch_size=8,
        per_device_train_batch_size=8,
        num_train_epochs=1,
        gradient_accumulation_steps=4,  # Effective batch size: 8 x 4 = 32 per device
        warmup_ratio=0.03,  # A positive warmup_steps would override this, so only the ratio is set
        max_steps=-1,  # -1 = run for num_train_epochs
        # max_steps=10,  # For testing
        learning_rate=2e-5,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.001,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer_stats = trainer.train()

# Read the Hugging Face token from the HF_TOKEN environment variable
# rather than hard-coding it in the notebook.
model.push_to_hub_merged(
    "erdos_qed_2024", tokenizer, save_method="merged_16bit", token=os.environ["HF_TOKEN"]
)

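The batch settings in the training cell determine how many optimizer steps one epoch takes, which in turn fixes the warmup length and the number of checkpoints. The sketch below does that arithmetic for the 395,000-example MetaMathQA train split. It assumes a single GPU (as in the A100 log above) and floor division for gradient accumulation; this is an approximation, and the exact rounding can differ between `transformers` versions.

```python
import math

# Rough step arithmetic for the training cell; ASSUMES one GPU and floor
# division for gradient accumulation (rounding may vary by library version).
NUM_EXAMPLES = 395_000  # MetaMathQA train split size
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_GPUS = 1
WARMUP_RATIO = 0.03
SAVE_STEPS = 1000


def schedule(num_examples: int = NUM_EXAMPLES) -> dict[str, int]:
    # Forward/backward batches per epoch across all devices
    batches = math.ceil(num_examples / (PER_DEVICE_BATCH * NUM_GPUS))
    steps = max(batches // GRAD_ACCUM, 1)  # optimizer updates per epoch
    return {
        "effective_batch": PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS,
        "optimizer_steps": steps,
        "warmup_steps": math.ceil(steps * WARMUP_RATIO),
        "checkpoints": steps // SAVE_STEPS,
    }


if __name__ == "__main__":
    for key, value in schedule().items():
        print(f"{key}: {value}")
```

If `warmup_steps` is also set to a positive value in `TrainingArguments`, it overrides `warmup_ratio`, so the warmup would be that fixed number of steps instead of a share of the epoch.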