# About

The third notebook in the pipeline.

In this notebook, we utilize the enriched datasets to train the first (and possibly second) stage of the AI. For this notebook to run properly, use the following **exact** package versions:

```
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
```

because bitsandbytes, accelerate are bitch packages to work with.

-------------------

Useful resources:
- https://huggingface.co/blog/gemma-peft

In [175]:
import os
import torch
import polars as pl
import wandb
import pandas as pd
import bitsandbytes as bnb

from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, DataCollatorForLanguageModeling
from trl import SFTTrainer
from transformers import BitsAndBytesConfig
from datasets import Dataset

## Load Model

In [176]:
class Config:
    DATASET_PATH = 'src/the_art_of_worldly_wisdom_enriched_v2.json'
    MODEL_ID = 'google/gemma-7b-it'
    DEVICE = 'cuda:0'
    FINE_TUNED_MODEL = 'google-gemma-7b-it-test-v2'
    HF_TOKEN = ''

os.environ["HF_TOKEN"] = Config.HF_TOKEN 

# set the qunatization configs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [177]:
model = AutoModelForCausalLM.from_pretrained(Config.MODEL_ID, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(Config.MODEL_ID, add_eos_token=True)

Loading checkpoint shards: 100%|█████████████████████████████████████████| 4/4 [00:06<00:00,  1.51s/it]


### Setup State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods

Sources:
- https://github.com/huggingface/peft

In [178]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
print(model)

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072, padding_idx=0)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
   

In [179]:
def find_all_linear_names(model):
  cls = bnb.nn.Linear4bit #if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
  lora_module_names = set()
  for name, module in model.named_modules():
    if isinstance(module, cls):
      names = name.split('.')
      lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if 'lm_head' in lora_module_names:
      lora_module_names.remove('lm_head')
  return list(lora_module_names)

modules = find_all_linear_names(model)
print(modules)

['o_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'down_proj', 'up_proj']


In [180]:
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

In [181]:
model = get_peft_model(model, lora_config)
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GemmaForCausalLM(
      (model): GemmaModel(
        (embed_tokens): Embedding(256000, 3072, padding_idx=0)
        (layers): ModuleList(
          (0-27): 28 x GemmaDecoderLayer(
            (self_attn): GemmaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=3072, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3072, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base

Trainable parameters: *(The amount of trainable parameters increases with a higher lora r)*

In [182]:
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

Trainable: 200015872 | total: 8737696768 | Percentage: 2.2891%


## Load dataset

In [183]:
df = pd.read_json(Config.DATASET_PATH)
display(df.head())
print(len(df))

Unnamed: 0,lang,src,count,header,content,instructions,output
0,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Was denkst du über die Herausforderungen, vor ...","Die Herausforderungen, vor denen weise Mensche..."
1,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Was sind die Fähigkeiten oder Eigenschaften, d...","Ein weiser Mensch der heutigen Zeit benötigt, ..."
2,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Wie gelingt es einem weisen Menschen, sich in ...",Ein weiser Mensch vollbringt seine Taten mit B...
3,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...",Was denkst du über die steigenden Erwartungen ...,Die steigenden Erwartungen an die Intelligenz ...
4,ger,https://www.projekt-gutenberg.org/gracian/orak...,2,Herz und Kopf:,die beiden Pole der Sonne unserer Fähigkeiten:...,Wie können wir unser Denken und Fühlen besser ...,Wohlgepflegtes Denken und Fühlen sind der Schl...


2396


In [184]:
dataset = Dataset.from_pandas(df)

## Train

In [185]:
def get_prompt(example):

    begin = 'You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.\n\n'
    instruct = f"### Instruct: {example['instructions']}\n\n"
    input = f"### Input:\n \"{example['header']} {example['content']}\"\n\n"
    output = f"### Response:\n {example['output']}\n\n"
    end = ''
    return begin + instruct + input + output + end

In [186]:
print(get_prompt(df.iloc[1300]))

You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.

### Instruct: How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Input:
 "xxvii Prize Intensity more than Extent. Excellence resides in quality not in quantity. The best is always few and rare: much lowers value. Even among men giants are commonly the real dwarfs. Some reckon books by the thickness, as if they were written to try the brawn more than the brain. Extent alone never rises above mediocrity: it is the misfortune of universal geniuses that in attempting to be at home everywhere, are so nowhere. Intensity gives eminence, and rises to the heroic in matters sublime."

### Response:
 The true value 

In [187]:
def get_inference_prompt(example):

    begin = 'You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.\n\n'
    instruct = f"### Instruct: {example['instructions']}\n\n"
    input = f"### Input:\n \"{example['header']} {example['content']}\"\n\n"
    output = f"### Response:\n"
    end = ''
    return begin + instruct + input + output + end

In [188]:
def formatting_func(example):
    return [get_prompt(example)]

Test the model before fine-tuning.

In [189]:
inputs = tokenizer(get_inference_prompt(df.iloc[1300]), return_tensors="pt").to(Config.DEVICE)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.

### Instruct: How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Input:
 "xxvii Prize Intensity more than Extent. Excellence resides in quality not in quantity. The best is always few and rare: much lowers value. Even among men giants are commonly the real dwarfs. Some reckon books by the thickness, as if they were written to try the brawn more than the brain. Extent alone never rises above mediocrity: it is the misfortune of universal geniuses that in attempting to be at home everywhere, are so nowhere. Intensity gives eminence, and rises to the heroic in matters sublime."

### Response:
 gentle reader, 

### Start the Trainer

In [190]:
wandb.init(project="Baltasar-Gracian-AI", entity="keboen-ttlab", name=Config.FINE_TUNED_MODEL)

In [191]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    max_seq_length=512,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        # eval_accumulation_steps=1,
        warmup_steps=2,
        max_steps=25,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir='outputs',
        optim='paged_adamw_8bit',
        report_to='wandb'
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    peft_config=lora_config,
    formatting_func=formatting_func,
)

Map: 100%|██████████████████████████████████████████████████| 2396/2396 [01:53<00:00, 21.20 examples/s]
  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


In [192]:
trainer.train()
wandb.finish()

  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
1,2.5584
2,2.5385
3,2.0899
4,1.5999
5,1.3647
6,1.1297
7,0.9036
8,0.6954
9,0.519
10,0.3596


0,1
train/epoch,▁▁▂▂▂▂▃▃▃▄▄▄▅▅▅▅▆▆▆▇▇▇▇███
train/global_step,▁▁▂▂▂▂▃▃▃▄▄▄▅▅▅▅▆▆▆▇▇▇▇███
train/grad_norm,▇▇▅▅▅▆▇▇██▆▂▂▃▂▂▂▁▁▁▁▁▁▁▁
train/learning_rate,▅██▇▇▇▆▆▆▆▅▅▅▄▄▄▃▃▃▃▂▂▂▁▁
train/loss,██▇▅▅▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁
train/train_runtime,▁
train/train_samples_per_second,▁
train/train_steps_per_second,▁

0,1
train/epoch,25.0
train/global_step,25.0
train/grad_norm,2239.19946
train/learning_rate,0.0
train/loss,0.0035
train/total_flos,1831971402547200.0
train/train_loss,0.57749
train/train_runtime,460.3261
train/train_samples_per_second,0.434
train/train_steps_per_second,0.054


In [193]:
model.save_pretrained('models/' + Config.FINE_TUNED_MODEL)

## Inference Test

In [206]:
model.eval()
prompt = get_inference_prompt(df.iloc[2001])
print(prompt)

You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.

### Instruct: How can one balance eloquence in speech with the integrity of their actions to embody true character?

### Input:
 "ccii Words and Deeds make the Perfect Man. One should speak well and act honourably: the one is an excellence of the head, the other of the heart, and both arise from nobility of soul. Words are the shadows of deeds; the former are feminine, the latter masculine. It is more important to be renowned than to convey renown. Speech is easy, action hard. Actions are the stuff of life, words its frippery. Eminent deeds endure, striking words pass away. Actions are the fruit of thought; if this is wise, they are effective."

### Response:



In [207]:
inputs = tokenizer(prompt, return_tensors='pt').to(Config.DEVICE)
print(len(inputs[0]))

207


In [208]:
outputs = model.generate(**inputs, max_length=512)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(generated_text)

<bos>You are Baltasar Gracian, a 17th century spanish philosopher. Below is an instruction that describes a task, alongside a possible input that contains excerpts of your own literature. Write a response in first person perspective as Baltasar Gracian that appropriately completes the request.

### Instruct: How can one balance eloquence in speech with the integrity of their actions to embody true character?

### Input:
 "ccii Words and Deeds make the Perfect Man. One should speak well and act honourably: the one is an excellence of the head, the other of the heart, and both arise from nobility of soul. Words are the shadows of deeds; the former are feminine, the latter masculine. It is more important to be renowned than to convey renown. Speech is easy, action hard. Actions are the stuff of life, words its frippery. Eminent deeds endure, striking words pass away. Actions are the fruit of thought; if this is wise, they are effective."

### Response:
<eos> I am Baltasar Gracian, a man o