# About

The third notebook in the pipeline.

In this notebook, we utilize the enriched datasets to train the first (and possibly second) stage of the AI. For this notebook to run properly, use the following **exact** package versions:

```
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
```

because bitsandbytes, accelerate are bitch packages to work with.

-------------------

Useful resources:
- https://huggingface.co/blog/gemma-peft

In [1]:
import os
import torch
import polars as pl
import wandb
import pandas as pd
import bitsandbytes as bnb

from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, DataCollatorForLanguageModeling
from trl import SFTTrainer
from transformers import BitsAndBytesConfig
from datasets import Dataset

  from .autonotebook import tqdm as notebook_tqdm


## Load Model

In [2]:
class Config:
    DATASET_PATH = 'src/the_art_of_worldly_wisdom_enriched_v2.json'
    MODEL_ID = 'google/gemma-7b-it'
    DEVICE = 'cuda:0'
    FINE_TUNED_MODEL = 'google-gemma-7b-it-test'
    HF_TOKEN = ''

os.environ["HF_TOKEN"] = Config.HF_TOKEN 

# set the qunatization configs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [3]:
model = AutoModelForCausalLM.from_pretrained(Config.MODEL_ID, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(Config.MODEL_ID, add_eos_token=True)

Loading checkpoint shards: 100%|███| 4/4 [00:08<00:00,  2.03s/it]


### Setup State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods

Sources:
- https://github.com/huggingface/peft

In [4]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
print(model)

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072, padding_idx=0)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
   

In [5]:
def find_all_linear_names(model):
  cls = bnb.nn.Linear4bit #if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
  lora_module_names = set()
  for name, module in model.named_modules():
    if isinstance(module, cls):
      names = name.split('.')
      lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if 'lm_head' in lora_module_names:
      lora_module_names.remove('lm_head')
  return list(lora_module_names)

modules = find_all_linear_names(model)
print(modules)

['down_proj', 'v_proj', 'q_proj', 'o_proj', 'up_proj', 'gate_proj', 'k_proj']


In [6]:
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

In [7]:
model = get_peft_model(model, lora_config)
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GemmaForCausalLM(
      (model): GemmaModel(
        (embed_tokens): Embedding(256000, 3072, padding_idx=0)
        (layers): ModuleList(
          (0-27): 28 x GemmaDecoderLayer(
            (self_attn): GemmaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=3072, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3072, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base

Trainable parameters: *(The amount of trainable parameters increases with a higher lora r)*

In [8]:
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

Trainable: 200015872 | total: 8737696768 | Percentage: 2.2891%


## Load dataset

In [9]:
df = pd.read_json(Config.DATASET_PATH)
display(df.head())
print(len(df))

Unnamed: 0,lang,src,count,header,content,instructions,output
0,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Was denkst du über die Herausforderungen, vor ...","Die Herausforderungen, vor denen weise Mensche..."
1,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Was sind die Fähigkeiten oder Eigenschaften, d...","Ein weiser Mensch der heutigen Zeit benötigt, ..."
2,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...","Wie gelingt es einem weisen Menschen, sich in ...",Ein weiser Mensch vollbringt seine Taten mit B...
3,ger,https://www.projekt-gutenberg.org/gracian/orak...,1,"Alles hat heut zu Tage seinen Gipfel erreicht,","aber die Kunst sich geltend zu machen, den höc...",Was denkst du über die steigenden Erwartungen ...,Die steigenden Erwartungen an die Intelligenz ...
4,ger,https://www.projekt-gutenberg.org/gracian/orak...,2,Herz und Kopf:,die beiden Pole der Sonne unserer Fähigkeiten:...,Wie können wir unser Denken und Fühlen besser ...,Wohlgepflegtes Denken und Fühlen sind der Schl...


2396


In [10]:
dataset = Dataset.from_pandas(df)

## Train

In [11]:
def get_prompt(example):

    begin = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n'
    instruct = '### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.\n\n'
    question = f"### Input:\n {example['instructions']}\n\n"
    output = f"### Output:\n {example['output']}\n\n"
    end = ''
    return begin + instruct + question + output + end

In [12]:
print(get_prompt(df.iloc[1300]))

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.

### Input:
 How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Output:
 The true value of a work or a person's talents is best gauged by their depth and quality, for excellence doth reside in the rare and the profound, rather than the plentiful. Indeed, the best is always few and far between; a multitude devalues the singularly sublime. To seek quantity is to court mediocrity, whilst true eminence lies in the intensity of labor—a measure that lifts one to the heights of the heroic. Thus, let us prize that which is intense above that which is extensive, for in the finite shall we find the profound.




In [13]:
def get_inference_prompt(example):

    begin = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n'
    instruct = '### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.\n\n'
    question = f"### Input:\n {example['instructions']}\n\n"
    output = f"### Output:\n"
    end = ''
    return begin + instruct + question + output + end

In [14]:
def formatting_func(example):
    return [get_prompt(example)]

Test the model before fine-tuning.

In [15]:
inputs = tokenizer(get_inference_prompt(df.iloc[1300]), return_tensors="pt").to(Config.DEVICE)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.

### Input:
 How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Output:



### Start the Trainer

In [16]:
wandb.init(project="Baltasar-Gracian-AI", entity="keboen-ttlab")

[34m[1mwandb[0m: Currently logged in as: [33mkeboen[0m ([33mkeboen-ttlab[0m). Use [1m`wandb login --relogin`[0m to force relogin


In [17]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    max_seq_length=512,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        # eval_accumulation_steps=1,
        warmup_steps=2,
        max_steps=25,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir='outputs',
        optim='paged_adamw_8bit',
        report_to='wandb'
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    peft_config=lora_config,
    formatting_func=formatting_func,
)

Map: 100%|████████████| 2396/2396 [00:49<00:00, 48.21 examples/s]
  self.scaler = torch.cuda.amp.GradScaler(**kwargs)


In [18]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
1,2.5111
2,2.488
3,2.0495
4,1.5769
5,1.3549
6,1.1266
7,0.9022
8,0.7041
9,0.517
10,0.3714


TrainOutput(global_step=25, training_loss=0.5731383980531245, metrics={'train_runtime': 463.9783, 'train_samples_per_second': 0.431, 'train_steps_per_second': 0.054, 'total_flos': 1831971402547200.0, 'train_loss': 0.5731383980531245, 'epoch': 25.0})

In [19]:
model.save_pretrained('models/' + Config.FINE_TUNED_MODEL)

## Inference Test

In [20]:
model.eval()
prompt = get_inference_prompt(df.iloc[1300])
print(prompt)

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.

### Input:
 How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Output:



In [21]:
inputs = tokenizer(prompt, return_tensors='pt').to(Config.DEVICE)
print(len(inputs[0]))

88


In [22]:
outputs = model.generate(**inputs, max_length=512)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(generated_text)

<bos>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Answer or comment on the given input below as Baltasar Gracian, a 17th century philosopher.

### Input:
 How can we evaluate the true value of a work or a person's talents—should we focus more on their depth and quality rather than their quantity?

### Output:
<eos>In evaluating a person or work, it is essential to consider both their depth and quality, rather than solely relying on their quantity. While quantity can be impressive, it is ultimately superficial. To truly gauge a person's talents or the quality of a work, we must delve into their innermost layers, exploring their depth and substance. Therefore, it is wiser to focus on the intrinsic worth of a thing, rather than its outward facade. In doing so, we can uncover the true treasures that lie hidden within.<eos>
