In [1]:
from unsloth import FastLanguageModel
import torch
import json

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
models = [
    "unsloth/Mistral-Small-3.1-24B-Base-2503",
    "unsloth/Qwen3-32B",
    "unsloth/gemma-3-27b-it",
    "unsloth/Llama-3.3-70B-Instruct"
] 

dtype = None
load_in_4bit = False
max_seq_length = 8192

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = models[1],
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.11.2: Fast Qwen3 patching. Transformers: 4.57.1.
   \\   /|    NVIDIA H100 PCIe. Num GPUs = 1. Max memory: 79.189 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 9.0. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    
    qat_scheme = "int4",  # Options: "int4", "fp8-int4", "fp8-fp8", "int8-int4"
)

Unsloth: Applying QAT to mitigate quantization degradation


Unsloth 2025.11.2 patched 64 layers with 64 QKV layers, 64 O layers and 64 MLP layers.
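
The trainable-parameter count that the training banner reports further down (268,435,456) can be reproduced by hand from `r = 32` and the seven target modules. A rough sketch follows. The layer dimensions are an assumption taken from the published Qwen3-32B configuration (hidden size 5120, 64 query heads and 8 key/value heads of dimension 128, MLP width 25600, 64 layers); none of them are printed in this notebook.

```python
# LoRA adds two matrices per adapted linear layer: A (r x in) and B (out x r),
# which gives r * (in + out) trainable parameters per layer.
# ASSUMPTION: these dimensions come from the published Qwen3-32B config,
# not from anything printed in this notebook.
HIDDEN = 5120
Q_DIM = 64 * 128       # query heads * head dimension
KV_DIM = 8 * 128       # key/value heads * head dimension
MLP_DIM = 25600
NUM_LAYERS = 64
R = 32                 # same rank as passed to get_peft_model

# (in_features, out_features) for each target module
SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Number of trainable parameters LoRA adds to one linear layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per decoder layer, {total:,} in total")
```

Under those assumed dimensions the total agrees with the figure in the training banner, which suggests every one of the seven projections in all 64 layers received an adapter.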


In [5]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "qwen-2.5",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, 
            tokenize=False, 
            add_generation_prompt=False
        )
        for convo in convos
    ]
    return {"text": texts}
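
For intuition, the string that `apply_chat_template` returns for a ChatML-style template looks roughly like the output of the helper below. This is a simplified stand-in written for illustration only: `render_chatml` is a hypothetical name, and the real template is a Jinja template attached to the tokenizer, which may also inject a default system message.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages in ChatML form (simplified illustration)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete at inference.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "What makes a good business?"},
    {"role": "assistant", "content": "Durable economics."},
]
print(render_chatml(example))
```

Training text uses `add_generation_prompt=False` because the assistant turn is already present; the open assistant header is only wanted when generating.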

In [6]:
from datasets import load_dataset
dataset = load_dataset(
    "json",
    data_files = "/home/ubuntu/AlphaBuffett/Dataset/Processed/Ground Truth/dataset_combined.json",
    split = "train",
)

In [7]:
dataset[5]['conversations']

[{'from': 'human',
  'value': 'How do you approach the challenge of maintaining discipline when market conditions soften and competitors start offering unrealistic rates?'},
 {'from': 'gpt',
  'value': "This is one of the most difficult challenges in our insurance operations, and it requires what I call unusual managerial discipline. You see, when markets loosen and rates become inadequate, the natural institutional behavior is to fight to keep your business. Nobody likes to see the other fellow take away their customers. But we've learned that you sometimes have to philosophically accept reduced volume. Look at what happened with National Indemnity - we've had excellent underwriting margins and large volume gains when many competitors contracted or withdrew after the 1974-75 crisis. But these conditions won't last forever. When the market softens, we have to be willing to let business go when prices become foolish. It's like being the designated driver at a party - you might not be ha

In [8]:
from unsloth.chat_templates import standardize_sharegpt

dataset = standardize_sharegpt(dataset)
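
The before/after printouts around this cell show the visible effect of `standardize_sharegpt`: `from`/`value` keys become `role`/`content`, with `human` mapped to `user` and `gpt` to `assistant`. A minimal sketch of that mapping, assuming only those two speaker names need translating (`to_role_content` is a hypothetical helper, not Unsloth's implementation):

```python
# Speaker names seen in this dataset and the roles they end up with.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def to_role_content(conversation):
    """Convert ShareGPT-style turns to role/content dictionaries."""
    return [
        # Unknown speaker names are passed through unchanged (an assumption).
        {"role": ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in conversation
    ]


sharegpt = [
    {"from": "human", "value": "How do you evaluate a company?"},
    {"from": "gpt", "value": "Start with the economics of the business."},
]
print(to_role_content(sharegpt))
```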

In [9]:
dataset[5]['conversations']

[{'content': 'How do you approach the challenge of maintaining discipline when market conditions soften and competitors start offering unrealistic rates?',
  'role': 'user'},
 {'content': "This is one of the most difficult challenges in our insurance operations, and it requires what I call unusual managerial discipline. You see, when markets loosen and rates become inadequate, the natural institutional behavior is to fight to keep your business. Nobody likes to see the other fellow take away their customers. But we've learned that you sometimes have to philosophically accept reduced volume. Look at what happened with National Indemnity - we've had excellent underwriting margins and large volume gains when many competitors contracted or withdrew after the 1974-75 crisis. But these conditions won't last forever. When the market softens, we have to be willing to let business go when prices become foolish. It's like being the designated driver at a party - you might not be having as much f

In [10]:
dataset = dataset.map(formatting_prompts_func, batched=True)

In [11]:
dataset[5]

{'conversations': [{'content': 'How do you approach the challenge of maintaining discipline when market conditions soften and competitors start offering unrealistic rates?',
   'role': 'user'},
  {'content': "This is one of the most difficult challenges in our insurance operations, and it requires what I call unusual managerial discipline. You see, when markets loosen and rates become inadequate, the natural institutional behavior is to fight to keep your business. Nobody likes to see the other fellow take away their customers. But we've learned that you sometimes have to philosophically accept reduced volume. Look at what happened with National Indemnity - we've had excellent underwriting margins and large volume gains when many competitors contracted or withdrew after the 1974-75 crisis. But these conditions won't last forever. When the market softens, we have to be willing to let business go when prices become foolish. It's like being the designated driver at a party - you might not

In [None]:
#EARLY STOPPING / TEST SIZE / EPOCHS / LEARNING RATE

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq, EarlyStoppingCallback
from unsloth import is_bfloat16_supported

# If load_dataset returned a DatasetDict (a dict of splits), keep only the train split
if isinstance(dataset, dict):
    dataset = dataset["train"]

# Now split it
dataset_split = dataset.train_test_split(test_size=0.15, seed=42)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset_split["train"],  # 85% for training
    eval_dataset = dataset_split["test"],    # 15% for validation
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.05,
        num_train_epochs = 2,
        learning_rate = 1e-4,
        eval_strategy = "steps",
        eval_steps = 50,
        save_strategy = "steps",
        save_steps = 50,
        save_total_limit = 3,
        load_best_model_at_end = True,
        metric_for_best_model = "eval_loss",
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs/qwen-buffett-1epoch",
        report_to = "none",
    ),
    callbacks = [EarlyStoppingCallback(early_stopping_patience=3)],
)

In [12]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 2,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

In [13]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user",
    response_part="<|im_start|>assistant",
)
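
The idea behind response-only training is that every token outside an assistant turn gets the label `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, so it contributes nothing to the gradient. The toy below illustrates that on integer IDs. It is not Unsloth's implementation: the real markers are multi-token strings such as `<|im_start|>assistant`, whereas this sketch assumes each marker is a single ID, and `mask_labels` is a hypothetical name.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, instruction_id, response_id):
    """Copy input_ids to labels, masking everything outside response spans."""
    labels = []
    in_response = False
    for tok in input_ids:
        if tok == response_id:
            in_response = True
            labels.append(IGNORE_INDEX)   # the marker itself is not a target
        elif tok == instruction_id:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok if in_response else IGNORE_INDEX)
    return labels


USER, ASSISTANT = 1, 2                          # toy marker IDs
ids = [USER, 10, 11, ASSISTANT, 20, 21, USER, 12, ASSISTANT, 22]
print(mask_labels(ids, USER, ASSISTANT))
```

Only the IDs that follow an assistant marker survive as labels, which is the same pattern the decoded labels a few cells below show: blanks for the prompt, then the answer text.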

In [14]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nHow do you approach the challenge of maintaining discipline when market conditions soften and competitors start offering unrealistic rates?<|im_end|>\n<|im_start|>assistant\nThis is one of the most difficult challenges in our insurance operations, and it requires what I call unusual managerial discipline. You see, when markets loosen and rates become inadequate, the natural institutional behavior is to fight to keep your business. Nobody likes to see the other fellow take away their customers. But we've learned that you sometimes have to philosophically accept reduced volume. Look at what happened with National Indemnity - we've had excellent underwriting margins and large volume gains when many competitors contracted or withdrew after the 1974-75 crisis. But these conditions won't last forever. When the market softens, we have to be willing to let business go when pri

In [15]:
# Get a sample from the trainer's dataset
sample = trainer.train_dataset[5]
labels = sample["labels"]

# Now check what's being trained on
space = tokenizer(text=" ", add_special_tokens=False).input_ids[0]
filtered = [space if x == -100 else x for x in labels]
tokenizer.decode(filtered)

"                                                \nThis is one of the most difficult challenges in our insurance operations, and it requires what I call unusual managerial discipline. You see, when markets loosen and rates become inadequate, the natural institutional behavior is to fight to keep your business. Nobody likes to see the other fellow take away their customers. But we've learned that you sometimes have to philosophically accept reduced volume. Look at what happened with National Indemnity - we've had excellent underwriting margins and large volume gains when many competitors contracted or withdrew after the 1974-75 crisis. But these conditions won't last forever. When the market softens, we have to be willing to let business go when prices become foolish. It's like being the designated driver at a party - you might not be having as much fun as everyone else, but you're making the right long-term decision. The key is maintaining that discipline even when it runs counter to w

In [16]:
# Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA H100 PCIe. Max memory = 79.189 GB.
63.562 GB of memory reserved.


In [17]:
import os
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
trainer_stats = trainer.train()

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 7,265 | Num Epochs = 2 | Total steps = 1,818
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 268,435,456 of 33,030,558,720 (0.81% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,2.4445
2,2.4669
3,2.3668
4,2.3363
5,2.313
6,2.1356
7,2.1784
8,2.0922
9,2.1583
10,1.8572


KeyboardInterrupt: 
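
The step count in the banner above follows from the dataset size and the batch settings. The sketch below is arithmetic that reproduces the logged numbers (7,265 examples, batch size 2, 4 accumulation steps, 1 GPU, 2 epochs); it is not a claim about how the trainer computes them internally, and `total_optimizer_steps` is a hypothetical helper.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus, epochs):
    """Optimizer steps for a run, rounding partial batches up."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


effective_batch = 2 * 4 * 1          # per-device batch * accumulation * GPUs
steps = total_optimizer_steps(7265, 2, 4, 1, 2)
print(f"effective batch = {effective_batch}, total steps = {steps}")
```

With 1,818 steps planned, the interrupt after step 10 means the adapter saw only a small fraction of one epoch before the inference tests that follow.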

In [18]:
# Set model to inference mode
FastLanguageModel.for_inference(model)
from transformers import TextStreamer

# Enable streaming
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Test questions
test_questions = [
    "What is your investment philosophy?",
    "How do you evaluate a company?",
    "What makes a good business?"
]

for question in test_questions:
    print(f"\n{'='*50}")
    print(f"Q: {question}")
    print(f"{'='*50}")
    print("A: ", end="")
    
    # Use the tokenizer's apply_chat_template
    messages = [{"role": "user", "content": question}]
    
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    # Tokenize
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        add_special_tokens=False
    ).to("cuda")
    
    # Generate
    outputs = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=1024,
        temperature=0.75,
        top_p=0.92,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    print("\n")


Q: What is your investment philosophy?
A: Our investment philosophy is really quite simple - we look for good businesses at fair prices and then hold them for the long term. We believe in what I call 

KeyboardInterrupt: 

In [18]:
# Set model to inference mode
FastLanguageModel.for_inference(model)
from transformers import TextStreamer

# Enable streaming
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Simulate a multi-turn conversation where follow-up relies on previous context
conversation = [
    {"role": "user", "content": "What is your investment philosophy?"},
    {"role": "assistant", "content": "Our approach is really quite simple - we're looking for large businesses with understandable, enduring and what I like to call 'mouth-watering' economics. But it's not just about the business itself. We want these companies to be run by able and shareholder-oriented managements. And here's something crucial - we need a sensible price tag attached. We stick to what we understand and what has worked for us over time."},
    {"role": "user", "content": "Can you give me a specific example of a company that demonstrates this philosophy?"},
    {"role": "assistant", "content": "Look at See's Candy. When we bought it in 1972 for $25 million, it had exactly those characteristics I just mentioned - understandable business, mouth-watering economics, and great management. The brand had real pricing power, and customers loved it. Since then, we've put $40 million into it and it's given us over $1 billion pretax."},
    {"role": "user", "content": "You mentioned See's has given you over $1 billion pretax on a $25 million investment. What specifically about their pricing power made those economics so attractive? And how much of that $40 million you put in was actually needed versus optional expansion?"}
]

print("="*60)
print("MULTI-TURN CONVERSATION TEST")
print("="*60)
print("\n[Previous conversation about investment philosophy and See's Candy...]")
print(f"\nUser: {conversation[-1]['content']}")
print("Warren: ", end="")

prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

outputs = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    temperature=0.75,
    top_p=0.92,
    repetition_penalty=1.1,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)
print("\n")

MULTI-TURN CONVERSATION TEST

[Previous conversation about investment philosophy and See's Candy...]

User: You mentioned See's has given you over $1 billion pretax on a $25 million investment. What specifically about their pricing power made those economics so attractive? And how much of that $40 million you put in was actually needed versus optional expansion?
Warren: Well, let me tell you about that fascinating story. See's was already doing $38 million in sales when we bought them. That's pretty remarkable considering they were only open four months out of the year back then! Here's the key thing: while competitors tried to compete on price, See's had the luxury of competing on quality instead. They could sell at premium prices because people genuinely wanted their products. Now regarding our investments - of that $40 million we've put in since, maybe around $6-7 million was necessary to fund working capital needs due to their seasonal nature. But most of it went toward opening new

In [20]:
from torchao.quantization import quantize_
from torchao.quantization.qat import QATConfig
quantize_(model, QATConfig(step = "convert"))
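
As background for the QAT steps, the sketch below shows the general idea of int4 "fake quantization": weights are rounded to 16 integer levels and immediately scaled back, so training sees the rounding error it will face after real quantization. This is a conceptual illustration under simplifying assumptions (one symmetric scale for the whole list, pure Python), not torchao's implementation, which has its own grouping and scale handling. `fake_quantize_int4` is a hypothetical name.

```python
def fake_quantize_int4(weights):
    """Round weights to signed 4-bit codes and map them back to floats.

    Returns (dequantized_weights, integer_codes, scale).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Symmetric scale: the largest magnitude maps to code 7 (an assumption).
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    dequantized = [c * scale for c in codes]
    return dequantized, codes, scale


weights = [0.7, -0.3, 0.12, 0.0]
dequantized, codes, scale = fake_quantize_int4(weights)
for w, c, d in zip(weights, codes, dequantized):
    print(f"{w:+.3f} -> code {c:+d} -> {d:+.3f}")
```

Each reconstructed value differs from the original by at most half a quantization step, which is the error the adapter learns to tolerate during QAT.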

In [21]:
from torchao.quantization import Int4WeightOnlyConfig

In [22]:
import os
hf_key = os.environ["HF_TOKEN"]  # read the token from the environment instead of hardcoding it in the notebook
repo = "BuffettBot-QwenQAT"

In [23]:
from huggingface_hub import delete_repo, create_repo

delete_repo(
    repo_id=repo,
    token=hf_key,
    missing_ok = True
)

# Create the repo
create_repo(
    repo_id=repo,
    token=hf_key,
    private=True,
    repo_type="model"
)

RepoUrl('https://huggingface.co/brokenlander/BuffettBot-QwenQAT', endpoint='https://huggingface.co', repo_type='model', repo_id='brokenlander/BuffettBot-QwenQAT')

In [26]:
model.save_pretrained_torchao(
    "brokenlander/" + repo,
    tokenizer,
    torchao_config = Int4WeightOnlyConfig(),
    push_to_hub = True,
    token = hf_key,
    commit_message = "Initial Commit",
    commit_description = "QAT fine-tuned model",
)

TypeError: unsloth_save_pretrained_torchao() got an unexpected keyword argument 'commit_message'
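
The `TypeError` says the installed `save_pretrained_torchao` does not accept `commit_message`, so the most direct fix is to drop that argument. `commit_description` may need to go as well, although the traceback does not confirm that. As a more defensive option, the sketch below filters a keyword dictionary down to what a callable's signature actually accepts. `filter_supported_kwargs` and `demo_save` are hypothetical names used for illustration; the demo function stands in for the real method so the block runs without a model.

```python
import inspect


def filter_supported_kwargs(func, kwargs):
    """Return only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def demo_save(path, push_to_hub=False, token=None):
    """Stand-in for a save method with a fixed set of keywords."""
    return (path, push_to_hub, token)


wanted = {"push_to_hub": True, "token": "secret", "commit_message": "Initial Commit"}
safe = filter_supported_kwargs(demo_save, wanted)
dropped = sorted(set(wanted) - set(safe))
print("passing:", sorted(safe), "| dropped:", dropped)
demo_save("some/repo", **safe)
```

Printing the dropped names keeps the behavior visible, so a silently ignored option does not go unnoticed.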