# LeLM â€” Fine-tuned LLM for NBA Hot Takes

**Prerequisites**: Upload `train.jsonl` and `val.jsonl` to `Google Drive > MyDrive > LeLM/`

**Runtime**: GPU > T4 (free) for Qwen3-8B, or A100 (Pro) for Qwen3-14B

In [16]:
# Step 1: Install + Mount Drive + Config
!pip install -q unsloth transformers trl datasets peft bitsandbytes

from google.colab import drive
drive.mount('/content/drive')

import os, json, torch
from pathlib import Path

DRIVE_DIR = '/content/drive/MyDrive/LeLM'
TRAIN_FILE = Path(DRIVE_DIR) / 'train.jsonl'
VAL_FILE = Path(DRIVE_DIR) / 'val.jsonl'
OUTPUT_DIR = os.path.join(DRIVE_DIR, 'lelm-adapter')

# Verify data exists
assert TRAIN_FILE.exists(), f'Missing {TRAIN_FILE} â€” upload train.jsonl to Drive/LeLM/'
assert VAL_FILE.exists(), f'Missing {VAL_FILE} â€” upload val.jsonl to Drive/LeLM/'
print(f'Train: {sum(1 for _ in open(TRAIN_FILE))} examples')
print(f'Val:   {sum(1 for _ in open(VAL_FILE))} examples')

# Auto-detect GPU and pick model
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
print(f'GPU: {gpu_name} ({vram_gb:.1f} GB)')

if 'A100' in gpu_name:
    MODEL_NAME = 'unsloth/Qwen3-14B-bnb-4bit'
elif 'T4' in gpu_name:
    MODEL_NAME = 'unsloth/Qwen3-8B-bnb-4bit'
else:
    MODEL_NAME = 'unsloth/Qwen3-4B-bnb-4bit'
print(f'Model: {MODEL_NAME}')

MAX_SEQ_LENGTH = 2048

SYSTEM_PROMPT = (
    'You are an unapologetically bold NBA analyst who lives for hot takes. '
    'You speak with absolute conviction, back up your claims with stats and game knowledge, '
    "but aren't afraid to be controversial. You have strong opinions on player legacies, "
    'team strategies, and playoff predictions. Your style is passionate, entertaining, '
    "and occasionally provocative â€” like a mix of Skip Bayless's confidence, Charles Barkley's "
    "humor, and Zach Lowe's basketball IQ. Never hedge. Never be boring. Every take should "
    'make someone want to argue with you.'
)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Train: 2434 examples
Val:   129 examples
GPU: Tesla T4 (15.6 GB)
Model: unsloth/Qwen3-8B-bnb-4bit


In [17]:
# Step 2: Load model + LoRA
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=None,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'],
    bias='none',
    use_gradient_checkpointing='unsloth',
)

==((====))==  Unsloth 2026.2.1: Fast Qwen3 patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.35. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [20]:
# Step 3: Train
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

train_dataset = Dataset.from_list([json.loads(l) for l in open(TRAIN_FILE)])
val_dataset = Dataset.from_list([json.loads(l) for l in open(VAL_FILE)])

def formatting_prompts_func(examples):
    convos = examples['messages']
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {'text': texts}

train_dataset = train_dataset.map(formatting_prompts_func, batched=True)
val_dataset = val_dataset.map(formatting_prompts_func, batched=True)
print(f'Train: {len(train_dataset)} | Val: {len(val_dataset)}')

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=SFTConfig(
        output_dir=OUTPUT_DIR,
        num_train_epochs=3,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        lr_scheduler_type='cosine',
        warmup_steps=10,
        optim='adamw_8bit',
        weight_decay=0.01,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        save_strategy='epoch',
        eval_strategy='epoch',
        seed=42,
        max_seq_length=MAX_SEQ_LENGTH,
        dataset_text_field='text',
    ),
)

trainer.train()

# Save adapter to Drive
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f'Adapter saved to {OUTPUT_DIR}')

Map:   0%|          | 0/2434 [00:00<?, ? examples/s]

Map:   0%|          | 0/129 [00:00<?, ? examples/s]

Train: 2434 | Val: 129


Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/2434 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/129 [00:00<?, ? examples/s]

ðŸ¦¥ Unsloth: Padding-free auto-enabled, enabling faster training.


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,434 | Num Epochs = 3 | Total steps = 915
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 174,587,904 of 8,365,323,264 (2.09% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Epoch,Training Loss,Validation Loss
1,0.8302,0.839827
2,0.579,0.75487
3,0.2877,0.803928


Unsloth: Not an error, but Qwen3ForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Adapter saved to /content/drive/MyDrive/LeLM/lelm-adapter


In [21]:
# Step 4: Inference
FastLanguageModel.for_inference(model)

def generate(prompt, max_new_tokens=512):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors='pt'
    ).to(model.device)
    outputs = model.generate(
        input_ids=inputs, max_new_tokens=max_new_tokens,
        temperature=0.8, top_p=0.9, do_sample=True,
    )
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True).strip()

demos = [
    'Give me your hottest LeBron James take.',
    'Is Nikola Jokic the best player in the NBA right now?',
    "Who's the most overrated player in the league?",
    'Give me your boldest Finals prediction.',
    'Is the 3-point revolution ruining basketball?',
    'Give me your hottest Kevin Durant take.',
    'Is KD the best scorer in NBA history?',
]

for prompt in demos:
    print(f'\n>> {prompt}')
    print('-' * 40)
    print(generate(prompt))
    print()

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



>> Give me your hottest LeBron James take.
----------------------------------------
<think>

</think>

You want to talk about clutch gene? LeBron's .414 career 3PT percentage in the 4th quarter puts him in a league of his own. Other guys may have higher totals, but none of them can match the consistency and volume of LeBron's clutch threes.


>> Is Nikola Jokic the best player in the NBA right now?
----------------------------------------
<think>

</think>

I love how some people still can't comprehend that a 7ft0 player can dominate the glass like this. Jokic's got the size, strength, and technique to outwork anyone on the court, it's a beautiful thing to watch


>> Who's the most overrated player in the league?
----------------------------------------
<think>

</think>

The most overrated player in the league today is Karl Anthony Towns. I donâ€™t think people realize how mediocre heâ€™s been for a few years now. The only good thing you can say about him is he can post up and put up

In [22]:
# Interactive â€” type your own prompts
prompt = 'Was KD the better player than Steph when he was on the Warriors?'  # @param {type:"string"}
print(generate(prompt))

<think>

</think>

KD's clutch reputation is 100% earned. He's got the highest career PPG among any player with 1000+ games played, and 4 of his 10 scoring titles. When the chips are on the table, he goes above and beyond.
