# Customer Support LLM v2 — Unsloth QLoRA (SFT + LoRA)

This notebook fine-tunes a model with **SFT + LoRA/QLoRA** on a synthetic dataset that I created.


> Dataset format: JSONL (each line has `instruction`, `input`, `output`, and an optional `meta`)


In [None]:
# An A100 was chosen as the GPU.
import torch
print("CUDA:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))


CUDA: True
GPU: NVIDIA A100-SXM4-40GB


## 1) Setup

In [None]:
!pip install -U pip setuptools wheel
!pip install -U "unsloth[colab]"
!pip install -U datasets accelerate peft trl transformers gradio
!pip install -U bitsandbytes

Collecting unsloth[colab]
  Using cached unsloth-2026.1.2-py3-none-any.whl.metadata (66 kB)
Collecting xformers>=0.0.27.post2 (from unsloth[colab])
  Using cached xformers-0.0.33.post2-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting bitsandbytes!=0.46.0,!=0.48.0,>=0.45.5 (from unsloth[colab])
  Using cached bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting torch>=2.4.0 (from unsloth[colab])
  Using cached torch-2.9.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch>=2.4.0->unsloth[colab])
  Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch>=2.4.0->unsloth[colab])
  Using cached nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch>=2.4.0->unsloth[c

Collecting bitsandbytes
  Using cached bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl (59.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.1/59.1 MB 57.1 MB/s 0:00:01
ERROR: Operation cancelled by user
^C


## 2) Mount Drive and Load the v2 Dataset

In [None]:
from google.colab import drive
drive.mount("/content/drive")


Mounted at /content/drive


In [None]:
from datasets import load_dataset

# Dataset path
DATA_PATH = "/content/drive/MyDrive/TwitterCustomer/customer_support.jsonl"

raw_ds = load_dataset("json", data_files=DATA_PATH, split="train")
raw_ds = raw_ds.shuffle(seed=42)
print(raw_ds)
print(raw_ds[0])


Dataset({
    features: ['instruction', 'input', 'output', 'meta'],
    num_rows: 20000
})
{'instruction': 'How do I stop the subscription renewal?', 'input': '', 'output': "I understand how frustrating that can be. I can guide you through cancellation. We'll stay with you until this is resolved.", 'meta': {'scenario': 'cancel_subscription', 'created_at': '2026-01-14T12:50:47.207949Z'}}
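
The record printed above follows the JSONL schema described at the top of the notebook. Below is a minimal sketch of a per-line validator, assuming the three required keys must be strings and that `meta`, when present, is a JSON object. The name `validate_record` is hypothetical and not part of the notebook.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def validate_record(line: str) -> dict:
    """Parse one JSONL line and check it against the assumed schema."""
    record = json.loads(line)
    for key in REQUIRED_KEYS:
        if not isinstance(record.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    # `meta` is optional; when present we assume it is a JSON object.
    if "meta" in record and not isinstance(record["meta"], dict):
        raise ValueError("meta must be an object when present")
    return record

sample = json.dumps({
    "instruction": "How do I stop the subscription renewal?",
    "input": "",
    "output": "I can guide you through cancellation.",
    "meta": {"scenario": "cancel_subscription"},
})
record = validate_record(sample)
print(sorted(record))
```

Running a check like this over the file before `load_dataset` can surface malformed rows early, although the loader itself will also fail on invalid JSON.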


## 3) Build the `text` field for SFT (prompt template)

To train the model, we convert each example into a single text field.


In [None]:
def to_text(example):
    instr = example["instruction"]
    inp   = example.get("input", "")
    out   = example["output"]

    if inp and str(inp).strip():
        text = f"""### Instruction:
{instr}

### Input:
{inp}

### Response:
{out}"""
    else:
        text = f"""### Instruction:
{instr}

### Response:
{out}"""
    return {"text": text}

ds = raw_ds.map(to_text, remove_columns=raw_ds.column_names)
print(ds[0]["text"][:500])


### Instruction:
How do I stop the subscription renewal?

### Response:
I understand how frustrating that can be. I can guide you through cancellation. We'll stay with you until this is resolved.
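
The template above ends at the response text with no end-of-sequence marker. Whether the trainer appends one automatically depends on the TRL and Unsloth versions in use, so treat this as something to verify rather than a confirmed gap. Below is a minimal sketch of a variant that appends the marker explicitly. `to_text_with_eos` is a hypothetical helper, and `"<eos>"` is a placeholder for the value that `tokenizer.eos_token` would supply in the notebook.

```python
def to_text_with_eos(example: dict, eos_token: str) -> dict:
    """Same template as to_text, with an explicit end-of-sequence marker."""
    instr = example["instruction"]
    inp = example.get("input", "")
    parts = [f"### Instruction:\n{instr}"]
    # Include the Input block only when it carries non-whitespace text.
    if inp and str(inp).strip():
        parts.append(f"### Input:\n{inp}")
    parts.append(f"### Response:\n{example['output']}{eos_token}")
    return {"text": "\n\n".join(parts)}

# "<eos>" is a placeholder; the real value would be tokenizer.eos_token.
demo = to_text_with_eos(
    {
        "instruction": "How do I stop the subscription renewal?",
        "input": "",
        "output": "I can guide you through cancellation.",
    },
    eos_token="<eos>",
)
print(demo["text"])
```

Since the tokenizer is only loaded in section 4, using this variant would mean running the `map` call after the model is loaded.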


In [None]:
split = ds.train_test_split(test_size=0.05, seed=42)
train_ds = split["train"]
val_ds   = split["test"]
print(train_ds, val_ds)


Dataset({
    features: ['text'],
    num_rows: 19000
}) Dataset({
    features: ['text'],
    num_rows: 1000
})


## 4) Load Qwen/Qwen2.5-3B-Instruct with Unsloth in 4-bit (QLoRA)



In [None]:
!pip install -U unsloth

Collecting unsloth
  Using cached unsloth-2026.1.2-py3-none-any.whl.metadata (66 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Using cached xformers-0.0.33.post2-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting bitsandbytes!=0.46.0,!=0.48.0,>=0.45.5 (from unsloth)
  Using cached bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting datasets!=4.0.*,!=4.1.0,<4.4.0,>=3.4.1 (from unsloth)
  Using cached datasets-4.3.0-py3-none-any.whl.metadata (18 kB)
Collecting transformers!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,!=4.54.0,!=4.55.0,!=4.55.1,!=4.57.0,<=4.57.3,>=4.51.3 (from unsloth)
  Using cached transformers-4.57.3-py3-none-any.whl.metadata (43 kB)
Collecting trl!=0.19.0,<=0.24.0,>=0.18.2 (from unsloth)
  Using cached trl-0.24.0-py3-none-any.whl.metadata (11 kB)
Collecting torch>=2.4.0 (from unsloth)
  Using cached torch-2.9.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from to

In [None]:
from unsloth import FastLanguageModel

# Selected model
model_name = "Qwen/Qwen2.5-3B-Instruct"

max_seq_length = 1024

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    lora_alpha = 64,
    lora_dropout = 0.05,  # Unsloth fast-patches LoRA layers only when dropout is 0 (see the warning in the output)
    target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    use_gradient_checkpointing = "unsloth",
)

tokenizer.model_max_length = max_seq_length


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.1.2: Fast Qwen2 patching. Transformers: 4.57.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2026.1.2 patched 36 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
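
The `r = 32` setting and the seven target modules determine how many parameters are trained: each adapted linear layer with input width `d_in` and output width `d_out` adds `r * (d_in + d_out)` weights. Below is a worked count, assuming the layer shapes from the published Qwen2.5-3B configuration (hidden size 2048, MLP width 11008, 36 layers, 2 key/value heads of dimension 128). These shapes are not stated in the notebook.

```python
# Assumed Qwen2.5-3B shapes (not stated in the notebook).
HIDDEN = 2048          # model width
INTERMEDIATE = 11008   # MLP width
KV_DIM = 2 * 128       # 2 key/value heads x head dimension 128
NUM_LAYERS = 36
R = 32                 # LoRA rank used above

# (d_in, d_out) for each targeted projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x d_in) and B (d_out x r) per adapted layer.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in SHAPES.values())
trainable = per_layer * NUM_LAYERS
print(f"{trainable:,}")
```

Under these assumptions the result is 59,867,136, which matches the `Trainable parameters` figure in the training log of section 5.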


## 5) Training (SFTTrainer)



In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="unsloth_cs_v2_sft_lora",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    num_train_epochs=3,
    logging_steps=20,
    save_steps=200,
    save_total_limit=2,
    bf16=True,
    fp16=False,
    optim="paged_adamw_8bit",
    report_to="none",
)

def formatting_func(example):
    return example["text"]

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    processing_class=tokenizer,
    formatting_func=formatting_func,
)

trainer.train()


The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 19,000 | Num Epochs = 3 | Total steps = 3,564
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 2 x 1) = 16
 "-____-"     Trainable parameters = 59,867,136 of 3,145,805,824 (1.90% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
20,1.6709
40,0.3727
60,0.2879
80,0.2797
100,0.2701
120,0.2657
140,0.2729
160,0.2666
180,0.2786
200,0.2628


TrainOutput(global_step=3564, training_loss=0.2708821009037604, metrics={'train_runtime': 5046.4734, 'train_samples_per_second': 11.295, 'train_steps_per_second': 0.706, 'total_flos': 1.0165902125590118e+17, 'train_loss': 0.2708821009037604, 'epoch': 3.0})
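
The step count and throughput in the log follow from the batch settings. Here is a short check of the arithmetic, using only numbers that appear in the output above. It assumes the partially filled final accumulation step of each epoch counts as a full optimizer step, which is what reproduces the logged total.

```python
import math

num_examples = 19_000
per_device_batch = 8
grad_accum = 2
num_gpus = 1
epochs = 3
train_runtime_s = 5046.4734  # train_runtime from TrainOutput

effective_batch = per_device_batch * grad_accum * num_gpus
# Assumption: the partial final step of each epoch is counted (ceil, not floor).
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

samples_per_second = num_examples * epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(effective_batch, steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

The values agree with the log: an effective batch of 16, 3,564 total steps, about 11.295 samples per second and 0.706 steps per second.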

## 6) Save (Drive)

Note: This step saves only the **LoRA adapter** weights, not the full model.


In [None]:
SAVE_DIR = "/content/drive/MyDrive/unsloth_cs_v2_sft_lora_model"
trainer.model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
print("Saved:", SAVE_DIR)


Saved: /content/drive/MyDrive/unsloth_cs_v2_sft_lora_model
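
Because the save step writes adapter weights, the directory should hold a small set of files rather than full model weights. Below is a minimal sketch that checks for them, assuming PEFT's default file names (`adapter_config.json` and `adapter_model.safetensors`). `check_adapter_dir` is a hypothetical helper, and the file names may differ across PEFT versions or serialization settings.

```python
import tempfile
from pathlib import Path

# Assumed PEFT default file names; verify against your installed version.
EXPECTED_FILES = ("adapter_config.json", "adapter_model.safetensors")

def check_adapter_dir(path: str) -> list[str]:
    """Return the expected adapter files that are missing from `path`."""
    root = Path(path)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

# Demo on a temporary directory that contains only the config file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text("{}")
    missing = check_adapter_dir(tmp)
print(missing)
```

In the notebook, calling `check_adapter_dir(SAVE_DIR)` after saving should return an empty list if these assumptions hold.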


## 7) Quick Test (Inference)

In [None]:
import torch
FastLanguageModel.for_inference(model)

def generate(customer_text, max_new_tokens=180):
    prompt = f"""You are a professional customer support agent.

Rules:
- Do NOT repeat the same sentence.
- Give a specific and helpful solution based on the customer message.
- Ask for necessary information only if needed.
- Keep the response concise (3-7 sentences).

Customer message:
{customer_text}

Support response:
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.9,
            top_p=0.9,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
        )
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    if "Support response:" in decoded:
        return decoded.split("Support response:", 1)[-1].strip()
    return decoded.strip()

print(generate("I was charged twice for the same order. I need help getting a refund for the extra charge."))


Sorry about the trouble. Duplicate charges are often caused by pending authorizations. Happy to help — reply here with the details. We'll stay with you until this is resolved. Thank you for your patience. If you share the requested details, we'll help right away. Please confirm: - Ticket: TKT-41290 If you SHARE THE REQUESTED DETAILS, WE'LL HELP RIGHT AWAY. Happy helping! Sorry about the hassle. Here's what we can do next: - Confirm the return status and refund processing timeline. - Share how to start a new refund request if needed. If we hear back within 4 hours, we’ll stay with the case until it’s resolved. Happy assisting — thank you foryour patience. We will stay with this until it's resolved. If no further confirmation is received, we will close the ticket in 5 days. Happy assist
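
The generated reply above drifts and repeats itself. One possible contributor is that the inference prompt differs from the `### Instruction:` / `### Response:` template used for training in section 3, so the model sees a format it was not tuned on. Below is a minimal sketch of a prompt builder and a response extractor that mirror the training template. `build_prompt` and `extract_response` are hypothetical helpers, and whether this improves output quality has not been tested here.

```python
def build_prompt(instruction: str, inp: str = "") -> str:
    """Build an inference prompt that ends where the training response begins."""
    parts = [f"### Instruction:\n{instruction}"]
    if inp and inp.strip():
        parts.append(f"### Input:\n{inp}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)

def extract_response(decoded: str) -> str:
    """Keep only the text after the first response marker."""
    marker = "### Response:"
    if marker in decoded:
        return decoded.split(marker, 1)[1].strip()
    return decoded.strip()

prompt = build_prompt("I was charged twice for the same order.")
# Simulated decode: the prompt followed by a made-up model continuation.
decoded = prompt + "Sorry about that. I can help with the refund."
print(extract_response(decoded))
```

In `generate`, `build_prompt(customer_text)` would replace the hand-written prompt before tokenization, and `extract_response` would replace the split on `Support response:`.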


## 8) Gradio Demo Interface

In [None]:
import gradio as gr

def support_bot(customer_text):
    return generate(customer_text)

gr.Interface(
    fn=support_bot,
    inputs=gr.Textbox(lines=4, placeholder="Write a customer message..."),
    outputs=gr.Textbox(lines=8),
    title="Customer Support LLM (Unsloth SFT + LoRA)",
    description="Self-generated v2 dataset + Unsloth QLoRA fine-tuning + Gradio demo."
).launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://0d54f9dc421f403b67.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


