# LoRA Fine-Tuning: DEEP Dataset

This notebook fine-tunes the Qwen2.5-Coder-1.5B model on the DEEP dataset.

**Important**: Select Runtime > Change runtime type > T4 GPU before running!

## 1. Setup and Installation

In [1]:
# Check the GPU
!nvidia-smi

Mon Dec  1 23:36:23 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   50C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
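
The `nvidia-smi` output above reports 15360 MiB of memory on the Tesla T4. As a rough check on whether the model weights fit, the sketch below estimates weight memory from a parameter count. It assumes the nominal 1.5 billion parameters suggested by the model name and 2 bytes per parameter (fp16). `weight_memory_gib` is a hypothetical helper, and the estimate ignores activations, gradients, optimizer state, and the LoRA adapters.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: 1.5e9 is the nominal size implied by the model name.
fp16_gib = weight_memory_gib(1.5e9, bytes_per_param=2)
t4_gib = 15360 / 1024  # memory reported by nvidia-smi, MiB -> GiB

print(f"fp16 weights: ~{fp16_gib:.1f} GiB of {t4_gib:.0f} GiB")
```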
                                                

In [2]:
# Install packages
!pip install -q torch transformers peft datasets accelerate bitsandbytes tqdm


In [3]:
# Download the project files from GitHub
!git clone https://github.com/B0DH1i/Lora-fine-tune.git
%cd Lora-fine-tune

Cloning into 'lora-finetuning'...
fatal: could not read Username for 'https://github.com': No such device or address
[Errno 2] No such file or directory: 'lora-finetuning'
/content
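
The output above shows that the clone failed, so the `%cd` step found no directory to enter and the working directory stayed `/content`. A minimal sketch of a guard that could run before the later cells is shown below. `repo_is_ready` and the list of expected subdirectories are assumptions, inferred from the package names imported later in this notebook (`config`, `models`, `data`, `training`).

```python
import os

# Assumption: these top-level packages exist in the repo, inferred from
# the imports used later in this notebook.
EXPECTED_DIRS = ("config", "models", "data", "training")


def repo_is_ready(repo_dir: str, expected=EXPECTED_DIRS) -> bool:
    """Return True if repo_dir contains every expected subdirectory."""
    return all(os.path.isdir(os.path.join(repo_dir, name)) for name in expected)


repo_dir = "/content/Lora-fine-tune"
if not repo_is_ready(repo_dir):
    print(f"{repo_dir} is missing or incomplete; re-run the clone cell first.")
```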


## 2. Google Drive Connection (for saving checkpoints)

In [4]:
from google.colab import drive
drive.mount('/content/drive')

# Checkpoint directory
import os
checkpoint_dir = '/content/drive/MyDrive/lora_checkpoints/deep'
os.makedirs(checkpoint_dir, exist_ok=True)

Mounted at /content/drive


## 3. Training Configuration

In [5]:
import sys
import os

# Add the cloned repo to the import path
sys.path.append('/content/Lora-fine-tune')

from config.training_config import TrainingConfig
from config.model_config import ModelConfig

# Settings tuned for the Colab T4
# (FlashAttention-2 needs an Ampere or newer GPU; the T4 is Turing)
TrainingConfig.use_flash_attention_2 = False
TrainingConfig.gradient_checkpointing = True
TrainingConfig.per_device_batch_size = 1
TrainingConfig.gradient_accumulation_steps = 16

print("✓ Config ready")


ModuleNotFoundError: No module named 'config'
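
Assuming `per_device_batch_size` and `gradient_accumulation_steps` carry their usual meaning, the configuration cell above processes one sample per forward pass and accumulates gradients over 16 passes before each optimizer step. The sketch below works out the resulting effective batch size; `effective_batch_size` is a hypothetical helper, not part of the repo.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


# Values from the configuration cell above; a Colab T4 runtime has one GPU.
print(effective_batch_size(per_device=1, accumulation_steps=16))
```

Keeping the per-device batch at 1 limits peak activation memory, while accumulation recovers the gradient averaging of a larger batch at the cost of more forward and backward passes per step.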

## 4. Loading the Model and Dataset

In [None]:
from models.model_loader import load_model_and_tokenizer
from models.lora_setup import setup_lora
from data.dataset_loader import DatasetLoader

print("1. Loading model...")
model, tokenizer = load_model_and_tokenizer(
    use_flash_attention=False,
    load_in_8bit=False
)
print("✓ Model loaded")

print("\n2. Configuring LoRA...")
model = setup_lora(model, use_8bit=False)
print("✓ LoRA configured")

print("\n3. Loading DEEP dataset...")
dataset_loader = DatasetLoader(
    dataset_name="deep",
    tokenizer=tokenizer,
    use_reasoning=False
)
train_dataset, eval_dataset = dataset_loader.load_and_prepare()
print(f"✓ Dataset loaded - Train: {len(train_dataset)}, Eval: {len(eval_dataset)}")
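
LoRA keeps the base weights frozen and learns a low-rank update for selected weight matrices: a matrix of shape `d_out × d_in` gets two trainable factors of shapes `d_out × r` and `r × d_in`. The sketch below counts the trainable parameters this adds. The rank and dimensions are illustrative assumptions; the values that `setup_lora` actually uses are not shown in this notebook.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # B has shape (d_out, rank) and A has shape (rank, d_in).
    return rank * (d_out + d_in)


# Illustrative values only, not read from the model or the repo config.
d_out, d_in, rank = 2048, 2048, 8
full = d_out * d_in
added = lora_trainable_params(d_out, d_in, rank)
print(f"full matrix: {full:,} params, LoRA adds: {added:,} ({added / full:.2%})")
```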

## 5. Training

In [None]:
from training.trainer import setup_trainer

print("Configuring trainer...")
trainer = setup_trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    output_dir=checkpoint_dir,
    run_name="deep_training_colab"
)
print("✓ Trainer ready")

print("\n" + "="*60)
print("TRAINING STARTING!")
print("="*60)
print("\nEstimated time: 2-4 hours")
print("Keep the Colab session open!\n")

trainer.train()
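
Because a Colab session can disconnect during a multi-hour run, it helps to know which checkpoint on Drive is the most recent. The sketch below finds it, assuming the trainer writes `checkpoint-<step>` subdirectories into `output_dir` as the Hugging Face `Trainer` does. Whether `setup_trainer` wraps that class is not shown here, and `latest_checkpoint` is a hypothetical helper.

```python
import os
import re
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Compare steps numerically so checkpoint-100 beats checkpoint-20.
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

If the trainer is a Hugging Face `Trainer`, the returned path can be passed as `trainer.train(resume_from_checkpoint=path)` after a restart.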

## 6. Saving the Model

In [None]:
import os

final_model_path = os.path.join(checkpoint_dir, "final_model")
print(f"Saving final model: {final_model_path}")

trainer.save_model(final_model_path)
tokenizer.save_pretrained(final_model_path)

print("\n" + "="*60)
print("✓ TRAINING COMPLETE!")
print("="*60)
print(f"\nModel saved: {final_model_path}")
print(f"Logs: {os.path.join(checkpoint_dir, 'logs')}")

## 7. Quick Test

In [None]:
# Test with the trained model
test_problem = "Write a Python function to calculate factorial of n."

prompt = f"You are an expert Python programmer. Please read the problem carefully before writing any Python code.\n\nProblem:\n{test_problem}\n\nSolution:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Test Problem:", test_problem)
print("\nGenerated Solution:")
print(solution.split("Solution:\n")[-1])
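
The prompt construction and the `split` on `"Solution:\n"` can be wrapped in two small helpers so the same template is reused for every test problem. This is a sketch: `build_prompt` and `extract_solution` are hypothetical names, and the template text is the one from the cell above.

```python
PROMPT_TEMPLATE = (
    "You are an expert Python programmer. Please read the problem carefully "
    "before writing any Python code.\n\nProblem:\n{problem}\n\nSolution:\n"
)


def build_prompt(problem: str) -> str:
    """Fill the problem statement into the prompt template."""
    return PROMPT_TEMPLATE.format(problem=problem)


def extract_solution(generated_text: str) -> str:
    """Return the text after the last 'Solution:' line marker."""
    return generated_text.split("Solution:\n")[-1]
```

Note that if the model itself emits another `Solution:` line, only the text after the last one is kept.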

## 8. Downloading Files (Optional)

In [None]:
# Zip the log files
!zip -r deep_training_logs.zip {checkpoint_dir}/logs

# Download
from google.colab import files
files.download('deep_training_logs.zip')
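
The same archive can be built with the Python standard library, which avoids depending on the `zip` command and on shell interpolation of `{checkpoint_dir}`. This is a sketch: `zip_logs` is a hypothetical helper, and the `logs` subdirectory name is taken from the cell above.

```python
import os
import shutil


def zip_logs(checkpoint_dir: str, archive_name: str = "deep_training_logs") -> str:
    """Zip <checkpoint_dir>/logs and return the path of the created archive."""
    logs_dir = os.path.join(checkpoint_dir, "logs")
    if not os.path.isdir(logs_dir):
        raise FileNotFoundError(f"No logs directory at {logs_dir}")
    # make_archive appends ".zip" to archive_name and returns the full path.
    return shutil.make_archive(archive_name, "zip", root_dir=logs_dir)
```

The returned path can then be passed to `files.download`.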