# dLNk GPT Uncensored - AutoTrain on GPU

This notebook fine-tunes `EleutherAI/gpt-j-6b` with LoRA adapters to produce the dLNk GPT uncensored model, using Hugging Face AutoTrain on a GPU runtime.

**Requirements:**
- GPU Runtime (T4, A100, or V100)
- Hugging Face token with write access (the trained model is pushed to the Hub)
- 12-16 hours of training time on a T4

**Steps:**
1. Enable GPU: Runtime → Change runtime type → GPU
2. Run all cells in order
3. Monitor training progress

## 1. Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi
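
The check above can also be done from Python, which makes it easy to compare the reported GPU memory with what the base model needs. The following is a minimal sketch: it shells out to `nvidia-smi` (returning an empty list when the tool is missing) and estimates the size of the fp16 weights alone. The parameter count of roughly 6.05 billion for GPT-J is an approximation, and the estimate ignores activations, LoRA weights, and optimizer state, so actual usage during training will be higher.

```python
import shutil
import subprocess


def gpu_summary():
    """Return [(name, total_mib), ...] reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        name, _, mib = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(mib.strip())))
        except ValueError:
            continue  # skip rows where the memory field is not a number
    return gpus


def fp16_weight_gib(num_params):
    """Size of the weights alone in GiB at 2 bytes per parameter."""
    return num_params * 2 / 2**30


GPT_J_PARAMS = 6.05e9  # assumption: approximate parameter count of GPT-J 6B

needed = fp16_weight_gib(GPT_J_PARAMS)
gpus = gpu_summary()
if not gpus:
    print("No NVIDIA GPU detected; enable a GPU runtime first.")
for name, total_mib in gpus:
    total_gib = total_mib / 1024
    print(f"{name}: {total_gib:.1f} GiB total, fp16 weights need about {needed:.1f} GiB")
```

If the reported total is close to the weight estimate, expect out-of-memory errors during training and consider a larger GPU.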

In [None]:
# Install required packages
!pip install -q autotrain-advanced huggingface_hub datasets transformers accelerate bitsandbytes peft

## 2. Login to Hugging Face

In [None]:
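import os
from getpass import getpass

from huggingface_hub import login

# Read the token from the environment, or prompt for it so it is never saved in the notebook
HF_TOKEN = os.environ.get("HF_TOKEN") or getpass("Hugging Face token: ")

login(token=HF_TOKEN)
print("Logged in to Hugging Face")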
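
An empty or mistyped token only fails once `login()` contacts the Hub, and the error can be hard to read. A small local check before logging in is sketched below. `require_token` is a hypothetical helper, not part of `huggingface_hub`, and the `hf_` prefix check reflects the usual format of user access tokens, so it only warns instead of failing.

```python
def require_token(token):
    """Return the stripped token, raising ValueError if it is empty."""
    cleaned = (token or "").strip()
    if not cleaned:
        raise ValueError("Hugging Face token is empty; set HF_TOKEN before calling login().")
    if not cleaned.startswith("hf_"):
        # Assumption: user access tokens normally start with "hf_"
        print("Warning: token does not start with 'hf_'; double-check that it was copied fully.")
    return cleaned
```

Calling `login(token=require_token(HF_TOKEN))` then stops early with a clear message when the placeholder was never filled in.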

## 3. Load Dataset

In [None]:
from datasets import load_dataset

# Load dataset from Hugging Face Hub
dataset = load_dataset("dlnkgpt/dlnkgpt-uncensored-dataset")

print("Dataset loaded:")
print(f"  Train: {len(dataset['train']):,} examples")
print(f"  Validation: {len(dataset['validation']):,} examples")

# Show sample
print("\nSample:")
print(dataset['train'][0]['text'][:200] + "...")
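
The cell above assumes the dataset has `train` and `validation` splits and a `text` column. A quick structural check before a multi-hour run might look like the sketch below. `check_splits` is a hypothetical helper; it relies only on the split lookup and the `column_names` attribute that `datasets` objects provide.

```python
def check_splits(dataset, required_splits=("train", "validation"), text_column="text"):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    for split in required_splits:
        if split not in dataset:
            problems.append(f"missing split: {split}")
        elif text_column not in dataset[split].column_names:
            problems.append(f"split {split!r} has no {text_column!r} column")
    return problems
```

Running `check_splits(dataset)` right after loading and stopping if it returns anything avoids discovering a missing column only after AutoTrain has started.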

## 4. Configure Training

In [None]:
import os

# Training configuration (for reference: these values mirror the flags passed to the AutoTrain CLI in section 5)
config = {
    "project_name": "dlnkgpt-uncensored",
    "model": "EleutherAI/gpt-j-6b",
    "dataset": "dlnkgpt/dlnkgpt-uncensored-dataset",
    "text_column": "text",
    "epochs": 3,
    "batch_size": 4,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "gradient_accumulation": 8,
    "block_size": 512,
    "use_peft": True,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "fp16": True,
    "push_to_hub": True
}

# Expose the token as an environment variable for the AutoTrain CLI call
os.environ['HF_TOKEN'] = HF_TOKEN

print("Configuration set")
print(f"  Base Model: {config['model']}")
print(f"  Epochs: {config['epochs']}")
print(f"  Batch Size: {config['batch_size']}")
print(f"  Learning Rate: {config['learning_rate']}")
print(f"  PEFT/LoRA: {config['use_peft']}")
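
The batch size and gradient accumulation settings combine into an effective batch size, which in turn determines how many optimizer steps the run takes. The arithmetic can be sketched as follows. The 54,000 example count is taken from the Summary section, and treating it as the size of the training split is an assumption. AutoTrain may also pack or chunk text into blocks of `block_size` tokens, so the real step count can differ.

```python
import math


def training_schedule(num_examples, batch_size, gradient_accumulation, epochs, warmup_ratio, num_gpus=1):
    """Estimate optimizer steps for a run; a rough guide, not what the trainer will report."""
    effective_batch = batch_size * gradient_accumulation * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values from the config above; 54_000 is the assumed training-set size
schedule = training_schedule(
    num_examples=54_000,
    batch_size=4,
    gradient_accumulation=8,
    epochs=3,
    warmup_ratio=0.1,
)
for key, value in schedule.items():
    print(f"{key}: {value:,}")
```

With these settings each optimizer step sees 32 examples, so raising `batch_size` while lowering `gradient_accumulation` by the same factor keeps the schedule unchanged and trades memory for speed.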

## 5. Start Training

**Note:** Training takes 12-16 hours on a T4 GPU.

In [None]:
# Save the already-loaded dataset to local disk so AutoTrain can read it
dataset.save_to_disk("./autotrain_data")
print("Dataset saved locally")

In [None]:
# Launch AutoTrain
!autotrain llm \
  --train \
  --project-name dlnkgpt-uncensored \
  --model EleutherAI/gpt-j-6b \
  --data-path ./autotrain_data \
  --text-column text \
  --lr 2e-5 \
  --epochs 3 \
  --batch-size 4 \
  --warmup-ratio 0.1 \
  --gradient-accumulation 8 \
  --block-size 512 \
  --logging-steps 100 \
  --eval-strategy steps \
  --save-total-limit 2 \
  --peft \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --mixed-precision fp16 \
  --push-to-hub \
  --username dlnkgpt \
  --token $HF_TOKEN
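
The command above repeats values that also live in the `config` dictionary, so the two can drift apart. One way to keep them in sync is to generate the argument list from a dictionary, as in the sketch below. `build_cli_args` is a hypothetical helper, and the flag names shown are a subset copied from the command above; whether a given AutoTrain version accepts them should be confirmed with `autotrain llm --help`.

```python
import shlex


def build_cli_args(options):
    """Turn a {flag: value} mapping into CLI arguments.

    True becomes a bare switch, False and None are skipped,
    and every other value is passed as a string after the flag.
    """
    args = []
    for flag, value in options.items():
        if value is True:
            args.append(f"--{flag}")
        elif value is False or value is None:
            continue
        else:
            args.extend([f"--{flag}", str(value)])
    return args


# Subset of the flags used above, for illustration only
options = {
    "project-name": "dlnkgpt-uncensored",
    "model": "EleutherAI/gpt-j-6b",
    "lr": 2e-5,
    "epochs": 3,
    "batch-size": 4,
    "peft": True,
}

command = ["autotrain", "llm", "--train", *build_cli_args(options)]
print(shlex.join(command))
```

Printing the command first gives a dry run to review; the same list can then be passed to `subprocess.run(command, check=True)` to launch training.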

## 6. Test Trained Model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
print("Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(
    base_model,
    "dlnkgpt/dlnkgpt-uncensored"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("dlnkgpt/dlnkgpt-uncensored")

print("Model loaded successfully")

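# Test generation
def generate_response(prompt, max_new_tokens=200):
    formatted_prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    # Move inputs to whichever device device_map="auto" placed the model on
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # budget excludes the prompt tokens
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,  # GPT-J defines no pad token
        )

    # The decoded text contains the prompt followed by the completion
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response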

# Test prompts
test_prompts = [
    "Explain artificial intelligence",
    "What is machine learning?",
    "Describe neural networks"
]

print("Testing model...")
for i, prompt in enumerate(test_prompts, 1):
    print(f"\nTest {i}: {prompt}")
    response = generate_response(prompt)
    print(f"Response: {response}")
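
Because `generate_response` decodes the whole output sequence, each printed response starts with the instruction template and the prompt. A small post-processing step can keep only the model's answer. The sketch below assumes the `### Response:` marker from the prompt template survives decoding unchanged; `extract_response` is a hypothetical helper.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded, marker=RESPONSE_MARKER):
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()
```

In the loop above, `print(f"Response: {extract_response(response)}")` would then show only the generated answer.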

## Summary

**Training Complete!**

Your model has been:
- Trained on 54,000 examples
- Fine-tuned for 3 epochs
- Pushed to the Hugging Face Hub

It is now ready to use.

**Model URL:** https://huggingface.co/dlnkgpt/dlnkgpt-uncensored