# dLNk GPT Uncensored - AutoTrain on GPU

This notebook trains the dLNk GPT uncensored model using Hugging Face AutoTrain with GPU acceleration.

**Requirements:**
- GPU Runtime (T4, A100, or V100)
- Hugging Face Token
- 12-16 hours training time

**Steps:**
1. Enable GPU: Runtime → Change runtime type → GPU
2. Run all cells in order
3. Monitor training progress
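
The GPU requirement above can be sanity-checked with a rough memory estimate. The sketch below only counts the bytes needed to hold the weights, assuming about 6 billion parameters for GPT-J-6B. Activations, LoRA gradients, and optimizer state come on top, so treat the result as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30

# Assumption: roughly 6e9 parameters for GPT-J-6B.
fp16_gib = weight_memory_gib(6e9, bytes_per_param=2)  # half precision
fp32_gib = weight_memory_gib(6e9, bytes_per_param=4)  # full precision

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
print(f"fp32 weights: ~{fp32_gib:.1f} GiB")
```

Under these assumptions the fp16 weights alone take a large share of a 16 GB T4, which is why the notebook relies on LoRA and mixed precision instead of full fine-tuning.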

## 1. Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi
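
If you prefer to check for a GPU from Python, for example to stop early when the runtime type is wrong, a minimal sketch using only the standard library might look like this. The helper name `list_gpus` is hypothetical; it simply wraps the same `nvidia-smi` tool used in the cell above.

```python
import shutil
import subprocess

def list_gpus() -> list[str]:
    """Return one 'name, memory' string per visible GPU, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed, so no NVIDIA GPU is usable
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
if gpus:
    print("Detected GPUs:", gpus)
else:
    print("No GPU detected: switch the runtime type to GPU before continuing.")
```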

In [None]:
# Install AutoTrain and dependencies
# Note: install autotrain-advanced first so that it pulls in compatible versions of its dependencies
!pip install -q autotrain-advanced
print("AutoTrain installed successfully")

## 2. Login to Hugging Face

In [None]:
from huggingface_hub import login

# Enter your Hugging Face token here
HF_TOKEN = ""  # Paste your token between the quotes

if not HF_TOKEN:
    print("Please enter your Hugging Face token above")
else:
    login(token=HF_TOKEN)
    print("Successfully logged in to Hugging Face!")
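
Pasting a token into a cell makes it easy to leak when the notebook is shared. One alternative, sketched below, reads the token from the `HF_TOKEN` environment variable and falls back to a hidden prompt. The helper `resolve_token` and its parameters are hypothetical, not part of `huggingface_hub`.

```python
import os
from getpass import getpass

def resolve_token(env=None, prompt_fn=getpass) -> str:
    """Return a token from the HF_TOKEN environment variable, else ask for it."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the input so the token does not end up in cell output
        token = prompt_fn("Hugging Face token: ").strip()
    return token

# Demonstration with a placeholder value instead of a real token
demo_token = resolve_token(env={"HF_TOKEN": "hf_placeholder"})
print("Token found:", bool(demo_token))
```

In the notebook you would call `HF_TOKEN = resolve_token()` and pass the result to `login(token=HF_TOKEN)` as before.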

## 3. Load Dataset

In [None]:
from datasets import load_dataset

print("Loading dataset from Hugging Face Hub...")
dataset = load_dataset("dlnkgpt/dlnkgpt-uncensored-dataset")

print("\nDataset loaded successfully!")
print(f"  Training examples: {len(dataset['train']):,}")
print(f"  Validation examples: {len(dataset['validation']):,}")

# Show a sample
print("\nSample text:")
print(dataset['train'][0]['text'][:300] + "...")

## 4. Prepare Dataset

In [None]:
# Save dataset to local disk for AutoTrain
print("Saving dataset to local disk...")
dataset.save_to_disk("./autotrain_data")
print("Dataset saved successfully!")
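
`save_to_disk` writes the dataset as an Arrow directory. If your AutoTrain version expects a folder containing a `train.jsonl` file instead (an assumption worth checking against its documentation), the rows can be exported as JSON Lines. This is a minimal sketch: the output folder name is hypothetical and the sample rows stand in for `dataset["train"]`.

```python
import json
from pathlib import Path

def write_jsonl(records, path) -> int:
    """Write an iterable of dict records to `path`, one JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count

# Stand-in rows; in the notebook you would pass dataset["train"] instead
sample_rows = [{"text": "first example"}, {"text": "second example"}]
written = write_jsonl(sample_rows, "./autotrain_data_jsonl/train.jsonl")
print(f"Wrote {written} rows")
```

Iterating over a `datasets.Dataset` yields one dict per row, so the same function should work unchanged on the real splits.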

## 5. Configure Training

In [None]:
import os

# Set HF token as environment variable
os.environ['HF_TOKEN'] = HF_TOKEN

# Training configuration
config = {
    'project_name': 'dlnkgpt-uncensored',
    'model': 'EleutherAI/gpt-j-6b',
    'data_path': './autotrain_data',
    'text_column': 'text',
    'epochs': 3,
    'batch_size': 4,
    'learning_rate': 2e-5,
    'warmup_ratio': 0.1,
    'gradient_accumulation': 8,
    'block_size': 512,
    'lora_r': 16,
    'lora_alpha': 32,
    'lora_dropout': 0.05
}

print("Training Configuration:")
print(f"  Base Model: {config['model']}")
print(f"  Epochs: {config['epochs']}")
print(f"  Batch Size: {config['batch_size']}")
print(f"  Effective Batch Size: {config['batch_size'] * config['gradient_accumulation']}")
print(f"  Learning Rate: {config['learning_rate']}")
print(f"  Using LoRA: Yes (r={config['lora_r']}, alpha={config['lora_alpha']})")
print("\nEstimated training time: 12-16 hours on T4 GPU")
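
To see how the configuration translates into optimizer steps, the arithmetic can be sketched as below. It assumes the 54,000 training examples listed in the summary and one training sample per example. AutoTrain may pack text into `block_size` chunks and the trainer may round differently, so the numbers are approximate.

```python
import math

def training_schedule(num_examples, batch_size, grad_accum, epochs, warmup_ratio):
    """Rough step counts implied by a batch size and accumulation setting."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = int(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# Values taken from the config cell above; 54_000 comes from the summary section
schedule = training_schedule(
    num_examples=54_000, batch_size=4, grad_accum=8, epochs=3, warmup_ratio=0.1
)
for name, value in schedule.items():
    print(f"{name}: {value}")
```

With these inputs the effective batch size is 32, giving about 1,688 optimizer steps per epoch and roughly 5,064 in total.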

## 6. Start Training

**Important Notes:**
- Training takes roughly 12-16 hours on a T4 GPU
- On hosted runtimes such as Colab, closing the tab can disconnect the session, so keep it open or check back regularly
- The model is pushed to the Hugging Face Hub automatically when training completes
- Check progress in the cell output below

In [None]:
# Start AutoTrain
!autotrain llm \
  --train \
  --project-name {config['project_name']} \
  --model {config['model']} \
  --data-path {config['data_path']} \
  --text-column {config['text_column']} \
  --lr {config['learning_rate']} \
  --epochs {config['epochs']} \
  --batch-size {config['batch_size']} \
  --warmup-ratio {config['warmup_ratio']} \
  --gradient-accumulation {config['gradient_accumulation']} \
  --block-size {config['block_size']} \
  --logging-steps 100 \
  --eval-strategy steps \
  --save-total-limit 2 \
  --peft \
  --lora-r {config['lora_r']} \
  --lora-alpha {config['lora_alpha']} \
  --lora-dropout {config['lora_dropout']} \
  --mixed-precision fp16 \
  --push-to-hub \
  --username dlnkgpt \
  --token $HF_TOKEN
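
The LoRA flags in the command above control how many extra parameters are trained. As a rough illustration, a LoRA adapter on a single weight matrix adds two low-rank factors whose product is scaled by `lora_alpha / lora_r`. The sketch assumes a hidden size of 4096 for GPT-J-6B and a square projection matrix; which modules AutoTrain actually adapts is not stated in this notebook.

```python
def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

r, alpha = 16, 32   # lora_r and lora_alpha from the config cell
hidden = 4096       # assumption: GPT-J-6B hidden size

per_matrix = lora_extra_params(hidden, hidden, r)
full_matrix = hidden * hidden
scaling = alpha / r  # factor applied to the low-rank update

print(f"LoRA params per adapted matrix: {per_matrix:,}")
print(f"Share of the full matrix: {per_matrix / full_matrix:.2%}")
print(f"Update scaling (alpha / r): {scaling}")
```

Under these assumptions each adapted matrix trains well under 1% of the parameters it would need in full fine-tuning.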

## 7. Test the Trained Model

Run this cell after training is complete to test your model.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

print("Loading trained model...")

# Load base model
print("  Loading base model (GPT-J-6B)...")
base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
print("  Loading LoRA adapter...")
model = PeftModel.from_pretrained(
    base_model,
    "dlnkgpt/dlnkgpt-uncensored"
)

# Load tokenizer
print("  Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained("dlnkgpt/dlnkgpt-uncensored")

print("\nModel loaded successfully!\n")

# Test generation function
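def generate_response(prompt, max_new_tokens=200):
    formatted_prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    # Move inputs to the device that device_map="auto" placed the model on
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # caps generated tokens, excluding the prompt
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
        )

    # The decoded text contains the prompt followed by the generated response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response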

# Test with sample prompts
test_prompts = [
    "Explain what artificial intelligence is",
    "What is the difference between machine learning and deep learning?",
    "How do neural networks work?"
]

print("=" * 70)
print("Testing Model")
print("=" * 70)

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n[Test {i}] Prompt: {prompt}")
    print("-" * 70)
    response = generate_response(prompt)
    print(f"Response:\n{response}")
    print("=" * 70)
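
Because the decoded output repeats the prompt, you may want to print only the text after the `### Response:` marker. A minimal sketch follows; the helper `extract_response` is hypothetical, and the example string stands in for real model output.

```python
RESPONSE_MARKER = "### Response:\n"

def extract_response(full_text: str) -> str:
    """Return the text after the response marker, or the whole text if it is absent."""
    _, found, tail = full_text.partition(RESPONSE_MARKER)
    return tail.strip() if found else full_text.strip()

# Stand-in for the string returned by generate_response(...)
decoded = "### Instruction:\nSay hi\n\n### Response:\nHello there!"
print(extract_response(decoded))
```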

## Summary

### Training Complete!

Your dLNk GPT uncensored model has been successfully trained!

**Model Details:**
- Base Model: GPT-J-6B (6 billion parameters)
- Training Examples: 54,000
- Validation Examples: 6,000
- Training Method: LoRA/PEFT
- Epochs: 3

**Model Location:**
- Hugging Face Hub: https://huggingface.co/dlnkgpt/dlnkgpt-uncensored
- Local: ./dlnkgpt-uncensored/

**Next Steps:**
1. Test the model with various prompts (see cell above)
2. Download the model for local use
3. Integrate with your application
4. Deploy to production

**Using the Model:**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16 on the available GPU, as in the test cell
base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "dlnkgpt/dlnkgpt-uncensored")
tokenizer = AutoTokenizer.from_pretrained("dlnkgpt/dlnkgpt-uncensored")
```

Congratulations! 🎉