A GPT-2-based conversational AI training framework optimized for the NVIDIA RTX 3060 (12GB VRAM). This repository contains everything needed to train, fine-tune, and run inference on Theta AI models.
- Optimized Training Pipeline: Gradient checkpointing, mixed precision (FP16), CPU offloading
- Advanced Techniques: Curriculum learning, R-Drop regularization, EMA, label smoothing
- RTX 3060 Optimized: Configured for 12GB VRAM with memory-efficient settings
- Email Notifications: Real-time training alerts with GPU stats and metric monitoring
- Multi-domain Training: Cybersecurity, programming, networking, data science, and more
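Two of the techniques listed above, label smoothing and EMA (exponential moving average of weights), are small enough to sketch in plain Python. This is an illustration of the general techniques only, not the code in `src/training/`; the function names and the default values `smoothing=0.1` and `decay=0.999` are assumptions chosen for the example.

```python
import math


def label_smoothed_nll(logits, target, smoothing=0.1):
    """Cross-entropy of `logits` against a smoothed one-hot target.

    The true class receives probability 1 - smoothing, and `smoothing`
    is spread uniformly over all classes (true class included).
    """
    # Log-softmax, shifted by the max logit for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]

    n_classes = len(logits)
    loss = 0.0
    for index, log_p in enumerate(log_probs):
        weight = smoothing / n_classes
        if index == target:
            weight += 1.0 - smoothing
        loss -= weight * log_p
    return loss


def ema_update(shadow, weights, decay=0.999):
    """Return new shadow weights: decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Hypothetical values, for illustration only.
print(label_smoothed_nll([2.0, 0.0, 0.0], target=0, smoothing=0.0))
print(label_smoothed_nll([2.0, 0.0, 0.0], target=0, smoothing=0.1))
print(ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9))
```

With smoothing enabled, a confident correct prediction is penalized slightly more than without it, which discourages over-confident outputs. The EMA shadow weights change slowly and are typically the ones evaluated or saved.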
- GPU: NVIDIA RTX 3060 12GB (or similar)
- CPU: AMD Ryzen 5 5500 or equivalent
- CUDA: 11.8+
- Python: 3.8+
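The requirements above can be checked with a short script. The sketch below is not part of the repository; `check_environment` is a hypothetical helper. It verifies the Python version and, if PyTorch is already installed, reports whether CUDA is usable and how much VRAM the first GPU has.

```python
import sys


def check_environment(min_python=(3, 8)):
    """Collect basic facts about the local Python and GPU setup."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "gpu_name": None,
        "vram_gb": None,
    }
    try:
        import torch  # optional: may not be installed yet
    except ImportError:
        return report

    report["torch_installed"] = True
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report["cuda_available"] = True
        report["gpu_name"] = props.name
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_installed` is reported as `False`, install PyTorch first and run the check again.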
# Clone repository
git clone https://github.com/yourusername/theta-ai.git
cd theta-ai
# Install dependencies (CUDA)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('wordnet'); nltk.download('stopwords')"
# Setup environment
cp .env.example .env
# Edit .env with your settings

# Human-like conversational data (28MB)
download_human_like_dpo.bat
# OpenAssistant dataset (6GB)
download_openassistant.bat
# OpenMath dataset (9GB, optional)
download_openmath_instruct.bat

# Full training pipeline (overnight recommended)
train_overnight_enhanced.bat

If training stalls or validation loss plateaus, run targeted fine-tuning with reduced regularization:
# Create a config file or use an existing one
finetune_theta.bat

See Training Pipeline for details.
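One way to decide when validation loss has plateaued is a simple patience rule. The sketch below is an assumption about how such a check could look, not the logic used by the training scripts; `has_plateaued`, `patience`, and `min_delta` are hypothetical names, and the loss values are made up.

```python
def has_plateaued(val_losses, patience=3, min_delta=1e-3):
    """Return True if the last `patience` epochs failed to beat the
    best earlier validation loss by at least `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta


# Hypothetical per-epoch validation losses.
still_improving = [3.0, 2.5, 2.0, 1.8, 1.6, 1.4]
stalled = [3.0, 2.5, 2.2, 2.2, 2.2, 2.2]

print(has_plateaued(still_improving))  # False
print(has_plateaued(stalled))          # True
```

A larger `patience` tolerates noisier loss curves at the cost of reacting later.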
from src.model.theta_model import ThetaModel
model = ThetaModel.load("models/theta_enhanced_YYYYMMDD/theta_final")
response = model.generate("What is machine learning?", max_length=200)
print(response)

Full documentation is available in the documentation/ folder:
| Guide | Description |
|---|---|
| Installation | Detailed setup instructions |
| Quick Start | Get training in 5 minutes |
| Training Pipeline | Complete training system guide |
| Datasets | Dataset formats and creation |
| Hyperparameters | All configuration options |
| RTX 3060 Optimizations | GPU-specific tuning |
| Email Notifications | Alert system setup |
| Architecture | System design overview |
| API Reference | Code documentation |
| Data Processing | Data preparation guide |
| Model Config | Model settings |
| Troubleshooting | Common issues & fixes |
theta-ai/
├── src/
│ ├── model/ # Model architecture
│ ├── training/ # Training pipeline
│ ├── inference/ # Inference utilities
│ ├── data_processing/ # Dataset processing
│ └── utils/ # Email notifier, GPU info
├── Datasets/ # Training data (JSON)
├── models/ # Saved checkpoints
├── documentation/ # Full documentation
├── train_overnight_enhanced.bat # Main training script
├── prepare_data_for_training.py # Data preparation
└── requirements.txt # Dependencies
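The inference example loads from a dated path of the form `models/theta_enhanced_YYYYMMDD/theta_final`. Assuming checkpoints in `models/` follow that naming, a small helper can pick the most recent run instead of hard-coding the date. `latest_checkpoint` is a hypothetical function, not part of the repository.

```python
import re
from pathlib import Path

# Matches run directories such as theta_enhanced_20240315 (assumed layout).
_RUN_DIR = re.compile(r"^theta_enhanced_(\d{8})$")


def latest_checkpoint(models_dir="models", final_name="theta_final"):
    """Return the path to the newest run's final checkpoint, or None."""
    root = Path(models_dir)
    if not root.is_dir():
        return None

    candidates = []
    for child in root.iterdir():
        match = _RUN_DIR.match(child.name)
        final_path = child / final_name
        if match and final_path.exists():
            candidates.append((match.group(1), final_path))

    if not candidates:
        return None
    # YYYYMMDD strings sort chronologically when compared as text.
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path can then be converted with `str(...)` and passed to `ThetaModel.load`.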
| File | Purpose |
|---|---|
| train_overnight_enhanced.bat | Main training orchestration |
| finetune_theta.bat | Targeted fine-tuning with reduced regularization |
| prepare_data_for_training.py | Data preparation pipeline |
| src/training/train_enhanced.py | Core training logic |
| src/model/theta_model.py | Model architecture |
| src/utils/email_notifier.py | Training notifications |
See CONTRIBUTING.md for guidelines.
See CHANGELOG.md for version history.
This project is for educational and research purposes.