A modern, production-ready GPT-style language model optimized for financial data and continuous learning.
FinAI is a lightweight yet powerful transformer-based language model that trains on financial datasets with state-of-the-art optimization techniques. Features include distributed training, real-time dashboards, and a single unified model that continuously improves with each dataset.
- Features
- Quick Start
- Training Modes
- Distributed Training
- Model Architecture
- Configuration
- Commands Reference
- Project Structure
- Documentation
- Requirements
- Single Unified Model: All training contributes to one model (`models/finai_gpt.pt`)
- Continuous Learning: Load and continue training from any checkpoint
- Modern Architecture: GPT-style transformer with RoPE, SwiGLU, Flash Attention
- Optimized Training: AdamW optimizer, cosine LR schedule, gradient accumulation
- Accurate ETA: Exponential moving average for smooth, reliable time estimates
- Multi-Machine Training: Train with friends across multiple computers
- Automatic Synchronization: Workers pull/push model checkpoints automatically
- Task Queue Management: Coordinate training across multiple workers
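In practice, continuous learning boils down to: load the unified checkpoint if it exists, train further, and save back to the same path. A minimal sketch in plain PyTorch (the checkpoint keys and the `build_model` factory are illustrative assumptions, not the repo's actual API):

```python
import os
import torch

MODEL_PATH = "models/finai_gpt.pt"  # the single unified model

def load_or_init(build_model):
    """Resume from the unified checkpoint if present, else start fresh."""
    model = build_model()  # hypothetical factory returning an nn.Module
    if os.path.exists(MODEL_PATH):
        state = torch.load(MODEL_PATH, map_location="cpu")
        model.load_state_dict(state["model"])  # assumed checkpoint layout
    return model

def save(model):
    """Every training run writes back to the same file."""
    os.makedirs("models", exist_ok=True)
    torch.save({"model": model.state_dict()}, MODEL_PATH)
```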
- Single Dataset (`train_single.py`): Quick training on one dataset
- Sequential (`train_sequential.py`): Train datasets one-by-one with commits
- Batch (`train_all.py`): Combine all pending datasets into one training run
- Distributed (`distributed/`): Coordinate training across multiple machines
```bash
# Clone the repository
git clone <your-repo-url>
cd FinAI

# Install dependencies
pip install -r requirements.txt
```

```bash
# Option 1: Train from a text file
python main.py train datasets/my_data.txt

# Option 2: Train from Hugging Face dataset
python main.py train_hf PatronusAI/financebench

# Option 3: Train on a single dataset with dashboard
python train_single.py <dataset-name>
```

Then chat with the model:

```bash
python main.py chat
```

Train on one Hugging Face dataset with automatic dashboard:
```bash
python train_single.py PatronusAI/financebench
```

Features:
- Automatic training dashboard at `http://localhost:8080`
- Real-time metrics: loss, ETA, progress
- Automatic CSV tracking (moves to `trained_datasets.csv`)
- Opens browser automatically
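The ETA shown on the dashboard is smoothed with an exponential moving average over recent step durations rather than a raw mean. A minimal sketch of that idea (class name and smoothing factor are assumptions):

```python
import time

class EtaEstimator:
    """Exponential moving average over step durations for a stable ETA."""
    def __init__(self, total_steps, alpha=0.1):
        self.total_steps = total_steps
        self.alpha = alpha          # assumed smoothing factor
        self.avg_step_time = None
        self.last = time.time()

    def update(self, step):
        now = time.time()
        dt, self.last = now - self.last, now
        if self.avg_step_time is None:
            self.avg_step_time = dt
        else:
            self.avg_step_time = self.alpha * dt + (1 - self.alpha) * self.avg_step_time
        return self.avg_step_time * (self.total_steps - step)  # seconds remaining
```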
Train datasets one-by-one from `datasets.csv`:

```bash
python train_sequential.py
```

Features:
- Processes each dataset individually
- Git commit after each dataset
- Skips already trained datasets
- Updates CSV status automatically
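Under the hood the sequential flow is roughly: read `datasets.csv`, skip anything already listed in `trained_datasets.csv`, train, record the result, and commit. A hedged sketch of that loop (the CSV columns and the `train_on_dataset` helper are assumptions, not the script's exact code):

```python
import csv
import subprocess

def already_trained():
    try:
        with open("trained_datasets.csv") as f:
            return {row[0] for row in csv.reader(f) if row}
    except FileNotFoundError:
        return set()

def run_sequential(train_on_dataset):
    done = already_trained()
    with open("datasets.csv") as f:
        pending = [row[0] for row in csv.reader(f) if row and row[0] not in done]

    for name in pending:
        train_on_dataset(name)  # hypothetical: trains and updates models/finai_gpt.pt
        with open("trained_datasets.csv", "a", newline="") as f:
            csv.writer(f).writerow([name])
        # Commit after each dataset so progress survives interruptions.
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", f"Train on {name}"], check=True)
```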
Combine all pending datasets and train once:
```bash
python train_all.py
```

Features:
- Merges all pending datasets into one file
- Single training run for efficiency
- Git commits for each dataset
- Automatic cleanup
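Batch mode trades per-dataset runs for a single pass: pending datasets are exported to text and concatenated into one temporary file before training. A minimal sketch of the merge step (the output file name is an assumption):

```python
from pathlib import Path

def merge_pending(exported_files, out_path="datasets/temp_merged.txt"):
    """Concatenate exported dataset text files into one training file."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as merged:
        for path in exported_files:
            merged.write(Path(path).read_text(encoding="utf-8"))
            merged.write("\n")  # keep documents separated
    return out
```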
Train across multiple machines:
```bash
# On server (Raspberry Pi or always-on machine)
cd distributed
python server.py

# On each worker machine
python worker.py --server http://server-ip:8765

# Monitor with dashboard
python dashboard.py --server http://server-ip:8765
```

Features:
- Coordinate training across unlimited workers
- Automatic model synchronization
- Real-time monitoring dashboard
- Task queue management
Full Distributed Training Guide
```
          ┌────────────────┐
          │     Server     │ ← Coordinates tasks, stores model
          │ (Raspberry Pi) │
          └───────┬────────┘
                  │
   ┌─────────┬────┴────┬─────────┐
   │         │         │         │
┌──▼───┐  ┌──▼───┐  ┌──▼───┐  ┌──▼───┐
│Worker│  │Worker│  │Worker│  │Worker│
│  #1  │  │  #2  │  │  #3  │  │  #4  │
└──────┘  └──────┘  └──────┘  └──────┘
```
1. Start Server (on always-on machine):

   ```bash
   cd distributed
   python server.py
   ```

2. Start Workers (on each training machine):

   ```bash
   python worker.py --server http://server-ip:8765
   ```

3. Submit Tasks (from any machine):

   ```bash
   python client.py submit PatronusAI/financebench
   ```

- Single Model: All workers contribute to `models/finai_gpt.pt`
- Auto-sync: Workers download latest model before training
- Fault Tolerant: Failed tasks automatically reassigned
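The worker cycle is: pull the latest model, train on the assigned dataset, push the updated model, repeat. The sketch below shows the shape of that loop over HTTP; the endpoint names (`/task`, `/model`, `/result`) and the `train_on_dataset` helper are illustrative assumptions, not the server's documented API:

```python
import time
import requests

SERVER = "http://server-ip:8765"  # coordination server

def worker_loop(train_on_dataset):
    while True:
        # Ask the server for the next queued task (endpoint assumed).
        task = requests.get(f"{SERVER}/task").json()
        if not task:
            time.sleep(30)
            continue

        # Pull the latest unified checkpoint before training.
        with open("models/finai_gpt.pt", "wb") as f:
            f.write(requests.get(f"{SERVER}/model").content)

        train_on_dataset(task["dataset"])  # hypothetical training helper

        # Push the updated checkpoint back so other workers build on it.
        with open("models/finai_gpt.pt", "rb") as f:
            requests.post(f"{SERVER}/result", files={"model": f},
                          data={"task_id": task["id"]})
```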
Distributed Training Documentation
Remote Access Setup
```
Architecture:        GPT-style Decoder-only Transformer
Parameters:          ~15M (configurable)
Layers:              4
Attention Heads:     4
Embedding Dimension: 256
Context Window:      256 tokens
Vocabulary:          ~50,000 tokens (BPE)
```

- RoPE (Rotary Position Embeddings): Better position encoding
- SwiGLU Activation: Improved over ReLU/GELU
- Flash Attention: 2-4x faster attention computation
- Gradient Checkpointing: 40% memory savings
- Weight Tying: Shared input/output embeddings
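For reference, the three architectural pieces most worth seeing in code are SwiGLU, scaled-dot-product (flash) attention, and weight tying. The sketch below illustrates those techniques in plain PyTorch; it is not a copy of `src/models/language_model_pytorch.py`:

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, n_embd: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(n_embd, hidden, bias=False)
        self.up = nn.Linear(n_embd, hidden, bias=False)
        self.down = nn.Linear(hidden, n_embd, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def causal_attention(q, k, v):
    # PyTorch 2.x dispatches to a FlashAttention kernel when available.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

class TiedHead(nn.Module):
    """Weight tying: the output projection reuses the token embedding matrix."""
    def __init__(self, vocab_size: int, n_embd: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # shared parameters
```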
- AdamW Optimizer: L2 regularization for better generalization
- Cosine LR Schedule: Smooth learning rate decay
- Gradient Accumulation: Simulate larger batch sizes
- Mixed Precision (bf16): 50% memory reduction, full accuracy
- Gradient Clipping: Prevents training instability
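Put together, those pieces form a fairly standard loop: AdamW with warmup-then-cosine scheduling, gradient accumulation, bf16 autocast, and gradient clipping. A minimal sketch under the assumption that the model's forward returns `(logits, loss)` and that `get_batch()` yields token tensors; the real loop lives in the repo's training scripts:

```python
import math
import torch

# Values mirror src/config.py; names here are local to the sketch.
LEARNING_RATE, WEIGHT_DECAY = 6e-4, 0.1
WARMUP_STEPS, TRAIN_STEPS = 100, 5000
GRADIENT_ACCUM_STEPS, MAX_GRAD_NORM = 4, 1.0

def train(model, get_batch, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE,
                                  weight_decay=WEIGHT_DECAY, betas=(0.9, 0.95))

    def lr_lambda(step):
        # Linear warmup, then cosine decay to zero.
        if step < WARMUP_STEPS:
            return step / max(1, WARMUP_STEPS)
        progress = (step - WARMUP_STEPS) / max(1, TRAIN_STEPS - WARMUP_STEPS)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    for step in range(TRAIN_STEPS):
        optimizer.zero_grad(set_to_none=True)
        for _ in range(GRADIENT_ACCUM_STEPS):          # gradient accumulation
            x, y = get_batch()                          # (B, T) token batches
            with torch.autocast(device_type=device, dtype=torch.bfloat16):
                _, loss = model(x.to(device), y.to(device))
            (loss / GRADIENT_ACCUM_STEPS).backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
        scheduler.step()
```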
All settings in `src/config.py`:

```python
N_LAYER = 4       # Transformer layers
N_HEAD = 4        # Attention heads
N_EMBD = 256      # Embedding dimension
BLOCK_SIZE = 256  # Context window
DROPOUT = 0.05    # Dropout rate

TRAIN_STEPS = 5000        # Training steps
BATCH_SIZE = 16           # Batch size
GRADIENT_ACCUM_STEPS = 4  # Gradient accumulation
LEARNING_RATE = 6e-4      # Learning rate
WEIGHT_DECAY = 0.1        # L2 regularization
WARMUP_STEPS = 100        # LR warmup steps
MAX_GRAD_NORM = 1.0       # Gradient clipping

MAX_NEW_TOKENS = 512  # Max generation length
TEMPERATURE = 0.7     # Sampling temperature
TOP_K = 40            # Top-k sampling
TOP_P = 0.9           # Nucleus sampling

MODEL_DIR = "models"
LANGUAGE_MODEL_PATH = "models/finai_gpt.pt"  # Single unified model
TOKENIZER_PATH = "models/tokenizer.pkl"
DATASET_DIR = "datasets"
```
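The generation settings above control sampling. A minimal sketch of how temperature, top-k, and nucleus (top-p) filtering combine when picking the next token (illustrative only, not the repo's `generate` code):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Pick the next token id from a (vocab,) logits vector."""
    logits = logits / max(temperature, 1e-8)           # temperature scaling

    # Top-k: keep only the k highest-scoring tokens.
    kth = torch.topk(logits, top_k).values[-1]
    logits[logits < kth] = float("-inf")

    # Top-p (nucleus): keep the smallest set whose probability mass >= p.
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    cutoff = cumulative > top_p
    cutoff[1:] = cutoff[:-1].clone()                   # always keep the top token
    cutoff[0] = False
    sorted_probs[cutoff] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()

    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()
```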
```bash
# Train from text file
python main.py train <file.txt> [--steps N] [--batch-size N] [--lr RATE]

# Train from Hugging Face dataset
python main.py train_hf <dataset-id> [--split train] [--max N]

# Interactive chat
python main.py chat

# Generate from prompt
python main.py generate "Your prompt here"
```

```bash
# Single dataset
python train_single.py <hf-dataset-name>

# Sequential training
python train_sequential.py

# Batch training
python train_all.py
```

```bash
# Server
cd distributed
python server.py [--port 8765]

# Worker
python worker.py --server http://server:8765 [--name worker-1]

# Client (submit tasks)
python client.py submit <dataset-name>
python client.py status
python client.py workers
```
```
FinAI/
├── main.py                  # Main CLI entry point
├── train_single.py          # Single dataset training
├── train_sequential.py      # Sequential training
├── train_all.py             # Batch training
├── run_prompt.py            # Quick generation script
├── requirements.txt         # Python dependencies
├── datasets.csv             # Pending datasets
├── trained_datasets.csv     # Completed datasets
│
├── src/                     # Core source code
│   ├── core/
│   │   ├── finai.py         # Main FinAI class
│   │   └── context.py       # Conversation context
│   ├── models/
│   │   └── language_model_pytorch.py  # GPT model implementation
│   ├── data/
│   │   └── tokenizer.py     # BPE tokenizer
│   └── config.py            # Configuration
│
├── distributed/             # Distributed training system
│   ├── server.py            # Coordination server
│   ├── worker.py            # Training worker
│   ├── client.py            # Task submission client
│   ├── server_config.json   # Server configuration
│   └── worker_config.json   # Worker configuration
│
├── scripts/                 # Utility scripts
│   ├── manage_datasets.py   # Dataset CSV management
│   └── export_hf_to_txt.py  # HF dataset export
│
├── models/                  # Model checkpoints
│   ├── finai_gpt.pt         # Unified model (single file)
│   └── tokenizer.pkl        # Tokenizer
│
├── datasets/                # Training data
│   └── temp_*.txt           # Temporary training files
│
└── docs/                    # Documentation
    ├── README.md            # Distributed training docs
    ├── QUICKSTART.md        # Quick start guide
    ├── REMOTE_ACCESS_SETUP.md      # Remote access guide
    ├── EFFICIENCY_ANALYSIS.md      # Performance analysis
    ├── TRAINING_LOSS_EXPLAINED.md  # Loss behavior guide
    └── IMPLEMENTATION_COMPLETE.md  # Implementation notes
```
- README - This file
- Configuration Guide - All configuration options
- Distributed Training Overview - Complete distributed system guide
- Quick Start Guide - Get started in 5 minutes
- Remote Access Setup - Configure remote access
- Efficiency Analysis - Performance benchmarks
- Implementation Notes - Technical details
- Training Loss Explained - Why loss goes up/down, what's normal
- Dataset Management - CSV tracking system
- HF Export - Export Hugging Face datasets
```
torch>=2.0.0          # PyTorch (CUDA/ROCm/CPU)
transformers>=4.30.0  # HF transformers (scheduler)
datasets>=2.14.0      # HF datasets
accelerate>=0.20.0    # Multi-GPU training (optional)
requests>=2.28.0      # HTTP requests (distributed)
torch-directml        # DirectML backend (AMD on Windows)
flash-attn            # Flash Attention (NVIDIA only)
```
Minimum:
- Python 3.8+
- 8GB RAM
- 2GB disk space
Recommended:
- Python 3.10+
- 16GB+ RAM
- NVIDIA GPU with 8GB+ VRAM (or AMD with ROCm)
- 10GB disk space
```bash
# Basic installation
pip install -r requirements.txt

# With CUDA (NVIDIA)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# With ROCm (AMD)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

# With DirectML (AMD on Windows)
pip install torch-directml
```
```bash
# Train on a financial dataset
python train_single.py PatronusAI/financebench

# Dashboard opens automatically at http://localhost:8080
# Watch real-time metrics: loss, ETA, progress
```
echo "PatronusAI/financebench,,train" >> datasets.csv
echo "FinGPT/fingpt-sentiment-train,,train" >> datasets.csv
# Train all at once
python train_all.py# On server (Raspberry Pi)
```bash
# On server (Raspberry Pi)
cd distributed
python server.py

# On worker machines (your PC + friends' PCs)
python worker.py --server http://raspberrypi.local:8765 --name my-pc

# Submit tasks from anywhere
python client.py submit PatronusAI/financebench
python client.py submit FinGPT/fingpt-sentiment-train

# Monitor at http://raspberrypi.local:8081
python dashboard.py --server http://raspberrypi.local:8765
```
```bash
python main.py chat

# Or use the quick prompt script
python run_prompt.py
```

Problem: Out of memory
Solution: Reduce BATCH_SIZE or enable USE_GRAD_CHECKPOINTING in config
Problem: Slow training
Solution: Enable GPU, use --accelerate on, increase BATCH_SIZE
Problem: NaN loss
Solution: Reduce LEARNING_RATE, check MAX_GRAD_NORM is set
Problem: Workers can't connect to server
Solution: Check firewall, use correct IP/port, verify AUTH_PASSWORD
Problem: Model not syncing
Solution: Ensure models/finai_gpt.pt exists on server, check permissions
Problem: Dashboard shows "offline"
Solution: Verify server is running, check SERVER_URL in dashboard config
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details
- Hugging Face - Transformers, Datasets, Accelerate
- PyTorch - Deep learning framework
- OpenAI - GPT architecture inspiration
- Anthropic - Modern training techniques
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: your.email@example.com
Built with ❤️ for the financial AI community