
🚀 FinAI - Financial Language Model

A modern, production-ready GPT-style language model optimized for financial data and continuous learning.

FinAI is a lightweight yet powerful transformer-based language model that trains on financial datasets with state-of-the-art optimization techniques. Features include distributed training, real-time dashboards, and a single unified model that continuously improves with each dataset.


Table of Contents

  • Features
  • Quick Start
  • Training Modes
  • Distributed Training
  • Model Architecture
  • Configuration
  • Commands Reference
  • Project Structure
  • Documentation
  • Requirements
  • Usage Examples
  • Troubleshooting
  • Contributing
  • License
  • Acknowledgments

Features

Core Capabilities

  • Single Unified Model: All training contributes to one model (models/finai_gpt.pt)
  • Continuous Learning: Load and continue training from any checkpoint
  • Modern Architecture: GPT-style transformer with RoPE, SwiGLU, Flash Attention
  • Optimized Training: AdamW optimizer, cosine LR schedule, gradient accumulation
  • Accurate ETA: Exponential moving average for smooth, reliable time estimates
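
The ETA estimate is the simplest of these to illustrate: per-step durations are smoothed with an exponential moving average and extrapolated over the remaining steps. The snippet below is a minimal sketch; the names and smoothing factor are illustrative, not the exact code in the training scripts.

import time

def smoothed_step_time(step_times, alpha=0.1):
    """Exponential moving average of per-step durations (illustrative helper)."""
    ema = step_times[0]
    for t in step_times[1:]:
        ema = alpha * t + (1 - alpha) * ema
    return ema

# Inside a training loop (sketch):
#   t0 = time.time()
#   ... one optimizer step ...
#   step_times.append(time.time() - t0)
#   eta_seconds = smoothed_step_time(step_times) * (total_steps - step)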

Distributed Training

  • Multi-Machine Training: Train with friends across multiple computers
  • Automatic Synchronization: Workers pull/push model checkpoints automatically
  • Task Queue Management: Coordinate training across multiple workers

Training Modes

  1. Single Dataset (train_single.py): Quick training on one dataset
  2. Sequential (train_sequential.py): Train datasets one-by-one with commits
  3. Batch (train_all.py): Combine all pending datasets into one training run
  4. Distributed (distributed/): Coordinate training across multiple machines

🚀 Quick Start

Installation

# Clone the repository
git clone <your-repo-url>
cd FinAI

# Install dependencies
pip install -r requirements.txt

Train Your First Model

# Option 1: Train from a text file
python main.py train datasets/my_data.txt

# Option 2: Train from Hugging Face dataset
python main.py train_hf PatronusAI/financebench

# Option 3: Train on a single dataset with dashboard
python train_single.py <dataset-name>

Chat with Your Model

python main.py chat

Training Modes

1. Single Dataset Training

Train on one Hugging Face dataset with automatic dashboard:

python train_single.py PatronusAI/financebench

Features:

  • Automatic training dashboard at http://localhost:8080
  • Real-time metrics: loss, ETA, progress
  • Automatic CSV tracking (moves to trained_datasets.csv)
  • Opens browser automatically

2. Sequential Training

Train datasets one-by-one from datasets.csv:

python train_sequential.py

Features:

  • Processes each dataset individually
  • Git commit after each dataset
  • Skips already trained datasets
  • Updates CSV status automatically
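
The skip-and-update bookkeeping boils down to checking and appending rows in the two CSV files. A rough sketch, assuming the dataset name is the first column (the real layout in train_sequential.py may differ):

import csv

def already_trained(name, path="trained_datasets.csv"):
    # Skip datasets that already appear in the completed-datasets CSV.
    try:
        with open(path, newline="") as f:
            return any(row and row[0] == name for row in csv.reader(f))
    except FileNotFoundError:
        return False

def mark_trained(name, path="trained_datasets.csv"):
    # Record the dataset as completed once its training run finishes.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([name])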

3. Batch Training

Combine all pending datasets and train once:

python train_all.py

Features:

  • Merges all pending datasets into one file (see the sketch after this list)
  • Single training run for efficiency
  • Git commits for each dataset
  • Automatic cleanup
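
The merge step referenced above can be as simple as concatenating each pending dataset's exported text into one temporary file. A minimal sketch with illustrative file names:

from pathlib import Path

def merge_datasets(files, out_path="datasets/temp_combined.txt"):
    # Concatenate exported text files into a single training file.
    with open(out_path, "w", encoding="utf-8") as out:
        for f in files:
            out.write(Path(f).read_text(encoding="utf-8"))
            out.write("\n")
    return out_path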

4. Distributed Training

Train across multiple machines:

# On server (Raspberry Pi or always-on machine)
cd distributed
python server.py

# On each worker machine
python worker.py --server http://server-ip:8765

# Monitor with dashboard
python dashboard.py --server http://server-ip:8765

Features:

  • Coordinate training across unlimited workers
  • Automatic model synchronization
  • Real-time monitoring dashboard
  • Task queue management

Full Distributed Training Guide


Distributed Training

Architecture

┌────────────────┐
│     Server     │  ← Coordinates tasks, stores model
│ (Raspberry Pi) │
└───────┬────────┘
        │
   ┌────┴────┬─────────┬─────────┐
   │         │         │         │
┌──▼───┐  ┌──▼───┐  ┌──▼───┐  ┌──▼───┐
│Worker│  │Worker│  │Worker│  │Worker│
│  #1  │  │  #2  │  │  #3  │  │  #4  │
└──────┘  └──────┘  └──────┘  └──────┘

Setup

  1. Start Server (on always-on machine):
     cd distributed
     python server.py

  2. Start Workers (on each training machine):
     python worker.py --server http://server-ip:8765

  3. Submit Tasks (from any machine):
     python client.py submit PatronusAI/financebench

Key Features

  • Single Model: All workers contribute to models/finai_gpt.pt
  • Auto-sync: Workers download the latest model before training (sketched below)
  • Fault Tolerant: Failed tasks automatically reassigned
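
As a rough illustration of the auto-sync cycle, a worker pulls the unified checkpoint before training and pushes it back when its task completes. The /model endpoint below is hypothetical; the actual protocol is defined by distributed/server.py and worker.py.

import requests

SERVER = "http://server-ip:8765"          # your coordination server
MODEL_PATH = "models/finai_gpt.pt"

def pull_model():
    # Download the latest unified checkpoint before training (endpoint is illustrative).
    r = requests.get(f"{SERVER}/model", timeout=60)
    r.raise_for_status()
    with open(MODEL_PATH, "wb") as f:
        f.write(r.content)

def push_model():
    # Upload the updated checkpoint after the task completes (endpoint is illustrative).
    with open(MODEL_PATH, "rb") as f:
        requests.post(f"{SERVER}/model", files={"model": f}, timeout=300).raise_for_status()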

Distributed Training Documentation
Remote Access Setup


Model Architecture

Transformer Specifications

Architecture: GPT-style Decoder-only Transformer
Parameters: ~15M (configurable)
Layers: 4
Attention Heads: 4
Embedding Dimension: 256
Context Window: 256 tokens
Vocabulary: ~50,000 tokens (BPE)
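
As a rough sanity check on the parameter count (the exact figure depends on the MLP width, norms, and biases, which are assumptions here), most of the parameters sit in the tied token embedding:

vocab, d, layers = 50_000, 256, 4

embedding = vocab * d                     # ≈ 12.8M, shared with the output head via weight tying
attn_per_layer = 4 * d * d                # Q, K, V and output projections
mlp_per_layer = 3 * d * int(8 * d / 3)    # SwiGLU uses three projections; this width is an assumption
blocks = layers * (attn_per_layer + mlp_per_layer)

print(f"≈ {(embedding + blocks) / 1e6:.1f}M parameters")   # ≈ 15.9M under these assumptions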

Modern Features

  • RoPE (Rotary Position Embeddings): Better position encoding
  • SwiGLU Activation: Improved over ReLU/GELU (see the sketch after this list)
  • Flash Attention: 2-4x faster attention computation
  • Gradient Checkpointing: 40% memory savings
  • Weight Tying: Shared input/output embeddings
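
For reference, a SwiGLU feed-forward block of the kind listed above looks roughly like this in PyTorch (dimensions and naming are illustrative; the project's implementation lives in src/models/language_model_pytorch.py):

import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))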

Training Optimizations

  • AdamW Optimizer: Decoupled weight decay for better generalization
  • Cosine LR Schedule: Smooth learning rate decay
  • Gradient Accumulation: Simulate larger batch sizes
  • Mixed Precision (bf16): ~50% memory reduction while keeping fp32's dynamic range
  • Gradient Clipping: Prevents training instability
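
Put together, these pieces correspond roughly to the loop below, shown with the config defaults and a stand-in model so it runs on its own; the real loop lives in the training scripts.

import torch
from torch import nn
from transformers import get_cosine_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(256, 256).to(device)             # stand-in for the FinAI GPT module
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=100,
                                            num_training_steps=5000)

for step in range(5000):                           # TRAIN_STEPS
    optimizer.zero_grad(set_to_none=True)
    for _ in range(4):                             # GRADIENT_ACCUM_STEPS
        x = torch.randn(16, 256, device=device)    # stand-in batch (BATCH_SIZE x N_EMBD)
        with torch.autocast(device_type=device, dtype=torch.bfloat16):
            loss = model(x).pow(2).mean()          # stand-in loss
        (loss / 4).backward()                      # scale so accumulated grads match one large batch
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # MAX_GRAD_NORM
    optimizer.step()
    scheduler.step()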

Configuration

All settings in src/config.py:

Model Architecture

N_LAYER = 4              # Transformer layers
N_HEAD = 4               # Attention heads  
N_EMBD = 256             # Embedding dimension
BLOCK_SIZE = 256         # Context window
DROPOUT = 0.05           # Dropout rate

Training Parameters

TRAIN_STEPS = 5000       # Training steps
BATCH_SIZE = 16          # Batch size
GRADIENT_ACCUM_STEPS = 4 # Gradient accumulation
LEARNING_RATE = 6e-4     # Learning rate
WEIGHT_DECAY = 0.1       # L2 regularization
WARMUP_STEPS = 100       # LR warmup steps
MAX_GRAD_NORM = 1.0      # Gradient clipping

Generation Settings

MAX_NEW_TOKENS = 512     # Max generation length
TEMPERATURE = 0.7        # Sampling temperature
TOP_K = 40               # Top-k sampling
TOP_P = 0.9              # Nucleus sampling
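
These map onto standard temperature scaling, top-k filtering, and nucleus (top-p) filtering of the next-token distribution. A minimal sketch of how such settings are typically applied to a 1-D logits vector (the repo's generation code may differ in detail):

import torch

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Pick one token id from a 1-D tensor of next-token logits (illustrative)."""
    logits = logits / temperature                    # sharpen or flatten the distribution
    topk_vals, topk_idx = torch.topk(logits, top_k)  # keep only the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)
    cumulative = torch.cumsum(probs, dim=-1)
    keep = cumulative - probs < top_p                # nucleus: smallest set with mass >= top_p
    probs = torch.where(keep, probs, torch.zeros_like(probs))
    return topk_idx[torch.multinomial(probs / probs.sum(), 1)]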

Paths

MODEL_DIR = "models"
LANGUAGE_MODEL_PATH = "models/finai_gpt.pt"  # Single unified model
TOKENIZER_PATH = "models/tokenizer.pkl"
DATASET_DIR = "datasets"

πŸ“ Commands Reference

Main CLI (main.py)

# Train from text file
python main.py train <file.txt> [--steps N] [--batch-size N] [--lr RATE]

# Train from Hugging Face dataset
python main.py train_hf <dataset-id> [--split train] [--max N]

# Interactive chat
python main.py chat

# Generate from prompt
python main.py generate "Your prompt here"

Training Scripts

# Single dataset
python train_single.py <hf-dataset-name>

# Sequential training
python train_sequential.py

# Batch training
python train_all.py

Distributed Training

# Server
cd distributed
python server.py [--port 8765]

# Worker
python worker.py --server http://server:8765 [--name worker-1]

# Client (submit tasks)
python client.py submit <dataset-name>
python client.py status
python client.py workers

πŸ“ Project Structure

FinAI/
├── main.py                      # Main CLI entry point
├── train_single.py              # Single dataset training
├── train_sequential.py          # Sequential training
├── train_all.py                 # Batch training
├── run_prompt.py                # Quick generation script
├── requirements.txt             # Python dependencies
├── datasets.csv                 # Pending datasets
├── trained_datasets.csv         # Completed datasets
│
├── src/                         # Core source code
│   ├── core/
│   │   ├── finai.py            # Main FinAI class
│   │   └── context.py          # Conversation context
│   ├── models/
│   │   └── language_model_pytorch.py  # GPT model implementation
│   ├── data/
│   │   └── tokenizer.py        # BPE tokenizer
│   └── config.py               # Configuration
│
├── distributed/                 # Distributed training system
│   ├── server.py               # Coordination server
│   ├── worker.py               # Training worker
│   ├── client.py               # Task submission client
│   ├── server_config.json      # Server configuration
│   └── worker_config.json      # Worker configuration
│
├── scripts/                     # Utility scripts
│   ├── manage_datasets.py      # Dataset CSV management
│   └── export_hf_to_txt.py     # HF dataset export
│
├── models/                      # Model checkpoints
│   ├── finai_gpt.pt            # Unified model (single file)
│   └── tokenizer.pkl           # Tokenizer
│
├── datasets/                    # Training data
│   └── temp_*.txt              # Temporary training files
│
└── docs/                        # Documentation
    ├── README.md               # Distributed training docs
    ├── QUICKSTART.md           # Quick start guide
    ├── REMOTE_ACCESS_SETUP.md  # Remote access guide
    ├── EFFICIENCY_ANALYSIS.md  # Performance analysis
    ├── TRAINING_LOSS_EXPLAINED.md  # Loss behavior guide
    └── IMPLEMENTATION_COMPLETE.md  # Implementation notes

Documentation

  • Core Documentation
  • Distributed Training
  • Training Guides
  • Scripts Documentation


Requirements

Core Dependencies

torch>=2.0.0              # PyTorch (CUDA/ROCm/CPU)
transformers>=4.30.0      # HF transformers (scheduler)
datasets>=2.14.0          # HF datasets
accelerate>=0.20.0        # Multi-GPU training (optional)
requests>=2.28.0          # HTTP requests (distributed)

Optional Dependencies

torch-directml            # DirectML backend (AMD on Windows)
flash-attn               # Flash Attention (NVIDIA only)

System Requirements

Minimum:

  • Python 3.8+
  • 8GB RAM
  • 2GB disk space

Recommended:

  • Python 3.10+
  • 16GB+ RAM
  • NVIDIA GPU with 8GB+ VRAM (or AMD with ROCm)
  • 10GB disk space

Installation

# Basic installation
pip install -r requirements.txt

# With CUDA (NVIDIA)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# With ROCm (AMD)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

# With DirectML (AMD on Windows)
pip install torch-directml

Usage Examples

Example 1: Quick Training

# Train on a financial dataset
python train_single.py PatronusAI/financebench

# Dashboard opens automatically at http://localhost:8080
# Watch real-time metrics: loss, ETA, progress

Example 2: Batch Training

# Add datasets to datasets.csv
echo "PatronusAI/financebench,,train" >> datasets.csv
echo "FinGPT/fingpt-sentiment-train,,train" >> datasets.csv

# Train all at once
python train_all.py

Example 3: Distributed Training

# On server (Raspberry Pi)
cd distributed
python server.py

# On worker machines (your PC + friends' PCs)
python worker.py --server http://raspberrypi.local:8765 --name my-pc

# Submit tasks from anywhere
python client.py submit PatronusAI/financebench
python client.py submit FinGPT/fingpt-sentiment-train

# Monitor at http://raspberrypi.local:8081
python dashboard.py --server http://raspberrypi.local:8765

Example 4: Chat with Model

python main.py chat

# Or use the quick prompt script
python run_prompt.py

🔧 Troubleshooting

Training Issues

Problem: Out of memory
Solution: Reduce BATCH_SIZE or enable USE_GRAD_CHECKPOINTING in config

Problem: Slow training
Solution: Enable GPU, use --accelerate on, increase BATCH_SIZE

Problem: NaN loss
Solution: Reduce LEARNING_RATE, check MAX_GRAD_NORM is set

Distributed Issues

Problem: Workers can't connect to server
Solution: Check firewall, use correct IP/port, verify AUTH_PASSWORD

Problem: Model not syncing
Solution: Ensure models/finai_gpt.pt exists on server, check permissions

Problem: Dashboard shows "offline"
Solution: Verify server is running, check SERVER_URL in dashboard config


Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

MIT License - see LICENSE file for details


Acknowledgments

  • Hugging Face - Transformers, Datasets, Accelerate
  • PyTorch - Deep learning framework
  • OpenAI - GPT architecture inspiration
  • Anthropic - Modern training techniques

Support


Built with ❤️ for the financial AI community
