
Tejas163/ArchGene


ArchGene

Verify your LLM architecture before you waste $50K on compute.

Training an LLM can cost $10K–$100K or more. A common reason training runs fail is architecture misconfiguration: a hidden dimension that does not split evenly across attention heads, attention bugs, or layer configurations that are incompatible with each other.

ArchGene catches these issues BEFORE you pay for GPU time.

The Problem

You spend $50K on a GPU cluster
↓
Start training
↓
Day 3: OOM errors, NaN outputs, training crashes
↓
Why? The hidden dimension is not divisible by the number of attention heads
↓
$50K wasted

ArchGene prevents this.

What It Does

| Feature | What It Tells You |
|---|---|
| Z3 Verification | "Your architecture is mathematically valid" or "Here's what's broken" |
| Cost Estimation | "This will cost $12K to train on 8x A100s" |
| Benchmark Projections | "Expected MMLU score: ~42%" |
| Model Zoo | Compare against GPT-2, Llama-2, Mistral, etc. |
| Design Session | Conversational Q&A that designs a verified architecture for your use case |
| Kernel Generation | Generates runnable PyTorch `model.py`, `config.json`, and `train.py` |
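The Z3 Verification row boils down to checking a set of constraints between the architecture parameters. ArchGene uses Z3 for this; the sketch below is a plain-Python stand-in (no solver) that checks a few constraints we assume are typical: positive sizes, `hidden_dim` divisible by `num_heads`, `head_dim * num_heads == hidden_dim` for standard multi-head attention, and an even `head_dim` when rotary embeddings are used. `ArchConfig` and `verify` are hypothetical names, not ArchGene's API.

```python
from dataclasses import dataclass


@dataclass
class ArchConfig:
    # Field names follow the "Architecture Parameters" table in this README;
    # the dataclass itself is hypothetical, not ArchGene's internal type.
    vocab_dim: int
    hidden_dim: int
    num_layers: int
    num_heads: int
    head_dim: int
    intermediate_size: int


def verify(cfg: ArchConfig, uses_rope: bool = True) -> list[str]:
    """Return violated constraints; an empty list means every check passed."""
    problems: list[str] = []
    for name, value in vars(cfg).items():
        if value <= 0:
            problems.append(f"{name} must be positive, got {value}")
    if cfg.num_heads > 0 and cfg.hidden_dim % cfg.num_heads != 0:
        problems.append(
            f"hidden_dim {cfg.hidden_dim} not divisible by num_heads {cfg.num_heads}"
        )
    if cfg.head_dim * cfg.num_heads != cfg.hidden_dim:
        problems.append(
            f"head_dim * num_heads = {cfg.head_dim * cfg.num_heads}, "
            f"expected hidden_dim {cfg.hidden_dim}"
        )
    if uses_rope and cfg.head_dim % 2 != 0:
        # Rotary embeddings rotate dimensions in pairs, so head_dim must be even.
        problems.append(f"head_dim {cfg.head_dim} must be even for rotary embeddings")
    return problems


ok = ArchConfig(32000, 4096, 32, 32, 128, 11008)
bad = ArchConfig(32000, 4096, 24, 48, 85, 11008)
print(verify(ok))  # []
for problem in verify(bad):
    print(problem)
```

Returning every violation, rather than stopping at the first, mirrors the "Here's what's broken" style of report described in the table.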

Quick Start

```bash
# Install
pip install archgene

# Design an architecture through conversational Q&A
archgene design

# Verify your architecture BEFORE training
archgene verify --hidden 4096 --heads 32 --layers 24

# Generate runnable PyTorch code
archgene generate --session 0

# Get cost estimate
archgene cost gpt2 --gpu A100

# Check against known architectures
archgene zoo-evaluate llama2_7b
```

Why This Matters

  • Don't waste compute: Catch bugs before GPU costs begin
  • Know your bill: Estimate training cost before you start
  • Validate fast: Z3 formally proves that your configuration satisfies the architectural constraints
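Because these constraints are integer arithmetic, checking them is a satisfiability question: either some assignment satisfies every constraint, or none does. For small domains the same question can be answered by brute force. The sketch below (a hypothetical stdlib-only helper, not ArchGene's code) enumerates head layouts for a fixed `hidden_dim`, assuming head dimensions of 64 or 128 are the only acceptable ones. An empty result corresponds to an unsatisfiable design.

```python
def valid_head_configs(hidden_dim: int,
                       allowed_head_dims: tuple[int, ...] = (64, 128),
                       max_heads: int = 128) -> list[tuple[int, int]]:
    """Enumerate (num_heads, head_dim) pairs that exactly tile hidden_dim."""
    # Brute-force stand-in for a solver query; allowed_head_dims is an assumption.
    return [
        (n, hidden_dim // n)
        for n in range(1, max_heads + 1)
        if hidden_dim % n == 0 and hidden_dim // n in allowed_head_dims
    ]


print(valid_head_configs(4096))  # [(32, 128), (64, 64)]
print(valid_head_configs(4000))  # []
```

A solver such as Z3 answers the same question symbolically instead of enumerating, which matters once several parameters (layers, heads, FFN width) are free at the same time.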

Use Cases

  1. Building a custom LLM? Verify architecture before training
  2. Fine-tuning an existing model? Check that your config is valid
  3. Comparing architectures? Benchmark against the model zoo

CLI Examples

```bash
# Verify custom architecture
archgene verify --hidden 4096 --heads 32 --layers 24

# Cost estimation
archgene cost gpt2 --gpu H100 --batch-size 16

# List pre-trained architectures
archgene zoo-list

# Benchmark estimate
archgene benchmark llama2_7b

# Design an architecture through conversational Q&A
archgene design

# Generate runnable PyTorch code from a design
archgene generate -d 4096 -l 32 -n 16 -i 11008
```
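The `-i 11008` in the last command matches Llama-2-7B's FFN width, assuming `-i` sets `intermediate_size` (this README does not spell out the short flags `-d`, `-l`, `-n`, or `-i`). That value is not arbitrary: the Llama reference code starts from `4 * hidden_dim`, scales it by 2/3 because a SwiGLU FFN has three weight matrices instead of two, and rounds up to a multiple of 256. A minimal sketch of that rule:

```python
def swiglu_intermediate_size(hidden_dim: int, multiple_of: int = 256) -> int:
    """Llama-style SwiGLU FFN width: 2/3 of 4 * hidden_dim, rounded up to multiple_of."""
    size = int(2 * (4 * hidden_dim) / 3)
    # Round up to the next multiple of `multiple_of` (ceiling division).
    return multiple_of * ((size + multiple_of - 1) // multiple_of)


print(swiglu_intermediate_size(4096))  # 11008
print(swiglu_intermediate_size(768))   # 2048
```

Both results match example values in the parameter table that follows. Larger Llama-2 models apply an extra multiplier on top of this rule, so treat it as the baseline convention only.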

Architecture Parameters

| Parameter | Description | Example Values |
|---|---|---|
| `vocab_dim` | Vocabulary size | 32000, 50257 |
| `hidden_dim` | Hidden dimension | 768, 4096, 8192 |
| `num_layers` | Layer count | 12, 24, 32 |
| `num_heads` | Attention heads | 8, 16, 32 |
| `head_dim` | Head dimension | 64, 128 |
| `intermediate_size` | Feed-forward (FFN) hidden size | 2048, 11008 |
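Given these parameters, the parameter count follows directly. The sketch below assumes a Llama-style decoder: no biases, full multi-head attention, a SwiGLU FFN with three matrices, two RMSNorm weight vectors per layer plus a final norm, and untied input and output embeddings. Other families such as GPT-2 use biases, a two-matrix GELU FFN, and tied embeddings, so their counts differ. `llama_style_params` is a hypothetical helper, not ArchGene's formula.

```python
def llama_style_params(vocab_dim: int, hidden_dim: int, num_layers: int,
                       intermediate_size: int,
                       tied_embeddings: bool = False) -> int:
    """Approximate parameter count for a Llama-style decoder-only transformer."""
    attention = 4 * hidden_dim * hidden_dim        # Q, K, V, O projections, no bias
    ffn = 3 * hidden_dim * intermediate_size       # SwiGLU: gate, up, down projections
    norms = 2 * hidden_dim                         # two RMSNorm weight vectors per layer
    per_layer = attention + ffn + norms
    embeddings = vocab_dim * hidden_dim            # token embedding table
    lm_head = 0 if tied_embeddings else vocab_dim * hidden_dim
    final_norm = hidden_dim
    return num_layers * per_layer + embeddings + lm_head + final_norm


total = llama_style_params(32000, 4096, 32, 11008)
print(f"{total:,}")  # 6,738,415,616
```

Note that `num_heads` and `head_dim` do not appear: with full multi-head attention they only reshape the same four projection matrices, so they change the compute layout but not the parameter count.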

Cost Reference

| Model | Parameters | VRAM (FP16) | Training Cost (1T tokens) |
|---|---|---|---|
| GPT-2 | 176M | 0.4 GB | ~$50 |
| Llama-2-7B | 6.4B | 14 GB | ~$2,500 |
| Llama-2-70B | 70B | 145 GB | ~$25,000 |
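Figures like these can be sanity-checked with two standard rules of thumb. Weights stored in FP16 take 2 bytes per parameter, which roughly reproduces the VRAM column; training needs considerably more memory for gradients, optimizer state, and activations. Pre-training compute is commonly approximated as about 6 FLOPs per parameter per training token. The sketch below turns that into a dollar figure. The peak throughput (A100 dense BF16, 312 TFLOP/s), the 40% utilization, and the $2/GPU-hour price are assumptions to replace with your own, and this is not necessarily how ArchGene computes its estimates.

```python
def fp16_weight_gb(num_params: int) -> float:
    """Memory for the weights alone at 2 bytes/parameter, in GB (1e9 bytes)."""
    return num_params * 2 / 1e9


def training_cost_usd(num_params: float, num_tokens: float,
                      peak_flops_per_gpu: float = 312e12,  # A100 dense BF16 peak
                      mfu: float = 0.4,                    # assumed utilization
                      usd_per_gpu_hour: float = 2.0) -> float:  # assumed price
    """Rough pre-training cost from the ~6 * N * D FLOPs rule of thumb."""
    total_flops = 6 * num_params * num_tokens
    gpu_hours = total_flops / (peak_flops_per_gpu * mfu) / 3600
    return gpu_hours * usd_per_gpu_hour


n = 6_738_415_616  # a Llama-2-7B-sized model
print(f"weights: {fp16_weight_gb(n):.1f} GB")  # weights: 13.5 GB
print(f"cost for 1T tokens: ${training_cost_usd(n, 1e12):,.0f}")
```

Cost scales linearly with both parameters and tokens, so doubling either doubles the estimate; utilization and price are usually the largest sources of uncertainty.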

Tech Stack

  • Python 3.12+
  • Z3 theorem prover (formal verification)
  • PyTorch (code generation)
  • Streamlit (optional web UI: pip install "archgene[web]")


License

MIT

About

Self-healing multi-agent cognitive architecture evaluation system
