🧠 VEAH LLM

The Solana-Native Language Model for Blockchain Intelligence

VEAH LLM is an open-source large language model (LLM) specialized for Solana, built to understand and reason about the Solana blockchain at a deep technical level. It is designed for developers, analysts, and traders who need precise, context-aware answers about Solana, from smart contract structures to tokenomics, validator behavior, and on-chain analytics.

🚀 Overview

VEAH LLM bridges blockchain intelligence and natural language understanding. Unlike generic LLMs, VEAH has been trained and fine-tuned on Solana-specific data, including:

📘 Solana validator and runtime documentation
🧩 SPL token and Raydium/Orca AMM architectures
🧠 Transaction logs, explorer data, and DeFi interactions
🧮 Rust/Anchor codebases and program instruction sets
🧰 Developer guides, SDKs, and RPC call behavior

This allows it to answer complex, real-world questions like:

"Explain how the Compute Budget affects transaction prioritization."
"Analyze this Raydium swap transaction hash for MEV behavior."
"Generate a Solana Anchor smart contract template with an escrow function."
"What are the implications of stake concentration on Solana's Nakamoto coefficient?"
"How do priority fees affect transaction inclusion during network congestion?"

🧩 Architecture

VEAH LLM is built with a modular architecture:

Component	Description
model/	Core transformer architecture and model weights
tokenizer/	Solana-specific tokenization with blockchain vocabulary
training/	Fine-tuning scripts and dataset loaders
inference/	Optimized inference pipeline for deployment
eval/	Benchmark suites and evaluation metrics

⚙️ Installation

Clone the repository:

git clone https://github.com/veah-ai/veah-llm.git
cd veah-llm

Install dependencies:

pip install -r requirements.txt

Download model weights:

# Download from Hugging Face
python scripts/download_model.py --model veah-7b

# Or use wget
wget https://huggingface.co/veah-ai/veah-7b/resolve/main/pytorch_model.bin
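The wget URL above follows the Hugging Face Hub `resolve` pattern (repository id, revision, file name). As a minimal sketch, the helper below rebuilds that URL in Python. `build_resolve_url` is a hypothetical name used for illustration and is not part of the VEAH package; this README only confirms the `pytorch_model.bin` file on the `main` revision.

```python
from urllib.parse import quote

HUB = "https://huggingface.co"


def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Return the direct-download URL for one file in a Hugging Face repo.

    Hypothetical helper for illustration; not part of the VEAH package.
    """
    # Keep "/" unescaped so repo ids such as "org/name" and nested file paths stay intact.
    repo = quote(repo_id, safe="/")
    rev = quote(revision, safe="")
    path = quote(filename, safe="/")
    return f"{HUB}/{repo}/resolve/{rev}/{path}"


url = build_resolve_url("veah-ai/veah-7b", "pytorch_model.bin")
print(url)
```

The printed URL can be passed to wget or any HTTP client in place of the hard-coded one.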

🧠 Quick Start

from veah import VeahLLM

# Load the model
model = VeahLLM.from_pretrained("veah-7b")

# Basic generation
response = model.generate(
    "Explain how Solana's Proof of History works",
    max_length=512,
    temperature=0.7
)

# Transaction analysis
tx_analysis = model.analyze_transaction("5RrKQY...XY2f")

# Code generation
code = model.generate_code(
    "Create an Anchor program for a token vesting contract",
    language="rust"
)

# Technical Q&A
answer = model.query(
    "How does Compute Unit pricing affect validator bribes?",
    context="technical"
)

🔧 Advanced Usage

Fine-tuning on Custom Data

from veah.training import FineTuner

# Initialize fine-tuner
trainer = FineTuner(
    base_model="veah-7b",
    dataset_path="path/to/your/solana_data.jsonl"
)

# Start fine-tuning
trainer.train(
    epochs=3,
    learning_rate=2e-5,
    batch_size=4,
    gradient_accumulation_steps=8
)
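With gradient accumulation, the optimizer updates once every `gradient_accumulation_steps` batches, so the settings above behave like a larger batch. The sketch below works through that arithmetic. It assumes `batch_size` is counted per device and that `FineTuner` accumulates gradients in the usual way; neither is stated in this README, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update.

    Assumes batch_size is per device (an assumption, not confirmed by the README).
    """
    return batch_size * grad_accum_steps * num_devices


# Values from the fine-tuning example: batch_size=4, gradient_accumulation_steps=8
single_gpu = effective_batch_size(4, 8)
four_gpus = effective_batch_size(4, 8, num_devices=4)
print(single_gpu, four_gpus)
```

If memory is tight, halving `batch_size` and doubling `gradient_accumulation_steps` keeps the effective batch the same under these assumptions.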

Using with vLLM for Fast Inference

from vllm import LLM, SamplingParams

llm = LLM(model="veah-ai/veah-7b")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

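prompts = ["Explain Solana's Tower BFT consensus"]
outputs = llm.generate(prompts, sampling_params)

# Each result holds one or more completions; print the first one per prompt
for output in outputs:
    print(output.outputs[0].text)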

🧬 Training Data

VEAH is trained on a comprehensive Solana dataset:

Documentation: 500K+ pages from docs.solana.com, Anchor, Metaplex
Code: 10,000+ open-source Solana programs (Rust/Anchor)
Transactions: 500M+ decoded mainnet transactions
Analytics: DeFi protocol data, NFT collections, validator metrics
Community: Technical discussions from Discord, Stack Overflow, GitHub

Dataset Statistics:

Total Tokens: 100B+ Solana-specific tokens
Unique Programs: 50,000+ on-chain programs analyzed
Time Range: Genesis block to present (continuously updated)

📊 Benchmarks

Benchmark	Score	Description
SolanaQA	94.2%	Accuracy on Solana technical questions
TxDecode	91.7%	Transaction interpretation accuracy
CodeGen	87.3%	Valid Anchor code generation
RustSyntax	95.8%	Syntactically correct Rust output
DeFiLogic	89.1%	Understanding of DeFi protocol mechanics

🛠️ Model Variants

Model	Parameters	Context	VRAM	Use Case
veah-7b	7B	32K	16GB	Best balance of performance and accuracy
veah-3b	3B	16K	8GB	Edge deployment, mobile devices
veah-13b	13B	64K	32GB	Maximum accuracy, research
veah-turbo	7B	8K	16GB	Optimized for speed (2x faster)
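As a rough check on the VRAM column, the sketch below estimates the memory taken by the weights alone. It assumes 16-bit weights (2 bytes per parameter) and 1 GB = 10^9 bytes; the README does not state which precision the table assumes. Real usage is higher, because the KV cache and activations grow with context length.

```python
BYTES_PER_PARAM_16BIT = 2  # assumption: fp16/bf16 weights


def weights_gb(params_billions: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


# (parameters in billions, VRAM in GB) copied from the table above
variants = {
    "veah-3b": (3, 8),
    "veah-7b": (7, 16),
    "veah-13b": (13, 32),
    "veah-turbo": (7, 16),
}

# Headroom = listed VRAM minus the weights-only estimate
headroom = {name: vram - weights_gb(params) for name, (params, vram) in variants.items()}

for name, (params, vram) in variants.items():
    print(f"{name}: weights ~{weights_gb(params)} GB, listed {vram} GB, headroom ~{headroom[name]} GB")
```

Under these assumptions each listed VRAM figure leaves a few GB beyond the weights for the cache and runtime overhead.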

🏗️ Training Your Own

Prerequisites

4x A100 80GB GPUs (minimum)
500GB+ storage for datasets
CUDA 11.8+

Training Script

python train.py \
  --model_name veah-7b \
  --dataset solana_corpus \
  --num_epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5 \
  --warmup_steps 1000 \
  --save_steps 5000 \
  --output_dir ./checkpoints
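The `--warmup_steps` flag implies the learning rate ramps up before reaching `--learning_rate`. The sketch below shows a linear ramp, which is a common choice; the README does not say which schedule `train.py` actually uses, so treat the shape as an assumption. `warmup_lr` is a hypothetical helper, and any decay after warmup is not modeled.

```python
import math


def warmup_lr(step: int, peak_lr: float = 2e-5, warmup_steps: int = 1000) -> float:
    """Learning rate at a given optimizer step under linear warmup.

    Assumption: linear ramp from 0 to peak_lr, then constant.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


# Sample the schedule at a few steps, including the first checkpoint (--save_steps 5000)
for step in (0, 250, 500, 1000, 5000):
    print(step, warmup_lr(step))

halfway_ok = math.isclose(warmup_lr(500), 1e-5)
```

With the values from the training script, the rate reaches its peak well before the first checkpoint is saved.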

🔬 Evaluation

Run the evaluation suite:

python evaluate.py --model veah-7b --benchmark all

# Individual benchmarks
python evaluate.py --model veah-7b --benchmark solana_qa
python evaluate.py --model veah-7b --benchmark code_generation
python evaluate.py --model veah-7b --benchmark transaction_decode
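The benchmark table reports scores as accuracy percentages. This README does not describe how `evaluate.py` scores an answer, so the sketch below uses case-insensitive exact match purely as an illustrative assumption. `accuracy_percent` and the sample answers are hypothetical.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Share of predictions that match their reference, as a percentage.

    Assumption: case-insensitive exact match after trimming whitespace.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return round(100.0 * correct / len(references), 1)


# Hypothetical model answers and gold answers
preds = ["Answer A", "answer b ", "answer x"]
golds = ["answer a", "Answer B", "answer c"]
score = accuracy_percent(preds, golds)
print(score)
```

Free-form answers and generated code usually need looser scoring (for example, compiling the output), which this sketch does not attempt.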

🤝 Contributing

We welcome contributions! Areas where help is needed:

Datasets: Transaction logs, program code, documentation
Evaluations: New benchmarks and test cases
Optimizations: Inference speed, memory efficiency
Features: RAG integration, tool use, agent capabilities
Languages: TypeScript/JavaScript SDK, Rust bindings

Please see CONTRIBUTING.md for guidelines.

📜 License

MIT License: free to use, modify, and build upon. Please include attribution to VEAH LLM when deploying derived works.

🔗 Resources

💻 GitHub: github.com/veah-ai/veah-llm
📚 Paper: https://veah.gitbook.io/veah-docs/
🐦 Twitter/X: @veahllm

Built by the Solana community, for the Solana community.
