HiveAI Knowledge Refinery is an AI-powered research system that autonomously discovers authoritative web sources, extracts structured knowledge into a persistent graph, generates polished "Golden Book" reference documents, and publishes them immutably to the Hive blockchain. It then distills this knowledge into LoRA adapters — compressed skill artifacts that encode the ability to produce knowledge rather than merely storing facts.
It operates fully offline with Ollama — no API keys required — while seamlessly scaling to high-performance configurations with async job queues, automatic hardware tuning, and incremental knowledge graph updates. The system features merge cycling (progressive LoRA training), MoLoRA domain routing, a secure code sandbox, and a Decentralized Brain Collective (DBC) for on-chain knowledge sharing.
- URL Discovery — Intelligent source discovery using DuckDuckGo, Brave Search API, and SearXNG with LLM-powered fallback for offline operation
- Mass Web Crawling — Asynchronous browser automation via Playwright with JavaScript rendering, page caching (configurable TTL), and automatic retry logic
- Semantic Chunking — Intelligent content splitting with embedding-based topic boundary detection (or fast semchunk fallback)
- Chain-of-Thought Extraction — Multi-step LLM reasoning: claim identification → atomic facts → confidence assessment → triple generation with full justification
- Triples with Attribution — Subject-predicate-object triples linked to source chunks, crawled pages, and confidence scores
- Entity Resolution — Normalizes aliases (JavaScript→JS, Machine Learning→ML) and deduplicates entities using embedding similarity
- Contradiction Detection — Identifies opposing predicates and numeric conflicts, resolves by confidence score
- Community Detection — Louvain clustering groups related triples into topic communities with LLM-generated summaries for cohesive broad-topic answering
- GraphRAG Integration — Persistent, incremental knowledge graph with full traceability and community-based context compression
- Multi-Step Generation — Automated outline creation → prose writing with outline context → coherence review → conditional rewrite for quality
- Quality Scoring — Five-dimension automatic evaluation with auto-rewrite for low-quality output
- Cross-Book Referencing — Maintains context relationships across multiple generated books
- Semantic Novelty Filtering — Detects and excludes redundant content across existing books
- Multi-Hop RAG — Three-stage retrieval: semantic vector search → entity extraction for second-hop search → cross-encoder reranking
- Semantic Vector Search — HNSW-indexed pgvector (server) or numpy cosine similarity (local) with context-aware embeddings
- Cross-Encoder Reranking — ms-marco-MiniLM-L-6-v2 model dramatically improves result relevance after initial retrieval
- GraphRAG Context — Leverages community summaries for broader topical questions
- Auto-Learn — Detects knowledge gaps and automatically triggers background research jobs, re-answers with new knowledge when available
- Code Self-Verification — Code blocks in chat responses are executed in a secure sandbox before returning to the user, catching runtime errors before they reach you
- MoLoRA Domain Routing — Automatic domain detection routes queries to specialized models (Python, Hive, JavaScript, Rust, etc.) when enabled
- Self-Distillation — 16 prompt templates generate training pairs from the model's own knowledge (standard + O1-style reasoning)
- Claude Opus Distillation — 5,500+ expert-curated training pairs across 996+ batch files covering 80+ technical domains (Python, TypeScript, React, CSS, DevOps, databases, security, ML/AI, system design, compilers, distributed systems, Hive blockchain, and more). Loaded dynamically via `scripts/claude_distill_v2.py`
- Thinking-Trace Curriculum — 1,638 thinking-trace training pairs with `<think>` blocks across 4 cognitive phases: Foundation (434 pairs: debugging, security, performance, architecture, testing, concurrency, data modeling), Advanced (401 pairs: backtracking, adversarial self-testing, analogical reasoning, tradeoffs, uncertainty), Meta-Cognition (403 pairs: reasoning quality, confidence calibration, Socratic method, learning from mistakes, code quality judgment), and Autonomy (400 pairs: training data generation, self-evaluation, curriculum design, meta-learning, autonomous improvement, quality assurance). Target mix: 72% direct-answer + 28% thinking-trace
- Self-Improvement Pipeline — Autonomous "get smarter" loop: self-eval → data generation → QLoRA training → GGUF export → validation. Includes LoRA bank, curriculum meta-learning, error-driven learning, and online distillation (p400-p407 batch series)
- Genetic Expansion — Mutation operators (rephrase, constrain, generalize, error-inject, multi-step) expand top pairs for data diversity
- Quality Scoring v5 — Multi-dimensional scoring with code quality cap (0.35), no-code gate, MIN_CODE_BLOCKS requirement, and tiered dedup (exact/paraphrase/near)
- Merge Cycling — Train LoRA → merge into base → train new LoRA on improved base → repeat. Each cycle bakes specialization deeper into core weights
- MoLoRA (Mixture of LoRA Experts) — Domain-specialized LoRAs with intelligent keyword-based query routing. Each domain gets its own merged Ollama model with graceful fallback to the generalist
- Continuous Improvement Loop — One-command orchestrator (`scripts/improve.py`): train → deploy → eval → hunt weaknesses → generate targeted pairs → prep next cycle. Includes regression rollback gate (>5% score drop flagged), adapter validation, data quality checks, and rich cycle history tracking (per-category scores, dimensions, deltas)
- Weakness Hunter — Analyzes eval results by category, identifies gaps, and auto-generates targeted training pairs using the miner/distiller to close specific weaknesses
- Early Stopping — 5% validation split with EarlyStoppingCallback (patience=3 evals / 150 steps) prevents overtraining and saves hours
- Eval System — 165 coding challenges across 18 categories and 4 dimensions: code correctness (30%), test quality (30%), conceptual depth (20%), explanation (20%). Supports parallel evaluation (`--workers 3` for 3x speed)
- Model Benchmark Harness — 12-task harness across 4 categories (single-shot generation, refactoring, debugging, context retention) measuring tokens/sec, TTFT, VRAM, speculative decoding acceptance rate, and task success. Built-in model comparison mode
- Speculative Decoding — 0.5B draft model (Qwen2.5-Coder-0.5B) for 1.5-2x faster inference via llama-server's `--model-draft` flag
- Multi-Backend Serving — Ollama for standard models, llama-server for LoRA adapters, OpenRouter for cloud models. Automatic routing via `smart_call()`
- Per-Backend Circuit Breakers — Isolated failure tracking per backend (llama-server, Ollama, OpenRouter, embedding) prevents cascading failures
- Connection Pooling — Persistent HTTP sessions with Keep-Alive for all local backend communication
- RAG Query Cache — 30-minute TTL cache for repeated knowledge queries (40-50% of queries are repeats)
- Hive Blockchain Integration — Publishes Golden Books to Hive with configurable multipart splitting for large content
- Tagged Organization — Namespace-based tagging (`archivedcontenthaf`, `hiveaiknowledgehaf`) for discoverability
- Immutable Timestamping — Permanent, verifiable record on distributed ledger with full source attribution
- On-Chain Knowledge Sharing — Nodes propose and vote on training pairs via Hive custom_json operations
- HP-Weighted Consensus — Hive Power-weighted voting prevents sybil attacks; epoch-based timeout ensures liveness
- Pair Encoding — Gzip + base64 encoding fits training pairs into Hive's custom_json size limits
- Secrets Scanner — 10-pattern scanner prevents accidental credential leaks in training data
- RC Hysteresis — Automatically pauses operations when Resource Credits drop below floor, resumes when recovered
- HivePoA Storage — IPFS + GitHub Releases fallback for large adapter files with resume-capable downloads
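The pair-encoding scheme above (gzip plus base64 into a `custom_json` payload) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the size constant are assumptions, not the project's actual API.

```python
import base64
import gzip
import json

# Illustrative cap for this sketch; Hive enforces its own custom_json size limit.
MAX_CUSTOM_JSON_BYTES = 8192

def encode_pair(pair: dict) -> str:
    """Compress a training pair and base64-encode it for a custom_json op."""
    raw = json.dumps(pair, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decode_pair(blob: str) -> dict:
    """Invert encode_pair: base64-decode, gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(base64.b64decode(blob)))

def fits_on_chain(blob: str) -> bool:
    """Check the encoded payload against the assumed size cap."""
    return len(blob) <= MAX_CUSTOM_JSON_BYTES

pair = {
    "prompt": "Explain Python list comprehensions.",
    "response": "A list comprehension builds a list in one expression. " * 40,
}
blob = encode_pair(pair)
assert decode_pair(blob) == pair and fits_on_chain(blob)
```

Gzip pays off because training pairs are prose-heavy and highly compressible; base64 then re-expands the payload by about a third, which is why the size check runs on the encoded string rather than the raw JSON.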
- Ollama Support — Fully offline operation using Ollama with Qwen3, Phi-4, or any compatible model
- Hardware Auto-Detection — Automatic profile selection (low/medium/high) based on CPU and RAM
- Configurable Embedding Models — BAAI/bge-m3, Qwen3-Embedding, or any sentence-transformers model with single-command re-embedding
- Confidence Routing — `smart_call()` classifies query difficulty → routes trivial/simple queries to the fast model, complex ones to the reasoning model. 60-70% of queries use the fast model
- Zero External Dependencies — Optional search APIs (Serper.dev, Brave) but fully functional without them
- Safe Execution — Python and JavaScript code runs in isolated subprocesses with timeouts, resource limits, and restricted imports
- Chat Verification — When enabled (`CHAT_VERIFY_CODE=true`), all code in chat responses is executed before delivery, catching syntax and runtime errors
- Async Background Processing — SQLAlchemy-based queue with automatic worker threads
- Batch Job Creation — Create hundreds of research jobs in a single request
- Pause/Resume — Pause entire queue without losing progress, resume seamlessly
- Per-Job Cancellation — Cancel individual jobs mid-pipeline with cleanup
- Progress Tracking — Real-time status, stage transitions, and error logging per job
A permissionless GPU pooling system where community members share graphics cards to create a collective AI brain. More GPUs = smarter AI. See How GPU Sharing Works.
One-command launch: `python scripts/start_spiritbomb.py`
| Module | What it does |
|---|---|
| `community_coordinator.py` | Polls HivePoA every 15min, groups GPUs into clusters, publishes tier manifests on Hive blockchain |
| `cluster_manager.py` | Geo-aware GPU clustering (<50ms latency groups), Helix-style placement scoring |
| `elastic_moe.py` | Dynamic MoE expert activation — 2/4/8 experts scale with tier |
| `inference_worker.py` | GPU inference contribution mode with `/ping` server, latency probing, heartbeat |
| `distributed_inference.py` | vLLM pipeline/tensor/expert parallel + DualModeRouter (Local vs Cluster) |
| `tier_autoscaler.py` | Automatic tier transitions with hysteresis, EMA smoothing, drain periods |
| `helix_placement.py` | Max-flow GPU layer placement optimizer (ASPLOS 2025 Helix-inspired) |
| `speculative_decoding.py` | EAGLE-3 adaptive draft for 3-6.5x inference speedup |
| `expert_sharding.py` | IPFS-based MoE expert weight distribution with SHA-256 verification |
| `vllm_launcher.py` | vLLM process manager with graceful tier-change restart |
| `distributed_training.py` | Federated LoRA + Hivemind DDP + DisTrO pretraining (1000x bandwidth reduction) |
| `training_worker.py` | Executes training jobs with TOPLOC proof submission |
| `training_verification.py` | TOPLOC cryptographic training contribution verification (INTELLECT-2 inspired) |
| `incentives.py` | HBD reward calculation with tier multipliers, uptime/quality bonuses |
| `reward_settlement.py` | Hourly settlement cycle bridging contributions to HBD payouts |
| `model_registry.py` | Task-aware model selection per tier (code/chat/reasoning/creative) |
| `kv_cache_router.py` | KV-cache affinity routing for multi-turn conversations (llm-d inspired) |
| `lmcache_config.py` | LMCache hierarchical KV cache config (GPU→CPU→disk→Redis) |
| `latency_prober.py` | RTT/bandwidth measurement between community nodes |
| `nccl_benchmark.py` | GPU interconnect benchmark for cluster eligibility + geohash service |
| `benchmark_speculative.py` | EAGLE-3 speculative decoding benchmark harness |
Tier System: <15 GPUs = Tier 1 (local 14B model), 15-39 = Tier 2 (cluster 32B), 40+ = Tier 3 (full 80B MoE brain)
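The tier thresholds map directly to a selection function; a minimal sketch (the function name is illustrative, not the project's API):

```python
def brain_tier(gpu_count: int) -> tuple[int, str]:
    """Map the pooled GPU count to a tier, per the thresholds above."""
    if gpu_count >= 40:
        return 3, "full 80B MoE brain"
    if gpu_count >= 15:
        return 2, "cluster 32B"
    return 1, "local 14B model"
```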
Multi-Machine Setup:

```shell
# Computer A (head):
docker compose -f docker-compose.spiritbomb.yml up

# Computer B (worker — joins via Ray):
RAY_HEAD_ADDRESS=COMPUTER_A_IP:6379 docker compose -f docker-compose.spiritbomb.yml --profile multi-machine up vllm-worker
```

Tests: 51 tests covering autoscaler, clustering, MoE, placement, speculative decoding, KV-cache, latency, training, incentives
- Python 3.11+ — Download from python.org. During install, check "Add Python to PATH".
- Git — Download from git-scm.com
- Ollama — Download from ollama.com. This runs AI models locally on your PC.
```shell
git clone https://github.com/YOUR-USERNAME/hiveai-knowledge-refinery.git
cd hiveai-knowledge-refinery
setup.bat
```

Open a separate terminal and pull the models Ollama will use:

```shell
ollama pull qwen3:14b
ollama pull qwen3:8b
```

These are the sweet-spot defaults (32GB RAM). For 16GB RAM PCs, use smaller models:

```shell
ollama pull qwen3:8b
ollama pull qwen3:4b
```

Then update `.env`:

```
OLLAMA_MODEL_REASONING=qwen3:8b
OLLAMA_MODEL_FAST=qwen3:4b
```

```shell
run.bat
```

Open your browser to http://localhost:5001 — that's it! Fully local, no API keys, no cloud dependencies.
- Python 3.11+
- Git
- Ollama (recommended) or OpenRouter API key
```shell
git clone https://github.com/YOUR-USERNAME/hiveai-knowledge-refinery.git
cd hiveai-knowledge-refinery
chmod +x setup.sh && ./setup.sh
cp .env.example .env
nano .env
./run.sh
```

Open your browser to http://localhost:5001
HiveAI is designed to run fully locally on your own hardware. No cloud accounts, no API keys, no subscriptions.
- Install Ollama
  - Windows: Download from ollama.com/download/windows
  - macOS: `brew install ollama` or download from ollama.com
  - Linux: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull Models (adjust based on your RAM — see sweet spot table below)

  ```shell
  ollama pull qwen3:14b   # Reasoning model — sweet spot (32GB RAM)
  ollama pull qwen3:8b    # Fast model
  ```
- Configure HiveAI (`.env`)

  ```
  DATABASE_URL=sqlite:///hiveai.db
  LLM_BACKEND=ollama
  OLLAMA_BASE_URL=http://localhost:11434
  OLLAMA_MODEL_REASONING=qwen3:14b
  OLLAMA_MODEL_FAST=qwen3:8b
  ```
- Start HiveAI

  ```shell
  python -m hiveai.app
  ```
That's it — fully offline operation with zero API keys, zero cloud dependencies, and all data stays on your machine.
The system uses two model tiers. The reasoning model does the heavy lifting — knowledge extraction (the actual gold mining) and book writing. The fast model handles lighter tasks like outlines, reviews, and summaries.
| RAM | Reasoning Model | Fast Model | Quality | Notes |
|---|---|---|---|---|
| 8 GB | `qwen3:4b` | `qwen3:1.7b` | Basic | Works but slower, less accurate extraction |
| 16 GB | `qwen3:8b` | `qwen3:4b` | Good | Solid quality for most topics |
| 32 GB | `qwen3:14b` | `qwen3:8b` | Sweet spot | Best quality-per-compute ratio (default) |
| 64 GB+ | `qwen3:32b` | `qwen3:14b` | Diminishing returns | Marginally better, significantly slower |
Going beyond 32GB/14b is like panning for gold and getting mostly pebbles — the quality improvement per extra compute drops off sharply.
By default, HiveAI uses the reasoning model for knowledge extraction (`EXTRACTION_QUALITY=high`). This means your strongest model does the actual fact-mining, which is where quality matters most. Set `EXTRACTION_QUALITY=standard` if you prefer faster processing with the lighter model.
Ollama automatically uses your GPU if available. NVIDIA GPUs with 8GB+ VRAM significantly speed up inference. No extra configuration needed — Ollama handles it automatically.
For access to cutting-edge models like Qwen3-30B, Phi-4, Claude, etc. without local GPU:
- Get API Key — Create an account and get a key at openrouter.ai/keys
- Configure HiveAI (`.env`)

  ```
  LLM_BACKEND=openrouter
  AI_INTEGRATIONS_OPENROUTER_API_KEY=sk-or-v1-your-key-here
  AI_INTEGRATIONS_OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
  ```
- Optional Search APIs

  ```
  SERPER_API_KEY=your-key-here
  BRAVE_API_KEY=your-key-here
  ```
HiveAI trains LoRA adapters that compress coding knowledge into lightweight model extensions. The pipeline supports multiple training versions, merge cycling, and domain-specialized models.
LoRA training requires GPU access. On Windows, use WSL2 with an Ubuntu distro:
```shell
# Enter WSL2
wsl -d Ubuntu-24.04

# Activate training environment
source /opt/hiveai-env/bin/activate
cd /opt/hiveai/project

# Prepare training data (loads all batch files + JSONL sources, deduplicates, curriculum-orders)
python scripts/prepare_v5_data.py --export --stats

# Train (Qwen2.5-Coder-14B QLoRA via Unsloth)
python scripts/train_v5.py --no-kl --force-unsloth 2>&1 | tee logs/train_v6.log
```

After training completes, merge the LoRA adapter and export GGUF for llama-server:

```shell
# Deploy: merge LoRA + export GGUF (run in WSL)
python scripts/deploy_v6.py --quant q5_k_m

# Serve with llama-server + speculative decoding (run on Windows)
llama-server --model models/hiveai-v6/*.gguf \
  --model-draft models/qwen2.5-coder-0.5b-instruct-q8_0.gguf \
  --port 11435 --n-gpu-layers 999 --ctx-size 8192 \
  --flash-attn on --cache-type-k q8_0 --cache-type-v q4_0

# Eval
python scripts/run_eval.py --model hiveai-v6 --base-url http://localhost:11435
```

The one-command improvement cycle — train, deploy, eval, hunt weaknesses, prep next round:
```shell
# Full cycle (train → deploy → eval → hunt → prep)
python scripts/improve.py

# Check current loop status
python scripts/improve.py --status

# Skip training (adapter already exists), just deploy + eval + hunt
python scripts/improve.py --skip-train

# Analyze weaknesses from latest eval
python scripts/weakness_hunter.py

# Generate targeted pairs for weak categories
python scripts/weakness_hunter.py --generate --pairs 20
```

Each cycle identifies where the model is weakest and generates targeted training data to close those gaps. The model's weaknesses drive the next round of training, creating a self-improving feedback loop with no ceiling.
Merge LoRA adapters into base weights to compound improvements across cycles:
```shell
# Merge adapter into base (Unsloth path for QLoRA)
python scripts/auto_cycle.py --adapter loras/v6 --deploy --eval

# View merge history
python scripts/auto_cycle.py --history
```

Train domain experts from filtered training data:
```shell
# See available domains
python scripts/train_domain.py --list

# Train Python specialist
python scripts/train_domain.py --domain python --dry-run

# Deploy domain model to Ollama
python scripts/deploy_domain.py --domain python
```

Enable domain-based routing to automatically send queries to specialized models:

```
# In .env
MOLORA_ENABLED=true
MOLORA_DEFAULT_DOMAIN=general
```

When enabled, `smart_call()` classifies each query's domain (Python, Hive, JavaScript, Rust, etc.) and routes it to the matching specialized model. It falls back to the generalist model if a domain model isn't available.
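Keyword-based routing of this kind can be sketched as below. The keyword table and function name are illustrative assumptions; the real router lives in `hiveai/lora/molora.py` and is not reproduced here.

```python
import re

# Illustrative keyword table; the actual MoLoRA domains and keywords differ.
DOMAIN_KEYWORDS = {
    "python": {"python", "asyncio", "pandas", "django", "pip"},
    "javascript": {"javascript", "node", "react", "npm", "typescript"},
    "rust": {"rust", "cargo", "borrow", "lifetime"},
    "hive": {"hive", "custom_json", "dhive", "beem", "hbd"},
}

def route_domain(query: str, default: str = "general") -> str:
    """Score each domain by keyword hits; fall back to the generalist."""
    words = set(re.findall(r"[a-z_0-9]+", query.lower()))
    scores = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

A query with no keyword hits routes to the default domain, which mirrors the graceful-fallback behavior described above.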
Evaluate model quality with 165 coding challenges across 18 categories:
```shell
# Eval against Ollama model
python scripts/run_eval.py --model qwen3:14b

# Eval against llama-server (LoRA adapter)
python scripts/run_eval.py --model hiveai-v6 --base-url http://localhost:11435

# Parallel eval (3x faster)
python scripts/run_eval.py --model hiveai-v6 --base-url http://localhost:11435 --workers 3

# Compare with baseline
python scripts/run_eval.py --model hiveai-v6 --compare qwen3:14b
```

Benchmark coding models across 4 task categories with detailed performance metrics:
```shell
# Benchmark a single model
python -m bench.run_bench --model qwen2.5-coder:14b-q5_K_M

# Head-to-head comparison
python -m bench.run_bench --model qwen3.5:9b --compare qwen2.5-coder:14b-q5_K_M

# Speculative decoding via llama-server
python -m bench.run_bench --model qwen2.5-coder-14b --backend llama-server --draft qwen2.5-coder-1.5b

# Run specific category only
python -m bench.run_bench --model qwen3.5:9b --category single_shot debug
```

Categories: single-shot generation (5 tasks), refactoring (2), debugging (3), context retention (2)
Metrics: score, pass rate, tokens/sec, time-to-first-token, VRAM peak, speculative decoding acceptance rate
Results are saved as timestamped JSON in `bench/results/` for tracking improvements across training cycles.
| Version | Base Model | Status | Pairs | Eval Score | Notes |
|---|---|---|---|---|---|
| v1 | Qwen3-14B | Ready | 1,104 | 0.853 (+15%) | First successful training, proven |
| v1.5 | Qwen3-14B | Cancelled | — | — | Superseded by v2 |
| v2 | Qwen3.5-35B-A3B | Killed | — | — | Superseded by v3 |
| v3 | Qwen3.5-35B-A3B (pruned) | Failed | 2,385 | — | Gate-expert alignment bug |
| v4 | Qwen3.5-35B-A3B (pruned) | Blocked | 2,414 | — | MoE-aware ESFT + KL-anchored SFT |
| v5 | Qwen3.5-9B / Qwen3.5-4B | Failed | — | — | VLM architecture incompatible with QLoRA (loss diverged) |
| v6 | Qwen2.5-Coder-14B | Training | 5,585 | — | QLoRA r=32, seq=4096, loss 0.34-0.57 at step 585/621, 5% val split + early stopping |
| v7 | Qwen2.5-Coder-14B | Prep | 6,698 | — | v6 data + 995 new pairs, ready for next cycle |
The project includes 5,500+ expert-curated training pairs distilled from Claude Opus 4.6, organized across 996+ batch files in `scripts/distill_batches/`. These break into two categories:
Direct-Answer Pairs (~3,900+ pairs): Cover 80+ domains including Python stdlib, async patterns, TypeScript, React, CSS, DevOps, databases, authentication, AI/ML (GRPO, DPO, RAG, multi-agent systems, speculative decoding, federated learning), system design, compilers, distributed systems, security, Hive blockchain (dhive, beem, custom_json, DeFi, NFTs, governance, node setup, indexing), algorithmic thinking (DP, graphs, trees, greedy, geometry), real-world debugging, code review/refactoring, and an autonomous self-improvement pipeline (p400-p407).
Thinking-Trace Pairs (1,638 pairs): A 4-phase cognitive curriculum teaching step-by-step reasoning via `<think>` blocks:
- Phase 1 — Foundation (434 pairs): Systematic debugging, security analysis, performance optimization, architecture design, code review, testing strategy, concurrency/distributed systems, data modeling/DevOps
- Phase 2 — Advanced (401 pairs): Backtracking & dead-end recovery, adversarial self-testing, analogical reasoning, multi-perspective analysis, first-principles reasoning, tradeoff analysis, uncertainty reasoning
- Phase 3 — Meta-Cognition (403 pairs): Reasoning quality evaluation, confidence calibration, Socratic method, context switching, learning from mistakes, code quality judgment
- Phase 4 — Autonomy (400 pairs): Training data generation, self-evaluation, curriculum design, meta-learning, autonomous improvement, quality assurance, integration
Target training mix: 72% direct-answer + 28% thinking-trace. The data bridge (`scripts/prepare_v5_data.py`) loads all sources, deduplicates, quality-filters, enforces the mix ratio, orders thinking pairs by curriculum phase (Foundation first, Autonomy last), interleaves them throughout direct-answer pairs, and exports as `v5.jsonl` ready for `train_v5.py`. See `scripts/distill_batches/THINKING_CURRICULUM.md` for the full curriculum plan.
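The curriculum-ordering and interleaving step described above can be sketched as follows. This is a simplified illustration under assumed field names (a `phase` key on each thinking pair); the real bridge in `scripts/prepare_v5_data.py` also deduplicates and quality-filters, which this sketch omits.

```python
# Illustrative phase labels; the actual curriculum metadata may differ.
PHASE_ORDER = {"foundation": 0, "advanced": 1, "meta-cognition": 2, "autonomy": 3}

def build_mix(direct: list, thinking: list, thinking_ratio: float = 0.28) -> list:
    """Curriculum-order the thinking pairs, then interleave them through the
    direct-answer pairs so the final set hits the target ratio."""
    thinking = sorted(thinking, key=lambda p: PHASE_ORDER[p["phase"]])
    # How many thinking pairs the ratio allows against this many direct pairs.
    n_think = min(len(thinking),
                  round(len(direct) * thinking_ratio / (1 - thinking_ratio)))
    thinking = thinking[:n_think]
    if not thinking:
        return list(direct)
    step = max(1, len(direct) // n_think)
    out, t = [], 0
    for i, pair in enumerate(direct, 1):
        out.append(pair)
        if t < n_think and i % step == 0:
            out.append(thinking[t])
            t += 1
    out.extend(thinking[t:])  # any leftovers go at the end
    return out
```

With 72 direct-answer pairs and a 0.28 ratio, this admits 28 thinking pairs, yielding a 100-pair set with Foundation pairs appearing before Autonomy pairs.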
| Variable | Purpose | Example |
|---|---|---|
| `DATABASE_URL` | Database connection (PostgreSQL or SQLite) | `postgresql://user:pass@localhost:5432/hiveai` or `sqlite:///hiveai.db` |
| Variable | Options | Default |
|---|---|---|
| `LLM_BACKEND` | `auto`, `ollama`, `openrouter` | `auto` |
| `OLLAMA_BASE_URL` | Local Ollama URL | `http://localhost:11434` |
| `OLLAMA_MODEL_REASONING` | Reasoning model name | `qwen3:14b` |
| `OLLAMA_MODEL_FAST` | Fast model name | `qwen3:8b` |
| `EXTRACTION_QUALITY` | `high` or `standard` | `high` |
| `AI_INTEGRATIONS_OPENROUTER_API_KEY` | OpenRouter key | (blank) |
| Variable | Purpose | Default |
|---|---|---|
| `LLAMA_SERVER_BASE_URL` | llama-server endpoint | `http://localhost:11435` |
| `LLAMA_SERVER_MODELS` | Comma-separated model names routed to llama-server | `hiveai-v1,hiveai-v1.5,hiveai-v2,hiveai-v6` |
| `LLAMA_SERVER_MODEL` | Currently active llama-server model | `hiveai-v6` |
| Variable | Purpose | Default |
|---|---|---|
| `MOLORA_ENABLED` | Enable domain-based model routing | `false` |
| `MOLORA_DEFAULT_DOMAIN` | Default domain when no match | `general` |
| Variable | Purpose | Default |
|---|---|---|
| `EMBEDDING_MODEL` | Embedding model | `BAAI/bge-m3` |
| `SEMANTIC_SIMILARITY_THRESHOLD` | Duplicate detection (0.0-1.0) | `0.82` |
| `HNSW_EF_SEARCH` | pgvector search accuracy | `100` |
| `SEMANTIC_CHUNKING` | Embedding-based chunking | `false` |
| Variable | Purpose | Default |
|---|---|---|
| `MIN_TRAINING_QUALITY` | Minimum quality for training pairs | `0.80` |
| `LORA_EXPORT_QUALITY` | Minimum quality for LoRA export | `0.75` |
| `MIN_CODE_BLOCKS` | Minimum code blocks required per pair | `1` |
| `DEDUP_EXACT_THRESHOLD` | Exact duplicate threshold | `0.95` |
| `DEDUP_PARAPHRASE_THRESHOLD` | Paraphrase duplicate threshold | `0.85` |
| `DEDUP_NEAR_THRESHOLD` | Near-duplicate threshold | `0.75` |
| Variable | Purpose | Default |
|---|---|---|
| `CHAT_VERIFY_CODE` | Execute code blocks before returning to user | `true` |
| Variable | Purpose | Default |
|---|---|---|
| `DBC_ENABLED` | Enable on-chain knowledge sharing | `false` |
| `DBC_ACCOUNT` | Hive account for broadcasting | (blank) |
| `DBC_POSTING_KEY` | Hive posting key | (blank) |
| `DBC_MIN_ONCHAIN_QUALITY` | Minimum quality for on-chain pairs | `0.85` |
| `DBC_EPOCH_TIMEOUT_HOURS` | Voting epoch timeout | `24` |
| `DBC_RC_FLOOR_PERCENT` | RC floor for hysteresis pause | `20` |
| `DBC_RC_RESUME_PERCENT` | RC level to resume operations | `50` |
| Variable | Options | Auto-Detection |
|---|---|---|
| `HARDWARE_PROFILE` | `auto`, `low`, `medium`, `high` | ≥12GB RAM + ≥6 CPU → `high`; ≥6GB RAM + ≥3 CPU → `medium`; else → `low` |
| Variable | Purpose | Low | Medium | High |
|---|---|---|---|---|
| `CRAWL_WORKERS` | Concurrent crawl threads | 2 | 4 | 8 |
| `LLM_WORKERS` | Concurrent extraction threads | 1 | 2 | 4 |
| `EMBEDDING_BATCH_SIZE` | Batch size for embeddings | 8 | 16 | 32 |
| `MAX_CRAWL_PAGES` | URLs per research job | 5 | 10 | 20 |
| `DB_POOL_SIZE` | DB connection pool | 3 | 5 | 8 |
| Variable | Purpose | Get Key |
|---|---|---|
| `SERPER_API_KEY` | Real web search URL discovery | serper.dev |
| `BRAVE_API_KEY` | Brave Search (2000/month free) | brave.com/search/api |
| Variable | Purpose | Default |
|---|---|---|
| `CRAWL_CACHE_TTL_HOURS` | Cached page expiry | 168 (7 days) |
| `MAX_RAW_CONTENT_SIZE` | Max chars per page | 100000 |
| `MAX_CHUNK_TEXT_FOR_LLM` | Max chars per LLM batch | 15000 |
See `.env.example` for the complete reference.
- CPU: 4+ cores
- RAM: 16 GB (8 GB minimum)
- GPU: Optional — NVIDIA with 8GB+ VRAM for faster LLM inference
- Storage: 10 GB (+ model weights)
- Database: SQLite (automatic, no setup)
- LLM: Ollama with qwen3:8b
- Profile: `medium` or `high` (auto-detected)
- GPU: NVIDIA with 16GB+ VRAM (RTX 4070 Ti SUPER or better)
- RAM: 32GB+ system RAM
- WSL2: Ubuntu 24.04 with CUDA support
- Python: 3.11+ with PyTorch, transformers, peft, trl, bitsandbytes
- Storage: 50GB+ for model weights and training data
- CPU: 2+ vCPU
- RAM: 2+ GB
- Storage: 50+ GB
- Database: PostgreSQL 13+ with pgvector
- LLM: OpenRouter API
- Profile: `medium` or `high` (auto-detected)
| Profile | CPU | RAM | Crawl Workers | LLM Workers |
|---|---|---|---|---|
| `low` | 1-2 | < 6 GB | 2 | 1 |
| `medium` | 3-5 | 6-12 GB | 4 | 2 |
| `high` | 6+ | 12+ GB | 8 | 4 |
Override with the `HARDWARE_PROFILE` environment variable.
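The auto-detection rule above amounts to two threshold checks plus the override; a minimal sketch with illustrative names (the actual logic lives in `hiveai/hardware.py` and is not reproduced here):

```python
import os

def detect_profile(ram_gb: float, cpu_count: int) -> str:
    """Apply the RAM/CPU thresholds from the table above."""
    if ram_gb >= 12 and cpu_count >= 6:
        return "high"
    if ram_gb >= 6 and cpu_count >= 3:
        return "medium"
    return "low"

def resolve_profile(ram_gb: float, cpu_count: int) -> str:
    """Honor an explicit HARDWARE_PROFILE override, else auto-detect."""
    override = os.environ.get("HARDWARE_PROFILE", "auto")
    if override in ("low", "medium", "high"):
        return override
    return detect_profile(ram_gb, cpu_count)
```

Note that both conditions must hold for a tier: a machine with 16 GB of RAM but only 2 cores still resolves to `low` under these rules.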
| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/api/jobs` | Create single research job |
| POST | `/api/jobs/batch` | Batch create multiple research jobs |
| GET | `/api/jobs` | List all jobs (last 50) |
| GET | `/api/jobs/<id>` | Get job details and stats |
| GET | `/api/jobs/<id>/stream` | Stream job progress (chunks, triples, book) |
| POST | `/api/jobs/<id>/cancel` | Cancel a running job |
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/api/queue/status` | Queue status and worker health |
| POST | `/api/queue/pause` | Pause all processing |
| POST | `/api/queue/resume` | Resume all processing |
| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/api/chat` | Chat with knowledge base (multi-hop RAG) |
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/api/graph` | Global knowledge graph statistics |
| POST | `/api/graph/rebuild` | Rebuild communities and summaries |
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/api/hardware` | Hardware profile and detected resources |
| GET | `/api/llm-status` | Active LLM backend, Ollama availability |
| GET | `/api/embedding-status` | Current embedding model, mismatch detection |
| POST | `/api/reembed` | Re-embed all content after model switch |
| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/api/books/<id>/publish` | Publish Golden Book to Hive |
| POST | `/api/books/<id>/rewrite` | Force quality rewrite |
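Jobs can be created against the API programmatically. The sketch below builds a request with the standard library; note that the `topic` field name is an assumption about the payload schema, not something this README confirms.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default dashboard address from the quick start

def create_job_request(topic: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request.
    The {"topic": ...} body is an assumed schema for illustration."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = create_job_request("Rust async runtimes")
# Against a running instance, urllib.request.urlopen(req) would submit the job.
```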
URL Discovery
↓ (DuckDuckGo, Brave, SearXNG, LLM)
Mass Web Crawl
↓ (Playwright with caching)
Chunk & Clean
↓ (Semantic or fast chunking)
Triple Extraction
↓ (Chain-of-thought LLM reasoning)
Entity Resolution & Contradiction Detection
↓
Knowledge Graph
↓
Golden Book Generation
↓ (Outline → Write → Review → Rewrite)
Quality Scoring
↓
Training Pair Generation
↓ (Self-distillation + genetic expansion)
LoRA Training
↓ (Merge cycling: train → merge → repeat)
Model Deployment
↓ (Ollama / llama-server)
Blockchain Publishing
↓ (Hive + IPFS for LoRA weights)
Immutable Record
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | Flask | REST API and dashboard UI |
| Database | SQLite or PostgreSQL + pgvector | Triples, embeddings, content storage |
| LLM | Ollama / llama-server / OpenRouter | Reasoning, extraction, writing, chat |
| Embeddings | sentence-transformers | Vector search, semantic similarity |
| Web Scraping | crawl4ai (Playwright) | JavaScript-aware page extraction |
| Graph Analysis | NetworkX, Louvain | Community detection and summaries |
| Job Queue | SQLAlchemy + threading | Async pipeline orchestration |
| LoRA Training | Unsloth, PEFT, transformers, trl | QLoRA fine-tuning with merge cycling |
| Code Sandbox | subprocess (isolated) | Safe Python/JS execution |
| Blockchain | Hive RPC (beem) | Immutable publishing + DBC |
| Domain Router | MoLoRA | Keyword-based query→model routing (Ollama + llama-server) |
| Layer | Technologies |
|---|---|
| Language | Python 3.11+ |
| Web Framework | Flask |
| Database | SQLite (local) or PostgreSQL 13+ with pgvector (production) |
| LLM Backends | Ollama, llama-server (LoRA), OpenRouter (Claude, Qwen3, Phi-4) |
| Embeddings | sentence-transformers (BAAI/bge-m3, Qwen3-Embedding) |
| Web Scraping | crawl4ai (Playwright) |
| Graph Computing | NetworkX, Louvain (community detection) |
| Chunking | semchunk (fast) or embedding-based (precise) |
| Structured LLM | instructor (JSON schema validation) |
| Vector Search | pgvector HNSW (PostgreSQL) or numpy cosine (SQLite) |
| LoRA Training | PEFT, transformers, trl, bitsandbytes (WSL2 + CUDA) |
| Job Queue | SQLAlchemy ORM + threading |
| Blockchain | Hive RPC nodes (beem, direct RPC, lighthive) |
| Model Serving | Ollama, llama.cpp (llama-server) |
hiveai-knowledge-refinery/
├── hiveai/
│ ├── app.py # Flask web server + dashboard
│ ├── config.py # Configuration, hardware detection, quality gates
│ ├── hardware.py # Hardware profile auto-detection
│ ├── models.py # SQLAlchemy ORM (TrainingPair, LoraVersion, ChatFeedback, etc.)
│ ├── chat.py # Multi-hop RAG chat pipeline with auto-learn
│ ├── sandbox.py # Secure code execution sandbox (Python + JS)
│ ├── vectorstore.py # Vector search (pgvector / numpy cosine)
│ ├── llm/
│ │ ├── client.py # LLM routing (Ollama/llama-server/OpenRouter), smart_call(), MoLoRA
│ │ └── prompts.py # System prompts, Golden Book templates, CODING_SYSTEM_PROMPT
│ ├── lora/
│ │ ├── distiller.py # 16 prompt templates, scorer v5, self-refinement, genetic expansion
│ │ ├── trainer.py # LoRA training wrapper (r=32 standard / r=8 micro, DoRA)
│ │ ├── adapter_manager.py # Runtime multi-LoRA hot-swap
│ │ ├── merge_cycle.py # Merge cycling orchestrator (train → merge → repeat)
│ │ ├── molora.py # MoLoRA domain router (7 domains, keyword scoring)
│ │ ├── brain_export.py # IPFS pinning + Hive LoRA publication
│ │ ├── exporter.py # Golden Books → JSONL training pairs
│ │ ├── dedup.py # Embedding-based deduplication gate
│ │ ├── benchmark.py # Held-out evaluation harness
│ │ └── miner.py # Multi-source training pair miner
│ ├── dbc/
│ │ ├── chain.py # Hive blockchain abstraction, pair encoding, protocol logic
│ │ ├── node.py # DBC node daemon with MockChain for testing
│ │ └── hivepoa.py # IPFS/GitHub adapter storage client
│ ├── pipeline/
│ │ ├── url_discovery.py # Source discovery (DuckDuckGo, Brave, SearXNG)
│ │ ├── crawler.py # Web crawling with caching
│ │ ├── cleaner.py # Text cleaning and normalization
│ │ ├── reasoner.py # Chain-of-thought triple extraction
│ │ ├── entity_resolver.py # Alias normalization and deduplication
│ │ ├── contradiction.py # Contradiction detection
│ │ ├── communities.py # Louvain clustering and summaries
│ │ ├── writer.py # Golden Book generation pipeline
│ │ ├── scorer.py # Quality scoring (v5)
│ │ ├── publisher.py # Hive blockchain publishing
│ │ ├── queue_worker.py # Async job orchestration
│ │ └── orchestrator.py # High-level pipeline coordinator
│ ├── storage/
│ │ └── graph_store.py # Knowledge graph persistence
│ ├── hive_templates/ # Ready-to-use Hive app scaffolding
│ │ ├── social_app/ # Blog/social app (JS + Python)
│ │ ├── voting_bot/ # Automated curation bot
│ │ ├── token_app/ # Hive Engine token operations
│ │ ├── haf_indexer/ # HAF blockchain indexer
│ │ └── keychain_auth/ # Hive Keychain authentication
│ └── templates/ # Jinja2 HTML templates
│ ├── dashboard.html # System dashboard with merge cycle stats
│ ├── chat.html # Chat UI with domain badges
│ ├── forge.html # Training management
│ ├── eval.html # Eval results viewer
│ ├── graph_explorer.html
│ ├── job_detail.html
│ └── book_review.html
├── scripts/
│ ├── improve.py # One-command improvement loop (train→deploy→eval→hunt→prep)
│ ├── weakness_hunter.py # Eval-driven weakness analysis + targeted pair generation
│ ├── run_eval.py # 165-challenge eval harness (18 categories, 4 dimensions, parallel --workers)
│ ├── train_v5.py # QLoRA training (Qwen2.5-Coder-14B via Unsloth)
│ ├── deploy_v6.py # Post-training deployment (Unsloth merge → GGUF → llama-server)
│ ├── deploy_v5.py # Legacy deployment (PEFT merge → GGUF → Ollama)
│ ├── auto_cycle.py # Merge cycling CLI (supports both Unsloth and PEFT backends)
│ ├── validate_training.py # Training monitor (log parsing + checkpoint generation test)
│ ├── prepare_v5_data.py # Training data bridge (batch files + JSONL → curriculum-ordered JSONL)
│ ├── calibrate_eval.py # Eval anchor calibration
│ ├── claude_distill.py # Claude distillation pipeline (v1)
│ ├── claude_distill_v2.py # Claude Opus distillation v2 (batch loader for all pairs)
│ ├── distill_batches/ # 996+ batch files: 3,900+ direct-answer + 1,638 thinking-trace pairs
│ ├── fix_broken_batches.py # Batch file repair utility (triple-quote conflict resolution)
│ ├── distill_multilang.py # Multi-language distillation support
│ ├── mine_hive_knowledge.py # Hive-specific pair mining
│ ├── distill_supervisor.py # Continuous distillation manager
│ └── setup_check.py # First-run environment validator
├── bench/
│ ├── run_bench.py # Model benchmark CLI (compare models, track regressions)
│ ├── runner.py # Ollama + llama-server backends, VRAM monitoring, spec decode
│ ├── tasks.py # 12 tasks across 4 categories with automatic evaluators
│ └── results/ # Timestamped JSON benchmark results
├── evals/
│ └── anchors/ # 18 domain-specific eval anchor sets
├── loras/
│ ├── training_data/ # JSONL training datasets (v6: 5,585 pairs, v7: 6,698 pairs)
│ │ └── weakness_patches/ # Auto-generated targeted pairs from weakness_hunter.py
│ ├── v1/ # LoRA v1 adapter (Qwen3-14B) — proven, eval 0.853
│ ├── v6/ # LoRA v6 adapter (Qwen2.5-Coder-14B) — training
│ └── domains/ # Domain-specialized adapters (MoLoRA)
├── models/ # Local model weights
├── Dockerfile # Multi-stage container build
├── docker-compose.yml # Full stack (app + Ollama + GPU)
├── BLUEPRINT.md # LoRA pipeline architecture plan
├── .env.example # Configuration template
├── pyproject.toml # Python project metadata
├── requirements.txt # Python dependencies
├── LICENSE # MIT License
└── README.md # This file
```bash
# Submit a single refinement job
curl -X POST http://localhost:5001/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"topic": "Quantum Machine Learning 2025"}'
```

```bash
# Submit a batch of topics
curl -X POST http://localhost:5001/api/jobs/batch \
  -H "Content-Type: application/json" \
  -d '{
    "topics": [
      "Artificial General Intelligence",
      "Quantum Computing",
      "Biological Neural Networks"
    ]
  }'
```

```bash
# Check queue status
curl http://localhost:5001/api/queue/status
```

```bash
# Chat with the knowledge base
curl -X POST http://localhost:5001/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What are the latest advances in quantum computing?",
    "history": []
  }'
```

```bash
# Hardware profile
curl http://localhost:5001/api/hardware

# LLM backend
curl http://localhost:5001/api/llm-status

# Embedding model
curl http://localhost:5001/api/embedding-status
```

```bash
# Windows
run.bat
```

```bash
# Linux/macOS
./run.sh
```

```bash
# Production mode
export PRODUCTION=1
export WEB_WORKERS=4
python -m hiveai.app
```

```bash
# Docker
docker-compose up -d
```

HiveAI supports both PostgreSQL and SQLite:
SQLite is ideal for local development and desktop deployments, with no server setup needed:

```bash
# Simply set the DATABASE_URL environment variable
export DATABASE_URL=sqlite:///hiveai.db
python -m hiveai.app
```

The database file (`hiveai.db`) is created automatically in the current directory.
For production deployments, use PostgreSQL with the pgvector extension:

```bash
# Create database
createdb hiveai

# Enable pgvector
psql -d hiveai -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

Set `DATABASE_URL` to your PostgreSQL connection string:

- Local: `postgresql://user:pass@localhost:5432/hiveai`
- Neon: `postgresql://user:pass@ep-xxx.us-east-2.aws.neon.tech/hiveai?sslmode=require`
- Supabase: `postgresql://postgres:pass@db.xxx.supabase.co:5432/postgres`
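Because both backends sit behind SQLAlchemy, switching databases is just a URL change. A minimal sketch, assuming only that `DATABASE_URL` is honored as described above:

```python
import os
from sqlalchemy import create_engine, text

# Fall back to local SQLite when DATABASE_URL is unset (mirrors the setup above)
url = os.environ.get("DATABASE_URL", "sqlite:///hiveai.db")
engine = create_engine(url)

with engine.connect() as conn:
    # The same query runs identically against SQLite and PostgreSQL/pgvector
    result = conn.execute(text("SELECT 1")).scalar()
print(result)  # → 1
```

Point `DATABASE_URL` at any of the PostgreSQL connection strings above and the same code runs unchanged.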
MIT License — see LICENSE file for details.
We welcome contributions! Please see CONTRIBUTING.md for guidelines on:
- Setting up development environment
- Code style and conventions
- Testing and validation
- Submitting pull requests
- Community standards
The refinery: raw web pages → extracted triples → knowledge graph → Golden Books → Hive blockchain.
- Web crawl → triple extraction → knowledge graph
- Golden Book generation with quality scoring
- Hive blockchain publishing with multipart splitting
- Multi-hop RAG chat ("Ask the Keeper")
- Secure code sandbox with chat self-verification
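Every extracted triple carries attribution back to its source chunk and a confidence score. A minimal sketch of that shape (field names here are illustrative assumptions, not the actual `models.py` schema):

```python
from dataclasses import dataclass

@dataclass
class Triple:
    """One subject-predicate-object fact with attribution (illustrative shape)."""
    subject: str
    predicate: str
    obj: str
    confidence: float   # 0.0-1.0, assigned during chain-of-thought extraction
    source_chunk: str   # ID of the crawled chunk the claim came from

t = Triple("Qubit", "is_unit_of", "quantum information", 0.92, "chunk_0017")
assert 0.0 <= t.confidence <= 1.0
```

Keeping the chunk ID on each triple is what makes the graph fully traceable back to crawled pages.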
The trainer: accumulated knowledge → LoRA adapters that encode the ability to produce knowledge.
- Self-distillation (16 templates, genetic expansion, scorer v5)
- LoRA v1 training + eval harness (115 challenges, +15% over baseline)
- Multi-backend LLM routing (Ollama + llama-server + OpenRouter)
- Confidence routing via `smart_call()` (60-70% fast model usage)
- Merge cycling (train → merge → repeat)
- MoLoRA domain routing (Python, Hive, JS, Rust, C++, Go)
- Claude Opus distillation corpus (4,500+ pairs across 560+ batch files, 80+ domains)
- Thinking-trace curriculum (1,638 pairs, 4-phase cognitive training: foundation → advanced → meta → autonomy)
- Self-improvement pipeline (p400-p407: autonomous "get smarter" training loop)
- Model benchmark harness (12 tasks, 4 categories, speculative decoding metrics)
- Training data bridge (prepare_v5_data.py: 6,698 curriculum-ordered pairs for Qwen 2.5 Coder)
- LoRA v6 training (Qwen2.5-Coder-14B, QLoRA r=32, seq=4096, loss 0.31 — in progress)
- Continuous improvement loop (improve.py: train → deploy → eval → hunt → prep → repeat)
- Weakness hunter (weakness_hunter.py: eval-driven targeted pair generation)
- Unsloth/llama-server merge cycling backend (merge_cycle.py updated)
- LoRA v6 eval + v7 training cycle
- Domain specialist training (per-domain LoRAs)
- GRPO+ reinforcement learning via rLLM
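The MoLoRA router's keyword scoring can be illustrated with a toy version. The keyword lists and domain subset below are hypothetical, not the actual `molora.py` tables:

```python
# Hypothetical keyword tables for a subset of the routed domains
DOMAIN_KEYWORDS = {
    "python": ["python", "pip", "asyncio", "django"],
    "hive": ["hive", "hbd", "keychain", "custom_json"],
    "js": ["javascript", "node", "npm", "react"],
    "rust": ["rust", "cargo", "borrow"],
}

def route_domain(message: str, default: str = "general") -> str:
    """Score the message against each domain's keywords; pick the best match."""
    text = message.lower()
    scores = {d: sum(text.count(k) for k in kws) for d, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(route_domain("How do I broadcast a custom_json op on Hive?"))  # → hive
```

The winning domain then selects which specialized adapter is hot-swapped in at inference time.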
Decentralized Brain Collective: knowledge and skills stored on Hive, reconstructible by anyone.
- DBC protocol (on-chain pair proposals, HP-weighted voting, epoch consensus)
- 3-backend chain abstraction (beem → direct RPC → lighthive)
- Pair encoding (gzip + base64), secrets scanner, RC hysteresis
- HivePoA adapter storage (IPFS + GitHub Releases fallback)
- Live deployment and testing on Hive testnet
- Multi-node federation and peer discovery
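The pair encoding described above (gzip + base64) round-trips like this. A minimal sketch, not the actual `chain.py` wire format:

```python
import base64
import gzip
import json

def encode_pair(pair: dict) -> str:
    """Compress a training pair and encode it for text-safe on-chain storage."""
    raw = json.dumps(pair, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decode_pair(blob: str) -> dict:
    """Reverse the encoding: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(blob)))

pair = {"prompt": "What is Hive?", "completion": "A delegated proof-of-stake blockchain."}
blob = encode_pair(pair)
assert decode_pair(blob) == pair
```

Compressing before encoding matters on-chain, since block space is metered; base64 then keeps the payload safe inside a JSON operation.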
- Documentation — docs/
- Issues — GitHub Issues
- Chat — Use the built-in "Ask the Keeper" chat interface
- Config Help — Run `python -m hiveai.app --help` for options
Built with:
- crawl4ai — Web scraping
- sentence-transformers — Embeddings
- instructor — Structured LLM outputs
- NetworkX — Graph analysis
- PEFT — Parameter-efficient fine-tuning
- Hive Blockchain — Decentralized publishing
- Ollama — Local LLM serving
- llama.cpp — GGUF model serving