
Distributed AI Ensemble (Local Multi-Agent LLM Network)


A fully local distributed AI research system with orchestration, ensemble inference, reproducible benchmarking, statistical testing, visual analytics, and publication-ready artifacts.


Overview

This repository implements a complete local distributed AI network for research.

The workflow is simple:

  1. A user query goes to one orchestrator node.
  2. The orchestrator sends that query to multiple local model agents.
  3. Each agent runs a different LLM and returns its answer along with latency and token metadata.
  4. The orchestrator aggregates the responses using switchable ensemble strategies.
  5. All metrics are logged and exported into publication-ready outputs.
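
In code, the fan-out in step 2 can be a single concurrent gather over the agent list. A minimal sketch, assuming httpx and a list of {host, model} agent dicts (all names here are illustrative, not the orchestrator's actual internals):

import asyncio
import httpx

async def fan_out(prompt: str, agents: list[dict]) -> list[dict]:
    # Send the same prompt to every agent concurrently and collect replies.
    async with httpx.AsyncClient(timeout=120.0) as client:
        async def ask(agent: dict) -> dict:
            r = await client.post(
                f"http://{agent['host']}:11434/api/generate",
                json={"model": agent["model"], "prompt": prompt, "stream": False},
            )
            r.raise_for_status()
            return r.json()
        return await asyncio.gather(*(ask(a) for a in agents))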

This project is designed for reproducibility and IEEE-style reporting.


Research Video (Graphical Walkthrough)

For a visual explanation of the full research pipeline, watch:

  • Distributed AI Ensemble Research Video: https://youtu.be/Zybs0Omg600


Authors

  • Md Anisur Rahman Chowdhury¹*, Kefei Wang¹
  • ¹ Dept. of Computer and Information Science, Gannon University, USA
  • Emails: engr.aanis@gmail.com, wang039@gannon.edu

What We Built and Why

1) Distributed inference cluster

  • Why: test multi-model collaboration instead of single-model output.
  • Built: one orchestrator VM + four model-agent VMs on a private LAN.

2) Modular ensemble strategies

  • Why: compare aggregation algorithms under the same benchmark pipeline.
  • Built: Majority, Weighted, ISP, Topic Routing, Debate.

3) Reproducible benchmark framework

  • Why: produce repeatable, statistically valid evaluation.
  • Built: MMLU, GSM8K, TruthfulQA runners with deterministic controls.

4) Publication artifact generation

  • Why: directly support conference submission workflow.
  • Built: CSV/JSON exports, PNG plots, IEEE LaTeX tables, IEEE paper source, Overleaf package, and PDF.

5) GitHub-hosted visual interface

  • Why: make results easy to explore and present.
  • Built: docs/ web site + interactive dashboard with charts.

System Architecture

VM mapping

  • Orchestrator: 172.16.185.223
  • Agent 1: 172.16.185.209 (llama3.2:3b)
  • Agent 2: 172.16.185.218 (qwen2.5:3b)
  • Agent 3: 172.16.185.220 (phi3:mini)
  • Agent 4: 172.16.185.222 (gemma2:2b)

Logical flow (ASCII)

User Query
   |
   v
Orchestrator (FastAPI)
   |---> Agent-1 (Ollama)
   |---> Agent-2 (Ollama)
   |---> Agent-3 (Ollama)
   |---> Agent-4 (Ollama)
   v
Aggregation + Metrics + Statistical Testing + Artifact Export

Interactive Architecture (Mermaid)

flowchart TD
    U[User Query] --> O[Orchestrator FastAPI]
    O --> A1[Agent 1 llama3.2:3b]
    O --> A2[Agent 2 qwen2.5:3b]
    O --> A3[Agent 3 phi3:mini]
    O --> A4[Agent 4 gemma2:2b]
    A1 --> G[Aggregation + Metrics + Statistics]
    A2 --> G
    A3 --> G
    A4 --> G
    G --> R[Final Response + Artifacts]

Interactive public version:

  • https://anis151993.github.io/Distributed-AI/#architecture

Orchestrator endpoints

  • GET /health
  • GET /agents
  • POST /query
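
A minimal sketch of this surface in FastAPI (handler bodies are stubs for illustration; the real orchestrator/main.py differs):

from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health() -> dict:
    # Liveness probe for the orchestrator itself.
    return {"status": "ok"}

@app.get("/agents")
def agents() -> dict:
    # Reports reachability of each configured agent (stubbed here).
    return {"agents": [], "healthy": 0}

@app.post("/query")
def query(body: dict) -> dict:
    # Fans the prompt out to the agents and returns the aggregated answer.
    return {"strategy": body.get("strategy", "majority"), "aggregate": {}}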

Agent Models

Each agent exposes the Ollama REST API on port 11434 and returns:

  • response
  • latency_ms
  • token_count
  • model_id
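
A hedged sketch of how one agent reply can be normalized into these fields, assuming the agents are plain Ollama servers (Ollama's /api/generate reports eval_count and total_duration, the latter in nanoseconds; the wrapper name is hypothetical):

import requests

def query_agent(host: str, model: str, prompt: str) -> dict:
    # Hypothetical wrapper: call one Ollama agent and normalize its reply.
    r = requests.post(
        f"http://{host}:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120.0,
    )
    r.raise_for_status()
    data = r.json()
    return {
        "response": data["response"],
        "latency_ms": data["total_duration"] / 1e6,  # Ollama reports nanoseconds
        "token_count": data.get("eval_count", 0),
        "model_id": model,
    }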

Configured model set in this run:

  • llama3.2:3b
  • qwen2.5:3b
  • phi3:mini
  • gemma2:2b

You can swap in other models (CPU/GPU permitting) through agents/agent_config.yaml.


Aggregation Strategies

Implemented and switchable at runtime:

  • majority
  • weighted
  • isp
  • topic
  • debate

All strategies are modular in orchestrator code:

  • orchestrator/aggregator.py
  • orchestrator/router.py
  • orchestrator/debate.py
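
To make the simplest strategy concrete, a minimal majority-vote sketch over normalized answers (illustrative only, not the exact code in orchestrator/aggregator.py):

from collections import Counter

def majority_vote(responses: list[dict]) -> dict:
    # Most common normalized answer wins; ties fall to the first seen.
    answers = [r["response"].strip().lower() for r in responses]
    winner, votes = Counter(answers).most_common(1)[0]
    return {"answer": winner, "agreement_rate": votes / len(answers)}

The agreement_rate field mirrors the one visible in the live-validation snapshot later in this README.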

Runtime Optimizations Added

To reduce bottlenecks in CPU-constrained local settings:

  • Shared direct-strategy fan-out (single agent call reused by majority/weighted/ISP/topic)
  • Debate early-stop when round-1 agreement is complete
  • Conservative token budgets and timeout tuning for local hardware
  • Better answer normalization for benchmark parsing stability
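
The debate early-stop, for example, reduces to a consensus check after round one; a sketch under assumed names (the real logic lives in orchestrator/debate.py):

def should_stop_after_round_one(answers: list[str]) -> bool:
    # Skip later debate rounds when every agent already agrees.
    return len({a.strip().lower() for a in answers}) == 1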

Evaluation and Statistics

Benchmarks

  • MMLU
  • GSM8K
  • TruthfulQA

Metrics

  • Accuracy
  • F1
  • Mean latency
  • Latency std
  • Agreement rate
  • CPU / GPU usage

Statistical tests

  • Paired t-test
  • Wilcoxon signed-rank
  • Confidence intervals
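
A minimal sketch of the paired tests with SciPy, using placeholder latencies purely for illustration (the benchmark runner's actual inputs and code may differ):

import numpy as np
from scipy import stats

# Placeholder per-item latencies (ms) for two strategies on the same items.
majority_lat = np.array([820.5, 910.2, 770.9, 1005.3, 864.1, 930.7])
debate_lat = np.array([1490.2, 1610.8, 1385.4, 1702.6, 1550.0, 1498.3])

t_stat, t_p = stats.ttest_rel(majority_lat, debate_lat)  # paired t-test
w_stat, w_p = stats.wilcoxon(majority_lat, debate_lat)   # Wilcoxon signed-rank

# 95% confidence interval for the mean paired difference.
diff = debate_lat - majority_lat
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))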

Generated Outputs

Outputs are generated as:

  • CSV
  • JSON
  • PNG plots
  • IEEE LaTeX tables
  • Protected manuscript package (.tar.gpg)
  • Decryption guide (.txt)

Primary artifact folders:

  • artifacts/benchmark_runs/run_20260226_193331/
  • artifacts/optimization_runs/

Key files:

  • overall_summary.csv
  • overall_summary.json
  • overall_significance.csv
  • paper/tables/aggregate_strategy_results.tex
  • paper/tables/per_benchmark_results.tex
  • paper/tables/significance_highlights.tex
  • paper/IEEE_Distributed_AI_Ensemble_Protected.tar.gpg
  • paper/PAPER_ACCESS_INSTRUCTIONS.txt

Visual Results

Figures: accuracy bar chart, latency line chart, accuracy pie chart, progress line chart, and optimization-progress plot (PNG files are included in the generated artifacts).

Web Display (GitHub Pages)

Web UI files:

  • docs/index.html (public research portal)
  • docs/interactive_dashboard.html (interactive charts)
  • docs/styles.css
  • docs/script.js
  • docs/assets/data/report_data.js (embedded chart data)
  • docs/assets/paper/IEEE_Distributed_AI_Ensemble_Protected.tar.gpg (encrypted paper package)
  • docs/assets/paper/PAPER_ACCESS_INSTRUCTIONS.txt (public decryption guide)
  • docs/assets/results/*.csv|*.json (public downloadable benchmark outputs)

Public portal features:

  • Interactive architecture map (click nodes + flow simulation)
  • Research video section with embedded YouTube walkthrough
  • Live AI playground (/query API caller with strategy controls)
  • Individual model selection (choose any subset of agents per query)
  • Per-query visual analytics (per-model latency/tokens and answer distribution charts)
  • Password-gated protected paper package access
  • Direct links to project repository and GitHub profile

Enable on GitHub:

  1. Open repository Settings.
  2. Go to Pages.
  3. Source: main branch, folder /docs.
  4. Save.

Local preview:

cd docs
python3 -m http.server 8080
# open http://localhost:8080

Public URL:

  • https://anis151993.github.io/Distributed-AI/

Public playground connection (required once)

If you want browser visitors to run live queries from GitHub Pages, set CORS on the orchestrator:

export CORS_ALLOW_ORIGINS="https://anis151993.github.io,http://localhost:8080,http://127.0.0.1:8080"
uvicorn orchestrator.main:app --host 0.0.0.0 --port 8000

Then, in the website playground, set your public orchestrator endpoint (for example, your tunnel URL) and run a query.
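
Internally this is presumably wired through FastAPI's CORSMiddleware; a minimal sketch of that pattern (the env-var parsing shown is an assumption, not orchestrator/main.py verbatim):

import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
origins = [o.strip() for o in os.getenv("CORS_ALLOW_ORIGINS", "").split(",") if o.strip()]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,  # e.g. the GitHub Pages origin set above
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)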


Quick Start

1) Health check

curl -s http://127.0.0.1:8000/health | jq .
curl -s http://127.0.0.1:8000/agents | jq .

2) Test one query

curl -s -X POST http://127.0.0.1:8000/query \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "What is 2+2? Return only the final numeric answer.",
    "strategy": "majority",
    "seed": 42,
    "temperature": 0.0,
    "deterministic": true,
    "max_tokens": 24
  }' | jq .

3) Smoke benchmark

python run_experiments.py \
  --orchestrator-url http://127.0.0.1:8000 \
  --benchmarks mmlu,gsm8k,truthfulqa \
  --strategies majority,weighted,isp,topic,debate \
  --repetitions 1 \
  --samples-per-benchmark 2 \
  --seed 42 \
  --deterministic \
  --max-agents 2

4) Full benchmark

python run_experiments.py \
  --orchestrator-url http://127.0.0.1:8000 \
  --benchmarks mmlu,gsm8k,truthfulqa \
  --strategies majority,weighted,isp,topic,debate \
  --repetitions 5 \
  --samples-per-benchmark 20 \
  --seed 42 \
  --deterministic

Live Validation Passed (February 27, 2026)

Validated from the public GitHub Pages portal and public orchestrator endpoint.

  • Public site: https://anis151993.github.io/Distributed-AI/
  • Public orchestrator endpoint: https://ai.marcbd.site
  • CORS status: verified for https://anis151993.github.io
  • Agent health: 4/4 healthy
  • Playground majority query: passed
  • Returned answer: 4
  • Query timestamp (UTC): 2026-02-27T06:47:49.561498+00:00

Evidence snapshot (from public Playground response):

{
  "strategy": "majority",
  "aggregate": {
    "answer": "4",
    "agreement_rate": 1
  },
  "agent_count": 4,
  "total_latency_ms": 33847.143
}

Reproducibility Controls

  • Fixed random seed support (--seed)
  • Deterministic mode (--deterministic)
  • Configurable temperature
  • Fixed benchmark repetitions
  • Explicit independent/dependent variables in the paper
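
At the agent level these controls map naturally onto Ollama's generation options; a sketch of a fully pinned request body (the option names are Ollama's, the surrounding plumbing is assumed):

payload = {
    "model": "llama3.2:3b",
    "prompt": "What is 2+2? Return only the final numeric answer.",
    "stream": False,
    "options": {
        "seed": 42,          # fixed random seed (--seed)
        "temperature": 0.0,  # deterministic sampling (--deterministic)
        "num_predict": 24,   # token budget (max_tokens)
    },
}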

Scale to N Agents

  1. Add agent entries in agents/agent_config.yaml.
  2. Ensure each agent endpoint is reachable from orchestrator.
  3. Increase worker/fan-out limits in orchestrator config.
  4. Re-run health check (/agents) and benchmark runner.

No algorithm rewrite is needed; strategies consume dynamic agent lists.
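
A sketch of that dynamic-list pattern under an assumed config schema (the real agents/agent_config.yaml layout may differ):

import yaml

def load_agents(path: str = "agents/agent_config.yaml") -> list[dict]:
    # Strategies iterate whatever this returns, so adding an agent is
    # just another YAML entry -- no algorithm changes required.
    with open(path) as f:
        return yaml.safe_load(f)["agents"]  # assumed: list of {host, model} dicts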


Security Best Practices (Local Deployment)

  • Keep agents on private LAN only.
  • Restrict inbound firewall rules to required ports.
  • Avoid exposing Ollama ports directly to the public internet.
  • Use reverse proxy auth/TLS if remote access is required.
  • Rotate tokens/credentials and avoid committing secrets.
  • Keep VM packages updated.

Repository Structure

Distributed-AI/
├── orchestrator/
├── agents/
├── benchmarks/
├── deploy/
├── scripts/
├── artifacts/
│   ├── benchmark_runs/
│   └── optimization_runs/
├── docs/
├── paper/
│   ├── IEEE_Distributed_AI_Ensemble_Protected.tar.gpg
│   ├── PAPER_ACCESS_INSTRUCTIONS.txt
│   ├── references.bib
│   ├── figures/
│   ├── tables/
│   └── overleaf/
├── run_experiments.py
└── docker-compose.yml

Citation

@techreport{chowdhury2026distributedai,
  title={A Local Distributed Multi-Agent LLM Ensemble: Architecture, Optimization, and Reproducible Evaluation},
  author={Chowdhury, Md Anisur Rahman and Wang, Kefei},
  year={2026},
  institution={Gannon University}
}



Copyright

Copyright (c) 2026 Md Anisur Rahman Chowdhury and Kefei Wang
Emails: engr.aanis@gmail.com, wang039@gannon.edu
Affiliation: Dept. of Computer and Information Science, Gannon University, USA
All rights reserved.
