A FLock.io project.
Production-Grade Federated Learning Toolkit for the FLock Protocol
FLocKit is a modular, template-based toolkit for building federated learning systems on the FLock network.
Pick a template, point it at your data, and quickly deploy it as a FLock node.
Getting Started · Templates · Architecture · Deployment · Extend
| Feature | Description |
|---|---|
| Plug-and-Play Templates | Self-contained implementations for LLM, ASR, Time Series, and Graph tasks, ready to train |
| Configuration-Driven | All tuning knobs live in YAML. Zero code changes between experiments |
| Multi-Backend | PyTorch (CUDA / CPU) and Apple MLX, auto-detected at runtime |
| FLock Native | Built-in Flask server speaks the FLock protocol out of the box |
| Shared Infrastructure | FedAvg aggregation, serialisation, optimiser factory, device management: write once, use everywhere |
| Structured Logging | loguru-based logging with `FLOCKIT_LOG_*` env-driven sinks (level / file / rotation / retention) and a `log_duration` helper for timed FL round events |
| Docker-Ready | Single Dockerfile for convenient containerised deployment |
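FLocKit's actual `log_duration` helper is not reproduced in this README; as a rough, stdlib-only sketch of the idea (a context manager that logs the elapsed time of a named FL round event):

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("flockit")

@contextmanager
def log_duration(event: str):
    """Log the wall-clock duration of a named event (e.g. one FL training round)."""
    start = time.perf_counter()
    logger.info("%s: started", event)
    try:
        yield
    finally:
        logger.info("%s: finished in %.3fs", event, time.perf_counter() - start)

# Usage: wrap a round of local training
with log_duration("train round 3"):
    time.sleep(0.01)  # stand-in for real work
```

The real helper lives in `flockit_logger.py` and is loguru-based, so its output format and sink configuration (driven by the `FLOCKIT_LOG_*` variables) differ from this sketch.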
Each client trains locally, exchanges model updates through the FLock protocol, and earns incentives.
```
┌─────────────────────────────────────────────────────────────────┐
│                   FLock Network (Blockchain)                    │
│                                                                 │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐                │
│   │ Client A │     │ Client B │     │ Client C │  ...           │
│   └────┬─────┘     └────┬─────┘     └────┬─────┘                │
│        │                │                │                      │
│   ┌────▼────────────────▼────────────────▼────┐                 │
│   │           FlockSDK (Flask HTTP)           │                 │
│   │   POST /call { train | evaluate |         │                 │
│   │              aggregate | get_model_info } │                 │
│   └────────────────────┬──────────────────────┘                 │
│                        │                                        │
│   ┌────────────────────▼──────────────────────┐                 │
│   │     FlockModel (Abstract Interface)       │                 │
│   │   init_dataset · train · evaluate ·       │                 │
│   │   aggregate · get_model_info              │                 │
│   └────────────────────┬──────────────────────┘                 │
│                        │                                        │
│   ┌────────────────────▼──────────────────────┐                 │
│   │          Template Implementation          │                 │
│   │      (LLM / ASR / TimeSeries / Graph)     │                 │
│   └───────────────────────────────────────────┘                 │
└─────────────────────────────────────────────────────────────────┘
```
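The `aggregate` step above is implemented in the shared utilities as FedAvg, which at its core is a sample-weighted average of client weights. A dependency-free sketch of the maths (the real PyTorch/NumPy versions live in `templates/utils/aggregation.py`):

```python
def fedavg(client_states: list[dict[str, list[float]]],
           num_samples: list[int]) -> dict[str, list[float]]:
    """Merge client weight dicts by averaging each entry, weighted by sample count."""
    total = sum(num_samples)
    return {
        key: [
            sum(state[key][i] * n for state, n in zip(client_states, num_samples)) / total
            for i in range(len(client_states[0][key]))
        ]
        for key in client_states[0]
    }

# Two clients, the second holding 3x more data:
merged = fedavg([{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}], num_samples=[1, 3])
assert merged == {"w": [1.5, 3.5]}
```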
## Getting Started

**Prerequisites:**

- Python 3.11+
- CUDA-capable GPU (recommended) or Apple Silicon Mac
- HuggingFace account (for model/dataset access)
```shell
git clone https://github.com/FLock-io/FLocKit.git
cd FLocKit

# Option A: using uv (recommended)
uv sync

# Option B: using pip
pip install -r requirements.txt
```

Run a template:

```shell
# LLM Fine-tuning (LoRA / QLoRA)
python main.py --conf templates/llm_finetuning/configs/qwen2_5_7B_instruct_aya_zsm.yaml

# Automatic Speech Recognition (Whisper)
python main.py --conf templates/automatic_speech_recognition/configs/asr_whisper_small_sarawakmalay_fedavg.yaml

# Time Series Prediction (Glucose LSTM)
python main.py --conf templates/time_series_prediction/glucose_prediction/configs/glucose_prediction.yaml

# Graph Node Classification (FedGCN)
python main.py --conf templates/graph_node_classification/fedgcn/configs/fedgcn.yaml
```

Override any config value from the command line:

```shell
python main.py \
  --conf templates/llm_finetuning/configs/qwen2_5_7B_finetune.yaml \
  common_args.run_mode=simulation \
  train_args.proposer_num_epochs=5 \
  train_args.proposer_learning_rate=2e-5
```

## Templates

FLocKit ships with four production-ready templates:
### `templates/llm_finetuning/`
Fine-tune large language models with LoRA / QLoRA in a federated setting. Supports any HuggingFace causal LM.
| Capability | Details |
|---|---|
| Models | LLaMA 3, Qwen2.5, Mixtral, TinyLlama, and any HuggingFace causal LM |
| Methods | LoRA, QLoRA (4-bit / 8-bit quantisation via bitsandbytes) |
| Backends | PyTorch (CUDA) and MLX (Apple Silicon), auto-detected |
| Data | HuggingFace datasets, local JSONL, instruction-tuning formats |
| Prompters | Alpaca, ChatML, Vicuna, and custom prompt templates |
| Configs | 6 ready-to-use YAML configurations |
### `templates/automatic_speech_recognition/`
Federated fine-tuning of OpenAI Whisper models with WER/CER evaluation.
| Capability | Details |
|---|---|
| Models | Whisper (small, medium, large) via HuggingFace |
| Methods | Full fine-tuning or decoder-only, with optional PEFT/LoRA |
| Evaluation | Word Error Rate (WER) and Character Error Rate (CER) |
| Data | HuggingFace audio datasets, Sarawak Malay speech data |
| Configs | 3 ready-to-use YAML configurations (FedAvg, conservative, decoder-only) |
### `templates/time_series_prediction/glucose_prediction/`
LSTM-based glucose prediction for healthcare federated learning.
| Capability | Details |
|---|---|
| Models | GlucoseLSTM, GlucoseLSTMAttention (with multi-head attention) |
| Methods | Full model training with LR scheduling and gradient clipping |
| Evaluation | RMSE on sliding-window glucose prediction |
| Data | OhioT1DM and compatible glucose monitoring datasets |
| Configs | 1 ready-to-use YAML configuration |
### `templates/graph_node_classification/fedgcn/`
Federated Graph Convolutional Networks for node classification tasks.
| Capability | Details |
|---|---|
| Models | SimpleGCN (multi-layer graph convolution) |
| Methods | FedGCN: federated training on partitioned graph data |
| Evaluation | Accuracy on node classification |
| Data | Cora, CiteSeer, and custom graph datasets (PKL / NPZ / edge-list) |
| Configs | 1 ready-to-use YAML configuration |
## Architecture

```
FLocKit/
│
├── main.py                          # Entry point: parse config → init → serve
├── bootstrap.py                     # Template initialisation orchestrator
├── arguments.py                     # CLI + YAML argument loader (OmegaConf)
├── constants.py                     # Project-wide constants & template registry keys
├── flockit_logger.py                # Centralised logging (loguru wrapper)
│
├── flock_sdk/                       # ─── SDK Layer (stable API) ───
│   ├── flock_model.py               # FlockModel abstract base class
│   └── flock_sdk.py                 # Flask HTTP server for FLock protocol
│
├── templates/                       # ─── Template Layer ───
│   ├── llm_finetuning/              # LLM fine-tuning (LoRA / QLoRA / MLX)
│   ├── automatic_speech_recognition/  # Whisper ASR
│   ├── time_series_prediction/      # Glucose LSTM prediction
│   ├── graph_node_classification/   # FedGCN node classification
│   └── utils/                       # Shared utilities
│       ├── aggregation.py           # FedAvg aggregation (PyTorch + NumPy)
│       ├── serialization.py         # State dict (de)serialisation
│       ├── optimizer_factory.py     # Adam / AdamW / SGD factory
│       ├── device.py                # Hardware detection & device mapping
│       ├── client_splits.py         # Per-client dataset path / index helpers
│       ├── file_operations.py       # tar.gz compression & extraction
│       ├── output_manager.py        # Centralised output / checkpoint paths
│       ├── s3_storage_manager.py    # AWS S3 integration
│       └── nami_cloud_manager.py    # NAMI Cloud (S3-compatible) integration
│
├── local_tests/                     # Local FL simulation test scripts
├── scripts/                         # Data preparation & deployment utilities
└── Dockerfile                       # Production container image
```
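`client_splits.py` itself is not reproduced in this README; the core operation behind IID partitioning (the function name and signature below are illustrative, not the repo's API) can be sketched as:

```python
import random

def iid_client_splits(num_examples: int, num_clients: int, seed: int = 42) -> list[list[int]]:
    """Deterministically shuffle example indices and deal them round-robin to clients."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # same seed -> same split on every run
    return [indices[client::num_clients] for client in range(num_clients)]

splits = iid_client_splits(num_examples=10, num_clients=3)
assert sorted(i for shard in splits for i in shard) == list(range(10))  # exhaustive, disjoint
```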
| Layer | Modules | Responsibility |
|---|---|---|
| Entry | `main.py`, `bootstrap.py`, `arguments.py` | Parse config, seed RNGs, resolve template, start server |
| SDK | `flock_sdk/` | Stable interface: FlockModel ABC + Flask HTTP bridge |
| Templates | `templates/<task>/` | Self-contained FL implementations per ML task |
| Shared Utils | `templates/utils/` | Task-agnostic infra: aggregation, serialisation, optimisers, device, output paths, client splits, file ops, S3 / NAMI cloud |
Every operational parameter is defined in YAML; no magic defaults buried in code.
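FLocKit loads these files with OmegaConf, so dot-separated command-line overrides (e.g. `train_args.proposer_num_epochs=5` in the quick-start) merge over the YAML values. Conceptually the merge is just a dotted-path write; the sketch below is a simplified stand-in, not OmegaConf itself:

```python
def apply_override(config: dict, dotted_key: str, value) -> None:
    """Write value at a dotted path, e.g. 'train_args.proposer_num_epochs'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate sections as needed
    node[leaf] = value

config = {"common_args": {"run_mode": "production"},
          "train_args": {"proposer_num_epochs": 1}}
apply_override(config, "common_args.run_mode", "simulation")
apply_override(config, "train_args.proposer_num_epochs", 5)
assert config["train_args"]["proposer_num_epochs"] == 5
```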
```yaml
# Example: templates/llm_finetuning/configs/qwen2_5_7B_finetune.yaml
common_args:
  template_name: "llm_finetuning"
  project_name: "FLockLLM_finetune"
  run_mode: "simulation"          # "simulation" | "production"
  random_seed: 42

model_args:
  foundation_model_name: "Qwen/Qwen2.5-7B-Instruct"
  foundation_model_source: "huggingface"
  finetune_adapter: "qlora"       # "lora" | "qlora"
  lora_r: 16
  lora_alpha: 32

data_args:
  data_source: "huggingface"      # "huggingface" | "local"
  huggingface_dataset_name: "malaysia-ai/filtered-aya-dataset-zsm"

train_args:
  proposer_num_epochs: 1
  proposer_learning_rate: 0.0001
  proposer_train_batch_size: 16
  federated_optimizer_name: "fedavg"

evaluation_args:
  voter_val_set_size: 5
```

## Deployment

```shell
# Build
docker build -t flockit .

# Run with GPU support
docker run --gpus all -p 5000:5000 \
  -e HF_TOKEN=hf_your_token \
  -e FLOCKIT_CONF=templates/llm_finetuning/configs/qwen2_5_7B_finetune.yaml \
  flockit
```

**Production note:** the default `main.py` server uses Flask's built-in runtime. For internet-facing production deployments, run it behind a process manager / reverse proxy.
Once the SDK server is running, it exposes a single HTTP endpoint that the FLock protocol coordinator calls:
```http
POST /call
Content-Type: application/json

{
  "method": "train" | "evaluate" | "aggregate" | "get_model_info",
  "parameters": "<base64-encoded model weights>"
}
```
The FlockSDK handles serialisation, error recovery, and loss validation automatically.
When deploying to a fully local environment (no internet; air-gapped or LAN-only), use FL-Alliance-Client to run the local chain and clients. A key requirement is `MODEL_DEFINITION_HASH`, which is defined and used in the FL-Alliance-Client repo.
**What is `MODEL_DEFINITION_HASH`?** It is the SHA-256 hash of your model archive (`model.tar.gz`). The FL-Alliance-Client deployer writes this hash into the FlockTask contract; clients then load the model from `data/shared/models/{hash}`.
How to obtain it from FLocKit:
1. Adapt the task by setting `FLOCKIT_CONF` to your desired config:

   ```shell
   # Example: pick a different template config at runtime
   docker run --gpus all -p 5000:5000 \
     -e FLOCKIT_CONF=templates/time_series_prediction/glucose_prediction/configs/glucose_prediction.yaml \
     flockit
   ```

2. Package the entire FLocKit repository (not just a single model) into `model.tar.gz`. The archive root should contain `main.py`, `templates/`, and the rest of the project files directly; do not wrap them under an extra `FLocKit/` directory. Output must go outside the source directory to avoid "Can't add archive to itself":

   ```shell
   # From the FLocKit repo root, output to the parent directory
   cd FLocKit/
   tar -czf ../model.tar.gz .
   # model.tar.gz is created in the parent directory
   ```

   Dev mode (S3): use `python scripts/build_and_upload_s3.py --storage s3`; it always packages the full FLocKit repository root (the archive root starts at `main.py`, not `FLocKit/main.py`), uploads to S3, and prints the hash for `make chain`. You can run it from any working directory; use `--source-dir` only if you need to override the source path explicitly.

3. Compute the SHA-256 hash (from the parent directory, since `model.tar.gz` is there):

   ```shell
   cd ..
   # Linux
   sha256sum model.tar.gz
   # macOS
   shasum -a 256 model.tar.gz
   # Example output:
   # 6c13c6659f3865ef176cc9cc695c99cd901c78b1bf150f6bb6cd7b4703fa1489  model.tar.gz
   ```

4. Use the hash in FL-Alliance-Client when starting the local chain:

   ```shell
   # In the FL-Alliance-Client repo
   make chain MODEL_DEFINITION_HASH=6c13c6659f3865ef176cc9cc695c99cd901c78b1bf150f6bb6cd7b4703fa1489
   ```

5. For offline mode, place the archive in shared storage before starting clients:

   ```shell
   mkdir -p data/shared/models
   cp model.tar.gz data/shared/models/<hash>   # filename = hash, no extension
   ```
For full details (Dev vs Offline mode, shared storage setup, LAN deployment), see the FL-Alliance-Client Local Chain Simulation documentation.
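The `sha256sum` / `shasum` commands above have a direct Python equivalent, useful in deployment scripts; this streams the archive in chunks so a large `model.tar.gz` never sits fully in memory:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (assuming model.tar.gz was written to the parent directory):
# print(sha256_of_file("../model.tar.gz"))
```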
Simulate federated learning locally before deploying to the FLock network:
```shell
# Run any template with 3 clients for 3 rounds
python local_tests/test_llm.py
python local_tests/test_asr.py
python local_tests/test_time_series.py
python local_tests/test_gnn.py

# Custom configuration
python local_tests/run_local_test.py \
  --config templates/llm_finetuning/configs/qwen2_5_7B_finetune.yaml \
  --num_clients 5 \
  --num_rounds 10
```

## Extend

Extend FLocKit with your own ML task in five steps:
### 1. Scaffold the directory

```
templates/my_task/
├── __init__.py
├── setup.py                   # Template init function
├── flock_model_my_task.py     # FlockModel implementation
└── configs/
    └── default.yaml
```
### 2. Implement the FlockModel interface

```python
from flock_sdk import FlockModel


class FLockModelMyTask(FlockModel):
    def init_dataset(self, dataset_path: str) -> None:
        """Load and prepare local data."""
        ...

    def train(self, parameters: bytes | None, *, comm_round_idx: int | None = None) -> bytes:
        """Run one round of local training. Return updated weights."""
        ...

    def evaluate(self, parameters: bytes | None) -> float:
        """Evaluate global parameters locally. Return loss."""
        ...

    def aggregate(self, parameters_list: list[bytes], *, comm_round_idx: int | None = None) -> bytes:
        """Merge client updates (e.g. FedAvg). Return merged weights."""
        ...
```

### 3. Create the setup function
```python
# templates/my_task/setup.py
def my_task_init(args):
    from .flock_model_my_task import FLockModelMyTask
    return FLockModelMyTask(args, verbose=True)
```

### 4. Register the template

```python
# templates/__init__.py
TEMPLATE_REGISTRY["my_task"] = "templates.my_task.setup.my_task_init"
```

### 5. Use shared utilities instead of reinventing the wheel
```python
from templates.utils.aggregation import fedavg_aggregate_torch
from templates.utils.serialization import serialize_torch_state, deserialize_torch_state
from templates.utils.optimizer_factory import create_optimizer
from templates.utils.device import resolve_torch_device
```

## Data Preparation Scripts

Pre-built scripts for partitioning datasets across federated clients:
```shell
# LLM: partition Aya ZSM dataset into 8 IID clients
python scripts/prepare_llm_aya_zsm_client_ids.py --num_clients 8 --mode iid --out_dir ./data_splits/aya

# ASR: partition Sarawak Malay speech data
python scripts/prepare_asr_sarawakmalay_whisper_format_client_ids.py --num_clients 4 --out_dir ./data_splits/asr

# LLM: partition Malay dialect instruction data
python scripts/prepare_llm_malay_dialect_sarawak_client_ids.py --num_clients 4 --mode iid --out_dir ./data_splits/sarawak
```

## License

FLocKit is licensed under the Apache License 2.0. See LICENSE for the full text.