benmagnifico/SmartFed

SmartFed: Don't Reinvent the Wheel, Just Realign the Spokes

Resource-Efficient Federated Fine-Tuning via Rank-Wise Expert Assembly



🔥 News

  • [2026/05] SmartFed is accepted to ICML 2026 as a Spotlight paper. 🎉
  • [Coming Soon] Source code, processed datasets, and pre-trained LoRA modules will be publicly released. Please ⭐ star and watch this repo for updates.

📖 Overview

SmartFed is a resource-efficient federated fine-tuning framework that adapts Large Language Models (LLMs) to downstream tasks by reusing existing LoRA modules instead of training from scratch. The open-source community has accumulated a wealth of task-specific LoRA modules — SmartFed asks a simple question: why manufacture new spokes from raw materials when the wheel already exists? Just realign the spokes you already have.

Instead of optimizing billions of LoRA parameters across hundreds of communication rounds, edge devices in SmartFed only train a lightweight router that dynamically composes knowledge from a frozen pool of rank-wise experts, drastically reducing computation, communication, and energy cost.

🔑 Highlights

  • 🧩 Mixture of Rank-Wise Experts (MoRE) — decomposes each LoRA module along the rank dimension, turning monolithic modules into a pool of fine-grained, lightweight experts that the router can selectively activate based on input semantics.
  • ⚖️ Elastic Expert Quota Allocation (EEQA) — adaptively redistributes the expert budget across parameter matrices according to their measured contribution, concentrating capacity where it matters most.
  • 📡 From training to composing — only a tiny router (< 0.1% of LLM parameters) is synchronized across clients; the LoRA experts remain frozen and cached locally.
  • 🛡️ Theoretical guarantees — provable noise suppression, algebraic elimination of destructive cross-task interference, and a near-optimality bound for EEQA (proofs in the paper appendix).
  • 🚀 Strong empirical gains — up to +10.21% average accuracy, 3.95× faster convergence, 31.47× lower communication overhead, and 3.61× less energy consumption versus state-of-the-art baselines.

🏗️ Method

SmartFed reformulates federated fine-tuning as a compose-and-route process, described here in three steps.

1. Rank-Wise Decomposition. A standard LoRA update is rewritten as a sum of rank-one projections, and each rank-one component is treated as an independent expert:

$$\Delta \mathbf{W}\mathbf{x} = \sum_{i=1}^{r} \mathbf{B}_{:,i}\big(\mathbf{A}_{i,:}\mathbf{x}\big), \qquad E_i(\mathbf{x}) = (\mathbf{B}_{:,i}\mathbf{A}_{i,:})\mathbf{x}.$$
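The identity above can be checked numerically. The following is a minimal NumPy sketch, not the released implementation: the toy dimensions and the helper name `expert` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 4           # toy sizes; r is the LoRA rank

B = rng.normal(size=(d_out, r))    # LoRA up-projection
A = rng.normal(size=(r, d_in))     # LoRA down-projection
x = rng.normal(size=d_in)          # one input vector


def expert(i, x):
    """Rank-one expert E_i(x) = B[:, i] * (A[i, :] @ x)."""
    return B[:, i] * (A[i, :] @ x)


full_update = B @ (A @ x)                              # standard LoRA path
sum_of_experts = sum(expert(i, x) for i in range(r))   # rank-wise path

# Both paths produce the same vector, so splitting along the rank
# dimension loses nothing before routing is applied.
assert np.allclose(full_update, sum_of_experts)
```

Each expert costs one inner product and one scaled vector, which is what makes selecting a subset of them cheap.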

2. Sparse Routing. A trainable router computes top-$K$ gating weights and aggregates only the selected experts:

$$\mathbf{h}' = \mathbf{W}_0 \mathbf{x} + \sum_{m=1}^{M} \tilde{g}_m \cdot E_m(\mathbf{x}).$$
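A sparse forward pass of this form might look like the sketch below. It assumes a linear router and a softmax taken over the selected logits only; the README does not specify either, and `more_forward`, `W_router`, `A_pool`, and `B_pool` are hypothetical names. Unselected experts get a gate of zero.

```python
import numpy as np


def more_forward(x, W0, A_pool, B_pool, W_router, k):
    """Return h' = W0 x + sum over the top-k experts of g_m * E_m(x).

    A_pool:   (M, d_in)   rows A_{m,:} of the frozen rank-one experts
    B_pool:   (d_out, M)  columns B_{:,m} of the same experts
    W_router: (M, d_in)   assumed linear router, the only trainable part
    """
    logits = W_router @ x                        # one score per expert
    top = np.argpartition(logits, -k)[-k:]       # indices of the k largest scores
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over the selected experts

    gates = np.zeros_like(logits)                # gate is 0 for unselected experts
    gates[top] = weights

    activations = A_pool[top] @ x                # scalar A_{m,:} x per selected expert
    update = B_pool[:, top] @ (weights * activations)
    return W0 @ x + update, gates


rng = np.random.default_rng(0)
d_out, d_in, M, k = 6, 5, 8, 3                   # toy sizes
W0 = rng.normal(size=(d_out, d_in))
A_pool = rng.normal(size=(M, d_in))
B_pool = rng.normal(size=(d_out, M))
W_router = rng.normal(size=(M, d_in))
x = rng.normal(size=d_in)

h, gates = more_forward(x, W0, A_pool, B_pool, W_router, k)
```

Only the `k` selected rows of `A_pool` and columns of `B_pool` are touched, so the cost of the update grows with `k` rather than with the pool size `M`.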

3. Adaptive Quota Allocation. EEQA reallocates the expert budget across parameter matrices each round based on per-matrix importance, using a two-phase (proportional + greedy residual) procedure.
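One plausible reading of the two-phase procedure is a largest-remainder allocation: give each matrix the floor of its proportional share, then hand out the leftover slots one at a time to the matrices with the largest unmet share. The sketch below implements that reading. It is an assumption, not the paper's algorithm, and it takes the importance scores as given rather than computing them.

```python
import numpy as np


def allocate_quota(importance, total_budget):
    """Split an integer expert budget across matrices by importance."""
    importance = np.asarray(importance, dtype=float)

    # Phase 1 (proportional): floor of each matrix's ideal share.
    shares = importance / importance.sum() * total_budget
    quota = np.floor(shares).astype(int)

    # Phase 2 (greedy residual): leftover slots go to the matrices
    # whose fractional remainder is largest.
    leftover = int(total_budget - quota.sum())
    order = np.argsort(-(shares - quota), kind="stable")
    quota[order[:leftover]] += 1
    return quota


# Three matrices with importance 5 : 3 : 2 sharing 64 expert slots.
quota = allocate_quota([5, 3, 2], total_budget=64)
print(quota)  # [32 19 13]
```

The allocation always sums to the budget, so raising one matrix's quota necessarily lowers another's.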

📊 Main Results

SmartFed consistently outperforms both knowledge-free (train-from-scratch) and knowledge-reuse baselines across three skill-composition tasks and three LLM backbones. See Table 1 of the paper for the full comparison.

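Efficiency. Compared with knowledge-free federated baselines, SmartFed substantially reduces wall-clock training time, communication payload, and energy footprint. The headline figures reported in the Highlights are up to 3.95× faster convergence, 31.47× lower communication overhead, and 3.61× less energy consumption versus state-of-the-art baselines.

Data Efficiency. With only 10% of the training data, SmartFed already surpasses FedIT trained on the full dataset.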

📁 Repository Structure

SmartFed/
├── assets/                          # Figures and illustrations
├── federated_learning/              # Federated training loop and aggregation
│   ├── fed_trainer.py               #   FedAvg-style trainer for the router
│   ├── client.py                    #   Per-device local update logic
│   └── aggregator.py                #   Server-side aggregation of router + importance scores
├── smartfed/                        # Core algorithmic components
│   ├── more.py                      #   Mixture of Rank-Wise Experts (decomposition + routing)
│   ├── router.py                    #   Lightweight, input-conditioned router
│   ├── eeqa.py                      #   Elastic Expert Quota Allocation strategy
│   └── importance.py                #   Per-expert importance scoring (Eq. 4 in the paper)
├── lora_pool/                       # Utilities for loading and managing reusable LoRA modules
│   ├── registry.py                  #   Index of task-specific LoRA modules
│   └── loaders.py                   #   Loading / caching from LoRAHub-style repositories
├── training_scripts/                # Shell entry points for federated runs
│   ├── run_smartfed_cn_math.sh      #   Chinese + Math skill composition
│   ├── run_smartfed_cn_code.sh      #   Chinese + Code skill composition
│   └── run_smartfed_math_code.sh    #   Math  + Code skill composition (GSM-Hard)
├── evaluation/                      # Evaluation harnesses
│   ├── mgsm/                        #   Chinese mathematical reasoning (MGSM)
│   ├── doit/                        #   Chinese code generation (DoIT, Pass@1)
│   └── gsm_hard/                    #   Hard math-word problems (execution accuracy)
├── utils/                           # Shared utilities
│   ├── template.py                  #   Chat / instruction templates
│   ├── process_dataset.py           #   Dataset registration and partitioning
│   └── carbon.py                    #   CodeCarbon hooks for energy reporting
├── main_smartfed.py                 # Top-level federated training entry point
├── config.py                        # Centralized CLI / config schema
├── requirements.txt                 # Python dependencies
├── setup.sh                         # Environment setup script
├── LICENSE
└── README.md

🚀 Quick Start

🚧 The training and evaluation code is not yet open-sourced. The commands below will be runnable once the official release lands. We are actively cleaning up the code and preparing the release.

1. Clone and install:

git clone https://github.com/<org>/SmartFed.git
cd SmartFed
conda create -n smartfed python=3.10 -y
conda activate smartfed
pip install -r requirements.txt
source setup.sh

2. Launch a federated SmartFed run:

python main_smartfed.py \
  --model_name_or_path "meta-llama/Llama-2-7b-hf" \
  --lora_pool "cn_chat,en_math" \
  --task "cn_math" \
  --num_clients 20 \
  --sample_clients 2 \
  --num_rounds 20 \
  --local_steps 10 \
  --batch_size 16 \
  --learning_rate 5e-4 \
  --topk_experts 32 \
  --enable_eeqa \
  --output_dir "./output/smartfed_cn_math"

Key arguments:

  • --lora_pool — comma-separated list of task-specific LoRA modules to compose (resolved via lora_pool/registry.py).
  • --topk_experts — per-matrix expert budget $K$ used by MoRE.
  • --enable_eeqa — toggle Elastic Expert Quota Allocation.
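Because config.py is not yet released, the following is only a sketch of how these flags might be declared with `argparse`. Flag names are taken from the Quick Start command. Defaults follow the federated protocol listed under Training where one is stated; the defaults for `--topk_experts` and `--output_dir`, and the `split_csv` helper, are assumptions.

```python
import argparse


def split_csv(value):
    """Turn 'cn_chat,en_math' into ['cn_chat', 'en_math']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    p = argparse.ArgumentParser(description="Hypothetical SmartFed CLI sketch")
    p.add_argument("--model_name_or_path", type=str, required=True)
    p.add_argument("--lora_pool", type=split_csv, required=True)
    p.add_argument("--task", type=str, required=True)
    p.add_argument("--num_clients", type=int, default=20)
    p.add_argument("--sample_clients", type=int, default=2)
    p.add_argument("--num_rounds", type=int, default=20)
    p.add_argument("--local_steps", type=int, default=10)
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--learning_rate", type=float, default=5e-4)
    p.add_argument("--topk_experts", type=int, default=32)      # assumed default
    p.add_argument("--enable_eeqa", action="store_true")
    p.add_argument("--output_dir", type=str, default="./output")  # assumed default
    return p


# Parse the same values used in the Quick Start command.
args = build_parser().parse_args([
    "--model_name_or_path", "meta-llama/Llama-2-7b-hf",
    "--lora_pool", "cn_chat,en_math",
    "--task", "cn_math",
    "--enable_eeqa",
])
```

Declaring `--enable_eeqa` with `store_true` means EEQA stays off unless the flag is passed, which matches the "toggle" wording above.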

🏋️ Training

Three skill-composition tasks are supported out of the box. Each has a dedicated launcher under training_scripts/:

| Task | LoRA Modules Composed | Training Data | Eval Set | Metric |
|---|---|---|---|---|
| Chinese Mathematical Reasoning | Chinese Chat + English Math | Math23K | MGSM | Accuracy |
| Chinese Code Generation | Chinese Chat + English Code | DoIT (expanded to 20K) | DoIT | Pass@1 |
| Hard Math-Word Problems | English Math + English Code | MathCodeInstruct | GSM-Hard | Execution Accuracy |

Federated protocol (defaults). 20 clients in total, 10% sampled per round, 10 local steps per client, 20 communication rounds, AdamW with lr = 5e-4, batch size = 16. Each base LoRA module is rank 32, alpha 64, injected into the Query and Value attention matrices.
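A round under this protocol could be sketched as follows: sample 10% of 20 clients, let each train its router locally, then average the router weights on the server. The README only says the trainer is "FedAvg-style", so the example-count weighting and the names `sample_clients` and `fedavg` are assumptions, and local training is replaced by a placeholder.

```python
import random

import numpy as np


def sample_clients(num_clients, fraction, rng):
    """Pick round(num_clients * fraction) distinct client ids, at least one."""
    k = max(1, round(num_clients * fraction))
    return rng.sample(range(num_clients), k)


def fedavg(router_states, num_examples):
    """Average router parameters, weighted by each client's example count."""
    total = sum(num_examples)
    return {
        name: sum(n / total * state[name]
                  for state, n in zip(router_states, num_examples))
        for name in router_states[0]
    }


rng = random.Random(0)
selected = sample_clients(num_clients=20, fraction=0.10, rng=rng)  # 2 clients

# Placeholder for local training: each client's router is a constant vector.
states = [{"router.weight": np.full(3, float(cid))} for cid in selected]

# 10 local steps at batch size 16 gives 160 examples per client.
global_state = fedavg(states, num_examples=[10 * 16] * len(selected))
```

Only the router dictionary crosses the network in this sketch; the frozen LoRA experts never appear in `states`.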

📏 Evaluation

Evaluation harnesses for the three benchmarks live under evaluation/. Each subdirectory will ship with its own README and run script.

  • MGSM — multilingual chain-of-thought math reasoning, Chinese split.
  • DoIT — Chinese code generation, Pass@1.
  • GSM-Hard — execution accuracy on Program-Aided Language Model (PAL) outputs.
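As an illustration of the execution-accuracy metric, the sketch below runs each generated program and compares its result with the reference answer. It assumes each program defines a `solution()` function, as PAL-style prompts commonly do, and it is not the harness that will ship under evaluation/. Calling `exec` on model output is unsafe outside a sandbox; a real harness would isolate and time-limit it.

```python
import math


def run_program(source):
    """Execute generated code and return solution(); None on any failure."""
    namespace = {}
    try:
        exec(source, namespace)                # unsafe outside a sandbox
        return float(namespace["solution"]())
    except Exception:
        return None


def execution_accuracy(programs, references, tol=1e-3):
    """Fraction of programs whose executed result matches the reference."""
    correct = 0
    for source, ref in zip(programs, references):
        pred = run_program(source)
        if pred is not None and math.isclose(pred, ref, abs_tol=tol):
            correct += 1
    return correct / len(references)


programs = [
    "def solution():\n    return 3 * 4 + 2\n",   # runs, matches 14
    "def solution():\n    return 1 / 0\n",       # raises, counted as wrong
    "def solution():\n    return 7\n",           # runs, does not match 8
]
accuracy = execution_accuracy(programs, references=[14, 5, 8])
```

With one generated program per problem, Pass@1 reduces to the same kind of fraction, with unit tests in place of the numeric comparison.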

📦 Datasets & LoRA Modules

SmartFed reuses three publicly available task-specific LoRA modules:

  • Chinese Chat LoRA — trained on the 52K Okapi Chinese instruction set.
  • English Math LoRA — trained on the 395K MetaMath corpus.
  • English Code LoRA — trained on the 186K Magicoder OSS-Instruct corpus.

📂 To facilitate reproduction, we plan to release (i) the three task-specific LoRA modules used in our experiments, (ii) the federated-partitioned training splits, and (iii) the expanded DoIT 20K subset. These resources will be uploaded to Hugging Face alongside the code release.

🗓️ Release Plan

We are actively preparing the open-source release. Planned deliverables:

  • 🧹 Code cleanup and refactor for public release
  • 📦 Release of the three reusable LoRA modules
  • 🗂️ Release of federated-partitioned datasets (Math23K, expanded DoIT, MathCodeInstruct)
  • 🧪 Release of evaluation scripts for MGSM, DoIT, and GSM-Hard

If you would like to be notified when the code drops, please ⭐ star or 👀 watch this repository.

🤝 Acknowledgements

This project builds on many excellent open-source efforts, including LoRA, LoRAHub, PEFT, OpenFedLLM, and the broader Hugging Face ecosystem. We thank the authors and maintainers of these libraries for making collaborative research possible.

📝 Citation

If you find SmartFed useful in your research, please consider citing our paper:

@misc{wu2025smartfed,
  title         = {Don't Reinvent the Wheel, Just Realign the Spokes:
                   Resource-Efficient Federated Fine-Tuning via Rank-Wise Expert Assembly},
  author        = {Yebo Wu and Jingguang Li and Zhijiang Guo and Li Li},
  year          = {2025},
  eprint        = {2512.00902},
  archivePrefix = {arXiv},
  primaryClass  = {cs.DC},
  url           = {https://arxiv.org/abs/2512.00902}
}
