Public code release for the paper "MOSAIC: Multi-Objective Slice-Aware Iterative Curation for Alignment."
This repository contains the core evaluation, data-mixture search, and fine-tuning utilities used to study closed-loop supervised fine-tuning under a fixed token budget. The public release is intentionally trimmed down for GitHub: large datasets, generated results, virtual environments, and internal-only sync tooling are not included.
The release includes:

- A unified L1-L3 evaluation interface for XGuard, OrBench, and IFEval
- Slice-aware closed-loop data mixture search with an agent-based proposal step
- LoRA fine-tuning utilities and vLLM inference helpers
- Minimal configs needed to reproduce the MOSAIC workflow once the datasets are prepared
Not included:

- Raw or processed datasets
- Experiment outputs under `results/`
- Local virtual environments
- Archived exploratory scripts and notebooks
- Internal cloud storage / synchronization tooling
Install dependencies:

```bash
uv sync
```

To run local fine-tuning or managed vLLM inference, install the extra training dependencies:

```bash
uv sync --extra finetune
```

Copy `.env.example` to `.env` and fill in the keys you actually use.
At minimum, the evaluation and proposal-agent pipeline expects:
- `DEEPSEEK_API_KEY` for L3 extraction and the proposal agent
- `MODEL_PATH` when using local model inference or local fine-tuning
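A minimal `.env` might look like the following (both values are placeholders — substitute your own key and model location):

```shell
DEEPSEEK_API_KEY=your-api-key-here
MODEL_PATH=/path/to/your/local/model
```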
Create the following directories locally:
```
dataset/
|- train/
|  |- xguard-train.parquet
|  |- orbench.parquet
|  `- tulu-3.parquet
`- eval/
   |- xteaming.jsonl
   |- orbench.jsonl
   `- tulu-3.jsonl
```
The repository expects preprocessed training pools and evaluation sets to follow these names unless you modify the scripts.
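To fail fast on a misnamed file, a small sanity check (a hypothetical helper, not part of the repo) can verify the layout before running the pipeline:

```python
from pathlib import Path

# File names the scripts expect, relative to the repo root.
EXPECTED = [
    "dataset/train/xguard-train.parquet",
    "dataset/train/orbench.parquet",
    "dataset/train/tulu-3.parquet",
    "dataset/eval/xteaming.jsonl",
    "dataset/eval/orbench.jsonl",
    "dataset/eval/tulu-3.jsonl",
]

def missing_files(root: str) -> list[str]:
    """Return the expected dataset files that are absent under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]
```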
Run the benchmark-specific evaluators:

```bash
uv run python -m score.xguard_eval
uv run python -m score.orbench_eval
uv run python -m score.ifeval_eval
```

Merge the evaluation outputs back into the training parquet files:

```bash
uv run python scripts/merge_eval_to_parquet.py
```

Launch the closed-loop search:

```bash
uv run python main.py --max-iters 5 --T-max 1000000 --quality-threshold 4.0
```

`main.py` is a thin wrapper around `scripts/run_loop.py`.
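`scripts/run_loop.py` itself is not reproduced here, but the three CLI flags map onto the stopping conditions of a budgeted loop. A simplified sketch (function names, return values, and control flow are illustrative assumptions, not the repo's actual implementation):

```python
def run_loop(max_iters, t_max, quality_threshold, propose, evaluate, train):
    """Sketch of a fixed-budget curation loop: propose a data mixture,
    fine-tune on it, evaluate, and stop on budget exhaustion, the
    iteration cap, or reaching the quality target."""
    tokens_used = 0
    it = 0
    for it in range(1, max_iters + 1):       # --max-iters
        mixture, cost = propose()            # agent proposes a mixture + its token cost
        if tokens_used + cost > t_max:       # --T-max: fixed token budget
            break
        tokens_used += cost
        train(mixture)                       # e.g. a LoRA fine-tuning step
        if evaluate() >= quality_threshold:  # --quality-threshold
            break
    return tokens_used, it
```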
```
mosaic/
|- configs/                     # Small benchmark-specific configs
|- instruction_following_eval/  # IFEval support code
|- packages/finetune/           # LoRA fine-tuning utilities
|- score/                       # Benchmark scoring and aggregation
|- scripts/                     # Closed-loop orchestration scripts
|- .env.example                 # Minimal environment variable template
|- main.py                      # Entry point for the MOSAIC loop
|- pyproject.toml               # uv project metadata
`- schema.py                    # Shared record schema
```
- Prepare the three training pools and three evaluation sets.
- Run the benchmark-specific evaluators to extract L3 annotations and aggregate scores.
- Merge the resulting slice/score/need columns back into the training parquet files.
- Launch the closed-loop search with a fixed token budget.
- Inspect the per-iteration outputs, Pareto archive, and agent traces under `results/run_*/`.
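The merge step above (step 3) is essentially a keyed left join. A minimal sketch, assuming both frames share an `id` column and that the evaluators emit `slice`/`score`/`need` columns (these names are illustrative, not the repo's actual schema):

```python
import pandas as pd

def merge_eval_columns(pool: pd.DataFrame, eval_scores: pd.DataFrame) -> pd.DataFrame:
    """Left-join per-example evaluation columns onto a training pool.

    Rows in the pool that were never evaluated keep NaN in the new
    columns, so downstream code can distinguish unscored examples.
    """
    return pool.merge(eval_scores, on="id", how="left")
```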
- The public repository does not ship the datasets used in the paper.
- The release keeps the original experimental code structure, but trims large artifacts and outdated internal utilities.
- Some scripts still assume the benchmark file names listed above. If your local files differ, update the path constants in the relevant script before running.
If you use this repository, cite the MOSAIC paper:
```bibtex
@article{dou2026mosaic,
  title   = {MOSAIC: Multi-Objective Slice-Aware Iterative Curation for Alignment},
  author  = {Dou, Yipu and Yang, Wang},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026}
}
```