RYS is a small reproducibility repo for relayering experiments on decoder LLMs.
The core idea is simple: duplicate part of a model's existing layer path without changing any weights.
A standard single-block configuration is written as (i, j):
- run layers `0 .. j-1`
- then jump back and run layers `i .. N-1`
- so layers `i .. j-1` are traversed twice
Baseline is (0,0), which means no duplication.
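The traversal rule above can be sketched as a simple layer-list expansion. This is a minimal illustration only: `expand_config` is a hypothetical name, and the repo's actual expansion logic lives in `src/core/`.

```python
# Minimal sketch of the (i, j) traversal rule described above.
# expand_config is a hypothetical name; the repo's own layer-list
# expansion lives in src/core/.
def expand_config(i, j, num_layers):
    if (i, j) == (0, 0):
        return list(range(num_layers))  # baseline: no duplication
    # Run layers 0..j-1, then jump back and run layers i..N-1,
    # so layers i..j-1 are traversed twice.
    return list(range(0, j)) + list(range(i, num_layers))

# Block (2, 4) on a 6-layer model: layers 2 and 3 run twice.
print(expand_config(2, 4, 6))  # [0, 1, 2, 3, 2, 3, 4, 5]
```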
This repo contains the pieces needed to reproduce the main experimental workflows:
- scanner for full `(i, j)` sweeps
- fixed Math and EQ probe sets
- multi-block beam search
- XGBoost surrogate pipeline
- model exporter for writing relayered Hugging Face checkpoints
- heatmap and balanced Math+EQ analysis code
It does not include the private historical runs, blog drafts, ad hoc notebooks, or dataset generation/calibration code.
`datasets/`
- `math_16.json`
- `math_120.json`
- `eq_16.json`
- `eq_140.json`
- `manifest.json`

`src/core/`
- config parsing
- layer-list expansion
- relayer wrappers for dense and MoE-style stacks

`src/workers/`
- Math and EQ benchmark workers
- queue handling
- model loading helpers

`src/utils/`
- balanced Math+EQ analysis
- heatmap helpers
- surrogate utilities

`scripts/`
- sweep setup
- ExLlama workers
- beam search
- surrogate pipeline
- repeat-sweep helpers

`hf_export/`
- checkpoint export
- HF upload helper
- Colab notebook
This repo ships the fixed benchmark subsets used by the public workflow:
- `datasets/math_16.json`
- `datasets/math_120.json`
- `datasets/eq_16.json`
- `datasets/eq_140.json`
Important notes:
- `eq_16` and `eq_140` are first-pass-only EQ subsets.
- `datasets/manifest.json` records provenance and checksums.
- The file named `eq_140.json` currently contains 139 records; this is documented in the manifest and preserved for continuity with the original naming.
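Since `datasets/manifest.json` records checksums, a spot check along these lines may be useful. This is a hedged sketch: the manifest's actual schema is not documented here, and the `sha256` field name is an assumption.

```python
import hashlib
import json
import pathlib

def sha256_of(path):
    # Hash the raw bytes of a dataset file.
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def matches_manifest(manifest_path, dataset_name, dataset_path):
    # Assumed manifest shape: {"eq_16.json": {"sha256": "..."}, ...};
    # adjust the lookup to whatever datasets/manifest.json actually stores.
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    return manifest[dataset_name]["sha256"] == sha256_of(dataset_path)
```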
Python dependencies are managed with uv:
```
uv sync
```

For ExLlama scanning you also need:

- a local `exllamav3` checkout
- an EXL3-compatible model directory
- CUDA-capable hardware if you want real scan throughput
Set:
```
export EXLLAMAV3_PATH=/path/to/exllamav3
```

Example for a 64-layer model:
```
uv run python scripts/init_queue.py \
  --num-layers 64 \
  --queue-file results/demo/queue.json \
  --results-file results/demo/combined_results.pkl
```

This writes canonical layer-list configs, including the baseline `(0, 0)`.
This is the main fast scan path in the public repo. It loads the EXL3 model once and scores Math and EQ in one mixed pass per config.
```
uv run python scripts/run_exllama_math_eq_combined_worker.py \
  --queue-file results/demo/queue.json \
  --combined-results-file results/demo/combined_results.pkl \
  --math-results-file results/demo/math_results.pkl \
  --eq-results-file results/demo/eq_results.pkl \
  --model-dir /path/to/model.exl3 \
  --math-dataset-path datasets/math_16.json \
  --eq-dataset-path datasets/eq_16.json \
  --math-max-new 64 \
  --eq-max-new 64 \
  --auto-cache
```

Then analyze the results:

```
uv run python scripts/analyze_results.py \
  --math-scores results/demo/math_results.pkl \
  --eq-scores results/demo/eq_results.pkl \
  --out-dir results/demo/analysis \
  --num-layers 64
```

This produces:
- top-ranked balanced configs
- scatter plots
- balanced Math+EQ heatmap artifacts
If you want a simple Docker entrypoint:
```
MODEL_DIR=/path/to/model.exl3 \
EXLLAMAV3_PATH=/path/to/exllamav3 \
./scripts/run_exllama_docker.sh
```

Useful overrides: `QUEUE_FILE`, `COMBINED_RESULTS_FILE`, `MATH_RESULTS_FILE`, `EQ_RESULTS_FILE`, `MATH_DATASET`, `EQ_DATASET`, `DEVICE`, `RESERVE_PER_DEVICE`, `USE_PER_DEVICE`.
Beam search composes multiple repeated blocks and benchmarks only unseen configs.
Example:
```
uv run python scripts/beam_search.py \
  --model-path /path/to/hf-model \
  --num-layers 64 \
  --seed-math-results results/demo/math_results.pkl \
  --seed-eq-results results/demo/eq_results.pkl \
  --math-dataset-path datasets/math_16.json \
  --eq-dataset-path datasets/eq_16.json \
  --work-dir results/demo/beam-search
```

Notes:

- beam search uses the Hugging Face worker path in `src/workers/`
- seed pickles should come from an already measured single-block scan
- state under `--work-dir` is resume-friendly
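As a rough sketch of what composing multiple repeated blocks can mean, here is one plausible generalization of the single-block rule. The helper name is illustrative, and the repo's actual block-composition semantics may differ.

```python
# Hypothetical multi-block expansion: apply the single-block rule
# (run forward to j-1, then jump back to i) once per block, assuming
# blocks are sorted and non-overlapping. Illustrative only.
def expand_blocks(blocks, num_layers):
    layers, start = [], 0
    for i, j in blocks:
        layers.extend(range(start, j))  # run forward to layer j-1
        start = i                       # jump back to layer i
    layers.extend(range(start, num_layers))
    return layers

# One block reproduces the single-block (i, j) behavior:
print(expand_blocks([(2, 4)], 6))  # [0, 1, 2, 3, 2, 3, 4, 5]
# Two blocks, e.g. (2, 4) and (5, 6) on an 8-layer model:
print(expand_blocks([(2, 4), (5, 6)], 8))  # [0, 1, 2, 3, 2, 3, 4, 5, 5, 6, 7]
```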
The surrogate uses per-layer repeat counts as features. Predicted scores are only for ranking candidates; the final benchmark scores must be measured.
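A minimal sketch of what a per-layer repeat-count feature vector looks like (the function name is illustrative, not the repo's API):

```python
# Turn an expanded layer list into a fixed-length feature vector:
# entry k is how many times physical layer k appears in the path.
def repeat_features(layer_list, num_layers):
    counts = [0] * num_layers
    for idx in layer_list:
        counts[idx] += 1
    return counts

# A 6-layer model with block (2, 4): layers 2 and 3 counted twice.
print(repeat_features([0, 1, 2, 3, 2, 3, 4, 5], 6))  # [1, 1, 2, 2, 1, 1]
```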
```
uv run python scripts/train_surrogate.py \
  --single-block-math-results results/demo/math_results.pkl \
  --single-block-eq-results results/demo/eq_results.pkl \
  --beam-math-results results/demo/beam-search/beam_math_results.pkl \
  --beam-eq-results results/demo/beam-search/beam_eq_results.pkl \
  --out-dir results/demo/surrogate \
  --num-layers 64
```

```
uv run python scripts/generate_candidates.py \
  --num-layers 64 \
  --max-extra-layers 12 \
  --count 2000000 \
  --output-csv results/demo/surrogate/candidates.csv
```

```
uv run python scripts/score_candidates.py \
  --model-dir results/demo/surrogate \
  --candidates-file results/demo/surrogate/candidates.csv \
  --output-csv results/demo/surrogate/top_scored.csv \
  --top-k 100
```

```
uv run python scripts/build_topk_config.py \
  --top-candidates-csv results/demo/surrogate/top_scored.csv \
  --num-layers 64 \
  --output-config results/demo/surrogate/top100.config
```

Then benchmark those configs with the same Math/EQ harness used elsewhere.
hf_export writes a relayered Hugging Face checkpoint with the duplicated layers physically materialized into the safetensor shards.
Single block:
```
uv run python -m hf_export.export_model \
  --source /path/to/base-model \
  --source-repo-id some/model \
  --output exports/model-block-30-34 \
  --blocks "30,34"
```

Multi-block:
```
uv run python -m hf_export.export_model \
  --source /path/to/base-model \
  --source-repo-id some/model \
  --output exports/model-31_34__43_45 \
  --blocks "31,34;43,45"
```

Upload:
```
export HF_TOKEN=...
uv run python -m hf_export.upload_to_hf \
  --folder exports/model-block-30-34 \
  --repo-id your-name/model-block-30-34
```

The export manifest is written to `rys_export_manifest.json`.
A minimal Colab notebook is available at:
`hf_export/colab/export_upload_minimal.ipynb`
The reusable plotting helpers live in `src/utils/heatmaps.py`.
For standard (i, j) scans, the normal entrypoint is:
```
uv run python scripts/analyze_results.py \
  --math-scores results/demo/math_results.pkl \
  --eq-scores results/demo/eq_results.pkl \
  --out-dir results/demo/analysis \
  --num-layers 64
```

For per-layer repeat sweeps:
```
uv run python scripts/plot_repeat_heatmaps.py \
  --results-file results/demo/repeatx8_math_results.pkl \
  --manifest-file results/demo/repeatx8_manifest.json \
  --out-dir results/demo/repeatx8_heatmaps
```

- The public scan path is ExLlama-first.
- Beam search uses the Hugging Face worker path rather than ExLlama.
- This repo assumes decoder-layer architectures; unsupported architectures should fail explicitly.
- The HF exporter currently detects decoder stacks under `model.language_model.layers`, `model.layers`, and `language_model.layers`.
- Dataset generation and recalibration are intentionally out of scope here.
If you are starting from scratch, do this in order:
1. run a small `(i, j)` scan with `math_16` + `eq_16`
2. inspect the heatmaps and top balanced configs
3. run beam search seeded from the single-block scan
4. train the surrogate on measured configs and benchmark its top candidates
5. rerun the strongest candidates on `math_120` + `eq_140`
6. export any final variants you want to share