GENEB — Genomic Embedding Benchmark

GENEB is a multi-task benchmark for DNA sequence encoders. It comprises 100 classification tasks in 13 functional categories, evaluated with a linear probe on precomputed embeddings under full-data, 10-shot, and 1-shot regimes. Reported metrics are MCC, accuracy, and macro-F1.

Paper: https://arxiv.org/abs/2606.04525
Source code: https://github.com/darlednik/geneb
Leaderboard: https://huggingface.co/spaces/darlednik/geneb-leaderboard
Task data: https://huggingface.co/datasets/darlednik/geneb-tasks
Submit a model: CONTRIBUTING.md

Repository overview

This repository contains three coordinated parts:

- Evaluation harness (harness/): reference code to compute embeddings, train the protocol-defined logistic-regression probe, and emit a submission file.
- Benchmark definition (benchmark/): task list, category map, probe settings, and model registry metadata.
- Leaderboard artifacts (leaderboard/): static site and JSON tables built from reviewed submissions (regenerated by CI; not edited manually).

Baseline and contributed results are stored as submissions/<model_id>.json. To add a new model, contributors provide a small extractor module that loads their encoder and produces sequence embeddings; the repository stores this code and the resulting metrics, but not third-party model weights.
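
As a rough illustration of what such an extractor module might look like, the sketch below defines a hypothetical interface and a toy implementation. The class names `Extractor` and `KmerCountExtractor`, the `embed` method, and its signature are assumptions made for illustration; the actual interface is defined in harness/extractors/base.py and may differ. The toy model uses normalized k-mer frequencies as a stand-in for a real encoder.

```python
from abc import ABC, abstractmethod
from itertools import product
from typing import List, Sequence


class Extractor(ABC):
    """Hypothetical extractor interface (the real one is in harness/extractors/base.py)."""

    @abstractmethod
    def embed(self, sequences: Sequence[str]) -> List[List[float]]:
        """Return one fixed-length embedding vector per input DNA sequence."""


class KmerCountExtractor(Extractor):
    """Toy stand-in for a real encoder: normalized k-mer frequencies."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        # Fixed vocabulary order so every sequence maps to the same dimensions.
        self.vocab = ["".join(p) for p in product("ACGT", repeat=k)]
        self.index = {kmer: i for i, kmer in enumerate(self.vocab)}

    def embed(self, sequences: Sequence[str]) -> List[List[float]]:
        embeddings = []
        for seq in sequences:
            seq = seq.upper()
            counts = [0.0] * len(self.vocab)
            for i in range(len(seq) - self.k + 1):
                kmer = seq[i:i + self.k]
                # K-mers containing ambiguous bases (e.g. N) are skipped.
                if kmer in self.index:
                    counts[self.index[kmer]] += 1.0
            total = sum(counts)
            embeddings.append([c / total for c in counts] if total else counts)
        return embeddings


if __name__ == "__main__":
    vectors = KmerCountExtractor(k=2).embed(["ACGTACGT", "NNNN"])
    print(len(vectors), len(vectors[0]))
```

A real extractor would load the contributor's encoder in `__init__` and run it in `embed`; what matters for the probe is that each sequence maps to a vector of the same length.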

Repository layout

benchmark/
  benchmark_spec.json       # task list, categories, metrics, probe protocol, dataset pin
  model_meta.json           # display names, parameter counts, links, provenance labels
submissions/
  <model_id>.json           # per-model results (reviewed before merge)
leaderboard/
  index.html                # Hugging Face Space UI
  leaderboard.json          # macro scores by functional category (generated)
  leaderboard_tasks.json    # per-task scores (generated)
  README.md                 # Space metadata
tools/
  validate_submission.py    # schema and completeness checks (CI)
  build_leaderboard.py      # aggregate submissions into leaderboard JSON
  pack_submission.py        # convert legacy per-task JSON logs to submission format
  sync_geneb_dataset.py     # download or publish pinned task CSVs on Hugging Face
model_cards/
  <model_id>.md             # optional training-data and disclosure notes
harness/
  run_GENEB.py              # end-to-end local evaluation → submission JSON
  extractors/
    base.py                 # extractor interface
    <module>.py             # model-specific embedding module (per submission)
.github/workflows/
  validate.yml              # PR checks on submissions and spec
  build-and-sync.yml        # rebuild leaderboard and push to the Space (on merge to main)

From model to leaderboard row

To add a model, a contributor:

- implements an embedding extractor under harness/extractors/;
- runs harness/run_GENEB.py locally on the pinned GENEB task data;
- opens a pull request with:
  - submissions/<model_id>.json,
  - the extractor module,
  - an optional model card.

Maintainers and CI then:

- validate the submission schema and completeness;
- review the extractor and metadata for plausibility;
- merge the pull request;
- rebuild leaderboard/*.json;
- sync the updated leaderboard to the Hugging Face Space.
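
The completeness and range checks in the validation step could look roughly like the sketch below. It is not the code in tools/validate_submission.py: the `results` field, the regime keys (`full`, `10-shot`, `1-shot`), and the metric keys (`mcc`, `accuracy`, `macro_f1`) are hypothetical names chosen for illustration, and the real submission schema may be organized differently. Only the value ranges follow directly from the metric definitions.

```python
from typing import Dict, List

# Valid ranges follow from the metric definitions: MCC lies in [-1, 1],
# accuracy and macro-F1 lie in [0, 1].
METRIC_RANGES = {"mcc": (-1.0, 1.0), "accuracy": (0.0, 1.0), "macro_f1": (0.0, 1.0)}
REGIMES = ("full", "10-shot", "1-shot")  # hypothetical key names


def check_submission(submission: Dict, task_ids: List[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    results = submission.get("results", {})  # hypothetical field name
    for task in task_ids:
        if task not in results:
            problems.append(f"missing task: {task}")
            continue
        for regime in REGIMES:
            scores = results[task].get(regime)
            if scores is None:
                problems.append(f"{task}: missing regime {regime}")
                continue
            for metric, (low, high) in METRIC_RANGES.items():
                value = scores.get(metric)
                if not isinstance(value, (int, float)) or isinstance(value, bool):
                    problems.append(f"{task}/{regime}: missing or non-numeric {metric}")
                elif not low <= value <= high:
                    # NaN also lands here, because every comparison with NaN is False.
                    problems.append(f"{task}/{regime}: {metric}={value} outside [{low}, {high}]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a CI job report every issue in a pull request at once.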

The benchmark definition lives in benchmark/benchmark_spec.json: it specifies the task list, task categories, metrics, probe settings, random seeds, and the dataset revision used for evaluation. Files under leaderboard/ are generated from reviewed submissions and should not be edited manually.
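
One plausible way to turn per-task scores into per-category macro scores is sketched below. It assumes that the macro score is the unweighted mean of task scores within each category; the README does not say how tools/build_leaderboard.py aggregates, so treat this as an illustration only. The function and argument names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def macro_by_category(
    task_scores: Dict[str, float], task_category: Dict[str, str]
) -> Dict[str, float]:
    """Average per-task scores within each category (unweighted mean, an assumption)."""
    grouped = defaultdict(list)
    for task, score in task_scores.items():
        grouped[task_category[task]].append(score)
    # Sort categories so the output order is stable across runs.
    return {category: mean(scores) for category, scores in sorted(grouped.items())}


if __name__ == "__main__":
    scores = {"task_a": 0.5, "task_b": 1.0, "task_c": 0.25}
    categories = {"task_a": "cat_x", "task_b": "cat_x", "task_c": "cat_y"}
    print(macro_by_category(scores, categories))
```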

Evaluation and reproducibility

GENEB does not provide a central re-scoring service. Contributors run evaluation locally using the published harness and the dataset revision specified in benchmark_spec.json. Maintainers review pull requests for schema compliance, completeness, and plausibility, but do not re-run every model on maintainer infrastructure.

| Mechanism | Role |
| --- | --- |
| Dataset revision in `benchmark_spec.json` | Fixes the exact benchmark data used for evaluation |
| Fixed probe protocol | Keeps downstream evaluation comparable across models |
| Submission schema + CI validation | Checks completeness, metric ranges, and file consistency |
| Extractor code in the PR | Shows how embeddings were produced and helps others reproduce the run |
| Model card | Documents architecture, training data, and possible benchmark overlap |
| Provenance metadata | Marks externally submitted runs as self-reported |

Community submissions are marked as self-reported in metadata. A run is considered reproducible when the submitted extractor, the GENEB harness version, and the pinned dataset revision are sufficient for another user to repeat the evaluation.
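
Repeatability of the few-shot regimes depends on drawing the same training subset every time. The sketch below shows one way to do this with a fixed seed. It assumes that "k-shot" means k labeled examples per class, which the README does not state, and the function name `k_shot_indices` is hypothetical; the harness may sample differently.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def k_shot_indices(labels: Sequence[Hashable], k: int, seed: int) -> List[int]:
    """Pick up to k example indices per class, deterministically for a given seed."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    chosen = []
    # Visit classes in a stable order so the result depends only on the seed.
    for label in sorted(by_class, key=repr):
        pool = by_class[label]
        chosen.extend(rng.sample(pool, min(k, len(pool))))
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0, 1, 2]
    print(k_shot_indices(labels, k=2, seed=0))
```

Calling the function twice with the same labels, `k`, and seed returns the same indices, so another user repeating the evaluation would train the probe on the same examples.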

Citation

@misc{ledneva2026genebgenomicmodelshard,
  title         = {GENEB: Why Genomic Models Are Hard to Compare},
  author        = {Daria Ledneva and Mikhail Nuridinov and Denis Kuznetsov},
  year          = {2026},
  eprint        = {2606.04525},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2606.04525}
}

Contact

Repository and leaderboard contact: Daria Ledneva.

Questions, suggestions, feedback, and model submissions are welcome.
