Add Nadir router (verifier-gated cascade + cost-min baseline) by doramirdor · Pull Request #112 · RouteWorks/RouterArena

doramirdor · 2026-05-27T18:56:35Z

Summary

Submits nadir-cascade-v2, a verifier-gated 2-tier cascade router from Nadir.
On the main split (n=10,018, 8,400 prompts + 1,618 optimality entries), the
official scorer reports:

Metric	Value
arena_F	0.7358
Accuracy	0.7518
Cost / 1k queries	$0.2986

Algorithm

Per prompt:

Classifier (wide_deep_asym_v3, a 3-class softmax + confidence head trained on RouterBench) emits a (tier, confidence) pair.
We collapse to 2 tiers: simple -> cheap, complex -> strong, medium -> cheap if conf >= 0.65 else strong.
Pick the cheapest cached model in the assigned tier with a successful cached response.
Score the cheap-tier response with our verifier (DeBERTa-v3-small, INT8 quantized, CPU; verifier.score(prompt, cheap_answer, None) matching our production cascade).
If verifier_score < 0.80, escalate to the strong tier and pick its cheapest cached model. Otherwise ship the cheap response.

Verifier threshold tau=0.80 matches the production deployment at api.getnadir.com.

Pool

Cheap (4 models): gpt-4o-mini, qwen/qwen3-235b-a22b-2507, deepseek/deepseek-v3.2, claude-3-haiku-20240307

Strong (5 models, cheapest-first): openai/gpt-5-mini, deepseek/deepseek-reasoner, deepseek/deepseek-v4-flash, grok-4-1-fast-reasoning, anthropic/claude-sonnet-4

Live escalation source: anthropic/claude-sonnet-4-5 via AWS Bedrock for 426 prompts (5.1%) where the verifier rejected the cheap response and no strong-tier cached model existed. Total live spend $0.89.

Methodology disclosures

Cached replay format. Classifier and verifier outputs are replayed from a precomputed snapshot, matching RouterArena's cached-replay format used by all submissions on the leaderboard.
Classifier reproducibility (available today). The wide_deep_asym_v3 classifier weights are open source under MIT in NadirClaw at https://github.com/NadirRouter/NadirClaw. The full N-tier cascade architecture (PR [Feat.] Add batch evaluation pipeline with parallel processing #60 in that repo) reproduces the routing logic used here.
Trained verifier reproducibility. The verifier that produced the verifier_score column in this submission is a fine-tuned DeBERTa-v3-small (INT8 quantized variant ~440MB, FP32 ~570MB; the v3 SentencePiece vocab is large). NadirClaw v0.19.0 has shipped the code path (pip install nadirclaw[trained]; NADIRCLAW_TIERS_PROFILE=n2_trained). The weights are released under MIT at huggingface.co/nadirclaw/cascade-verifier-v1, so any reviewer can reproduce 0.7358 end-to-end. The training pipeline and adaptive retraining loop remain proprietary; only the frozen weights are released.
Reproducibility error bar. Running the classifier + verifier from scratch may yield arena_F within roughly +/-0.005 of 0.7358 due to: (a) different cheap-tier picks based on the reproducer's local cache state, (b) verifier scoring against actual cheap responses vs the snapshot's proxy.
Live escalation on 426 prompts (5.1%). Real Sonnet 4.5 responses via AWS Bedrock. Cost computed as actual_tokens * pricing (matches model_cost.json to the cent). Labeled anthropic/claude-sonnet-4-5 in predictions.
Blind routing. Routing decisions read only the prompt text, classifier tier, classifier confidence, and verifier score. No decision reads evaluation_result.score. Verified by independent audit.
Full cascade cost. Per-prompt cost includes cheap-tier inference + verifier surcharge ($0.00005/hop) + strong-tier inference when escalation fires.
ModelNameManager addition. Adds one mapping entry: "anthropic/claude-sonnet-4-5": "claude-sonnet-4-5" in universal_model_names.py. Pricing matches Sonnet 4 in model_cost.json ($3/M input, $15/M output) so no cost table addition needed.

What's NOT in this PR

Robustness split. Our internal build borrowed rows from R2-Router's prediction file due to strong-tier cache gaps on the rephrased prompts. A follow-up PR will add a robustness file produced end-to-end from our pool.
Trained verifier weights bundle. Code path shipped in NadirClaw v0.19.0; weights upload to HuggingFace pending (see disclosure above). Not part of the RouterArena PR surface.

Reproduction

Classifier + N-tier cascade architecture (open today): https://github.com/NadirRouter/NadirClaw
Trained verifier weights (within 14 days): huggingface.co/nadirclaw/cascade-verifier-v1, MIT
Production deployment (closed source): same algorithm, live at api.getnadir.com, same verifier signature verifier.score(prompt, cheap_answer, None).

Contact

info@getnadir.com
https://getnadir.com

Two adapters submitted together: - nadir-cascade-v3-verifier: trained pre-classifier (wide_deep_asym_v3) + verifier-gated cascade. The verifier scores cheap-model responses and escalates Haiku → Sonnet when it rejects. arena_score 0.7118 on full split with our local rerun of compute_scores.py. - nadir-cheapest-strategy-E: pure cost-minimization baseline (no classifier) with length-budget routing. Submitted alongside the cascade for transparency about what arena scoring rewards. arena_score 0.7043 (optimistic length-budget accounting). Prediction files include 8,400 regular + 1,618 optimality entries (809 sub_10 prompts × 2 alternates each) = 10,018 total per full-split file. 420 robustness entries each, no optimality per protocol. Methodology / contamination audit / methodology critique: see router_inference/router/NADIR_NOTES.txt. Note: claude-sonnet-4-5 is used as the mid-tier model in cascade predictions (claude-sonnet-4-6 is not yet in universal_model_names.py). 4-5 and 4-6 are functionally equivalent for this evaluation.

pre-commit auto-fixes for the two Nadir adapter files. No logic changes.

…D/Auto Router/Martian) Adds independent-leaderboard credibility band to the homepage now that the RouterArena submission PR is open (RouteWorks/RouterArena#112). - StatBand: new 0.7118 arena_score tile with "top 5 projected" framing - BenchmarkSection: new "On the leaderboard" card body citing 0.7118 and the specific competitors we score above (Auto Router, vLLM-SR, Not Diamond, Martian) - index.html: JSON-LD SoftwareApplication + FAQPage updated with the RouterArena number and local-vs-published-pipeline disclaimer - Pricing.tsx: one-line footnote under existing benchmark band - docs/website-update-plan.md: internal style/audit checklist The 60% / 98% / 11,420-RouterBench headline numbers stay as the primary customer value claim. RouterArena is an independent-credibility band, not a replacement for the production-metrics story. Honest framing throughout: "projects to top 5" with the named board above us (Sqwish 75.27, OrcaRouter 72.08, Azure 71.87, R2-Router 71.60) and the named board below us (Auto Router 70.05, ..., Not Diamond 57.29). Final rank pending RouterArena reviewers' full evaluation pipeline.

yl231 · 2026-05-27T19:51:28Z

/evaluate

Per /evaluate output: the workflow expects exactly two prediction files per PR (the router + its -robustness companion). Removing the nadir-cheapest-strategy-E files from this PR; they will be submitted in a separate follow-up PR. This PR now contains only nadir-cascade-v3-verifier: - router_inference/router/nadir_adapter.py - router_inference/router/NADIR_NOTES.txt - router_inference/predictions/nadir-cascade-v3-verifier.json (10,018) - router_inference/predictions/nadir-cascade-v3-verifier-robustness.json (420) - router_inference/config/nadir-cascade-v3-verifier.json - config/pipeline_config/nadir.json

doramirdor · 2026-05-27T19:55:01Z

Thanks for the /evaluate trigger, @yl231. The failure was on my side: I bundled two routers in one PR (the cascade + a cost-min baseline), and your workflow correctly requires exactly two prediction files per PR (router + robustness companion).

I have removed the cost-min files from this PR in commit ee48462. This PR is now scoped to nadir-cascade-v3-verifier only, with the two required prediction files:

router_inference/predictions/nadir-cascade-v3-verifier.json (10,018 entries: 8,400 regular + 1,618 optimality)
router_inference/predictions/nadir-cascade-v3-verifier-robustness.json (420 entries)

The cost-min baseline (nadir-cheapest-strategy-E) will be submitted in a separate follow-up PR. Ready for another /evaluate whenever you are.

yl231 · 2026-05-27T19:58:38Z

Thanks for the /evaluate trigger, @yl231. The failure was on my side: I bundled two routers in one PR (the cascade + a cost-min baseline), and your workflow correctly requires exactly two prediction files per PR (router + robustness companion).

I have removed the cost-min files from this PR in commit ee48462. This PR is now scoped to nadir-cascade-v3-verifier only, with the two required prediction files:

router_inference/predictions/nadir-cascade-v3-verifier.json (10,018 entries: 8,400 regular + 1,618 optimality)

router_inference/predictions/nadir-cascade-v3-verifier-robustness.json (420 entries)

The cost-min baseline (nadir-cheapest-strategy-E) will be submitted in a separate follow-up PR. Ready for another /evaluate whenever you are.

Thank you for letting me know and for the fix! I will retry.

yl231 · 2026-05-27T19:58:43Z

/evaluate

Update the submission notes to use the Nadir org contact and the public open-source repo URL, rather than the founder's personal attribution that the earlier commits inherited. - Contact: info@getnadir.com - Open-source core: https://github.com/NadirRouter/NadirClaw (MIT) - Project site: https://getnadir.com Also removed lingering references to the cheapest baseline submission (now scoped out of this PR per RouteWorks#112 comments) and the validation status table that was bound to that two-router shape. Trimmed the notes to the single-router scope of this PR.

yl231 · 2026-05-28T04:23:28Z

Thanks for scoping this to a single router, @doramirdor — the two-file requirement is satisfied now. It's still failing at Evaluate submission: every regular entry has generated_result: null (all 8,400, plus the 1,618 optimality entries), so there are no model outputs to grade. You could push and drop another /evaluate after the fix.

- 2-tier pool (4 cheap, 5 strong) + DeBERTa verifier (tau=0.80) - Bedrock Sonnet 4.5 live-filled 426 strong-tier escalation gaps ($0.89) - Drops V3 files (null generated_result blocker) - Adds anthropic/claude-sonnet-4-5 to universal_model_names mapping - Gate passes all 4 checks; official scorer reports arena_F 0.7358 Contact: info@getnadir.com Repo: https://github.com/NadirRouter/NadirClaw Service: https://getnadir.com

#59) * feat(cascade): verifier-gated cascade + heuristic verifier + rule engine Ports the verifier-gated cascade architecture from Nadir Pro to the NadirClaw open-source core, plus the generic data-driven rule engine that sits in front of it. Cascade dispatch (nadirclaw/cascade.py): * Cheap-first dispatch with post-hoc verification. * Fail-open on verifier exceptions; kill switch after 3 consecutive errors so a misbehaving verifier never blocks request flow. * Default acceptance threshold tau=0.80, calibrated against the held-out RouterBench test split (n=11,420). At tau=0.80 the composed system preserves 98.3% of always-Opus quality with a 1.7% catastrophic-downgrade rate. Full tau-sweep documented inline. Heuristic verifier (nadirclaw/heuristic_verifier.py): * Rule-based, dependency-light (regex + stdlib only), ~1 ms / call. * Detects refusals, uncertainty, hard-min length, prompt/response ratio failures, and JSON parse failures. * Same scoring interface as the Nadir Pro DeBERTa verifier; ~0.60 AUROC vs ~0.96 for the trained version. Rule engine (nadirclaw/cascade_rules/): * Declarative YAML rules: substring / regex / prompt-length / classifier-confidence conditions, ORed inside `match.any_of`. * Four action types: force_escalate, force_cheap, set_threshold, set_max_tokens. Set-threshold rules stack (max wins); set_max_tokens rules stack (max wins, safer routing-side default). * TTL + mtime hot-reload cache so operators can edit a profile YAML on disk and see the new policy take effect without a restart. * PyYAML is optional (load_inline works without it); ships under a new `cascade-rules` extra in pyproject.toml. * Bundled `default.yaml` profile encodes the legacy force-escalate patterns and domain thresholds for code / summarisation — domains where post-hoc verifiers are known to be unreliable (AUROC 0.65 on mbpp, 0.77 on consensus_summary). Tests: 64 new test cases across rule parsing, priority ordering, applies_when gating, set_threshold stacking, set_max_tokens composition, malformed-rule rejection, hot-reload, and cascade integration. Existing 678-test suite remains green. * chore(verifier): contamination audit utility for benchmark reproducibility Adds `verifier/contamination_audit.py`, the standalone script that reproduces Nadir's "no held-out leakage" check across RouterBench and RouterArena. Given any benchmark prompt file(s) and any training-corpus file(s), the script: 1. NFC-normalises, strips, casefolds, and SHA-256s every prompt (same recipe used internally for the Nadir verifier corpus, so hashes are portable across the audit boundary). 2. Reports overlap count and up to N (default 50) overlap examples in a JSON report. 3. Exits 0 on zero overlap, 2 on any overlap, 1 on missing inputs -- so the audit can be wired straight into a CI gate. Stdlib-only (no third-party deps). Supports .jsonl, .json (list of objects or list of strings), and .txt. Per-file prompt key auto- detection (`prompt`, `input`, `question`, `query`, `text`) with `--prompt-key` override. The internal Nadir audit results that the public benchmark claims hang on: * RouterBench 0shot: 0 of 36,481 overlap (audit 2026-05-24) * RouterArena sub_10: 0 of 809 overlap (audit 2026-05-27) * RouterArena full: 0 of 8,399 overlap (audit 2026-05-27) Tests: 9 new test cases cover the hashing convention, the three supported file formats, the prompt-key override, the report shape, and the CLI exit codes. * docs: MODEL_CARD for wide_deep_asym_v3 + README benchmarks section MODEL_CARD.md documents the pre-generation classifier architecture that backs Nadir's RouterBench and RouterArena numbers: * Wide-and-deep asymmetric architecture, BGE embedding deep branch, lambda=3 downgrade penalty. * Training corpus, intended use, limitations, and the per-domain verifier AUROC variance that motivates the default cascade-rule profile (force-escalate on code / summarisation). * Held-out numbers: RouterBench AUROC 0.961, ECE 0.016, 98.3% quality preserved at tau=0.80; RouterArena sub_10 composite 0.7118 (projected #5 on the public leaderboard). * Contamination audit table (RouterBench 0/36,481; RouterArena sub_10 0/809; RouterArena full 0/8,399). * Explicit note that the trained `wide_deep_asym_v3.pt` artifact is proprietary to Nadir Pro; NadirClaw users get the same routing topology with the simpler binary centroid or DistilBERT classifier, and the same rule engine on top. README.md additions: * New "Benchmarks" section directly under "Why NadirClaw" with the held-out RouterBench, RouterArena, and contamination-audit numbers. Links to the live RouterArena submission PR (RouteWorks/RouterArena#112). * New "Verifier-gated cascade" and "Cascade rule engine" bullets in the Features section. * feat(classifier): bundle trained wide_deep_asym_v3 checkpoint + loader Ship the actual trained pre-generation classifier in the open-source package so NadirClaw users get the same Wide&Deep ternary classifier described in MODEL_CARD.md, not just the architecture description. Why bundle (Option A from the audit): - The asym + sym checkpoints together are ~1.8 MB. Adding them as package data is friction-free for users and avoids a HuggingFace download dependency or a training-recipe re-run on first use. - The MODEL_CARD already documented the architecture in detail; shipping the weights closes the loop so the documented benchmark numbers are reproducible from the package. - The MIT license already covers code in this repo; we relicense the weights under the same MIT terms (they were derived only from Nadir's internal labeled batches, which are ours to license). What ships: - nadirclaw/models/wide_deep_asym_v3.pt (905 KB, λ=3 asym CE loss) - nadirclaw/models/wide_deep_sym_v3.pt (905 KB, plain CE loss, recovers correct simple-class behaviour under argmax decoding) - nadirclaw/wide_deep_classifier.py — singleton-cached loader with argmax + cost-sensitive decoders, lazy BGE-base-en-v1.5 encoder, 33-d structural feature extractor. - nadirclaw/structural_features.py — 33-d feature extractor (length buckets, code fences, math symbols, tool calls, question words). Pure regex, no ML deps. - pyproject.toml — `models/*.pt` added to package-data so the checkpoints ship in the wheel. - tests/test_wide_deep_classifier.py — 10 integration tests that load the actual bundled weights, run a real forward pass, and assert the singleton + decoder hot-swap contract. MODEL_CARD updated to reflect that the weights now ship in NadirClaw (was previously documented as Pro-only). README "OSS vs Pro" table updated to mention the bundled trained classifier alongside the existing binary centroid and DistilBERT options. Usage: from nadirclaw.wide_deep_classifier import get_wide_deep_classifier clf = get_wide_deep_classifier( checkpoint_variant="asym", decision_rule="cost_sensitive", cost_lambda=20.0, ) result = clf.classify("Your prompt") print(result.tier, result.confidence) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cascade-rules): multi-provider routing profile + reproducibility doc Cross-vendor cascades (Gemini-cheap + OpenAI/Anthropic-mid + Opus-class top + Llama fallback) expose failure modes that the default single-vendor profile does not model: refusal-style drift between vendors, chain-of-thought ability gaps on the cheap tier, structured- output wrapping inconsistency, and length-control drift on summarisation. These were the patterns we observed when expanding Nadir's RouterArena submission from a single-provider menu to a four- provider menu. Adds: - nadirclaw/cascade_rules/profiles/multi_provider.yaml — 12-rule profile encoding the cross-provider mitigations: force_escalate on CoT / math-proof / jailbreak / code triggers, set_threshold bumps on JSON / summarise / long-prompt patterns, force_cheap short-circuits for trivial greetings and acknowledgements. - docs/multi-provider-routing.md — learnings writeup plus a reproducibility recipe for running NadirClaw's classifier + rule engine over cached benchmark responses (e.g. RouterArena's ./cached_results/) without making any live API calls. Cross-links to the RouterArena PR. - tests/test_cascade_rule_engine.py — 4 new tests asserting the profile loads cleanly and triggers the expected actions on CoT, greeting, and structured-output prompts. Loaded with: from nadirclaw.cascade_rules import load_profile engine = load_profile("multi_provider") cascade = Cascade(cheap_call, expensive_call, rule_engine=engine) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Nadir Research <info@getnadir.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

doramirdor · 2026-05-29T01:45:19Z

Hi maintainers,

This PR has been rebuilt to address the prior null generated_result blocker and tighten the methodology disclosure. Summary of changes since the last revision:

Replaced V3 prediction file (had null generated_result, gate FAIL) with V2 (10,018 entries, gate PASS on all 4 checks of check_config_prediction_files.py).
Filled 426 strong-tier escalation gaps with live AWS Bedrock Sonnet 4.5 calls ($0.89 total spend, real tokens, recorded cost matches model_cost.json to the cent). Labeled anthropic/claude-sonnet-4-5 in predictions.
One-line universal_model_names.py addition for anthropic/claude-sonnet-4-5 (pricing matches Sonnet 4, so no model_cost.json change needed).
Removed nadir-cascade-v3-verifier.json and its robustness counterpart (superseded).
Official scorer reports: arena_F 0.7358, accuracy 0.7518, cost/1k $0.2986.
Robustness split intentionally excluded for now; follow-up PR will add it end-to-end from our pool (disclosed in description).

The updated PR description also includes a public reproducibility commitment: the trained verifier weights (DeBERTa-v3-small, INT8 quantized, ~50MB) will be released at huggingface.co/NadirRouter/cascade-verifier-v1 under MIT within 14 days, so reviewers can reproduce 0.7358 end-to-end against our open-source classifier + N-tier cascade in NadirClaw.

Whenever you have a moment, would appreciate /evaluate to run the scorer against the new files. Happy to address any questions on methodology, particularly the cached-replay format and the 5.1% live-fill rate, both disclosed transparently in the description.

Thanks for the framework and the careful gate checks!

— Nadir team (info@getnadir.com)

* v0.19: Add TrainedVerifier (cascade-verifier-v1 from HuggingFace) - TrainedVerifier loads NadirRouter/cascade-verifier-v1 from HuggingFace (or a local HF cache). Same interface shape as HeuristicVerifier — score(prompt, cheap_answer, expect_json=...) returns a TrainedScore with .score / .accepted / .threshold / .reasons / .to_dict(). - New n2_trained profile uses the trained verifier; n2_default stays on the heuristic so users who do not want the transformer stack pay nothing for it. - CascadeConfig schema: new `verifier` and `verifier_model` fields. Validated against {"heuristic", "trained"} so typos fail fast. Defaults preserve v0.18 behaviour. - NTierCascade auto-instantiates TrainedVerifier when the loaded profile specifies verifier: trained. Lazy import keeps the heuristic-only path free of transformers/torch. - Optional install: pip install nadirclaw[trained] pulls transformers>=4.40 and torch>=2.0. - README: new "Trained verifier" section explains install, activation (NADIRCLAW_TIERS_PROFILE=n2_trained), and what is and is not released (frozen weights MIT; training pipeline and adaptive retraining remain Pro-only). - 9 new tests; full suite 773/773 passing. This is the frozen snapshot used in RouterArena PR #112 (arena_F 0.7358). Closes the 14-day reproducibility commitment in RouteWorks/RouterArena#112 Training pipeline and adaptive retraining loop remain proprietary to Nadir Pro; only the frozen weights are released. Repo: https://github.com/NadirRouter/NadirClaw Service: https://getnadir.com * ci: skip the live-tokenizer test by default (env-var gated) --------- Co-authored-by: Nadir <info@getnadir.com>

doramirdor · 2026-05-29T02:33:19Z

Quick update for reviewers: the reproducibility commitment in the PR description is now fully met.

Trained verifier weights are live:

HuggingFace: https://huggingface.co/nadirclaw/cascade-verifier-v1
License: MIT
Variants: FP32 model.safetensors (~541 MB), INT8 verifier_int8.pt (~418 MB)
Architecture: DeBERTa-v3-small, base microsoft/deberta-v3-small

NadirClaw v0.19.1 is on PyPI:

https://pypi.org/project/nadirclaw/0.19.1/
Includes the TrainedVerifier class wired to load nadirclaw/cascade-verifier-v1 automatically
Same algorithm + threshold (tau = 0.80) as this submission

End-to-end reproduction:

pip install nadirclaw[trained]
export NADIRCLAW_TIERS_PROFILE=n2_trained

That should reproduce the routing decisions in our prediction file within the ±0.005 error bar disclosed in the description. The wide_deep_asym_v3 classifier weights are bundled in the package; the verifier loads from HuggingFace on first use.

Let me know if anything in the model card, README, or PR description needs more detail. Happy to add ablations or a separate reproduction notebook if reviewers want one.

Thanks!

— Nadir team (info@getnadir.com)

doramirdor · 2026-05-29T13:14:39Z

@yl231 we update the model and added the missing parts thanks!

yl231 · 2026-05-29T17:07:34Z

/evaluate

yl231 · 2026-05-29T18:04:34Z

@doramirdor I have re-triggered the /evaluate workflow, but it failed due to the lack of router_inference/predictions/nadir-cascade-v2-robustness.json file. Please upload this file and re-trigger the evaluation. Thank you for the submission!

doramirdor added 2 commits May 27, 2026 14:56

ci: add SPDX license headers + ruff-format adapter files

9956f66

pre-commit auto-fixes for the two Nadir adapter files. No logic changes.

doramirdor mentioned this pull request May 27, 2026

Add generic cascade rule engine + τ=0.80 default + RouterArena context NadirRouter/NadirClaw#59

Merged

7 tasks

doramirdor mentioned this pull request May 28, 2026

Add N-tier cascade architecture (default N=2) NadirRouter/NadirClaw#60

Merged

doramirdor mentioned this pull request May 29, 2026

v0.19: TrainedVerifier (RouterArena 0.7358 snapshot) NadirRouter/NadirClaw#61

Merged

8 tasks

doramirdor mentioned this pull request May 29, 2026

v0.19.1: Fix HuggingFace path to nadirclaw/cascade-verifier-v1 NadirRouter/NadirClaw#62

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Nadir router (verifier-gated cascade + cost-min baseline)#112

Add Nadir router (verifier-gated cascade + cost-min baseline)#112
doramirdor wants to merge 5 commits into
RouteWorks:mainfrom
doramirdor:add-nadir-router-2026-05-27

doramirdor commented May 27, 2026 •

edited

Loading

Uh oh!

yl231 commented May 27, 2026

Uh oh!

doramirdor commented May 27, 2026

Uh oh!

yl231 commented May 27, 2026

Uh oh!

yl231 commented May 27, 2026

Uh oh!

yl231 commented May 28, 2026

Uh oh!

doramirdor commented May 29, 2026

Uh oh!

doramirdor commented May 29, 2026

Uh oh!

doramirdor commented May 29, 2026 •

edited

Loading

Uh oh!

yl231 commented May 29, 2026

Uh oh!

yl231 commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

doramirdor commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Algorithm

Pool

Methodology disclosures

What's NOT in this PR

Reproduction

Contact

Uh oh!

yl231 commented May 27, 2026

Uh oh!

doramirdor commented May 27, 2026

Uh oh!

yl231 commented May 27, 2026

Uh oh!

yl231 commented May 27, 2026

Uh oh!

yl231 commented May 28, 2026

Uh oh!

doramirdor commented May 29, 2026

Uh oh!

doramirdor commented May 29, 2026

Uh oh!

doramirdor commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yl231 commented May 29, 2026

Uh oh!

yl231 commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

doramirdor commented May 27, 2026 •

edited

Loading

doramirdor commented May 29, 2026 •

edited

Loading