feat: challenge registry and Docker-based execution #22
Merged
Add OpenAI-compatible provider support to the analyzer so users can analyze benchmarks with the same provider they ran them on (OpenAI, xAI, Google, Ollama, custom) instead of requiring a separate API key.

Add QuotaExceededError detection in the retry loop: when a 429 carries an `insufficient_quota` code or matching message, fail immediately instead of wasting ~14s on retries that will never succeed.

- Add callOpenAIAnalyzer() alongside existing callAnthropicAnalyzer()
- Add --analyzer-provider / --analyzer-url flags to run and analyze
- Add -p / --api-url flags to the analyze command
- Add QuotaExceededError class and isQuotaExceededError() detection
- Fail fast on confirmed quota 429s, still retry transient rate limits
- User-friendly error messages with retry guidance when quota is hit
- Fix provider base URLs to include /v1 paths for OpenAI SDK compat
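The fail-fast behavior described above can be sketched roughly as follows. Only the names `QuotaExceededError` and `isQuotaExceededError()` come from the commit message; the error shape (`ApiErrorLike`), the `withRetry` helper, its parameters, the backoff schedule, and the message pattern are assumptions made for illustration and may not match the actual implementation.

```typescript
// Hypothetical error shape; real provider SDK errors may differ.
interface ApiErrorLike {
  status?: number;
  code?: string;
  message?: string;
}

class QuotaExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "QuotaExceededError";
  }
}

// True only for a 429 that explicitly signals exhausted quota,
// either through its code or through a matching message (assumed pattern).
function isQuotaExceededError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ApiErrorLike;
  if (e.status !== 429) return false;
  if (e.code === "insufficient_quota") return true;
  return /insufficient_quota|exceeded your current quota/i.test(e.message ?? "");
}

// Hypothetical retry helper: retries plain 429s with exponential backoff,
// but rethrows immediately as QuotaExceededError when quota is exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (isQuotaExceededError(err)) {
        throw new QuotaExceededError(
          "API quota exhausted; check your plan and billing before retrying.",
        );
      }
      const status = (err as ApiErrorLike | null)?.status;
      const transient = status === 429;
      if (!transient || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

The quota check runs before the transient check so that a 429 with an `insufficient_quota` code never enters the backoff path.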
Launch a menu-driven experience when `oasis` is invoked with no arguments in a TTY, making the CLI approachable without knowing any flags.

- Main menu: Run Benchmark, View Results, Configure API Keys, Advanced Mode, Exit
- Run Benchmark wizard: challenge → provider → model → API key → preflight → run → analysis
- Results browser: browse past runs, view details, analysis summaries, reports
- Config manager: view/add/remove API keys, set default provider/model, configure URLs
- Graceful Ctrl+C handling throughout all prompts
- Non-TTY environments (CI/pipes) fall back to standard help text
- All existing `oasis <command>` behavior unchanged (fully backwards compatible)

New dependency: @inquirer/prompts (ESM-native, TypeScript-first prompt library)
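The entry-point decision described above (no arguments plus a TTY opens the menu, non-TTY falls back to help, anything else runs as before) might look roughly like this. `chooseEntryMode`, `EntryMode`, and `MAIN_MENU` are hypothetical names, and which stream the real CLI checks for TTY status is not stated in the commit message.

```typescript
type EntryMode = "interactive" | "help" | "command";

// Hypothetical helper. `args` is the argument list after the program name;
// `isTTY` would typically come from something like process.stdout.isTTY.
function chooseEntryMode(args: readonly string[], isTTY: boolean): EntryMode {
  // Any explicit argument keeps the existing `oasis <command>` behavior.
  if (args.length > 0) return "command";
  // No arguments: menu in a terminal, plain help text in CI or pipes.
  return isTTY ? "interactive" : "help";
}

// Main menu entries as listed in the commit message.
const MAIN_MENU = [
  "Run Benchmark",
  "View Results",
  "Configure API Keys",
  "Advanced Mode",
  "Exit",
] as const;
```

Keeping the decision in a pure function makes the backwards-compatibility claim easy to check without spawning a terminal.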
- Add registry module to fetch challenges from remote index
- Add Docker module for container lifecycle management (pull, run, stop, cleanup)
- Update challenges command to list from registry by default (--local for local)
- Update run command to support registry-based challenge selection
- Integrate Docker container orchestration into interactive run flow
- Extend config with registry URL and Docker settings
- Add container and registry types
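As a rough illustration of the container lifecycle listed above (pull, run, stop, cleanup), the sketch below wraps a benchmark run so that cleanup happens even when the run throws. `DockerRunner` and `withChallengeContainer` are hypothetical names, and the runner is injected rather than tied to a specific process API; the Docker module in this PR may be structured quite differently.

```typescript
// Hypothetical runner: executes `docker <args>` and resolves with trimmed stdout.
type DockerRunner = (args: string[]) => Promise<string>;

// Pulls the image, starts a detached container, hands its ID to `use`,
// then stops and removes the container regardless of the outcome.
async function withChallengeContainer<T>(
  docker: DockerRunner,
  image: string,
  use: (containerId: string) => Promise<T>,
): Promise<T> {
  await docker(["pull", image]);
  // `docker run -d` prints the new container ID on stdout.
  const containerId = await docker(["run", "-d", image]);
  try {
    return await use(containerId);
  } finally {
    try {
      await docker(["stop", containerId]);
    } finally {
      // Force-remove so a failed stop does not leave the container behind.
      await docker(["rm", "-f", containerId]);
    }
  }
}
```

In a real CLI the runner could be a thin wrapper around a child-process call; injecting it keeps the lifecycle order testable without a Docker daemon.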
Resolved conflicts by keeping PR #22 features (challenge registry and Docker execution). Main branch had reverted to local-only mode, but we preserve the new registry+Docker functionality.
Treelovah added a commit that referenced this pull request on Feb 23, 2026:
* feat: multi-provider analyzer with quota-exceeded fail-fast
* feat: interactive mode when running oasis with no arguments
* feat: add challenge registry and Docker-based challenge execution
Co-authored-by: Marshall Livingston <mltechi@pm.me>
Summary
- … (`--local` flag preserves the old behavior)

Test plan
- Run `oasis challenges` and verify registry challenges are listed
- Run `oasis challenges --local` and verify local challenges still work
- Run `oasis` interactive mode and verify registry-based challenge selection works
- Verify `--category` and `--difficulty` filters work with registry challenges
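The filter check in the test plan could be exercised against an in-memory registry index along these lines. The `RegistryChallenge` and `ChallengeFilter` shapes, the `filterChallenges` helper, and the case-insensitive matching are all assumptions for illustration; the actual registry types added in this PR are not shown here.

```typescript
// Hypothetical registry entry; the real type may carry more fields.
interface RegistryChallenge {
  id: string;
  category: string;
  difficulty: string;
}

// Mirrors the --category and --difficulty flags; both are optional.
interface ChallengeFilter {
  category?: string;
  difficulty?: string;
}

// Keeps challenges matching every filter that was supplied.
// Matching is case-insensitive here, which is an assumption.
function filterChallenges(
  challenges: readonly RegistryChallenge[],
  filter: ChallengeFilter,
): RegistryChallenge[] {
  const same = (a: string, b: string) => a.toLowerCase() === b.toLowerCase();
  return challenges.filter(
    (c) =>
      (filter.category === undefined || same(c.category, filter.category)) &&
      (filter.difficulty === undefined || same(c.difficulty, filter.difficulty)),
  );
}

// Made-up sample data, not taken from the real registry.
const sampleIndex: RegistryChallenge[] = [
  { id: "sample-web-1", category: "web", difficulty: "easy" },
  { id: "sample-web-2", category: "web", difficulty: "hard" },
  { id: "sample-crypto-1", category: "crypto", difficulty: "easy" },
];
```

An empty filter returns the full list, so the unfiltered `oasis challenges` listing and the filtered variants can share one code path.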