feat: challenge registry and Docker-based execution #22

Merged: Treelovah merged 4 commits into main from feat/challenge-registry-docker on Feb 20, 2026

Conversation

@pi3-code
Contributor

Summary

  • Add a remote challenge registry so challenges are fetched from a central index instead of requiring local files (--local flag preserves the old behavior)
  • Add Docker module for full container lifecycle management — pull, run, stop, and cleanup
  • Integrate Docker-based challenge orchestration into the interactive run flow and CLI commands
  • Extend config with registry URL and Docker settings
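The registry-by-default behavior with a `--local` escape hatch can be pictured as a small source-selection step. This is only a sketch under assumptions: `RegistryConfig`, `ChallengeFlags`, `ChallengeSource`, and `resolveChallengeSource` are hypothetical names, and the field names are illustrative rather than the config shape this PR actually adds.

```typescript
// Hypothetical config shape; the real field names in this PR may differ.
interface RegistryConfig {
  registryUrl: string;        // central challenge index (assumed to be a URL)
  localChallengesDir: string; // where pre-registry local challenge files live
}

// Hypothetical view of the parsed CLI flags.
interface ChallengeFlags {
  local: boolean; // true when --local was passed
}

type ChallengeSource =
  | { kind: "registry"; url: string }
  | { kind: "local"; dir: string };

// Registry is the default; --local preserves the old file-based behavior.
function resolveChallengeSource(
  config: RegistryConfig,
  flags: ChallengeFlags,
): ChallengeSource {
  if (flags.local) {
    return { kind: "local", dir: config.localChallengesDir };
  }
  return { kind: "registry", url: config.registryUrl };
}
```

Keeping the decision in one pure function would let the `challenges` command, the `run` command, and the interactive flow share the same rule.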

Test plan

  • Run oasis challenges and verify registry challenges are listed
  • Run oasis challenges --local and verify local challenges still work
  • Run a challenge that uses Docker and verify container spins up and tears down correctly
  • Run oasis interactive mode and verify registry-based challenge selection works
  • Verify --category and --difficulty filters work with registry challenges
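For the filter check in the test plan, one plausible reading is that `--category` and `--difficulty` are optional, combined with AND, and applied the same way to registry and local entries. The sketch below assumes that reading; `ChallengeSummary`, `ChallengeFilter`, and `filterChallenges` are hypothetical names, and case-insensitive matching is an assumption rather than confirmed behavior.

```typescript
// Hypothetical listing entry; the registry index format is not shown in this PR.
interface ChallengeSummary {
  id: string;
  category: string;
  difficulty: string;
}

// Both filters are optional; an omitted filter matches everything.
interface ChallengeFilter {
  category?: string;
  difficulty?: string;
}

function filterChallenges(
  challenges: readonly ChallengeSummary[],
  filter: ChallengeFilter,
): ChallengeSummary[] {
  // Assumption: matching ignores case, so "Web" and "web" are equivalent.
  const matches = (actual: string, wanted?: string): boolean =>
    wanted === undefined || actual.toLowerCase() === wanted.toLowerCase();

  return challenges.filter(
    (c) =>
      matches(c.category, filter.category) &&
      matches(c.difficulty, filter.difficulty),
  );
}
```

With made-up entries in categories `web` and `crypto`, passing `{ category: "web", difficulty: "easy" }` would keep only entries that satisfy both conditions.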

pi3-code and others added 4 commits February 19, 2026 00:34

feat: multi-provider analyzer with quota-exceeded fail-fast

Add OpenAI-compatible provider support to the analyzer so users can
analyze benchmarks with the same provider they ran them on (OpenAI,
xAI, Google, Ollama, custom) instead of requiring a separate API key.

Add QuotaExceededError detection in the retry loop — when a 429
carries an `insufficient_quota` code or matching message, fail
immediately instead of wasting ~14s on retries that will never succeed.

- Add callOpenAIAnalyzer() alongside existing callAnthropicAnalyzer()
- Add --analyzer-provider / --analyzer-url flags to run and analyze
- Add -p / --api-url flags to the analyze command
- Add QuotaExceededError class and isQuotaExceededError() detection
- Fail fast on confirmed quota 429s, still retry transient rate limits
- User-friendly error messages with retry guidance when quota is hit
- Fix provider base URLs to include /v1 paths for OpenAI SDK compat
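
The fail-fast rule described in this commit (stop on a confirmed quota 429, keep retrying transient rate limits) might look roughly like the following. `QuotaExceededError` and `isQuotaExceededError` are named in the commit message, but the bodies here are guesses: `ApiErrorLike`, `callWithRetry`, the injected `sleep`, the message pattern, and the backoff numbers are all assumptions for illustration.

```typescript
class QuotaExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "QuotaExceededError";
  }
}

// Assumed minimal shape of an SDK error; real provider errors carry more fields.
interface ApiErrorLike {
  status?: number;
  code?: string;
  message?: string;
}

// A 429 counts as a quota failure only when the code or message says so.
function isQuotaExceededError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ApiErrorLike;
  if (e.status !== 429) return false;
  if (e.code === "insufficient_quota") return true;
  // Assumed message pattern; adjust to what the provider actually returns.
  return /insufficient_quota|exceeded your current quota/i.test(e.message ?? "");
}

function isRateLimited(err: unknown): boolean {
  return (
    typeof err === "object" && err !== null &&
    (err as ApiErrorLike).status === 429
  );
}

type Sleep = (ms: number) => Promise<void>;

// Retries transient 429s with exponential backoff; quota 429s fail immediately.
async function callWithRetry<T>(
  call: () => Promise<T>,
  sleep: Sleep,
  maxAttempts = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (isQuotaExceededError(err)) {
        throw new QuotaExceededError(
          "API quota exhausted; retrying will not help until the quota is restored.",
        );
      }
      // Non-429 errors and the final attempt are surfaced unchanged.
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Injecting `sleep` keeps the loop testable without real delays; in the CLI it could be a thin wrapper around `setTimeout`.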

feat: interactive mode when running oasis with no arguments

Launch a menu-driven experience when `oasis` is invoked with no arguments
in a TTY, making the CLI approachable without knowing any flags.

- Main menu: Run Benchmark, View Results, Configure API Keys, Advanced Mode, Exit
- Run Benchmark wizard: challenge → provider → model → API key → preflight → run → analysis
- Results browser: browse past runs, view details, analysis summaries, reports
- Config manager: view/add/remove API keys, set default provider/model, configure URLs
- Graceful Ctrl+C handling throughout all prompts
- Non-TTY environments (CI/pipes) fall back to standard help text
- All existing `oasis <command>` behavior unchanged (fully backwards compatible)

New dependency: @inquirer/prompts (ESM-native, TypeScript-first prompt library)
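
The dispatch rule in this commit (menu only with no arguments in a TTY, help text in CI or pipes, existing commands untouched) reduces to a small decision function. This is a sketch, not the actual entry point: `EntryMode` and `chooseEntryMode` are hypothetical names.

```typescript
type EntryMode = "interactive" | "help" | "command";

// args: CLI arguments after the program name (e.g. process.argv.slice(2) in Node).
// isTTY: whether the terminal is interactive (e.g. process.stdout.isTTY in Node).
function chooseEntryMode(args: readonly string[], isTTY: boolean): EntryMode {
  // Any argument means existing `oasis <command>` behavior stays unchanged.
  if (args.length > 0) return "command";
  // No arguments: show the menu only when prompts can actually be answered.
  return isTTY ? "interactive" : "help";
}
```

Passing `isTTY` in as a plain boolean keeps the rule easy to test and leaves the terminal check to the caller.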

feat: add challenge registry and Docker-based challenge execution

- Add registry module to fetch challenges from remote index
- Add Docker module for container lifecycle management (pull, run, stop, cleanup)
- Update challenges command to list from registry by default (--local for local)
- Update run command to support registry-based challenge selection
- Integrate Docker container orchestration into interactive run flow
- Extend config with registry URL and Docker settings
- Add container and registry types
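
The container lifecycle listed above (pull, run, stop, cleanup) could be wrapped so that teardown always happens, even when the benchmark run fails. The sketch below is an assumption about structure, not the Docker module from this PR: `ContainerSpec`, `RunCommand`, `dockerRunArgs`, and `withContainer` are hypothetical names. The only Docker surface used is the standard CLI (`docker pull`, `docker run -d --name -p`, `docker stop`, `docker rm -f`).

```typescript
// Hypothetical description of a challenge container.
interface ContainerSpec {
  image: string;
  name: string;
  ports?: Array<{ host: number; container: number }>;
}

// Injected command executor, so the lifecycle can be exercised without Docker.
type RunCommand = (cmd: string, args: string[]) => Promise<void>;

// Builds the argument list for a detached `docker run`.
function dockerRunArgs(spec: ContainerSpec): string[] {
  const args = ["run", "-d", "--name", spec.name];
  for (const p of spec.ports ?? []) {
    args.push("-p", `${p.host}:${p.container}`);
  }
  args.push(spec.image);
  return args;
}

// Pull and start the container, run the task, then always stop and remove it.
async function withContainer<T>(
  spec: ContainerSpec,
  exec: RunCommand,
  task: () => Promise<T>,
): Promise<T> {
  await exec("docker", ["pull", spec.image]);
  await exec("docker", dockerRunArgs(spec));
  try {
    return await task();
  } finally {
    // Teardown runs whether the task resolved or rejected.
    await exec("docker", ["stop", spec.name]);
    await exec("docker", ["rm", "-f", spec.name]);
  }
}
```

In a Node CLI, `exec` could be a promise wrapper around `child_process.execFile`. Note that this sketch does not clean up if `docker run` itself fails partway; a fuller version would probably need to handle that case.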

Resolved conflicts by keeping PR #22 features (challenge registry and Docker execution).
Main branch had reverted to local-only mode, but we preserve the new registry+Docker functionality.

@Treelovah Treelovah merged commit 8fe8a95 into main Feb 20, 2026
3 checks passed
@Treelovah Treelovah deleted the feat/challenge-registry-docker branch February 20, 2026 22:18
Treelovah added a commit that referenced this pull request Feb 23, 2026
* feat: multi-provider analyzer with quota-exceeded fail-fast
* feat: interactive mode when running oasis with no arguments
* feat: add challenge registry and Docker-based challenge execution

---------

Co-authored-by: Marshall Livingston <mltechi@pm.me>

2 participants