
[2.0] Add DuckDB E2E query optimization task #136

Merged
joyemang33 merged 3 commits into main from codex/duckdb-join-order-task on Jun 5, 2026

@joyemang33
Contributor

Summary

Adds a new Frontier-CS 2.0 systems task, duckdb_e2e_query_optimization, where agents patch a pinned DuckDB checkout to improve end-to-end TPC-H style analytical query performance while preserving correctness.

The task includes Harbor-specific agent/judge Docker images, an async patch-submission workflow, a deterministic evaluator with hidden scale-factor groups and randomized query order, and patch-policy guardrails for optimizer/execution-focused DuckDB changes.
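
A minimal sketch of how a query order can be randomized and still deterministic, assuming a seeded shuffle. The `query_order` helper, the seed value, and the `q01`..`q22` labels are hypothetical illustrations, not names taken from the evaluator in this PR.

```python
import random


def query_order(query_ids, seed):
    """Return a shuffled copy of query_ids that is reproducible for a given seed."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    order = list(query_ids)    # copy, so the caller's list is not mutated
    rng.shuffle(order)
    return order


# Hypothetical labels for a TPC-H style workload of 22 queries.
queries = [f"q{i:02d}" for i in range(1, 23)]

first = query_order(queries, seed=1234)
second = query_order(queries, seed=1234)

print(first == second)           # same seed, same order
print(sorted(first) == queries)  # every query still runs exactly once
```

Using a private `random.Random` instance rather than the module-level functions keeps the order independent of any other code that draws random numbers in the same process.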

Please read CONTRIBUTING.md before submitting.

Type of Change

  • New research problem
  • New algorithmic problem
  • New Frontier-CS 2.0 problem
  • Bug fix
  • Documentation update
  • Other:

Testing

  • PYTHONPYCACHEPREFIX=/private/tmp/frontier-cs-pycache python3 -m py_compile 2.0/problems/duckdb_e2e_query_optimization/evaluator.py
  • UV_CACHE_DIR=/private/tmp/frontier-cs-uv-cache uv run --no-sync frontier show 2.0 duckdb_e2e_query_optimization
  • Generated the Harbor task through frontier_cs_2_0.main for duckdb_e2e_query_optimization
  • Built and smoke-tested the Docker images:
    • frontiercs/duckdb-e2e-query-optimization-agent:experimental-v1.5.3
    • frontiercs/duckdb-e2e-query-optimization-judge:experimental-v1.5.3
  • Ran patch-policy smoke checks with malicious patches: source deletion, denied-path rename/copy, TPC-H string hardcoding, and denied CMake edits, plus a check that optimizer subdir CMake wiring is allowed
  • Ran Harbor trial: frontier-cs-2-0-duckdb-e2e-query__XNXGS9L
    • score: 2.8865579012232834
    • reward: 0.028865579012232835
    • geomean speedup: 1.0202095982916792
    • successful submissions: 3
    • used best submission: true
  • Checked latest trial feedback for hidden leakage; no per_query, concrete hidden sf/q labels, correctness_mismatches, or stderr_tail were exposed
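
The trial metrics above can be related to each other with a short sketch. The `geomean_speedup` helper and the sample timings are hypothetical. The two relationships checked below (reward equal to score divided by 100, and reward close to the base-2 logarithm of the geomean speedup) are observations from the numbers of this single trial, not a scoring formula stated in this PR.

```python
import math


def geomean_speedup(baseline_s, patched_s):
    """Geometric mean of per-query speedups (baseline time / patched time)."""
    ratios = [b / p for b, p in zip(baseline_s, patched_s)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Values reported for the Harbor trial in the Testing section.
score = 2.8865579012232834
reward = 0.028865579012232835
geomean = 1.0202095982916792

# Observation 1: reward matches score / 100.
print(math.isclose(reward, score / 100, rel_tol=1e-12))

# Observation 2: reward is close to log2 of the geomean speedup.
print(math.isclose(math.log2(geomean), reward, abs_tol=1e-6))

# Hypothetical timings in seconds: both queries run twice as fast.
print(geomean_speedup([2.0, 8.0], [1.0, 4.0]))
```

A geometric mean weights every query's relative improvement equally, so a large gain on one long-running query cannot mask regressions on the others.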

Checklist

  • Code follows the project structure and conventions
  • Self-review completed
  • Documentation updated (if applicable)

CI Validation (for new problems)

When adding new problems, CI will automatically validate that your reference solution achieves score > 0.

  • Algorithmic problems: Include reference.cpp in your problem directory

@joyemang33 marked this pull request as ready for review June 5, 2026 15:47
@joyemang33 merged commit d0d4a6d into main Jun 5, 2026
3 of 4 checks passed
