
Restrict SCML player protocol #111

Merged
john-b-yang merged 2 commits into CodeClash-ai:main from Muhtasham:feat/scml-restricted-protocol on Jun 29, 2026

@Muhtasham commented Jun 25, 2026

Contributor

Summary

  • replace native SCML OneShotAgent submissions with a restricted decide(observation) policy function
  • keep SCML world objects, trusted wrapper agents, offer/response validation, scoring, and result-file handling inside the arena runtime
  • run submitted policies in isolated worker processes with startup handshakes and per-decision timeouts, disabling a policy (and falling back to trusted behavior) once it reaches max_policy_errors
  • switch SCML worlds to a two-process setup so submitted policies actually participate in buy/sell negotiations
  • update starter code, docs, example config, validation, and tests for the restricted protocol
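To make the isolation bullets above concrete, here is a minimal sketch of a trusted-side handle on one policy process. It is not the CodeClash runtime: the PolicyWorker class, the pipe message format, the demo policy inside the worker, and the timeout defaults are all hypothetical, and it assumes a POSIX host (such as the Linux Docker image) so the "fork" start method is available. A timed-out worker is replaced rather than reused, because a stuck process would otherwise answer requests out of order. Once policy_errors reaches max_policy_errors, the worker is shut down and every later call returns None so the caller can use its trusted fallback.

```python
import multiprocessing as mp

# Assumption: POSIX host, so "fork" is available and the demo policy defined
# inside the child does not have to be importable by a spawned interpreter.
_CTX = mp.get_context("fork")


def _worker_main(conn):
    """Child process: load the policy, complete the handshake, answer requests."""

    # Hypothetical stand-in for a submitted decide(observation) function.
    def decide(observation):
        if observation.get("loop"):
            while True:  # simulates a runaway policy
                pass
        if observation.get("raise"):
            raise ValueError("policy bug")
        return {"response": "accept"}

    conn.send(("ready", None))  # startup handshake
    while True:
        observation = conn.recv()
        try:
            conn.send(("ok", decide(observation)))
        except Exception as exc:  # report errors instead of crashing the worker
            conn.send(("error", repr(exc)))


class PolicyWorker:
    """Hypothetical trusted-side handle on one isolated policy process."""

    def __init__(self, decision_timeout=1.0, startup_timeout=10.0, max_policy_errors=3):
        self.decision_timeout = decision_timeout
        self.startup_timeout = startup_timeout
        self.max_policy_errors = max_policy_errors
        self.policy_errors = 0
        self.disabled = False
        self._start()

    def _start(self):
        self._conn, child_conn = _CTX.Pipe()
        self._proc = _CTX.Process(target=_worker_main, args=(child_conn,), daemon=True)
        self._proc.start()
        child_conn.close()  # the child keeps its own copy of this end
        if not self._conn.poll(self.startup_timeout) or self._conn.recv()[0] != "ready":
            self.close()
            raise RuntimeError("policy worker failed its startup handshake")

    def close(self):
        if self._proc.is_alive():
            self._proc.kill()
        self._proc.join()
        self._conn.close()

    def decide(self, observation):
        """Return the policy's decision, or None so the caller uses its fallback."""
        if self.disabled:
            return None
        try:
            self._conn.send(observation)
            if self._conn.poll(self.decision_timeout):
                status, payload = self._conn.recv()
            else:
                status, payload = "timeout", None
        except (EOFError, OSError):  # the worker died or the pipe broke
            status, payload = "crashed", None
        if status == "ok":
            return payload
        self.policy_errors += 1
        if self.policy_errors >= self.max_policy_errors:
            self.disabled = True
            self.close()
        elif status != "error":
            # A stuck or dead worker cannot be trusted to answer in order; replace it.
            self.close()
            self._start()
        return None
```

An exception raised by the policy leaves the worker in sync, so it only counts as an error; a timeout or crash also triggers a restart, including a fresh handshake.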

Design Choice For Review

This intentionally gives CodeClash more control over SCML than a native simulator-agent submission would.

Instead of letting submitted code subclass SCML OneShotAgent directly, the arena exposes only a plain policy callback:

def decide(observation):
    return {}

The trusted runtime owns the actual SCML agents and converts policy decisions into validated negotiation intents:

  • proposal: {"offer": [quantity, time, unit_price]}
  • response: {"response": "accept" | "reject" | "end"}
  • {} or None: use the trusted greedy fallback
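A rough sketch of how the trusted side might turn a raw decide() result into one of the intents listed above, counting rejected outputs as invalid decisions instead of passing them to SCML. The InvalidDecision exception, the validate_decision and resolve helpers, the kind argument ("propose" or "respond"), and the max_quantity/n_steps bounds are all hypothetical; a real runtime would presumably take its limits from the SCML negotiation issues rather than from keyword arguments.

```python
from numbers import Integral, Real

RESPONSES = {"accept", "reject", "end"}


class InvalidDecision(ValueError):
    """A policy output that the restricted protocol does not allow."""


def _is_int(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Integral) and not isinstance(value, bool)


def validate_decision(decision, kind, max_quantity, n_steps):
    """Return ("offer", ...) or ("response", ...), or None for the fallback."""
    if decision is None or decision == {}:
        return None  # defer to the trusted greedy fallback
    if not isinstance(decision, dict):
        raise InvalidDecision(f"expected a dict, got {type(decision).__name__}")
    if kind == "propose":
        offer = decision.get("offer")
        if not isinstance(offer, (list, tuple)) or len(offer) != 3:
            raise InvalidDecision("expected {'offer': [quantity, time, unit_price]}")
        quantity, time, unit_price = offer
        if not _is_int(quantity) or not 1 <= quantity <= max_quantity:
            raise InvalidDecision(f"quantity out of range: {quantity!r}")
        if not _is_int(time) or not 0 <= time < n_steps:
            raise InvalidDecision(f"time out of range: {time!r}")
        # `not unit_price > 0` also rejects NaN.
        if isinstance(unit_price, bool) or not isinstance(unit_price, Real) or not unit_price > 0:
            raise InvalidDecision(f"unit_price must be positive: {unit_price!r}")
        return ("offer", (int(quantity), int(time), float(unit_price)))
    if kind == "respond":
        response = decision.get("response")
        if not isinstance(response, str) or response not in RESPONSES:
            raise InvalidDecision(f"response must be one of {sorted(RESPONSES)}")
        return ("response", response)
    # An unknown kind is a trusted-side bug, not a policy error.
    raise ValueError(f"unknown decision kind: {kind!r}")


def resolve(decision, kind, stats, **bounds):
    """Validate a decision; on failure, count it and fall back (return None)."""
    try:
        return validate_decision(decision, kind, **bounds)
    except InvalidDecision:
        stats["invalid_decisions"] += 1
        return None
```

For example, resolve({"offer": [3, 1, 12]}, "propose", stats, max_quantity=10, n_steps=5) yields an offer intent, while an out-of-range quantity or an unknown response string increments stats["invalid_decisions"] and returns None, matching the "rejected and logged, then trusted fallback" behavior described in the verification notes.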

The tradeoff is deliberate:

  • pro: simulator ownership, scoring, offer validation, timeouts, and fallback behavior stay in trusted arena code
  • pro: submitted code cannot directly mutate the scored SCML agent/world objects
  • con: policies cannot use the full native SCML agent API directly, so this is less expressive than native OneShotAgent submissions

@john-b-yang could you sanity-check whether this restricted policy interface is the right CodeClash-compatible shape for SCML, or whether you would rather expose native SCML agent classes for more expressivity?

Verification

  • uv run ruff check codeclash/arenas/scml/scml.py codeclash/arenas/scml/runtime/run_scml.py tests/arenas/test_scml.py
  • uv run pytest -q tests/arenas/test_scml.py -> 10 passed
  • uv run pytest -q tests/arenas -> 190 passed
  • uv run pre-commit run --files codeclash/arenas/scml/scml.py codeclash/arenas/scml/runtime/README.md codeclash/arenas/scml/runtime/scml_agent.py codeclash/arenas/scml/runtime/run_scml.py configs/examples/SCML__dummy__r1__s2.yaml docs/reference/arenas/scml.md tests/arenas/test_scml.py
  • docker build -t codeclash/scml -f codeclash/arenas/scml/SCML.Dockerfile .
  • direct Docker starter smoke: two sims completed; all details had nonzero decisions, zero policy_errors, zero invalid_decisions, and zero disabled_policies
  • direct Docker invalid-output smoke: invalid offers/responses were rejected, logged as invalid_decisions, and the world completed using trusted fallback behavior
  • direct Docker infinite-loop smoke: looping policies hit per-decision timeout, were disabled after max_policy_errors, and the world completed using trusted fallback behavior
  • uv run python main.py configs/examples/SCML__dummy__r1__s2.yaml -o /private/tmp/codeclash-scml-protocol.sFesGl -> two launcher rounds completed, both players validated, details had active policy decisions and zero policy errors
  • after adding worker startup handshakes: rebuilt codeclash/scml and reran configs/examples/SCML__dummy__r1__s2.yaml; both launcher rounds completed with policy_errors_total: 0 and invalid_decisions_total: 0
  • uv run pytest -q -> 192 passed

@Muhtasham requested a review from john-b-yang on June 25, 2026 14:39
@john-b-yang merged commit d14e326 into CodeClash-ai:main on Jun 29, 2026
4 checks passed
@john-b-yang

Contributor

Hey @Muhtasham thanks so much for doing this! To follow up on your question, I think this design totally makes sense. Most of the scaffolding around the decide function isn't really necessary for the agent to own, since it's there for interfacing with the game rather than for any actual decision-making. So I think we can definitely go with this; I don't have a strong inclination either way. The original approach is ok as well, in that the agent has slightly more control over more of the code, although it increases the surface area where agents could mess up the code (which might not be the worst thing for evaluation 😛).

But this is great! Just merged this and the #110 fixes. Thank u so much again for doing this, will look at abides and bomberland next.


2 participants