Track strict offline isolation for ProgramBench inference #706

@neubig

Leaderboard-faithful ProgramBench inference requires agents to run without internet access. The initial integration uses Docker's default bridge network so that the SDK control channel can reach the agent-server, which means outbound traffic is not yet blocked. Strict isolation will require in-container egress filtering, or an equivalent design, that still preserves the control channel.

Follow-up scope:

  • Implement strict egress blocking for ProgramBench Docker workspaces while preserving the SDK control path.
  • Evaluate an init-step approach using CAP_NET_ADMIN + iptables or an equivalent network namespace strategy.
  • Update documentation and metadata once --allow-network can distinguish real strict-offline runs.
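As a starting point for evaluating the init-step approach, here is a minimal sketch of what the rule set could look like. It is not the project's implementation. It assumes the SDK opens connections to the agent-server from outside the container, so replies can be allowed via connection tracking while new outbound connections are dropped. The function names `build_egress_rules` and `apply_rules` are hypothetical, and the container would need to be started with `--cap-add NET_ADMIN` for the commands to succeed.

```python
import subprocess
from typing import Callable, List


def build_egress_rules(iptables: str = "iptables") -> List[List[str]]:
    """Return iptables argv lists that block new outbound connections.

    Hypothetical helper: assumes the control channel is inbound to the
    container, so only reply traffic has to leave it.
    """
    return [
        # Keep loopback traffic working for processes inside the container.
        [iptables, "-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"],
        # Allow replies on connections that were initiated from outside,
        # which is assumed to cover the SDK control channel.
        [iptables, "-A", "OUTPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Set the default policy last so the accept rules are already in place.
        [iptables, "-P", "OUTPUT", "DROP"],
    ]


def apply_rules(
    rules: List[List[str]],
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run each rule in order; requires CAP_NET_ADMIN inside the container."""
    for argv in rules:
        runner(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of applying them, for review.
    for rule in build_egress_rules():
        print(" ".join(rule))
```

This sketch covers IPv4 only; an equivalent `ip6tables` rule set would be needed if the container has IPv6 connectivity. If the agent-server instead dials out to the SDK, an explicit accept rule for that destination would have to be added before the default policy is set.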

This issue was created by an AI agent (OpenHands) on behalf of the user while addressing PR review feedback.
