Final polish: non-root Dockerfile, /reset task_id API support, logging cleanup, new tests by Copilot · Pull Request #2 · bigturtle679/Contract-Negotiation-Environment

Copilot · 2026-04-11T10:31:20Z

Comprehensive hardening pass addressing container security, an API feature gap, code quality, and test coverage.

Security

Dockerfile: Run as non-root appuser instead of root

API

/reset now accepts optional task_id: The environment already supported reset(task_id=...), but the API never exposed it. The endpoint now accepts a {"task_id": "expert_data_protection"} body and returns 400 for invalid IDs.

# Before: only sequential cycling
POST /reset  →  next task in rotation

# After: targeted reset supported
POST /reset  {"task_id": "expert_data_protection"}  →  specific task
POST /reset  {"task_id": "bad_id"}                   →  400
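
A framework-agnostic sketch of the behaviour described above, not the code from app.py. `SketchEnv`, `handle_reset`, and the `easy_example` task ID are hypothetical stand-ins; only `reset(task_id=...)`, the `task_id` body key, and the 400 response come from this PR.

```python
from typing import Optional, Tuple

# Hypothetical task registry; only "expert_data_protection" appears in the PR text.
TASK_IDS = ["easy_example", "expert_data_protection"]


class SketchEnv:
    """Stand-in for ContractEnv: cycles through tasks unless a task_id is given."""

    def __init__(self) -> None:
        self._next = 0

    def reset(self, task_id: Optional[str] = None) -> dict:
        if task_id is None:
            # Sequential cycling, as before this PR.
            task_id = TASK_IDS[self._next % len(TASK_IDS)]
            self._next += 1
        elif task_id not in TASK_IDS:
            raise ValueError(f"unknown task_id: {task_id!r}")
        return {"task_id": task_id}


def handle_reset(env: SketchEnv, body: Optional[dict]) -> Tuple[int, dict]:
    """Return (status_code, payload) for a POST /reset request."""
    task_id = (body or {}).get("task_id")
    try:
        return 200, env.reset(task_id=task_id)
    except ValueError as exc:
        # Invalid IDs map to a client error rather than a 500.
        return 400, {"detail": str(exc)}
```

Calling `handle_reset(env, None)` keeps the old rotation, while a body with a known `task_id` selects that task directly.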

Code quality

app.py: Move import logging from inside the exception handler to module level; the import statement previously ran on every 500 response
graders.py: Remove stale ✅ STRICT RANGE FIX implementation comments; add docstring to grade_action
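
A minimal sketch of the logging cleanup, assuming a handler that returns a status code and payload. The function name `handle_unexpected_error` and the response text are illustrative, not taken from app.py.

```python
import logging  # imported once at module level, not inside the handler

logger = logging.getLogger(__name__)


def handle_unexpected_error(exc: Exception) -> tuple:
    """Log the full exception server-side and return a generic 500 payload."""
    # exc_info accepts an exception instance and records its traceback in the log.
    logger.error("Unhandled error while serving request", exc_info=exc)
    # The client sees no exception message or stack trace.
    return 500, {"detail": "Internal server error"}
```

Keeping the detail in the log and out of the response matches the earlier commits in this PR that removed stack trace exposure from the error handler.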

Test coverage (75 → 78)

/reset with valid task_id returns correct task
/reset with invalid task_id returns 400
/evaluate-quality before any /reset returns 400

…sive tests
- Add confidentiality/NDA (medium+), termination (hard++), data protection (expert) tasks
- Add opponent simulation with contextual counterparty responses per action type
- Enhance grading with semantic similarity (cosine+Jaccard) and clause completeness scoring
- Add 3 new task-specific graders: grade_medium_plus, grade_hard_plus2, grade_expert
- Add required_elements field for completeness scoring per task
- Update reward formula: 35% correctness + 25% improvement + 25% risk_alignment + 10% semantic + 5% completeness
- Improve inference script with adaptive multi-turn strategy for all 8 tasks
- Add 21 new tests (42 total, all passing)
- Update openenv.yaml, README, and all exports
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/e3c399ed-eac8-40b6-9c61-196e32f44385 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>
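
The reward weights in this commit message can be illustrated with a short sketch. The weights are from the commit; the tokenizer, the bag-of-words cosine, and the equal-weight blend of cosine and Jaccard are assumptions, since the PR does not say how graders.py combines them.

```python
import math
import re
from collections import Counter

# Weights taken from the commit message; they sum to 1.0.
REWARD_WEIGHTS = {
    "correctness": 0.35,
    "improvement": 0.25,
    "risk_alignment": 0.25,
    "semantic": 0.10,
    "completeness": 0.05,
}


def _tokens(text: str) -> list:
    """Lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(_tokens(a)), set(_tokens(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    count_a, count_b = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_similarity(a: str, b: str) -> float:
    """Assumption: equal-weight average of the two measures."""
    return 0.5 * cosine(a, b) + 0.5 * jaccard(a, b)


def combined_reward(components: dict) -> float:
    """Weighted sum of the five components, each expected in [0, 1]."""
    return sum(weight * components[name] for name, weight in REWARD_WEIGHTS.items())
```

With every component at 1.0 the combined reward is 1.0 up to floating-point rounding, which is a quick check that the weights are complete.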

Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/e3c399ed-eac8-40b6-9c61-196e32f44385 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…d action_space
- Fix effective_risk_high() to cover all tasks with trap_markers (was only HARD/HARD_PLUS)
- Fix observation_risk_float() to boost risk for all trap-bearing tasks (was only HARD)
- Add opponent stance parsing (_parse_opponent_stance) for concession/firmness detection
- Add opponent-aware action adjustment in inference _choose() function
- Add HTTP-client mode (--mode api) for Docker API evaluation
- Add per-task scoring summary in benchmark output
- Add action_space section to openenv.yaml
- Remove vestigial server/app.py and test_endpoints.py
- Add 7 new tests (49 total): trap coverage, accept blocking, opponent parsing
- Update README with new features and ENV_SERVER_URL
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/c8dd2642-d749-42d1-9fad-91bc78d8d379 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…, clarify comments Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/c8dd2642-d749-42d1-9fad-91bc78d8d379 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…ce pydantic warnings
- Track specific opponent concessions per topic (cap, liability, IP, etc.)
- Feed concession summary to LLM for richer negotiation context
- Add --retry-low THRESHOLD flag to re-run low-scoring tasks
- Add smart ACCEPT gate: block acceptance when contract hasn't improved
- Improve MODERATE strategy to front-load PROPOSE_COUNTER
- Sort per-task summary by score (worst first) with best/mean stats
- Silence FastAPI/pydantic internal deprecation warnings in pytest config
- Add 2 new tests for concession tracking (51 total, all passing)
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/8526d5cb-221b-4b88-9ca5-4c8679367d2f Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>
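
One way the --retry-low THRESHOLD flag could be wired up, as a sketch rather than the code in inference.py. `build_parser`, `retry_low_scores`, the `run_task` callback, and the keep-the-better-score rule are all assumptions; only the flag name and its purpose come from the commit message.

```python
import argparse
from typing import Callable, Dict


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    parser.add_argument(
        "--retry-low",
        type=float,
        default=None,
        metavar="THRESHOLD",
        help="re-run tasks whose score is below THRESHOLD",
    )
    return parser


def retry_low_scores(
    scores: Dict[str, float],
    threshold: float,
    run_task: Callable[[str], float],
) -> Dict[str, float]:
    """Re-run each task scoring below threshold, addressed by its task_id."""
    updated = dict(scores)
    for task_id, score in scores.items():
        if score < threshold:
            # Assumption: keep whichever attempt scored higher.
            updated[task_id] = max(score, run_task(task_id))
    return updated
```

Passing the `task_id` into the re-run is the important part: a retry that only calls a cycling reset would land on a different task.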

…tants Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/8526d5cb-221b-4b88-9ca5-4c8679367d2f Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…tests, input length guards
- Make python-dotenv import optional so tests pass without LLM deps
- Add Literal types for risk_level (HIGH/MODERATE/LOW) and clause_type
- Add Pydantic validator for opponent_responses keys (must be valid ActionType)
- Add content length validation in ContractEnv (max_content_length=50_000)
- Add /evaluate-quality input length guard (100_000 chars max)
- Consolidate NUM_GRADED_TASKS to single definition in graders.py
- Add 8 new edge-case tests: content length, max steps, done state, unicode, empty risk_keywords, opponent_response key validation, API max length
- Update README: --retry-low docs, CORS_ORIGINS env var, test count (59)
- All 59 tests pass
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/028275d3-079a-4945-8a77-3e3dcdf5d12a Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>
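
The Literal type and length guard from this commit can be approximated with the standard library alone. This sketch does not use Pydantic, and the helper names `validate_risk_level` and `validate_content` are hypothetical; the three risk levels and the 50_000 limit are from the commit message.

```python
from typing import Literal, get_args

RiskLevel = Literal["HIGH", "MODERATE", "LOW"]
MAX_CONTENT_LENGTH = 50_000  # matches max_content_length in the commit message


def validate_risk_level(value: str) -> str:
    """Reject any value outside the three allowed risk levels."""
    allowed = get_args(RiskLevel)
    if value not in allowed:
        raise ValueError(f"risk_level must be one of {allowed}, got {value!r}")
    return value


def validate_content(content: str, max_content_length: int = MAX_CONTENT_LENGTH) -> str:
    """Reject oversized action content before it reaches grading."""
    if len(content) > max_content_length:
        raise ValueError(
            f"content is {len(content)} chars; limit is {max_content_length}"
        )
    return content
```

A static type checker enforces the `Literal` at call sites, while the runtime check covers values that arrive from JSON.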

…iteral in models.py Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/028275d3-079a-4945-8a77-3e3dcdf5d12a Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…ion leak, pin deps
- Fix negation-aware keyword matching: risk keywords in negation context (e.g. "no party is liable for consequential damages") no longer falsely trigger effective_risk_high(), which was blocking ACCEPT on the easy task even after the agent submitted the correct safe edit
- Fix retry logic: --retry-low now passes task_id to reset() so it actually retries the same low-scoring task instead of cycling to the next one
- Add reset(task_id=) parameter to ContractEnv for targeted task selection
- Remove stack trace exposure in production error handler (app.py)
- Add close()/context manager to _HTTPEnvClient for proper session cleanup
- Pin requirements.txt versions to match pyproject.toml
- Add 7 new tests: negation matching, safe edit acceptance, full edit+accept flow, reset with task_id, original contracts flagged correctly
- 66 tests pass
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/5b50d4d5-393e-4df1-ad7e-50ef984ec409 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>
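
A rough sketch of negation-aware keyword matching, using the example sentence from this commit. The cue list, the six-word lookback window, and the function name `keyword_is_risky` are assumptions; the repository's own _is_negated helper may work differently.

```python
import re

# Assumed negation cues; the real list is not shown in this PR.
NEGATION_CUES = {"no", "not", "never", "neither", "nor", "without"}


def keyword_is_risky(text: str, keyword: str, window: int = 6) -> bool:
    """True if any occurrence of keyword lacks a nearby preceding negation cue."""
    lowered = text.lower()
    for match in re.finditer(re.escape(keyword.lower()), lowered):
        # Look back only within the current sentence or clause.
        boundary = max(
            lowered.rfind(".", 0, match.start()),
            lowered.rfind(";", 0, match.start()),
        )
        preceding = re.findall(r"[a-z']+", lowered[boundary + 1 : match.start()])
        if not NEGATION_CUES.intersection(preceding[-window:]):
            return True  # at least one un-negated mention
    return False
```

Under these assumptions, "No party is liable for consequential damages." is not flagged, while "Supplier is liable for all consequential damages." is.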

Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/5b50d4d5-393e-4df1-ad7e-50ef984ec409 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…, docstrings, new API tests
- Replace 3 assertions in environment.py with RuntimeError guards that survive python -O
- Replace assert in graders.py with explicit ValueError
- Add Pydantic EvaluateQualityRequest model to app.py replacing raw dict
- Remove dead _MAX_EVALUATE_TEXT_LEN constant and unused Any import
- Add comprehensive docstrings to ContractEnv class and methods
- Add docstring to evaluate_action() explaining 5-dimension rubric
- Fix fragile __code__.co_varnames introspection in inference.py with try/except
- Remove redundant TASKS import in environment.py tasks property
- Normalize "belongs to Supplier" → "belongs to supplier" in tasks.py
- Add 6 new API tests: schema, root, evaluate-quality missing/empty/success, invalid action
- Update README test count from 66 → 72
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/5ac284f3-516c-4d77-ba64-14f8ee06778c Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>
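
The assert-to-guard change matters because Python removes assert statements when run with -O, so an assert-based precondition silently disappears. A minimal sketch, with `GuardedEnv` as a hypothetical stand-in for ContractEnv and a placeholder task ID:

```python
from typing import Optional


class GuardedEnv:
    """Stand-in environment showing an explicit guard instead of an assert."""

    def __init__(self) -> None:
        self._task: Optional[dict] = None

    def reset(self) -> dict:
        self._task = {"task_id": "example_task"}  # placeholder task
        return self._task

    def step(self, action: str) -> dict:
        # `assert self._task is not None` would be stripped under `python -O`;
        # an explicit raise is always executed.
        if self._task is None:
            raise RuntimeError("step() called before reset()")
        return {"task_id": self._task["task_id"], "action": action}
```

Calling `step()` on a fresh instance raises RuntimeError regardless of interpreter flags.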

… 3 new tests
Security: global exception handler no longer leaks internal error details
Robustness: _HTTPEnvClient now uses 30s request timeouts
Robustness: opponent_responses validator rejects empty response lists
Code quality: added docstrings to 10 undocumented public functions in graders.py
Code quality: cleaned up dev-note comments in models.py (✅/❌ markers)
Code quality: removed unused StepResponse model from models.py
Code quality: added proper __init__.py to tests/ directory
Test coverage: added test_step_before_reset_raises (RuntimeError check)
Test coverage: added test_step_after_reset_specific_task_keeps_task
Test coverage: added test_opponent_responses_non_empty_validation
README: updated test count 72 → 75
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/08a54a1a-0cc0-4519-a9c8-096c1ff8f4b8 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

…x, 3 new tests
- Dockerfile: add non-root user for container security
- app.py: move logging import to module level, add /reset task_id body param
- graders.py: clean stale comments, add grade_action docstring
- tests: add 3 new API tests (reset with task_id, invalid task_id, evaluate before reset)
- README: update test count 75 → 78
Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/e910db62-1682-4ee6-9dc6-e8639ab081ea Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

Copilot AI and others added 13 commits April 11, 2026 07:41

Address code review feedback: fix help text and comment notation

7c92c77

Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/e3c399ed-eac8-40b6-9c61-196e32f44385 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

Improve _is_negated docstring for clarity

0d1222d

Agent-Logs-Url: https://github.com/bigturtle679/Contract-Negotiation-Environment/sessions/5b50d4d5-393e-4df1-ad7e-50ef984ec409 Co-authored-by: AbeerChaturvedi <171315954+AbeerChaturvedi@users.noreply.github.com>

Copilot AI assigned Copilot and AbeerChaturvedi Apr 11, 2026

Copilot created this pull request from a session on behalf of AbeerChaturvedi April 11, 2026 10:31 View session

bigturtle679 marked this pull request as ready for review April 11, 2026 10:31

bigturtle679 merged commit 3fdae1e into main Apr 11, 2026
1 check failed


bigturtle679 merged 13 commits into main from copilot/improve-inference-script-again


3 participants
