Pin runner dependencies to working versions #349

sallyom · 2025-11-19T19:19:41Z

Root cause: Recent package auto-upgrades broke SDK initialization.

Fixes applied:

npm @anthropic-ai/claude-code: 2.0.46 → 2.0.41 (CRITICAL FIX!)
- Version 2.0.46 (released Nov 19) breaks SDK subprocess initialization
- Reverts to last working version from Nov 14 upstream build
anthropic[vertex]: >=0.68.0 → ==0.73.0
- Pin to tested version for stability
- 0.74.0+ untested with Vertex AI
claude-agent-sdk: >=0.1.4 → ==0.1.6
- Pin to tested version for stability
- 0.1.7+ untested

Evidence:

Working upstream image (quay.io/ambient_code/vteam_claude_runner:latest) had npm CLI 2.0.41, anthropic 0.73.0, claude-agent-sdk 0.1.6
Recent builds auto-installed 2.0.46 → broke initialization
Python packages were already correct (red herring during investigation)

The npm pin is the critical fix. Python pins add build stability.

Root cause: Recent package auto-upgrades broke SDK initialization. Fixes applied: 1. npm @anthropic-ai/claude-code: 2.0.46 → 2.0.41 (CRITICAL FIX!) - Version 2.0.46 (released Nov 19) breaks SDK subprocess initialization - Reverts to last working version from Nov 14 upstream build 2. anthropic[vertex]: >=0.68.0 → ==0.73.0 - Pin to tested version for stability - 0.74.0+ untested with Vertex AI 3. claude-agent-sdk: >=0.1.4 → ==0.1.6 - Pin to tested version for stability - 0.1.7+ untested Evidence: - Working upstream image (quay.io/ambient_code/vteam_claude_runner:latest) had npm CLI 2.0.41, anthropic 0.73.0, claude-agent-sdk 0.1.6 - Recent builds auto-installed 2.0.46 → broke initialization - Python packages were already correct (red herring during investigation) The npm pin is the critical fix. Python pins add build stability. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

github-actions · 2025-11-19T19:23:18Z

Claude Code Review

Summary

This PR addresses a critical production issue by pinning runner dependencies to known-working versions after recent package auto-upgrades broke SDK initialization. The changes are minimal, focused, and well-justified with clear evidence in the PR description. The fix is appropriate and should be merged promptly to restore functionality.

Issues by Severity

Critical Issues

1. Missing Regression Test Strategy

Location: components/runners/claude-code-runner/tests/
Issue: No automated tests exist to detect SDK initialization failures before they reach production
Impact: Critical breakages like this one are only discovered in production/CI
Recommendation: Add integration tests that verify SDK subprocess initialization succeeds, Claude Code CLI version compatibility, and basic agent session execution completes

2. Dependency Pinning Strategy Not Documented

Location: CLAUDE.md:219-225, components/runners/claude-code-runner/
Issue: CLAUDE.md specifies Prefer uv over pip but doesnt provide guidance on when to use exact pins vs ranges
Context: Recent history shows repeated pin adjustments indicating trial-and-error approach
Recommendation: Add to CLAUDE.md under Configuration Standards Python: guidance on production containers using exact pins for stability

Major Issues

3. No Automated Dependency Update Process

Issue: The breakage occurred because anthropic-ai/claude-code auto-upgraded from 2.0.41 to 2.0.46 during builds
Root Cause: Dockerfile line 10 previously used unversioned npm install allowing latest version to be installed
Recommendation: Set up Dependabot/Renovate to create PRs for dependency updates. Consider extracting to package.json or use Renovate which supports Dockerfile parsing.

4. Version Selection Rationale Missing

Issue: PR states 0.74.0+ untested and 0.1.7+ untested but doesnt explain why these versions broke or what changed
Context: Makes future maintenance difficult - if 0.74.0 has a critical security fix, unclear if it can be safely adopted
Recommendation: Document specific breaking changes. Create GitHub issue tracking: Investigate anthropic 0.74.0+ and claude-agent-sdk 0.1.7+ compatibility

5. uv.lock Diff Too Large for Manual Review

Issue: 1489 lines of uv.lock changes mostly timestamps obscure meaningful dependency changes
Impact: Difficult to spot if unexpected transitive dependencies changed
Recommendation: Add CI check to validate only expected packages changed versions

Minor Issues

6. Inline Comments Use Different Style

Location: pyproject.toml:14-15
Issue: Inline comments after dependency declarations are inconsistent with Python conventions
Severity: Minor - current style is acceptable but comments above dependencies would be more scannable

7. No Rollback Plan Documented

Issue: If this pinning strategy fails, no documented process for emergency rollback
Recommendation: Document how to identify if version pins are causing issues and emergency rollback procedure

8. BOT_TOKEN Handling Could Be More Secure

Location: wrapper.py:78-83
Observation: Token appended to WebSocket URL as query parameter
Context: While not introduced by this PR, worth noting for security review
Note: Query parameters may appear in logs. Verify token redaction in logs per CLAUDE.md standards

Positive Highlights

Excellent Root Cause Analysis: PR description provides clear evidence trail

Minimal Blast Radius: Changes are scoped to exactly what is needed - 3 version pins across 3 files

Follows CLAUDE.md Standards: Uses uv as preferred package manager, Dockerfile follows established patterns, commit message has co-author attribution

Clear Commenting: Inline comments explain why versions are pinned, not just what is pinned

No Security Vulnerabilities Introduced: All pinned versions are recent and actively maintained

CI Validation in Progress: Build and E2E tests running

Recommendations

Immediate (Before Merge)

Merge as-is - This is a critical fix that restores production functionality
Add follow-up issue for test coverage
Verify CI passes - Especially E2E tests and claude-runner build

Short-term (Next Sprint)

Document dependency strategy in CLAUDE.md
Set up npm dependency monitoring
Create tracking issues for investigating newer versions compatibility and adding SDK initialization regression tests

Long-term (Next Quarter)

Improve dependency testing: Integration tests that catch SDK breakage before production
Automate version compatibility matrix: Test runner against multiple SDK versions in CI
Consider lock file tooling: Automated lock file diff summaries in PR comments

Final Verdict

APPROVE with follow-up tasks

This PR correctly addresses an urgent production issue with appropriate evidence and minimal risk. The identified issues are primarily about preventing future occurrences rather than problems with the current fix.

Blocking concerns: None
Non-blocking improvements: 8 recommendations tracked above

Estimated merge safety: High - restores known-good configuration with clear rollback path

bobbravo2 added the priority: high label Nov 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Pin runner dependencies to working versions #349

Pin runner dependencies to working versions #349

sallyom commented Nov 19, 2025

Uh oh!

github-actions bot commented Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Pin runner dependencies to working versions #349

Are you sure you want to change the base?

Pin runner dependencies to working versions #349

Conversation

sallyom commented Nov 19, 2025

Uh oh!

github-actions bot commented Nov 19, 2025

Claude Code Review

Summary

Issues by Severity

Critical Issues

Major Issues

Minor Issues

Positive Highlights

Recommendations

Immediate (Before Merge)

Short-term (Next Sprint)

Long-term (Next Quarter)

Final Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants