@codefromthecrypt
Collaborator

Summary

claude-code shares one persistent subprocess, which previously conflated all sessions into a single conversation. This caused many problems; notably, tests would see state left behind by other tests. That creates pressure to restart the subprocess instead of fixing the underlying bookkeeping problem.

This adds session_id to the stream-json input messages and removes the messages_sent bookkeeping, since sessions are now tracked internally by claude.

This is standard practice for stream-json SDKs: each sends only the new user message per turn, with a session_id field, and the CLI maintains the conversation context internally.
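
As a rough illustration of the per-turn message described above, here is a minimal std-only sketch that builds one NDJSON line carrying a session_id. The helper names (`json_escape`, `user_turn_line`) and the exact field layout are assumptions made for illustration; they are not taken from the PR's code or from the CLI's documented schema.

```rust
// Hypothetical helper: escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Hypothetical helper: build a single NDJSON line for one user turn.
// Only the new message is included; earlier turns are NOT resent, on the
// assumption that the receiving process keeps context per session_id.
// The field layout below is an assumed shape, not a confirmed schema.
fn user_turn_line(session_id: &str, text: &str) -> String {
    format!(
        "{{\"type\":\"user\",\"session_id\":\"{}\",\"message\":{{\"role\":\"user\",\"content\":\"{}\"}}}}\n",
        json_escape(session_id),
        json_escape(text)
    )
}
```

Each call yields exactly one newline-terminated line, so a caller could write it straight to the subprocess's stdin without tracking how many messages were sent before.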

This also updates the providers.rs integration tests to use a unique session ID per test and to support running with the claude-code and codex CLI providers.
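
One way to get a distinct session ID per test, sketched with the standard library only. The helper `unique_session_id` and its ID format are hypothetical and are not what providers.rs necessarily does; the real tests may well use UUIDs instead.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter so IDs stay unique even when tests run on parallel threads.
static NEXT_SESSION: AtomicU64 = AtomicU64::new(0);

// Hypothetical helper (not from the PR): combine the test name, the process ID,
// and a monotonically increasing counter into a session ID that no other test
// in this run will reuse.
fn unique_session_id(test_name: &str) -> String {
    let n = NEXT_SESSION.fetch_add(1, Ordering::Relaxed);
    format!("{}-{}-{}", test_name, std::process::id(), n)
}
```

Because every test passes its own ID, a shared subprocess can keep their conversations apart without being restarted between tests.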

Type of Change

  • Bug fix
  • Refactor / Code quality
  • Tests

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

$ OLLAMA_HOST=http://localhost:11434 cargo test --test providers -- test_claude_code_provider test_codex_provider test_ollama_provider --nocapture
   Compiling goose v1.23.0 (/Users/codefromthecrypt/oss/goose-2/crates/goose)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 2.08s
     Running tests/providers.rs (target/debug/deps/providers-edd2a25b142813c2)

running 3 tests
=== codex::model_listing ===
[crates/goose/tests/providers.rs:288:9] &models = [
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.1-codex-max",
    "gpt-5.1-codex-mini",
]
===================
=== claude-code::model_listing ===
[crates/goose/tests/providers.rs:288:9] &models = [
    "sonnet",
    "opus",
]
===================
=== Ollama::model_listing ===
[crates/goose/tests/providers.rs:288:9] &models = [
    "all-minilm:33m",
    "all-minilm:l6-v2",
    "devstral:latest",
    "gemma3:latest",
    "llama3.2:latest",
    "nomic-embed-text:latest",
    "qwen2.5-coder:32b",
    "qwen2.5:0.5b",
    "qwen2.5:latest",
    "qwen3-vl:2b",
    "qwen3-vl:latest",
    "qwen3:0.6b",
    "qwen3:1.7b",
    "qwen3:4b",
    "qwen3:latest",
]
===================
=== claude-code::basic_response === Hello! 👋 How can I help you today?
=== claude-code::context_length_exceeded_error ===
[crates/goose/tests/providers.rs:241:9] &result = Err(
    ContextLengthExceeded(
        "Prompt is too long",
    ),
)
===================
test test_claude_code_provider ... ok
=== codex::basic_response === Hello!
=== Ollama::basic_response === Hello! 😊 How can I assist you today?
=== codex::context_length_exceeded_error ===
[crates/goose/tests/providers.rs:241:9] &result = Err(
    ContextLengthExceeded(
        "Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.",
    ),
)
===================
test test_codex_provider ... ok
=== Ollama::tool_usage === test-uuid-12345-67890
=== Ollama::image_content === the image displayed contains a simple text message on a white background. it reads:  
**"hello goose! this is a test image."**  

the text is presented in a clean, sans-serif font with black lettering. the design is minimalistic, with no additional graphics, logos, or decorative elements. the message appears to be a straightforward test payload for validation purposes.
=== Ollama::context_length_exceeded_error ===
[crates/goose/tests/providers.rs:241:9] &result = Ok(
    (
        Message {
            id: None,
            role: Assistant,
            created: 1770690098,
            content: [
                Text(
                    Annotated {
                        raw: RawTextContent {
                            text: "no",
                            meta: None,
                        },
                        annotations: None,
                    },
                ),
            ],
            metadata: MessageMetadata {
                user_visible: true,
                agent_visible: true,
            },
        },
        ProviderUsage {
            model: "qwen3-vl",
            usage: Usage {
                input_tokens: Some(
                    79,
                ),
                output_tokens: Some(
                    189,
                ),
                total_tokens: Some(
                    268,
                ),
            },
        },
    ),
)
===================
test test_ollama_provider ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 13 filtered out; finished in 56.61s


============== Providers ==============
✅ Ollama
✅ claude-code
✅ codex
=======================================

Related Issues

Copilot AI review requested due to automatic review settings February 10, 2026 02:48
)]
async fn complete_with_model(
    &self,
    _session_id: Option<&str>, // create_session == YYYYMMDD_N, but --session-id requires a UUID
Collaborator Author

@codefromthecrypt Feb 10, 2026

The UUID requirement only applies to the CLI's validation of --session-id, not to the session_id field in the ndjson itself.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the claude-code CLI provider to use a session_id in the stream-json protocol so a single persistent subprocess can correctly isolate multiple concurrent/serial Goose sessions (and avoids leaking state between tests).

Changes:

  • Add session_id to the claude-code stream-json input payload and remove the previous messages_sent bookkeeping.
  • Update provider integration tests to support CLI providers (claude-code, codex) and use per-test session IDs where needed.
  • Refactor CLI provider from_env construction for claude-code and codex into the ProviderDef implementation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/goose/src/providers/claude_code.rs | Add session_id to stream-json input; remove messages_sent; adjust session-id handling in complete_with_model. |
| crates/goose/src/providers/codex.rs | Move env-based provider construction into ProviderDef::from_env. |
| crates/goose/tests/providers.rs | Expand integration tests for CLI providers; introduce CLI/non-CLI session handling and skip logic. |

@codefromthecrypt force-pushed the adrian/provider-cli-tests branch from ed6d261 to ff6c11a February 10, 2026 02:59
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

@codefromthecrypt added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit 4abf91e Feb 10, 2026
24 of 25 checks passed
@codefromthecrypt deleted the adrian/provider-cli-tests branch February 10, 2026 05:32
zanesq added a commit that referenced this pull request Feb 10, 2026
…tensions-deeplinks

* 'main' of github.com:block/goose:
  [docs] update authors.yaml file (#7114)
  Implement manpage generation for goose-cli (#6980)
  docs: tool output optimization (#7109)
  Fix duplicated output in Code Mode by filtering content by audience (#7117)
  Enable tom (Top Of Mind) platform extension by default (#7111)
  chore: added notification for canary build failure (#7106)
  fix: fix windows bundle random failure and optimise canary build (#7105)
  feat(acp): add model selection support for session/new and session/set_model (#7112)
  fix: isolate claude-code sessions via stream-json session_id (#7108)
  ci: enable agentic provider live tests (claude-code, codex, gemini-cli) (#7088)
  docs: codex subscription support (#7104)
  chore: add a new scenario (#7107)
  fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  Fix 'Edit In Place' and 'Fork Session' features (#6970)
  Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)
  Docs: require auth optional for custom providers (#7098)
  fix: improve text-muted contrast for better readability (#7095)
  Always sync bundled extensions (#7057)
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
)

Signed-off-by: Adrian Cole <adrian@tetrate.io>