Skip to content

refactor(llm-access-kiro): split anthropic/stream.rs into submodules#10

Merged
acking-you merged 2 commits into
masterfrom
refactor/split-kiro-anthropic-stream
May 30, 2026
Merged

refactor(llm-access-kiro): split anthropic/stream.rs into submodules#10
acking-you merged 2 commits into
masterfrom
refactor/split-kiro-anthropic-stream

Conversation

@acking-you
Copy link
Copy Markdown
Owner

What

Split the 2546-line llm-access-kiro/src/anthropic/stream.rs into a
stream/mod.rs facade plus 6 concern-focused submodules. Pure structural move —
no logic changes. Next step in the incremental llm-access* refactor (follows
#7 cache_sim, #4 request.rs).

Module layout

 Kiro Event
    -> [usage]           input-token resolution + usage JSON
    -> [sse_event]       SSE wire value
    -> [signature]       synthetic thinking signatures
    -> [inline_thinking] <thinking> tag extraction
    -> [state]           SSE block state machine
    -> [context]         StreamContext / BufferedStreamContext drivers
File Holds
usage.rs KiroInputTokenSource, KIRO_CONTEXT_USAGE_MIN_REQUEST_TOKENS, anthropic_usage_json, resolve_input_tokens[_with_threshold]
sse_event.rs SseEvent + impl
signature.rs synthetic_thinking_signature + proto encoders / derive helpers / wire-format consts
inline_thinking.rs build_inline_thinking_content_blocks, strip/split, quote-aware tag scanners, InlineThinkingBlock
state.rs SseStateManager, BlockState
context.rs StreamContext, BufferedStreamContext + context-only helpers (estimate_tokens, find_char_boundary, canonicalize_structured_output_json, SYNTHETIC_THINKING_PLACEHOLDER)

Module style (per the agreed standard)

  • mod.rs convention; no use super::* in submodules — explicit origin imports.
  • No pub use X::* globs: mod.rs re-exports only the 9 items used outside
    the module, by name, so the stream::X public surface (consumed by
    llm-access provider.rs and the anthropic/websearch.rs sibling) is
    unchanged.
  • StreamContext.state_manager is a pub field of type SseStateManager, so
    SseStateManager must stay pub-reachable (E0446) — re-exported by name.
  • Minimal visibility: cross-module items get pub + /// doc (state machine's
    is_block_open_of_type/has_non_thinking_blocks, the inline-thinking tag
    scanners + strip, synthetic_thinking_signature, and the 6 signature
    wire-format consts the shape test asserts against). pub(super) items now used
    only within stream were narrowed to private (InlineThinkingBlock,
    split_inline_thinking_content). No pub(crate).
  • Tests inlined: 7 usage tests, the SSE-format test, and the inline-thinking
    split test move into their submodules; the StreamContext integration cluster
    • shared proto-shape verification stay in context.rs (it's the only place
      that can reach the private process_assistant_response 3 tests exercise).

Verification

  • Item inventory diff vs original: 0 dropped, 0 added; #[test] count
    31 = 31.
  • cargo clippy -p llm-access-kiro -- -D warnings (lib and --tests): clean.
  • cargo test -p llm-access-kiro: 163 passed.
  • Reverse-dep cargo build -p llm-access: compiles against the unchanged public
    surface.
  • rustfmt on the 7 new files only; deps/lance & deps/lancedb stayed clean.

🤖 Generated with Claude Code

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Anthropic-compatible SSE stream adapter by splitting the monolithic context.rs file into several focused submodules: inline_thinking, signature, sse_event, state, and usage, coordinated by a new mod.rs. This significantly improves code modularity and maintainability. Feedback on the changes suggests optimizing strip_inline_thinking_content in inline_thinking.rs by collecting the filtered iterator directly into a String to avoid a temporary Vec<String> allocation.

Comment on lines +182 to +191
pub fn strip_inline_thinking_content(content: &str) -> String {
split_inline_thinking_content(content, true)
.into_iter()
.filter_map(|block| match block {
InlineThinkingBlock::Text(text) => Some(text),
InlineThinkingBlock::Thinking(_) => None,
})
.collect::<Vec<_>>()
.join("")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid allocating a temporary Vec<String> and then joining them. Instead, you can collect the iterator directly into a String using .collect::<String>(), which is more efficient and idiomatic in Rust.

pub fn strip_inline_thinking_content(content: &str) -> String {
    split_inline_thinking_content(content, true)
        .into_iter()
        .filter_map(|block| match block {
            InlineThinkingBlock::Text(text) => Some(text),
            InlineThinkingBlock::Thinking(_) => None,
        })
        .collect::<String>()
}
References
  1. Avoid allocating temporary collections (such as Vec<String>) and cloning elements solely for hashing. Instead, hash borrowed segments directly by chaining iterators or updating the hasher incrementally, especially on performance-critical hot paths.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 961575d. Collect the kept Text segments directly into a String via collect::<String>(), dropping the temporary Vec<String> + join(""). Empty-separator join is plain concatenation, so output is byte-identical — hidden_thinking_strips_inline_thinking_without_signature still passes.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Anthropic-compatible SSE stream adapter by modularizing the monolithic context.rs file into several focused submodules: inline_thinking, signature, sse_event, state, and usage. This significantly improves the code's maintainability and readability. Feedback on the changes suggests optimizing strip_inline_thinking_content in inline_thinking.rs to collect directly into a String instead of allocating a temporary Vec first, which improves performance and idiomaticity.

Comment on lines +182 to +191
pub fn strip_inline_thinking_content(content: &str) -> String {
split_inline_thinking_content(content, true)
.into_iter()
.filter_map(|block| match block {
InlineThinkingBlock::Text(text) => Some(text),
InlineThinkingBlock::Thinking(_) => None,
})
.collect::<Vec<_>>()
.join("")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In strip_inline_thinking_content, collecting the filtered text blocks into a temporary Vec and then calling join("") causes unnecessary allocations and overhead. Since String implements FromIterator<String>, you can collect the iterator directly into a String to improve performance and idiomaticity.

pub fn strip_inline_thinking_content(content: &str) -> String {
    split_inline_thinking_content(content, true)
        .into_iter()
        .filter_map(|block| match block {
            InlineThinkingBlock::Text(text) => Some(text),
            InlineThinkingBlock::Thinking(_) => None,
        })
        .collect::<String>()
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 961575d. Collect the kept Text segments directly into a String via collect::<String>(), dropping the temporary Vec<String> + join(""). Empty-separator join is plain concatenation, so output is byte-identical — hidden_thinking_strips_inline_thinking_without_signature still passes.

acking-you added a commit that referenced this pull request May 30, 2026
…tring

strip_inline_thinking_content collected the kept text segments into a
temporary Vec<String> and then join("")'d them. String implements
FromIterator<String>, so collect into a String directly: one allocation that
appends each segment, instead of a Vec plus a second joined String.

Empty-separator join is plain concatenation, so the output is byte-identical;
hidden_thinking_strips_inline_thinking_without_signature still passes.

Addresses a PR #10 review finding from gemini-code-assist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
acking-you and others added 2 commits May 31, 2026 07:07
…modules

Break the 2546-line anthropic/stream.rs into a stream/mod.rs facade plus 6
concern-focused submodules, following the agreed module style:

- usage.rs: input-token resolution + Anthropic usage JSON (KiroInputTokenSource,
  KIRO_CONTEXT_USAGE_MIN_REQUEST_TOKENS, anthropic_usage_json,
  resolve_input_tokens[_with_threshold]).
- sse_event.rs: the SseEvent wire value + impl.
- signature.rs: synthetic thinking-signature synthesis (synthetic_thinking_
  signature) with its proto encoders, derive helpers, and wire-format consts.
- inline_thinking.rs: inline <thinking> extraction (build_inline_thinking_
  content_blocks, strip/split, quote-aware tag scanners).
- state.rs: the SSE block state machine (SseStateManager, BlockState).
- context.rs: the StreamContext / BufferedStreamContext drivers plus their
  context-only helpers (estimate_tokens, find_char_boundary, canonicalize_
  structured_output_json, SYNTHETIC_THINKING_PLACEHOLDER).

Module style:
- mod.rs convention (matches the rest of the workspace).
- No `use super::*` in submodules: each imports its deps explicitly from their
  origin. No `pub use X::*` globs: mod.rs re-exports only the 9 items used
  outside the module, by name, so the stream::X public surface (consumed by
  llm-access provider.rs and the anthropic/websearch.rs sibling) is unchanged.
- StreamContext.state_manager is a pub field of type SseStateManager, so
  SseStateManager must stay pub-reachable; re-exported by name from mod.rs.
- Minimal visibility: cross-module items get `pub` + a `///` doc (the state
  machine's is_block_open_of_type / has_non_thinking_blocks, the inline-thinking
  tag scanners + strip, synthetic_thinking_signature, and the 6 signature
  wire-format consts the shape test asserts against). pub(super) items now used
  only within stream were narrowed to private (InlineThinkingBlock,
  split_inline_thinking_content). No pub(crate).
- Tests inlined: the 7 usage tests, the SSE-format test, and the inline-thinking
  split test move into their submodules (so the private items they exercise
  need no extra exposure); the StreamContext integration cluster + shared
  proto-shape verification stay in context.rs against process_assistant_response
  (private) and the public driver API.

No behavior change: item inventory diff vs the original shows zero dropped and
zero added items, #[test] count 31 = 31; clippy -D warnings clean (lib + tests),
163 kiro tests pass, and the reverse-dep llm-access binary compiles against the
unchanged public surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tring

strip_inline_thinking_content collected the kept text segments into a
temporary Vec<String> and then join("")'d them. String implements
FromIterator<String>, so collect into a String directly: one allocation that
appends each segment, instead of a Vec plus a second joined String.

Empty-separator join is plain concatenation, so the output is byte-identical;
hidden_thinking_strips_inline_thinking_without_signature still passes.

Addresses a PR #10 review finding from gemini-code-assist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@acking-you acking-you force-pushed the refactor/split-kiro-anthropic-stream branch from 961575d to 2124e9f Compare May 30, 2026 23:09
@acking-you
Copy link
Copy Markdown
Owner Author

Rebased onto latest master (resolved the modify/delete conflict from #9 — its ContentLengthExceededException stop-reason fix and its test now live in stream/context.rs; verified 164 tests pass, clippy clean, llm-access builds).

/gemini review

@acking-you
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Anthropic-compatible SSE stream adapter by modularizing the monolithic context.rs file into focused submodules: inline_thinking, signature, sse_event, state, and usage. This significantly improves code organization and maintainability. Feedback on the changes suggests optimizing the strip_inline_thinking_content function in inline_thinking.rs to avoid unnecessary string allocations and clones by scanning tags and building the stripped string directly from borrowed slices.

Comment on lines +182 to +190
pub fn strip_inline_thinking_content(content: &str) -> String {
split_inline_thinking_content(content, true)
.into_iter()
.filter_map(|block| match block {
InlineThinkingBlock::Text(text) => Some(text),
InlineThinkingBlock::Thinking(_) => None,
})
.collect::<String>()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of strip_inline_thinking_content delegates to split_inline_thinking_content, which allocates a temporary Vec<InlineThinkingBlock> and clones multiple substrings into newly allocated Strings, only to immediately discard the thinking blocks and collect the text blocks.

Since this function is called on every streaming response/chunk, we can optimize it to avoid these temporary allocations and string clones by scanning the tags and building the stripped string directly from borrowed slices.

pub fn strip_inline_thinking_content(content: &str) -> String {
    let Some(start_pos) = find_real_thinking_start_tag(content) else {
        return content.to_string();
    };

    let mut result = String::with_capacity(content.len());
    let before = &content[..start_pos];
    if !before.trim().is_empty() {
        result.push_str(before);
    }

    let mut remaining = &content[start_pos + "<thinking>".len()..];
    if remaining.starts_with('\n') {
        remaining = &remaining[1..];
    }

    let end_pos = if let Some(end_pos) = find_real_thinking_end_tag(remaining) {
        end_pos
    } else if let Some(end_pos) = find_real_thinking_end_tag_at_buffer_end(remaining) {
        end_pos
    } else {
        return content.to_string();
    };

    let after_tag = &remaining[end_pos + "</thinking>".len()..];
    let after_thinking = after_tag.strip_prefix("\n\n").unwrap_or(after_tag);
    if !after_thinking.is_empty() {
        result.push_str(after_thinking);
    }

    result
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding off on this one. The rewrite is correct and byte-equivalent (I checked all four branches: empty content, no start tag, no end tag, normal extraction), but the stated premise doesn't hold and the cost outweighs the gain here:

  • Not a hot path. strip_inline_thinking_content has exactly one call site workspace-wide — final_assistant_text, and only when hidden_thinking_enabled. It runs once at response finalization (via final_assistant_message/final_content_blocks, called after the stream loop in provider.rs), never inside the per-chunk process_kiro_event. So it saves one small Vec + a few String clones once per response, not per chunk.
  • Duplicates the scan control flow. The suggestion re-implements split_inline_thinking_content's tag scanning (start tag → skip leading \n → the two end-tag probes → strip \n\n). Keeping strip as a thin filter over split is a single source of truth; two parsers risk silently diverging if the thinking-tag semantics ever change.
  • Out of scope. This PR is a behavior-preserving structural split (no logic changes mixed in). Re-implementing the strip internals is a logic change, not a module move.

The previous collect::<String>() fix was a clean, local, no-duplication win — happy to take that class of change. If the micro-allocation on this cold path is worth pursuing, it'd be better as a separate perf PR that factors the scan into a shared visitor so neither copy duplicates the other.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@acking-you acking-you merged commit 6b0cf15 into master May 30, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant