refactor(llm-access-kiro): split anthropic/converter.rs into submodules#12
Conversation
…r/ submodules Break the 6071-line anthropic/converter.rs into a converter/mod.rs facade plus 15 concern-focused submodules, following the agreed module style. Pure structural move — no logic changes. The facade (mod.rs) keeps the shared data model (ConversionResult, NormalizedRequest, SessionTracking, the Session*/Tool* events, ConversionError, NormalizedRequest's private fields, etc.), the module consts, the invalid_request helper, and the 70 integration tests. Each conversion concern moves to a descendant submodule (descendants read the parent's private struct fields, so no field was made public): - model: model-id mapping + per-model context-window sizing. - schema: JSON-schema normalization + multimodal-compat keyword checks. - document / image: attachment normalization + format detection. - tool_name: tool-name/tool-use-id sanitization, aliasing, dup-id rewriting. - tools: tool-definition conversion + structured-output tool injection. - tool_result: tool-result text/attachment extraction. - tool_pairing: tool-use/tool-result pairing validation + orphan pruning. - session: conversation-id resolution from metadata. - identity: model-identity probes / Claude Code identity + billing-header strip. - thinking: thinking-prefix generation + injection. - system: system-prompt assembly. - validate: up-front request validation. - normalize: per-message + tool-schema normalization (normalize_request). - convert: history/current-turn assembly into a Kiro ConversationState. Module style: - mod.rs convention; no `use super::*` in submodules — explicit origin imports. Anthropic types are reached via `crate::anthropic::types::` (submodule `super` is now `converter`, not `anthropic`), wire types via `crate::wire::`. - No `pub use X::*` globs: mod.rs re-exports only the 8 entry-point fns used outside the module, by name (map_model, get_context_window_size, normalize_request, convert_normalized_request_with_resolved_session, current_user_message_range, preview_session_value, resolve_conversation_id_from_metadata, extract_tool_result_content). The public data-model types stay defined in the facade, so the converter::X surface (consumed by llm-access provider.rs/provider/kiro_error.rs and the stream submodule) is unchanged. - Minimal visibility: a moved fn is `pub` only if referenced from another module (45 of 96); the rest stay private. `convert_normalized_request_with_validation` keeps `pub(crate)`. Cross-module helper imports used only by `#[cfg(test)]` code are `#[cfg(test)]`-gated so the lib build stays warning-clean. - Tests inlined: 7 self-contained unit tests move into their owning submodule (so the private items they exercise need no extra exposure); the 70 integration tests stay in mod.rs against the public API. No behavior change: item inventory diff vs the original shows zero dropped and zero added items, #[test] count 77 = 77; clippy -D warnings clean (lib + tests), 164 kiro tests pass, and the reverse-dep llm-access binary compiles against the unchanged public surface. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive conversion pipeline to translate Anthropic MessagesRequest payloads into Kiro ConversationState structures, handling message normalization, document deduplication, model mapping, and tool-use pairing. The review feedback highlights several critical improvements for robustness and performance, including avoiding potential panics from byte-based string slicing on non-UTF-8 boundaries, strictly validating UUIDs using the uuid crate, optimizing string traversal in session previews, and eliminating redundant string allocations during tool name generation.
| let pos = user_id.find("session_")?; | ||
| let session_part = &user_id[pos + 8..]; | ||
| if session_part.len() < 36 { | ||
| return None; | ||
| } | ||
|
|
||
| let uuid = &session_part[..36]; | ||
| is_valid_uuid(uuid).then(|| uuid.to_string()) |
There was a problem hiding this comment.
Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary (e.g., if the string contains multi-byte characters). To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.
let pos = user_id.find("session_")?;
let session_part = &user_id[pos + 8..];
let uuid: String = session_part.chars().take(36).collect();
(uuid.chars().count() == 36 && is_valid_uuid(&uuid)).then(|| uuid)References
- In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.
There was a problem hiding this comment.
Same as the resolve_conversation_id_from_metadata thread — deferring to a dedicated fix PR. Note this particular fn is #[cfg(test)] (test-only), so of the two slice sites it is the lower-priority one; the production path is resolve_conversation_id_from_metadata.
| let session_part = &user_id[pos + 8..]; | ||
| if session_part.len() < 36 { | ||
| return generated_fallback( | ||
| SessionFallbackReason::InvalidLegacySessionId, | ||
| None, | ||
| user_id_preview, | ||
| ); | ||
| } | ||
|
|
||
| let uuid = &session_part[..36]; |
There was a problem hiding this comment.
Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary. To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.
| let session_part = &user_id[pos + 8..]; | |
| if session_part.len() < 36 { | |
| return generated_fallback( | |
| SessionFallbackReason::InvalidLegacySessionId, | |
| None, | |
| user_id_preview, | |
| ); | |
| } | |
| let uuid = &session_part[..36]; | |
| let session_part = &user_id[pos + 8..]; | |
| let uuid: String = session_part.chars().take(36).collect(); | |
| if uuid.chars().count() != 36 { | |
| return generated_fallback( | |
| SessionFallbackReason::InvalidLegacySessionId, | |
| None, | |
| user_id_preview, | |
| ); | |
| } |
References
- In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.
There was a problem hiding this comment.
Valid pre-existing concern, but holding for a separate fix PR. This PR is a behavior-preserving structural split (verified by item-inventory diff + the same unchanged tests), and a panic-to-fallback guard is a behavior change that does not belong in it — the slice site here was relocated verbatim, not introduced.
Worth flagging: the suggested session_part.chars().take(36) also changes byte semantics to char semantics (equal only because UUIDs are ASCII). The byte-preserving guard is session_part.get(..36). That deserves its own PR with a regression test for non-UTF-8 / multibyte user_id input.
| pub fn is_valid_uuid(value: &str) -> bool { | ||
| value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4 | ||
| } |
There was a problem hiding this comment.
The current implementation of is_valid_uuid only checks the length and the count of hyphens, which is extremely loose and can accept invalid UUID strings (e.g., all hyphens). Since the uuid crate is already imported, we should use Uuid::parse_str to perform a strict and robust validation.
| pub fn is_valid_uuid(value: &str) -> bool { | |
| value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4 | |
| } | |
| pub fn is_valid_uuid(value: &str) -> bool { | |
| Uuid::parse_str(value).is_ok() | |
| } |
There was a problem hiding this comment.
Holding for the same follow-up PR. Tightening to Uuid::parse_str is a behavior change — it rejects strings the current 36-char / 4-hyphen check accepts — so it needs its own PR plus a check that upstream session_id values are always strict UUIDs before narrowing acceptance. Out of scope for this behavior-preserving structural move.
| pub fn preview_session_value(value: &str) -> String { | ||
| let mut preview = value | ||
| .chars() | ||
| .take(SESSION_SOURCE_PREVIEW_MAX_CHARS) | ||
| .collect::<String>(); | ||
| if value.chars().count() > SESSION_SOURCE_PREVIEW_MAX_CHARS { | ||
| preview.push_str("...[truncated]"); | ||
| } | ||
| preview | ||
| } |
There was a problem hiding this comment.
The preview_session_value function currently traverses the string twice: once to collect the preview characters, and once to count all characters in the string via value.chars().count(). This can be inefficient for very long strings. We can optimize this by using the iterator directly to check if there are any remaining characters.
pub fn preview_session_value(value: &str) -> String {
let mut chars = value.chars();
let mut preview: String = chars.by_ref().take(SESSION_SOURCE_PREVIEW_MAX_CHARS).collect();
if chars.next().is_some() {
preview.push_str("...[truncated]");
}
preview
}There was a problem hiding this comment.
Fixed in 34c9b57. Single iterator now: take the preview, then peek one more char (chars.next().is_some()) to decide the truncation marker — no second chars().count() pass. Byte-identical output.
| if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) { | ||
| return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string(); | ||
| } | ||
| for suffix in 1.. { | ||
| let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}"); | ||
| if !existing.contains(&candidate.to_lowercase()) { | ||
| return candidate; | ||
| } |
There was a problem hiding this comment.
Since STRUCTURED_OUTPUT_TOOL_NAME_BASE is a lowercase constant ("sf_emit_structured_output"), calling .to_lowercase() on it and the generated candidate string is redundant and causes unnecessary allocations. We can directly query the HashSet with the lowercase constant and the candidate string.
| if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) { | |
| return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string(); | |
| } | |
| for suffix in 1.. { | |
| let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}"); | |
| if !existing.contains(&candidate.to_lowercase()) { | |
| return candidate; | |
| } | |
| if !existing.contains(STRUCTURED_OUTPUT_TOOL_NAME_BASE) { | |
| return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string(); | |
| } | |
| for suffix in 1.. { | |
| let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}"); | |
| if !existing.contains(&candidate) { | |
| return candidate; | |
| } | |
| } |
There was a problem hiding this comment.
Fixed in 34c9b57. STRUCTURED_OUTPUT_TOOL_NAME_BASE and the {BASE}_{suffix} candidates are already lowercase, so those .to_lowercase() calls were no-op allocations — now querying the HashSet with the borrowed &str/&String directly. The per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary tool names for case-insensitive comparison).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f50d18829
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| use std::collections::{BTreeMap, HashMap}; | ||
|
|
||
| pub use convert::{convert_normalized_request_with_resolved_session, current_user_message_range}; |
There was a problem hiding this comment.
Re-export convert_request from the facade
convert_request was a public function on anthropic::converter before this split, but the new facade only re-exports other entry points while convert_request now lives inside the private convert submodule. Because anthropic/mod.rs exposes pub mod converter, any caller using llm_access_kiro::anthropic::converter::convert_request can no longer compile after this refactor even though the function still exists internally; include it in this pub use to preserve the previous public surface.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This one does not hold. convert_request is #[cfg(test)]-gated (it was in the original file too), so it does not exist in non-test builds and was never part of the public surface — no downstream caller can name llm_access_kiro::anthropic::converter::convert_request, and a workspace grep finds no external (non-test) caller. The real entry points (convert_normalized_request_with_resolved_session, normalize_request, current_user_message_range, extract_tool_result_content, etc.) are re-exported by name, and the reverse-dep llm-access build compiles green against the unchanged surface. So there is nothing to re-export here.
…/tools Two behavior-preserving cleanups flagged in PR #12 review (both on code this PR only relocated, not introduced): - preview_session_value traversed the string twice (collect the preview, then chars().count() the whole value). Drive a single iterator: take the preview, then peek one more char to decide whether to append the truncation marker. Byte-identical output. - make_structured_output_tool_name called .to_lowercase() on STRUCTURED_OUTPUT_TOOL_NAME_BASE and each generated candidate before querying the HashSet. Both are already lowercase, so those calls were no-op allocations; query the set with the borrowed &str / &String directly. The per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary tool names for case-insensitive comparison). No behavior change: clippy -D warnings clean (lib + tests), 164 kiro tests pass (incl. the session-fallback + structured-output tests that exercise both fns). Addresses PR #12 review findings from gemini-code-assist. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Split the 6071-line
llm-access-kiro/src/anthropic/converter.rsinto aconverter/mod.rsfacade plus 15 concern-focused submodules. Pure structuralmove — no logic changes. Next step in the incremental
llm-access*refactor(follows #10 stream, #7 cache_sim, #4 request).
Layout
The facade (
mod.rs) keeps the shared data model (ConversionResult,NormalizedRequest+ its private fields,SessionTracking, theSession*/Tool*events,ConversionError), module consts, theinvalid_requesthelper, and the integration tests. Each conversion concern moves to a
descendant submodule — descendants read the parent's private struct fields, so
no field was made public.
modelschemadocument/imagetool_nametoolstool_resulttool_pairingsessionidentitythinkingsystemvalidatenormalizenormalize_request)convertConversationStateModule style (per the agreed standard)
mod.rsconvention; nouse super::*in submodules — explicit originimports. Anthropic types via
crate::anthropic::types::(submodulesuperis now
converter, notanthropic), wire types viacrate::wire::.pub use X::*globs:mod.rsre-exports only the 8 entry-point fnsused outside the module, by name. The public data-model types stay defined in
the facade, so the
converter::Xsurface (consumed byllm-accessprovider.rs/provider/kiro_error.rsand thestreamsubmodule) isunchanged.
pubonly if referenced from anothermodule (45 of 96); the rest stay private.
convert_normalized_request_with_validationkeepspub(crate). Cross-modulehelper imports used only by
#[cfg(test)]code are#[cfg(test)]-gated sothe lib build stays warning-clean.
the 70 integration tests stay in
mod.rsagainst the public API.Verification
hit was a doc-comment prose match —
media-type canonicalization).#[test]count 77 = 77.cargo clippy -p llm-access-kiro --all-targets -- -D warnings: clean.cargo test -p llm-access-kiro: 164 passed.cargo build -p llm-access: compiles against the unchangedpublic surface.
rustfmton the 16 new files only;deps/lance&deps/lancedbstayed clean.🤖 Generated with Claude Code