fix: Java test instrumentation and context improvements for E2E optimization #1530
Merged
mashraf-222 merged 4 commits into omni-java from Feb 18, 2026
- Increase imported type skeleton token budget from 2000 to 4000
- Add constructor signature summary headers to skeleton output
- Expand wildcard imports (e.g., import com.foo.*) into individual types instead of silently skipping them
- Prioritize skeleton processing for types referenced in the target method so parameter types are guaranteed context before less-critical types
- Fix invalid [no-arg] annotation in constructor summaries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests

Use distinct __existing_perfinstrumented prefix for existing test instrumentation paths to avoid colliding with generated test file paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When Maven compiles all test files together, a broken instrumented test file from one function's optimization can cause cascading compilation failures for ALL subsequent functions. This adds pre-iteration cleanup using find_leftover_instrumented_test_files() as a safety net. Also updates the Java pattern to match __existing_perfinstrumented variant files that were missed by the previous pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Java timing instrumentation

Two bugs in _add_timing_instrumentation that caused instrumented tests to fail compilation when test code contained multi-byte UTF-8 characters or variable declarations in the target call statement.

1. Tree-sitter returns byte offsets but body_text is a Python str (Unicode). Slicing the str with byte offsets corrupts statements when multi-byte chars (é, 世, etc.) precede the target call.
2. Wrapping a local_variable_declaration (e.g., int len = func()) inside a for/try block moves the variable out of scope for subsequent code. Now hoists the declaration before the timing block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
⚡️ Codeflash found optimizations for this PR📄 10% (0.10x) speedup for
misrasaurabh1 approved these changes on Feb 18, 2026
Problems fixed

Three categories of bugs that caused Java E2E optimization to fail, plus two systemic issues discovered during the aerospike-client-java --all run (55 functions, 80% compilation failure rate):

1. Timing instrumentation produces invalid Java when tests contain multi-byte UTF-8 characters

During E2E testing of Buffer.stringToUtf8 with hardcoded tests containing Unicode strings ("éñ", "世界"), the instrumented test files had corrupted statements like t len = Buffer.stringToUtf8(...) instead of int len = .... The int keyword was split across lines, producing not a statement and ';' expected compilation errors in every test method with non-ASCII string literals.

Root cause: _add_timing_instrumentation() in instrumentation.py uses tree-sitter, which returns byte offsets. These byte offsets were used directly to slice body_text, which is a Python str (Unicode). When multi-byte UTF-8 characters appear before the target statement, the byte offset is larger than the character offset, so the slice starts too late and cuts into the statement. For example, "éñ" is 4 bytes in UTF-8 but 2 characters in Python, which shifts all subsequent byte offsets by +2 relative to the character offsets.

Evidence: Reproduced with a minimal test script:
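That script is not shown here. As a stand-in, the mismatch can be sketched in a few lines of Python. This is an illustration under assumptions, not the original reproduction: the Java statement in body_text and the helper name byte_to_char_offset are hypothetical, and bytes.index() stands in for the byte offset that tree-sitter would report for the statement.

```python
def byte_to_char_offset(text, byte_offset):
    """Convert a UTF-8 byte offset into `text` to a str (code point) index.

    Assumes the offset falls on a character boundary, which holds for
    offsets of whole syntax nodes.
    """
    return len(text.encode("utf8")[:byte_offset].decode("utf8"))


# Hypothetical test-method body with a 2-character, 4-byte literal up front.
body_text = 'String s = "éñ"; int len = Buffer.stringToUtf8(s, buf, 0);'
body_bytes = body_text.encode("utf8")

# Simulated parser output: byte offset where the target statement starts.
stmt_byte_start = body_bytes.index(b"int len")

# Bug: the byte offset is used as a str index and overshoots by 2.
buggy = body_text[stmt_byte_start:]

# Fix: convert to a character offset before slicing.
stmt_start = byte_to_char_offset(body_text, stmt_byte_start)
fixed = body_text[stmt_start:]

print(buggy[:5])  # t len
print(fixed[:7])  # int len
```

The overshoot equals the number of extra UTF-8 bytes before the statement, which is why pure-ASCII tests never showed the problem.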
Fix: Convert tree-sitter byte offsets to character offsets before slicing: stmt_start = len(body_bytes[:stmt_byte_start].decode("utf8")).

2. Variable scoping error when target call is inside a variable declaration
After fixing the byte-offset bug, instrumented tests failed with "variable len might not have been initialized". The timing instrumentation wraps the target statement (int len = func()) inside a for { try { ... } } block, which moves the len declaration into the try block scope. Subsequent code referencing len (e.g., for (int i = 0; i < len; i++)) can't find it.

Fix: Added split_var_declaration() that detects local_variable_declaration AST nodes, hoists the declaration (int len = 0;) before the timing block, and converts the wrapped statement to just an assignment (len = func();). Uses default values (0, 0L, null, etc.) to satisfy Java's definite assignment rules.

3. AI-generated tests had insufficient type context, causing undeclared variable and missing import errors
During the aerospike-client-java --all run, 19 of 55 functions (35%) failed because the AI generated tests referencing undeclared variables (policy, copy, result, configProvider) and missing class imports (ClientPolicy, Builder). The type skeleton system provided insufficient context: the token budget was too low (2000 tokens), wildcard imports were silently skipped, type skeletons lacked constructor summary headers, and types referenced in the target method weren't prioritized.

Root cause analysis from the aerospike run:
- variable policy (44 errors): AI referenced policy objects without creating them
- class ClientPolicy (80 errors): AI used ClientPolicy without importing com.aerospike.client.policy.ClientPolicy
- class Builder (24 errors): AI used DynamicWriteConfig.Builder without proper import
- variable copy, copy1, copy2 (38 errors): copy operations never assigned

Fixes:
- Increased IMPORTED_SKELETON_TOKEN_BUDGET from 2000 to 4000 tokens, giving the AI more complete type information
- Added _extract_type_names_from_code() to parse the target method's AST and collect all referenced type names
- Added expand_wildcard_import() to import_resolver.py; wildcard imports like com.aerospike.client.policy.* are now expanded to individual class files, so all types in a package are available for skeleton extraction
- Added _extract_constructor_summaries() that generates one-line // Constructors: ClassName(Type1 param1, Type2 param2) headers at the top of each skeleton, making constructor signatures unambiguous for the AI

4. Existing test instrumentation silently overwrites generated test files (path collision)
During the aerospike run, generated tests and existing tests both used the __perfinstrumented suffix, causing file path collisions. When a function had both generated and existing tests, the existing test instrumentation at line ~1940 in function_optimizer.py overwrote the generated test file, silently destroying generated test content.

Evidence from Fibonacci validation:

Fix: Existing tests now use distinct suffixes: __existing_perfinstrumented / __existing_perfonlyinstrumented. Added class name replacement in the generated Java source to keep the file name and class name in sync (Java requirement). Updated the leftover file cleanup regex in optimizer.py to match the new __existing_ prefix variant.

5. Leftover instrumented test files from previous runs cause cascading compilation failures
In multi-function --all runs, a broken instrumented test file from function N persists in the test directory and causes Maven compilation failures for function N+1 (since Maven compiles all test files together). This cascading effect can turn a single bad test into 100% failure for all subsequent functions.

Fix: Added a safety-net cleanup step at the start of each function's optimization cycle in optimizer.py. Before each function is optimized, find_leftover_instrumented_test_files() is called to detect and remove any stale *__perfinstrumented* and *__existing_perfinstrumented* files from the test root.

Code changes
- instrumentation.py: Byte-to-character offset conversion in build_instrumented_body(). Added split_var_declaration() helper for variable hoisting with default value initialization. Applied to both single-range and multi-range branches.
- context.py: Added _extract_type_names_from_code() for type prioritization via tree-sitter AST. Added _extract_constructor_summaries() for unambiguous constructor headers. Added priority sorting so target-method types get skeletons first.
- import_resolver.py: Added expand_wildcard_import(), which resolves com.example.* to individual .java files in the package directory, enabling skeleton extraction for all types in wildcard-imported packages.
- function_optimizer.py: Existing tests use the __existing_perfinstrumented / __existing_perfonlyinstrumented suffixes. Added class name fixup for Java file/class name consistency.
- optimizer.py: Pre-iteration cleanup of leftover instrumented test files; cleanup pattern updated to match the __existing_ prefix.
- test_context.py

Testing
- int len = 0; before timing block, len = func(); inside try

Known remaining issues
- The perfonlyinstrumented variant strips assertions via transform_java_assertions, but leaves behind empty for loops (for (int i = 0; i < len; i++) {}) that reference the hoisted variable. These compile correctly now but are dead code. A cleanup pass could remove them.
- codeflash-internal PR for the hardcoded test and prompt updates.
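
The hoisting from problem 2, which is also what lets the leftover empty for loops compile, can be sketched in Python. This is a simplified, regex-based illustration and not the PR's split_var_declaration(), which works on tree-sitter local_variable_declaration nodes. The function name hoist_declaration, the regex, and the default-value entries beyond 0, 0L and null are assumptions made for the example.

```python
import re

# Assumed default initial values per Java primitive type; any other type
# is treated as a reference type and initialized to null.
_DEFAULTS = {
    "byte": "0", "short": "0", "int": "0", "long": "0L",
    "float": "0.0f", "double": "0.0", "boolean": "false", "char": "'\\0'",
}

# Leading keywords that a regex could mistake for a type name.
_NOT_TYPES = {"return", "throw"}

# Matches simple single-variable declarations such as `int len = func();`.
_DECL = re.compile(
    r"^\s*(?P<type>[\w.<>\[\]]+)\s+(?P<name>\w+)\s*=(?!=)\s*(?P<init>.+?)\s*;\s*$",
    re.DOTALL,
)


def hoist_declaration(stmt):
    """Split `T x = expr;` into (`T x = <default>;`, `x = expr;`), else None."""
    m = _DECL.match(stmt)
    if m is None or m.group("type") in _NOT_TYPES:
        return None
    default = _DEFAULTS.get(m.group("type"), "null")
    declaration = f"{m.group('type')} {m.group('name')} = {default};"
    assignment = f"{m.group('name')} = {m.group('init')};"
    return declaration, assignment


decl, assign = hoist_declaration("int len = Buffer.stringToUtf8(s, buf, 0);")
print(decl)    # int len = 0;
print(assign)  # len = Buffer.stringToUtf8(s, buf, 0);
```

The declaration goes before the timing block and the assignment stays inside the try, so later code that reads the variable still sees it in scope and definitely assigned.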