fix: resolve 4 critical Java E2E pipeline bugs #1514
… Maven output
Add NullHighlighter to Rich Console and RichHandler instances to prevent ANSI escape codes in Maven output from being interpreted as Rich markup. Add -B (batch mode) flag to all Maven commands to suppress ANSI color output at the source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
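The `-B` half of this commit can be sketched as a small helper that adds the batch-mode flag to a Maven argument list before it is handed to a subprocess. This is a minimal sketch only: `with_batch_mode` is a hypothetical name, and the way the PR actually builds its Maven commands is not shown in this conversation.

```python
from typing import List


def with_batch_mode(cmd: List[str]) -> List[str]:
    """Return a copy of a Maven command with -B placed right after the executable.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if not cmd:
        return []
    # Leave the command untouched if batch mode was already requested.
    if "-B" in cmd or "--batch-mode" in cmd:
        return list(cmd)
    return [cmd[0], "-B", *cmd[1:]]


# Example: the resulting list could be passed to subprocess.run(...).
example_cmd = with_batch_mode(["mvn", "test", "-Dtest=FooTest"])
print(example_cmd)
```

The Rich half (passing a `NullHighlighter` to `Console` and `RichHandler`) is left out here to keep the sketch free of third-party dependencies.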
… behavioral tests fail
When all behavioral tests fail for a Java/JS optimization candidate, skip the SQLite file comparison that would crash with FileNotFoundError. SQLite result files only exist when test instrumentation hooks fire, which doesn't happen when tests error out in setUp.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
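The guard described in this commit amounts to counting passed tests across all test types and skipping the file comparison when that count is zero. The sketch below assumes the pass/fail report is a mapping from test type to `{"passed": n, "failed": m}`; that shape, and the names `total_passed` and `should_skip_sqlite_comparison`, are assumptions for illustration rather than the PR's actual code.

```python
from typing import Dict


def total_passed(report: Dict[str, Dict[str, int]]) -> int:
    """Sum the 'passed' counts over every test type in the report."""
    return sum(counts.get("passed", 0) for counts in report.values())


def should_skip_sqlite_comparison(report: Dict[str, Dict[str, int]]) -> bool:
    """True when no test passed, so no SQLite result file can be expected."""
    return total_passed(report) == 0


# Assumed report shape: test type -> pass/fail counts.
all_failed = {"generated": {"passed": 0, "failed": 4}, "existing": {"passed": 0, "failed": 2}}
one_passed = {"generated": {"passed": 1, "failed": 9}}

print(should_skip_sqlite_comparison(all_failed))
print(should_skip_sqlite_comparison(one_passed))
```

An empty report is treated the same as an all-failing one, so the comparison is skipped in both cases.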
…nt hallucination
Add get_java_imported_type_skeletons() that resolves project-internal imports, extracts class declarations, fields, constructors, and public method signatures, and appends them to the testgen context. This gives the AI real type information instead of forcing it to hallucinate constructors and factory methods. Follows the same pattern as Python's get_imported_class_definitions().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
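The selection step behind a function like this can be sketched as a filter over import names: skip wildcard and external imports, deduplicate, and stop once a size budget is reached. Everything in the sketch is an assumption for illustration: `select_skeletons`, the package-prefix test for "project-internal", the pre-built `skeletons` mapping, and the whitespace-based token estimate are hypothetical and do not reflect how `get_java_imported_type_skeletons()` is actually implemented.

```python
from typing import Dict, List


def select_skeletons(
    imports: List[str],
    project_prefix: str,
    skeletons: Dict[str, str],
    max_tokens: int,
) -> List[str]:
    """Pick skeleton strings for project-internal imports within a token budget."""
    seen = set()
    selected: List[str] = []
    used = 0
    for name in imports:
        if name.endswith(".*"):
            continue  # wildcard import: no single class to resolve
        if not name.startswith(project_prefix):
            continue  # external import (JDK or third-party)
        if name in seen:
            continue  # already handled this type
        seen.add(name)
        skeleton = skeletons.get(name)
        if skeleton is None:
            continue  # import points at a class that could not be resolved
        cost = len(skeleton.split())  # crude whitespace token estimate (assumption)
        if used + cost > max_tokens:
            break  # budget exhausted
        selected.append(skeleton)
        used += cost
    return selected


known = {
    "com.example.MathHelper": "public class MathHelper { public int add(int a, int b); }",
    "com.example.Big": "x " * 50,
}
wanted = [
    "java.util.List",
    "com.example.MathHelper",
    "com.example.MathHelper",
    "com.example.*",
    "com.example.Missing",
    "com.example.Big",
]
print(select_skeletons(wanted, "com.example.", known, max_tokens=20))
```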
misrasaurabh1 left a comment
approved but add tests please
This optimization achieves a **12% runtime improvement** by reducing redundant string operations and minimizing encode/decode overhead in Java code parsing.
**Key optimizations:**
1. **Reduced repeated `strip().splitlines()` calls**: The original code called `skeleton.fields_code.strip().splitlines()` on every loop iteration. The optimized version hoists this computation outside the loop, performing it once and reusing the result. Same for `constructors_code`. This eliminates redundant string processing.
2. **Single-pass child traversal with byte operations**: In `_extract_public_method_signatures`, the original code made two separate passes over `node.children` - first to find modifiers, then to collect signature parts. The optimized version combines these into a single pass, checking modifiers and accumulating signature bytes simultaneously.
3. **Direct byte comparison**: Instead of decoding modifier text to check `if "public" in mod_text` (which requires UTF-8 decode + string search), the optimization checks `if pub_token in mod_slice` directly on bytes. This avoids unnecessary decode operations.
4. **Deferred decoding with byte accumulation**: Rather than decoding each child's bytes immediately and joining decoded strings (`sig_parts.append(...decode("utf8"))`), the optimized code accumulates raw byte slices and performs a single `b" ".join(...).decode("utf8")` at the end. This reduces allocation overhead from multiple intermediate string objects.
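Optimizations 2 to 4 above can be illustrated with a minimal sketch. It assumes child nodes expose `type`, `start_byte`, and `end_byte` the way a tree-sitter node does; the `Child` dataclass, the `child` helper, and `public_signature` are hypothetical stand-ins, not the code from the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    """Hypothetical stand-in for an AST child node with byte offsets."""
    type: str
    start_byte: int
    end_byte: int


def child(source: bytes, kind: str, text: bytes) -> Child:
    """Build a Child covering the first occurrence of text in source."""
    start = source.index(text)
    return Child(kind, start, start + len(text))


def public_signature(source: bytes, children: List[Child]) -> Optional[str]:
    """Single pass: check the modifiers on bytes and collect signature slices."""
    pub_token = b"public"
    is_public = False
    parts: List[bytes] = []
    for node in children:
        if node.type == "block":
            break  # method body starts here; the signature is complete
        piece = source[node.start_byte:node.end_byte]
        if node.type == "modifiers" and pub_token in piece:
            is_public = True  # byte-level check, no decode needed
        parts.append(piece)
    if not is_public:
        return None
    # One join and one decode at the end instead of one decode per child.
    return b" ".join(parts).decode("utf8")


src = b"public int add(int a, int b) { return a + b; }"
kids = [
    child(src, "modifiers", b"public"),
    child(src, "type", b"int"),
    child(src, "identifier", b"add"),
    child(src, "formal_parameters", b"(int a, int b)"),
    child(src, "block", b"{ return a + b; }"),
]
print(public_signature(src, kids))
```

Because the slices are joined with a single space, the method name and its parameter list end up separated by a space in the result.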
**Performance impact:**
The large-scale test (1000 fields/constructors/methods) shows the strongest improvement: **1.30ms → 1.15ms (12.9% faster)**. This suggests the benefit grows with code size, since the eliminated redundant operations are repeated more often on larger inputs. The smaller test cases show only minor variations (some slightly slower, some faster), because the overhead savings only become significant for larger workloads.
**Why it's faster:**
- Fewer string allocations and deallocations
- Reduced UTF-8 encode/decode operations (Python strings ↔ bytes conversions are expensive)
- Single traversal of AST children instead of two passes
- Minimized repeated string method calls (`strip()`, `splitlines()`)
The optimization maintains identical behavior while leveraging Python's efficient byte operations and reducing unnecessary string conversions that dominated the original implementation's runtime in the line profiler (14.5% time in decode operations alone).
⚡️ Codeflash found optimizations for this PR📄 13% (0.13x) speedup for
⚡️ Codeflash found optimizations for this PR📄 318% (3.18x) speedup for
…2026-02-18T03.24.51 ⚡️ Speed up function `_format_skeleton_for_context` by 13% in PR #1514 (`fix/java-e2e-critical-bugs`)
This PR is now faster! 🚀 @mashraf-222 accepted my optimizations from:
Add 13 tests covering:
- get_java_imported_type_skeletons(): internal import resolution, method signature extraction, external import filtering, deduplication, empty input handling, and token budget enforcement
- _extract_public_method_signatures(): public method extraction, constructor exclusion, empty class handling, class name filtering
- _format_skeleton_for_context(): basic class formatting, enum constants, empty class edge case
Also resolve merge conflict from PR #1515 optimization (bytes-based single-pass method signature extraction).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added 13 unit tests covering
Bug 4 (candidate_early_exit.py - 6 tests):
- All tests failed → 0 total passed (guard triggers)
- Some tests passed → nonzero (guard does not trigger)
- Empty results → 0 passed (guard triggers)
- Only non-loop1 results → ignored by report (guard triggers)
- Mixed test types all failing → 0 across all types
- Single passing among many failures → prevents early exit
Bug 3 edge cases (context.py - 8 tests):
- Wildcard imports are skipped (class_name=None)
- Import to nonexistent class returns None skeleton
- Skeleton output is well-formed Java (has braces)
- Protected and package-private methods excluded
- Overloaded public methods all extracted
- Generic method signatures extracted correctly
- Round-trip: _extract_type_skeleton → _format_skeleton_for_context
- Round-trip with real MathHelper fixture file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added 14 more tests addressing coverage gaps: Bug 4 — early exit guard (6 tests in
Bug 3 — edge cases (8 tests in
Total: 99 tests pass across both files.
Summary
Fixes 4 critical bugs in the Java E2E optimization pipeline discovered during bug hunting against the aerospike-client-java project.
Problems fixed
Rich console hang: Maven ANSI escape codes (color output) were being parsed by Rich's highlighter, causing the console to hang or produce garbled output during long Maven operations.
FileNotFoundError on correctness verification: After running candidate behavioral tests, the code went directly to SQLite file comparison with no early exit if all tests failed. Python uses in-memory comparison (no file dependency), but Java/JS require SQLite files that only exist when test instrumentation hooks fire — which doesn't happen when tests error out in setUp.
Hallucinated constructors/methods: Java testgen context only passed target code + same-file helpers. Unlike Python (which uses `get_imported_class_definitions()` to extract full class bodies from project modules), Java had no mechanism to provide imported type information, forcing the AI to guess constructor signatures and method APIs.
Root causes
`get_code_optimization_context_for_language()` only assembled target code and same-file helpers for Java; no equivalent to Python's `get_imported_class_definitions()` existed.
Solutions implemented
`NullHighlighter` on all Rich `Console` and `RichHandler` instances + `-B` (batch mode) flag on all Maven subprocess commands to suppress ANSI output at the source.
Early exit guard before SQLite comparison: checks `get_test_pass_fail_report_by_type()` and returns `get_results_not_matched_error()` when total passed tests is 0. Mirrors Python's `compare_test_results()`, which returns `(False, [])` for empty results.
New `get_java_imported_type_skeletons()` function that resolves project-internal imports via `JavaImportResolver`, extracts class declarations + constructors + fields + public method signatures using `_extract_type_skeleton()`, and appends them to testgen context. Added `imported_type_skeletons` field to `CodeContext` dataclass and threaded it through to `CodeStringsMarkdown` for testgen.
Code changes
- `codeflash/cli_cmds/console.py`
- `codeflash/cli_cmds/logging_config.py`
- `codeflash/languages/java/test_runner.py`: `-B` flag on 4 Maven commands
- `codeflash/languages/java/build_tools.py`: `-B` flag on 4 Maven commands
- `codeflash/optimization/function_optimizer.py`
- `codeflash/languages/java/context.py`: `get_java_imported_type_skeletons()` + helpers
- `codeflash/languages/base.py`: `imported_type_skeletons` field on `CodeContext`
- `codeflash/context/code_context_extractor.py`
Testing
E2E validation (both pass):
Unit tests: 72/72 Java context tests pass; 545 total pass (33 pre-existing failures in unrelated areas)
Impact
Related
Companion PR in codeflash-internal: prompt updates for Bug 2 (mock infrastructure deps) and Bug 3 (use provided type signatures)
🤖 Generated with Claude Code