fix: resolve 4 critical Java E2E pipeline bugs by mashraf-222 · Pull Request #1514 · codeflash-ai/codeflash

mashraf-222 · 2026-02-18T03:02:22Z

Summary

Fixes 4 critical bugs in the Java E2E optimization pipeline discovered during bug hunting against the aerospike-client-java project.

ANSI escape codes causing Rich console hangs: Maven color output was being interpreted as Rich markup, causing terminal freezes
Crash when SQLite test result files are missing: when all behavioral tests fail, instrumentation hooks never fire, so the SQLite files don't exist, and the code crashed trying to compare them
AI hallucinating classes/methods in test generation: Java testgen context only contained target code + same-file helpers, lacking the imported type definitions that Python already provides

Problems fixed

Rich console hang: Maven ANSI escape codes (color output) were being parsed by Rich's highlighter, causing the console to hang or produce garbled output during long Maven operations.
FileNotFoundError on correctness verification: After running candidate behavioral tests, the code went directly to SQLite file comparison with no early exit if all tests failed. Python uses in-memory comparison (no file dependency), but Java/JS require SQLite files that only exist when test instrumentation hooks fire — which doesn't happen when tests error out in setUp.
Hallucinated constructors/methods: Java testgen context only passed target code + same-file helpers. Unlike Python (which uses get_imported_class_definitions() to extract full class bodies from project modules), Java had no mechanism to provide imported type information, forcing the AI to guess constructor signatures and method APIs.

Root causes

Rich's default highlighter interprets ANSI sequences in console output; Maven outputs colored text by default in non-batch mode.
The non-Python correctness verification path assumed SQLite files would always exist after behavioral tests, but instrumentation hooks only fire when tests actually execute their test bodies.
get_code_optimization_context_for_language() only assembled target code and same-file helpers for Java — no equivalent to Python's get_imported_class_definitions() existed.

Solutions implemented

NullHighlighter on all Rich Console and RichHandler instances + -B (batch mode) flag on all Maven subprocess commands to suppress ANSI output at the source.
Early exit guard before SQLite comparison: checks get_test_pass_fail_report_by_type() and returns get_results_not_matched_error() when total passed tests is 0. Mirrors Python's compare_test_results() which returns (False, []) for empty results.
New get_java_imported_type_skeletons() function that resolves project-internal imports via JavaImportResolver, extracts class declarations + constructors + fields + public method signatures using _extract_type_skeleton(), and appends them to testgen context. Added imported_type_skeletons field to CodeContext dataclass and threaded it through to CodeStringsMarkdown for testgen.
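To make the skeleton idea concrete, here is a minimal Python sketch of rendering one imported type as a Java-like stub (declaration, fields, constructor and public method signatures, no bodies) and appending stubs to the testgen context under a size budget. `TypeSkeleton`, `format_skeleton`, and `append_skeletons` are hypothetical names, not the PR's `_extract_type_skeleton()` / `_format_skeleton_for_context()`, and the character budget is an assumed stand-in for the token budget the real code enforces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeSkeleton:
    """Hypothetical container for what is kept from one imported type."""

    declaration: str  # e.g. "public class MathHelper"
    fields: List[str] = field(default_factory=list)  # full field declarations
    constructors: List[str] = field(default_factory=list)  # signatures only
    methods: List[str] = field(default_factory=list)  # public signatures only


def format_skeleton(s: TypeSkeleton) -> str:
    """Render a skeleton as a Java-like stub with all bodies omitted."""
    lines = [s.declaration + " {"]
    lines += ["    " + f for f in s.fields]
    lines += ["    " + sig + ";" for sig in s.constructors + s.methods]
    lines.append("}")
    return "\n".join(lines)


def append_skeletons(context: str, skeletons: List[TypeSkeleton], max_chars: int) -> str:
    """Append whole skeletons to the context until max_chars would be exceeded."""
    parts = [context]
    used = len(context)
    for s in skeletons:
        block = format_skeleton(s)
        cost = len(block) + 2  # include the "\n\n" separator
        if used + cost > max_chars:
            break  # never emit a partial class
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

Dropping a whole skeleton rather than truncating it keeps every stub that does reach the model syntactically balanced, which matches the PR's "well-formed Java (has braces)" test case in spirit.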

Code changes

| File | Change |
| --- | --- |
| `codeflash/cli_cmds/console.py` | NullHighlighter on Console + RichHandler |
| `codeflash/cli_cmds/logging_config.py` | NullHighlighter on both RichHandlers |
| `codeflash/languages/java/test_runner.py` | `-B` flag on 4 Maven commands |
| `codeflash/languages/java/build_tools.py` | `-B` flag on 4 Maven commands |
| `codeflash/optimization/function_optimizer.py` | Early exit when 0 behavioral tests pass (non-Python) |
| `codeflash/languages/java/context.py` | New `get_java_imported_type_skeletons()` + helpers |
| `codeflash/languages/base.py` | `imported_type_skeletons` field on CodeContext |
| `codeflash/context/code_context_extractor.py` | Append skeletons to testgen context |

Testing

E2E validation (both pass):

```bash
cd code_to_optimize/java/
CODEFLASH_CFAPI_SERVER=local CODEFLASH_AIS_SERVER=local uv run codeflash --file src/main/java/com/example/Fibonacci.java --function fibonacci --no-pr --verbose
CODEFLASH_CFAPI_SERVER=local CODEFLASH_AIS_SERVER=local uv run codeflash --file src/main/java/com/example/Calculator.java --function calculateStats --no-pr --verbose
```

Fibonacci: optimization found, tests pass, correctness verified, 973x speedup
Calculator.calculateStats: optimization found, tests pass, correctness verified, 3.6x speedup (exercises imported type resolution via MathHelpers dependency)

Unit tests: 72/72 Java context tests pass; 545 total pass (33 pre-existing failures in unrelated areas)

Impact

Unblocks Java E2E optimization on real-world projects with Maven builds
Prevents crashes when generated tests fail during setUp (common with infrastructure-dependent code)
Reduces AI hallucination in test generation by providing real type signatures

… Maven output

Add NullHighlighter to Rich Console and RichHandler instances to prevent ANSI escape codes in Maven output from being interpreted as Rich markup. Add -B (batch mode) flag to all Maven commands to suppress ANSI color output at the source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
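The `-B` half of this fix can be sketched as a small helper that guarantees batch mode on a Maven command before it is handed to `subprocess`. `ensure_batch_mode` is a hypothetical helper, not the PR's code; it only assumes the command is an argv list whose first element is the Maven executable. (The Rich half amounts to passing `highlighter=NullHighlighter()`, from `rich.highlighter`, when constructing `Console` and `RichHandler`.)

```python
from typing import List, Sequence

# Maven's short and long spellings of batch (non-interactive) mode.
BATCH_FLAGS = frozenset({"-B", "--batch-mode"})


def ensure_batch_mode(cmd: Sequence[str]) -> List[str]:
    """Return a copy of a Maven argv list with batch mode enabled.

    -B is inserted right after the executable (e.g. "mvn" or "./mvnw")
    unless either spelling of the flag is already present.
    """
    if not cmd:
        raise ValueError("empty Maven command")
    args = list(cmd)
    if any(arg in BATCH_FLAGS for arg in args[1:]):
        return args
    return [args[0], "-B", *args[1:]]
```

A caller would then run something like `subprocess.run(ensure_batch_mode(["mvn", "test"]), capture_output=True, text=True)`. Centralizing the flag in one helper avoids having to remember it at each of the eight call sites listed in the table.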

… behavioral tests fail

When all behavioral tests fail for a Java/JS optimization candidate, skip the SQLite file comparison that would crash with FileNotFoundError. SQLite result files only exist when test instrumentation hooks fire, which doesn't happen when tests error out in setUp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
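A minimal sketch of the guard's logic, assuming a simplified result record: count passing results per test type, considering only loop 1 (matching the PR's test case that non-loop1 results are ignored by the report), and skip the SQLite comparison when the total is zero. `BehavioralResult`, `pass_counts_by_type`, and `should_skip_sqlite_comparison` are hypothetical; the real code consults `get_test_pass_fail_report_by_type()` and returns `get_results_not_matched_error()`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BehavioralResult:
    """Hypothetical, simplified record of one behavioral test invocation."""

    test_type: str  # e.g. "existing" or "generated"
    did_pass: bool
    loop_index: int  # only loop 1 counts toward the pass/fail report


def pass_counts_by_type(results: Iterable[BehavioralResult]) -> Dict[str, int]:
    """Count passing loop-1 results per test type."""
    counts: Counter = Counter()
    for r in results:
        if r.loop_index == 1 and r.did_pass:
            counts[r.test_type] += 1
    return dict(counts)


def should_skip_sqlite_comparison(results: Iterable[BehavioralResult]) -> bool:
    """True when no behavioral test passed.

    In that case the SQLite result files may not exist, so the candidate is
    treated as "results not matched" instead of reading them.
    """
    return sum(pass_counts_by_type(results).values()) == 0
```

This mirrors how Python's `compare_test_results()` returns `(False, [])` for empty results: an empty or all-failing run is a rejection, not an error.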

…nt hallucination

Add get_java_imported_type_skeletons() that resolves project-internal imports, extracts class declarations, fields, constructors, and public method signatures, and appends them to the testgen context. This gives the AI real type information instead of forcing it to hallucinate constructors and factory methods. Follows the same pattern as Python's get_imported_class_definitions().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
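The import-filtering step can be sketched as below, assuming the set of fully qualified class names defined in the project has already been collected (for example by walking `src/main/java`). It skips wildcard and static imports, drops anything not in the project set (JDK and third-party types), and deduplicates while preserving order. `resolve_internal_imports` is hypothetical and regex-based; the PR resolves imports with `JavaImportResolver` instead.

```python
import re
from typing import List, Set

# Matches "import [static] a.b.C;" and "import a.b.*;" at the start of a line.
IMPORT_RE = re.compile(r"^\s*import\s+(static\s+)?([\w.]+?)(\.\*)?\s*;", re.MULTILINE)


def resolve_internal_imports(source: str, project_classes: Set[str]) -> List[str]:
    """Return project-internal imported class names, in first-seen order."""
    seen: Set[str] = set()
    resolved: List[str] = []
    for match in IMPORT_RE.finditer(source):
        is_static, name, wildcard = match.groups()
        if is_static or wildcard:
            # Static imports name members; wildcards name no single class.
            continue
        if name not in project_classes:
            # External (JDK / third-party) type: no skeleton to extract.
            continue
        if name not in seen:
            seen.add(name)
            resolved.append(name)
    return resolved
```

Nested-class imports such as `a.b.Outer.Inner` are not handled by this sketch; they would simply be dropped unless listed in `project_classes`.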

codeflash/languages/java/context.py

misrasaurabh1

approved but add tests please

This optimization achieves a **12% runtime improvement** by reducing redundant string operations and minimizing encode/decode overhead in Java code parsing.

**Key optimizations:**

1. **Reduced repeated `strip().splitlines()` calls**: The original code called `skeleton.fields_code.strip().splitlines()` on every loop iteration. The optimized version hoists this computation outside the loop, performing it once and reusing the result. Same for `constructors_code`. This eliminates redundant string processing.
2. **Single-pass child traversal with byte operations**: In `_extract_public_method_signatures`, the original code made two separate passes over `node.children`: first to find modifiers, then to collect signature parts. The optimized version combines these into a single pass, checking modifiers and accumulating signature bytes simultaneously.
3. **Direct byte comparison**: Instead of decoding modifier text to check `if "public" in mod_text` (which requires UTF-8 decode + string search), the optimization checks `if pub_token in mod_slice` directly on bytes. This avoids unnecessary decode operations.
4. **Deferred decoding with byte accumulation**: Rather than decoding each child's bytes immediately and joining decoded strings (`sig_parts.append(...decode("utf8"))`), the optimized code accumulates raw byte slices and performs a single `b" ".join(...).decode("utf8")` at the end. This reduces allocation overhead from multiple intermediate string objects.

**Performance impact:** The large-scale test (1000 fields/constructors/methods) shows the strongest improvement: **1.30ms → 1.15ms (12.9% faster)**. This demonstrates the optimization scales well with code size, as the benefits of reducing redundant operations compound with larger inputs. The smaller test cases show minor variations (some slightly slower, some faster) because the overhead savings matter more for larger workloads.

**Why it's faster:**

- Fewer string allocations and deallocations
- Reduced UTF-8 encode/decode operations (Python strings ↔ bytes conversions are expensive)
- Single traversal of AST children instead of two passes
- Minimized repeated string method calls (`strip()`, `splitlines()`)

The optimization maintains identical behavior while leveraging Python's efficient byte operations and reducing unnecessary string conversions, which the line profiler showed as a significant cost in the original implementation (14.5% of time in decode operations alone).
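Points 2 to 4 above can be illustrated with a small, self-contained sketch: one pass over a method's children that checks the modifiers and collects signature pieces as raw byte slices, decoding once at the end. The `Node` dataclass is a hypothetical stand-in for a tree-sitter node (it mirrors the `type`, `start_byte`, `end_byte`, and `children` attributes tree-sitter nodes expose), and `public_signature` is illustrative, not the merged `_extract_public_method_signatures`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""

    type: str
    start_byte: int
    end_byte: int
    children: List["Node"] = field(default_factory=list)


PUBLIC = b"public"


def public_signature(method: Node, source: bytes) -> Optional[str]:
    """Return the text before the body if the method is public, else None.

    Single pass: modifiers are checked and signature parts are collected in
    the same loop, on raw bytes; decoding happens once at the end.
    """
    is_public = False
    parts: List[bytes] = []
    for child in method.children:
        if child.type == "block":
            break  # reached the method body; the signature is complete
        chunk = source[child.start_byte:child.end_byte]
        if child.type == "modifiers" and PUBLIC in chunk:
            is_public = True  # bytes substring check, no decode needed
        parts.append(chunk)
    if not is_public:
        return None
    return b" ".join(parts).decode("utf8")
```

Stopping at the `block` child means the method body is never sliced or decoded, and non-public methods cost no decode at all.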

codeflash-ai · 2026-02-18T03:25:00Z

⚡️ Codeflash found optimizations for this PR

📄 13% (0.13x) speedup for `_format_skeleton_for_context` in `codeflash/languages/java/context.py`

⏱️ Runtime : 1.32 milliseconds → 1.17 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _format_skeleton_for_context by 13% in PR #1514 (fix/java-e2e-critical-bugs) #1515

If you approve, it will be merged into this PR (branch fix/java-e2e-critical-bugs).

codeflash-ai · 2026-02-18T03:30:49Z

⚡️ Codeflash found optimizations for this PR

📄 318% (3.18x) speedup for `_extract_public_method_signatures` in `codeflash/languages/java/context.py`

⏱️ Runtime : 5.37 milliseconds → 1.28 milliseconds (best of 7 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function _extract_public_method_signatures by 318% in PR #1514 (fix/java-e2e-critical-bugs) #1516

If you approve, it will be merged into this PR (branch fix/java-e2e-critical-bugs).

…2026-02-18T03.24.51 ⚡️ Speed up function `_format_skeleton_for_context` by 13% in PR #1514 (`fix/java-e2e-critical-bugs`)

codeflash-ai · 2026-02-18T09:51:58Z

This PR is now faster! 🚀 @mashraf-222 accepted my optimizations from:

⚡️ Speed up function _format_skeleton_for_context by 13% in PR #1514 (fix/java-e2e-critical-bugs) #1515

Add 13 tests covering:

- get_java_imported_type_skeletons(): internal import resolution, method signature extraction, external import filtering, deduplication, empty input handling, and token budget enforcement
- _extract_public_method_signatures(): public method extraction, constructor exclusion, empty class handling, class name filtering
- _format_skeleton_for_context(): basic class formatting, enum constants, empty class edge case

Also resolve merge conflict from PR #1515 optimization (bytes-based single-pass method signature extraction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-02-18T09:55:09Z

Added 13 unit tests covering get_java_imported_type_skeletons(), _extract_public_method_signatures(), and _format_skeleton_for_context(). Tests use the existing java_maven fixture project (Calculator importing MathHelper/Formatter). All 85 context tests pass.

Bug 4 (candidate_early_exit.py - 6 tests):

- All tests failed → 0 total passed (guard triggers)
- Some tests passed → nonzero (guard does not trigger)
- Empty results → 0 passed (guard triggers)
- Only non-loop1 results → ignored by report (guard triggers)
- Mixed test types all failing → 0 across all types
- Single passing among many failures → prevents early exit

Bug 3 edge cases (context.py - 8 tests):

- Wildcard imports are skipped (class_name=None)
- Import to nonexistent class returns None skeleton
- Skeleton output is well-formed Java (has braces)
- Protected and package-private methods excluded
- Overloaded public methods all extracted
- Generic method signatures extracted correctly
- Round-trip: _extract_type_skeleton → _format_skeleton_for_context
- Round-trip with real MathHelper fixture file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-02-18T10:01:57Z

Added 14 more tests addressing coverage gaps:

Bug 4 — early exit guard (6 tests in test_candidate_early_exit.py):

All tests failed / empty results / non-loop1 only → guard triggers (0 passed)
Some tests passed / single pass among failures → guard does not trigger
Mixed test types all failing → 0 across all types

Bug 3 — edge cases (8 tests in test_context.py):

Wildcard imports skipped (class_name=None)
Import to nonexistent class → None skeleton
Skeleton output well-formed (braces check)
Protected/package-private methods excluded
Overloaded and generic method signatures
Round-trip: _extract_type_skeleton → _format_skeleton_for_context with real fixture

Total: 99 tests pass across both files.

mashraf-222 and others added 3 commits February 18, 2026 02:33

misrasaurabh1 reviewed Feb 18, 2026


codeflash/languages/java/context.py

misrasaurabh1 approved these changes Feb 18, 2026


codeflash-ai bot mentioned this pull request Feb 18, 2026

⚡️ Speed up function _format_skeleton_for_context by 13% in PR #1514 (fix/java-e2e-critical-bugs) #1515

Merged

codeflash-ai bot mentioned this pull request Feb 18, 2026

⚡️ Speed up function _extract_public_method_signatures by 318% in PR #1514 (fix/java-e2e-critical-bugs) #1516

Closed

Merge pull request #1515 from codeflash-ai/codeflash/optimize-pr1514-…

22d6559

…2026-02-18T03.24.51 ⚡️ Speed up function `_format_skeleton_for_context` by 13% in PR #1514 (`fix/java-e2e-critical-bugs`)

mashraf-222 merged commit 0b8284f into omni-java Feb 18, 2026
22 of 34 checks passed

mashraf-222 deleted the fix/java-e2e-critical-bugs branch February 18, 2026 10:05

mashraf-222 merged 7 commits into omni-java from fix/java-e2e-critical-bugs
