fix: resolve 4 critical Java E2E pipeline bugs #1514

Merged
mashraf-222 merged 7 commits into omni-java from fix/java-e2e-critical-bugs on Feb 18, 2026
@mashraf-222

Summary

Fixes 4 critical bugs in the Java E2E optimization pipeline discovered during bug hunting against the aerospike-client-java project.

  • ANSI escape codes causing Rich console hangs — Maven color output was being interpreted as Rich markup, causing terminal freezes
  • SQLite test result files not found crash — When all behavioral tests fail, instrumentation hooks never fire, so SQLite files don't exist; the code crashed trying to compare them
  • AI hallucinating classes/methods in test generation — Java testgen context only contained target code + same-file helpers, lacking imported type definitions that Python already provides

Problems fixed

  1. Rich console hang: Maven ANSI escape codes (color output) were being parsed by Rich's highlighter, causing the console to hang or produce garbled output during long Maven operations.

  2. FileNotFoundError on correctness verification: After running candidate behavioral tests, the code went directly to SQLite file comparison with no early exit if all tests failed. Python uses in-memory comparison (no file dependency), but Java/JS require SQLite files that only exist when test instrumentation hooks fire — which doesn't happen when tests error out in setUp.

  3. Hallucinated constructors/methods: Java testgen context only passed target code + same-file helpers. Unlike Python (which uses get_imported_class_definitions() to extract full class bodies from project modules), Java had no mechanism to provide imported type information, forcing the AI to guess constructor signatures and method APIs.

Root causes

  1. Rich's default highlighter interprets ANSI sequences in console output; Maven outputs colored text by default in non-batch mode.
  2. The non-Python correctness verification path assumed SQLite files would always exist after behavioral tests, but instrumentation hooks only fire when tests actually execute their test bodies.
  3. get_code_optimization_context_for_language() only assembled target code and same-file helpers for Java — no equivalent to Python's get_imported_class_definitions() existed.

Solutions implemented

  1. NullHighlighter on all Rich Console and RichHandler instances + -B (batch mode) flag on all Maven subprocess commands to suppress ANSI output at the source.

  2. Early exit guard before SQLite comparison: checks get_test_pass_fail_report_by_type() and returns get_results_not_matched_error() when total passed tests is 0. Mirrors Python's compare_test_results() which returns (False, []) for empty results.

  3. New get_java_imported_type_skeletons() function that resolves project-internal imports via JavaImportResolver, extracts class declarations + constructors + fields + public method signatures using _extract_type_skeleton(), and appends them to testgen context. Added imported_type_skeletons field to CodeContext dataclass and threaded it through to CodeStringsMarkdown for testgen.

Code changes

| File | Change |
|------|--------|
| codeflash/cli_cmds/console.py | NullHighlighter on Console + RichHandler |
| codeflash/cli_cmds/logging_config.py | NullHighlighter on both RichHandlers |
| codeflash/languages/java/test_runner.py | -B flag on 4 Maven commands |
| codeflash/languages/java/build_tools.py | -B flag on 4 Maven commands |
| codeflash/optimization/function_optimizer.py | Early exit when 0 behavioral tests pass (non-Python) |
| codeflash/languages/java/context.py | New get_java_imported_type_skeletons() + helpers |
| codeflash/languages/base.py | imported_type_skeletons field on CodeContext |
| codeflash/context/code_context_extractor.py | Append skeletons to testgen context |

Testing

E2E validation (both pass):

cd code_to_optimize/java/
CODEFLASH_CFAPI_SERVER=local CODEFLASH_AIS_SERVER=local uv run codeflash --file src/main/java/com/example/Fibonacci.java --function fibonacci --no-pr --verbose
CODEFLASH_CFAPI_SERVER=local CODEFLASH_AIS_SERVER=local uv run codeflash --file src/main/java/com/example/Calculator.java --function calculateStats --no-pr --verbose
  • Fibonacci: optimization found, tests pass, correctness verified, 973x speedup
  • Calculator.calculateStats: optimization found, tests pass, correctness verified, 3.6x speedup (exercises imported type resolution via MathHelpers dependency)

Unit tests: 72/72 Java context tests pass; 545 total pass (33 pre-existing failures in unrelated areas)

Impact

  • Unblocks Java E2E optimization on real-world projects with Maven builds
  • Prevents crashes when generated tests fail during setUp (common with infrastructure-dependent code)
  • Reduces AI hallucination in test generation by providing real type signatures

Related

Companion PR in codeflash-internal: prompt updates for Bug 2 (mock infrastructure deps) and Bug 3 (use provided type signatures)

🤖 Generated with Claude Code

mashraf-222 and others added 3 commits February 18, 2026 02:33
… Maven output

Add NullHighlighter to Rich Console and RichHandler instances to prevent
ANSI escape codes in Maven output from being interpreted as Rich markup.
Add -B (batch mode) flag to all Maven commands to suppress ANSI color
output at the source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
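
A small sketch of the two measures in this commit, assuming Rich is installed. `make_plain_console` and `maven_command` are hypothetical helper names chosen for illustration; only `Console`, `NullHighlighter`, and Maven's `-B` flag are real.

```python
from __future__ import annotations

from typing import List


def make_plain_console():
    """Build a Rich Console whose highlighter leaves text untouched."""
    # Imported lazily so maven_command() stays usable without Rich installed.
    from rich.console import Console
    from rich.highlighter import NullHighlighter

    return Console(highlighter=NullHighlighter())


def maven_command(*goals: str) -> List[str]:
    """Return an argv list for Maven with batch mode enabled."""
    # -B (--batch-mode) runs Maven non-interactively, which also turns off
    # its colored output, so fewer ANSI sequences reach the console at all.
    return ["mvn", "-B", *goals]


print(maven_command("clean", "test"))
```

`RichHandler` accepts the same `highlighter` argument, so a logging handler can be configured the same way. The resulting list can be passed to `subprocess.run` unchanged.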
… behavioral tests fail

When all behavioral tests fail for a Java/JS optimization candidate,
skip the SQLite file comparison that would crash with FileNotFoundError.
SQLite result files only exist when test instrumentation hooks fire,
which doesn't happen when tests error out in setUp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nt hallucination

Add get_java_imported_type_skeletons() that resolves project-internal imports,
extracts class declarations, fields, constructors, and public method signatures,
and appends them to the testgen context. This gives the AI real type information
instead of forcing it to hallucinate constructors and factory methods.

Follows the same pattern as Python's get_imported_class_definitions().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
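
To make the idea of a type skeleton concrete, here is a deliberately simplified sketch. It swaps the parser-based extraction described in this PR for a plain line scan, so it only handles one-line declarations; `public_method_signatures` and `format_skeleton` are hypothetical names, not the functions added in context.py.

```python
from __future__ import annotations

from typing import List

SOURCE = """\
public class MathHelper {
    private int scale;

    public MathHelper(int scale) {
        this.scale = scale;
    }

    public int add(int a, int b) {
        return a + b;
    }

    public static <T> List<T> wrap(T value) {
        return List.of(value);
    }

    protected void reset() {}

    int hidden() { return 0; }
}
"""


def public_method_signatures(java_source: str, class_name: str) -> List[str]:
    """Collect one-line public method headers, leaving out constructors."""
    signatures = []
    for raw_line in java_source.splitlines():
        line = raw_line.strip()
        if not line.startswith("public ") or "(" not in line:
            continue  # not public, or not a method-like declaration
        head = line.split("{", 1)[0].split(";", 1)[0].strip()
        name = head.split("(", 1)[0].split()[-1]
        if name == class_name:
            continue  # constructors are not method signatures
        signatures.append(head)
    return signatures


def format_skeleton(class_name: str, signatures: List[str]) -> str:
    """Render the signatures as a body-less class for use as prompt context."""
    body = "\n".join(f"    {sig};" for sig in signatures)
    return f"public class {class_name} {{\n{body}\n}}"


print(format_skeleton("MathHelper", public_method_signatures(SOURCE, "MathHelper")))
```

A line scan like this would misread multi-line declarations and public fields initialized by a method call, which is one reason a real parser is the better fit for production use.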

@misrasaurabh1 misrasaurabh1 left a comment

approved but add tests please

This optimization achieves a **12% runtime improvement** by reducing redundant string operations and minimizing encode/decode overhead in Java code parsing.

**Key optimizations:**

1. **Reduced repeated `strip().splitlines()` calls**: The original code called `skeleton.fields_code.strip().splitlines()` on every loop iteration. The optimized version hoists this computation outside the loop, performing it once and reusing the result. Same for `constructors_code`. This eliminates redundant string processing.

2. **Single-pass child traversal with byte operations**: In `_extract_public_method_signatures`, the original code made two separate passes over `node.children` - first to find modifiers, then to collect signature parts. The optimized version combines these into a single pass, checking modifiers and accumulating signature bytes simultaneously.

3. **Direct byte comparison**: Instead of decoding modifier text to check `if "public" in mod_text` (which requires UTF-8 decode + string search), the optimization checks `if pub_token in mod_slice` directly on bytes. This avoids unnecessary decode operations.

4. **Deferred decoding with byte accumulation**: Rather than decoding each child's bytes immediately and joining decoded strings (`sig_parts.append(...decode("utf8"))`), the optimized code accumulates raw byte slices and performs a single `b" ".join(...).decode("utf8")` at the end. This reduces allocation overhead from multiple intermediate string objects.
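
Items 2 to 4 can be illustrated with a self-contained sketch. The `Child` record below is a hypothetical stand-in for a parser node (a type plus a byte range); the real function works on AST nodes, and the node type names used here are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    type: str
    start_byte: int
    end_byte: int


def span(source: bytes, text: bytes, kind: str) -> Child:
    """Helper for the example: locate text in source and wrap it as a Child."""
    start = source.index(text)
    return Child(kind, start, start + len(text))


def signature_if_public(source: bytes, children: List[Child]) -> Optional[str]:
    """Single pass: check for b"public" and gather signature bytes together."""
    pub_token = b"public"
    is_public = False
    parts: List[bytes] = []
    for child in children:
        if child.type == "block":
            break  # the method body is not part of the signature
        chunk = source[child.start_byte:child.end_byte]
        if child.type == "modifiers" and pub_token in chunk:
            is_public = True  # bytes-level check, no decode needed
        parts.append(chunk)
    if not is_public:
        return None
    return b" ".join(parts).decode("utf8")  # one decode at the very end


src = b"public int add(int a, int b) { return a + b; }"
nodes = [
    span(src, b"public", "modifiers"),
    span(src, b"int", "type"),
    span(src, b"add", "identifier"),
    span(src, b"(int a, int b)", "formal_parameters"),
    span(src, b"{ return a + b; }", "block"),
]
print(signature_if_public(src, nodes))  # public int add (int a, int b)
```

Joining the raw slices with a single space means the name and parameter list are separated by a space in the output; whether the real function normalizes that spacing is not stated in this PR.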

**Performance impact:**

The large-scale test (1000 fields/constructors/methods) shows the strongest improvement: **1.30ms → 1.15ms (12.9% faster)**. This demonstrates the optimization scales well with code size, as the benefits of reducing redundant operations compound with larger inputs. The smaller test cases show minor variations (some slightly slower, some faster) as the overhead savings are more significant for larger workloads.

**Why it's faster:**

- Fewer string allocations and deallocations
- Reduced UTF-8 encode/decode operations (Python strings ↔ bytes conversions are expensive)
- Single traversal of AST children instead of two passes
- Minimized repeated string method calls (`strip()`, `splitlines()`)

The optimization maintains identical behavior while leveraging Python's efficient byte operations and reducing unnecessary string conversions that dominated the original implementation's runtime in the line profiler (14.5% time in decode operations alone).
@codeflash-ai
codeflash-ai bot commented Feb 18, 2026

⚡️ Codeflash found optimizations for this PR

📄 13% (0.13x) speedup for _format_skeleton_for_context in codeflash/languages/java/context.py

⏱️ Runtime : 1.32 milliseconds → 1.17 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/java-e2e-critical-bugs).

@codeflash-ai
codeflash-ai bot commented Feb 18, 2026

⚡️ Codeflash found optimizations for this PR

📄 318% (3.18x) speedup for _extract_public_method_signatures in codeflash/languages/java/context.py

⏱️ Runtime : 5.37 milliseconds → 1.28 milliseconds (best of 7 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/java-e2e-critical-bugs).

…2026-02-18T03.24.51

⚡️ Speed up function `_format_skeleton_for_context` by 13% in PR #1514 (`fix/java-e2e-critical-bugs`)
@codeflash-ai
codeflash-ai bot commented Feb 18, 2026

Add 13 tests covering:
- get_java_imported_type_skeletons(): internal import resolution,
  method signature extraction, external import filtering, deduplication,
  empty input handling, and token budget enforcement
- _extract_public_method_signatures(): public method extraction,
  constructor exclusion, empty class handling, class name filtering
- _format_skeleton_for_context(): basic class formatting, enum
  constants, empty class edge case

Also resolve merge conflict from PR #1515 optimization (bytes-based
single-pass method signature extraction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mashraf-222
Added 13 unit tests covering get_java_imported_type_skeletons(), _extract_public_method_signatures(), and _format_skeleton_for_context(). Tests use the existing java_maven fixture project (Calculator importing MathHelper/Formatter). All 85 context tests pass.

Bug 4 (candidate_early_exit.py - 6 tests):
- All tests failed → 0 total passed (guard triggers)
- Some tests passed → nonzero (guard does not trigger)
- Empty results → 0 passed (guard triggers)
- Only non-loop1 results → ignored by report (guard triggers)
- Mixed test types all failing → 0 across all types
- Single passing among many failures → prevents early exit

Bug 3 edge cases (context.py - 8 tests):
- Wildcard imports are skipped (class_name=None)
- Import to nonexistent class returns None skeleton
- Skeleton output is well-formed Java (has braces)
- Protected and package-private methods excluded
- Overloaded public methods all extracted
- Generic method signatures extracted correctly
- Round-trip: _extract_type_skeleton → _format_skeleton_for_context
- Round-trip with real MathHelper fixture file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mashraf-222
Added 14 more tests addressing coverage gaps:

Bug 4 — early exit guard (6 tests in test_candidate_early_exit.py):

  • All tests failed / empty results / non-loop1 only → guard triggers (0 passed)
  • Some tests passed / single pass among failures → guard does not trigger
  • Mixed test types all failing → 0 across all types

Bug 3 — edge cases (8 tests in test_context.py):

  • Wildcard imports skipped (class_name=None)
  • Import to nonexistent class → None skeleton
  • Skeleton output well-formed (braces check)
  • Protected/package-private methods excluded
  • Overloaded and generic method signatures
  • Round-trip: _extract_type_skeleton → _format_skeleton_for_context with real fixture

Total: 99 tests pass across both files.
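
The import-handling cases these tests cover (wildcards skipped with class_name=None, external imports filtered out, duplicates removed) can be sketched as below. This is an illustration only: `JavaImport`, `parse_imports`, and `internal_class_imports` are hypothetical names, and the PR's JavaImportResolver may work quite differently.

```python
from __future__ import annotations

from typing import Iterator, List, NamedTuple, Optional


class JavaImport(NamedTuple):
    package: str
    class_name: Optional[str]  # None for wildcard imports


def parse_imports(java_source: str) -> Iterator[JavaImport]:
    """Yield one JavaImport per non-static import statement."""
    for raw_line in java_source.splitlines():
        line = raw_line.strip()
        if not line.startswith("import ") or not line.endswith(";"):
            continue
        target = line[len("import "):-1].strip()
        if target.startswith("static "):
            continue  # static imports name members, not types
        if target.endswith(".*"):
            yield JavaImport(target[:-2], None)
        else:
            package, _, class_name = target.rpartition(".")
            yield JavaImport(package, class_name)


def internal_class_imports(java_source: str, project_prefix: str) -> List[str]:
    """Return deduplicated, project-internal, non-wildcard imports in order."""
    seen: List[str] = []
    for imp in parse_imports(java_source):
        if imp.class_name is None:
            continue  # wildcard: no single class to build a skeleton for
        internal = imp.package == project_prefix or imp.package.startswith(
            project_prefix + "."
        )
        qualified = f"{imp.package}.{imp.class_name}"
        if internal and qualified not in seen:
            seen.append(qualified)
    return seen


EXAMPLE = """\
import com.example.MathHelper;
import com.example.util.*;
import java.util.List;
import com.example.MathHelper;
import com.example.Formatter;
"""
print(internal_class_imports(EXAMPLE, "com.example"))
```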

@mashraf-222 mashraf-222 merged commit 0b8284f into omni-java Feb 18, 2026
22 of 34 checks passed
@mashraf-222 mashraf-222 deleted the fix/java-e2e-critical-bugs branch February 18, 2026 10:05