Skip to content

fix: handle multi-line function calls in behavior instrumentation#1913

Merged
HeshamHM28 merged 2 commits intomainfrom
cf-fix-multiline-instrumentation
Mar 31, 2026
Merged

fix: handle multi-line function calls in behavior instrumentation#1913
HeshamHM28 merged 2 commits intomainfrom
cf-fix-multiline-instrumentation

Conversation

@HeshamHM28
Copy link
Copy Markdown
Contributor

@HeshamHM28 HeshamHM28 commented Mar 27, 2026

Problem

wrap_target_calls_with_treesitter() in instrumentation.py processes test method bodies line-by-line, but tree-sitter byte offsets can span multiple lines. When a function call like:

String result = augmentToolInputSchema(baseSchema, propertyName,
        description, required);

gets instrumented, only the first line is replaced. The continuation line becomes an orphaned fragment:

String result = _cf_result1_1   // replacement stops here
        description, required);  // ORPHANED → "';' expected", "not a statement"

Fix

Rewrote wrap_target_calls_with_treesitter() to operate on the full body_text string with pre-computed character offsets, processing calls back-to-front — the same approach _add_timing_instrumentation already uses successfully.

Key changes:

  • Pre-compute all character offsets before any text modifications
  • Replace calls in the full body text (not individual lines) so multi-line spans are handled correctly
  • Process in reverse order so earlier offsets remain valid after later replacements

Test plan

  • Added TestMultiLineCallInstrumentation with 2 new tests:
    • Multi-line expression statement call (behavior mode)
    • Multi-line embedded call inside assertEquals (behavior mode)
  • All 46 instrumentation tests pass (44 existing + 2 new)
  • Run against Spring AI to verify the 80% skip rate is resolved

🤖 Generated with Claude Code

wrap_target_calls_with_treesitter() operated line-by-line but tree-sitter
byte offsets span multiple lines. Multi-line calls like:
  func(arg1,
       arg2);
only had the first line replaced, leaving orphaned continuation lines
that caused compilation errors (80% of Spring AI functions skipped).

Rewrote to operate on the full body text with pre-computed character
offsets, processing calls back-to-front — same approach as
_add_timing_instrumentation which never had this bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 27, 2026

Claude finished @HeshamHM28's task in 5m 3s —— View job


PR Review — fix: handle multi-line function calls in behavior instrumentation

  • Triage PR size — LARGE (116 lines production code changed)
  • Run lint and typecheck — all clean
  • Resolve stale threads — none
  • Review code
  • Duplicate detection
  • Test coverage analysis

Prek Checks

All clean — ruff (check + format) pass, mypy reports no issues.


Code Review

Approach is correct. The rewrite from line-by-line to whole-text processing with pre-computed character offsets is the right fix. The back-to-front processing invariant is sound: since calls are processed in reverse byte order, each edit only affects positions ≥ the current call's offset, leaving pre-computed offsets for earlier calls valid. This mirrors the pattern already used in _add_timing_instrumentation. All 46 tests pass.

One bug to flag:

Performance mode paths are untested for multi-line calls. Lines 385 and 417 (the precise_call_timing=False branches for expression-statement and embedded cases) are not exercised by the new tests. The new tests only cover mode="behavior". While the logic is correct (same offset approach), the motivating bug (multi-line orphaned continuation) can also occur in performance mode. A multi-line expression_statement in performance mode replaces es_start..es_end with f"{capture_stmt_with_decl} {serialize_stmt}" — the capture_stmt_with_decl includes the raw multi-line full_call text, which puts {serialize_stmt} on the continuation line. This compiles fine but is subtly different behavior than the single-line case and worth explicit regression coverage.

Fix this →

Minor observation: _infer_array_cast_type(orig_line) at line 341 only sees the first line of a multi-line call (e.g. assertArrayEquals(new int[]{...},\n func())). This is a pre-existing limitation, not a regression, and works correctly when the array literal and assertion are on the first line.


Duplicate Detection

No duplicates detected. wrap_target_calls_with_treesitter is defined only in codeflash/languages/java/instrumentation.py, and the back-to-front offset pattern it now uses is consistent with (not duplicated from) _add_timing_instrumentation in the same file.


Test Coverage

  • Coverage: 83% for instrumentation.py (up from effectively the same baseline — no regression)
  • The two new tests cover expression_statement multi-line + embedded multi-line in behavior mode
  • Lines 353–355, 385, 417 (performance mode paths in the rewritten function) remain uncovered — these are the paths most worth adding tests for, as described above

@HeshamHM28 HeshamHM28 requested a review from mashraf-222 March 27, 2026 05:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@mashraf-222 mashraf-222 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated E2E with multi-line calls in test code (both embedded in assertEquals and standalone expression statements). Behavior instrumentation handles multi-line byte ranges correctly — instrumented code compiles, runs, and produces optimization candidates. 46/46 unit tests pass.

@HeshamHM28 HeshamHM28 merged commit 944a6a9 into main Mar 31, 2026
31 of 32 checks passed
@HeshamHM28 HeshamHM28 deleted the cf-fix-multiline-instrumentation branch March 31, 2026 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants