fix: handle multi-line function calls in behavior instrumentation by HeshamHM28 · Pull Request #1913 · codeflash-ai/codeflash

HeshamHM28 · 2026-03-27T05:24:39Z

Problem

wrap_target_calls_with_treesitter() in instrumentation.py processes test method bodies line-by-line, but tree-sitter byte offsets can span multiple lines. When a function call like:

String result = augmentToolInputSchema(baseSchema, propertyName,
        description, required);

gets instrumented, only the first line is replaced. The continuation line becomes an orphaned fragment:

String result = _cf_result1_1   // replacement stops here
        description, required);  // ORPHANED → "';' expected", "not a statement"

Fix

Rewrote wrap_target_calls_with_treesitter() to operate on the full body_text string with pre-computed character offsets, processing calls back-to-front — the same approach _add_timing_instrumentation already uses successfully.

Key changes:

Pre-compute all character offsets before any text modifications
Replace calls in the full body text (not individual lines) so multi-line spans are handled correctly
Process in reverse order so earlier offsets remain valid after later replacements

Test plan

Added TestMultiLineCallInstrumentation with 2 new tests:
- Multi-line expression statement call (behavior mode)
- Multi-line embedded call inside assertEquals (behavior mode)
All 46 instrumentation tests pass (44 existing + 2 new)
Run against Spring AI to verify the 80% skip rate is resolved

🤖 Generated with Claude Code

wrap_target_calls_with_treesitter() operated line-by-line but tree-sitter byte offsets span multiple lines. Multi-line calls like: func(arg1, arg2); only had the first line replaced, leaving orphaned continuation lines that caused compilation errors (80% of Spring AI functions skipped). Rewrote to operate on the full body text with pre-computed character offsets, processing calls back-to-front — same approach as _add_timing_instrumentation which never had this bug. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-03-27T05:25:21Z

Claude finished @HeshamHM28's task in 5m 3s —— View job

PR Review — `fix: handle multi-line function calls in behavior instrumentation`

Triage PR size — LARGE (116 lines production code changed)
Run lint and typecheck — all clean
Resolve stale threads — none
Review code
Duplicate detection
Test coverage analysis

Prek Checks

All clean — ruff (check + format) pass, mypy reports no issues.

Code Review

Approach is correct. The rewrite from line-by-line to whole-text processing with pre-computed character offsets is the right fix. The back-to-front processing invariant is sound: since calls are processed in reverse byte order, each edit only affects positions ≥ the current call's offset, leaving pre-computed offsets for earlier calls valid. This mirrors the pattern already used in _add_timing_instrumentation. All 46 tests pass.

One bug to flag:

Performance mode paths are untested for multi-line calls. Lines 385 and 417 (the precise_call_timing=False branches for expression-statement and embedded cases) are not exercised by the new tests. The new tests only cover mode="behavior". While the logic is correct (same offset approach), the motivating bug (multi-line orphaned continuation) can also occur in performance mode. A multi-line expression_statement in performance mode replaces es_start..es_end with f"{capture_stmt_with_decl} {serialize_stmt}" — the capture_stmt_with_decl includes the raw multi-line full_call text, which puts {serialize_stmt} on the continuation line. This compiles fine but is subtly different behavior than the single-line case and worth explicit regression coverage.

Fix this →

Minor observation: _infer_array_cast_type(orig_line) at line 341 only sees the first line of a multi-line call (e.g. assertArrayEquals(new int[]{...},\n func())). This is a pre-existing limitation, not a regression, and works correctly when the array literal and assertion are on the first line.

Duplicate Detection

No duplicates detected. wrap_target_calls_with_treesitter is defined only in codeflash/languages/java/instrumentation.py, and the back-to-front offset pattern it now uses is consistent with (not duplicated from) _add_timing_instrumentation in the same file.

Test Coverage

Coverage: 83% for instrumentation.py (up from effectively the same baseline — no regression)
The two new tests cover expression_statement multi-line + embedded multi-line in behavior mode
Lines 353–355, 385, 417 (performance mode paths in the rewritten function) remain uncovered — these are the paths most worth adding tests for, as described above

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mashraf-222

Validated E2E with multi-line calls in test code (both embedded in assertEquals and standalone expression statements). Behavior instrumentation handles multi-line byte ranges correctly — instrumented code compiles, runs, and produces optimization candidates. 46/46 unit tests pass.

HeshamHM28 requested a review from mashraf-222 March 27, 2026 05:26

test: use full string equality for multi-line instrumentation tests

94d9ff9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mashraf-222 approved these changes Mar 31, 2026

View reviewed changes

HeshamHM28 merged commit 944a6a9 into main Mar 31, 2026
31 of 32 checks passed

HeshamHM28 deleted the cf-fix-multiline-instrumentation branch March 31, 2026 07:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle multi-line function calls in behavior instrumentation#1913

fix: handle multi-line function calls in behavior instrumentation#1913
HeshamHM28 merged 2 commits intomainfrom
cf-fix-multiline-instrumentation

HeshamHM28 commented Mar 27, 2026 •

edited

Loading

Uh oh!

claude bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

mashraf-222 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

HeshamHM28 commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Test plan

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — fix: handle multi-line function calls in behavior instrumentation

Prek Checks

Code Review

Duplicate Detection

Test Coverage

Uh oh!

mashraf-222 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HeshamHM28 commented Mar 27, 2026 •

edited

Loading

claude bot commented Mar 27, 2026 •

edited

Loading

PR Review — `fix: handle multi-line function calls in behavior instrumentation`