…functionality
- Added unit tests for sorting arrays and collecting Fibonacci numbers in FibonacciTest.java.
- Modified Comparator.java to handle test results with stdout capturing.
- Updated build_tools.py to ensure JaCoCo configuration uses a custom property for better compatibility.
- Enhanced discovery.py to include return types for methods.
- Improved instrumentation.py to capture stdout for void methods and serialize side effects.
- Updated remove_asserts.py to handle void methods correctly during assertion transformation.
- Modified CodeflashHelper.java to store captured stdout in the SQLite database.
- Enhanced parse_test_output.py to read stdout from SQLite results for better test output analysis.
- Updated function_types.py to include return type information in FunctionToOptimize model.
⚡️ Codeflash found optimizations for this PR 📄 125% (1.25x) speedup for
        parts.append(receiver)
    parts.extend(arg_exprs)
    if not parts:
        return "null"
    items = ", ".join(parts)
⚡️Codeflash found 20% (0.20x) speedup for _build_void_serialize_expr in codeflash/languages/java/instrumentation.py
⏱️ Runtime : 64.5 microseconds → 53.9 microseconds (best of 250 runs)
📝 Explanation and details
Primary benefit — runtime improved from ~64.5μs to ~53.9μs (≈19% faster). This optimization reduces Python-level work and memory allocations in the hot path, yielding measurable wall‑clock savings.
What changed, concretely
- Avoided building a combined list of strings (parts) and doing parts.append / parts.extend followed by ", ".join(parts) in the common cases.
- When a receiver should be included and there are args, the code now constructs the items string as receiver + ", " + ", ".join(arg_exprs) instead of creating a new list with receiver + args and joining that list.
- When no receiver is present we directly join arg_exprs (no intermediate list). When neither receiver nor args exist we return "null" early.
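The change listed above can be sketched roughly as follows. This is a reconstruction from the suggestion diff and the generated tests, not the actual source of `_build_void_serialize_expr`: the names `build_original`, `build_optimized`, `_include_receiver`, and `_wrap` are hypothetical, and the static-receiver check is an assumption inferred from the test comments.

```python
def _include_receiver(receiver):
    # Assumed heuristic (inferred from the generated tests): a receiver that
    # starts with an uppercase letter and is a valid identifier is treated as
    # a static class name and is not serialized.
    if not receiver:
        return False
    return not (receiver[0].isupper() and receiver.isidentifier())


def _wrap(items):
    # Doubled braces in the f-string emit the literal Java array braces.
    return f"com.codeflash.Serializer.serialize(new Object[]{{{items}}})"


def build_original(call):
    # Old shape: assemble a combined list, then join it.
    receiver = call.get("receiver", "")
    arg_exprs = call.get("arg_exprs", [])
    parts = []
    if _include_receiver(receiver):
        parts.append(receiver)
    parts.extend(arg_exprs)
    if not parts:
        return "null"
    return _wrap(", ".join(parts))


def build_optimized(call):
    # New shape: join arg_exprs directly and prepend the receiver if needed.
    receiver = call.get("receiver", "")
    arg_exprs = call.get("arg_exprs", [])
    if _include_receiver(receiver):
        if arg_exprs:
            items = receiver + ", " + ", ".join(arg_exprs)
        else:
            items = receiver
    else:
        if not arg_exprs:
            return "null"
        items = ", ".join(arg_exprs)
    return _wrap(items)
```

Both variants should return the same string for any input; only the intermediate `parts` list is gone in the second one.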
Why this is faster (Python-level reasoning)
- List append/extend and iterating to produce a new list are Python-level operations (many bytecode steps). The original code paid the cost of parts.append(receiver) and parts.extend(arg_exprs) and then iterated that combined list in the join.
- str.join runs in C and is fast, but joining required an extra Python-level list construction to include the receiver. By joining arg_exprs directly and prepending the receiver as a small concatenation (receiver + ", " + ...), we avoid that list construction and the associated iteration/copies.
- Fewer allocations and fewer Python bytecode operations matter most when arg_exprs is large or when the common path omits a receiver — the profiler and tests show the biggest wins on those cases.
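To check this reasoning on your own machine, a small `timeit` comparison along these lines may help. It is only a sketch: `join_via_parts` and `join_direct` are hypothetical stand-ins for the two code paths, the 1000-element input mirrors the large-args tests, and the measured numbers will vary by interpreter and hardware.

```python
import timeit

receiver = "r"
arg_exprs = [f"arg{i}" for i in range(1000)]


def join_via_parts():
    # Old path: build a combined list, then join it.
    parts = []
    parts.append(receiver)
    parts.extend(arg_exprs)
    return ", ".join(parts)


def join_direct():
    # New path: join the existing sequence and prepend the receiver.
    return receiver + ", " + ", ".join(arg_exprs)


# Both paths must produce the same string before timing means anything.
assert join_via_parts() == join_direct()

if __name__ == "__main__":
    runs = 2000
    for fn in (join_via_parts, join_direct):
        best = min(timeit.repeat(fn, number=runs, repeat=5))
        print(f"{fn.__name__}: {best / runs * 1e6:.2f} microseconds per call")
```

Taking the minimum over several repeats follows the usual `timeit` advice, since higher values mostly reflect interference from other processes.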
Profiler evidence
- The original profiler shows a large fraction of time spent on building and joining parts (parts.extend and ", ".join(parts)).
- The optimized profiler shows reduced time spent in the join/combination stage and fewer Python-level list operations. Overall function time drops ~13% in the profile dump and ~19% in microbenchmarks.
Behavioral impact and trade-offs
- Output semantics are preserved for all tested inputs (unit tests validate correctness).
- There is a small regression in one microcase (receiver + args with small arg lists) where string concatenation + join is marginally slower than the old path (~2.7% slower for that test). This is an acceptable trade-off given the consistent wins across most inputs — especially in the large-args and no-receiver cases which are common and where gains are largest.
- No signature or external dependency changes were made.
Which workloads benefit most
- Large arg lists: avoids creating an extra list of size N, so both memory and iteration cost drop — annotated tests and profilers show ~18–26% improvements in 1000-element cases.
- No-receiver / small-receiver cases: we skip list assembly entirely and directly join arg_exprs (common lower-cost path).
- Many short calls (hot path): reduces per-call allocation overhead, so throughput improves.
Summary
This change reduces Python-level list construction and iteration by deferring to str.join on the existing arg sequence and only doing a tiny string concat for the receiver. Fewer allocations and fewer Python operations produce the observed ~19% runtime improvement, with only a small, acceptable regression in one microcase.
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 25 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _build_void_serialize_expr


def test_no_receiver_no_args_returns_null():
    # When the call dict has neither receiver nor arg_exprs, the function should return "null".
    call = {}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 712ns -> 631ns (12.8% faster)


def test_only_static_receiver_is_excluded_and_returns_null():
    # Static receivers are recognized by starting with an uppercase letter and being a valid identifier.
    # They should be excluded, leaving no parts, thus returning "null".
    call = {"receiver": "MyClass"}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.22μs -> 1.02μs (19.7% faster)


def test_only_args_serialized_in_order():
    # When only arg_exprs are present, they should be serialized in order inside the Object[].
    call = {"arg_exprs": ["a", "b", "c"]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.26μs -> 1.08μs (16.6% faster)


def test_receiver_and_args_include_receiver_first():
    # For instance methods, the receiver should be serialized first, then the args in order.
    call = {"receiver": "thisObj", "arg_exprs": ["x", "y"]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.45μs -> 1.49μs (2.68% slower)


def test_missing_arg_exprs_key_treated_as_empty_list():
    # If 'arg_exprs' key is missing, the function should behave as if there are no args.
    call = {"receiver": "objInstance"}
    # receiver starts with lowercase so should be included
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.27μs -> 972ns (30.9% faster)


def test_uppercase_non_identifier_receiver_is_included():
    # If receiver starts with uppercase but is NOT a valid identifier, it should be included.
    # Example: "A-b" starts with uppercase 'A' but contains '-' so isidentifier() is False.
    call = {"receiver": "A-b", "arg_exprs": ["arg1"]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.69μs -> 1.49μs (13.4% faster)


def test_empty_string_arg_expr_is_included_as_empty_item():
    # An arg expression that is an empty string is a truthy element in parts and should produce
    # empty content between commas (i.e., produce brace with an empty slot), distinct from "null".
    call = {"arg_exprs": [""]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.01μs -> 822ns (23.1% faster)


def test_receiver_single_uppercase_char_excluded_results_in_null():
    # Single-char uppercase receiver like "A" is a valid identifier and should be treated as a static class name.
    call = {"receiver": "A"}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.20μs -> 1.02μs (17.8% faster)


def test_receiver_single_lowercase_char_included():
    # Single-char lowercase receiver should be included as an instance.
    call = {"receiver": "a"}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.27μs -> 1.00μs (27.0% faster)


def test_receiver_with_digits_but_starting_uppercase_is_excluded_if_identifier():
    # "MyClass1" starts with uppercase and is a valid identifier, so it should be excluded.
    call = {"receiver": "MyClass1", "arg_exprs": ["p"]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.44μs -> 1.33μs (8.18% faster)


def test_arg_expressions_with_special_characters_are_kept_untouched():
    # Arguments may contain special characters (not necessarily valid identifiers); they should be included raw.
    call = {"arg_exprs": ["obj.field", "arr[0]", "'literal'"]}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 1.23μs -> 1.06μs (16.0% faster)


def test_large_number_of_arguments_1000_elements():
    # Build a large list of argument expressions to ensure function scales and constructs the correct string.
    n = 1000
    args = [f"arg{i}" for i in range(n)]  # real list construction
    call = {"receiver": "r", "arg_exprs": args}  # receiver 'r' should be included first
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 12.8μs -> 10.8μs (18.0% faster)
    # Build expected payload the same way the function does: receiver first, then args joined by ", ".
    expected_items = ", ".join(["r"] + args)
    expected = f"com.codeflash.Serializer.serialize(new Object[]{{{expected_items}}})"


def test_large_number_of_arguments_without_receiver_1000_elements():
    # Large list without receiver should still serialize correctly.
    n = 1000
    args = [f"val{i}" for i in range(n)]
    call = {"arg_exprs": args}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 11.7μs -> 9.46μs (23.6% faster)
    expected_items = ", ".join(args)
    expected = f"com.codeflash.Serializer.serialize(new Object[]{{{expected_items}}})"

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _build_void_serialize_expr

# Basic tests (easy -> moderate)


def test_no_receiver_no_args_returns_null():
    # empty dict means no receiver and no arg_exprs -> should return literal "null"
    call = {}
    codeflash_output = _build_void_serialize_expr(call)  # 812ns -> 651ns (24.7% faster)


def test_static_class_receiver_excluded_and_no_args_returns_null():
    # Receiver looks like a static class name (starts uppercase and isidentifier) -> excluded
    call = {"receiver": "MyClass"}  # "MyClass" starts with uppercase and is a valid identifier
    # with no args, result should be "null"
    codeflash_output = _build_void_serialize_expr(call)  # 1.31μs -> 1.03μs (27.2% faster)


def test_instance_receiver_and_single_arg_included_and_ordered():
    # receiver is a normal instance name (lowercase) and there is one arg -> both included
    call = {"receiver": "obj", "arg_exprs": ["x"]}
    # expected ordering: receiver first, then arguments
    expected = "com.codeflash.Serializer.serialize(new Object[]{obj, x})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.50μs -> 1.35μs (11.1% faster)


def test_receiver_uppercase_but_not_identifier_is_included():
    # receiver starts with uppercase but is not a valid identifier (contains hyphen) -> included
    call = {"receiver": "Foo-bar", "arg_exprs": ["y"]}
    expected = "com.codeflash.Serializer.serialize(new Object[]{Foo-bar, y})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.74μs -> 1.50μs (16.0% faster)


def test_empty_receiver_string_treated_as_no_receiver():
    # empty string is falsy; receiver should not be added
    call = {"receiver": "", "arg_exprs": []}
    codeflash_output = _build_void_serialize_expr(call)  # 761ns -> 651ns (16.9% faster)


def test_args_only_are_serialized_in_order():
    # no receiver but multiple args -> only args should appear, in order
    call = {"arg_exprs": ["a", "b", "c"]}
    expected = "com.codeflash.Serializer.serialize(new Object[]{a, b, c})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.24μs -> 1.01μs (22.7% faster)


def test_arg_exprs_accept_tuple_and_are_serialized():
    # arg_exprs provided as a tuple should be extendable and work the same way
    call = {"arg_exprs": ("p1", "p2")}
    expected = "com.codeflash.Serializer.serialize(new Object[]{p1, p2})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.21μs -> 1.03μs (17.4% faster)


def test_special_characters_in_args_preserved_and_joined_correctly():
    # arguments may contain commas, brackets, quotes, etc.; function should not escape them
    call = {"arg_exprs": ['"quoted"', "arr[0]", "a,b"]}
    # note: inner commas remain; join injects ", " between items
    expected = 'com.codeflash.Serializer.serialize(new Object[]{"quoted", arr[0], a,b})'
    codeflash_output = _build_void_serialize_expr(call)  # 1.23μs -> 1.08μs (13.9% faster)

# Edge tests (harder)


def test_single_uppercase_letter_receiver_excluded_even_with_args_absent():
    # single letter uppercase receiver is a valid identifier and should be considered static -> excluded
    call = {"receiver": "A"}
    codeflash_output = _build_void_serialize_expr(call)  # 1.23μs -> 1.06μs (16.0% faster)


def test_uppercase_valid_identifier_receiver_with_args_excludes_receiver_and_keeps_args():
    # when receiver looks like a static class name it is excluded even if args exist
    call = {"receiver": "MyClass", "arg_exprs": ["arg1", "arg2"]}
    expected = "com.codeflash.Serializer.serialize(new Object[]{arg1, arg2})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.66μs -> 1.51μs (9.91% faster)


def test_receiver_with_non_identifier_characters_included_even_if_uppercase():
    # ensure logic does not exclude uppercase names that are not valid identifiers
    call = {"receiver": "Z*Z", "arg_exprs": []}
    # "Z*Z".isidentifier() == False, so receiver should be included -> single item serialized
    expected = "com.codeflash.Serializer.serialize(new Object[]{Z*Z})"
    codeflash_output = _build_void_serialize_expr(call)  # 1.42μs -> 1.22μs (16.4% faster)

# Large-scale tests (performance / scalability)


def test_large_number_of_arguments_serialized_correctly():
    # construct 1000 arg expressions (the specification allows up to 1000 elements)
    num_args = 1000
    args = [f"v{i}" for i in range(num_args)]
    call = {"arg_exprs": args}
    codeflash_output = _build_void_serialize_expr(call); result = codeflash_output  # 12.1μs -> 9.55μs (26.4% faster)
    # count of separators ", " should be num_args - 1
    separators = result.count(", ")

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally: git merge codeflash/optimize-pr1655-2026-02-25T05.34.49
-        parts.append(receiver)
-    parts.extend(arg_exprs)
-    if not parts:
-        return "null"
-    items = ", ".join(parts)
+        # If we need to include the receiver, avoid allocating an extra list by
+        # concatenating receiver and the joined arg expressions directly.
+        if arg_exprs:
+            items = receiver + ", " + ", ".join(arg_exprs)
+        else:
+            items = receiver
+    else:
+        # No receiver to include; if there are no arg expressions, return "null"
+        if not arg_exprs:
+            return "null"
+        items = ", ".join(arg_exprs)
⚡️ Codeflash found optimizations for this PR 📄 85% (0.85x) speedup for
⚡️ Codeflash found optimizations for this PR 📄 32% (0.32x) speedup for
…flict Kept both the omni-java test_time_correction_instrumentation method and our TestVoidFunctionInstrumentation class. Fixed diff ordering assertion in test_void_multiple_invocations_mixed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
⚡️ Codeflash found optimizations for this PR 📄 29% (0.29x) speedup for
⚡️ Codeflash found optimizations for this PR 📄 18% (0.18x) speedup for
…trumentation to support return type handling
Sort class names in _get_test_class_names() to match Maven Surefire's alphabetical execution order. Without sorting, iteration_id collisions across test classes resolve differently between Maven (original) and direct JVM (candidate) runs, causing spurious "DIFFERENT" comparisons. Also rebuild the codeflash-runtime JAR to include the Comparator$TestResult inner class needed for stdout comparison support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
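The ordering problem this commit describes can be illustrated with a toy model. This is only a sketch under assumptions: the class names are made up, `last_writer` is a hypothetical helper, and a plain dict stands in for the Comparator's HashMap keyed by a colliding iteration_id.

```python
def last_writer(class_order):
    # Every test class's first call writes under the same key "1",
    # so the class that runs last overwrites the others.
    results = {}
    for cls in class_order:
        results["1"] = cls
    return results["1"]


# Hypothetical class names; one run sorts them, the other does not.
discovered = ["SortTest", "FibonacciTest"]
sorted_order = sorted(discovered)

print(last_writer(sorted_order))  # the sorted run keeps SortTest
print(last_writer(discovered))    # the unsorted run keeps FibonacciTest
```

Sorting in both runs makes the surviving entry the same on each side, which is the effect the commit aims for.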
…ison The iteration_id stored in SQLite was just the call_counter (e.g., "1"), which caused collisions when multiple test files/methods each had their first call to the target function. The Comparator's HashMap would keep only the last writer for each key, causing different test data to be compared between original and candidate runs. Changed iteration_id format to "ClassName.testMethodName.callCounter_testIteration" which is globally unique across all test files and properly maps between original (suffix _0) and candidate (suffix _1) runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
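A minimal Python sketch of the collision and the new key format follows. The helper names `legacy_iteration_id` and `unique_iteration_id` and the sample test methods are hypothetical; only the "ClassName.testMethodName.callCounter_testIteration" layout comes from the commit message, and the real implementation lives in the Java helper, not in Python.

```python
def legacy_iteration_id(call_counter):
    # Old format: just the call counter, e.g. "1".
    return str(call_counter)


def unique_iteration_id(class_name, test_method, call_counter, test_iteration):
    # New format: ClassName.testMethodName.callCounter_testIteration
    return f"{class_name}.{test_method}.{call_counter}_{test_iteration}"


# Three hypothetical tests, each making its first call to the target function.
calls = [
    ("FibonacciTest", "testSort", 1),
    ("FibonacciTest", "testCollect", 1),
    ("OtherTest", "testSort", 1),
]

legacy = {}
unique = {}
for cls, method, counter in calls:
    # A dict keeps only the last writer per key, like the HashMap in the commit message.
    legacy[legacy_iteration_id(counter)] = (cls, method)
    unique[unique_iteration_id(cls, method, counter, 0)] = (cls, method)

print(len(legacy))  # 1: all three results collapsed onto key "1"
print(len(unique))  # 3: each result keeps its own key
```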
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
for Java behavior mode