⚡️ Speed up method `JavaScriptSupport._find_and_extract_body` by 11% in PR #1780 (`fix/normalizer`) by codeflash-ai[bot] · Pull Request #1781 · codeflash-ai/codeflash

codeflash-ai · 2026-03-06T16:17:03Z

⚡️ This pull request contains optimizations for PR #1780

If you approve this dependent PR, these changes will be merged into the original PR branch fix/normalizer.

This PR will be automatically closed if the original PR is merged.

📄 11% (0.11x) speedup for `JavaScriptSupport._find_and_extract_body` in `codeflash/languages/javascript/support.py`

⏱️ Runtime : 1.91 milliseconds → 1.72 milliseconds (best of 30 runs)
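
The percentages in this report appear consistent with a speedup defined as (original − optimized) / optimized. That formula is an assumption inferred from the reported numbers, not something stated on this page, and `speedup` is a hypothetical helper name. A minimal sketch of the arithmetic:

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup; negative values mean the new code is slower.

    Assumed formula: (original - optimized) / optimized.
    """
    return (original - optimized) / optimized


# Headline runtime from this report: 1.91 ms -> 1.72 ms
headline = speedup(1.91, 1.72)

# Large-scale generated test: 683 us -> 526 us
large_scale = speedup(683, 526)

# A small generated test that regressed: 5.15 us -> 5.87 us
small_case = speedup(5.15, 5.87)
```

Under that assumption, `headline` rounds to about 11%, and `small_case` comes out negative, which matches the "slower" annotations on the small tests.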

📝 Explanation and details

The optimization replaces the recursive AST traversal in the hot loop with an iterative, stack-based DFS and eliminates per-node UTF-8 decoding by comparing byte slices through a single `memoryview` against a pre-encoded target name. The line profiler shows that the original recursive `find_function_node` consumed 90% of runtime (9.79 ms), while the optimized iterative loop spreads its work across `node.type` checks (34.8%) and `stack.pop()` (4.1%), for lower total overhead per node. The large-scale test with 1000 variable declarations improved by 29.9% (683 µs → 526 µs), which suggests the win grows with AST size. Smaller test cases regressed by 8–17% because loop setup cost exceeds the recursion savings on trivial trees, a reasonable trade-off for the target workload.
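
The technique described above can be sketched roughly as follows. This is not the code from `support.py`, which is not shown on this page: the `Node` dataclass is a hypothetical stand-in for a tree-sitter node, only `function_declaration` is handled, and the `reversed()` call is an illustrative choice to keep left-to-right visiting order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (not the real API)."""
    type: str
    start_byte: int = 0
    end_byte: int = 0
    children: List["Node"] = field(default_factory=list)
    name: Optional["Node"] = None  # stands in for child_by_field_name("name")


def find_function_node(root: Node, source: bytes, target: str) -> Optional[Node]:
    target_bytes = target.encode("utf8")  # encode the target name once
    view = memoryview(source)             # slicing a memoryview does not copy
    stack = [root]                        # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        if node.type == "function_declaration" and node.name is not None:
            # Compare raw bytes; no per-node .decode("utf8") is needed.
            if view[node.name.start_byte:node.name.end_byte] == target_bytes:
                return node
        # Push children reversed so they are popped left-to-right.
        stack.extend(reversed(node.children))
    return None
```

Comparing a `memoryview` slice with a `bytes` object checks the underlying bytes for equality, so non-ASCII names work as long as the offsets are byte offsets into the UTF-8 source.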

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 41 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests

import pytest  # used for our unit tests
from codeflash.languages.javascript.support import JavaScriptSupport
from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer

# Helper lightweight fake tree/node objects to simulate tree-sitter parse results.
# These are small test-only helpers to provide the minimal API the function expects:
# - tree.root_node
# - node.type
# - node.children (iterable)
# - node.child_by_field_name(name) -> node or None
# - node.start_byte, node.end_byte
#
# Note: These helper classes are only used to mimic the tree-sitter API surface
# required by JavaScriptSupport._find_and_extract_body. They are not used as test
# inputs in place of the actual classes being tested (we still instantiate a real
# TreeSitterAnalyzer and a real JavaScriptSupport and call the real method).
class _FakeNode:
    def __init__(self, type_, start_byte=0, end_byte=0, children=None, field_map=None):
        # Node type string as tree-sitter would expose
        self.type = type_
        # Byte offsets into the source bytes
        self.start_byte = start_byte
        self.end_byte = end_byte
        # Ordered children list for recursion
        self.children = children or []
        # Mapping of field name -> node for child_by_field_name
        self._field_map = field_map or {}

    def child_by_field_name(self, name: str):
        # Return the mapped node for a field name, or None if absent
        return self._field_map.get(name)

class _FakeTree:
    def __init__(self, root_node):
        # tree-sitter Tree has a .root_node attribute
        self.root_node = root_node

# Utility to compute byte offsets for substrings inside a source string,
# raising a clear error if the substring is not found.
def _index_span_for(source: str, sub: str, start: int = 0):
    b = source.encode("utf8")
    idx = b.find(sub.encode("utf8"), start)
    if idx == -1:
        raise ValueError(f"Substring {sub!r} not found in source starting at {start}")
    return idx, idx + len(sub.encode("utf8"))

def test_extract_body_from_function_declaration_basic():
    # Basic scenario: a simple function declaration should have its body extracted,
    # including braces and whitespace exactly as in the source.
    source = "function foo() { return 1; }"
    js = JavaScriptSupport()  # real instance as required
    analyzer = TreeSitterAnalyzer("javascript")  # real analyzer instance

    # Find byte spans for the identifier 'foo' and the function body braces.
    name_start, name_end = _index_span_for(source, "foo")
    # Find the opening brace after the name
    body_open = source.find("{", name_end)
    # find matching closing brace (simple one-level, sufficient for this simple source)
    body_close = source.find("}", body_open)
    # include braces in the extracted span
    body_start = len(source[:body_open].encode("utf8"))
    body_end = len(source[: body_close + 1].encode("utf8"))

    # Build nodes to reflect a tree-sitter structure:
    # program -> function_declaration (with name and body fields)
    name_node = _FakeNode("identifier", start_byte=name_start, end_byte=name_end)
    body_node = _FakeNode("statement_block", start_byte=body_start, end_byte=body_end)
    func_node = _FakeNode(
        "function_declaration",
        start_byte=0,
        end_byte=len(source.encode("utf8")),
        children=[name_node, body_node],
        field_map={"name": name_node, "body": body_node},
    )
    root = _FakeNode("program", children=[func_node])
    tree = _FakeTree(root)

    # Monkeypatch the analyzer.parse method on this instance to return our fake tree.
    # This avoids needing an actual tree-sitter parser while still using a real
    # TreeSitterAnalyzer instance.
    analyzer.parse = lambda _src: tree

    # Call the real method under test and verify exact output.
    codeflash_output = js._find_and_extract_body(source, "foo", analyzer); extracted = codeflash_output # 5.15μs -> 5.87μs (12.3% slower)

def test_extract_body_from_method_definition():
    # Method inside a class should be found via type "method_definition".
    source = "class A { bar() { console.log('x'); } }"
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    name_start, name_end = _index_span_for(source, "bar")
    body_open = source.find("{", name_end)
    body_close = source.find("}", body_open)
    body_start = len(source[:body_open].encode("utf8"))
    body_end = len(source[: body_close + 1].encode("utf8"))

    name_node = _FakeNode("property_identifier", start_byte=name_start, end_byte=name_end)
    body_node = _FakeNode("statement_block", start_byte=body_start, end_byte=body_end)
    method_node = _FakeNode(
        "method_definition",
        children=[name_node, body_node],
        field_map={"name": name_node, "body": body_node},
    )
    class_node = _FakeNode("class_declaration", children=[method_node])
    root = _FakeNode("program", children=[class_node])
    tree = _FakeTree(root)

    analyzer.parse = lambda _src: tree

    codeflash_output = js._find_and_extract_body(source, "bar", analyzer); extracted = codeflash_output # 3.97μs -> 4.36μs (8.97% slower)

def test_extract_body_from_variable_arrow_function_using_statement_block_child():
    # Arrow function assigned to a const should be found through variable_declaration ->
    # variable_declarator -> value (arrow_function). The body might be a 'statement_block'
    # child rather than a 'body' field, so the fallback detection should apply.
    source = "const baz = () => { return 7; };"
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    name_start, name_end = _index_span_for(source, "baz")
    # locate the arrow function's block
    value_open = source.find("{", name_end)
    value_close = source.find("}", value_open)
    body_start = len(source[:value_open].encode("utf8"))
    body_end = len(source[: value_close + 1].encode("utf8"))

    name_node = _FakeNode("identifier", start_byte=name_start, end_byte=name_end)
    # The arrow function node is the variable value node
    value_node = _FakeNode(
        "arrow_function",
        children=[],  # intentionally not setting 'body' field to test fallback
    )
    # Add a child of type 'statement_block' to value_node so the fallback matches it
    statement_block_node = _FakeNode("statement_block", start_byte=body_start, end_byte=body_end)
    value_node.children.append(statement_block_node)

    var_decl = _FakeNode("variable_declaration", children=[])
    var_declarator = _FakeNode(
        "variable_declarator",
        children=[name_node, value_node],
        field_map={"name": name_node, "value": value_node},
    )
    var_decl.children.append(var_declarator)
    root = _FakeNode("program", children=[var_decl])
    tree = _FakeTree(root)

    analyzer.parse = lambda _src: tree

    codeflash_output = js._find_and_extract_body(source, "baz", analyzer); extracted = codeflash_output # 4.16μs -> 4.71μs (11.7% slower)

def test_return_none_when_function_not_found():
    # If the named function is not present in the AST, the method should return None.
    source = "function somethingElse() { return 0; }"
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    # Construct a tree that contains only a different function name.
    name_start, name_end = _index_span_for(source, "somethingElse")
    body_open = source.find("{", name_end)
    body_close = source.find("}", body_open)
    body_node = _FakeNode("statement_block", start_byte=len(source[:body_open].encode("utf8")), end_byte=len(source[: body_close + 1].encode("utf8")))
    name_node = _FakeNode("identifier", start_byte=name_start, end_byte=name_end)
    func_node = _FakeNode("function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _FakeNode("program", children=[func_node])
    tree = _FakeTree(root)

    analyzer.parse = lambda _src: tree

    # Looking for a name not present should yield None
    codeflash_output = js._find_and_extract_body(source, "nonexistent", analyzer) # 6.15μs -> 7.15μs (14.0% slower)

def test_return_none_when_body_missing():
    # If a function node is found but there is no body (and no statement_block child),
    # the method should return None.
    source = "function lonely() ;"
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    # Place a function node that has a name but no body mapping and no statement_block child.
    name_start, name_end = _index_span_for(source, "lonely")
    name_node = _FakeNode("identifier", start_byte=name_start, end_byte=name_end)
    func_node = _FakeNode("function_declaration", children=[name_node], field_map={"name": name_node})
    root = _FakeNode("program", children=[func_node])
    tree = _FakeTree(root)

    analyzer.parse = lambda _src: tree

    codeflash_output = js._find_and_extract_body(source, "lonely", analyzer) # 3.29μs -> 3.87μs (15.0% slower)

def test_empty_source_returns_none():
    # Empty source should not crash and should simply return None for any name.
    source = ""
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    # Tree with no children
    root = _FakeNode("program", children=[])
    tree = _FakeTree(root)
    analyzer.parse = lambda _src: tree

    codeflash_output = js._find_and_extract_body(source, "anything", analyzer) # 2.46μs -> 2.98μs (17.2% slower)

def test_unicode_function_name_extraction():
    # Verify that non-ASCII function names are handled correctly (utf-8 encoding/decoding).
    source = "function fün() { return 'ü'; }"
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    name_start, name_end = _index_span_for(source, "fün")
    body_open = source.find("{", name_end)
    body_close = source.find("}", body_open)
    body_node = _FakeNode("statement_block", start_byte=len(source[:body_open].encode("utf8")), end_byte=len(source[: body_close + 1].encode("utf8")))
    name_node = _FakeNode("identifier", start_byte=name_start, end_byte=name_end)
    func_node = _FakeNode("function_declaration", children=[name_node, body_node], field_map={"name": name_node, "body": body_node})
    root = _FakeNode("program", children=[func_node])
    tree = _FakeTree(root)
    analyzer.parse = lambda _src: tree

    codeflash_output = js._find_and_extract_body(source, "fün", analyzer); extracted = codeflash_output # 4.25μs -> 4.81μs (11.7% slower)

def test_large_scale_many_variable_declarations():
    # Construct a large tree with many variable_declaration nodes to ensure the
    # recursive traversal is performant and correct for many siblings.
    target_index = 500  # pick a middle entry
    count = 1000  # as required by large-scale tests (up to 1000)
    # Build a source with many const declarations; ensure each name is unique.
    parts = []
    for i in range(count):
        # Use small body for each to keep the overall source small-ish
        parts.append(f"const v{i} = () => {{ return {i}; }};")
    source = " ".join(parts)
    js = JavaScriptSupport()
    analyzer = TreeSitterAnalyzer("javascript")

    # Build AST: program -> many variable_declaration nodes, each containing a variable_declarator
    root_children = []
    offset = 0  # track byte offset while constructing nodes for accurate spans
    src_bytes = source.encode("utf8")
    # We'll iterate and find spans for every occurrence; using find with a moving start point.
    search_pos = 0
    for i in range(count):
        name = f"v{i}"
        # find the identifier occurrence starting from search_pos
        name_idx = src_bytes.find(name.encode("utf8"), search_pos)
        name_end = name_idx + len(name.encode("utf8"))
        # find the opening brace for the arrow function body after the name
        brace_idx = src_bytes.find(b"{", name_end)
        brace_close_idx = src_bytes.find(b"}", brace_idx)

        name_node = _FakeNode("identifier", start_byte=name_idx, end_byte=name_end)
        # arrow function node will be the value; put a statement_block child for its body
        statement_block_node = _FakeNode("statement_block", start_byte=brace_idx, end_byte=brace_close_idx + 1)
        value_node = _FakeNode("arrow_function", children=[statement_block_node], field_map={"body": statement_block_node})
        var_declarator = _FakeNode("variable_declarator", children=[name_node, value_node], field_map={"name": name_node, "value": value_node})
        var_decl = _FakeNode("variable_declaration", children=[var_declarator])
        root_children.append(var_decl)

        # move search_pos forward to avoid finding the same identifier again
        search_pos = name_end

    root = _FakeNode("program", children=root_children)
    tree = _FakeTree(root)
    analyzer.parse = lambda _src: tree

    # Request extraction for the chosen target name
    target_name = f"v{target_index}"
    codeflash_output = js._find_and_extract_body(source, target_name, analyzer); extracted = codeflash_output # 683μs -> 526μs (29.9% faster)
    # Locate the body braces of the target's arrow function in the source bytes
    # to compute the substring the extraction is expected to return.
    name_pos = src_bytes.find(target_name.encode("utf8"))
    b_open = src_bytes.find(b"{", name_pos + len(target_name.encode("utf8")))
    b_close = src_bytes.find(b"}", b_open)
    expected = src_bytes[b_open : b_close + 1].decode("utf8")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from unittest.mock import MagicMock, Mock

# imports
import pytest
from codeflash.languages.javascript.support import JavaScriptSupport
from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer

def test_js_benchmarking_max_loops_constant_exists():
    """Test that the benchmarking constant is defined and has expected value."""
    js_support = JavaScriptSupport()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr1780-2026-03-06T16.16.57` and push.


claude · 2026-03-06T16:17:40Z

Claude finished @codeflash-ai[bot]'s task in 18s

PR Review Summary

Prek Checks

Fixed (auto-committed): The PR added `from codeflash.cli_cmds.console import logger` (line 15), but `support.py` already defines `logger = logging.getLogger(__name__)` at line 30, causing an F811 redefinition error. The redundant import was removed in commit `db318676`. `uv run prek run --from-ref origin/fix/normalizer` now passes cleanly.

Code Review

Logic issue — DFS traversal order change:
codeflash/languages/javascript/support.py:1241

The original recursive code traverses children left-to-right. The new iterative version uses:

stack.extend(node.children)  # push in order
node = stack.pop()            # pop reverses the order → right-to-left DFS

This means the optimized code performs a right-to-left pre-order DFS instead of the original left-to-right one. If two functions share the same name at different sibling positions in the AST (e.g., overloaded-style declarations), the new code could return a different node than the original, for example the rightmost of two same-named siblings rather than the leftmost. In practice, function names are usually unique within a scope, so this shouldn't cause real-world issues, but it's a subtle behavioral change worth being aware of.
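
The order difference can be reproduced with a small standalone sketch. The dict-based tree and the `visit_order` helper are hypothetical and exist only for illustration. Pushing `reversed(children)` is one common way to keep left-to-right order with an explicit stack.

```python
def leaf(label):
    """Build a childless node for the toy tree."""
    return {"label": label, "children": []}


def visit_order(root, push_reversed):
    """Return node labels in the order an explicit-stack DFS visits them."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node["label"])
        children = node["children"]
        # Pushing in source order makes pop() yield the last child first.
        stack.extend(reversed(children) if push_reversed else children)
    return order


tree = {
    "label": "program",
    "children": [
        {"label": "A", "children": [leaf("A1"), leaf("A2")]},
        leaf("B"),
    ],
}

right_to_left = visit_order(tree, push_reversed=False)
left_to_right = visit_order(tree, push_reversed=True)
```

Here `right_to_left` visits `B` before `A`, while `left_to_right` matches what a recursive walk over `children` would produce.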

Observation — decl_types children pushed to stack redundantly:
codeflash/languages/javascript/support.py:1226–1242

When processing a `variable_declaration`/`lexical_declaration` node, the code explicitly iterates over its children (looking for `variable_declarator`) and then falls through to `stack.extend(node.children)`. As a result, the `variable_declarator` children are also pushed onto the stack and processed again (harmlessly, since `variable_declarator` doesn't match any of the type checks and just pushes its own children). This is slightly wasteful but not incorrect.
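
A rough way to see the extra work is to count how often each node is inspected. The sketch below is an assumption-laden toy, not the PR's code: nodes are plain dicts with hypothetical `id`, `type`, and `children` keys, and the `skip_inspected` branch shows one possible way to descend past declarators that were already handled inline.

```python
from collections import Counter

DECL_TYPES = {"variable_declaration", "lexical_declaration"}


def count_inspections(root, skip_inspected):
    """Count how many times each node id is looked at during the walk."""
    inspected = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        inspected[node["id"]] += 1
        if node["type"] in DECL_TYPES:
            # Declarators are examined inline while handling the declaration.
            for child in node["children"]:
                if child["type"] == "variable_declarator":
                    inspected[child["id"]] += 1
            if skip_inspected:
                # Hypothetical alternative: push the declarators' children
                # directly so each declarator is not popped a second time.
                for child in node["children"]:
                    if child["type"] == "variable_declarator":
                        stack.extend(child["children"])
                    else:
                        stack.append(child)
                continue
        stack.extend(node["children"])
    return inspected


name = {"id": "n", "type": "identifier", "children": []}
func = {"id": "f", "type": "arrow_function", "children": []}
declarator = {"id": "v", "type": "variable_declarator", "children": [name, func]}
decl = {"id": "d", "type": "lexical_declaration", "children": [declarator]}
program = {"id": "p", "type": "program", "children": [decl]}

with_fallthrough = count_inspections(program, skip_inspected=False)
without_repush = count_inspections(program, skip_inspected=True)
```

In this toy tree the declarator is inspected twice with the fall-through and once without it, while every other node is inspected once in both cases.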

Duplicate Detection

MEDIUM confidence: `codeflash/languages/javascript/treesitter.py:1620`: `_find_function_node` implements a similar recursive AST traversal to find function nodes by name. However, it also accepts a `function_line` parameter and serves a different purpose (finding by name and line for replacement), so these are not true duplicates. No actionable de-duplication needed.

Test Coverage

No existing unit tests cover `_find_and_extract_body`. The PR reports that all 41 generated regression tests passed (100% coverage). No regression in test coverage.

Summary: The optimization is sound. One lint error was auto-fixed (redundant `logger` import). The traversal-order change is acceptable for the intended use case. No blocking issues.

Fix commit: `db318676`

claude · 2026-03-06T18:04:40Z

Closing stale optimization PR.

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Mar 6, 2026

codeflash-ai bot mentioned this pull request Mar 6, 2026

[Fix] Normalizer and expand its scope #1780

Merged

fix: remove redundant logger import shadowing module-level logger

db31867

claude bot closed this Mar 6, 2026

claude bot deleted the codeflash/optimize-pr1780-2026-03-06T16.16.57 branch March 6, 2026 18:04

claude bot mentioned this pull request Mar 6, 2026

candidates whose behavioral runtime exceeds 10x the baseline will skip benchmarking #1783

Open

codeflash-ai[bot] wants to merge 2 commits into `fix/normalizer` from `codeflash/optimize-pr1780-2026-03-06T16.16.57`
