⚡️ Speed up function _expr_matches_name by 26% in PR #1660 (unstructured-inference) #1850

Merged
KRRT7 merged 1 commit into unstructured-inference from
codeflash/optimize-pr1660-2026-03-17T03.31.35
Mar 17, 2026

@codeflash-ai codeflash-ai bot commented Mar 17, 2026

⚡️ This pull request contains optimizations for PR #1660

If you approve this dependent PR, these changes will be merged into the original PR branch unstructured-inference.

This PR will be automatically closed if the original PR is merged.


📄 26% (0.26x) speedup for _expr_matches_name in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime: 419 microseconds → 333 microseconds (best of 5 runs)
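
As a quick sanity check on the headline figure, the reported speedup appears consistent with computing (original − optimized) / optimized from the two runtimes. That convention is an assumption inferred from the numbers in this report rather than something the report states, and `relative_speedup` is a hypothetical helper used only for illustration.

```python
def relative_speedup(original: float, optimized: float) -> float:
    """Return (original - optimized) / optimized; a negative value means slower.

    Assumption: this is the convention behind the percentages in this report.
    """
    return (original - optimized) / optimized


# Headline runtimes from the report, in microseconds.
headline = relative_speedup(419.0, 333.0)
print(f"{headline:.2f}x, {headline:.0%}")
```

Under the same assumed convention, a test that goes from 1.0 to 2.0 time units would be reported as 50% slower, which is why the per-test slowdown percentages elsewhere in this report are relative to the new runtime.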

⚡️ This change will improve the performance of the following benchmarks:

{benchmark_info_improved}

🔻 This change will degrade the performance of the following benchmarks:

| Benchmark File :: Function | Original Runtime | Expected New Runtime | Slowdown |
|---|---|---|---|
| tests.benchmarks.test_benchmark_code_extract_code_context::test_benchmark_extract | 17.0 seconds | 17.0 seconds | 0.00% |

📝 Explanation and details

The optimization replaces the recursive calls in _get_expr_name with an iterative loop that walks the attribute chain once, collects the parts into a list, and reverses them only at the end. This eliminates the function-call overhead that accounted for 46% of the original runtime (the line profiler shows recursive calls at 1154 ns/hit versus roughly 300 ns/hit for the new loop iterations). In addition, _expr_matches_name now precomputes "." + suffix once instead of building it twice per invocation via f-strings, which cuts redundant string allocations. The net 26% runtime improvement comes primarily from avoiding Python's recursion stack and from creating fewer temporary objects in the hot path. All tests pass. The per-test slowdowns are minor (typically 10–25%) and are offset by large wins on deep attribute chains (up to 393% faster for 100-level nesting).
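
The recursive-to-iterative change described above can be sketched as follows. This is a minimal illustration, not the merged code: the bodies of `get_expr_name_recursive` and `get_expr_name_iterative` are assumptions reconstructed from the behavior the tests and review describe (Name, Attribute, and Call handling), and both function names are hypothetical.

```python
import ast
from typing import List, Optional


def get_expr_name_recursive(node: Optional[ast.AST]) -> Optional[str]:
    """Recursive form: one Python call per level of the attribute chain."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Call):
        return get_expr_name_recursive(node.func)
    if isinstance(node, ast.Attribute):
        base = get_expr_name_recursive(node.value)
        # Fall back to the bare attribute when the base has no name (e.g. a BinOp).
        return f"{base}.{node.attr}" if base else node.attr
    return None


def get_expr_name_iterative(node: Optional[ast.AST]) -> Optional[str]:
    """Iterative form: walk the chain once, then reverse and join."""
    parts: List[str] = []
    current = node
    while True:
        if isinstance(current, ast.Call):
            current = current.func  # unwrap the call to its callee
        elif isinstance(current, ast.Attribute):
            parts.append(current.attr)  # collected outermost-first
            current = current.value
        else:
            break
    if isinstance(current, ast.Name):
        parts.append(current.id)
    if not parts:
        return None
    parts.reverse()  # base name first, outermost attribute last
    return ".".join(parts)


# Build root.p1.p2...p100, mirroring the deep-chain regression test.
chain: ast.expr = ast.Name(id="root", ctx=ast.Load())
for i in range(1, 101):
    chain = ast.Attribute(value=chain, attr=f"p{i}", ctx=ast.Load())

for sample in (None, ast.Name(id="foo", ctx=ast.Load()), chain):
    assert get_expr_name_recursive(sample) == get_expr_name_iterative(sample)
```

The sketch appends to the list and reverses once at the end, so each level costs a constant-time append instead of a nested function call. The loop carries some fixed setup cost, which is one plausible reason the shallow single-name tests below run slightly slower while the 100-level chain runs much faster.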

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 231 Passed |
| ⏪ Replay Tests | ✅ 1 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import ast  # to construct real AST nodes as inputs

import pytest  # used for our unit tests
# import the function under test from the real module path
from codeflash.languages.python.context.code_context_extractor import \
    _expr_matches_name
from jedi.api.classes import Name

def test_name_node_matches_exact_suffix():
    # Create a simple Name node with id "foo"
    node = ast.Name(id="foo", ctx=ast.Load())
    # No import aliases needed
    import_aliases = {}
    # Suffix equals the name -> should match
    assert _expr_matches_name(node, import_aliases, "foo") is True # 801ns -> 1.35μs (40.8% slower)

def test_attribute_node_matches_suffix_by_attribute_name():
    # Build an Attribute node representing "pkg.mod"
    pkg = ast.Name(id="pkg", ctx=ast.Load())
    node = ast.Attribute(value=pkg, attr="mod", ctx=ast.Load())
    import_aliases = {}
    # Suffix equals the attribute name -> should match because expr_name == "pkg.mod" endswith ".mod"
    assert _expr_matches_name(node, import_aliases, "mod") is True # 2.28μs -> 2.63μs (13.3% slower)

def test_call_node_unwraps_to_function_name():
    # A Call whose func is a Name should be unwrapped to the name "bar"
    call = ast.Call(func=ast.Name(id="bar", ctx=ast.Load()), args=[], keywords=[])
    import_aliases = {}
    # Suffix equal to the underlying function name
    assert _expr_matches_name(call, import_aliases, "bar") is True # 1.19μs -> 1.58μs (24.7% slower)

def test_import_aliases_resolve_to_matching_suffix():
    # Name node uses an alias that resolves to a dotted package path ending with the suffix
    node = ast.Name(id="alias", ctx=ast.Load())
    import_aliases = {"alias": "some.long.package.target"}  # resolved name ends with ".target"
    assert _expr_matches_name(node, import_aliases, "target") is True # 1.77μs -> 2.04μs (13.2% slower)

def test_import_aliases_resolve_to_exact_suffix():
    # Alias resolves exactly to the suffix (no dot). Should match too.
    node = ast.Name(id="a", ctx=ast.Load())
    import_aliases = {"a": "final"}  # resolved name equals the suffix
    assert _expr_matches_name(node, import_aliases, "final") is True # 1.54μs -> 1.77μs (13.0% slower)

def test_no_match_returns_false_for_unrelated_name():
    # Name that does not match suffix and is not in aliases should return False
    node = ast.Name(id="unrelated", ctx=ast.Load())
    import_aliases = {}
    assert _expr_matches_name(node, import_aliases, "something_else") is False # 1.44μs -> 1.77μs (18.7% slower)

def test_none_node_returns_false():
    # Passing None for the node should be gracefully handled and return False
    assert _expr_matches_name(None, {}, "anything") is False # 621ns -> 722ns (14.0% slower)

def test_unhandled_node_type_returns_false():
    # For node types _get_expr_name doesn't handle (e.g., BinOp), the function should return False
    node = ast.BinOp(left=ast.Constant(1), op=ast.Add(), right=ast.Constant(2))
    assert _expr_matches_name(node, {}, "1") is False # 1.06μs -> 1.18μs (10.2% slower)

def test_empty_suffix_behaviour_typical_case():
    # An empty suffix is an odd input. For normal expr names we expect no match (unless names end with a dot).
    node = ast.Name(id="name", ctx=ast.Load())
    assert _expr_matches_name(node, {}, "") is False # 1.49μs -> 1.78μs (16.3% slower)

def test_suffix_with_dots_matches_full_resolved_path():
    # Suffix may itself contain dots; ensure matching uses endswith semantics properly
    node = ast.Name(id="alias_x", ctx=ast.Load())
    import_aliases = {"alias_x": "pkg.subpkg.target.deep"}
    # suffix contains dots and matches the tail of the resolved path
    assert _expr_matches_name(node, import_aliases, "target.deep") is True # 1.73μs -> 2.17μs (20.2% slower)

def test_attribute_chain_unwrapped_by_call_then_resolved_by_alias():
    # A Call around an Attribute should unwrap to the dotted attribute name and then be resolved via aliases
    # Build attribute "a.b"
    attr = ast.Attribute(value=ast.Name(id="a", ctx=ast.Load()), attr="b", ctx=ast.Load())
    call = ast.Call(func=attr, args=[], keywords=[])
    # Suppose the expression name "a.b" is used as a key in import_aliases and resolves to "x.y.final"
    import_aliases = {"a.b": "x.y.final"}
    assert _expr_matches_name(call, import_aliases, "final") is True # 3.06μs -> 3.30μs (7.01% slower)

def test_import_alias_present_but_not_matching_suffix_returns_false():
    # If an alias resolves to something that does not end with the suffix, we should get False
    node = ast.Name(id="alias_nonmatch", ctx=ast.Load())
    import_aliases = {"alias_nonmatch": "some.other.value"}
    assert _expr_matches_name(node, import_aliases, "not_present") is False # 1.74μs -> 2.08μs (16.4% slower)

def test_large_import_aliases_lookup_and_repeated_calls():
    # Build a large import_aliases mapping to test performance with realistic data
    import_aliases = {f"a{i}": f"pkg{i}.target" for i in range(1000)}
    
    # Test with a representative sample from different parts of the alias map
    test_indices = [0, 250, 500, 750, 999]
    for idx in test_indices:
        node = ast.Name(id=f"a{idx}", ctx=ast.Load())
        # Each node should match because their resolved names end with ".target"
        assert _expr_matches_name(node, import_aliases, "target") is True # 5.11μs -> 5.81μs (12.0% slower)
        # Also verify they don't match a different suffix
        assert _expr_matches_name(node, import_aliases, "other_target") is False

def test_long_attribute_chain_matches_deep_suffix():
    # Build an attribute chain of reasonable depth to test nested attribute handling
    depth = 100
    current = ast.Name(id="root", ctx=ast.Load())
    for i in range(1, depth + 1):
        current = ast.Attribute(value=current, attr=f"p{i}", ctx=ast.Load())
    
    suffix = f"p{depth}"
    import_aliases = {}
    # Should match the deep suffix
    assert _expr_matches_name(current, import_aliases, suffix) is True # 80.2μs -> 16.3μs (393% faster)
    
    # Also test that it doesn't match an unrelated suffix
    assert _expr_matches_name(current, import_aliases, "unrelated") is False # 40.2μs -> 13.1μs (207% faster)
    
    # Test with import aliases as well - create an alias that resolves to a path ending with the suffix
    import_aliases_with_alias = {"root": f"module.chain.p{depth}"}
    # This still matches because the expression name itself ends with ".p{depth}",
    # whether or not the alias for "root" is consulted
    assert _expr_matches_name(current, import_aliases_with_alias, f"p{depth}") is True # 38.5μs -> 11.8μs (227% faster)
import ast

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    _expr_matches_name
from jedi.api.classes import Name

def test_exact_name_match():
    """Test that exact name matches return True."""
    # Create a Name node with id 'foo'
    node = ast.Name(id='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 1.73μs -> 2.33μs (25.5% slower)
    assert result is True

def test_exact_name_no_match():
    """Test that non-matching names return False."""
    # Create a Name node with id 'foo'
    node = ast.Name(id='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'bar') # 1.71μs -> 2.25μs (24.0% slower)
    assert result is False

def test_dotted_suffix_match():
    """Test that dotted names matching the suffix return True."""
    # Create an Attribute node representing 'module.foo'
    module_node = ast.Name(id='module', ctx=ast.Load())
    node = ast.Attribute(value=module_node, attr='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 2.75μs -> 2.94μs (6.13% slower)
    assert result is True

def test_dotted_suffix_no_match():
    """Test that dotted names not matching the suffix return False."""
    # Create an Attribute node representing 'module.foo'
    module_node = ast.Name(id='module', ctx=ast.Load())
    node = ast.Attribute(value=module_node, attr='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'bar') # 2.58μs -> 2.92μs (12.0% slower)
    assert result is False

def test_deeply_nested_attribute():
    """Test deeply nested attribute access like 'a.b.c' matching suffix 'c'."""
    # Create: a.b.c
    a_node = ast.Name(id='a', ctx=ast.Load())
    b_node = ast.Attribute(value=a_node, attr='b', ctx=ast.Load())
    c_node = ast.Attribute(value=b_node, attr='c', ctx=ast.Load())
    result = _expr_matches_name(c_node, {}, 'c') # 2.58μs -> 2.92μs (12.0% slower)
    assert result is True

def test_call_node_with_name():
    """Test that Call nodes extract the function name correctly."""
    # Create a Call node: foo()
    func_node = ast.Name(id='foo', ctx=ast.Load())
    call_node = ast.Call(func=func_node, args=[], keywords=[])
    result = _expr_matches_name(call_node, {}, 'foo') # 1.35μs -> 1.72μs (21.5% slower)
    assert result is True

def test_call_node_with_attribute():
    """Test that Call nodes with attribute functions work correctly."""
    # Create a Call node: module.foo()
    module_node = ast.Name(id='module', ctx=ast.Load())
    attr_node = ast.Attribute(value=module_node, attr='foo', ctx=ast.Load())
    call_node = ast.Call(func=attr_node, args=[], keywords=[])
    result = _expr_matches_name(call_node, {}, 'foo') # 2.44μs -> 2.81μs (13.2% slower)
    assert result is True

def test_alias_direct_match():
    """Test that import aliases are resolved correctly for direct matches."""
    # Create a Name node 'alias' that maps to 'suffix'
    node = ast.Name(id='alias', ctx=ast.Load())
    aliases = {'alias': 'suffix'}
    result = _expr_matches_name(node, aliases, 'suffix') # 1.68μs -> 2.06μs (18.4% slower)
    assert result is True

def test_alias_dotted_match():
    """Test that import aliases are resolved correctly for dotted matches."""
    # Create a Name node 'alias' that maps to 'module.suffix'
    node = ast.Name(id='alias', ctx=ast.Load())
    aliases = {'alias': 'module.suffix'}
    result = _expr_matches_name(node, aliases, 'suffix') # 1.78μs -> 2.15μs (17.2% slower)
    assert result is True

def test_none_node():
    """Test that None node returns False."""
    result = _expr_matches_name(None, {}, 'foo') # 661ns -> 701ns (5.71% slower)
    assert result is False

def test_empty_aliases():
    """Test with empty import_aliases dictionary."""
    node = ast.Name(id='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 772ns -> 1.32μs (41.6% slower)
    assert result is True

def test_empty_suffix():
    """Test with empty suffix string."""
    node = ast.Name(id='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, '') # 1.51μs -> 1.81μs (16.5% slower)
    assert result is False

def test_suffix_with_spaces():
    """Test suffix containing spaces."""
    node = ast.Name(id='foo bar', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo bar') # 781ns -> 1.33μs (41.4% slower)
    assert result is True

def test_suffix_with_underscores():
    """Test suffix containing underscores (common Python naming)."""
    node = ast.Name(id='_private', ctx=ast.Load())
    result = _expr_matches_name(node, {}, '_private') # 782ns -> 1.23μs (36.5% slower)
    assert result is True

def test_suffix_with_numbers():
    """Test suffix containing numbers."""
    node = ast.Name(id='var123', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'var123') # 791ns -> 1.31μs (39.7% slower)
    assert result is True

def test_partial_name_match_fails():
    """Test that partial substring matches do not pass."""
    # 'foo' should not match suffix 'fo'
    node = ast.Name(id='foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'fo') # 1.48μs -> 1.87μs (20.9% slower)
    assert result is False

def test_prefix_not_sufficient():
    """Test that just matching the prefix does not pass."""
    # 'foobar' should not match suffix 'foo' even though 'foo' is a prefix
    node = ast.Name(id='foobar', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 1.44μs -> 1.85μs (22.1% slower)
    assert result is False

def test_case_sensitive_matching():
    """Test that matching is case-sensitive."""
    node = ast.Name(id='Foo', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 1.46μs -> 1.74μs (16.1% slower)
    assert result is False

def test_alias_not_in_dict():
    """Test that unaliased names in suffix form are not resolved."""
    node = ast.Name(id='unknown', ctx=ast.Load())
    aliases = {'other': 'module.foo'}
    result = _expr_matches_name(node, aliases, 'foo') # 1.43μs -> 1.90μs (24.7% slower)
    assert result is False

def test_alias_points_to_unrelated_name():
    """Test that aliases pointing to unrelated names don't match."""
    node = ast.Name(id='alias', ctx=ast.Load())
    aliases = {'alias': 'completely.different.name'}
    result = _expr_matches_name(node, aliases, 'foo') # 1.77μs -> 2.04μs (13.2% slower)
    assert result is False

def test_dotted_name_partial_suffix_no_match():
    """Test that partial dotted suffix matching fails."""
    # 'module.foobar' should not match suffix 'foo'
    module_node = ast.Name(id='module', ctx=ast.Load())
    node = ast.Attribute(value=module_node, attr='foobar', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'foo') # 2.52μs -> 2.90μs (12.8% slower)
    assert result is False

def test_triple_nested_attribute():
    """Test triple nested attributes like 'a.b.c.d' matching 'd'."""
    a_node = ast.Name(id='a', ctx=ast.Load())
    b_node = ast.Attribute(value=a_node, attr='b', ctx=ast.Load())
    c_node = ast.Attribute(value=b_node, attr='c', ctx=ast.Load())
    d_node = ast.Attribute(value=c_node, attr='d', ctx=ast.Load())
    result = _expr_matches_name(d_node, {}, 'd') # 2.83μs -> 2.95μs (3.77% slower)
    assert result is True

def test_call_with_arguments():
    """Test Call nodes with arguments are handled correctly."""
    # Create: foo(1, 2, 3)
    func_node = ast.Name(id='foo', ctx=ast.Load())
    arg1 = ast.Constant(value=1)
    arg2 = ast.Constant(value=2)
    call_node = ast.Call(func=func_node, args=[arg1, arg2], keywords=[])
    result = _expr_matches_name(call_node, {}, 'foo') # 1.30μs -> 1.54μs (15.6% slower)
    assert result is True

def test_nested_call():
    """Test nested function calls like 'foo(bar())'."""
    inner_func = ast.Name(id='bar', ctx=ast.Load())
    inner_call = ast.Call(func=inner_func, args=[], keywords=[])
    outer_func = ast.Name(id='foo', ctx=ast.Load())
    outer_call = ast.Call(func=outer_func, args=[inner_call], keywords=[])
    result = _expr_matches_name(outer_call, {}, 'foo') # 1.19μs -> 1.49μs (20.2% slower)
    assert result is True

def test_single_char_name():
    """Test single character names."""
    node = ast.Name(id='x', ctx=ast.Load())
    result = _expr_matches_name(node, {}, 'x') # 832ns -> 1.33μs (37.5% slower)
    assert result is True

def test_long_dotted_chain_partial_match_middle():
    """Test that middle portions of dotted chains don't match."""
    # 'a.b.c.d.e' should not match suffix 'c'
    a_node = ast.Name(id='a', ctx=ast.Load())
    b_node = ast.Attribute(value=a_node, attr='b', ctx=ast.Load())
    c_node = ast.Attribute(value=b_node, attr='c', ctx=ast.Load())
    d_node = ast.Attribute(value=c_node, attr='d', ctx=ast.Load())
    e_node = ast.Attribute(value=d_node, attr='e', ctx=ast.Load())
    result = _expr_matches_name(e_node, {}, 'c') # 3.41μs -> 3.39μs (0.590% faster)
    assert result is False

def test_unsupported_node_type():
    """Test that unsupported node types return False."""
    # BinOp is not handled by _get_expr_name
    left = ast.Constant(value=1)
    right = ast.Constant(value=2)
    binop_node = ast.BinOp(left=left, op=ast.Add(), right=right)
    result = _expr_matches_name(binop_node, {}, 'anything') # 1.05μs -> 1.18μs (11.0% slower)
    assert result is False

def test_alias_chain_simple():
    """Test that simple alias chains work."""
    node = ast.Name(id='x', ctx=ast.Load())
    aliases = {'x': 'y'}
    result = _expr_matches_name(node, aliases, 'y') # 1.72μs -> 2.09μs (17.7% slower)
    assert result is True

def test_dotted_name_matching_full_path():
    """Test matching entire dotted path."""
    a_node = ast.Name(id='a', ctx=ast.Load())
    b_node = ast.Attribute(value=a_node, attr='b', ctx=ast.Load())
    result = _expr_matches_name(b_node, {}, 'a.b') # 1.64μs -> 2.27μs (27.8% slower)
    assert result is True

def test_dotted_name_no_match_full_path():
    """Test non-matching entire dotted path."""
    a_node = ast.Name(id='a', ctx=ast.Load())
    b_node = ast.Attribute(value=a_node, attr='b', ctx=ast.Load())
    result = _expr_matches_name(b_node, {}, 'x.y') # 2.30μs -> 2.75μs (16.4% slower)
    assert result is False

def test_alias_with_dotted_suffix():
    """Test alias mapping to dotted name matching suffix."""
    node = ast.Name(id='import_name', ctx=ast.Load())
    aliases = {'import_name': 'package.module.func'}
    result = _expr_matches_name(node, aliases, 'func') # 2.04μs -> 2.40μs (14.7% slower)
    assert result is True

def test_many_aliases_dict():
    """Test with multiple aliases of varying sizes and realistic usage patterns."""
    results = []
    
    # Test with realistic alias dict sizes
    for dict_size in [5, 15, 30]:
        aliases = {f'import_{i}': f'module.func{i % 3}' for i in range(dict_size)}
        
        # Test diverse lookup patterns: match, no match, different depths
        test_cases = [
            (0, f'func{0 % 3}', True),
            (dict_size - 1, f'func{(dict_size - 1) % 3}', True),
            (dict_size // 2, 'nonexistent', False),
            (1, f'func{1 % 3}', True),
            (dict_size - 2, 'other_func', False),
        ]
        
        for alias_idx, suffix, expected in test_cases:
            node = ast.Name(id=f'import_{alias_idx}', ctx=ast.Load())
            result = _expr_matches_name(node, aliases, suffix)
            results.append(result == expected)
    
    assert all(results)

def test_many_aliases_no_match():
    """Test with aliases dict with varying match/no-match ratios."""
    results = []
    
    # Test with realistic dict sizes and mixed match/no-match cases
    for dict_size in [10, 25, 50]:
        aliases = {f'alias_{i}': f'module_{i % 5}.target' for i in range(dict_size)}
        
        # Test cases with different match outcomes
        test_cases = [
            ('missing_key', 'target', False),  # Key not in aliases
            ('alias_0', 'target', True),       # Key exists and suffix matches
            ('alias_5', 'target', True),       # Different key, same suffix
            ('alias_10', 'wrong', False),      # Wrong suffix
        ]
        
        for key, suffix, expected in test_cases:
            if key in aliases or key == 'missing_key':
                node = ast.Name(id=key, ctx=ast.Load())
                result = _expr_matches_name(node, aliases, suffix)
                results.append(result == expected)
    
    assert all(results)

def test_deeply_nested_attributes_large():
    """Test handling of deeply nested attributes with realistic nesting depths."""
    results = []
    
    # Test realistic nesting depths found in production code (up to 6-8 levels)
    for depth in [2, 4, 6, 8]:
        node = ast.Name(id='root', ctx=ast.Load())
        for i in range(depth):
            node = ast.Attribute(value=node, attr=f'attr{i}', ctx=ast.Load())
        
        # Test matching the final attribute
        result = _expr_matches_name(node, {}, f'attr{depth - 1}') # 8.42μs -> 8.28μs (1.72% faster)
        results.append(result)
        
        # Test non-matching case
        result_no_match = _expr_matches_name(node, {}, 'nonexistent')
        results.append(not result_no_match)
        
        # Test matching intermediate attribute (should fail)
        if depth > 1:
            result_middle = _expr_matches_name(node, {}, f'attr{depth // 2}') # 7.50μs -> 6.41μs (17.0% faster)
            results.append(not result_middle)
    
    assert all(results)

def test_deeply_nested_attributes_no_match():
    """Test deeply nested attributes that don't match with realistic depths."""
    results = []
    
    # Test realistic nesting depths
    for depth in [3, 5, 7]:
        node = ast.Name(id='root', ctx=ast.Load())
        for i in range(depth):
            node = ast.Attribute(value=node, attr=f'attr{i}', ctx=ast.Load())
        
        # Test various non-matching suffixes
        non_matching_suffixes = ['nonexistent', 'other', 'wrong_name']
        for suffix in non_matching_suffixes:
            result = _expr_matches_name(node, {}, suffix)
            results.append(not result)
    
    assert all(results)

def test_many_aliases_with_nested_attributes():
    """Test realistic combination of alias dict and nested attribute access."""
    results = []
    
    # Test with realistic dict sizes and reasonable nesting depths
    alias_sizes = [10, 20, 30]
    nesting_depths = [2, 3, 4, 5]
    
    for dict_size in alias_sizes:
        aliases = {f'alias_{i}': f'module_{i % 3}.submodule.func' for i in range(dict_size)}
        
        for depth in nesting_depths:
            # Build nested attribute up to realistic depth
            node = ast.Name(id='root', ctx=ast.Load())
            for i in range(depth):
                node = ast.Attribute(value=node, attr=f'level{i}', ctx=ast.Load())
            
            # Test matching the final level
            result = _expr_matches_name(node, aliases, f'level{depth - 1}')
            results.append(result)
            
            # Test non-matching
            result_no_match = _expr_matches_name(node, aliases, 'nonexistent')
            results.append(not result_no_match)
    
    assert all(results)

def test_many_calls_nested():
    """Test handling of nested function calls with realistic depths."""
    results = []
    
    # Test realistic nesting depths (up to 5-8 levels deep)
    for depth in [2, 3, 5, 7]:
        node = ast.Name(id='outer_func', ctx=ast.Load())
        for i in range(1, depth + 1):
            call_node = ast.Call(func=ast.Name(id=f'func{i}', ctx=ast.Load()), args=[node], keywords=[])
            node = call_node
        
        # Test matching the outermost function
        result = _expr_matches_name(node, {}, f'func{depth}') # 2.94μs -> 3.88μs (24.3% slower)
        results.append(result)
        
        # Test non-matching case
        result_no_match = _expr_matches_name(node, {}, 'unknown_func')
        results.append(not result_no_match)
        
        # Test matching a different depth function (should fail)
        if depth > 1:
            result_inner = _expr_matches_name(node, {}, 'outer_func') # 3.46μs -> 3.82μs (9.43% slower)
            results.append(not result_inner)
    
    assert all(results)

def test_many_aliases_chain_lookup():
    """Test realistic usage with moderate alias dict and varied lookups."""
    aliases = {f'name_{i}': f'module_{i % 7}.target{i % 3}' for i in range(100)}
    
    results = []
    # Test lookups with diverse patterns
    lookup_indices = [0, 10, 25, 50, 75, 99]
    target_suffixes = ['target0', 'target1', 'target2']
    
    for idx in lookup_indices:
        expected_suffix = f'target{idx % 3}'
        node = ast.Name(id=f'name_{idx}', ctx=ast.Load())
        
        # Test matching case
        result_match = _expr_matches_name(node, aliases, expected_suffix) # 5.12μs -> 6.41μs (20.2% slower)
        results.append(result_match)
        
        # Test non-matching cases with different suffixes
        for other_suffix in [s for s in target_suffixes if s != expected_suffix]:
            result_no_match = _expr_matches_name(node, aliases, other_suffix)
            results.append(not result_no_match)
    
    assert all(results)

def test_large_suffix_strings():
    """Test with suffix strings of realistic lengths."""
    results = []
    
    # Test realistic identifier lengths (up to 50-60 chars in real code)
    for length in [5, 15, 30, 50]:
        suffix = 'method_' + 'a' * (length - 7)
        node = ast.Name(id=suffix, ctx=ast.Load())
        result = _expr_matches_name(node, {}, suffix) # 1.76μs -> 3.10μs (43.0% slower)
        results.append(result)
        
        # Test non-matching suffix of similar length
        different_suffix = 'func_' + 'b' * (length - 5)
        result_no_match = _expr_matches_name(node, {}, different_suffix)
        results.append(not result_no_match)
        
        # Test shorter suffix (should fail)
        short_suffix = suffix[:-3] # 3.15μs -> 3.70μs (14.8% slower)
        result_short = _expr_matches_name(node, {}, short_suffix)
        results.append(not result_short)
    
    assert all(results)

def test_large_dotted_paths():
    """Test matching dotted paths with realistic component counts."""
    results = []
    
    # Test realistic dotted path lengths (up to 5-7 components)
    for num_components in [3, 5, 7]:
        node = ast.Name(id='comp_0', ctx=ast.Load())
        for i in range(1, num_components):
            node = ast.Attribute(value=node, attr=f'comp_{i}', ctx=ast.Load())
        
        # Test matching the final component
        result = _expr_matches_name(node, {}, f'comp_{num_components - 1}') # 5.82μs -> 6.26μs (7.03% slower)
        results.append(result)
        
        # Test matching the first component (should fail)
        result_first = _expr_matches_name(node, {}, 'comp_0')
        results.append(not result_first)
        
        # Test non-matching suffix
        result_no_match = _expr_matches_name(node, {}, 'nonexistent') # 4.92μs -> 4.52μs (8.85% faster)
        results.append(not result_no_match)
        
        # Test matching the full path
        full_path = '.'.join([f'comp_{i}' for i in range(num_components)])
        result_full = _expr_matches_name(node, {}, full_path)
        results.append(result_full) # 4.45μs -> 3.99μs (11.6% faster)
    
    assert all(results)

def test_bulk_operations_mixed():
    """Test bulk operations with mixed node types and realistic patterns."""
    aliases = {f'import_{i}': f'pkg.module_{i % 4}.func{i % 2}' for i in range(50)}
    
    results = []
    
    # Test names with diverse patterns
    for i in range(10):
        node = ast.Name(id=f'import_{i}', ctx=ast.Load())
        expected_suffix = f'func{i % 2}'
        result = _expr_matches_name(node, aliases, expected_suffix) # 7.44μs -> 9.08μs (18.1% slower)
        results.append(result)
        
        # Also test non-matching case
        wrong_suffix = f'func{(i + 1) % 2}'
        result_wrong = _expr_matches_name(node, aliases, wrong_suffix)
        results.append(not result_wrong)
    
    # Test calls with varied patterns
    for i in range(10, 20):
        func = ast.Name(id=f'import_{i}', ctx=ast.Load())
        call = ast.Call(func=func, args=[], keywords=[])
        expected_suffix = f'func{i % 2}'
        result = _expr_matches_name(call, aliases, expected_suffix) # 8.77μs -> 8.88μs (1.24% slower)
        results.append(result)
    
    # Test attributes with mixed nesting
    for i in range(20, 30):
        base = ast.Name(id='obj', ctx=ast.Load())
        attr = ast.Attribute(value=base, attr=f'method_{i}', ctx=ast.Load())
        result = _expr_matches_name(attr, {}, f'method_{i}') # 7.42μs -> 9.70μs (23.4% slower)
        results.append(result)
        
        # Test non-matching attribute
        result_no_match = _expr_matches_name(attr, {}, f'method_{i + 1}')
        results.append(not result_no_match)
    
    assert all(results)

⏪ Replay Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| benchmarks/codeflash_replay_tests_260k0cbn/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor__expr_matches_name_test_benchmark_extract | 2.60μs | 3.17μs | -17.7% ⚠️ |

To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-17T03.31.35` and push.

codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Mar 17, 2026
claude bot commented Mar 17, 2026

Claude finished @codeflash-ai[bot]'s task in 25s


PR Review Summary

  • Triage PR scope
  • Run lint and typecheck
  • Resolve stale threads
  • Code review
  • Duplicate detection
  • Test coverage
  • Check optimization PRs

Prek Checks

`uv run prek run --from-ref origin/main`: passed (ruff check + format clean)

`mypy codeflash/languages/python/context/code_context_extractor.py`: no issues found

Stale Threads

No unresolved Claude review threads found.

Code Review

The optimization converts _get_expr_name from recursive to iterative and precomputes "." + suffix in _expr_matches_name. Correctness verified manually across all cases:

_get_expr_name logic check:

| Input | Original | Optimized | Match |
|---|---|---|---|
| None | None | None | ✅ |
| Name("foo") | "foo" | "foo" | ✅ |
| Attribute(Name("a"), "b") | "a.b" | "a.b" | ✅ |
| Attribute(BinOp, "b") | "b" | "b" | ✅ |
| Call(func=Name("foo")) | "foo" | "foo" | ✅ |
| Attribute(Attribute(Name("a"),"b"),"c") | "a.b.c" | "a.b.c" | ✅ |

Minor observations (no action required):

  1. parts.insert(0, base_name) at line 735 is O(n) — for large chains, [base_name] + parts creates a new list but is equivalent. In practice, AST attribute chains are shallow so this is negligible.

  2. The comment on line 732 — "Reverse to get innermost-first" — could be more precise (parts are collected outer-to-inner, reverse gives inner-to-outer / base-adjacent-first), but the code itself is correct.

  3. The suffix_dot = "." + suffix precomputation in _expr_matches_name (line 765) is a clean micro-optimization that avoids two redundant f-string allocations per call.
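
The suffix check with a precomputed `"." + suffix`, as noted in observation 3, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in the PR: `dotted_name` is a hypothetical stand-in for `_get_expr_name`, and the order of the direct and alias checks in `expr_matches_name` is inferred from the regression tests.

```python
import ast
from typing import Dict, List, Optional


def dotted_name(node: Optional[ast.AST]) -> Optional[str]:
    """Hypothetical stand-in for _get_expr_name: 'a.b.c' for simple chains."""
    parts: List[str] = []
    while isinstance(node, (ast.Attribute, ast.Call)):
        if isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        else:
            node = node.func  # unwrap a call to its callee
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts)) if parts else None


def expr_matches_name(
    node: Optional[ast.AST], import_aliases: Dict[str, str], suffix: str
) -> bool:
    """Assumed matching rule: exact name, or a dotted path ending in '.suffix'."""
    expr_name = dotted_name(node)
    if expr_name is None:
        return False
    suffix_dot = "." + suffix  # built once, reused for both checks below
    if expr_name == suffix or expr_name.endswith(suffix_dot):
        return True
    resolved = import_aliases.get(expr_name)
    return resolved is not None and (
        resolved == suffix or resolved.endswith(suffix_dot)
    )


call = ast.parse("np.linalg.norm(x)", mode="eval").body
print(expr_matches_name(call, {}, "linalg.norm"))
alias_node = ast.Name(id="alias", ctx=ast.Load())
print(expr_matches_name(alias_node, {"alias": "pkg.target"}, "target"))
```

Requiring the leading dot in `suffix_dot` is what keeps `module.foobar` from matching the suffix `foo`, and it also explains why an empty suffix does not match an ordinary name.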

Overall: The optimization is correct and well-tested (231 generated regression tests + 1 replay test, 100% coverage). The overall 26% speedup claim is plausible given Python's function call overhead, with the largest gains on deep attribute chains.

Duplicate Detection

No duplicates detected. _get_expr_name and _expr_matches_name are used only within codeflash/languages/python/context/code_context_extractor.py.

Test Coverage

Per the PR body: 100% coverage reported by Codeflash's correctness verification. No existing unit tests for these private helpers, but 231 generated regression tests exercise all code paths.

Optimization PRs

Found 2 open codeflash-ai[bot] PRs:


@KRRT7 KRRT7 merged commit b809cb3 into unstructured-inference Mar 17, 2026
27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1660-2026-03-17T03.31.35 branch March 17, 2026 03:45