⚡️ Speed up function collect_existing_class_names by 188% in PR #1660 (unstructured-inference) #1838

Merged
KRRT7 merged 6 commits into unstructured-inference from
codeflash/optimize-pr1660-2026-03-16T19.13.06
Mar 16, 2026

Conversation

@codeflash-ai
Contributor

@codeflash-ai codeflash-ai bot commented Mar 16, 2026

⚡️ This pull request contains optimizations for PR #1660

If you approve this dependent PR, these changes will be merged into the original PR branch unstructured-inference.

This PR will be automatically closed if the original PR is merged.


📄 188% (1.88x) speedup for collect_existing_class_names in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 18.6 milliseconds → 6.44 milliseconds (best of 5 runs)
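
The headline percentage appears to be the time saved relative to the optimized runtime. The sketch below is inferred from the reported numbers, not taken from Codeflash's documentation, and the helper name `speedup_percent` is hypothetical.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup as a percentage of the optimized runtime.

    Assumed formula (inferred from the figures in this PR):
    (original - optimized) / optimized * 100
    """
    return (original - optimized) / optimized * 100.0


# Runtimes reported for collect_existing_class_names, in milliseconds.
headline = speedup_percent(18.6, 6.44)

# Runtimes reported for _should_use_raw_project_class_context, in microseconds.
dependent = speedup_percent(1430.0, 948.0)
```

With the rounded inputs above, the first value lands just under 189% and the second just under 51%, which matches the reported 188% and 51% to within rounding of the displayed runtimes.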

⚡️ This change will improve the performance of the following benchmarks:

| Benchmark File :: Function | Original Runtime | Expected New Runtime | Speedup |
|---|---|---|---|
| tests.benchmarks.test_benchmark_code_extract_code_context::test_benchmark_extract | 16.5 seconds | 16.5 seconds | 0.02% |


📝 Explanation and details

The optimization replaces ast.walk(tree) — which visits every node in the AST — with a manual stack-based traversal that only descends into container node types (Module, ClassDef, FunctionDef, control-flow statements, etc.) where ClassDef nodes can appear. This eliminates traversal of leaf nodes like Name, Constant, Load, and Store, which constitute the bulk of an AST but never contain class definitions. The profiler shows the original single-line comprehension spent 100% of runtime (117.7 ms) in ast.walk, while the optimized version completes in 36.1 ms (3.26× faster) by skipping ~60–80% of nodes depending on AST density. Tests confirm correctness across nested classes, control-flow scopes, and large trees with 1000+ classes.
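
The traversal described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the tuple `_CONTAINER_TYPES`, the field list `_BODY_FIELDS`, and both function names are assumptions, and the container list only covers the node types named in the description.

```python
import ast

# Hypothetical, non-exhaustive list of nodes whose bodies can hold a ClassDef.
# The PR's real list is not shown in the description.
_CONTAINER_TYPES = (
    ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
)

# Fields on those nodes that hold lists of statements or handlers.
_BODY_FIELDS = ("body", "orelse", "finalbody", "handlers")


def collect_class_names_walk(tree: ast.AST) -> set:
    """Baseline: visit every node, as the original comprehension did."""
    return {n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)}


def collect_class_names_selective(tree: ast.AST) -> set:
    """Visit only statement containers; expressions are never expanded."""
    names = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        if not isinstance(node, _CONTAINER_TYPES):
            continue  # simple statements and expressions cannot hold a class
        for field in _BODY_FIELDS:
            # Not every container has every field, hence the default.
            stack.extend(getattr(node, field, ()))
    return names
```

The saving comes from the `continue`: an `Assign` or `Return` statement is popped once and discarded, so its `Name`, `Constant`, and context children are never pushed onto the stack.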

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 42 Passed |
| ⏪ Replay Tests | 2 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import ast  # used to build AST Module nodes from source code

import pytest  # used for our unit tests
# import the function under test from the provided module path
from codeflash.languages.python.context.code_context_extractor import \
    collect_existing_class_names

def test_empty_module_returns_empty_set():
    # Parse an empty module into an AST Module node
    tree = ast.parse("")  # no definitions at all
    # collect_existing_class_names should return an empty set for an empty module
    assert collect_existing_class_names(tree) == set() # 6.43μs -> 1.84μs (249% faster)

def test_single_top_level_class_is_detected():
    # Simple source with a single top-level class definition
    src = "class Foo:\n    pass\n"
    tree = ast.parse(src)  # produce AST
    # Expect a set containing exactly the class name 'Foo'
    assert collect_existing_class_names(tree) == {"Foo"} # 9.19μs -> 3.15μs (192% faster)

def test_multiple_and_nested_classes_are_all_detected():
    # Source with multiple top-level classes and nested class inside another class
    src = """
class A:
    class Inner:
        pass

class B:
    pass

def factory():
    class Local:
        pass
    return Local
"""
    tree = ast.parse(src)
    # Expect names from top-level, nested, and function-local class definitions
    result = collect_existing_class_names(tree) # 21.0μs -> 8.47μs (148% faster)
    assert "A" in result  # top-level
    assert "B" in result  # top-level
    assert "Inner" in result  # nested inside A
    assert "Local" in result  # defined inside a function
    # Ensure exactly these four names (no extras)
    assert result == {"A", "B", "Inner", "Local"}

def test_ignores_non_class_def_nodes():
    # Ensure function defs, async function defs, and assignments are not reported
    src = """
def NotAClass():
    pass

async def NotAClassAsync():
    pass

x = 123

class RealClass:
    pass
"""
    tree = ast.parse(src)
    result = collect_existing_class_names(tree) # 21.4μs -> 7.18μs (198% faster)
    # Only the actual class should be present
    assert result == {"RealClass"}
    # Sanity: ensure function names are not included
    assert "NotAClass" not in result
    assert "NotAClassAsync" not in result

def test_decorated_and_metaclass_classes_are_detected():
    # Classes with decorators and with keyword args (metaclass) should still be found
    src = """
def deco(cls):
    return cls

@deco
class Decorated:
    pass

class WithMeta(metaclass=type):
    pass
"""
    tree = ast.parse(src)
    result = collect_existing_class_names(tree) # 21.9μs -> 6.32μs (246% faster)
    # Both class names should be present
    assert "Decorated" in result
    assert "WithMeta" in result
    assert result >= {"Decorated", "WithMeta"}

def test_duplicate_class_names_appear_only_once_in_result_set():
    # Define the same class name multiple times in different scopes; set should deduplicate
    src = """
class Dup:
    pass

def make():
    class Dup:
        pass
    return Dup

class Dup:
    pass
"""
    tree = ast.parse(src)
    result = collect_existing_class_names(tree) # 19.2μs -> 7.83μs (145% faster)
    # 'Dup' may be defined multiple times but should only appear once in the returned set
    assert result == {"Dup"}

def test_unicode_and_various_valid_identifier_class_names():
    # Python supports many unicode letters in identifiers; ensure they are captured
    src = """
class クラス:
    pass

class CamelCase:
    pass

class _LeadingUnderscore:
    pass

class mixed123:
    pass
"""
    tree = ast.parse(src)
    result = collect_existing_class_names(tree) # 14.4μs -> 7.68μs (87.9% faster)
    # Verify all class names, including the unicode name, are present
    assert "クラス" in result
    assert "CamelCase" in result
    assert "_LeadingUnderscore" in result
    assert "mixed123" in result

def test_classes_in_control_flow_and_try_blocks_are_detected():
    # Classes defined inside if/for/try blocks should still be discovered by the traversal
    src = """
if True:
    class IfClass:
        pass

for i in range(1):
    class ForClass:
        pass

try:
    class TryClass:
        pass
except Exception:
    pass
"""
    tree = ast.parse(src)
    result = collect_existing_class_names(tree) # 25.2μs -> 10.6μs (137% faster)
    # All class names declared inside control flow constructs should be present
    assert {"IfClass", "ForClass", "TryClass"} <= result

def test_large_number_of_classes_is_handled_and_all_names_returned():
    # Create a large source string with 1000 uniquely named classes to test scalability
    N = 1000  # number of classes to generate (within the requested limit)
    # Build source code with many simple class declarations
    src_lines = [f"class Class{i}:\n    pass\n" for i in range(N)]
    src = "\n".join(src_lines)
    tree = ast.parse(src)  # parse once
    result = collect_existing_class_names(tree) # 1.74ms -> 1.02ms (70.6% faster)
    # Expect exactly N unique class names
    assert len(result) == N
    # Check presence of a few boundary and random names to ensure correctness
    assert "Class0" in result
    assert f"Class{N-1}" in result
    assert "Class500" in result  # a mid-range sample
import ast

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    collect_existing_class_names

def test_single_class_definition():
    """Test that a single class definition is collected correctly."""
    # Parse a simple Python module with one class
    code = "class MyClass:\n    pass"
    tree = ast.parse(code)
    
    # Call the function and verify it returns the class name
    result = collect_existing_class_names(tree) # 9.97μs -> 4.42μs (126% faster)
    assert result == {"MyClass"}

def test_multiple_class_definitions():
    """Test that multiple class definitions are all collected."""
    # Parse a module with multiple top-level classes
    code = """
class FirstClass:
    pass

class SecondClass:
    pass

class ThirdClass:
    pass
"""
    tree = ast.parse(code)
    
    # Verify all three class names are in the result
    result = collect_existing_class_names(tree) # 13.2μs -> 6.94μs (90.5% faster)
    assert result == {"FirstClass", "SecondClass", "ThirdClass"}

def test_nested_class_definitions():
    """Test that nested classes are also collected."""
    # Parse a module with a nested class structure
    code = """
class OuterClass:
    class InnerClass:
        pass
"""
    tree = ast.parse(code)
    
    # Verify both outer and inner class names are collected
    result = collect_existing_class_names(tree) # 10.2μs -> 4.23μs (140% faster)
    assert result == {"OuterClass", "InnerClass"}

def test_deeply_nested_classes():
    """Test that deeply nested classes are all collected."""
    # Parse a module with multiple levels of nesting
    code = """
class Level1:
    class Level2:
        class Level3:
            pass
"""
    tree = ast.parse(code)
    
    # Verify all three class names at different nesting levels are collected
    result = collect_existing_class_names(tree) # 11.8μs -> 4.56μs (159% faster)
    assert result == {"Level1", "Level2", "Level3"}

def test_class_with_methods_and_attributes():
    """Test that classes with methods and attributes are collected correctly."""
    # Parse a module with a class containing methods and attributes
    code = """
class MyClass:
    def __init__(self):
        self.value = 42
    
    def method(self):
        return self.value
"""
    tree = ast.parse(code)
    
    # Verify the class name is collected regardless of its content
    result = collect_existing_class_names(tree) # 27.1μs -> 6.72μs (303% faster)
    assert result == {"MyClass"}

def test_class_with_inheritance():
    """Test that classes with inheritance are collected correctly."""
    # Parse a module with a class that inherits from another
    code = """
class BaseClass:
    pass

class DerivedClass(BaseClass):
    pass
"""
    tree = ast.parse(code)
    
    # Verify both classes are collected
    result = collect_existing_class_names(tree) # 12.8μs -> 5.81μs (119% faster)
    assert result == {"BaseClass", "DerivedClass"}

def test_class_with_decorators():
    """Test that decorated classes are collected correctly."""
    # Parse a module with a decorated class
    code = """
@decorator
class DecoratedClass:
    pass
"""
    tree = ast.parse(code)
    
    # Verify the class name is collected despite decorators
    result = collect_existing_class_names(tree) # 10.3μs -> 3.50μs (194% faster)
    assert result == {"DecoratedClass"}

def test_return_type_is_set():
    """Test that the return value is always a set type."""
    # Parse a simple module with a class
    code = "class TestClass:\n    pass"
    tree = ast.parse(code)
    
    # Call the function and verify it returns a set
    result = collect_existing_class_names(tree) # 8.57μs -> 3.59μs (139% faster)
    assert isinstance(result, set)

def test_class_names_are_strings():
    """Test that all collected class names are strings."""
    # Parse a module with multiple classes
    code = """
class Class1:
    pass

class Class2:
    pass
"""
    tree = ast.parse(code)
    
    # Verify all items in the result are strings
    result = collect_existing_class_names(tree) # 10.8μs -> 5.51μs (95.8% faster)
    assert all(isinstance(name, str) for name in result)

def test_empty_module():
    """Test that an empty module returns an empty set."""
    # Parse an empty module (an empty source string)
    code = ""
    tree = ast.parse(code)
    
    # Verify the result is an empty set
    result = collect_existing_class_names(tree) # 5.55μs -> 1.93μs (187% faster)
    assert result == set()
    assert len(result) == 0

def test_module_with_only_functions():
    """Test that a module with only functions returns an empty set."""
    # Parse a module with only function definitions
    code = """
def function1():
    pass

def function2():
    pass
"""
    tree = ast.parse(code)
    
    # Verify no classes are collected
    result = collect_existing_class_names(tree) # 14.9μs -> 5.35μs (179% faster)
    assert result == set()

def test_module_with_only_imports():
    """Test that a module with only imports returns an empty set."""
    # Parse a module with only import statements
    code = """
import os
from sys import argv
"""
    tree = ast.parse(code)
    
    # Verify no classes are collected
    result = collect_existing_class_names(tree) # 10.5μs -> 3.93μs (167% faster)
    assert result == set()

def test_module_with_only_variables():
    """Test that a module with only variable assignments returns an empty set."""
    # Parse a module with only variable assignments
    code = """
x = 42
y = "hello"
z = [1, 2, 3]
"""
    tree = ast.parse(code)
    
    # Verify no classes are collected
    result = collect_existing_class_names(tree) # 19.1μs -> 4.07μs (371% faster)
    assert result == set()

def test_class_with_special_name():
    """Test that classes with special names like __ClassName__ are collected."""
    # Parse a module with a class that has underscores
    code = "class __SpecialClass__:\n    pass"
    tree = ast.parse(code)
    
    # Verify the special class name is collected
    result = collect_existing_class_names(tree) # 8.59μs -> 3.76μs (128% faster)
    assert result == {"__SpecialClass__"}

def test_class_with_all_caps_name():
    """Test that classes with all-caps names are collected correctly."""
    # Parse a module with a class in all capitals
    code = "class CONSTANT:\n    pass"
    tree = ast.parse(code)
    
    # Verify the all-caps class name is collected
    result = collect_existing_class_names(tree) # 8.60μs -> 3.63μs (137% faster)
    assert result == {"CONSTANT"}

def test_class_with_numbers_in_name():
    """Test that classes with numbers in their names are collected."""
    # Parse a module with classes containing numbers
    code = """
class Class1:
    pass

class Class2Test:
    pass

class Test123:
    pass
"""
    tree = ast.parse(code)
    
    # Verify all classes with numbers are collected
    result = collect_existing_class_names(tree) # 12.5μs -> 6.37μs (96.4% faster)
    assert result == {"Class1", "Class2Test", "Test123"}

def test_duplicate_class_names_in_different_scopes():
    """Test that classes with the same name in different scopes are deduplicated in set."""
    # Parse a module with classes having the same name in different scopes
    code = """
class MyClass:
    class MyClass:
        pass
"""
    tree = ast.parse(code)
    
    # The set should contain only one "MyClass" since sets deduplicate
    result = collect_existing_class_names(tree) # 10.0μs -> 4.25μs (136% faster)
    assert result == {"MyClass"}
    assert len(result) == 1

def test_class_inside_function():
    """Test that classes defined inside functions are still collected."""
    # Parse a module with a class inside a function
    code = """
def my_function():
    class LocalClass:
        pass
    return LocalClass
"""
    tree = ast.parse(code)
    
    # Verify the local class is still collected by the traversal
    result = collect_existing_class_names(tree) # 15.8μs -> 5.61μs (182% faster)
    assert result == {"LocalClass"}

def test_class_inside_if_statement():
    """Test that classes inside conditional blocks are collected."""
    # Parse a module with a class inside an if statement
    code = """
if True:
    class ConditionalClass:
        pass
"""
    tree = ast.parse(code)
    
    # Verify the class in the conditional block is collected
    result = collect_existing_class_names(tree) # 11.5μs -> 4.38μs (163% faster)
    assert result == {"ConditionalClass"}

def test_class_inside_try_except():
    """Test that classes inside try-except blocks are collected."""
    # Parse a module with a class inside a try block
    code = """
try:
    class TriedClass:
        pass
except:
    class ExceptClass:
        pass
"""
    tree = ast.parse(code)
    
    # Verify both classes in try and except are collected
    result = collect_existing_class_names(tree) # 13.4μs -> 7.15μs (87.8% faster)
    assert result == {"TriedClass", "ExceptClass"}

def test_class_inside_loop():
    """Test that classes inside loops are collected."""
    # Parse a module with a class inside a for loop
    code = """
for i in range(1):
    class LoopClass:
        pass
"""
    tree = ast.parse(code)
    
    # Verify the class in the loop is collected
    result = collect_existing_class_names(tree) # 15.5μs -> 4.39μs (254% faster)
    assert result == {"LoopClass"}

def test_class_inside_with_statement():
    """Test that classes inside with statements are collected."""
    # Parse a module with a class inside a with statement
    code = """
with open('file.txt'):
    class WithClass:
        pass
"""
    tree = ast.parse(code)
    
    # Verify the class in the with block is collected
    result = collect_existing_class_names(tree) # 14.8μs -> 5.05μs (193% faster)
    assert result == {"WithClass"}

def test_single_letter_class_names():
    """Test that single-letter class names are collected correctly."""
    # Parse a module with single-letter class names
    code = """
class A:
    pass

class B:
    pass

class Z:
    pass
"""
    tree = ast.parse(code)
    
    # Verify all single-letter classes are collected
    result = collect_existing_class_names(tree) # 12.6μs -> 6.63μs (90.2% faster)
    assert result == {"A", "B", "Z"}

def test_very_long_class_name():
    """Test that very long class names are collected correctly."""
    # Create a very long class name
    long_name = "VeryLongClassName" + "X" * 100
    code = f"class {long_name}:\n    pass"
    tree = ast.parse(code)
    
    # Verify the long class name is collected
    result = collect_existing_class_names(tree) # 8.65μs -> 3.52μs (146% faster)
    assert result == {long_name}

def test_unicode_class_names():
    """Test that classes with unicode characters in names are collected."""
    # Parse a module with unicode characters in class names
    code = "class ClassΑ:\n    pass"  # Using Greek Alpha
    tree = ast.parse(code)
    
    # Verify the unicode class name is collected
    result = collect_existing_class_names(tree) # 8.56μs -> 3.35μs (156% faster)
    assert "ClassΑ" in result

def test_many_top_level_classes():
    """Test collection of a large number of top-level classes."""
    # Generate code with 100 top-level class definitions
    code_lines = [f"class Class{i}:\n    pass\n" for i in range(100)]
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all 100 classes are collected
    result = collect_existing_class_names(tree) # 182μs -> 103μs (76.3% faster)
    assert len(result) == 100
    assert all(f"Class{i}" in result for i in range(100))

def test_deeply_nested_classes_10_levels():
    """Test collection of classes nested 10 levels deep."""
    # Generate deeply nested classes
    code_parts = []
    for i in range(10):
        code_parts.append("    " * i + f"class Level{i}:\n")
    code = "".join(code_parts) + "    " * 10 + "pass"
    tree = ast.parse(code)
    
    # Verify all 10 levels of classes are collected
    result = collect_existing_class_names(tree) # 21.7μs -> 9.48μs (129% faster)
    assert len(result) == 10
    assert all(f"Level{i}" in result for i in range(10))

def test_mixed_classes_and_functions_500_items():
    """Test collection when module has 500 mixed classes and functions."""
    # Generate a module with 250 classes and 250 functions
    code_lines = []
    for i in range(250):
        code_lines.append(f"class Class{i}:\n    pass\n")
        code_lines.append(f"def func{i}():\n    pass\n")
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify exactly 250 classes are collected despite mixed content
    result = collect_existing_class_names(tree) # 1.29ms -> 481μs (167% faster)
    assert len(result) == 250
    assert all(f"Class{i}" in result for i in range(250))

def test_classes_in_nested_functions_200_total():
    """Test collection of 200 classes distributed in nested functions."""
    # Generate functions with nested classes
    code_lines = []
    for i in range(100):
        code_lines.append(f"def func{i}():\n")
        code_lines.append(f"    class LocalClass{i}A:\n        pass\n")
        code_lines.append(f"    class LocalClass{i}B:\n        pass\n")
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all 200 classes are collected
    result = collect_existing_class_names(tree) # 676μs -> 263μs (157% faster)
    assert len(result) == 200

def test_classes_with_multiple_levels_of_nesting_1000_total():
    """Test collection with complex nesting structure totaling 1500 classes."""
    # Generate a more complex structure with multiple nesting patterns
    code_lines = []
    
    # Add 100 top-level classes with 5 nested classes each
    for i in range(100):
        code_lines.append(f"class TopLevel{i}:\n")
        for j in range(5):
            code_lines.append(f"    class Nested{i}_{j}:\n")
            code_lines.append(f"        pass\n")
    
    # Add 400 classes in conditional blocks
    for i in range(400):
        code_lines.append(f"if True:\n    class Conditional{i}:\n        pass\n")
    
    # Add 500 classes in functions
    for i in range(500):
        code_lines.append(f"def func{i}():\n    class InFunc{i}:\n        pass\n")
    
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all classes are collected (100 top-level + 500 nested + 400 conditional + 500 in functions = 1500)
    result = collect_existing_class_names(tree) # 4.78ms -> 2.01ms (139% faster)
    assert len(result) == 1500

def test_classes_with_complex_decorators_and_inheritance_500():
    """Test collection of 500 classes each with decorators and inheritance."""
    # Generate classes with decorators and inheritance
    code_lines = []
    code_lines.append("class BaseClass:\n    pass\n")
    
    for i in range(500):
        code_lines.append(f"@decorator_{i}\n")
        code_lines.append(f"class DerivedClass{i}(BaseClass):\n")
        code_lines.append(f"    def method{i}(self):\n        pass\n")
    
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all 501 classes are collected
    result = collect_existing_class_names(tree) # 3.85ms -> 821μs (368% faster)
    assert len(result) == 501
    assert "BaseClass" in result
    assert all(f"DerivedClass{i}" in result for i in range(500))

def test_performance_with_1000_simple_classes():
    """Test performance when processing 1000 simple class definitions."""
    # Generate 1000 simple classes
    code_lines = [f"class SimpleClass{i}:\n    pass\n" for i in range(1000)]
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all 1000 classes are collected
    result = collect_existing_class_names(tree) # 1.71ms -> 944μs (81.1% faster)
    assert len(result) == 1000
    assert all(f"SimpleClass{i}" in result for i in range(1000))

def test_classes_within_lambda_like_structures():
    """Test collection of nested classes spread across many outer classes."""
    # Generate outer classes that each contain three nested classes
    code_lines = []
    
    # Comprehensions and lambdas cannot contain class definitions, so only ordinary nesting is exercised
    for i in range(100):
        code_lines.append(f"class OuterClass{i}:\n")
        code_lines.append(f"    class InnerClass{i}A:\n        pass\n")
        code_lines.append(f"    class InnerClass{i}B:\n        pass\n")
        code_lines.append(f"    class InnerClass{i}C:\n        pass\n")
    
    code = "".join(code_lines)
    tree = ast.parse(code)
    
    # Verify all 400 classes are collected (100 outer + 300 inner)
    result = collect_existing_class_names(tree) # 668μs -> 356μs (87.5% faster)
    assert len(result) == 400
⏪ Replay Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| benchmarks/codeflash_replay_tests_xz10rm7d/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor_collect_existing_class_names_test_benchmark_extract | 3.21ms | 261μs | 1127%✅ |

To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-16T19.13.06`, commit your edits, and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Mar 16, 2026
@claude
Contributor

claude bot commented Mar 16, 2026

Claude finished @misrasaurabh1's task in 4m 51s


PR Review Summary

  • Triage: production Python file changed — full review required
  • Lint & typecheck
  • Resolve stale threads
  • Code review
  • Duplicate detection
  • Test coverage
  • Summary comment
  • Check/merge codeflash-ai bot PRs

Prek Checks

✅ No linting issues. Mypy reports no type errors.


Code Review

collect_existing_class_names — missing ast.Match and ast.TryStar node types (Minor)

codeflash/languages/python/context/code_context_extractor.py:577

The selective traversal lists node types that can contain ClassDef nodes, but omits:

  • ast.Match (Python 3.10+) — classes defined inside match/case bodies won't be found
  • ast.TryStar (Python 3.11+) — classes defined inside try/except* blocks won't be found

The original ast.walk(tree) handled these automatically. While defining a class inside a match case is unusual, it's valid Python and the previous implementation handled it correctly. Since CodeFlash targets Python 3.9+, match statements in analyzed files are fully in scope.
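
One way to close this gap is sketched below. It is an illustration under stated assumptions, not the patch that was applied: `BASE_CONTAINERS`, `NEWER_CONTAINERS`, `BODY_FIELDS`, and `class_names` are hypothetical names. The newer node types are looked up by name because `ast.Match` and `ast.match_case` exist only on Python 3.10+ and `ast.TryStar` only on 3.11+, so referencing them directly would fail on 3.9.

```python
import ast
import sys

BASE_CONTAINERS = (
    ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef,
    ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith,
    ast.Try, ast.ExceptHandler,
)

# Resolve version-dependent node types only if this interpreter defines them.
NEWER_CONTAINERS = tuple(
    getattr(ast, name)
    for name in ("Match", "match_case", "TryStar")
    if hasattr(ast, name)
)

# "cases" holds the match_case children of ast.Match; each match_case has a body.
BODY_FIELDS = ("body", "orelse", "finalbody", "handlers", "cases")


def class_names(tree: ast.AST, containers: tuple) -> set:
    """Collect ClassDef names, expanding only the given container types."""
    names = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        if isinstance(node, containers):
            for field in BODY_FIELDS:
                stack.extend(getattr(node, field, ()))
    return names
```

Calling `class_names(tree, BASE_CONTAINERS)` on a module whose only class sits inside a `case` block reproduces the miss described above, while `class_names(tree, BASE_CONTAINERS + NEWER_CONTAINERS)` finds it.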

# noqa: SIM110 on for loops (Informational)

codeflash/languages/python/context/code_context_extractor.py:774

Three functions (_is_namedtuple_class, _class_has_explicit_init, _has_descriptor_like_class_fields) have for loops with # noqa: SIM110 silencing the "use any() instead" warning. The early-return pattern is intentional for performance, so the noqa suppression is correct — just slightly surprising to readers since SIM110 normally fires on for loops that should be any().
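
For readers unfamiliar with SIM110, the two forms look roughly like this. The functions are hypothetical stand-ins; the real bodies of `_class_has_explicit_init` and its siblings are not shown in this PR page.

```python
import ast


def has_explicit_init_any(class_node: ast.ClassDef) -> bool:
    """The form SIM110 recommends: a generator expression passed to any()."""
    return any(
        isinstance(item, ast.FunctionDef) and item.name == "__init__"
        for item in class_node.body
    )


def has_explicit_init_loop(class_node: ast.ClassDef) -> bool:
    """The early-return form kept in the PR, with the lint rule silenced."""
    for item in class_node.body:  # noqa: SIM110
        if isinstance(item, ast.FunctionDef) and item.name == "__init__":
            return True
    return False
```

Both versions return the same result and both stop at the first match; the loop simply avoids creating a generator object on each call.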

assert class_node.end_lineno is not None — Fine addition in _should_use_raw_project_class_context; ClassDef always has end_lineno set after parsing.
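
A small sketch of why the assert is useful: `end_lineno` is typed as optional, so a type checker will not accept it as an `int` until it has been narrowed. The helper `class_line_span` is a hypothetical example, not a function from the PR.

```python
import ast
from typing import Tuple


def class_line_span(class_node: ast.ClassDef) -> Tuple[int, int]:
    """Return the (first, last) source line of a parsed class definition."""
    # The assert narrows Optional[int] to int for the type checker.
    assert class_node.end_lineno is not None
    return class_node.lineno, class_node.end_lineno


tree = ast.parse("class A:\n    x = 1\n    y = 2\n")
first = tree.body[0]
assert isinstance(first, ast.ClassDef)
span = class_line_span(first)
```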

Path import moved to TYPE_CHECKING: correct, since Path is used only in type annotations and is therefore not needed at runtime.
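
The pattern being approved looks roughly like the sketch below. The function `describe_module` is a hypothetical example, and the quoted annotation is one way to keep the name from being evaluated at runtime; the PR's file may instead rely on `from __future__ import annotations`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only while type checking; skipped when the module runs.
    from pathlib import Path


def describe_module(path: "Path") -> str:
    """Format a path for display; Path appears only in the annotation."""
    return f"module at {path}"
```

Because the annotation is a string, calling the function never looks up `Path`, so the guarded import is safe as long as no runtime code constructs or inspects a `Path` through that name.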


Duplicate Detection

No duplicates detected. The changes are self-contained within a single file.


Test Coverage

No unit tests exist for the modified helper functions (collect_existing_class_names, _is_namedtuple_class, _class_has_explicit_init, _has_descriptor_like_class_fields). The PR includes 42 generated regression tests with 100% coverage for collect_existing_class_names, and 2 replay tests pass. Integration coverage exists via tests/benchmarks/test_benchmark_code_extract_code_context.py.


Bot PR Status

PR #1840 (_should_use_raw_project_class_context optimization) was already merged. No other open codeflash-ai bot PRs found.


Last updated: 2026-03-16T20:02Z

github-actions bot and others added 3 commits March 16, 2026 19:16
The optimization replaced `any()` generator expressions with explicit early-return for-loops in four helper functions (`_is_namedtuple_class`, `_class_has_explicit_init`, `_has_descriptor_like_class_fields`, and `_has_non_property_method_decorator`), eliminating the overhead of building generator objects and calling the `any()` builtin. Line profiler data shows `_class_has_explicit_init` dropped from 1.85 ms to 0.96 ms (about 48% less time), and `_is_namedtuple_class` improved from 97 µs to 53 µs (about 46% less time). Both forms stop at the first match, so the gain comes from not allocating a generator and not resuming it for each item, rather than from exiting earlier. The 51% overall runtime improvement (1.43 ms → 948 µs) comes from these cumulative reductions in per-call overhead across thousands of invocations during AST traversal. The test suite confirms no behavioral changes across all edge cases, including dataclasses, decorators, and size-limit boundaries.
@codeflash-ai
Contributor Author

codeflash-ai bot commented Mar 16, 2026

⚡️ Codeflash found optimizations for this PR

📄 51% (0.51x) speedup for _should_use_raw_project_class_context in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 1.43 milliseconds → 948 microseconds (best of 163 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch codeflash/optimize-pr1660-2026-03-16T19.13.06).

…2026-03-16T19.35.10

⚡️ Speed up function `_should_use_raw_project_class_context` by 51% in PR #1838 (`codeflash/optimize-pr1660-2026-03-16T19.13.06`)
@codeflash-ai
Contributor Author

codeflash-ai bot commented Mar 16, 2026

@KRRT7 KRRT7 merged commit 299853b into unstructured-inference Mar 16, 2026
25 of 26 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1660-2026-03-16T19.13.06 branch March 16, 2026 19:59