Skip to content

Conversation

@codeflash-ai
Copy link
Contributor

@codeflash-ai codeflash-ai bot commented Nov 21, 2025

⚡️ This pull request contains optimizations for PR #935

If you approve this dependent PR, these changes will be merged into the original PR branch fix/ctx-global-definitions-deps.

This PR will be automatically closed if the original PR is merged.


📄 15% (0.15x) speedup for remove_unused_definitions_by_function_names in codeflash/context/unused_definition_remover.py

⏱️ Runtime : 928 milliseconds 804 milliseconds (best of 12 runs)

📝 Explanation and details

The optimized code achieves a 15% speedup through three key optimizations that target expensive operations identified in the profiler:

1. Early Exit for Empty Qualified Functions (Major Impact)

  • Added a guard clause in collect_top_level_defs_with_usages() that returns early if qualified_function_names is empty
  • This completely skips the expensive CST visitor pattern (wrapper.visit(dependency_collector)) which accounts for 82.6% of the function's runtime
  • Test results show dramatic speedups (706-3748% faster) for cases with empty function sets, indicating this optimization has substantial impact when no specific functions need to be preserved

2. Loop Optimization with Local Variable Caching

  • Cached new_children.append as append_child and remove_unused_definitions_recursively as rr to eliminate repeated attribute lookups in the hot loop
  • The profiler shows this loop executing 2,777 times and consuming 9.5% of total runtime through recursive calls
  • Attribute lookups in Python are relatively expensive, so caching these references provides measurable improvement in tight loops

3. Tuple Unpacking Elimination

  • Replaced tuple unpacking modified_module, _ = remove_unused_definitions_recursively(...) with direct indexing to avoid creating temporary tuples
  • While a micro-optimization, it reduces object allocation overhead in the main execution path

Impact Based on Function Usage:
The function references show this is called from extract_code_string_context_from_files() and extract_code_markdown_context_from_files(), both of which process multiple files and function sets during code context extraction. The early exit optimization is particularly valuable here since many files may have empty qualified function sets, allowing the system to skip expensive dependency analysis entirely.

The optimizations are most effective for:

  • Large codebases where many files don't contain target functions (early exit benefit)
  • Complex AST structures with deep nesting (loop optimization benefit)
  • Batch processing scenarios where the function is called repeatedly (cumulative micro-optimization benefits)

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 8 Passed
🌀 Generated Regression Tests 50 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 75.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_remove_unused_definitions.py::test_class_method_with_dunder_methods 8.56ms 8.36ms 2.34%✅
test_remove_unused_definitions.py::test_class_variable_removal 7.17ms 7.02ms 2.14%✅
test_remove_unused_definitions.py::test_complex_type_annotations 8.70ms 8.49ms 2.40%✅
test_remove_unused_definitions.py::test_complex_variable_dependencies 5.14ms 4.94ms 4.03%✅
test_remove_unused_definitions.py::test_conditional_and_loop_variables 13.4ms 13.1ms 3.01%✅
test_remove_unused_definitions.py::test_try_except_finally_variables 7.52ms 7.40ms 1.72%✅
test_remove_unused_definitions.py::test_type_annotation_usage 4.63ms 4.51ms 2.64%✅
test_remove_unused_definitions.py::test_variable_removal_only 4.13ms 4.02ms 2.64%✅
🌀 Generated Regression Tests and Runtime
# imports
from codeflash.context.unused_definition_remover import remove_unused_definitions_by_function_names

# function to test
# (PASTED FROM ABOVE, OMITTED HERE FOR BREVITY)
# ... assume remove_unused_definitions_by_function_names is imported and available ...

# -----------------------------------------
# Basic Test Cases
# -----------------------------------------


def test_basic_keep_function_by_name():
    """Should keep only the specified function and remove others."""
    code = """
def foo():
    pass

def bar():
    pass

def baz():
    pass
"""
    # Only keep 'bar'
    codeflash_output = remove_unused_definitions_by_function_names(code, {"bar"})
    result = codeflash_output  # 3.05ms -> 2.97ms (2.57% faster)


def test_basic_keep_multiple_functions():
    """Should keep multiple specified functions."""
    code = """
def foo():
    pass

def bar():
    pass

def baz():
    pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo", "baz"})
    result = codeflash_output  # 3.06ms -> 2.98ms (2.52% faster)


def test_basic_keep_class_and_method():
    """Should keep class and specified method."""
    code = """
class MyClass:
    def method1(self):
        pass
    def method2(self):
        pass
def unrelated():
    pass
"""
    # Keep method1 of MyClass
    codeflash_output = remove_unused_definitions_by_function_names(code, {"MyClass.method1"})
    result = codeflash_output  # 3.49ms -> 3.41ms (2.33% faster)


def test_basic_keep_variable():
    """Should keep only the specified variable assignment."""
    code = """
x = 1
y = 2
def foo():
    pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"x"})
    result = codeflash_output  # 2.53ms -> 2.49ms (1.48% faster)


def test_basic_keep_class_and_variable():
    """Should keep class and variable if both are referenced."""
    code = """
class A:
    pass
b = 42
def foo():
    pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A", "b"})
    result = codeflash_output  # 2.76ms -> 2.68ms (2.68% faster)


# -----------------------------------------
# Edge Test Cases
# -----------------------------------------


def test_edge_empty_code():
    """Should handle empty code gracefully."""
    code = ""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"anything"})
    result = codeflash_output  # 1.30ms -> 1.29ms (0.922% faster)


def test_edge_no_qualified_names():
    """Should remove all definitions if no qualified names are given."""
    code = """
def foo(): pass
x = 1
class Bar: pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, set())
    result = codeflash_output  # 2.53ms -> 246μs (928% faster)


def test_edge_nonexistent_function_names():
    """Should remove all definitions if specified function names don't exist."""
    code = """
def foo(): pass
def bar(): pass
x = 1
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"baz"})
    result = codeflash_output  # 2.54ms -> 2.50ms (1.66% faster)


def test_edge_imports_are_preserved():
    """Should always preserve import statements."""
    code = """
import os
from sys import path

def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, set())
    result = codeflash_output  # 2.57ms -> 273μs (840% faster)


def test_edge_tuple_unpacking_assignment():
    """Should handle tuple unpacking in assignments."""
    code = """
a, b = (1, 2)
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"a"})
    result = codeflash_output  # 2.69ms -> 2.62ms (2.35% faster)
    # 'b' is not referenced, but assignment cannot be split, so assignment is kept if any target is referenced


def test_edge_annassign_and_augassign():
    """Should handle annotated and augmented assignments."""
    code = """
x: int = 5
y += 1
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"x"})
    result = codeflash_output  # 2.58ms -> 2.53ms (2.22% faster)


def test_edge_class_with_variable_and_method():
    """Should keep class variable if referenced, and remove if not."""
    code = """
class C:
    x = 1
    y = 2
    def m(self): pass
def foo(): pass
"""
    # Reference only 'C.x'
    codeflash_output = remove_unused_definitions_by_function_names(code, {"C.x"})
    result = codeflash_output  # 3.30ms -> 3.24ms (1.81% faster)


def test_edge_class_with_no_methods():
    """Should keep class if referenced even if it has no methods."""
    code = """
class Empty:
    pass
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"Empty"})
    result = codeflash_output  # 2.31ms -> 2.28ms (1.47% faster)


def test_edge_nested_functions():
    """Should ignore nested (non-top-level) function definitions."""
    code = """
def outer():
    def inner():
        pass
    return inner
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"outer"})
    result = codeflash_output  # 2.90ms -> 2.83ms (2.35% faster)
    # Should not attempt to keep 'inner' as it's not top-level


def test_edge_if_else_blocks_with_assignments():
    """Should handle assignments inside if/else blocks."""
    code = """
if True:
    x = 1
else:
    y = 2
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"x"})
    result = codeflash_output  # 2.87ms -> 2.81ms (2.24% faster)


def test_edge_try_except_finally_blocks():
    """Should handle assignments in try/except/finally blocks."""
    code = """
try:
    x = 1
except Exception:
    y = 2
finally:
    z = 3
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"z"})
    result = codeflash_output  # 3.32ms -> 3.26ms (1.75% faster)


def test_edge_keep_class_and_all_methods_if_class_referenced():
    """Should keep all methods if class is referenced."""
    code = """
class D:
    def a(self): pass
    def b(self): pass
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"D"})
    result = codeflash_output  # 3.24ms -> 3.17ms (2.33% faster)


def test_edge_keep_method_by_qualified_name():
    """Should keep class and method if qualified method name is referenced."""
    code = """
class E:
    def m1(self): pass
    def m2(self): pass
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"E.m2"})
    result = codeflash_output  # 3.23ms -> 3.17ms (1.78% faster)


def test_edge_keep_class_variable_by_qualified_name():
    """Should keep class variable if referenced by qualified name."""
    code = """
class F:
    x = 10
    y = 20
def foo(): pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"F.x"})
    result = codeflash_output  # 2.76ms -> 2.69ms (2.43% faster)


def test_edge_keep_function_and_its_dependencies():
    """Should keep dependencies if function depends on other definitions."""
    code = """
x = 1
def foo():
    return x
def bar():
    pass
"""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 2.83ms -> 2.76ms (2.58% faster)


# -----------------------------------------
# Large Scale Test Cases
# -----------------------------------------


def test_large_scale_many_functions_and_variables():
    """Should handle large number of functions and variables efficiently."""
    # Generate code with 500 functions and 500 variables
    code_lines = []
    for i in range(500):
        code_lines.append(f"x{i} = {i}")
    for i in range(500):
        code_lines.append(f"def func{i}():\n    pass")
    code = "\n".join(code_lines)
    # Keep every 100th function and variable
    keep_funcs = {f"func{i}" for i in range(0, 500, 100)}
    keep_vars = {f"x{i}" for i in range(0, 500, 100)}
    codeflash_output = remove_unused_definitions_by_function_names(code, keep_funcs | keep_vars)
    result = codeflash_output  # 341ms -> 325ms (4.75% faster)
    for i in range(0, 500, 100):
        pass
    for i in range(500):
        if i % 100 != 0:
            pass


def test_large_scale_many_classes_and_methods():
    """Should handle large number of classes and methods efficiently."""
    code_lines = []
    for i in range(100):
        code_lines.append(f"class C{i}:\n    def m{2 * i}(self): pass\n    def m{2 * i + 1}(self): pass")
    code = "\n".join(code_lines)
    # Keep every 10th class and its first method
    keep_classes = {f"C{i}" for i in range(0, 100, 10)}
    keep_methods = {f"C{i}.m{i * 2}" for i in range(0, 100, 10)}
    codeflash_output = remove_unused_definitions_by_function_names(code, keep_classes | keep_methods)
    result = codeflash_output  # 127ms -> 123ms (3.79% faster)
    for i in range(0, 100, 10):
        pass
    for i in range(100):
        if i % 10 != 0:
            pass


def test_large_scale_keep_all():
    """Should keep everything if all names are referenced."""
    code_lines = []
    for i in range(100):
        code_lines.append(f"x{i} = {i}")
        code_lines.append(f"def f{i}(): pass")
        code_lines.append(f"class C{i}: pass")
    code = "\n".join(code_lines)
    keep_names = {f"x{i}" for i in range(100)} | {f"f{i}" for i in range(100)} | {f"C{i}" for i in range(100)}
    codeflash_output = remove_unused_definitions_by_function_names(code, keep_names)
    result = codeflash_output  # 96.7ms -> 92.0ms (5.12% faster)
    for i in range(100):
        pass


def test_large_scale_remove_all():
    """Should remove everything except imports if no names are referenced."""
    code_lines = ["import math", "from sys import argv"]
    for i in range(100):
        code_lines.append(f"x{i} = {i}")
        code_lines.append(f"def f{i}(): pass")
        code_lines.append(f"class C{i}: pass")
    code = "\n".join(code_lines)
    codeflash_output = remove_unused_definitions_by_function_names(code, set())
    result = codeflash_output  # 95.4ms -> 11.8ms (706% faster)
    for i in range(100):
        pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
# imports
from codeflash.context.unused_definition_remover import remove_unused_definitions_by_function_names

# function to test
# (PASTE function code here from above, omitted for brevity in this block, but assumed present in the test suite)

# ------------------- UNIT TESTS FOR remove_unused_definitions_by_function_names -------------------


# Helper for normalizing whitespace for easier assertions
def normalize_code(code: str) -> str:
    import textwrap

    return "\n".join(line.rstrip() for line in textwrap.dedent(code).strip().splitlines())


# ------------------- BASIC TEST CASES -------------------


def test_keep_only_specified_function():
    # Only function 'foo' should be kept
    code = """
    def foo():
        pass

    def bar():
        pass
    """
    expected = """
    def foo():
        pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 71.6μs -> 71.9μs (0.516% slower)


def test_keep_function_and_used_variable():
    # Variable 'x' is used in 'foo', so both should be kept
    code = """
    x = 1

    def foo():
        return x

    def bar():
        return 42
    """
    expected = """
    x = 1

    def foo():
        return x
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 57.7μs -> 57.1μs (1.05% faster)


def test_keep_class_and_method():
    # Only class 'A' and its method 'A.m' should be kept
    code = """
    class A:
        def m(self):
            pass

        def n(self):
            pass

    class B:
        def foo(self):
            pass
    """
    expected = """
    class A:
        def m(self):
            pass

        def n(self):
            pass
    """
    # If 'A.m' is specified, keep whole class A
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A.m"})
    result = codeflash_output  # 61.1μs -> 61.0μs (0.098% faster)


def test_keep_class_variable_used_in_method():
    # Class variable 'y' is used by 'A.m', so both should be kept
    code = """
    class A:
        y = 123

        def m(self):
            return self.y

        def n(self):
            pass

    class B:
        def foo(self):
            pass
    """
    expected = """
    class A:
        y = 123

        def m(self):
            return self.y

        def n(self):
            pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A.m"})
    result = codeflash_output  # 63.1μs -> 62.1μs (1.63% faster)


def test_keep_imports():
    # Imports should never be removed
    code = """
    import os
    from sys import path

    def foo():
        pass

    def bar():
        pass
    """
    expected = """
    import os
    from sys import path

    def foo():
        pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 57.2μs -> 55.8μs (2.49% faster)


def test_keep_multiple_functions():
    # Both foo and bar should be kept
    code = """
    def foo():
        pass

    def bar():
        pass

    def baz():
        pass
    """
    expected = """
    def foo():
        pass

    def bar():
        pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo", "bar"})
    result = codeflash_output  # 55.1μs -> 54.2μs (1.65% faster)


def test_keep_class_and_all_methods_when_class_is_specified():
    # Specifying class keeps all its methods and variables
    code = """
    class A:
        x = 1
        def foo(self): pass
        def bar(self): pass

    class B:
        def baz(self): pass
    """
    expected = """
    class A:
        x = 1
        def foo(self): pass
        def bar(self): pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A"})
    result = codeflash_output  # 58.9μs -> 58.4μs (0.926% faster)


def test_keep_variable_used_by_function():
    # Only variable 'x' (used by foo) should be kept, not 'y'
    code = """
    x = 10
    y = 20

    def foo():
        return x

    def bar():
        return y
    """
    expected = """
    x = 10

    def foo():
        return x
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 54.3μs -> 54.9μs (0.986% slower)


def test_keep_tuple_unpacking_assignment():
    # Only 'a' is used by foo, so only 'a' assignment should be kept
    code = """
    a, b = 1, 2

    def foo():
        return a

    def bar():
        return b
    """
    expected = """
    a, b = 1, 2

    def foo():
        return a
    """
    # Both a and b are assigned together, so assignment must be kept if any used
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 55.0μs -> 54.7μs (0.530% faster)


# ------------------- EDGE TEST CASES -------------------


def test_empty_code():
    code = ""
    expected = ""
    codeflash_output = remove_unused_definitions_by_function_names(code, set())
    result = codeflash_output  # 1.30ms -> 33.8μs (3748% faster)


def test_no_matching_functions():
    code = """
    def foo():
        pass
    def bar():
        pass
    """
    expected = ""
    codeflash_output = remove_unused_definitions_by_function_names(code, {"baz"})
    result = codeflash_output  # 57.3μs -> 53.0μs (8.13% faster)


def test_syntax_error_returns_original():
    # Should return original code if code is invalid
    code = "def foo("
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 49.3μs -> 47.7μs (3.25% faster)


def test_class_with_no_methods():
    code = """
    class A:
        pass

    class B:
        def foo(self): pass
    """
    expected = """
    class B:
        def foo(self): pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"B.foo"})
    result = codeflash_output  # 55.3μs -> 53.3μs (3.80% faster)


def test_function_with_decorator():
    code = """
    def helper(): pass

    @staticmethod
    def foo():
        helper()
    """
    expected = """
    def helper(): pass

    @staticmethod
    def foo():
        helper()
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 55.4μs -> 54.1μs (2.45% faster)


def test_variable_in_augmented_assignment():
    code = """
    x = 1
    def foo():
        global x
        x += 1
    def bar():
        pass
    """
    expected = """
    x = 1
    def foo():
        global x
        x += 1
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 54.4μs -> 55.0μs (1.11% slower)


def test_variable_in_annotated_assignment():
    code = """
    y: int = 2
    def foo():
        return y
    def bar():
        pass
    """
    expected = """
    y: int = 2
    def foo():
        return y
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 55.9μs -> 53.5μs (4.42% faster)


def test_assignment_in_if_block():
    code = """
    if True:
        x = 1
    else:
        x = 2

    def foo():
        return x

    def bar():
        pass
    """
    expected = """
    if True:
        x = 1
    else:
        x = 2

    def foo():
        return x
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 58.0μs -> 57.2μs (1.35% faster)


def test_nested_functions():
    code = """
    def outer():
        def inner():
            pass
        return inner

    def foo():
        pass
    """
    expected = """
    def foo():
        pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 55.7μs -> 55.0μs (1.17% faster)


def test_class_method_with_same_name_as_function():
    code = """
    def foo():
        pass

    class A:
        def foo(self):
            pass
        def bar(self):
            pass
    """
    expected = """
    def foo():
        pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 56.3μs -> 55.4μs (1.63% faster)


def test_keep_class_method_only():
    code = """
    class A:
        def foo(self):
            pass
        def bar(self):
            pass

    class B:
        def foo(self):
            pass
    """
    expected = """
    class A:
        def foo(self):
            pass
        def bar(self):
            pass
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A.foo"})
    result = codeflash_output  # 59.1μs -> 58.1μs (1.65% faster)


def test_assignment_with_starred_unpacking():
    code = """
    a, *b = [1, 2, 3]
    def foo():
        return b
    """
    expected = """
    a, *b = [1, 2, 3]
    def foo():
        return b
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 53.4μs -> 52.6μs (1.56% faster)


def test_assignment_with_tuple_in_class():
    code = """
    class A:
        a, b = (1, 2)
        def foo(self): return a
        def bar(self): return b
    """
    expected = """
    class A:
        a, b = (1, 2)
        def foo(self): return a
        def bar(self): return b
    """
    codeflash_output = remove_unused_definitions_by_function_names(code, {"A.foo", "A.bar"})
    result = codeflash_output  # 56.0μs -> 56.1μs (0.109% slower)


# ------------------- LARGE SCALE TEST CASES -------------------


def test_many_functions_and_one_kept():
    # 100 functions, only one is kept
    code = "\n".join([f"def func{i}():\n    return {i}" for i in range(100)])
    expected = "def func42():\n    return 42"
    codeflash_output = remove_unused_definitions_by_function_names(code, {"func42"})
    result = codeflash_output  # 54.6ms -> 51.1ms (6.91% faster)


def test_many_classes_and_methods():
    # 10 classes, each with 10 methods, keep only one method
    code = "\n".join(
        [
            f"class C{i}:\n" + "\n".join([f"    def m{j}(self): return {i * 10 + j}" for j in range(10)])
            for i in range(10)
        ]
    )
    expected = (
        "class C3:\n"
        "    def m7(self): return 37\n"
        "    def m0(self): return 30\n"
        "    def m1(self): return 31\n"
        "    def m2(self): return 32\n"
        "    def m3(self): return 33\n"
        "    def m4(self): return 34\n"
        "    def m5(self): return 35\n"
        "    def m6(self): return 36\n"
        "    def m8(self): return 38\n"
        "    def m9(self): return 39"
    )
    # If you specify a method, the whole class is kept (with all methods)
    codeflash_output = remove_unused_definitions_by_function_names(code, {"C3.m7"})
    result = codeflash_output  # 60.3ms -> 58.5ms (3.17% faster)


def test_large_number_of_variables():
    # 100 variables, only one used
    code = (
        "\n".join([f"x{i} = {i}" for i in range(100)])
        + "\n"
        + """
    def foo():
        return x42
    """
    )
    expected = "x42 = 42\n\ndef foo():\n    return x42"
    codeflash_output = remove_unused_definitions_by_function_names(code, {"foo"})
    result = codeflash_output  # 1.47ms -> 1.41ms (4.35% faster)
    # All other variables should be removed
    for i in range(100):
        if i == 42:
            continue


def test_large_mixed_code():
    # 20 functions, 20 classes, 20 variables, keep half of each
    funcs = [f"def f{i}(): return v{i}" for i in range(20)]
    vars_ = [f"v{i} = {i}" for i in range(20)]
    classes = [f"class C{i}:\n    def m(self): return v{i}" for i in range(20)]
    code = "\n".join(vars_ + funcs + classes)
    keep_funcs = {f"f{i}" for i in range(0, 20, 2)}
    keep_classes = {f"C{i}.m" for i in range(1, 20, 2)}
    keep_names = keep_funcs | keep_classes
    codeflash_output = remove_unused_definitions_by_function_names(code, keep_names)
    result = codeflash_output  # 32.4ms -> 31.4ms (3.43% faster)
    # Check that only the right functions and classes are present
    for i in range(20):
        if i % 2 == 0:
            pass
        else:
            pass
        # Unkept functions/classes should not be present
        if i % 2 == 1:
            pass
        if i % 2 == 0:
            pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr935-2025-11-21T18.54.15 and push.

Codeflash Static Badge

The optimized code achieves a **15% speedup** through three key optimizations that target expensive operations identified in the profiler:

**1. Early Exit for Empty Qualified Functions (Major Impact)**
- Added a guard clause in `collect_top_level_defs_with_usages()` that returns early if `qualified_function_names` is empty
- This completely skips the expensive CST visitor pattern (`wrapper.visit(dependency_collector)`) which accounts for 82.6% of the function's runtime
- Test results show dramatic speedups (706-3748% faster) for cases with empty function sets, indicating this optimization has substantial impact when no specific functions need to be preserved

**2. Loop Optimization with Local Variable Caching**
- Cached `new_children.append` as `append_child` and `remove_unused_definitions_recursively` as `rr` to eliminate repeated attribute lookups in the hot loop
- The profiler shows this loop executing 2,777 times and consuming 9.5% of total runtime through recursive calls
- Attribute lookups in Python are relatively expensive, so caching these references provides measurable improvement in tight loops

**3. Tuple Unpacking Elimination**
- Replaced tuple unpacking `modified_module, _ = remove_unused_definitions_recursively(...)` with direct indexing to avoid creating temporary tuples
- While a micro-optimization, it reduces object allocation overhead in the main execution path

**Impact Based on Function Usage:**
The function references show this is called from `extract_code_string_context_from_files()` and `extract_code_markdown_context_from_files()`, both of which process multiple files and function sets during code context extraction. The early exit optimization is particularly valuable here since many files may have empty qualified function sets, allowing the system to skip expensive dependency analysis entirely.

The optimizations are most effective for:
- Large codebases where many files don't contain target functions (early exit benefit)
- Complex AST structures with deep nesting (loop optimization benefit) 
- Batch processing scenarios where the function is called repeatedly (cumulative micro-optimization benefits)
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 21, 2025

append_child = new_children.append # Local for speed
# Minimize attribute lookup in loop
rr = remove_unused_definitions_recursively
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad optimization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not even an attribute

# Apply the recursive removal transformation
modified_module, _ = remove_unused_definitions_recursively(module, defs_with_usages)
result = remove_unused_definitions_recursively(module, defs_with_usages)
modified_module = result[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary

@codeflash-ai codeflash-ai bot closed this Nov 21, 2025
Base automatically changed from fix/ctx-global-definitions-deps to main November 21, 2025 20:47
@codeflash-ai
Copy link
Contributor Author

codeflash-ai bot commented Nov 21, 2025

This PR has been automatically closed because the original PR #935 by mohammedahmed18 was closed.

@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr935-2025-11-21T18.54.15 branch November 21, 2025 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants