Skip to content

Conversation

@mohammedahmed18
Copy link
Contributor

@mohammedahmed18 mohammedahmed18 commented Nov 21, 2025

this resolves two main issues:

1- the global definitions are not extracted inside the read-writable code context resulting frequently into mock variables when generating the tests which led to a failed optimization

2- when deciding for each function dependencies we conflict the class attributes with module-level variables if they have the same name, for example

x = "some value"
class MY_CLASS
    def __init__(self):
        self.x = 1

it will say that MY_CLASS depends on the global definition x which is not correct

@github-actions
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In DependencyCollector.visit_Name, class attribute references are skipped only when the parent is exactly a cst.Attribute; this may miss nested attribute/qualified cases or Name nodes not directly under Attribute, potentially under-filtering or over-filtering dependencies. Validate behavior on patterns like self.attr.attr, attributes in calls, and annotated assignments.

def visit_Name(self, node: cst.Name) -> None:
    name = node.value

    # Skip if we're not inside a tracked definition
    if not self.current_top_level_name or self.current_top_level_name not in self.definitions:
        return

    # Skip if we're looking at the variable name itself in an assignment
    if self.processing_variable and name in self.current_variable_names:
        return

    if name in self.definitions and name != self.current_top_level_name:
        # skip if we are refrencing a class attribute and not a top-level definition
        if self.class_depth > 0:
            parent = self.get_metadata(cst.metadata.ParentNodeProvider, node)
            if parent is not None and isinstance(parent, cst.Attribute):
                return
        self.definitions[self.current_top_level_name].dependencies.add(name)
API Change

prune_cst_for_read_writable_code signature changed to require defs_with_usages; ensure all internal callers are updated and that recursive calls always pass this dict to avoid None or mismatched behavior in nested structures.

def prune_cst_for_read_writable_code(  # noqa: PLR0911
    node: cst.CSTNode, target_functions: set[str], defs_with_usages: dict[str, UsageInfo], prefix: str = ""
) -> tuple[cst.CSTNode | None, bool]:
    """Recursively filter the node and its children to build the read-writable codeblock. This contains nodes that lead to target functions.

    Returns
    -------
        (filtered_node, found_target):
          filtered_node: The modified CST node or None if it should be removed.
          found_target: True if a target function was found in this node's subtree.

    """
    if isinstance(node, (cst.Import, cst.ImportFrom)):
        return None, False

    if isinstance(node, cst.FunctionDef):
        qualified_name = f"{prefix}.{node.name.value}" if prefix else node.name.value
        if qualified_name in target_functions:
            return node, True
        return None, False

    if isinstance(node, cst.ClassDef):
        # Do not recurse into nested classes
        if prefix:
            return None, False
        # Assuming always an IndentedBlock
        if not isinstance(node.body, cst.IndentedBlock):
            raise ValueError("ClassDef body is not an IndentedBlock")  # noqa: TRY004
        class_prefix = f"{prefix}.{node.name.value}" if prefix else node.name.value
        new_body = []
        found_target = False

        for stmt in node.body.body:
            if isinstance(stmt, cst.FunctionDef):
                qualified_name = f"{class_prefix}.{stmt.name.value}"
                if qualified_name in target_functions:
                    new_body.append(stmt)
                    found_target = True
                elif stmt.name.value == "__init__":
                    new_body.append(stmt)  # enable __init__ optimizations
        # If no target functions found, remove the class entirely
        if not new_body or not found_target:
            return None, False

        return node.with_changes(body=cst.IndentedBlock(body=new_body)), found_target

    if isinstance(node, cst.Assign):
        for target in node.targets:
            names = extract_names_from_targets(target.target)
            for name in names:
                if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function:
                    return node, True
        return None, False

    if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
        names = extract_names_from_targets(node.target)
        for name in names:
            if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function:
                return node, True
        return None, False

    # For other nodes, we preserve them only if they contain target functions in their children.
    section_names = get_section_names(node)
    if not section_names:
        return node, False

    updates: dict[str, list[cst.CSTNode] | cst.CSTNode] = {}
    found_any_target = False

    for section in section_names:
        original_content = getattr(node, section, None)
        if isinstance(original_content, (list, tuple)):
            new_children = []
            section_found_target = False
            for child in original_content:
                filtered, found_target = prune_cst_for_read_writable_code(
                    child, target_functions, defs_with_usages, prefix
                )
                if filtered:
                    new_children.append(filtered)
                section_found_target |= found_target

            if section_found_target:
                found_any_target = True
                updates[section] = new_children
        elif original_content is not None:
            filtered, found_target = prune_cst_for_read_writable_code(
                original_content, target_functions, defs_with_usages, prefix
            )
            if found_target:
Return Type Mismatch

remove_unused_definitions_by_function_names docstring and type hint suggest returning a tuple[str, dict], but several return paths still return only code (str) on parse failure; align all branches to return the declared tuple.

def remove_unused_definitions_by_function_names(
    code: str, qualified_function_names: set[str]
) -> tuple[str, dict[str, UsageInfo]]:
    """Analyze a file and remove top level definitions not used by specified functions.

    Top level definitions, in this context, are only classes, variables or functions.
    If a class is referenced by a qualified function, we keep the entire class.

    Args:
    ----
        code: The code to process
        qualified_function_names: Set of function names to keep. For methods, use format 'classname.methodname'

    """
    try:
        module = cst.parse_module(code)
    except Exception as e:
        logger.debug(f"Failed to parse code with libcst: {type(e).__name__}: {e}")
        return code

    try:
        defs_with_usages = collect_top_level_defs_with_usages(module, qualified_function_names)

        # Apply the recursive removal transformation
        modified_module, _ = remove_unused_definitions_recursively(module, defs_with_usages)

        return modified_module.code if modified_module else ""  # noqa: TRY300
    except Exception as e:

@github-actions
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix mismatched return type

The function signature now promises a tuple but only returns a string, which will
break callers expecting the usages map. Return both the modified code and
defs_with_usages consistently across all paths. Also, ensure the parse-failure path
returns the expected tuple.

codeflash/context/unused_definition_remover.py [491-518]

 def remove_unused_definitions_by_function_names(
     code: str, qualified_function_names: set[str]
 ) -> tuple[str, dict[str, UsageInfo]]:
     """Analyze a file and remove top level definitions not used by specified functions.
 
     Top level definitions, in this context, are only classes, variables or functions.
     If a class is referenced by a qualified function, we keep the entire class.
 
     This method assumes the class either has no inner classes or the nested classes
     are not referenced by the functions that need to be classified.
     """
     try:
         module = cst.parse_module(code)
     except Exception as e:
         logger.debug(f"Failed to parse code with libcst: {type(e).__name__}: {e}")
-        return code
+        return code, {}
 
     try:
         defs_with_usages = collect_top_level_defs_with_usages(module, qualified_function_names)
 
         # Apply the recursive removal transformation
         modified_module, _ = remove_unused_definitions_recursively(module, defs_with_usages)
 
-        return modified_module.code if modified_module else ""  # noqa: TRY300
+        return (modified_module.code if modified_module else ""), defs_with_usages  # noqa: TRY300
     except Exception as e:
         # If any other error occurs during processing, return the original code
+        return code, {}
Suggestion importance[1-10]: 9

__

Why: The function's signature was changed to return a tuple but the implementation still returns a string; fixing this prevents runtime errors and aligns with the new API by consistently returning (code, defs_with_usages).

High
Prevent unsafe assignment pruning

These branches drop any assignment whose target is not marked used, potentially
removing assignments that define names later used within preserved functions (e.g.,
assignments inside try/except or control flow). Guard the pruning to only operate at
module top-level, or check the parent context to avoid removing inner-scope
statements required for semantics.

codeflash/context/code_context_extractor.py [580-594]

 if isinstance(node, cst.Assign):
-    for target in node.targets:
-        names = extract_names_from_targets(target.target)
+    # Only prune top-level assignments based on defs_with_usages
+    parent = getattr(node, "parent", None)
+    if parent is None or isinstance(parent, cst.Module):
+        for target in node.targets:
+            names = extract_names_from_targets(target.target)
+            for name in names:
+                if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function:
+                    return node, True
+        return None, False
+    return node, False
+
+if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
+    parent = getattr(node, "parent", None)
+    if parent is None or isinstance(parent, cst.Module):
+        names = extract_names_from_targets(node.target)
         for name in names:
             if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function:
                 return node, True
-    return None, False
+        return None, False
+    return node, False
 
-if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
-    names = extract_names_from_targets(node.target)
-    for name in names:
-        if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function:
-            return node, True
-    return None, False
-
Suggestion importance[1-10]: 7

__

Why: As written, assignment pruning may erroneously drop non-top-level assignments; adding a top-level check mitigates potential semantic breakage, improving correctness though the parent access approach may require CST metadata to be robust.

Medium
General
Avoid unused computation

defs_with_usages is computed but not used for the READ_ONLY path, potentially
causing inconsistent pruning logic between modes. Pass defs_with_usages to the
READ_ONLY pruning routine or compute it conditionally only for READ_WRITABLE to
avoid unnecessary work and confusion.

codeflash/context/code_context_extractor.py [505-514]

 def parse_code_and_prune_cst(
     code: str,
     code_context_type: CodeContextType,
     target_functions: set[str],
     helpers_of_helper_functions: set[str] = set(),  # noqa: B006
     remove_docstrings: bool = False,  # noqa: FBT001, FBT002
 ) -> str:
     """Create a read-only version of the code by parsing and filtering the code to keep only class contextual information, and other module scoped variables."""
     module = cst.parse_module(code)
-    defs_with_usages = collect_top_level_defs_with_usages(module, target_functions | helpers_of_helper_functions)
 
     if code_context_type == CodeContextType.READ_WRITABLE:
+        defs_with_usages = collect_top_level_defs_with_usages(module, target_functions | helpers_of_helper_functions)
         filtered_node, found_target = prune_cst_for_read_writable_code(module, target_functions, defs_with_usages)
     elif code_context_type == CodeContextType.READ_ONLY:
         filtered_node, found_target = prune_cst_for_read_only_code(
             module, target_functions, helpers_of_helper_functions, remove_docstrings=remove_docstrings
         )
+    elif code_context_type == CodeContextType.HASHING:
+        filtered_node, found_target = prune_cst_for_code_hashing(module, target_functions)
+    else:
+        filtered_node, found_target = prune_cst_for_test_gen_code(
+            module, target_functions, helpers_of_helper_functions, remove_docstrings=remove_docstrings
+        )
 
+    if filtered_node and found_target:
+        code = str(filtered_node.code)
+        if code_context_type == CodeContextType.HASHING:
+            code = ast.unparse(ast.parse(code))  # Makes it standard
+        return code
+    return ""
+
Suggestion importance[1-10]: 6

__

Why: Computing defs_with_usages only for the READ_WRITABLE path reduces unnecessary work and clarifies intent; moderate impact since it's a small perf/clarity improvement and consistent with current usage.

Low


# In this scenario, the testgen code context is too long, so we abort.
with pytest.raises(ValueError, match="Testgen code context has exceeded token limit, cannot proceed"):
# In this scenario, the read-writable code context is too long because the __init_ function is reftencing the global x variable not the class attribute (x), so we abort.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Nov 21, 2025

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for remove_unused_definitions_by_function_names in codeflash/context/unused_definition_remover.py

⏱️ Runtime : 928 milliseconds 804 milliseconds (best of 12 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/ctx-global-definitions-deps).

Static Badge

@aseembits93
Copy link
Contributor

@mohammedahmed18 wait where did you get the codeflash optimization?

@mohammedahmed18
Copy link
Contributor Author

@aseembits93 from the extension by the git commit hook

@KRRT7
Copy link
Contributor

KRRT7 commented Nov 21, 2025

@mohammedahmed18 maybe we should add some identifier so we know when the optimizations are done via the VSC extension?

@aseembits93 aseembits93 merged commit c6ab05f into main Nov 21, 2025
22 checks passed
@aseembits93 aseembits93 deleted the fix/ctx-global-definitions-deps branch November 21, 2025 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants