[FIX] Extract the required module names inside the read-writable context (CF-862) #935

mohammedahmed18 · 2025-11-21T18:15:12Z

this resolves two main issues:

1- the global definitions are not extracted inside the read-writable code context resulting frequently into mock variables when generating the tests which led to a failed optimization

2- when deciding for each function dependencies we conflict the class attributes with module-level variables if they have the same name, for example

x = "some value"
class MY_CLASS
    def __init__(self):
        self.x = 1

it will say that MY_CLASS depends on the global definition x which is not correct

github-actions · 2025-11-21T18:16:00Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue In DependencyCollector.visit_Name, class attribute references are skipped only when the parent is exactly a cst.Attribute; this may miss nested attribute/qualified cases or Name nodes not directly under Attribute, potentially under-filtering or over-filtering dependencies. Validate behavior on patterns like self.attr.attr, attributes in calls, and annotated assignments. def visit_Name(self, node: cst.Name) -> None: name = node.value # Skip if we're not inside a tracked definition if not self.current_top_level_name or self.current_top_level_name not in self.definitions: return # Skip if we're looking at the variable name itself in an assignment if self.processing_variable and name in self.current_variable_names: return if name in self.definitions and name != self.current_top_level_name: # skip if we are refrencing a class attribute and not a top-level definition if self.class_depth > 0: parent = self.get_metadata(cst.metadata.ParentNodeProvider, node) if parent is not None and isinstance(parent, cst.Attribute): return self.definitions[self.current_top_level_name].dependencies.add(name) API Change prune_cst_for_read_writable_code signature changed to require defs_with_usages; ensure all internal callers are updated and that recursive calls always pass this dict to avoid None or mismatched behavior in nested structures. def prune_cst_for_read_writable_code( # noqa: PLR0911 node: cst.CSTNode, target_functions: set[str], defs_with_usages: dict[str, UsageInfo], prefix: str = "" ) -> tuple[cst.CSTNode \| None, bool]: """Recursively filter the node and its children to build the read-writable codeblock. This contains nodes that lead to target functions. Returns ------- (filtered_node, found_target): filtered_node: The modified CST node or None if it should be removed. found_target: True if a target function was found in this node's subtree. """ if isinstance(node, (cst.Import, cst.ImportFrom)): return None, False if isinstance(node, cst.FunctionDef): qualified_name = f"{prefix}.{node.name.value}" if prefix else node.name.value if qualified_name in target_functions: return node, True return None, False if isinstance(node, cst.ClassDef): # Do not recurse into nested classes if prefix: return None, False # Assuming always an IndentedBlock if not isinstance(node.body, cst.IndentedBlock): raise ValueError("ClassDef body is not an IndentedBlock") # noqa: TRY004 class_prefix = f"{prefix}.{node.name.value}" if prefix else node.name.value new_body = [] found_target = False for stmt in node.body.body: if isinstance(stmt, cst.FunctionDef): qualified_name = f"{class_prefix}.{stmt.name.value}" if qualified_name in target_functions: new_body.append(stmt) found_target = True elif stmt.name.value == "__init__": new_body.append(stmt) # enable __init__ optimizations # If no target functions found, remove the class entirely if not new_body or not found_target: return None, False return node.with_changes(body=cst.IndentedBlock(body=new_body)), found_target if isinstance(node, cst.Assign): for target in node.targets: names = extract_names_from_targets(target.target) for name in names: if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function: return node, True return None, False if isinstance(node, (cst.AnnAssign, cst.AugAssign)): names = extract_names_from_targets(node.target) for name in names: if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function: return node, True return None, False # For other nodes, we preserve them only if they contain target functions in their children. section_names = get_section_names(node) if not section_names: return node, False updates: dict[str, list[cst.CSTNode] \| cst.CSTNode] = {} found_any_target = False for section in section_names: original_content = getattr(node, section, None) if isinstance(original_content, (list, tuple)): new_children = [] section_found_target = False for child in original_content: filtered, found_target = prune_cst_for_read_writable_code( child, target_functions, defs_with_usages, prefix ) if filtered: new_children.append(filtered) section_found_target \|= found_target if section_found_target: found_any_target = True updates[section] = new_children elif original_content is not None: filtered, found_target = prune_cst_for_read_writable_code( original_content, target_functions, defs_with_usages, prefix ) if found_target: Return Type Mismatch remove_unused_definitions_by_function_names docstring and type hint suggest returning a tuple[str, dict], but several return paths still return only code (str) on parse failure; align all branches to return the declared tuple. def remove_unused_definitions_by_function_names( code: str, qualified_function_names: set[str] ) -> tuple[str, dict[str, UsageInfo]]: """Analyze a file and remove top level definitions not used by specified functions. Top level definitions, in this context, are only classes, variables or functions. If a class is referenced by a qualified function, we keep the entire class. Args: ---- code: The code to process qualified_function_names: Set of function names to keep. For methods, use format 'classname.methodname' """ try: module = cst.parse_module(code) except Exception as e: logger.debug(f"Failed to parse code with libcst: {type(e).__name__}: {e}") return code try: defs_with_usages = collect_top_level_defs_with_usages(module, qualified_function_names) # Apply the recursive removal transformation modified_module, _ = remove_unused_definitions_recursively(module, defs_with_usages) return modified_module.code if modified_module else "" # noqa: TRY300 except Exception as e:

github-actions · 2025-11-21T18:16:26Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix mismatched return type The function signature now promises a tuple but only returns a string, which will break callers expecting the usages map. Return both the modified code and `defs_with_usages` consistently across all paths. Also, ensure the parse-failure path returns the expected tuple. codeflash/context/unused_definition_remover.py [491-518] def remove_unused_definitions_by_function_names( code: str, qualified_function_names: set[str] ) -> tuple[str, dict[str, UsageInfo]]: """Analyze a file and remove top level definitions not used by specified functions. Top level definitions, in this context, are only classes, variables or functions. If a class is referenced by a qualified function, we keep the entire class. This method assumes the class either has no inner classes or the nested classes are not referenced by the functions that need to be classified. """ try: module = cst.parse_module(code) except Exception as e: logger.debug(f"Failed to parse code with libcst: {type(e).__name__}: {e}") - return code + return code, {} try: defs_with_usages = collect_top_level_defs_with_usages(module, qualified_function_names) # Apply the recursive removal transformation modified_module, _ = remove_unused_definitions_recursively(module, defs_with_usages) - return modified_module.code if modified_module else "" # noqa: TRY300 + return (modified_module.code if modified_module else ""), defs_with_usages # noqa: TRY300 except Exception as e: # If any other error occurs during processing, return the original code + return code, {} Suggestion importance[1-10]: 9 __ Why: The function's signature was changed to return a tuple but the implementation still returns a string; fixing this prevents runtime errors and aligns with the new API by consistently returning `(code, defs_with_usages)`.	High
Possible issue	Prevent unsafe assignment pruning These branches drop any assignment whose target is not marked used, potentially removing assignments that define names later used within preserved functions (e.g., assignments inside `try/except` or control flow). Guard the pruning to only operate at module top-level, or check the parent context to avoid removing inner-scope statements required for semantics. codeflash/context/code_context_extractor.py [580-594] if isinstance(node, cst.Assign): - for target in node.targets: - names = extract_names_from_targets(target.target) + # Only prune top-level assignments based on defs_with_usages + parent = getattr(node, "parent", None) + if parent is None or isinstance(parent, cst.Module): + for target in node.targets: + names = extract_names_from_targets(target.target) + for name in names: + if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function: + return node, True + return None, False + return node, False + +if isinstance(node, (cst.AnnAssign, cst.AugAssign)): + parent = getattr(node, "parent", None) + if parent is None or isinstance(parent, cst.Module): + names = extract_names_from_targets(node.target) for name in names: if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function: return node, True - return None, False + return None, False + return node, False -if isinstance(node, (cst.AnnAssign, cst.AugAssign)): - names = extract_names_from_targets(node.target) - for name in names: - if name in defs_with_usages and defs_with_usages[name].used_by_qualified_function: - return node, True - return None, False - Suggestion importance[1-10]: 7 __ Why: As written, assignment pruning may erroneously drop non-top-level assignments; adding a top-level check mitigates potential semantic breakage, improving correctness though the parent access approach may require CST metadata to be robust.	Medium
General	Avoid unused computation `defs_with_usages` is computed but not used for the READ_ONLY path, potentially causing inconsistent pruning logic between modes. Pass `defs_with_usages` to the READ_ONLY pruning routine or compute it conditionally only for READ_WRITABLE to avoid unnecessary work and confusion. codeflash/context/code_context_extractor.py [505-514] def parse_code_and_prune_cst( code: str, code_context_type: CodeContextType, target_functions: set[str], helpers_of_helper_functions: set[str] = set(), # noqa: B006 remove_docstrings: bool = False, # noqa: FBT001, FBT002 ) -> str: """Create a read-only version of the code by parsing and filtering the code to keep only class contextual information, and other module scoped variables.""" module = cst.parse_module(code) - defs_with_usages = collect_top_level_defs_with_usages(module, target_functions \| helpers_of_helper_functions) if code_context_type == CodeContextType.READ_WRITABLE: + defs_with_usages = collect_top_level_defs_with_usages(module, target_functions \| helpers_of_helper_functions) filtered_node, found_target = prune_cst_for_read_writable_code(module, target_functions, defs_with_usages) elif code_context_type == CodeContextType.READ_ONLY: filtered_node, found_target = prune_cst_for_read_only_code( module, target_functions, helpers_of_helper_functions, remove_docstrings=remove_docstrings ) + elif code_context_type == CodeContextType.HASHING: + filtered_node, found_target = prune_cst_for_code_hashing(module, target_functions) + else: + filtered_node, found_target = prune_cst_for_test_gen_code( + module, target_functions, helpers_of_helper_functions, remove_docstrings=remove_docstrings + ) + if filtered_node and found_target: + code = str(filtered_node.code) + if code_context_type == CodeContextType.HASHING: + code = ast.unparse(ast.parse(code)) # Makes it standard + return code + return "" + Suggestion importance[1-10]: 6 __ Why: Computing `defs_with_usages` only for the READ_WRITABLE path reduces unnecessary work and clarifies intent; moderate impact since it's a small perf/clarity improvement and consistent with current usage.	Low

…-global-definitions-deps

…h-ai/codeflash into fix/ctx-global-definitions-deps

misrasaurabh1 · 2025-11-21T18:51:52Z

tests/test_code_context_extractor.py


-    # In this scenario, the testgen code context is too long, so we abort.
-    with pytest.raises(ValueError, match="Testgen code context has exceeded token limit, cannot proceed"):
+    # In this scenario, the read-writable code context is too long because the __init_ function is reftencing the global x variable not the class attribute (x), so we abort.


codeflash-ai · 2025-11-21T18:54:24Z

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for `remove_unused_definitions_by_function_names` in `codeflash/context/unused_definition_remover.py`

⏱️ Runtime : 928 milliseconds → 804 milliseconds (best of 12 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function remove_unused_definitions_by_function_names by 15% in PR #935 (fix/ctx-global-definitions-deps) #936

If you approve, it will be merged into this PR (branch fix/ctx-global-definitions-deps).

aseembits93 · 2025-11-21T19:12:17Z

@mohammedahmed18 wait where did you get the codeflash optimization?

mohammedahmed18 · 2025-11-21T19:14:08Z

@aseembits93 from the extension by the git commit hook

KRRT7 · 2025-11-21T20:12:30Z

@mohammedahmed18 maybe we should add some identifier so we know when the optimizations are done via the VSC extension?

keep the refrenced global definitions

15d2027

github-actions bot added the Review effort 3/5 label Nov 21, 2025

mohammedahmed18 requested review from KRRT7, aseembits93 and misrasaurabh1 November 21, 2025 18:19

KRRT7 and others added 4 commits November 21, 2025 13:20

Merge branch 'main' into fix/ctx-global-definitions-deps

0020bf8

fix typing issue

e210a31

Merge branch 'main' of github.com:codeflash-ai/codeflash into fix/ctx…

29b1eb5

…-global-definitions-deps

Merge branch 'fix/ctx-global-definitions-deps' of github.com:codeflas…

2c52be9

…h-ai/codeflash into fix/ctx-global-definitions-deps

misrasaurabh1 reviewed Nov 21, 2025

View reviewed changes

typo

b464c4d

codeflash-ai bot mentioned this pull request Nov 21, 2025

⚡️ Speed up function remove_unused_definitions_by_function_names by 15% in PR #935 (fix/ctx-global-definitions-deps) #936

Closed

codeflash optimization

73e43e2

Merge branch 'main' into fix/ctx-global-definitions-deps

da590d9

KRRT7 approved these changes Nov 21, 2025

View reviewed changes

aseembits93 merged commit c6ab05f into main Nov 21, 2025
22 checks passed

aseembits93 deleted the fix/ctx-global-definitions-deps branch November 21, 2025 20:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FIX] Extract the required module names inside the read-writable context (CF-862) #935

[FIX] Extract the required module names inside the read-writable context (CF-862) #935

mohammedahmed18 commented Nov 21, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

misrasaurabh1 Nov 21, 2025

Uh oh!

codeflash-ai bot commented Nov 21, 2025

⚡️ Speed up function `remove_unused_definitions_by_function_names` by 15% in PR #935 (`fix/ctx-global-definitions-deps`) #936

Uh oh!

aseembits93 commented Nov 21, 2025

Uh oh!

mohammedahmed18 commented Nov 21, 2025

Uh oh!

KRRT7 commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[FIX] Extract the required module names inside the read-writable context (CF-862) #935

[FIX] Extract the required module names inside the read-writable context (CF-862) #935

Conversation

mohammedahmed18 commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 21, 2025

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Nov 21, 2025

PR Code Suggestions ✨

Uh oh!

misrasaurabh1 Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

codeflash-ai bot commented Nov 21, 2025

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for remove_unused_definitions_by_function_names in codeflash/context/unused_definition_remover.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function remove_unused_definitions_by_function_names by 15% in PR #935 (fix/ctx-global-definitions-deps) #936

Uh oh!

aseembits93 commented Nov 21, 2025

Uh oh!

mohammedahmed18 commented Nov 21, 2025

Uh oh!

KRRT7 commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mohammedahmed18 commented Nov 21, 2025 •

edited

Loading

📄 15% (0.15x) speedup for `remove_unused_definitions_by_function_names` in `codeflash/context/unused_definition_remover.py`

⚡️ Speed up function `remove_unused_definitions_by_function_names` by 15% in PR #935 (`fix/ctx-global-definitions-deps`) #936