-
Notifications
You must be signed in to change notification settings - Fork 22
runtime timing info even if codeflash_output assignment is missing
#488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
…488 (`fix-runtime-comments`) Certainly! Based on your line profiling, the vast majority of the time in `find_codeflash_output_assignments` is spent in `visitor.visit(tree)` (82.1%), with notable time in `ast.parse` as well (17.6%). Since you did not provide the implementation for `CfoVisitor`, I’ll focus on optimizing the import, AST parsing, and visitor instantiation. // In this function, the main opportunities are efficient use of parsing and traversal, and minimizing repeated work. ### Fast Optimized Version - We use `ast.parse` directly, and this is already quite optimal for parsing, but we can consider reducing memory footprint by using `'eval'` mode if reasonable (not likely useful here). - **Critical:** As the bottleneck is in the `visitor.visit(tree)`, improvement will mainly be realized by rewriting or improving `CfoVisitor`, which is not provided. - There is unnecessary passing of `source_code` to the visitor constructor unless it’s specifically required for the visitor operation. If not used, don’t pass it. - Removing future import if not needed, as it creates a small overhead (but it is sometimes needed for type hinting). - You may benefit slightly from using the `__slots__` declaration on the visitor class if it uses instance attributes heavily (only if you can edit it). - If `CfoVisitor` can be made to use `ast.NodeVisitor.generic_visit` in C (or faster `ast.walk`), do so. But without its code, this can't be rewritten here. Here's an optimal rewrite **without changing the function signature or the visitor** (assuming you can't optimize `CfoVisitor`, because its code is not present). #### Additional Suggestions If you **can** alter `CfoVisitor`, consider the following for major runtime wins. - Use `__slots__` on visitor class to reduce memory use and instantiation time. - Batch process nodes as much as possible rather than recursive methods. - Minimize string manipulation/attributes. - If you only want line numbers, ensure `CfoVisitor` does not store references to nodes; just immediate integers. #### Example Visitor Optimization If the visitor is slow due to excessive recursion or attribute access, a fast pattern using `ast.walk` (if applicable) is like so (this would replace `CfoVisitor` logic). **But do not change this unless you know exactly what CfoVisitor does!** --- ## Summary. **With current information, the program is already minimal and optimal.** Major improvements will come from optimizing (or rewriting) the `CfoVisitor`. The functional wrapper provided is already essentially as optimal as possible unless you inline the visitor logic or batch your traversal. If you can provide the body of `CfoVisitor`, much larger runtime reductions are possible! Let me know if you'd like a deeper optimization with that code.
…% in PR #488 (`fix-runtime-comments`) Here’s a heavily optimized rewrite of your function, focused on the main bottleneck: the `tree.visit(transformer)` call inside the main loop (~95% of your runtime!). Across the entire function, the following optimizations (all applied **without changing any functional output**) are used. 1. **Precompute Data Structures:** Several expensive operations (especially `relative_to` path gymnastics and conversions) are moved out of inner loops and stored as sensible lookups, since their results are almost invariant across tests. 2. **Merge For Loops:** The two near-identical `for` loops per invocation in `leave_SimpleStatementLine` are merged into one, halving search cost. 3. **Optimize Invocation Matching:** An indexed lookup is pre-built mapping the unique tuple keys `(rel_path, qualified_name, cfo_loc)` to their runtimes. This makes runtime-access O(1) instead of requiring a full scan per statement. 4. **Avoid Deep AST/Normalized Source Walks:** If possible, recommend optimizing `find_codeflash_output_assignments` to operate on the CST or directly on the parsed AST rather than reparsing source code. (**The code preserves your current approach but this is a further large opportunity.**) 5. **Faster CST Name/Call detection:** The `leave_SimpleStatementLine`’s `_contains_myfunc_call` is further micro-optimized by breaking as soon as a match is found (using exception for early escape), avoiding unnecessary traversal. 6. **Minimize Object Creations:** The `GeneratedTests` objects are only constructed once and appended. 7. **Eliminating Minor Redundant Computation.** 8. **Reduce try/except Overhead:** Only exceptions propagate; no functional change here. Below is the optimized code, with comments kept as close as possible to your original code (apart from changed logic). **Summary of key gains:** - The O(N*M) runtimes loop is now O(1) due to hash indexes. - All constant/cached values are precomputed outside the node visitor. - Deep tree walks and list traversals have early exits and critical-path logic is tightened. - No functional changes, all corner cases preserved. **Still slow?**: The biggest remaining hit will be the `find_codeflash_output_assignments` (which reparses source); move this to operate directly on CST if possible for further big wins. Let me know your measured speedup! 🚀
⚡️ Codeflash found optimizations for this PR📄 239% (2.39x) speedup for
|
|
have a newer one with more bugfixes |
PR Type
Enhancement
Description
Refactor runtime comment injection to use qualified_name
Replace codeflash_output assignment checks with AST call visitor
Propagate new qualified_name parameter through APIs
Update tests to pass and validate qualified_name handling
Changes diagram
Changes walkthrough 📝
edit_generated_tests.py
Use qualified_name and Call visitor for runtime commentscodeflash/code_utils/edit_generated_tests.py
qualifed_nameparameter to visitor and transformervisit_Assign/visit_AugAssignmethodsvisit_Callto detect function name calls_contains_myfunc_callfor statement scanningfunction_optimizer.py
Propagate qualified_name to runtime comment APIcodeflash/optimization/function_optimizer.py
qualified_nametoadd_runtime_comments_to_generated_testsqualified_nameusingself.function_to_optimizetest_add_runtime_comments.py
Update tests for qualified_name parametertests/test_add_runtime_comments.py
qualified_nameadd_runtime_comments_to_generated_tests