
⚡️ Speed up function call_graph_summary by 151% in PR #1460 (call-graphee)#1462

Merged
KRRT7 merged 2 commits into call-graphee from codeflash/optimize-pr1460-2026-02-12T06.40.48 on Feb 12, 2026

Conversation


@codeflash-ai codeflash-ai bot commented Feb 12, 2026

⚡️ This pull request contains optimizations for PR #1460

If you approve this dependent PR, these changes will be merged into the original PR branch call-graphee.

This PR will be automatically closed if the original PR is merged.


📄 151% (1.51x) speedup for call_graph_summary in codeflash/cli_cmds/console.py

⏱️ Runtime: 7.86 microseconds → 3.12 microseconds (best of 32 runs)

📝 Explanation and details

The optimized code achieves a 151% speedup (from 7.86 to 3.12 microseconds) primarily through three key optimizations:

1. Module-Level Import Hoisting

Moving from rich.panel import Panel from inside call_graph_summary() to the top-level module imports eliminates repeated import overhead on every function call. The line profiler shows this import took ~30,000 ns in the original (0.5% of total time). While seemingly small, this overhead is completely eliminated in the optimized version.
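The effect is easy to demonstrate with a stand-in module (`pathlib` here, since `rich` may not be installed): a function-level `from … import …` re-executes the import statement on every call (a `sys.modules` lookup plus a local rebind), while a hoisted import pays that cost once at module load.

```python
import timeit

def lookup_inside():
    # re-executes the import statement on every call (sys.modules hit + bind)
    from pathlib import Path  # stand-in for `from rich.panel import Panel`
    return Path

from pathlib import Path  # hoisted: executed once when the module loads

def lookup_hoisted():
    return Path  # plain global lookup, no import machinery

inner_time = timeit.timeit(lookup_inside, number=100_000)
hoisted_time = timeit.timeit(lookup_hoisted, number=100_000)
print(f"inner={inner_time:.4f}s hoisted={hoisted_time:.4f}s")
```

Both functions return the very same class object; only the per-call overhead differs.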

2. C-Level Aggregation with Built-in sum()

The optimization replaces Python-level accumulation loops with native sum() calls that execute at C speed:

Original approach (manual accumulation):

```python
total_callees = 0
with_context = 0
for count in callee_counts.values():
    total_callees += count
    if count > 0:
        with_context += 1
```

This loop incurred ~828,000 ns across 2,005 iterations (234,973 + 301,448 + 292,402 ns).

Optimized approach (C-level sum):

```python
total_callees = sum(callee_counts.values())
with_context = sum(1 for count in callee_counts.values() if count > 0)
```

The new approach completes in ~405,000 ns total (16,399 + 389,145 ns), nearly 2x faster for the aggregation logic alone.
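The two aggregation styles are interchangeable, which is easy to confirm on a small mapping (the sample values below are made up for illustration):

```python
callee_counts = {"pkg.f1": 3, "pkg.f2": 0, "pkg.f3": 1}

# original: manual Python-level accumulation
total_callees = 0
with_context = 0
for count in callee_counts.values():
    total_callees += count
    if count > 0:
        with_context += 1

# optimized: C-level sum() over the same values
fast_total = sum(callee_counts.values())
fast_with_context = sum(1 for count in callee_counts.values() if count > 0)

assert (fast_total, fast_with_context) == (total_callees, with_context) == (4, 2)
```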

3. Leveraging map() for Initial Summation

Using sum(map(len, file_to_funcs.values())) instead of a generator expression provides a minor efficiency gain by pushing the iteration into C-level code, though the improvement here is marginal (34,533 ns → 24,396 ns).
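Both spellings count the same functions; only the iteration mechanics differ (toy data for illustration):

```python
file_to_funcs = {"a.py": ["f1", "f2"], "b.py": ["g1"]}

total_via_genexpr = sum(len(funcs) for funcs in file_to_funcs.values())  # original
total_via_map = sum(map(len, file_to_funcs.values()))  # optimized: len applied in C

assert total_via_map == total_via_genexpr == 3
```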

Performance Characteristics

Based on the annotated tests, these optimizations excel when:

  • Large-scale scenarios: The test_large_scale_many_functions_single_file (1000 functions) and test_large_scale_multiple_files_distribution (1000 functions across 10 files) benefit most from reduced per-iteration overhead
  • Frequent invocations: If call_graph_summary() is called multiple times in a session, the eliminated import overhead compounds savings
  • Non-empty function sets: The optimization's impact is proportional to the number of callees being aggregated

The changes preserve all behavior (same summary text, same Panel display, same LSP handling) while delivering substantial runtime improvements through strategic use of Python's built-in functions, which leverage optimized C implementations.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 7 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 88.9% |

🌀 Generated Regression Tests:
from pathlib import Path
from types import SimpleNamespace

from codeflash.cli_cmds.console import call_graph_summary

# Helper to make a lightweight fake "call graph" object that provides the required method.
# We use SimpleNamespace instead of defining a class to comply with test rules (no custom domain classes).
# The object must expose count_callees_per_function(mapping: dict[Path, set[str]]) -> dict[str,int]
def make_call_graph_from_mapping(return_mapping: dict[str, int]):
    # Create a function that accepts the mapping (file_path -> set of qualified names)
    # and returns the predetermined mapping. The implementation is intentionally simple
    # and deterministic for testing.
    def count_callees_per_function(file_path_to_qualified_names):
        # Validate the input shape a bit to catch some misuse in tests
        if not isinstance(file_path_to_qualified_names, dict):
            raise TypeError("expected a dict mapping file paths to sets of names")
        # Ensure sets of strings are passed in (as the real code would do)
        for val in file_path_to_qualified_names.values():
            if not isinstance(val, set):
                raise TypeError("expected sets of qualified names as values")
        return dict(return_mapping)  # return a shallow copy to avoid mutation surprises

    return SimpleNamespace(count_callees_per_function=count_callees_per_function)

def test_no_functions_produces_no_output(capsys):
    # If there are no functions across all files, call_graph_summary should return early and print nothing.
    call_graph = make_call_graph_from_mapping({})  # mapping won't be used because no functions exist
    file_to_funcs = {}  # empty mapping -> total_functions == 0
    # Should not raise and should print nothing
    call_graph_summary(call_graph, file_to_funcs)
    captured = capsys.readouterr()
    assert captured.out == ""  # early return: nothing printed

def test_single_function_with_no_callees_prints_summary(capsys):
    # Test a single function with zero callees; validate summary text and numeric formatting.
    fn = SimpleNamespace(qualified_name="module.single")
    file_to_funcs = {Path("a.py"): [fn]}
    call_graph = make_call_graph_from_mapping({"module.single": 0})
    # Run the function; because LSP is disabled the console Panel will be printed to stdout
    call_graph_summary(call_graph, file_to_funcs)
    out = capsys.readouterr().out
    assert out  # Panel rendered to stdout since LSP is disabled

def test_multiple_functions_with_mixed_callees(capsys):
    # Two functions: one calls another (callee count 1), one self-contained (0)
    f1 = SimpleNamespace(qualified_name="pkg.f1")
    f2 = SimpleNamespace(qualified_name="pkg.f2")
    file_to_funcs = {Path("b.py"): [f1, f2]}
    call_graph = make_call_graph_from_mapping({"pkg.f1": 1, "pkg.f2": 0})
    call_graph_summary(call_graph, file_to_funcs)
    out = capsys.readouterr().out
    assert out  # summary panel printed

def test_empty_lists_in_file_to_funcs_produces_no_output(capsys):
    # Even if files are present but lists are empty, total_functions == 0 and nothing should be printed.
    file_to_funcs = {Path("empty.py"): [], Path("also_empty.py"): []}
    call_graph = make_call_graph_from_mapping({})
    call_graph_summary(call_graph, file_to_funcs)
    captured = capsys.readouterr()
    assert captured.out == ""  # total_functions == 0, so nothing printed

def test_qualified_names_with_special_characters(capsys):
    # Ensure names with quotes and unicode characters do not crash the summary generation.
    # We only assert numbers are correctly shown; rich may escape or alter text formatting.
    special1 = SimpleNamespace(qualified_name='weird"name')
    special2 = SimpleNamespace(qualified_name="uniçodeƒ")
    file_to_funcs = {Path("special.py"): [special1, special2]}
    call_graph = make_call_graph_from_mapping({'weird"name': 0, "uniçodeƒ": 2})
    call_graph_summary(call_graph, file_to_funcs)
    out = capsys.readouterr().out
    assert out  # special characters must not crash or suppress the summary

def test_large_scale_many_functions_single_file(capsys):
    # Build 1000 functions in a single file and provide callee counts that follow a simple pattern.
    n = 1000
    funcs = [SimpleNamespace(qualified_name=f"f{i}") for i in range(n)]
    file_to_funcs = {Path("big.py"): funcs}
    # Create mapping: f{i} has i % 4 callees, deterministic pattern
    mapping = {f"f{i}": (i % 4) for i in range(n)}
    call_graph = make_call_graph_from_mapping(mapping)
    # Run the summary (should be reasonably fast)
    call_graph_summary(call_graph, file_to_funcs)
    out = capsys.readouterr().out
    assert out  # summary panel printed
    # Compute expected totals
    total_functions = n
    total_callees = sum(mapping.values())
    expected_avg = total_callees / total_functions
    # with_context is number of functions with count > 0
    with_context = sum(1 for v in mapping.values() if v > 0)
    leaf_functions = total_functions - with_context
    # pattern i % 4 yields 6 callees and 3 context-bearing functions per block of 4
    assert (total_callees, with_context, leaf_functions) == (1500, 750, 250)
    assert expected_avg == 1.5

def test_large_scale_multiple_files_distribution(capsys):
    # 1000 functions distributed across 10 files; verify totals remain correct.
    total = 1000
    per_file = 100
    file_to_funcs = {}
    mapping = {}
    for file_idx in range(10):
        funcs = []
        for i in range(per_file):
            idx = file_idx * per_file + i
            name = f"g{idx}"
            funcs.append(SimpleNamespace(qualified_name=name))
            # pattern: even-indexed functions have 2 callees, odd-indexed have 0
            mapping[name] = 2 if idx % 2 == 0 else 0
        file_to_funcs[Path(f"file_{file_idx}.py")] = funcs

    call_graph = make_call_graph_from_mapping(mapping)
    call_graph_summary(call_graph, file_to_funcs)
    out = capsys.readouterr().out
    assert out  # summary panel printed

    total_callees = sum(mapping.values())
    expected_avg = total_callees / total
    with_context = sum(1 for v in mapping.values() if v > 0)
    leaf_functions = total - with_context
    # 500 even-indexed functions with 2 callees each, 500 odd-indexed leaves
    assert (total_callees, with_context, leaf_functions) == (1000, 500, 500)
    assert expected_avg == 1.0
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr1460-2026-02-12T06.40.48` and push.


@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 12, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 12, 2026
2 tasks

claude bot commented Feb 12, 2026

PR Review Summary

Prek Checks

Passed (after auto-fix)

Fixed 2 issues:

  • Import sorting: moved from rich.panel import Panel to proper sorted position with other rich imports
  • Formatting: removed extra blank line in call_graph_summary

Committed and pushed as style: auto-fix linting issues (182c1b0).

Mypy

⚠️ 17 pre-existing errors in codeflash/cli_cmds/console.py (lines 96-97, 120, 157, 165, 190) — all from code not touched by this PR. No new type errors introduced.

Code Review

No critical issues found

This is a codeflash optimization PR targeting call_graph_summary in codeflash/cli_cmds/console.py. The changes are:

  1. Module-level import: Moved from rich.panel import Panel from inside the function to the top-level — eliminates repeated import overhead
  2. sum(map(len, ...)) instead of generator expression: Minor C-level optimization for counting total functions
  3. sum() for aggregation: Replaces manual for loop accumulation with sum(callee_counts.values()) and sum(1 for count in ... if count > 0) — functionally equivalent, leverages C-level iteration

All changes are behavior-preserving. No bugs, security issues, or breaking API changes.

Test Coverage

| File | Stmts | Miss | Cover | Notes |
| --- | --- | --- | --- | --- |
| codeflash/cli_cmds/console.py | 178 | 132 | 26% | UI/display module, low coverage expected |
  • Overall project coverage: 79%
  • call_graph_summary (lines 321-350): Not covered by unit tests. This is pre-existing — the parent PR (feat: add reference graph for Python #1460) added this function without unit tests for it. The codeflash-generated regression tests in the PR description do exercise the function.
  • call_graph_live_display (lines 213-318): Also uncovered — added by parent PR feat: add reference graph for Python #1460, not by this optimization PR.
  • ⚠️ Coverage comparison vs main not possible since the base branch is call-graphee, which contains significant new code not on main.

Last updated: 2026-02-12

@KRRT7 KRRT7 merged commit e909182 into call-graphee Feb 12, 2026
25 of 28 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1460-2026-02-12T06.40.48 branch February 12, 2026 17:56
