Skip to content

Conversation

@codeflash-ai
Copy link
Contributor

@codeflash-ai codeflash-ai bot commented Aug 6, 2025

⚡️ This pull request contains optimizations for PR #620

If you approve this dependent PR, these changes will be merged into the original PR branch pre-commit-update.

This PR will be automatically closed if the original PR is merged.


📄 23% (0.23x) speedup for speedup_critic in codeflash/result/critic.py

⏱️ Runtime : 1.27 milliseconds 1.03 milliseconds (best of 153 runs)

📝 Explanation and details

The optimization introduces a cached GitHub Actions detection mechanism that eliminates repeated expensive environment lookups in the hot path.

Key optimization:

  • Added _in_github_actions_mode() function with @lru_cache(maxsize=1) that caches the result of env_utils.get_pr_number()
  • Replaced the inline bool(env_utils.get_pr_number()) call inside speedup_critic with a call to the cached function

Why this is faster:
The original code called env_utils.get_pr_number() on every invocation of speedup_critic. While get_pr_number() itself is cached, it still involves function call overhead and the boolean conversion. The line profiler shows this operation took 5.69ms (28.8% of total time) in the original vs only 1.09ms (7.7%) in the optimized version.

Performance characteristics:

  • Best for high-frequency scenarios: The optimization shines when speedup_critic is called many times (as shown in the large-scale tests with 500+ candidates), where the 22% speedup compounds significantly
  • Minimal impact on single calls: For individual calls, the improvement is small but consistent
  • GitHub Actions environments benefit most: Since the cached result prevents repeated environment variable lookups that are expensive in CI environments

The optimization maintains identical behavior while reducing redundant work through strategic caching of the GitHub Actions detection logic.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 3 Passed
🌀 Generated Regression Tests 2382 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import json
import os
from functools import lru_cache
from pathlib import Path
from typing import Any, Optional

# imports
import pytest  # used for our unit tests
from codeflash.result.critic import speedup_critic

MIN_IMPROVEMENT_THRESHOLD = 0.01  # 1% improvement threshold

class OptimizedCandidateResult:
    def __init__(self, best_test_runtime: int):
        self.best_test_runtime = best_test_runtime
from codeflash.result.critic import speedup_critic

# ---------------------- BASIC TEST CASES ----------------------

def test_basic_speedup_above_threshold():
    """Test: A basic case where the optimized code is significantly faster (10% speedup)."""
    orig = 100_000
    opt = 90_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 4.10μs -> 2.13μs (92.0% faster)

def test_basic_speedup_below_threshold():
    """Test: Optimized code is only 0.5% faster (should not be surfaced)."""
    orig = 100_000
    opt = 99_505
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 2.07μs -> 1.76μs (17.6% faster)

def test_basic_no_speedup():
    """Test: Optimized code is slower than original."""
    orig = 100_000
    opt = 110_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.97μs -> 1.74μs (13.2% faster)

def test_basic_exactly_at_threshold():
    """Test: Optimized code is exactly at the 1% improvement threshold (should not be surfaced)."""
    orig = 100_000
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD))  # Solve for gain == 1%
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.69μs -> 1.54μs (9.86% faster)

def test_basic_best_runtime_until_now():
    """Test: best_runtime_until_now is set, and new candidate is faster and above threshold."""
    orig = 100_000
    prev_best = 91_000
    opt = 90_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, prev_best); result = codeflash_output # 2.05μs -> 1.82μs (12.7% faster)

def test_basic_best_runtime_until_now_not_faster():
    """Test: best_runtime_until_now is set, but candidate is not faster than best."""
    orig = 100_000
    prev_best = 89_000
    opt = 90_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, prev_best); result = codeflash_output # 2.01μs -> 1.78μs (13.0% faster)

def test_basic_best_runtime_until_now_below_threshold():
    """Test: best_runtime_until_now is set, candidate is faster but below threshold."""
    orig = 100_000
    prev_best = 99_000
    opt = 98_900
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, prev_best); result = codeflash_output # 1.92μs -> 1.68μs (14.3% faster)

# ---------------------- EDGE TEST CASES ----------------------

def test_edge_zero_optimized_runtime():
    """Test: Optimized runtime is zero (should return False, avoid division by zero)."""
    orig = 100_000
    opt = 0
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.78μs -> 1.44μs (23.6% faster)

def test_edge_zero_original_runtime():
    """Test: Original runtime is zero, optimized is positive (should be negative gain, not surfaced)."""
    orig = 0
    opt = 10_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 2.21μs -> 1.98μs (11.6% faster)

def test_edge_both_zero_runtimes():
    """Test: Both original and optimized runtimes are zero (should return False)."""
    orig = 0
    opt = 0
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.82μs -> 1.57μs (15.8% faster)

def test_edge_small_runtime_noise_floor():
    """Test: For very small original runtimes, noise floor is 3x threshold."""
    orig = 5_000  # < 10_000
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD))  # Exactly at noise floor
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.79μs -> 1.56μs (14.7% faster)

    # Now, just above the noise floor
    opt_better = int(orig / (1 + 3.1 * MIN_IMPROVEMENT_THRESHOLD))
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt_better), orig, None); result2 = codeflash_output # 831ns -> 741ns (12.1% faster)



def test_edge_large_negative_perf_gain():
    """Test: Optimized runtime much worse than original (large negative gain)."""
    orig = 100_000
    opt = 1_000_000
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 2.32μs -> 2.08μs (11.5% faster)

def test_edge_original_runtime_just_below_10k():
    """Test: Original runtime just below 10,000, uses 3x threshold."""
    orig = 9_999
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD * 1.1))  # Just above noise floor
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 2.02μs -> 1.79μs (12.9% faster)

def test_edge_original_runtime_just_above_10k():
    """Test: Original runtime just above 10,000, uses normal threshold."""
    orig = 10_001
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD * 1.1))  # Just above normal threshold
    codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 1.82μs -> 1.55μs (17.5% faster)

# ---------------------- LARGE SCALE TEST CASES ----------------------

def test_large_scale_many_candidates_all_below_threshold():
    """Test: Many candidates, all below the threshold, none should be surfaced."""
    orig = 100_000
    for i in range(100):  # 100 candidates
        opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD * 0.5))  # 0.5% improvement
        codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 53.0μs -> 43.3μs (22.4% faster)

def test_large_scale_many_candidates_all_above_threshold():
    """Test: Many candidates, all above the threshold, all should be surfaced."""
    orig = 100_000
    for i in range(100):  # 100 candidates
        opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD * 2))  # 2% improvement
        codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 51.3μs -> 41.5μs (23.6% faster)

def test_large_scale_best_runtime_until_now_progressively_better():
    """Test: Simulate a sequence of improving candidates, only strictly better ones are surfaced."""
    orig = 100_000
    prev_best = None
    prev_runtime = orig
    for i in range(20):
        # Each candidate is 2% better than the last
        opt = int(prev_runtime / 1.02)
        codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, prev_best); should_surface = codeflash_output # 12.1μs -> 10.0μs (20.4% faster)
        # The first candidate should be surfaced, subsequent ones only if strictly better
        if prev_best is None or opt < prev_best:
            pass
        else:
            pass
        prev_best = opt
        prev_runtime = opt

def test_large_scale_small_original_runtime():
    """Test: Many small original runtimes, ensure noise floor is 3x threshold."""
    orig = 5_000
    for i in range(50):
        # Each candidate is just below the 3x threshold
        opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD * 0.95))
        codeflash_output = speedup_critic(OptimizedCandidateResult(opt), orig, None); result = codeflash_output # 27.7μs -> 22.8μs (21.8% faster)
        # Each candidate is just above the 3x threshold
        opt2 = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD * 1.05))
        codeflash_output = speedup_critic(OptimizedCandidateResult(opt2), orig, None); result2 = codeflash_output # 26.4μs -> 21.6μs (22.6% faster)


#------------------------------------------------
from __future__ import annotations

import json
import os
from functools import lru_cache
from pathlib import Path
from typing import Any, Optional

# imports
import pytest  # used for our unit tests
from codeflash.result.critic import speedup_critic


# Mocks for codeflash dependencies
class DummyEnvUtils:
    _pr_number = None
    @classmethod
    def set_pr_number(cls, n):
        cls._pr_number = n
    @classmethod
    def get_pr_number(cls):
        return cls._pr_number

class DummyOptimizedCandidateResult:
    def __init__(self, best_test_runtime: int):
        self.best_test_runtime = best_test_runtime

# Set a mock MIN_IMPROVEMENT_THRESHOLD for testing
MIN_IMPROVEMENT_THRESHOLD = 0.01  # 1%

# Patch env_utils in the tested module
env_utils = DummyEnvUtils

# Patch OptimizedCandidateResult
OptimizedCandidateResult = DummyOptimizedCandidateResult
from codeflash.result.critic import speedup_critic

# =========================
# Unit tests for speedup_critic
# =========================

# -------- Basic Tests --------

def test_basic_improvement_above_threshold():
    # Basic: improvement just above MIN_IMPROVEMENT_THRESHOLD (original >= 10_000ns)
    env_utils.set_pr_number(None)
    orig = 100_000
    opt = int(orig * (1 - (MIN_IMPROVEMENT_THRESHOLD + 0.01)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 2.23μs -> 1.85μs (20.0% faster)

def test_basic_no_improvement():
    # Basic: no improvement at all
    env_utils.set_pr_number(None)
    orig = 100_000
    opt = orig
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 2.03μs -> 1.65μs (23.0% faster)

def test_basic_improvement_below_threshold():
    # Basic: improvement just below MIN_IMPROVEMENT_THRESHOLD (original >= 10_000ns)
    env_utils.set_pr_number(None)
    orig = 100_000
    opt = int(orig * (1 - (MIN_IMPROVEMENT_THRESHOLD - 0.001)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.90μs -> 1.57μs (21.1% faster)

def test_basic_best_runtime_until_now():
    # Basic: best_runtime_until_now provided, candidate is faster and above threshold
    env_utils.set_pr_number(None)
    orig = 100_000
    best_so_far = 90_000
    opt = 80_000  # 20% improvement
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 2.14μs -> 1.86μs (15.1% faster)

def test_basic_best_runtime_until_now_not_better():
    # Basic: best_runtime_until_now provided, candidate is not faster
    env_utils.set_pr_number(None)
    orig = 100_000
    best_so_far = 80_000
    opt = 85_000  # worse than best_so_far
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 2.11μs -> 1.79μs (17.8% faster)

# -------- Edge Cases --------

def test_edge_original_runtime_very_small():
    # Edge: original_code_runtime < 10_000, noise floor is 3x threshold
    env_utils.set_pr_number(None)
    orig = 9_000
    noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD
    # Just above noise floor
    opt = int(orig * (1 - (noise_floor + 0.01)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.96μs -> 1.66μs (18.1% faster)

def test_edge_original_runtime_very_small_below_noise():
    # Edge: original_code_runtime < 10_000, improvement below 3x threshold
    env_utils.set_pr_number(None)
    orig = 9_000
    noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD
    opt = int(orig * (1 - (noise_floor - 0.001)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.85μs -> 1.64μs (12.8% faster)

def test_edge_optimized_runtime_zero():
    # Edge: optimized_runtime_ns == 0, should not crash, but perf_gain = 0.0
    env_utils.set_pr_number(None)
    orig = 100_000
    opt = 0
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.71μs -> 1.47μs (16.4% faster)

def test_edge_original_runtime_equals_optimized():
    # Edge: original == optimized, no improvement
    env_utils.set_pr_number(None)
    orig = 10_000
    opt = 10_000
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.97μs -> 1.72μs (14.6% faster)

def test_edge_optimized_slower_than_original():
    # Edge: optimized is slower than original
    env_utils.set_pr_number(None)
    orig = 10_000
    opt = 15_000
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.94μs -> 1.58μs (22.8% faster)

def test_edge_best_runtime_until_now_none():
    # Edge: best_runtime_until_now is None, should only check threshold
    env_utils.set_pr_number(None)
    orig = 100_000
    opt = int(orig * (1 - (MIN_IMPROVEMENT_THRESHOLD + 0.02)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.81μs -> 1.52μs (19.0% faster)

def test_edge_best_runtime_until_now_equal_to_candidate():
    # Edge: candidate best_test_runtime equals best_runtime_until_now
    env_utils.set_pr_number(None)
    orig = 100_000
    best_so_far = 80_000
    opt = 80_000
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 2.10μs -> 1.74μs (20.6% faster)


def test_github_actions_noise_floor_doubles():
    # When in GH Actions, noise floor doubles
    env_utils.set_pr_number(42)
    orig = 100_000
    noise_floor = MIN_IMPROVEMENT_THRESHOLD * 2
    opt = int(orig * (1 - (noise_floor + 0.01)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.85μs -> 1.50μs (23.4% faster)

def test_github_actions_noise_floor_doubles_small_runtime():
    # When in GH Actions and original < 10_000, noise floor is 6x threshold
    env_utils.set_pr_number(42)
    orig = 9_000
    noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD * 2  # 6x
    opt = int(orig * (1 - (noise_floor + 0.01)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.91μs -> 1.58μs (20.9% faster)

def test_github_actions_noise_floor_blocks_small_improvement():
    # When in GH Actions, improvement below doubled noise floor is rejected
    env_utils.set_pr_number(42)
    orig = 100_000
    noise_floor = MIN_IMPROVEMENT_THRESHOLD * 2
    opt = int(orig * (1 - (noise_floor - 0.001)))
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.78μs -> 1.44μs (23.6% faster)

# -------- Large Scale Tests --------

def test_large_scale_many_candidates():
    # Large scale: test with 500 different candidates, only some should pass
    env_utils.set_pr_number(None)
    orig = 100_000
    threshold = MIN_IMPROVEMENT_THRESHOLD
    # Make 500 candidates with varying improvements
    results = []
    for i in range(500):
        # Improvement from 0% to 49.9%
        improvement = i / 1000  # 0.0 to 0.499
        opt = int(orig * (1 - improvement))
        candidate = OptimizedCandidateResult(opt)
        results.append(speedup_critic(candidate, orig, None)) # 252μs -> 203μs (23.9% faster)
    # Candidates with improvement > threshold should pass
    num_passing = sum(results)

def test_large_scale_github_actions_many_candidates():
    # Large scale: 500 candidates, in GH Actions mode, higher threshold
    env_utils.set_pr_number(1)
    orig = 100_000
    threshold = MIN_IMPROVEMENT_THRESHOLD * 2
    results = []
    for i in range(500):
        improvement = i / 1000  # 0.0 to 0.499
        opt = int(orig * (1 - improvement))
        candidate = OptimizedCandidateResult(opt)
        results.append(speedup_critic(candidate, orig, None)) # 250μs -> 201μs (24.2% faster)
    num_passing = sum(results)

def test_large_scale_best_runtime_until_now():
    # Large scale: best_runtime_until_now filters out slower candidates
    env_utils.set_pr_number(None)
    orig = 100_000
    best_so_far = 80_000
    # Candidates: opt from 100_000 down to 70_000
    results = []
    for opt in range(100_000, 69_999, -1000):  # 100_000, 99_000, ..., 70_000
        candidate = OptimizedCandidateResult(opt)
        results.append(speedup_critic(candidate, orig, best_so_far)) # 18.3μs -> 15.3μs (19.4% faster)

def test_large_scale_small_runtime():
    # Large scale: original_code_runtime < 10_000, 3x threshold
    env_utils.set_pr_number(None)
    orig = 9_000
    noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD
    results = []
    for i in range(500):
        improvement = i / 1000  # 0.0 to 0.499
        opt = int(orig * (1 - improvement))
        candidate = OptimizedCandidateResult(opt)
        results.append(speedup_critic(candidate, orig, None)) # 265μs -> 218μs (21.7% faster)
    num_passing = sum(results)

def test_large_scale_all_fail_when_optimized_slower():
    # Large scale: all optimized runtimes are slower than original
    env_utils.set_pr_number(None)
    orig = 100_000
    results = []
    for i in range(500):
        opt = orig + i * 100  # Always slower
        candidate = OptimizedCandidateResult(opt)
        results.append(speedup_critic(candidate, orig, None)) # 248μs -> 202μs (22.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr620-2025-08-06T04.15.09 and push.

Codeflash

…update`)

The optimization introduces a **cached GitHub Actions detection mechanism** that eliminates repeated expensive environment lookups in the hot path.

**Key optimization**: 
- Added `_in_github_actions_mode()` function with `@lru_cache(maxsize=1)` that caches the result of `env_utils.get_pr_number()` 
- Replaced the inline `bool(env_utils.get_pr_number())` call inside `speedup_critic` with a call to the cached function

**Why this is faster**:
The original code called `env_utils.get_pr_number()` on every invocation of `speedup_critic`. While `get_pr_number()` itself is cached, it still involves function call overhead and the boolean conversion. The line profiler shows this operation took **5.69ms** (28.8% of total time) in the original vs only **1.09ms** (7.7%) in the optimized version.

**Performance characteristics**:
- **Best for high-frequency scenarios**: The optimization shines when `speedup_critic` is called many times (as shown in the large-scale tests with 500+ candidates), where the 22% speedup compounds significantly
- **Minimal impact on single calls**: For individual calls, the improvement is small but consistent
- **GitHub Actions environments benefit most**: Since the cached result prevents repeated environment variable lookups that are expensive in CI environments

The optimization maintains identical behavior while reducing redundant work through strategic caching of the GitHub Actions detection logic.
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Aug 6, 2025
@codeflash-ai codeflash-ai bot mentioned this pull request Aug 6, 2025


@lru_cache(maxsize=1)
def _in_github_actions_mode() -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually like this new function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeflash/code_utils/env_utils.py already has it looks like, using it from there

@KRRT7 KRRT7 closed this Aug 6, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr620-2025-08-06T04.15.09 branch August 6, 2025 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants