Conversation


codeflash-ai bot commented Oct 7, 2025

📄 390% (3.90x) speedup for `draw` in `quantecon/random/utilities.py`

⏱️ Runtime: 2.72 milliseconds → 556 microseconds (best of 241 runs)

📝 Explanation and details

The key optimization replaces a Python loop with vectorized NumPy operations in the `draw` function's multiple-sample case.

**What changed:**

- Replaced the explicit Python loop `for i in range(size): out[i] = searchsorted(cdf, rs[i])` with a single vectorized call: `out = np.searchsorted(cdf, rs, side='right')`
- Removed the separate `np.empty` allocation, since `np.searchsorted` returns the output array directly

**Why this is faster:**
The original code performs `size` individual calls to the custom `searchsorted` function in Python, each incurring loop and function-call overhead. The optimized version leverages NumPy's highly optimized C implementation, which processes the entire array in one operation, eliminating the Python loop overhead entirely.
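As a rough sketch of the two patterns being compared (this is not the exact quantecon source, which uses its own jitted `searchsorted` helper; the function names here are hypothetical, and `np.searchsorted` stands in for the per-element helper):

```python
import numpy as np

def draw_indices_loop(cdf, rs):
    # Original pattern: one searchsorted call per draw, driven by a Python loop.
    out = np.empty(len(rs), dtype=np.int_)
    for i in range(len(rs)):
        out[i] = np.searchsorted(cdf, rs[i], side='right')
    return out

def draw_indices_vectorized(cdf, rs):
    # Optimized pattern: a single vectorized searchsorted over all draws.
    return np.searchsorted(cdf, rs, side='right')

cdf = np.cumsum([0.2, 0.3, 0.5])   # CDF over three outcomes: [0.2, 0.5, 1.0]
rs = np.random.random(1000)        # uniform draws on [0, 1)
assert np.array_equal(draw_indices_loop(cdf, rs), draw_indices_vectorized(cdf, rs))
```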

**Performance characteristics:**

- Massive speedups for large sample sizes (857% faster for 1000 samples, 934% for 500 samples)
- Modest improvements for small sample sizes (35-40% faster for 10-100 samples)
- Single draws remain unchanged, preserving the custom implementation's behavior
- Edge cases like `size=0` show slight regression due to NumPy's overhead for empty arrays, but these are uncommon scenarios

The optimization is most effective when `size` is an integer (the vectorizable case), while preserving the original behavior for single draws and non-integer sizes.
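For reference, a quick usage sketch of the two code paths, based on the regression tests below (the probabilities are arbitrary):

```python
import numpy as np
from quantecon.random.utilities import draw

cdf = np.cumsum([0.2, 0.3, 0.5])   # cumulative distribution over three outcomes

single = draw(cdf)            # size=None: single draw, returns a scalar index
sample = draw(cdf, size=10)   # integer size: vectorized path, returns an array of 10 indices
```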

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 52 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
import numpy as np
# imports
import pytest  # used for our unit tests
from numba import jit
from quantecon.random.utilities import draw

# unit tests

# ---------------------------
# BASIC TEST CASES
# ---------------------------


def test_draw_multiple_two_outcomes():
    """
    Test draw with a simple 2-outcome CDF, multiple draws.
    """
    cdf = np.array([0.4, 1.0])
    codeflash_output = draw(cdf, size=100); draws = codeflash_output # 41.4μs -> 12.5μs (232% faster)


def test_draw_return_type_single():
    """
    Test that draw returns an int when size is None.
    """
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 6.15μs -> 6.07μs (1.30% faster)

def test_draw_return_type_multiple():
    """
    Test that draw returns a numpy array of ints when size is given.
    """
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size=10); result = codeflash_output # 13.8μs -> 10.2μs (35.2% faster)

# ---------------------------
# EDGE TEST CASES
# ---------------------------

def test_draw_cdf_all_ones():
    """
    Test draw with a CDF that is all ones (should always return last index).
    """
    cdf = np.array([1.0, 1.0, 1.0])
    # All random draws are < 1, so always index 0
    draws = [draw(cdf) for _ in range(100)] # 3.36μs -> 3.92μs (14.1% slower)

def test_draw_cdf_single_value():
    """
    Test draw with a CDF of length 1 (degenerate distribution).
    """
    cdf = np.array([1.0])
    draws = [draw(cdf) for _ in range(100)] # 3.00μs -> 3.10μs (3.03% slower)

def test_draw_cdf_with_zero_probabilities():
    """
    Test draw with a CDF that has zero-probability outcomes.
    """
    cdf = np.array([0.0, 0.5, 1.0])
    draws = [draw(cdf) for _ in range(1000)] # 2.98μs -> 2.88μs (3.30% faster)

def test_draw_cdf_close_to_one():
    """
    Test draw with a CDF where the last value is just below 1.
    """
    cdf = np.array([0.2, 0.5, 0.999999])
    draws = [draw(cdf) for _ in range(1000)] # 2.81μs -> 2.74μs (2.30% faster)


def test_draw_invalid_cdf_above_one():
    """
    Test draw with a CDF that exceeds 1 (invalid).
    """
    cdf = np.array([0.3, 1.2])
    # Should not allow cdf > 1, but function does not check
    # So we check that all draws are either 0 or 1
    draws = [draw(cdf) for _ in range(100)] # 5.92μs -> 5.81μs (1.84% faster)

def test_draw_invalid_cdf_not_increasing():
    """
    Test draw with a CDF that is not strictly increasing.
    """
    cdf = np.array([0.2, 0.2, 1.0])
    draws = [draw(cdf) for _ in range(100)] # 3.60μs -> 3.42μs (5.53% faster)

def test_draw_size_zero():
    """
    Test draw with size=0 (should return empty array).
    """
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size=0); result = codeflash_output # 6.19μs -> 9.81μs (36.9% slower)

def test_draw_non_integer_size():
    """
    Test draw with a non-integer size (should ignore and return scalar).
    """
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size='foo'); result = codeflash_output # 3.81μs -> 3.89μs (2.06% slower)

# ---------------------------
# LARGE SCALE TEST CASES
# ---------------------------

def test_draw_large_size():
    """
    Test draw with a large number of samples.
    """
    cdf = np.linspace(0, 1, 11)  # 10 outcomes
    size = 1000
    codeflash_output = draw(cdf, size=size); draws = codeflash_output # 305μs -> 31.9μs (857% faster)

def test_draw_large_cdf():
    """
    Test draw with a large CDF (many outcomes).
    """
    n = 1000
    pmf = np.ones(n) / n
    cdf = np.cumsum(pmf)
    codeflash_output = draw(cdf, size=100); draws = codeflash_output # 42.0μs -> 12.5μs (236% faster)

def test_draw_uniform_distribution():
    """
    Test draw with a uniform distribution over many outcomes.
    """
    n = 100
    pmf = np.ones(n) / n
    cdf = np.cumsum(pmf)
    codeflash_output = draw(cdf, size=1000); draws = codeflash_output # 324μs -> 50.4μs (543% faster)
    # Each outcome should appear at least once (with high probability)
    outcomes = set(draws)

def test_draw_performance_large():
    """
    Test draw performance with large size and CDF (stress test).
    """
    n = 500
    cdf = np.linspace(0, 1, n+1)
    size = 1000
    codeflash_output = draw(cdf, size=size); draws = codeflash_output # 348μs -> 63.4μs (449% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import numpy as np
# imports
import pytest  # used for our unit tests
from numba import jit
from quantecon.random.utilities import draw

# ------------------- #
# - UNIT TEST SUITE - #
# ------------------- #

# 1. Basic Test Cases
def test_draw_basic_single_draw():
    # Test single draw from a simple cdf
    cdf = np.array([0.5, 1.0])
    # Should always return 0 or 1
    codeflash_output = draw(cdf); val = codeflash_output # 4.18μs -> 5.09μs (17.9% slower)

def test_draw_basic_multiple_draws():
    # Test multiple draws from a simple cdf
    cdf = np.array([0.3, 0.7, 1.0])
    codeflash_output = draw(cdf, size=10); draws = codeflash_output # 12.9μs -> 9.19μs (40.0% faster)

def test_draw_basic_probabilities():
    # Test that the empirical probabilities are roughly correct
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf, size=500); draws = codeflash_output # 150μs -> 14.6μs (934% faster)
    # Count occurrences
    counts = np.bincount(draws, minlength=3)
    # Empirical probabilities
    probs = counts / 500
    # Should roughly match cdf increments
    expected = np.diff(np.concatenate(([0], cdf)))
    for p, e in zip(probs, expected):
        assert abs(p - e) < 0.1  # loose tolerance for sampling noise over 500 draws

# 2. Edge Test Cases


def test_draw_cdf_one_element():
    # cdf with one element (should always return 0)
    cdf = np.array([1.0])
    for _ in range(10):
        codeflash_output = draw(cdf) # 11.0μs -> 11.0μs (0.164% faster)

def test_draw_cdf_all_ones():
    # cdf with all elements 1 (should always return 0)
    cdf = np.array([1.0, 1.0, 1.0])
    for _ in range(10):
        codeflash_output = draw(cdf) # 7.83μs -> 7.72μs (1.40% faster)

def test_draw_cdf_not_monotonic():
    # cdf not monotonic should still work but be interpreted as is
    cdf = np.array([0.5, 0.4, 1.0])  # Not strictly increasing
    # Should not crash, but output may be weird
    codeflash_output = draw(cdf); val = codeflash_output # 2.80μs -> 2.65μs (5.70% faster)

def test_draw_cdf_with_zeros():
    # cdf with zeros at start
    cdf = np.array([0.0, 0.5, 1.0])
    codeflash_output = draw(cdf, size=10); draws = codeflash_output # 13.3μs -> 10.2μs (30.5% faster)

def test_draw_cdf_with_duplicate_values():
    # cdf with duplicate values
    cdf = np.array([0.2, 0.2, 0.7, 1.0])
    codeflash_output = draw(cdf, size=10); draws = codeflash_output # 9.70μs -> 7.04μs (37.8% faster)

def test_draw_cdf_with_large_values():
    # cdf with values > 1 (should still work, but interpretation is odd)
    cdf = np.array([0.5, 1.2, 2.0])
    codeflash_output = draw(cdf, size=10); draws = codeflash_output # 9.47μs -> 6.30μs (50.3% faster)


def test_draw_cdf_with_inf():
    # cdf with inf should not crash but inf should act as 1.0
    cdf = np.array([0.5, np.inf])
    codeflash_output = draw(cdf, size=10); draws = codeflash_output # 15.9μs -> 11.2μs (42.4% faster)

def test_draw_size_zero():
    # size=0 should return empty array
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size=0); out = codeflash_output # 4.40μs -> 6.95μs (36.6% slower)

def test_draw_non_integer_size():
    # Non-integer size should be ignored and treated as single draw
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size='foo'); val = codeflash_output # 4.08μs -> 5.25μs (22.3% slower)

# 3. Large Scale Test Cases

def test_draw_large_cdf():
    # Large cdf, single draw
    cdf = np.linspace(0, 1, 1000)
    codeflash_output = draw(cdf); val = codeflash_output # 3.81μs -> 3.76μs (1.17% faster)

def test_draw_large_number_of_draws():
    # Large number of draws
    cdf = np.array([0.2, 0.8, 1.0])
    codeflash_output = draw(cdf, size=1000); draws = codeflash_output # 292μs -> 22.2μs (1215% faster)

def test_draw_large_cdf_and_large_draws():
    # Both cdf and draws are large
    cdf = np.linspace(0, 1, 1000)
    codeflash_output = draw(cdf, size=1000); draws = codeflash_output # 354μs -> 68.9μs (415% faster)

def test_draw_large_cdf_monotonicity():
    # Large cdf, ensure monotonicity of output indices
    cdf = np.linspace(0, 1, 1000)
    codeflash_output = draw(cdf, size=1000); draws = codeflash_output # 351μs -> 66.4μs (429% faster)

def test_draw_performance_large_scale():
    # Performance test for large scale (not strict timing, just no crash)
    cdf = np.linspace(0, 1, 1000)
    codeflash_output = draw(cdf, size=1000); draws = codeflash_output # 349μs -> 65.5μs (434% faster)

# Additional: Deterministic test with seeded RNG
def test_draw_deterministic_with_seed():
    # Set seed for reproducibility
    np.random.seed(42)
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size=5); out1 = codeflash_output # 8.22μs -> 6.44μs (27.7% faster)
    np.random.seed(42)
    codeflash_output = draw(cdf, size=5); out2 = codeflash_output # 3.41μs -> 2.64μs (29.1% faster)

# Additional: Test that indices match searchsorted logic

To edit these changes, `git checkout codeflash/optimize-draw-mgg0qwcu` and push.

Codeflash

codeflash-ai bot requested a review from mashraf-222 · Oct 7, 2025 03:47
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label · Oct 7, 2025