codeflash-ai bot commented on Oct 7, 2025

📄 16% (0.16x) speedup for IVP._integrate_fixed_trajectory in quantecon/_ivp.py

⏱️ Runtime : 14.9 milliseconds → 12.8 milliseconds (best of 311 runs)

📝 Explanation and details

The optimization replaces the O(n²) performance bottleneck caused by repeated np.vstack() operations with an efficient pre-allocation strategy.

Key optimization:

  • Pre-allocates solution array: Instead of starting with a single row and repeatedly calling np.vstack() to grow the array (which requires copying all existing data each time), the code pre-allocates chunks of 128 rows and grows by additional chunks only when needed.
  • Direct array indexing: Replaces expensive np.hstack() and np.vstack() calls with direct array assignment using indexing (sol[idx, 0] = self.t).
  • Final trimming: After integration completes, unused pre-allocated rows are trimmed with a single slice operation.
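The chunked-growth pattern described above can be sketched as follows. This is a minimal illustration, not the actual quantecon code: `step_fn` is a hypothetical stand-in for one call to scipy's `ode.integrate`, and the 128-row chunk size matches the value mentioned above.

```python
import numpy as np

def integrate_preallocated(step_fn, t0, y0, n_steps, chunk=128):
    """Grow the solution array chunk-by-chunk instead of vstack-ing each row."""
    y0 = np.atleast_1d(np.asarray(y0, dtype=float))
    sol = np.empty((chunk, 1 + y0.size))      # pre-allocate the first chunk
    sol[0, 0], sol[0, 1:] = t0, y0
    idx, t, y = 0, t0, y0
    for _ in range(n_steps):
        t, y = step_fn(t, y)
        idx += 1
        if idx >= sol.shape[0]:               # out of room: add another chunk
            sol = np.vstack([sol, np.empty((chunk, 1 + y0.size))])
        sol[idx, 0] = t                       # direct assignment, no hstack
        sol[idx, 1:] = y
    return sol[:idx + 1]                      # trim unused pre-allocated rows
```

With a trivial step function the trimmed result has exactly `n_steps + 1` rows, regardless of how many chunks were allocated along the way.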

Why it's faster:
The original code's np.vstack() operations had O(n) cost per iteration, leading to O(n²) total complexity as the solution array grew. The line profiler shows 8.5% of runtime was spent on vstack operations. The optimized version achieves amortized O(1) growth by pre-allocating space and only occasionally resizing, reducing memory allocations and data copying.
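The effect of chunked growth can be seen without a profiler by counting how many existing rows each strategy copies while the array grows. The helpers below are illustrative only (the names are hypothetical) and assume the 128-row chunk size described above:

```python
def copies_naive_vstack(n_rows):
    """Per-row vstack copies every row built so far: O(n^2) total copies."""
    return sum(range(1, n_rows))

def copies_chunked(n_rows, chunk=128):
    """Chunked growth copies rows only when a new chunk must be appended."""
    total, capacity = 0, chunk
    for i in range(1, n_rows):
        if i >= capacity:
            total += capacity    # growing copies the current rows once
            capacity += chunk
    return total
```

For 1000 rows this counts 499,500 copied rows under per-row `vstack` versus 3,584 under chunked growth.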

Performance gains by test type:

  • Small integrations (few steps): 20-30% speedup due to eliminating array creation overhead
  • Medium integrations (10-100 steps): 40-60% speedup as vstack costs become more significant
  • Large systems (1000+ variables): Smaller but consistent 3-10% gains since integration dominates runtime
  • Many steps (500+ iterations): 60-65% speedup where the O(n²) vstack penalty was most severe

The optimization maintains identical output format and behavior while dramatically improving scalability for longer integration trajectories.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 54 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon._ivp import IVP
from scipy import integrate

# ---- UNIT TESTS ----

# Basic Test Cases

def test_basic_linear_ode():
    # Test a simple ODE: dy/dt = 1, y(0) = 0, analytic solution: y = t
    def f(t, y): return 1
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    # Integrate from t=0 to t=1 in steps of h=0.2
    h = 0.2
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 117μs -> 75.3μs (55.7% faster)
    # Check solution is approximately t
    for row in sol:
        assert abs(row[1] - row[0]) < 1e-4  # loose tolerance

def test_basic_quadratic_ode():
    # dy/dt = 2t, y(0) = 0, analytic solution: y = t^2
    def f(t, y): return 2 * t
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.1
    T = 0.5
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 105μs -> 68.1μs (55.6% faster)
    # Check solution is close to t^2
    for row in sol:
        assert abs(row[1] - row[0] ** 2) < 1e-4  # loose tolerance

def test_basic_system_ode():
    # System: dy1/dt = y2, dy2/dt = -y1, y1(0)=1, y2(0)=0 (simple harmonic oscillator)
    def f(t, y): return [y[1], -y[0]]
    ivp = IVP(f)
    ivp.set_initial_value([1, 0], 0)
    h = 0.1
    T = 0.3
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 199μs -> 122μs (62.4% faster)

# Edge Test Cases

def test_zero_step_size():
    # h=0 should not advance time; should return only initial condition
    def f(t, y): return 1
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.0
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 62.8μs -> 50.8μs (23.6% faster)

def test_negative_step_forward_in_time():
    # Negative h, but T > t0, should not advance; should return only initial condition
    def f(t, y): return 1
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = -0.1
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 29.9μs -> 23.9μs (24.8% faster)

def test_backward_integration():
    # Integrate backwards in time: h<0, T < t0
    def f(t, y): return -1
    ivp = IVP(f)
    ivp.set_initial_value(1, 1)
    h = -0.2
    T = 0.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 49.0μs -> 34.8μs (40.8% faster)
    # Check solution is approximately y = t
    for row in sol:
        assert abs(row[1] - row[0]) < 1e-4  # loose tolerance

def test_non_integer_steps():
    # h does not evenly divide T-t0
    def f(t, y): return 2
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.3
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 106μs -> 65.4μs (62.6% faster)
    # Solution should be close to y = 2*t
    for row in sol:
        assert abs(row[1] - 2 * row[0]) < 1e-4  # loose tolerance

def test_multidimensional_y():
    # ODE with y as a 3-element vector
    def f(t, y): return [1, 2, 3]
    ivp = IVP(f)
    ivp.set_initial_value([0, 0, 0], 0)
    h = 0.5
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 109μs -> 69.7μs (57.6% faster)
    # Each y should be [t, t*2, t*3]
    for row in sol:
        for i, rate in enumerate((1, 2, 3)):
            assert abs(row[i + 1] - rate * row[0]) < 1e-4  # loose tolerance

def test_initial_condition_at_target():
    # t0 == T, should return only initial condition
    def f(t, y): return 1
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 1.0)
    h = 0.1
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 25.9μs -> 20.8μs (24.8% faster)

def test_non_list_return():
    # f returns a scalar
    def f(t, y): return 2
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.1
    T = 0.2
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 106μs -> 65.3μs (62.9% faster)
    # Solution should be close to y = 2*t
    for row in sol:
        assert abs(row[1] - 2 * row[0]) < 1e-4  # loose tolerance

# Large Scale Test Cases

def test_large_scale_linear_ode():
    # Large number of steps, dy/dt = 1, y(0)=0
    def f(t, y): return 1
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.001
    T = 0.999
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 112μs -> 67.7μs (66.0% faster)
    # Solution should be close to y = t
    for row in sol[::100]:  # check every 100th row for efficiency
        assert abs(row[1] - row[0]) < 1e-4  # loose tolerance

def test_large_scale_multidimensional_ode():
    # Large system: y is a 10-element vector, dy/dt = range(1,11)
    def f(t, y): return list(range(1, 11))
    ivp = IVP(f)
    ivp.set_initial_value([0]*10, 0)
    h = 0.01
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 129μs -> 83.2μs (55.3% faster)
    # Each y[i] should be close to (i+1)*t
    for row in sol[::10]:  # check every 10th row for efficiency
        for i in range(10):
            assert abs(row[i + 1] - (i + 1) * row[0]) < 1e-4  # loose tolerance

def test_large_scale_backward_integration():
    # Large backward integration: dy/dt = -2, y(2)=2, integrate to t=1
    def f(t, y): return -2
    ivp = IVP(f)
    ivp.set_initial_value(2, 2)
    h = -0.001
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 61.5μs -> 41.8μs (47.2% faster)
    # Solution should be close to y = 6 - 2*t (y' = -2 with y(2) = 2)
    for row in sol[::100]:  # check every 100th row for efficiency
        assert abs(row[1] - (6 - 2 * row[0])) < 1e-4  # loose tolerance

def test_large_scale_noninteger_steps():
    # Large scale, h does not evenly divide T-t0
    def f(t, y): return 3
    ivp = IVP(f)
    ivp.set_initial_value(0, 0)
    h = 0.007
    T = 0.999
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 113μs -> 68.6μs (65.2% faster)
    # Solution should be close to y = 3*t
    for row in sol[::100]:  # check every 100th row for efficiency
        assert abs(row[1] - 3 * row[0]) < 1e-4  # loose tolerance
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon._ivp import IVP
from scipy import integrate

# unit tests

# ---- Basic Test Cases ----

def test_basic_scalar_ode_forward():
    # Test integrating dy/dt = -y, y(0)=1, analytical solution y(t)=exp(-t)
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.1
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 185μs -> 117μs (57.6% faster)

def test_basic_vector_ode_forward():
    # Test integrating dx/dt = y, dy/dt = -x, x(0)=1, y(0)=0 (circle)
    def f(t, Y): return [Y[1], -Y[0]]
    ivp = IVP(f)
    ivp.set_initial_value([1.0, 0.0], 0.0)
    h = 0.1
    T = np.pi/2
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 302μs -> 178μs (69.0% faster)

def test_basic_backward_integration():
    # Test integrating backward in time
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 1.0)
    h = -0.1
    T = 0.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 183μs -> 115μs (59.4% faster)

# ---- Edge Test Cases ----

def test_zero_step_size():
    # Test h=0 should not advance time, should break immediately
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.0
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 59.6μs -> 47.5μs (25.6% faster)

def test_t_equals_T():
    # Test initial time equals target time
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(2.0, 2.0)
    h = 0.1
    T = 2.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 31.8μs -> 27.2μs (16.9% faster)

def test_negative_h_forward_T():
    # Test negative h with T > t0, should not advance (wrong direction)
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = -0.1
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 29.9μs -> 24.5μs (22.1% faster)

def test_large_h_overshoot():
    # Test h larger than (T-t0), should overshoot T in one step
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 5.0
    T = 1.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 187μs -> 115μs (61.5% faster)

def test_non_list_output():
    # Test ODE function returning a numpy array
    def f(t, y): return np.array([-y])
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.1
    T = 0.2
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 127μs -> 87.8μs (44.7% faster)

def test_jacobian_argument():
    # Test with a jacobian provided
    def f(t, y): return -y
    def jac(t, y): return np.array([[-1]])
    ivp = IVP(f, jac=jac)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.1
    T = 0.2
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 120μs -> 81.8μs (47.6% faster)

def test_non_successful_integration():
    # Test with ODE that fails after first step (by returning tuple)
    def f(t, y): return (y,)
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.1
    T = 0.2
    step = True
    relax = False
    # Should raise an error (scipy.ode expects ndarray/list, not tuple)
    with pytest.raises(Exception):
        ivp._integrate_fixed_trajectory(h, T, step, relax)

# ---- Large Scale Test Cases ----

def test_large_number_of_steps():
    # Test integrating with many steps, but <1000
    def f(t, y): return -y
    ivp = IVP(f)
    ivp.set_initial_value(1.0, 0.0)
    h = 0.01
    T = 5.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 415μs -> 250μs (65.8% faster)
    # Number of steps should be about T/h + 1 (allow off-by-one at the endpoint)
    expected_steps = int(T / h) + 1
    assert abs(len(sol) - expected_steps) <= 1

def test_large_vector_ode():
    # Test system with 1000 variables, each decaying independently
    def f(t, y): return [-yi for yi in y]
    y0 = [1.0]*1000
    ivp = IVP(f)
    ivp.set_initial_value(y0, 0.0)
    h = 0.1
    T = 0.2
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 1.82ms -> 1.76ms (3.42% faster)

def test_performance_large_steps():
    # Test that integration of a large system is reasonably fast and doesn't crash
    def f(t, y): return [-yi for yi in y]
    y0 = [1.0]*999
    ivp = IVP(f)
    ivp.set_initial_value(y0, 0.0)
    h = 0.5
    T = 10.0
    step = True
    relax = False
    codeflash_output = ivp._integrate_fixed_trajectory(h, T, step, relax); sol = codeflash_output # 10.1ms -> 9.17ms (9.68% faster)
    # Should not crash, and shape should be correct (allow off-by-one at the endpoint)
    expected_steps = int(T / h) + 1
    assert abs(len(sol) - expected_steps) <= 1
    assert sol.shape[1] == 1000  # time column + 999 variables
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-IVP._integrate_fixed_trajectory-mggp3cyz` and push.

codeflash-ai bot requested a review from mashraf-222 on Oct 7, 2025 at 15:09.
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI).