@codeflash-ai codeflash-ai bot commented Oct 7, 2025

📄 20% (0.20x) speedup for _get_action_profile in quantecon/game_theory/mclennan_tourky.py

⏱️ Runtime: 310 microseconds -> 259 microseconds (best of 199 runs)
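
The headline percentage appears to be the ratio of the two runtimes minus one. The sketch below assumes that definition; the helper name `speedup` is hypothetical and is not part of Codeflash or quantecon.

```python
def speedup(original_us, optimized_us):
    """Relative speedup, assuming the definition original / optimized - 1."""
    return original_us / optimized_us - 1.0


# Aggregate runtimes reported above (microseconds).
print(f"{speedup(310, 259):.1%}")    # 19.7%, which rounds to the reported 20%

# One of the per-test timings from the regression tests (75.8μs -> 56.9μs).
print(f"{speedup(75.8, 56.9):.1%}")  # 33.2%
```

Under this assumption a negative value corresponds to a test reported as "slower".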

📝 Explanation and details

The optimized code achieves a speedup of about 20% (310 microseconds down to 259 microseconds) by replacing the generator expression with a two-step approach: a list comprehension over index pairs produced by zip(), followed by a conversion to a tuple.

Key optimization: Instead of tuple(x[indptr[i]:indptr[i+1]] for i in range(N)), the code uses:

  1. zip(indptr[:-1], indptr[1:]) to pair consecutive indices without explicit range iteration
  2. List comprehension [x[start:end] for start, end in ...] followed by tuple(sliced)
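
The two forms described above can be sketched side by side. This is a reconstruction from the description, not a copy of the merged code: the function names are hypothetical, and `N = len(indptr) - 1` is an assumption about how the original computes the player count.

```python
def get_action_profile_generator(x, indptr):
    # Original form quoted in the description.
    # Assumption: N is the number of players, len(indptr) - 1.
    N = len(indptr) - 1
    return tuple(x[indptr[i]:indptr[i + 1]] for i in range(N))


def get_action_profile_zip(x, indptr):
    # Step 1: pair each start index with the following end index.
    pairs = zip(indptr[:-1], indptr[1:])
    # Step 2: slice x once per pair, then convert the list to a tuple.
    sliced = [x[start:end] for start, end in pairs]
    return tuple(sliced)


x = [0.4, 0.2, 0.8, 0.1, 0.2, 0.7]
indptr = [0, 1, 3, 6]  # 3 players with 1, 2 and 3 actions
print(get_action_profile_zip(x, indptr))
# ([0.4], [0.2, 0.8], [0.1, 0.2, 0.7])
```

Both functions return the same tuple of slices for the same inputs, including the zero-player case `indptr = [0]`, where each returns an empty tuple.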

Why this is faster:

  • Eliminates repeated indexing: The original code performs indptr[i] and indptr[i+1] lookups in each iteration. The optimized version takes each start/end pair from zip() instead, which removes two subscript operations and one integer addition per iteration.
  • Reduces generator overhead: When the result is consumed immediately by tuple(), a list comprehension avoids suspending and resuming a generator frame for every element.
  • Sequential pair access: zip() yields the consecutive start/end pairs in order, so the loop body only unpacks a pair and slices x. Any memory-locality benefit from this is not measured separately in this report.

Test case performance patterns:

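  • Large-scale scenarios with many players see the biggest gains (30-33% faster for the 1000-player sparse cases, 29.5% for 500 players)
  • Small inputs mostly show modest improvements (roughly 1-15% faster)
  • Inputs with few players, including one player with 1000 actions and five players with 200 actions each, can be slightly slower (up to about 7%) because of the fixed cost of slicing indptr twice and creating the zip() iterator; this is small next to the gains on inputs with many players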
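
One way to check these patterns locally is a small `timeit` harness over the input shapes used in the generated tests. This is a sketch under assumptions: the two functions are reconstructions of the forms described above (with `N = len(indptr) - 1` assumed), all names are hypothetical, and absolute timings will differ by machine and Python version.

```python
import timeit


def profile_generator(x, indptr):
    # Original form; N = len(indptr) - 1 is an assumption.
    N = len(indptr) - 1
    return tuple(x[indptr[i]:indptr[i + 1]] for i in range(N))


def profile_zip(x, indptr):
    # Optimized form: zip over consecutive index pairs, then tuple().
    sliced = [x[start:end] for start, end in zip(indptr[:-1], indptr[1:])]
    return tuple(sliced)


def make_inputs(num_players, actions_per_player):
    # Every player gets the same number of actions.
    x = [float(i) for i in range(num_players * actions_per_player)]
    indptr = [i * actions_per_player for i in range(num_players + 1)]
    return x, indptr


def time_both(num_players, actions_per_player, number=2000):
    # Returns the mean seconds per call for (generator form, zip form).
    x, indptr = make_inputs(num_players, actions_per_player)
    assert profile_generator(x, indptr) == profile_zip(x, indptr)
    t_gen = timeit.timeit(lambda: profile_generator(x, indptr), number=number)
    t_zip = timeit.timeit(lambda: profile_zip(x, indptr), number=number)
    return t_gen / number, t_zip / number


if __name__ == "__main__":
    # (players, actions per player) shapes taken from the tests in this report.
    for shape in [(1, 1000), (5, 200), (100, 5), (500, 2)]:
        t_gen, t_zip = time_both(*shape)
        print(f"players={shape[0]:4d} actions={shape[1]:4d} "
              f"generator={t_gen * 1e6:7.2f}us zip={t_zip * 1e6:7.2f}us")
```

Varying the number of players while holding the total length of `x` roughly constant separates the per-player saving from the fixed setup cost.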

The gain scales with the number of players rather than with the number of actions per player, so the optimization matters most for game theory applications with many players.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 44 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from quantecon.game_theory.mclennan_tourky import _get_action_profile

# unit tests

# -------------- Basic Test Cases --------------

def test_single_player_single_action():
    # 1 player, 1 action
    x = [0.5]
    indptr = [0, 1]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.48μs -> 2.15μs (15.5% faster)

def test_two_players_equal_actions():
    # 2 players, 2 actions each
    x = [0.1, 0.9, 0.7, 0.3]
    indptr = [0, 2, 4]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.14μs -> 2.11μs (1.71% faster)

def test_three_players_varied_actions():
    # 3 players, 1, 2, 3 actions
    x = [0.4, 0.2, 0.8, 0.1, 0.2, 0.7]
    indptr = [0, 1, 3, 6]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.16μs -> 2.11μs (2.28% faster)

def test_empty_action_profile():
    # 0 players, empty input
    x = []
    indptr = [0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.56μs -> 1.68μs (7.43% slower)

def test_player_with_zero_actions():
    # One player, zero actions
    x = []
    indptr = [0, 0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.95μs -> 1.90μs (3.11% faster)

def test_multiple_players_some_zero_actions():
    # 3 players, 0, 2, 0 actions
    x = [0.5, 0.5]
    indptr = [0, 0, 2, 2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.14μs -> 2.10μs (1.81% faster)

def test_input_as_tuple_or_list():
    # Accepts both lists and tuples for x and indptr
    x = (0.2, 0.8, 0.1, 0.9)
    indptr = (0, 2, 4)
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.00μs -> 1.99μs (0.605% faster)

def test_input_with_floats_and_ints():
    # Accepts both floats and ints in x
    x = [1, 0, 0.5, 0.5]
    indptr = [0, 2, 4]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.93μs -> 1.99μs (3.12% slower)

# -------------- Edge Test Cases --------------



def test_indptr_length_too_long():
    # indptr longer than x+1, last player has zero actions
    x = [0.1, 0.2]
    indptr = [0, 1, 2, 2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.76μs -> 2.45μs (12.7% faster)

def test_indptr_with_non_increasing_indices():
    # indptr is normally non-decreasing; here the last entry decreases
    x = [0.1, 0.2]
    indptr = [0, 2, 1]
    # The second slice is x[2:1], which runs backwards and yields []
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.09μs -> 2.06μs (1.51% faster)


def test_x_longer_than_indptr():
    # x is longer than needed by indptr
    x = [0.1, 0.2, 0.3, 0.4]
    indptr = [0, 2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.97μs -> 1.87μs (5.07% faster)

def test_negative_indices_in_indptr():
    # negative indices in indptr
    x = [0.1, 0.2]
    indptr = [0, -1, 2]
    # x[0:-1] is [0.1], x[-1:2] is [0.2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.99μs -> 2.01μs (1.10% slower)

def test_x_is_string():
    # x is a string (should treat as sequence of chars)
    x = "abcd"
    indptr = [0, 2, 4]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.04μs -> 1.93μs (6.02% faster)

def test_indptr_is_string():
    # indptr is a string (should fail)
    x = [0.1, 0.2]
    indptr = "012"
    with pytest.raises(TypeError):
        _get_action_profile(x, indptr) # 2.41μs -> 2.25μs (6.83% faster)

def test_empty_x_nonempty_indptr():
    # x is empty, indptr has more than one entry
    x = []
    indptr = [0, 0, 0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.13μs -> 2.17μs (1.80% slower)

def test_large_indices_out_of_bounds():
    # indptr indices out of bounds of x
    x = [0.1]
    indptr = [0, 10]
    # x[0:10] will just return [0.1]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.84μs -> 1.79μs (3.08% faster)

def test_float_indptr():
    # indptr contains floats (should fail)
    x = [0.1, 0.2]
    indptr = [0.0, 2.0]
    with pytest.raises(TypeError):
        _get_action_profile(x, indptr) # 2.24μs -> 2.00μs (11.7% faster)

def test_negative_length_x():
    # x is negative length (impossible in Python, just check empty)
    x = []
    indptr = [0, 0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.93μs -> 2.00μs (3.89% slower)

# -------------- Large Scale Test Cases --------------

def test_large_number_of_players():
    # 100 players, each with 5 actions
    num_players = 100
    actions_per_player = 5
    x = [float(i) for i in range(num_players * actions_per_player)]
    indptr = [i * actions_per_player for i in range(num_players + 1)]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 10.2μs -> 9.04μs (13.0% faster)
    for i in range(num_players):
        expected = [float(j) for j in range(i * actions_per_player, (i + 1) * actions_per_player)]
        assert result[i] == expected  # each player gets its own 5 consecutive values

def test_large_number_of_actions_one_player():
    # 1 player, 1000 actions
    x = [float(i) for i in range(1000)]
    indptr = [0, 1000]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.91μs -> 3.10μs (6.01% slower)

def test_large_sparse_players():
    # 1000 players, each with 0 or 1 action, alternating
    num_players = 1000
    x = [float(i) for i in range(num_players // 2)]
    indptr = [0]
    for i in range(num_players):
        if i % 2 == 0:
            indptr.append(indptr[-1])  # zero actions
        else:
            indptr.append(indptr[-1] + 1)  # one action
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 75.8μs -> 56.9μs (33.2% faster)
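    for i in range(num_players):
        if i % 2 == 0:
            assert result[i] == []  # even-indexed players have zero actions
        else:
            assert result[i] == [float(i // 2)]  # odd-indexed players have one action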

def test_large_irregular_actions():
    # 50 players, random actions per player (between 0 and 20)
    import random
    random.seed(42)
    actions_per_player = [random.randint(0, 20) for _ in range(50)]
    x = []
    indptr = [0]
    for n in actions_per_player:
        x.extend([float(len(x) + i) for i in range(n)])
        indptr.append(indptr[-1] + n)
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 6.44μs -> 5.84μs (10.2% faster)
    for i, n in enumerate(actions_per_player):
        expected = [float(sum(actions_per_player[:i]) + j) for j in range(n)]
        assert result[i] == expected  # slice i starts at the running total of earlier counts

def test_large_with_some_empty_players():
    # 100 players, every 10th has 0 actions, rest have 3
    num_players = 100
    x = []
    indptr = [0]
    for i in range(num_players):
        if i % 10 == 0:
            indptr.append(indptr[-1])
        else:
            x.extend([float(len(x) + j) for j in range(3)])
            indptr.append(indptr[-1] + 3)
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 9.57μs -> 8.78μs (8.95% faster)
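    for i in range(num_players):
        if i % 10 == 0:
            assert result[i] == []  # every 10th player has zero actions
        else:
            assert len(result[i]) == 3  # all other players have 3 actions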
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from quantecon.game_theory.mclennan_tourky import _get_action_profile

# Basic Test Cases

def test_single_player_single_action():
    # One player, one action
    x = [0.5]
    indptr = [0, 1]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.15μs -> 2.04μs (5.04% faster)

def test_two_players_equal_actions():
    # Two players, two actions each
    x = [0.3, 0.7, 0.4, 0.6]
    indptr = [0, 2, 4]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.08μs -> 2.05μs (1.71% faster)

def test_three_players_unequal_actions():
    # Three players, actions: 2, 1, 3
    x = [0.1, 0.9, 0.5, 0.2, 0.3, 0.5]
    indptr = [0, 2, 3, 6]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.15μs -> 2.07μs (3.66% faster)

def test_empty_action_profile():
    # No players
    x = []
    indptr = [0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.60μs -> 1.70μs (6.05% slower)

def test_player_with_zero_actions():
    # One player with zero actions
    x = []
    indptr = [0, 0]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.79μs -> 1.89μs (5.19% slower)

def test_multiple_players_some_zero_actions():
    # Multiple players, some with zero actions
    x = [0.2, 0.8, 0.5]
    indptr = [0, 2, 2, 3]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.05μs -> 2.08μs (1.63% slower)

# Edge Test Cases


def test_negative_values_in_x():
    # x contains negative values
    x = [-1.0, 0.0, 1.0]
    indptr = [0, 1, 3]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.57μs -> 2.38μs (8.03% faster)

def test_non_monotonic_indptr():
    # indptr is not strictly increasing (should not crash, but may produce empty slices)
    x = [1, 2, 3]
    indptr = [0, 2, 2, 3]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.29μs -> 2.19μs (4.66% faster)



def test_x_longer_than_indptr():
    # x is longer than required by indptr (should ignore extra elements)
    x = [0.1, 0.2, 0.3, 0.4]
    indptr = [0, 2, 3]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.52μs -> 2.32μs (8.31% faster)

def test_indptr_with_duplicates():
    # indptr has duplicate values (should produce empty slices)
    x = [0.1, 0.2]
    indptr = [0, 0, 2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.19μs -> 2.07μs (6.10% faster)

def test_indptr_with_negative_index():
    # indptr contains negative index
    x = [0.1, 0.2]
    indptr = [-1, 1, 2]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.13μs -> 2.08μs (2.60% faster)

def test_x_is_tuple():
    # x is a tuple instead of a list
    x = (0.1, 0.2, 0.3)
    indptr = [0, 2, 3]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.10μs -> 2.07μs (1.40% faster)

def test_x_is_string():
    # x is a string (should treat as sequence of characters)
    x = "abcde"
    indptr = [0, 2, 5]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.01μs -> 2.01μs (0.149% faster)

def test_indptr_is_tuple():
    # indptr is a tuple instead of a list
    x = [0.1, 0.2, 0.3]
    indptr = (0, 1, 3)
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 1.99μs -> 2.04μs (2.84% slower)

def test_x_is_range():
    # x is a range object
    x = range(10)
    indptr = [0, 3, 10]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 2.74μs -> 2.78μs (1.26% slower)


def test_large_number_of_players():
    # 500 players, each with 2 actions
    N = 500
    x = [float(i) for i in range(N * 2)]
    indptr = [2 * i for i in range(N + 1)]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 40.0μs -> 30.9μs (29.5% faster)

def test_large_number_of_actions_per_player():
    # 5 players, each with 200 actions
    N = 5
    actions_per_player = 200
    x = [float(i) for i in range(N * actions_per_player)]
    indptr = [actions_per_player * i for i in range(N + 1)]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 3.39μs -> 3.59μs (5.55% slower)

def test_large_sparse_players_some_zero_actions():
    # 1000 players, only every 10th has actions
    N = 1000
    x = [float(i) for i in range(100)]
    indptr = [0]
    for i in range(1, N + 1):
        if i % 10 == 0:
            indptr.append(indptr[-1] + 1)
        else:
            indptr.append(indptr[-1])
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 78.6μs -> 60.4μs (30.1% faster)
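    for i in range(N):
        if (i + 1) % 10 == 0:
            assert len(result[i]) == 1  # every 10th player has one action
        else:
            assert result[i] == []  # all other players have zero actions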

def test_large_x_longer_than_needed():
    # x is longer than needed, should ignore extras
    N = 100
    x = [float(i) for i in range(N + 10)]
    indptr = [i for i in range(N + 1)]
    codeflash_output = _get_action_profile(x, indptr); result = codeflash_output # 8.80μs -> 7.89μs (11.6% faster)
    for i in range(N):
        assert result[i] == [float(i)]  # one action per player; extra entries of x are ignored

To edit these changes, run git checkout codeflash/optimize-_get_action_profile-mgg06jyg and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 7, 2025 03:31
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 7, 2025