@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 13% (0.13x) speedup for _bool in src/deepgram/extensions/telemetry/proto_encoder.py

⏱️ Runtime : 83.8 microseconds → 73.8 microseconds (best of 148 runs)
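
The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as `original / optimized - 1`; that formula is inferred from the figures reported in this PR (including the "slower" cases further down), not taken from Codeflash documentation, and the helper name `speedup` is hypothetical.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup; negative values mean the optimized code is slower.

    Assumption: this definition is inferred from the numbers in this report.
    """
    return original / optimized - 1.0


if __name__ == "__main__":
    # Overall runtime from the report: 83.8 microseconds -> 73.8 microseconds
    print(f"overall: {speedup(83.8, 73.8):.4f}")
    # A case reported as slower: 810ns -> 890ns
    print(f"TypeError path: {speedup(810, 890):.4f}")
```

Under this definition the overall figure comes out at roughly 0.1355, which the header reports as 13% (0.13x), and the 810ns → 890ns case comes out at roughly -0.0899, matching the reported "8.99% slower".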

📝 Explanation and details

The optimized code achieves a 13% speedup through three optimizations in the protobuf encoding functions:

1. Fast path for single-byte varints in _varint():

  • Added an early return if value <= 0x7F: return bytes((value,)) to avoid creating a bytearray and performing loops for small integers (≤127)
  • This optimization is highly effective since protobuf field keys are typically small numbers that fit in a single byte

2. Replaced bytearray with list + local method reference:

  • Changed from bytearray() to [] and cached out.append as a local variable append
  • This eliminates the overhead of bytearray operations and method lookups in the loop

3. Direct byte literals in _bool():

  • Replaced _varint(1 if value else 0) with (b'\x01' if value else b'\x00')
  • Eliminates function call overhead for encoding boolean values since they always result in single-byte values

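The test results show improvements in almost every case, typically between 4% and 27%. The only regressions are two of the TypeError cases (about 8-9% slower), where the call raises before any value is encoded. Particularly strong gains appear for: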

  • Small field numbers (single-byte keys): 15-27% faster
  • False values: Often 15-25% faster due to avoiding the varint call entirely
  • Empty containers: Up to 42% faster for falsy values like [] and {}

These optimizations are especially beneficial for typical protobuf usage where field numbers are small and boolean values are common.
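
One way to check the boolean-payload claim locally is a small `timeit` comparison. The sketch below is an assumption-laden stand-in: `varint_bytearray` is a guess at the shape of the pre-optimization encoder (a loop over a `bytearray`), and all function names here are hypothetical. Absolute timings will differ from the figures in this report.

```python
import timeit


def varint_bytearray(value: int) -> bytes:
    """Assumed pre-optimization shape: always loop over a bytearray.

    Only valid for non-negative values; used here with 0 and 1.
    """
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def bool_payload_via_varint(value) -> bytes:
    # Before: one extra function call per boolean.
    return varint_bytearray(1 if value else 0)


def bool_payload_literal(value) -> bytes:
    # After: pick one of two constant byte strings.
    return b"\x01" if value else b"\x00"


def best_of(func, arg, repeat: int = 5, number: int = 20_000) -> float:
    """Best observed time per call, in seconds."""
    return min(timeit.repeat(lambda: func(arg), repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for arg in (True, False, [], {"a": 1}):
        before = best_of(bool_payload_via_varint, arg)
        after = best_of(bool_payload_literal, arg)
        print(f"{arg!r}: {before * 1e9:.0f}ns -> {after * 1e9:.0f}ns")
```

Taking the minimum over several repeats mirrors the "best of N runs" style of the report and reduces noise from other processes.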

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 46 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from deepgram.extensions.telemetry.proto_encoder import _bool

# unit tests

# --- Basic Test Cases ---

def test_bool_true_basic():
    # Test with field_number=1, value=True
    # Should encode as key (1 << 3 | 0) = 8, varint(1) = b'\x01'
    codeflash_output = _bool(1, True) # 1.75μs -> 1.62μs (8.08% faster)

def test_bool_false_basic():
    # Test with field_number=1, value=False
    # Should encode as key (1 << 3 | 0) = 8, varint(0) = b'\x00'
    codeflash_output = _bool(1, False) # 1.65μs -> 1.43μs (15.3% faster)

def test_bool_true_field2():
    # Test with field_number=2, value=True
    # key: (2 << 3 | 0) = 16, varint(1) = b'\x01'
    codeflash_output = _bool(2, True) # 1.56μs -> 1.50μs (4.34% faster)

def test_bool_false_field2():
    # Test with field_number=2, value=False
    # key: (2 << 3 | 0) = 16, varint(0) = b'\x00'
    codeflash_output = _bool(2, False) # 1.64μs -> 1.45μs (12.8% faster)

def test_bool_true_field15():
    # Test with field_number=15, value=True
    # key: (15 << 3 | 0) = 120, varint(1) = b'\x01'
    codeflash_output = _bool(15, True) # 1.56μs -> 1.33μs (17.4% faster)

def test_bool_false_field15():
    # Test with field_number=15, value=False
    # key: (15 << 3 | 0) = 120, varint(0) = b'\x00'
    codeflash_output = _bool(15, False) # 1.58μs -> 1.38μs (14.2% faster)

# --- Edge Test Cases ---

def test_bool_field_zero_true():
    # Test with field_number=0, value=True
    # key: (0 << 3 | 0) = 0, varint(1) = b'\x01'
    codeflash_output = _bool(0, True) # 1.62μs -> 1.40μs (15.8% faster)

def test_bool_field_zero_false():
    # Test with field_number=0, value=False
    # key: (0 << 3 | 0) = 0, varint(0) = b'\x00'
    codeflash_output = _bool(0, False) # 1.47μs -> 1.32μs (10.8% faster)

def test_bool_field_max_single_byte_true():
    # Test with field_number=15 (max single byte), value=True
    # key: (15 << 3 | 0) = 120, varint(1) = b'\x01'
    codeflash_output = _bool(15, True) # 1.61μs -> 1.39μs (16.0% faster)

def test_bool_field_max_single_byte_false():
    # Test with field_number=15, value=False
    # key: (15 << 3 | 0) = 120, varint(0) = b'\x00'
    codeflash_output = _bool(15, False) # 1.59μs -> 1.39μs (14.6% faster)

def test_bool_field_multibyte_true():
    # Test with field_number=128, value=True
    # key: (128 << 3 | 0) = 1024, varint(1024) = b'\x80\x08', varint(1) = b'\x01'
    codeflash_output = _bool(128, True) # 2.41μs -> 2.18μs (10.5% faster)

def test_bool_field_multibyte_false():
    # Test with field_number=128, value=False
    # key: (128 << 3 | 0) = 1024, varint(1024) = b'\x80\x08', varint(0) = b'\x00'
    codeflash_output = _bool(128, False) # 2.17μs -> 1.86μs (16.7% faster)

def test_bool_value_non_bool_true():
    # Test with value=1 (truthy non-bool)
    codeflash_output = _bool(3, 1) # 3.17μs -> 2.77μs (14.1% faster)

def test_bool_value_non_bool_false():
    # Test with value=0 (falsy non-bool)
    codeflash_output = _bool(3, 0) # 2.03μs -> 1.62μs (25.1% faster)

def test_bool_value_none():
    # None is falsy
    codeflash_output = _bool(4, None) # 1.81μs -> 1.49μs (21.7% faster)

def test_bool_value_empty_string():
    # Empty string is falsy
    codeflash_output = _bool(5, "") # 1.70μs -> 1.43μs (19.1% faster)

def test_bool_value_nonempty_string():
    # Non-empty string is truthy
    codeflash_output = _bool(5, "yes") # 1.56μs -> 1.49μs (5.12% faster)

def test_bool_value_list():
    # Non-empty list is truthy
    codeflash_output = _bool(6, [1]) # 1.60μs -> 1.40μs (14.2% faster)
    # Empty list is falsy
    codeflash_output = _bool(6, []) # 831ns -> 585ns (42.1% faster)

def test_bool_value_dict():
    # Non-empty dict is truthy
    codeflash_output = _bool(7, {"a": 1}) # 1.50μs -> 1.37μs (9.36% faster)
    # Empty dict is falsy
    codeflash_output = _bool(7, {}) # 864ns -> 645ns (34.0% faster)

# --- Large Scale Test Cases ---

def test_bool_value_custom_object():
    # Custom object with __bool__ returning True
    class Truthy:
        def __bool__(self): return True
    class Falsy:
        def __bool__(self): return False
    codeflash_output = _bool(8, Truthy()) # 3.67μs -> 3.29μs (11.7% faster)
    codeflash_output = _bool(8, Falsy()) # 1.01μs -> 815ns (24.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from deepgram.extensions.telemetry.proto_encoder import _bool

# unit tests

# --- Basic Test Cases ---

def test_bool_true_basic():
    # Test basic True value encoding for field 1
    # field_number=1, wire_type=0, value=True
    # Expected: _key(1,0) + _varint(1) == b'\x08\x01'
    codeflash_output = _bool(1, True) # 2.03μs -> 1.60μs (26.8% faster)

def test_bool_false_basic():
    # Test basic False value encoding for field 1
    # field_number=1, wire_type=0, value=False
    # Expected: _key(1,0) + _varint(0) == b'\x08\x00'
    codeflash_output = _bool(1, False) # 1.76μs -> 1.49μs (18.6% faster)

def test_bool_true_field2():
    # Test True value encoding for field 2
    # field_number=2, wire_type=0, value=True
    # _key(2,0) == _varint(16) == b'\x10'
    # _varint(1) == b'\x01'
    codeflash_output = _bool(2, True) # 1.63μs -> 1.39μs (17.3% faster)

def test_bool_false_field2():
    # Test False value encoding for field 2
    codeflash_output = _bool(2, False) # 1.68μs -> 1.40μs (20.2% faster)

def test_bool_true_field15():
    # Test True value encoding for a higher field number
    # field_number=15, wire_type=0, value=True
    # (15 << 3) | 0 = 120, _varint(120) == b'x'
    codeflash_output = _bool(15, True) # 1.54μs -> 1.44μs (6.58% faster)

def test_bool_false_field15():
    # Test False value encoding for a higher field number
    codeflash_output = _bool(15, False) # 1.57μs -> 1.34μs (17.2% faster)

# --- Edge Test Cases ---

def test_bool_field_number_zero_true():
    # Test edge case: field_number=0, value=True
    # _key(0,0) == _varint(0) == b'\x00'
    codeflash_output = _bool(0, True) # 1.64μs -> 1.49μs (9.95% faster)

def test_bool_field_number_zero_false():
    # Test edge case: field_number=0, value=False
    codeflash_output = _bool(0, False) # 1.51μs -> 1.42μs (6.85% faster)

def test_bool_field_number_max_single_byte_true():
    # Test edge case: field_number=15, wire_type=0 (max single byte for _varint)
    # (15 << 3) | 0 = 120, _varint(120) == b'x'
    codeflash_output = _bool(15, True) # 1.63μs -> 1.48μs (9.91% faster)

def test_bool_field_number_max_single_byte_false():
    codeflash_output = _bool(15, False) # 1.64μs -> 1.45μs (13.2% faster)

def test_bool_field_number_multibyte_true():
    # Test edge case: field_number=128, wire_type=0 (multibyte varint)
    # (128 << 3) | 0 = 1024, _varint(1024) == b'\x80\x08'
    # _varint(1) == b'\x01'
    codeflash_output = _bool(128, True) # 2.40μs -> 2.18μs (9.76% faster)

def test_bool_field_number_multibyte_false():
    codeflash_output = _bool(128, False) # 2.03μs -> 1.86μs (9.02% faster)

def test_bool_field_number_large_true():
    # Test with a large field number near 1000
    # field_number=999, wire_type=0, (999 << 3) = 7992
    # _varint(7992) == b'\xb8\x3e'
    codeflash_output = _bool(999, True) # 2.08μs -> 1.91μs (8.96% faster)

def test_bool_field_number_large_false():
    codeflash_output = _bool(999, False) # 2.02μs -> 1.84μs (9.79% faster)

def test_bool_non_bool_value_true_equivalent():
    # Test with non-bool type that is truthy (should encode as True)
    codeflash_output = _bool(1, 1) # 1.54μs -> 1.37μs (12.5% faster)
    codeflash_output = _bool(1, "nonempty") # 740ns -> 681ns (8.66% faster)

def test_bool_non_bool_value_false_equivalent():
    # Test with non-bool type that is falsy (should encode as False)
    codeflash_output = _bool(1, 0) # 1.51μs -> 1.32μs (15.0% faster)
    codeflash_output = _bool(1, "") # 810ns -> 636ns (27.4% faster)

def test_bool_negative_field_number():
    # Negative field numbers are not valid, but test behavior
    # (-1 << 3) = -8; _varint(-8) = _varint(18446744073709551608)
    # Should not raise, but output will be a long varint
    codeflash_output = _bool(-1, True); result_true = codeflash_output # 3.83μs -> 3.41μs (12.4% faster)
    codeflash_output = _bool(-1, False); result_false = codeflash_output # 2.02μs -> 1.72μs (17.3% faster)

def test_bool_field_number_type_error():
    # Non-integer field_number should raise TypeError
    with pytest.raises(TypeError):
        _bool("foo", True) # 1.60μs -> 1.53μs (4.70% faster)
    with pytest.raises(TypeError):
        _bool(None, False) # 810ns -> 890ns (8.99% slower)
    with pytest.raises(TypeError):
        _bool(1.5, True) # 605ns -> 656ns (7.77% slower)

# --- Large Scale Test Cases ---

#------------------------------------------------
from deepgram.extensions.telemetry.proto_encoder import _bool

def test__bool():
    _bool(0, True)

def test__bool_2():
    _bool(0, False)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_7zeygj7s/tmpze2qsx6z/test_concolic_coverage.py::test__bool | 3.04μs | 2.64μs | 15.3%✅ |
| codeflash_concolic_7zeygj7s/tmpze2qsx6z/test_concolic_coverage.py::test__bool_2 | 1.77μs | 1.55μs | 14.4%✅ |

To edit these changes, run `git checkout codeflash/optimize-_bool-mh4iu1zx`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 07:20
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 24, 2025