@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 9% (0.09x) speedup for _get_tokenizer_config_size in src/cohere/manually_maintained/tokenizers.py

⏱️ Runtime : 139 microseconds → 128 microseconds (best of 328 runs)
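
As a rough check on the headline figure, the sketch below recomputes the speedup from the two runtimes. The formula (original minus optimized, divided by optimized) is an assumption inferred from the numbers in this report, and `speedup` is a hypothetical helper, not part of Codeflash.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized


# Overall runtime from the summary line: 139 microseconds vs 128 microseconds.
overall = speedup(139, 128)

# First generated test (test_content_length_basic): 5.07 microseconds vs 4.53.
first_test = speedup(5.07, 4.53)

print(f"overall: {overall:.2f}x, about {overall * 100:.0f}%")
print(f"first test: {first_test * 100:.1f}% faster")
```

Under that assumed formula, the inputs give roughly 0.09x (9%) overall and 11.9% for the first test, which matches the figures reported here.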

📝 Explanation and details

The optimized code replaces a for-loop with direct header access using Python's or operator for short-circuit evaluation.

Key changes:

  1. Eliminated the for-loop: Instead of iterating through ["x-goog-stored-content-length", "Content-Length"] and calling head_response.headers.get() on each iteration, the code now uses headers.get("x-goog-stored-content-length") or headers.get("Content-Length").

  2. Cached headers reference: Added headers = head_response.headers to avoid repeated attribute access.
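
The two key changes can be sketched as follows. This is a reconstruction from the description above, not the actual code in `tokenizers.py`: the loop body, the helper names (`size_header_loop`, `size_header_or`, `to_megabytes`), and the use of a plain mapping in place of `head_response.headers` are all assumptions made for illustration.

```python
from typing import Mapping, Optional

HEADER_NAMES = ["x-goog-stored-content-length", "Content-Length"]


def size_header_loop(headers: Mapping[str, str]) -> Optional[str]:
    """Assumed shape of the original: try each header, stop at the first truthy value."""
    size = None
    for name in HEADER_NAMES:
        size = headers.get(name)
        if size:
            break
    return size


def size_header_or(headers: Mapping[str, str]) -> Optional[str]:
    """Shape of the optimized lookup: `or` short-circuits on the first truthy value."""
    return headers.get("x-goog-stored-content-length") or headers.get("Content-Length")


def to_megabytes(size: str) -> float:
    """Convert a byte count string to megabytes, rounded to 2 decimal places."""
    return round(int(size) / 1024 / 1024, 2)


if __name__ == "__main__":
    sample = {"Content-Length": "1048576", "x-goog-stored-content-length": "3145728"}
    # Both lookups prefer x-goog-stored-content-length when it is present.
    print(to_megabytes(size_header_or(sample)))
```

In this sketch both lookups return the same value for missing, empty, and populated headers, which is the property the generated regression tests check against the real function.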

Why this is faster:

  • Less work per lookup: The original loop called head_response.headers.get() on each iteration (74 hits in profiler), resolving the headers attribute every time. The optimized version resolves it once and makes at most 2 .get() calls, often just 1 due to short-circuit evaluation.
  • Eliminated loop overhead: No iterator creation, condition checking, or break statements.
  • Straight-line execution: A single expression replaces the loop's branching, so the interpreter likely executes fewer bytecode instructions per call.
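
To see how many `.get()` calls each form makes, the sketch below counts them with a small dict subclass. It assumes the original loop stopped at the first truthy value (the description mentions break statements); `CountingHeaders`, `lookup_loop`, `lookup_or`, and `count_get_calls` are hypothetical names used only for this illustration.

```python
class CountingHeaders(dict):
    """A dict that records how many times .get() is called."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.get_calls = 0

    def get(self, key, default=None):
        self.get_calls += 1
        return super().get(key, default)


def lookup_loop(headers):
    """Assumed original form: loop over the header names, break on a truthy value."""
    size = None
    for name in ["x-goog-stored-content-length", "Content-Length"]:
        size = headers.get(name)
        if size:
            break
    return size


def lookup_or(headers):
    """Optimized form: short-circuit `or` between two .get() calls."""
    return headers.get("x-goog-stored-content-length") or headers.get("Content-Length")


def count_get_calls(lookup, raw_headers):
    """Run one lookup against a fresh counting dict and return the .get() call count."""
    headers = CountingHeaders(raw_headers)
    lookup(headers)
    return headers.get_calls


if __name__ == "__main__":
    for raw in ({"x-goog-stored-content-length": "1"}, {"Content-Length": "1"}, {}):
        print(raw, count_get_calls(lookup_loop, raw), count_get_calls(lookup_or, raw))
```

Under that assumption, both forms make one call when the first header is present and two otherwise, which would place the measured gain in the removed loop machinery rather than in fewer lookups.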

Performance characteristics:
Across the generated regression tests, per-test gains range from about 1% to 17%, and one repeated call measured 8.64% slower (1.04μs -> 1.13μs). The best gains are on edge cases like invalid values (14.3% faster) and large numbers (17.2% faster). The tests where the first header (x-goog-stored-content-length) is present show smaller gains (0.937% to 8.55%), so the timings do not show an extra benefit from skipping the second .get() call.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 40 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest
import requests
from cohere.manually_maintained.tokenizers import _get_tokenizer_config_size


# unit tests
@pytest.fixture
def patch_requests_head(monkeypatch):
    """Fixture to patch requests.head for controlled header responses."""
    def _patch(headers):
        class DummyResponse:
            def __init__(self, headers):
                self.headers = headers
        monkeypatch.setattr(requests, "head", lambda url: DummyResponse(headers))
    return _patch

# --- Basic Test Cases ---

def test_content_length_basic(patch_requests_head):
    # Basic: Content-Length present, typical value
    patch_requests_head({"Content-Length": "1048576"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 5.07μs -> 4.53μs (11.9% faster)

def test_x_goog_stored_content_length_basic(patch_requests_head):
    # Basic: x-goog-stored-content-length present, typical value
    patch_requests_head({"x-goog-stored-content-length": "2097152"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.73μs -> 3.47μs (7.52% faster)

def test_both_headers_present_prefers_x_goog(patch_requests_head):
    # Basic: Both headers present, should prefer x-goog-stored-content-length
    patch_requests_head({
        "Content-Length": "1048576",
        "x-goog-stored-content-length": "3145728"
    })
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.46μs -> 3.35μs (3.50% faster)

def test_rounding_to_2_decimal_places(patch_requests_head):
    # Basic: Value that rounds to two decimals
    patch_requests_head({"Content-Length": str(1234567)})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.42μs -> 3.08μs (11.0% faster)

# --- Edge Test Cases ---

def test_header_value_is_zero(patch_requests_head):
    # Edge: Header value is zero
    patch_requests_head({"Content-Length": "0"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.54μs -> 3.19μs (11.1% faster)

def test_header_value_is_one_byte(patch_requests_head):
    # Edge: Header value is 1 byte
    patch_requests_head({"Content-Length": "1"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.65μs -> 3.25μs (12.4% faster)

def test_header_value_is_just_under_one_mb(patch_requests_head):
    # Edge: 1048575 bytes (1 less than 1 MB)
    patch_requests_head({"Content-Length": "1048575"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.65μs -> 3.42μs (6.87% faster)

def test_header_value_is_large_number(patch_requests_head):
    # Edge: Large value, e.g., 999,999,999 bytes
    patch_requests_head({"Content-Length": "999999999"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.53μs -> 3.19μs (10.7% faster)

def test_header_value_is_not_integer(patch_requests_head):
    # Edge: Header value is not an integer string
    patch_requests_head({"Content-Length": "notanumber"})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://some.url") # 3.99μs -> 3.50μs (14.0% faster)

def test_header_missing_raises_type_error(patch_requests_head):
    # Edge: Both headers missing, should raise TypeError (int(None) is invalid)
    patch_requests_head({})
    with pytest.raises(TypeError):
        _get_tokenizer_config_size("http://some.url") # 3.03μs -> 2.83μs (7.07% faster)

def test_header_value_is_negative(patch_requests_head):
    # Edge: Negative value (should be allowed, but returns negative MB)
    patch_requests_head({"Content-Length": "-1048576"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 4.30μs -> 4.05μs (6.35% faster)

def test_header_value_is_float_string(patch_requests_head):
    # Edge: Header value is a float string (should raise ValueError)
    patch_requests_head({"Content-Length": "12345.67"})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://some.url") # 4.07μs -> 3.60μs (13.3% faster)

def test_header_value_is_whitespace(patch_requests_head):
    # Edge: Header value is whitespace (should raise ValueError)
    patch_requests_head({"Content-Length": "   "})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://some.url") # 3.88μs -> 3.50μs (10.8% faster)

def test_header_value_is_empty_string(patch_requests_head):
    # Edge: Header value is empty string (should raise ValueError)
    patch_requests_head({"Content-Length": ""})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://some.url") # 3.58μs -> 3.28μs (9.31% faster)

def test_headers_case_insensitivity(patch_requests_head):
    # Edge: lower-case header key in a plain dict (real requests headers are case-insensitive)
    patch_requests_head({"content-length": "1048576"})
    # A plain dict lookup is case-sensitive, so the header is not found and int(None) raises TypeError
    with pytest.raises(TypeError):
        _get_tokenizer_config_size("http://some.url") # 3.04μs -> 2.77μs (9.71% faster)

# --- Large Scale Test Cases ---

def test_large_scale_content_length(patch_requests_head):
    # Large scale: 999 MB file
    patch_requests_head({"Content-Length": str(999 * 1024 * 1024)})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 4.51μs -> 4.16μs (8.46% faster)

def test_large_scale_x_goog_content_length(patch_requests_head):
    # Large scale: 999 MB file via x-goog-stored-content-length
    patch_requests_head({"x-goog-stored-content-length": str(999 * 1024 * 1024)})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.52μs -> 3.48μs (1.24% faster)

def test_large_scale_both_headers(patch_requests_head):
    # Large scale: Both headers, x-goog is larger and should be used
    patch_requests_head({
        "Content-Length": str(500 * 1024 * 1024),
        "x-goog-stored-content-length": str(999 * 1024 * 1024)
    })
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.34μs -> 3.31μs (0.937% faster)

def test_large_scale_non_mb_value(patch_requests_head):
    # Large scale: Value not a multiple of MB
    patch_requests_head({"Content-Length": str(987654321)})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.38μs -> 3.19μs (5.70% faster)

def test_large_scale_minimum_value(patch_requests_head):
    # Large scale: Minimum non-zero value (1 byte)
    patch_requests_head({"Content-Length": "1"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.57μs -> 3.27μs (9.24% faster)

def test_large_scale_maximum_value(patch_requests_head):
    # Large scale: Maximum allowed test value (999,999,999 bytes)
    patch_requests_head({"Content-Length": "999999999"})
    codeflash_output = _get_tokenizer_config_size("http://some.url"); result = codeflash_output # 3.29μs -> 3.10μs (6.30% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import types
import typing

# imports
import pytest
# function to test
import requests
from cohere.manually_maintained.tokenizers import _get_tokenizer_config_size

# --- Test helpers ---

class DummyResponse:
    """A dummy response object to simulate requests.head()."""
    def __init__(self, headers):
        self.headers = headers

def patch_requests_head(monkeypatch, headers):
    """Monkeypatch requests.head to return a DummyResponse with the given headers."""
    def dummy_head(url):
        return DummyResponse(headers)
    monkeypatch.setattr(requests, "head", dummy_head)

# --- Unit tests ---

# 1. Basic Test Cases

def test_content_length_header(monkeypatch):
    """Test with a valid Content-Length header."""
    patch_requests_head(monkeypatch, {"Content-Length": "1048576"})  # 1 MB
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.50μs -> 3.38μs (3.76% faster)

def test_x_goog_stored_content_length_header(monkeypatch):
    """Test with a valid x-goog-stored-content-length header and no Content-Length."""
    patch_requests_head(monkeypatch, {"x-goog-stored-content-length": "2097152"})  # 2 MB
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.09μs -> 2.89μs (7.03% faster)

def test_both_headers_present(monkeypatch):
    """Test when both headers are present; x-goog-stored-content-length should take precedence."""
    patch_requests_head(monkeypatch, {
        "x-goog-stored-content-length": "3145728",  # 3 MB
        "Content-Length": "1048576"  # 1 MB
    })
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 2.97μs -> 2.93μs (1.13% faster)

def test_rounding(monkeypatch):
    """Test correct rounding to 2 decimal places."""
    patch_requests_head(monkeypatch, {"Content-Length": str(1536000)})  # 1.46484375 MB
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.08μs -> 2.86μs (7.72% faster)

# 2. Edge Test Cases

def test_no_headers(monkeypatch):
    """Test when neither header is present; should raise TypeError."""
    patch_requests_head(monkeypatch, {})
    with pytest.raises(TypeError):
        _get_tokenizer_config_size("http://example.com/tokenizer.json") # 2.95μs -> 2.66μs (10.8% faster)

def test_content_length_zero(monkeypatch):
    """Test Content-Length of zero."""
    patch_requests_head(monkeypatch, {"Content-Length": "0"})
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.72μs -> 3.35μs (11.2% faster)

def test_content_length_non_integer(monkeypatch):
    """Test Content-Length with a non-integer value; should raise ValueError."""
    patch_requests_head(monkeypatch, {"Content-Length": "notanint"})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://example.com/tokenizer.json") # 3.67μs -> 3.34μs (9.82% faster)

def test_header_case_insensitivity(monkeypatch):
    """Test that a lower-case key in a plain dict is not found (dict lookup is case-sensitive)."""
    patch_requests_head(monkeypatch, {"content-length": "1048576"})
    # Should not find the header, so raises TypeError
    with pytest.raises(TypeError):
        _get_tokenizer_config_size("http://example.com/tokenizer.json") # 2.84μs -> 2.57μs (10.6% faster)

def test_content_length_with_spaces(monkeypatch):
    """Test Content-Length with leading/trailing spaces."""
    patch_requests_head(monkeypatch, {"Content-Length": " 1048576 "})
    # int() should handle whitespace
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.91μs -> 3.43μs (13.8% faster)

def test_content_length_negative(monkeypatch):
    """Test negative Content-Length."""
    patch_requests_head(monkeypatch, {"Content-Length": "-1048576"})
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.52μs -> 3.17μs (11.2% faster)

def test_content_length_float_string(monkeypatch):
    """Test Content-Length as a float string; should raise ValueError."""
    patch_requests_head(monkeypatch, {"Content-Length": "12345.67"})
    with pytest.raises(ValueError):
        _get_tokenizer_config_size("http://example.com/tokenizer.json") # 3.62μs -> 3.17μs (14.3% faster)

def test_content_length_large_value(monkeypatch):
    """Test very large Content-Length value."""
    large_bytes = 999_999_999
    patch_requests_head(monkeypatch, {"Content-Length": str(large_bytes)})
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.60μs -> 3.07μs (17.2% faster)
    expected = round(large_bytes / 1024 / 1024, 2)

def test_header_value_none(monkeypatch):
    """Test header key present but value is None; should skip and raise TypeError."""
    patch_requests_head(monkeypatch, {"Content-Length": None})
    with pytest.raises(TypeError):
        _get_tokenizer_config_size("http://example.com/tokenizer.json") # 2.86μs -> 2.55μs (12.4% faster)


def test_large_number_of_headers(monkeypatch):
    """Test with many irrelevant headers and one valid Content-Length."""
    headers = {f"X-Dummy-{i}": "foo" for i in range(500)}
    headers["Content-Length"] = "524288000"  # 500 MB
    patch_requests_head(monkeypatch, headers)
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 5.23μs -> 4.87μs (7.37% faster)

def test_large_content_length(monkeypatch):
    """Test with a very large Content-Length value of exactly 1 GiB."""
    one_gb = 1024 * 1024 * 1024
    patch_requests_head(monkeypatch, {"Content-Length": str(one_gb)})
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.93μs -> 3.51μs (12.0% faster)

def test_many_headers_with_x_goog(monkeypatch):
    """Test with many headers and a valid x-goog-stored-content-length."""
    headers = {f"X-Header-{i}": "bar" for i in range(999)}
    headers["x-goog-stored-content-length"] = "15728640"  # 15 MB
    patch_requests_head(monkeypatch, headers)
    codeflash_output = _get_tokenizer_config_size("http://example.com/tokenizer.json"); result = codeflash_output # 3.49μs -> 3.21μs (8.55% faster)

def test_multiple_calls_different_sizes(monkeypatch):
    """Test multiple calls with different sizes to ensure no caching or leakage."""
    patch_requests_head(monkeypatch, {"Content-Length": "1048576"})
    codeflash_output = _get_tokenizer_config_size("http://example.com/1") # 3.40μs -> 3.12μs (8.83% faster)
    patch_requests_head(monkeypatch, {"Content-Length": "2097152"})
    codeflash_output = _get_tokenizer_config_size("http://example.com/2") # 1.33μs -> 1.29μs (3.18% faster)
    patch_requests_head(monkeypatch, {"x-goog-stored-content-length": "3145728"})
    codeflash_output = _get_tokenizer_config_size("http://example.com/3") # 1.04μs -> 1.13μs (8.64% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from cohere.manually_maintained.tokenizers import _get_tokenizer_config_size
import pytest
from requests.exceptions import MissingSchema

def test__get_tokenizer_config_size():
    with pytest.raises(MissingSchema, match="Invalid\\ URL\\ '':\\ No\\ scheme\\ supplied\\.\\ Perhaps\\ you\\ meant\\ https://\\?"):
        _get_tokenizer_config_size('')

To edit these changes, git checkout codeflash/optimize-_get_tokenizer_config_size-mgzlyzrq and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 20, 2025 20:49
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025