@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 9% (0.09x) speedup for _Reranker.voyageai in weaviate/collections/classes/config.py

⏱️ Runtime : 1.88 milliseconds → 1.72 milliseconds (best of 19 runs)
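
The percentages in this report appear to follow (original - optimized) / optimized. That formula is inferred from the figures shown here, not from Codeflash documentation, and the helper name `speedup` is illustrative. A minimal sketch that reproduces two of the reported numbers:

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup: extra time the original needs, as a fraction of the optimized time."""
    return (original - optimized) / optimized


# Headline runtime from this report, in milliseconds.
print(f"overall: {speedup(1.88, 1.72):.1%}")
# test_default_model_is_none, in microseconds.
print(f"default call: {speedup(8.14, 7.26):.1%}")
```

Because the displayed timings are rounded, recomputed values can differ from the annotated percentages in the last digit.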

📝 Explanation and details

The optimization adds a conditional check that skips the model keyword argument when model is None. This accounts for the measured speedup of about 9% (1.88 ms to 1.72 ms) by reducing Python's keyword argument overhead.

Key changes:

  • Added if model is None: check to call _RerankerVoyageAIConfig() without arguments
  • Only passes model=model when the value is not None
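
The diff itself is not reproduced in this comment, so the following is only a sketch of the shape the two bullets describe. `RerankerConfigStub`, `voyageai_before`, and `voyageai_after` are hypothetical stand-ins, not the weaviate classes or methods.

```python
from typing import Optional


class RerankerConfigStub:
    """Hypothetical stand-in for _RerankerVoyageAIConfig (not the weaviate class)."""

    def __init__(self, model: Optional[str] = None) -> None:
        self.model = model


def voyageai_before(model: Optional[str] = None) -> RerankerConfigStub:
    # Original shape: always forward the keyword, even when it is None.
    return RerankerConfigStub(model=model)


def voyageai_after(model: Optional[str] = None) -> RerankerConfigStub:
    # Optimized shape: skip the keyword when it would only repeat the default.
    if model is None:
        return RerankerConfigStub()
    return RerankerConfigStub(model=model)
```

Both versions return an equivalent object for every input; only the call into the constructor differs on the None path.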

Why this is faster:
In CPython, passing a keyword argument adds a little work to every call: the name has to be matched against the callee's parameters, and a callee that collects keywords through **kwargs must build a dictionary entry for it. Skipping the model=model keyword when model is None (which matches the default anyway) avoids that work. This is a micro-optimization that only becomes measurable when the function is called frequently.
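
One way to check this cost on a given interpreter is to time both call shapes directly. The sketch below assumes a callee that gathers its keywords via `**data`; `KwargsStub` and `per_call_ns` are hypothetical names, and the real config class may behave differently.

```python
import timeit


class KwargsStub:
    """Hypothetical callee that collects keyword arguments into a dict."""

    def __init__(self, **data):
        self.data = data


def with_kwarg():
    # Call shape before the change: the keyword is always passed.
    return KwargsStub(model=None)


def without_kwarg():
    # Call shape after the change on the None path: no arguments at all.
    return KwargsStub()


def per_call_ns(func, number=200_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"model=None passed: {per_call_ns(with_kwarg):.0f} ns per call")
    print(f"no arguments:      {per_call_ns(without_kwarg):.0f} ns per call")
```

The absolute numbers depend on the Python version and machine, so treat the comparison as indicative rather than as a reproduction of the timings in this report.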

Performance benefits by test case:

  • Default/None cases see the biggest gains (3-23% faster across the annotated runs), for example test_default_model_is_none (12.1%), test_model_none_explicit (10.5%), test_voyageai_stress_with_none (23.4%)
  • Non-None model cases see minimal impact (0.4-1.2% faster): the optimization only applies when model is None
  • Mixed workloads benefit proportionally based on the ratio of None vs non-None calls

The optimization is most effective when the function is frequently called with the default None value. The generated tests concentrate on that default path, but they do not show how often real callers rely on it, and the single replay test ran marginally slower (-1.91%).
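
To make the mixed-workload point concrete, the gain can be estimated from the two stress tests in this report. This is a rough sketch: it assumes the loop annotations (441μs -> 358μs over 500 calls, 236μs -> 233μs over 250 calls) are totals for the whole loop, and `blended_speedup` is an illustrative helper, not part of Codeflash or weaviate.

```python
# Per-call times in microseconds, derived from the stress tests in this report.
NONE_BEFORE, NONE_AFTER = 441 / 500, 358 / 500      # test_voyageai_stress_with_none
MODEL_BEFORE, MODEL_AFTER = 236 / 250, 233 / 250    # test_voyageai_stress_with_valid_models


def blended_speedup(none_fraction: float) -> float:
    """Estimated speedup when `none_fraction` of the calls pass model=None."""
    before = none_fraction * NONE_BEFORE + (1 - none_fraction) * MODEL_BEFORE
    after = none_fraction * NONE_AFTER + (1 - none_fraction) * MODEL_AFTER
    return (before - after) / after


for fraction in (0.0, 0.5, 1.0):
    print(f"{fraction:.0%} None calls: {blended_speedup(fraction):.1%} faster")
```

Under these assumptions the estimate rises from roughly 1% with no None calls to roughly 23% when every call uses the default, with an even split landing near 11%.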

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2058 Passed |
| ⏪ Replay Tests | 2 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Literal, Optional, Union

# imports
import pytest
from weaviate.collections.classes.config import _Reranker

RerankerVoyageAIModel = Literal["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]

class _RerankerProvider:
    # Minimal stub for testing, simulates config object
    def __init__(self, model):
        self.model = model

class _RerankerVoyageAIConfig(_RerankerProvider):
    # Inherits from provider, used for type checking in tests
    pass

# unit tests

# -----------------------
# Basic Test Cases
# -----------------------

def test_default_model_is_none():
    """Test default call returns config with model=None."""
    codeflash_output = _Reranker.voyageai(); config = codeflash_output # 8.14μs -> 7.26μs (12.1% faster)

@pytest.mark.parametrize("model", [
    "rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"
])
def test_valid_models(model):
    """Test all valid model names are accepted."""
    codeflash_output = _Reranker.voyageai(model=model); config = codeflash_output # 19.2μs -> 19.1μs (0.429% faster)

def test_model_none_explicit():
    """Test passing model=None explicitly."""
    codeflash_output = _Reranker.voyageai(model=None); config = codeflash_output # 3.97μs -> 3.60μs (10.5% faster)

# -----------------------
# Edge Test Cases
# -----------------------

@pytest.mark.parametrize("invalid_model", [
    "",  # empty string
    "rerank-3",  # non-existent model
    "RERANK-2",  # case sensitivity
    "rerank 2",  # space instead of dash
    "rerank-lite-2",  # similar but invalid
    "rerank_2",  # underscore instead of dash
    "rerank-2!",  # special character
    "rerank-lite-1 ",  # trailing space
    " rerank-lite-1",  # leading space
    "rerank-1\n",  # newline
    "rerank",  # partial
    "lite-1",  # partial
    "rerank-2-lite-extra",  # extra suffix
])
def test_invalid_string_models_raise_valueerror(invalid_model):
    """Test that invalid string models raise ValueError."""
    with pytest.raises(ValueError):
        _Reranker.voyageai(model=invalid_model)

@pytest.mark.parametrize("non_str_model", [
    123,  # integer
    1.23,  # float
    True,  # boolean
    False,
    [],  # list
    {},  # dict
    object(),  # generic object
    b"rerank-2",  # bytes
])
def test_non_string_model_raises_typeerror(non_str_model):
    """Test that non-string, non-None model values raise TypeError."""
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=non_str_model)

def test_model_is_tuple_string():
    """Test passing a tuple containing valid string raises TypeError."""
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=("rerank-2",))

def test_model_is_list_string():
    """Test passing a list containing valid string raises TypeError."""
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=["rerank-2"])

def test_model_is_none_type():
    """Test passing None explicitly is accepted."""
    codeflash_output = _Reranker.voyageai(model=None); config = codeflash_output # 8.20μs -> 7.73μs (6.07% faster)

# -----------------------
# Large Scale Test Cases
# -----------------------

def test_many_valid_models_in_sequence():
    """Test calling the function many times with valid models does not leak state."""
    valid_models = ["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]
    configs = []
    for i in range(250):  # 250*4 = 1000
        model = valid_models[i % 4]
        codeflash_output = _Reranker.voyageai(model=model); config = codeflash_output # 239μs -> 237μs (0.974% faster)
        configs.append(config)
    # All configs should have the correct model
    for i, config in enumerate(configs):
        assert config.model == valid_models[i % 4]


def test_performance_many_calls(monkeypatch):
    """Test performance: calling the function 1000 times with valid and None models."""
    # No assertion on timing, just that it doesn't crash or leak
    for i in range(1000):
        if i % 2 == 0:
            codeflash_output = _Reranker.voyageai(model="rerank-2"); config = codeflash_output
        else:
            codeflash_output = _Reranker.voyageai(); config = codeflash_output

def test_all_valid_and_invalid_models_mix():
    """Test a mix of valid and invalid models in a batch."""
    valid = ["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]
    invalid = ["rerank-3", "rerank_2", "RERANK-2", "rerank-lite-2"]
    results = []
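    for i in range(500):
        if i % 2 == 0:
            # Valid: i // 2 cycles through all four entries (i % 4 would only reach two)
            m = valid[(i // 2) % 4]
            codeflash_output = _Reranker.voyageai(model=m); config = codeflash_output
            results.append(config.model)
        else:
            # Invalid: same indexing so every invalid name is exercised
            m = invalid[(i // 2) % 4]
            with pytest.raises(ValueError):
                _Reranker.voyageai(model=m)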
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Literal, Optional, Union

# imports
import pytest
from weaviate.collections.classes.config import _Reranker

RerankerVoyageAIModel = Literal["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]

class _RerankerProvider:
    pass

class _RerankerVoyageAIConfig(_RerankerProvider):
    def __init__(self, model: Optional[Union[RerankerVoyageAIModel, str]] = None):
        # Validate model argument
        allowed = {"rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"}
        if model is not None:
            if not isinstance(model, str):
                raise TypeError("model must be a string or None")
            if model not in allowed:
                raise ValueError(f"Invalid model: {model!r}. Allowed values: {sorted(allowed)}")
        self.model = model

# unit tests

# --------------------------
# Basic Test Cases
# --------------------------

def test_voyageai_default_model():
    # Should succeed and set model to None (default)
    codeflash_output = _Reranker.voyageai(); config = codeflash_output # 7.52μs -> 6.95μs (8.22% faster)

@pytest.mark.parametrize(
    "model",
    ["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]
)
def test_voyageai_valid_models(model):
    # Should succeed for all valid model names
    codeflash_output = _Reranker.voyageai(model=model); config = codeflash_output # 19.1μs -> 19.0μs (0.590% faster)

def test_voyageai_valid_model_with_explicit_none():
    # Should succeed if model=None is passed explicitly
    codeflash_output = _Reranker.voyageai(model=None); config = codeflash_output # 3.90μs -> 3.78μs (3.10% faster)

# --------------------------
# Edge Test Cases
# --------------------------

@pytest.mark.parametrize(
    "model",
    [
        "RERANK-2",            # uppercase
        "rerank-2 ",           # trailing space
        " rerank-2",           # leading space
        "rerank2",             # missing dash
        "rerank-1.0",          # extra dot
        "rerank-lite-2",       # invalid number
        "",                    # empty string
        "rerank-2-lite-extra", # extra suffix
        "lite-1",              # partial
        "rerank",              # incomplete
    ]
)
def test_voyageai_invalid_model_strings(model):
    # Should raise ValueError for invalid model names
    with pytest.raises(ValueError):
        _Reranker.voyageai(model=model)

@pytest.mark.parametrize(
    "model",
    [123, 0.5, True, False, [], {}, object()]
)
def test_voyageai_invalid_model_types(model):
    # Should raise TypeError for non-string, non-None types
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=model)

def test_voyageai_model_is_tuple():
    # Should raise TypeError if model is a tuple
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=("rerank-2",))

def test_voyageai_model_is_bytes():
    # Should raise TypeError if model is bytes
    with pytest.raises(TypeError):
        _Reranker.voyageai(model=b"rerank-2")

def test_voyageai_model_is_none_string():
    # Should raise ValueError if model is string "None"
    with pytest.raises(ValueError):
        _Reranker.voyageai(model="None")

def test_voyageai_model_is_none_case_variation():
    # Should raise ValueError if model is string "none"
    with pytest.raises(ValueError):
        _Reranker.voyageai(model="none")

# --------------------------
# Large Scale Test Cases
# --------------------------

def test_voyageai_many_valid_and_invalid_models():
    # Test a mix of valid and invalid models in a loop (under 1000 iterations)
    valid_models = ["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]
    invalid_models = [f"rerank-{i}" for i in range(3, 100)]  # 97 invalid
    # All valid should succeed
    for model in valid_models:
        codeflash_output = _Reranker.voyageai(model=model); config = codeflash_output
    # All invalid should fail
    for model in invalid_models:
        with pytest.raises(ValueError):
            _Reranker.voyageai(model=model)

def test_voyageai_stress_with_none():
    # Stress test: call with None 500 times
    for _ in range(500):
        codeflash_output = _Reranker.voyageai(model=None); config = codeflash_output # 441μs -> 358μs (23.4% faster)

def test_voyageai_stress_with_valid_models():
    # Stress test: call with valid models in a loop
    valid_models = ["rerank-2", "rerank-2-lite", "rerank-lite-1", "rerank-1"]
    for i in range(250):
        model = valid_models[i % 4]
        codeflash_output = _Reranker.voyageai(model=model); config = codeflash_output # 236μs -> 233μs (1.20% faster)
⏪ Replay Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_pytest_testcollectiontest_batch_py_testcollectiontest_classes_generative_py_testcollectiontest_confi__replay_test_0.py::test_weaviate_collections_classes_config__Reranker_voyageai | 12.3μs | 12.5μs | -1.91%⚠️ |

To edit these changes, run git checkout codeflash/optimize-_Reranker.voyageai-mh355k5l, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 08:09
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025