@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 6% (0.06x) speedup for _NamedVectors.text2vec_aws in weaviate/collections/classes/config_named_vectors.py

⏱️ Runtime: 326 microseconds → 308 microseconds (best of 38 runs)

📝 Explanation and details

The optimization achieves a roughly 6% speedup (326 μs to 308 μs) by extracting the _Text2VecAWSConfig object creation into a separate local variable assignment before the return statement.

Key changes:

  • Split the nested object construction into two steps: first create the vectorizer object, then use it in the _NamedVectorConfigCreate constructor
  • Changed the return type annotation from bare _NamedVectorConfigCreate to string "_NamedVectorConfigCreate" (forward reference)
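
The two shapes described above can be sketched as follows. This is a minimal illustration, not the actual weaviate source: `VectorizerConfig`, `NamedVectorConfig`, `build_nested`, and `build_split` are hypothetical stand-ins, and the real config classes carry more fields and validation.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the weaviate config classes (assumption:
# the real classes have more fields and validation than shown here).
@dataclass
class VectorizerConfig:
    region: str
    vectorize_class_name: bool = True


@dataclass
class NamedVectorConfig:
    name: str
    source_properties: Optional[List[str]]
    vectorizer: VectorizerConfig


def build_nested(name: str, region: str) -> NamedVectorConfig:
    # Original shape: the vectorizer is constructed inline as an argument.
    return NamedVectorConfig(
        name=name,
        source_properties=None,
        vectorizer=VectorizerConfig(region=region),
    )


def build_split(name: str, region: str) -> "NamedVectorConfig":
    # Refactored shape: build the vectorizer first, then pass it in.
    # The quoted return annotation is a forward reference.
    vectorizer = VectorizerConfig(region=region)
    return NamedVectorConfig(
        name=name,
        source_properties=None,
        vectorizer=vectorizer,
    )
```

Both functions return equal objects; the refactor changes only how the return statement is written.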

Why this may improve performance:
The change simplifies the return statement by moving the _Text2VecAWSConfig construction out of the _NamedVectorConfigCreate call. In CPython, the arguments of a call are evaluated before the call itself, so in both versions the inner constructor has already returned by the time the outer constructor starts. The two forms therefore perform the same constructor calls, and the split form adds one local-variable store and load. The measured gain is on the order of a microsecond per call and is best read as an empirical benchmark result, not as the effect of reduced stack-frame management.
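
One way to check what the refactor changes at the bytecode level is to disassemble both shapes with the standard `dis` module. The sketch below uses hypothetical stand-in classes (`Inner`, `Outer`) in place of the weaviate config classes; it only shows that the split form introduces a local-variable store that the nested form does not have.

```python
import dis


# Hypothetical stand-ins for _Text2VecAWSConfig and _NamedVectorConfigCreate.
class Inner:
    def __init__(self, region):
        self.region = region


class Outer:
    def __init__(self, name, vectorizer):
        self.name = name
        self.vectorizer = vectorizer


def nested(name, region):
    # Inner(...) is evaluated first and has returned before Outer(...) is called.
    return Outer(name, Inner(region))


def split(name, region):
    # Same two calls in the same order, plus a store to a local variable.
    vectorizer = Inner(region)
    return Outer(name, vectorizer)


def opnames(func):
    """Return the list of bytecode operation names for a function."""
    return [ins.opname for ins in dis.get_instructions(func)]


nested_ops = opnames(nested)
split_ops = opnames(split)

# Print both listings side by side for manual inspection.
dis.dis(nested)
dis.dis(split)
```

Exact opcode names vary between CPython versions, so the listings are best compared on the same interpreter that ran the benchmark.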

Test case effectiveness:
The optimization shows consistent improvements across all test scenarios, with speedups ranging from 1-10%. It's particularly effective for:

  • Basic cases with minimal arguments (8-10% improvement)
  • Edge cases with empty strings or None values (4-9% improvement)
  • Large-scale cases with many source properties (1-7% improvement)

The performance gain is most pronounced in simpler scenarios where the constructor overhead represents a larger fraction of total execution time.
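
Differences of this size are easy to re-check locally. The following is a rough timing sketch using the standard `timeit` module, assuming hypothetical stand-in classes (`Inner`, `Outer`), not the real weaviate constructors; the helper name `best_seconds_per_call` is also made up for illustration.

```python
import timeit


# Hypothetical stand-ins for the two config classes.
class Inner:
    def __init__(self, region):
        self.region = region


class Outer:
    def __init__(self, name, vectorizer):
        self.name = name
        self.vectorizer = vectorizer


def nested():
    return Outer("myvec", Inner("us-west-2"))


def split():
    vectorizer = Inner("us-west-2")
    return Outer("myvec", vectorizer)


def best_seconds_per_call(func, repeat=5, number=10_000):
    # Take the minimum over several repeats: the fastest run is the one
    # least disturbed by other activity on the machine.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t_nested = best_seconds_per_call(nested)
t_split = best_seconds_per_call(split)
print(f"nested: {t_nested * 1e6:.3f} μs/call, split: {t_split * 1e6:.3f} μs/call")
```

Results depend on the machine and interpreter version, so a few percent either way on a single run is not conclusive; repeating the measurement several times gives a better picture.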

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 21 Passed |
| ⏪ Replay Tests | 1 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from typing import List, Optional, Union

# imports
import pytest
from weaviate.collections.classes.config_named_vectors import _NamedVectors


# Minimal stubs for dependencies (since we are not to use external libraries or mocking)
class AWSModel:
    pass

class AWSService:
    pass

class _VectorIndexConfigCreate:
    def __init__(self, config_value=None):
        self.config_value = config_value

class _Text2VecAWSConfig:
    def __init__(
        self,
        model: Optional[Union[AWSModel, str]],
        endpoint: Optional[str],
        region: str,
        service: Union[AWSService, str],
        vectorizeClassName: bool,
    ):
        self.model = model
        self.endpoint = endpoint
        self.region = region
        self.service = service
        self.vectorizeClassName = vectorizeClassName

class _NamedVectorConfigCreate:
    def __init__(
        self,
        name: str,
        source_properties: Optional[List[str]],
        vectorizer: _Text2VecAWSConfig,
        vector_index_config: Optional[_VectorIndexConfigCreate],
    ):
        self.name = name
        self.source_properties = source_properties
        self.vectorizer = vectorizer
        self.vector_index_config = vector_index_config

# unit tests

# 1. Basic Test Cases

def test_basic_minimal_required_fields():
    # Test with only required fields
    codeflash_output = _NamedVectors.text2vec_aws(name="myvec", region="us-west-2"); result = codeflash_output # 13.5μs -> 12.5μs (8.03% faster)



def test_basic_vectorize_collection_name_false():
    # Test with vectorize_collection_name set to False
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec5",
        region="us-east-2",
        vectorize_collection_name=False,
    ); result = codeflash_output # 16.9μs -> 15.6μs (8.36% faster)

# 2. Edge Test Cases


def test_edge_long_name_and_region():
    # Test with long strings for name and region
    long_name = "n" * 512
    long_region = "r" * 128
    codeflash_output = _NamedVectors.text2vec_aws(name=long_name, region=long_region); result = codeflash_output # 17.0μs -> 15.7μs (8.34% faster)

def test_edge_source_properties_special_characters():
    # Test with special characters in source_properties
    props = ["title", "body$", "🦄", "中文"]
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-special",
        region="us-west-1",
        source_properties=props,
    ); result = codeflash_output # 10.9μs -> 10.4μs (5.37% faster)


def test_edge_vector_index_config_none():
    # Test with vector_index_config as None explicitly
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-vic-none",
        region="us-west-2",
        vector_index_config=None,
    ); result = codeflash_output # 16.6μs -> 15.9μs (4.77% faster)

def test_edge_source_properties_none():
    # Test with source_properties as None explicitly
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-sp-none",
        region="us-west-2",
        source_properties=None,
    ); result = codeflash_output # 9.68μs -> 8.99μs (7.72% faster)

def test_edge_source_properties_large_list():
    # Test with a large number of source_properties (but <1000)
    props = [f"prop{i}" for i in range(999)]
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-large-props",
        region="us-west-2",
        source_properties=props,
    ); result = codeflash_output # 19.9μs -> 19.3μs (3.14% faster)

def test_edge_endpoint_empty_string():
    # Test with endpoint as empty string
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-endpoint-empty",
        region="us-west-2",
        endpoint="",
    ); result = codeflash_output # 9.14μs -> 8.38μs (9.00% faster)

def test_edge_model_empty_string():
    # Test with model as empty string
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-model-empty",
        region="us-west-2",
        model="",
    ); result = codeflash_output # 8.78μs -> 8.25μs (6.43% faster)

def test_edge_service_empty_string():
    # Test with service as empty string
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec-service-empty",
        region="us-west-2",
        service="",
    ); result = codeflash_output # 8.53μs -> 7.96μs (7.11% faster)


def test_large_scale_many_source_properties():
    # Test with 999 source_properties
    props = [f"field_{i}" for i in range(999)]
    codeflash_output = _NamedVectors.text2vec_aws(
        name="large_scale_vec",
        region="eu-west-1",
        source_properties=props,
    ); result = codeflash_output # 25.9μs -> 25.7μs (1.05% faster)

def test_large_scale_long_strings_everywhere():
    # Test with very long strings for all string fields
    long_str = "x" * 1000
    props = [long_str for _ in range(10)]
    codeflash_output = _NamedVectors.text2vec_aws(
        name=long_str,
        region=long_str,
        endpoint=long_str,
        model=long_str,
        service=long_str,
        source_properties=props,
    ); result = codeflash_output # 11.1μs -> 10.6μs (4.28% faster)




#------------------------------------------------
from typing import List, Optional, Union

# imports
import pytest
from weaviate.collections.classes.config_named_vectors import _NamedVectors


# Minimal stubs for dependencies (since we can't import actual weaviate classes)
class AWSModel:
    def __init__(self, name):
        self.name = name

class AWSService:
    def __init__(self, name):
        self.name = name

class _VectorIndexConfigCreate:
    def __init__(self, config_value):
        self.config_value = config_value

class _Text2VecAWSConfig:
    def __init__(self, model, endpoint, region, service, vectorizeClassName):
        self.model = model
        self.endpoint = endpoint
        self.region = region
        self.service = service
        self.vectorizeClassName = vectorizeClassName

class _NamedVectorConfigCreate:
    def __init__(
        self,
        name,
        source_properties,
        vectorizer,
        vector_index_config,
    ):
        self.name = name
        self.source_properties = source_properties
        self.vectorizer = vectorizer
        self.vector_index_config = vector_index_config

# unit tests

# 1. Basic Test Cases

def test_basic_minimal_required_args():
    # Only required arguments
    codeflash_output = _NamedVectors.text2vec_aws(name="myvec", region="us-west-2"); result = codeflash_output # 17.9μs -> 16.4μs (8.86% faster)




def test_basic_vectorize_collection_name_false():
    # vectorize_collection_name explicitly set to False
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec4",
        region="us-east-2",
        vectorize_collection_name=False,
    ); result = codeflash_output # 16.7μs -> 15.1μs (10.1% faster)

# 2. Edge Test Cases


def test_edge_long_name_and_region():
    # Very long strings for name and region
    long_name = "n" * 256
    long_region = "r" * 128
    codeflash_output = _NamedVectors.text2vec_aws(
        name=long_name,
        region=long_region,
    ); result = codeflash_output # 16.4μs -> 15.8μs (3.69% faster)



def test_edge_endpoint_empty_string():
    # endpoint as empty string
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec7",
        region="us-west-1",
        endpoint="",
    ); result = codeflash_output # 16.8μs -> 15.7μs (7.01% faster)

def test_edge_source_properties_large_list():
    # source_properties with many elements
    props = [f"prop{i}" for i in range(1000)]
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec8",
        region="us-west-1",
        source_properties=props,
    ); result = codeflash_output # 20.8μs -> 19.4μs (7.02% faster)

def test_edge_vector_index_config_none():
    # vector_index_config explicitly None
    codeflash_output = _NamedVectors.text2vec_aws(
        name="vec9",
        region="us-west-2",
        vector_index_config=None,
    ); result = codeflash_output # 9.46μs -> 9.13μs (3.57% faster)

def test_edge_bool_vectorize_collection_name_types():
    # vectorize_collection_name as True/False
    for val in [True, False]:
        codeflash_output = _NamedVectors.text2vec_aws(
            name="vec10",
            region="us-west-2",
            vectorize_collection_name=val,
        ); result = codeflash_output # 13.1μs -> 12.2μs (7.32% faster)




def test_large_scale_source_properties_max_length():
    # source_properties with 1000 elements
    props = [f"property_{i}" for i in range(1000)]
    codeflash_output = _NamedVectors.text2vec_aws(
        name="largevec",
        region="us-east-1",
        source_properties=props,
    ); result = codeflash_output # 26.4μs -> 25.3μs (4.29% faster)
⏪ Replay Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| test_pytest_testcollectiontest_batch_py_testcollectiontest_classes_generative_py_testcollectiontest_confi__replay_test_0.py::test_weaviate_collections_classes_config_named_vectors__NamedVectors_text2vec_aws | 20.8μs | 19.7μs | 5.94%✅ |
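
The speedup percentages in this report are consistent with measuring the time saved relative to the optimized runtime. That formula is inferred from the reported numbers, not taken from Codeflash documentation, and `speedup_pct` is a hypothetical helper name.

```python
def speedup_pct(original_us: float, optimized_us: float) -> float:
    """Time saved relative to the optimized runtime, as a percentage.

    Assumption: this formula is inferred from the figures in the report.
    """
    return (original_us - optimized_us) / optimized_us * 100.0


print(f"{speedup_pct(326, 308):.1f}%")    # 5.8%
print(f"{speedup_pct(13.5, 12.5):.1f}%")  # 8.0%
```

The per-test figures are computed from unrounded timings, so recomputing them from the rounded microsecond values shown here can differ in the last digit.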

To edit these changes, run git checkout codeflash/optimize-_NamedVectors.text2vec_aws-mh2wl0s5 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 04:09
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025