@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 8% (0.08x) speedup for use_async_with_weaviate_cloud in weaviate/connect/helpers.py

⏱️ Runtime : 1.42 milliseconds → 1.32 milliseconds (best of 40 runs)

📝 Explanation and details

The optimization improves performance by about 8% overall (1.42 ms to 1.32 ms) by skipping the general-purpose validation on the common path in two functions:

1. Optimized Type Checking in __parse_weaviate_cloud_cluster_url:

  • Replaced the unconditional call _validate_input(_ValidateArgument([str], "cluster_url", cluster_url)) with a guard: when type(cluster_url) is str the function proceeds directly; otherwise it calls _validate_input(...) as before
  • The type(x) is str check is cheaper than going through the validation framework
  • The full validation now runs only when the input is not an exact str (the rare case)
  • Reduces profiled time in this function from ~691μs to ~286μs (roughly 58% less time)
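
The guard described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `validate_type` is a hypothetical stand-in for the library's `_validate_input` / `_ValidateArgument` pair, and the exception type it raises is an assumption.

```python
from typing import Any, List


def validate_type(expected: List[type], name: str, value: Any) -> None:
    """Hypothetical stand-in for the library's general-purpose validator."""
    if not any(isinstance(value, t) for t in expected):
        names = ", ".join(t.__name__ for t in expected)
        raise TypeError(
            f"Argument '{name}' must be one of: {names}; got {type(value).__name__}"
        )


def parse_cluster_url(cluster_url: Any) -> str:
    # Fast path: an exact `str` skips the validator entirely.
    if type(cluster_url) is not str:
        # Slow path: str subclasses still pass here; anything else raises.
        validate_type([str], "cluster_url", cluster_url)
    return cluster_url
```

Because the slow path still runs for every non-exact-`str` input, a subclass of `str` is accepted exactly as before and an invalid value still raises.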

2. Optimized Type Checking in __parse_auth_credentials:

  • Changed isinstance(creds, str) to type(creds) is str for the most common case, a plain API-key string
  • type(x) is str compares the exact type by identity and skips the subclass handling that isinstance(x, str) performs; the per-call saving is small, and the check no longer matches subclasses of str
  • Keeps isinstance for the other credential types, where inheritance matters
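
The two checks are not interchangeable, which matters when reviewing this change. The sketch below shows the semantic difference and a small `timeit` comparison; `TaggedKey` is a hypothetical subclass introduced only for illustration, and the timings depend on the interpreter version and machine, so no numbers are claimed here.

```python
import timeit


class TaggedKey(str):
    """Hypothetical str subclass, used only to show the semantic difference."""


def exact_check(value) -> bool:
    return type(value) is str


def subclass_check(value) -> bool:
    return isinstance(value, str)


plain = "my-api-key"
tagged = TaggedKey("my-api-key")

print(exact_check(plain), subclass_check(plain))    # True True
print(exact_check(tagged), subclass_check(tagged))  # False True

# Rough micro-benchmark of the two expressions on a plain str.
for stmt in ("type(plain) is str", "isinstance(plain, str)"):
    seconds = timeit.timeit(stmt, globals={"plain": plain}, number=200_000)
    print(f"{stmt}: {seconds:.4f}s for 200,000 evaluations")
```

If callers ever pass a `str` subclass as `auth_credentials`, the exact-type check sends it down a different branch than `isinstance` would, so that case is worth covering with a test.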

Performance Impact:

  • The fast path applies to valid inputs (the common case), which no longer pay for the general-purpose validation
  • Per-test results range from 3.11% slower (test_large_scale_many_headers) to 29.0% faster; most valid-input cases gain roughly 6-15%, and the largest relative gains (about 29%) come from the two invalid auth_credentials tests, which also have the shortest total runtime
  • Scenarios that create clients frequently benefit most; the saving is a few microseconds per call, so the tests with 500 to 1000 headers show little or no change
  • Invalid inputs still reach the full validation, so error handling is preserved while the happy path gets faster
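
The per-test percentages in this report appear to be computed as (original - optimized) / optimized. That formula is inferred from the reported numbers, not taken from Codeflash documentation. A minimal sketch that reproduces a few of them from the rounded timings:

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Relative speedup in percent; negative means the optimized run was slower."""
    return (original - optimized) / optimized * 100.0


# (test name, original runtime, optimized runtime), copied from this report.
reported = [
    ("overall (ms)", 1.42, 1.32),
    ("test_basic_hostname_api_key (μs)", 66.8, 59.0),
    ("test_edge_invalid_auth_credentials_type (μs)", 23.3, 18.1),
    ("test_large_scale_many_headers (μs)", 142.0, 147.0),
]

for name, before, after in reported:
    print(f"{name}: {speedup_pct(before, after):+.1f}%")
```

Small differences from the figures printed next to each test are expected, because the displayed timings are already rounded.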

The changes keep the existing error handling and, for plain str inputs, the existing behavior, while optimizing the most frequently executed code paths.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 25 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Dict, Optional, Tuple, Union
from urllib.parse import urlparse

# imports
import pytest
from weaviate.connect.helpers import use_async_with_weaviate_cloud

# Minimal stubs for required classes and functions to allow tests to run deterministically.
# These are simplified and do not implement any real logic, just enough for the tests.

class WeaviateInvalidInputError(Exception):
    pass

# Auth credentials stubs
class _APIKey:
    def __init__(self, api_key):
        self.api_key = api_key

class _BearerToken:
    def __init__(self, token):
        self.token = token

class _ClientCredentials:
    def __init__(self, client_id, client_secret):
        self.client_id = client_id
        self.client_secret = client_secret

class _ClientPassword:
    def __init__(self, username, password):
        self.username = username
        self.password = password

class Auth:
    @staticmethod
    def api_key(api_key: str):
        return _APIKey(api_key)
    @staticmethod
    def bearer_token(token: str):
        return _BearerToken(token)
    @staticmethod
    def client_credentials(client_id, client_secret):
        return _ClientCredentials(client_id, client_secret)
    @staticmethod
    def client_password(username, password):
        return _ClientPassword(username, password)

# AdditionalConfig stub
class AdditionalConfig:
    def __init__(self, **kwargs):
        self.config = kwargs

# ConnectionParams and ProtocolParams stubs
class ProtocolParams:
    def __init__(self, host, port, secure):
        self.host = host
        self.port = port
        self.secure = secure

class ConnectionParams:
    def __init__(self, http, grpc):
        self.http = http
        self.grpc = grpc

# WeaviateAsyncClient stub
class WeaviateAsyncClient:
    def __init__(
        self,
        connection_params,
        auth_client_secret,
        additional_headers,
        additional_config,
        skip_init_checks,
    ):
        self.connection_params = connection_params
        self.auth_client_secret = auth_client_secret
        self.additional_headers = additional_headers
        self.additional_config = additional_config
        self.skip_init_checks = skip_init_checks

    # Simulate async context manager
    async def __aenter__(self):
        return self
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        pass

    async def connect(self):
        pass

    async def close(self):
        pass

    async def is_ready(self):
        # Always return True for testing
        return True

# ---------------------- UNIT TESTS ----------------------

# 1. Basic Test Cases

def test_basic_hostname_api_key():
    """Test with a basic hostname and API key string."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="abc123.something.weaviate.network",
        auth_credentials="my-api-key"
    ); client = codeflash_output # 66.8μs -> 59.0μs (13.3% faster)





def test_basic_hostname_none_auth():
    """Test with None for auth_credentials."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="host.weaviate.network",
        auth_credentials=None
    ); client = codeflash_output # 59.7μs -> 54.7μs (9.28% faster)


def test_basic_hostname_skip_init_checks_false():
    """Test skip_init_checks default value is False."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="host.weaviate.network",
        auth_credentials="my-api-key"
    ); client = codeflash_output # 62.1μs -> 56.3μs (10.3% faster)

# 2. Edge Test Cases

def test_edge_http_url_input():
    """Test with a full https URL for cluster_url."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="https://abc123.something.weaviate.network",
        auth_credentials="key"
    ); client = codeflash_output # 50.0μs -> 44.5μs (12.5% faster)

def test_edge_non_weaviate_network_hostname():
    """Test with a hostname not ending in '.weaviate.network'."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="custom-hostname",
        auth_credentials="key"
    ); client = codeflash_output # 39.7μs -> 34.5μs (15.0% faster)

def test_edge_empty_headers_dict():
    """Test with empty headers dict."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="abc123.something.weaviate.network",
        auth_credentials="key",
        headers={}
    ); client = codeflash_output # 42.0μs -> 40.5μs (3.59% faster)


def test_edge_invalid_auth_credentials_type():
    """Test with invalid auth_credentials type should raise ValueError."""
    with pytest.raises(ValueError):
        use_async_with_weaviate_cloud(
            cluster_url="abc123.something.weaviate.network",
            auth_credentials=object()  # Invalid type
        ) # 23.3μs -> 18.1μs (28.8% faster)



def test_edge_dot_in_non_weaviate_network_hostname():
    """Test with a dot in a non-weaviate.network hostname."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="foo.bar",
        auth_credentials="key"
    ); client = codeflash_output # 62.1μs -> 55.8μs (11.4% faster)

def test_edge_headers_with_special_chars():
    """Test with headers containing special characters."""
    headers = {"X-API-KEY": "fööbär", "Another": "😀"}
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="abc123.something.weaviate.network",
        auth_credentials="key",
        headers=headers
    ); client = codeflash_output # 47.2μs -> 44.4μs (6.33% faster)


def test_edge_skip_init_checks_true():
    """Test skip_init_checks True."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="abc123.something.weaviate.network",
        auth_credentials="key",
        skip_init_checks=True
    ); client = codeflash_output # 62.2μs -> 56.0μs (11.0% faster)

# 3. Large Scale Test Cases

def test_large_scale_many_headers():
    """Test with a large number of headers."""
    headers = {f"Header-{i}": f"Value-{i}" for i in range(500)}
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="abc123.something.weaviate.network",
        auth_credentials="key",
        headers=headers
    ); client = codeflash_output # 142μs -> 147μs (3.11% slower)

def test_large_scale_long_cluster_url():
    """Test with a very long cluster_url."""
    long_hostname = "a" * 200 + ".something.weaviate.network"
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url=long_hostname,
        auth_credentials="key"
    ); client = codeflash_output # 45.3μs -> 40.7μs (11.3% faster)
    ident, domain = long_hostname.split(".", 1)





def test_large_scale_hostname_with_many_dots():
    """Test with a cluster_url containing many dots."""
    cluster_url = "sub1.sub2.sub3.sub4.sub5.weaviate.network"
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url=cluster_url,
        auth_credentials="key"
    ); client = codeflash_output # 62.0μs -> 55.8μs (11.0% faster)
    ident, domain = cluster_url.split(".", 1)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import random
import string
from typing import Dict, Optional
from urllib.parse import urlparse

# imports
import pytest
from weaviate.connect.helpers import use_async_with_weaviate_cloud


# Minimal stubs for dependencies, as we cannot import real weaviate classes here.
class _APIKey:
    def __init__(self, api_key: str):
        self.api_key = api_key

class Auth:
    @staticmethod
    def api_key(api_key: str) -> _APIKey:
        return _APIKey(api_key)

class ProtocolParams:
    def __init__(self, host: str, port: int, secure: bool):
        self.host = host
        self.port = port
        self.secure = secure

class ConnectionParams:
    def __init__(self, http: ProtocolParams, grpc: ProtocolParams):
        self.http = http
        self.grpc = grpc

class AdditionalConfig:
    def __init__(self, **kwargs):
        self.config = kwargs

class WeaviateAsyncClient:
    def __init__(
        self,
        connection_params: ConnectionParams,
        auth_client_secret: Optional[_APIKey],
        additional_headers: Optional[Dict[str, str]],
        additional_config: Optional[AdditionalConfig],
        skip_init_checks: bool,
    ):
        self.connection_params = connection_params
        self.auth_client_secret = auth_client_secret
        self.additional_headers = additional_headers
        self.additional_config = additional_config
        self.skip_init_checks = skip_init_checks
        self.connected = False

    async def connect(self):
        self.connected = True

    async def is_ready(self):
        return self.connected

    async def close(self):
        self.connected = False

    async def __aenter__(self):
        await self.connect()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.close()

# Exception for invalid input
class WeaviateInvalidInputError(Exception):
    pass

# --------------------- UNIT TESTS ----------------------

# 1. Basic Test Cases

def test_basic_hostname_api_key():
    """Test basic usage with a hostname and API key string."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="testcluster.weaviate.network",
        auth_credentials="my-key",
    ); client = codeflash_output # 41.8μs -> 37.9μs (10.5% faster)





def test_invalid_auth_credentials_type():
    """Test that an invalid type for auth_credentials raises error."""
    with pytest.raises(ValueError):
        use_async_with_weaviate_cloud(
            cluster_url="edgecluster.weaviate.network",
            auth_credentials=object(),  # not str, not _APIKey, not None
        ) # 24.0μs -> 18.6μs (29.0% faster)

def test_hostname_no_weaviate_network():
    """Test cluster_url that does not end with .weaviate.network."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="mycustomhost",
        auth_credentials="key"
    ); client = codeflash_output # 52.9μs -> 49.8μs (6.09% faster)

def test_url_with_port_and_path():
    """Test parsing a URL with port and path, should extract netloc."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="https://myhost.weaviate.network:8080/api",
        auth_credentials="key"
    ); client = codeflash_output # 48.1μs -> 43.7μs (9.98% faster)

def test_none_auth_credentials():
    """Test with None for auth_credentials."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="noneauth.weaviate.network",
        auth_credentials=None
    ); client = codeflash_output # 38.1μs -> 33.6μs (13.5% faster)

def test_empty_headers_and_config():
    """Test passing an empty dict for headers and None for additional_config."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="emptyheaders.weaviate.network",
        auth_credentials="key",
        headers={},
        additional_config=None
    ); client = codeflash_output # 41.9μs -> 39.6μs (5.94% faster)

def test_skip_init_checks_true():
    """Test skip_init_checks=True is propagated."""
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="skipinit.weaviate.network",
        auth_credentials="key",
        skip_init_checks=True
    ); client = codeflash_output # 38.4μs -> 33.8μs (13.7% faster)

def test_long_cluster_url():
    """Test very long cluster_url."""
    long_name = "a" * 200 + ".weaviate.network"
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url=long_name,
        auth_credentials="key"
    ); client = codeflash_output # 38.3μs -> 34.3μs (11.8% faster)

def test_special_characters_in_cluster_url():
    """Test cluster_url with special characters."""
    special = "tést-clüster.weaviate.network"
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url=special,
        auth_credentials="key"
    ); client = codeflash_output # 38.4μs -> 35.4μs (8.35% faster)

# 3. Large Scale Test Cases

def test_large_number_of_headers():
    """Test with a large number of headers."""
    headers = {f"X-Key-{i}": f"val-{i}" for i in range(1000)}
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url="largeheaders.weaviate.network",
        auth_credentials="key",
        headers=headers
    ); client = codeflash_output # 235μs -> 234μs (0.657% faster)
    for i in range(1000):
        pass

def test_large_cluster_url():
    """Test with a cluster_url near 1000 chars."""
    prefix = ''.join(random.choices(string.ascii_lowercase, k=980))
    cluster_url = f"{prefix}.weaviate.network"
    codeflash_output = use_async_with_weaviate_cloud(
        cluster_url=cluster_url,
        auth_credentials="key"
    ); client = codeflash_output # 53.8μs -> 47.3μs (13.7% faster)




#------------------------------------------------
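from weaviate.connect.helpers import use_async_with_weaviate_cloud
import pytest
# Assumption: the ValidationError expected here is pydantic's; the original snippet did not import it.
from pydantic import ValidationError

def test_use_async_with_weaviate_cloud():
    # An empty cluster_url is expected to fail validation.
    with pytest.raises(ValidationError):
        use_async_with_weaviate_cloud('', None, headers={}, additional_config=None, skip_init_checks=False)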

Timer unit: 1e-09 s

To edit these changes, run `git checkout codeflash/optimize-use_async_with_weaviate_cloud-mh3at2yp` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 10:47
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025