@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 37% (0.37x) speedup for __parse_weaviate_cloud_cluster_url in weaviate/connect/helpers.py

⏱️ Runtime : 16.6 milliseconds → 12.1 milliseconds (best of 77 runs)

📝 Explanation and details

The optimization targets the hot path in `_validate_input` by adding fast-path validation for the most common case where `expected == [str]`.

Key changes:

  • Fast-path for single `_ValidateArgument` with `[str]` expected: Instead of calling the generic `_is_valid()` function, the code directly uses `isinstance(validate.value, str)` when the expected type list contains only `str`
  • Fast-path for lists of string-only validations: Optimizes the case where every validation in a list expects only strings
  • Preserves fallback behavior: The original `_is_valid()` path is still used for all other, more complex validation scenarios

Why this is faster:
The original code calls `_is_valid()` for every validation, which handles complex generic types, `Union` types, and `Sequence` types. For the simple case of validating strings (which dominates in `__parse_weaviate_cloud_cluster_url`), this generic approach is overkill. The optimized version skips that function call and its type-inspection logic, reducing the time spent in validation from ~21.6ms to ~10.8ms (a 50% reduction in the validation function itself).
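
The cost difference can be reproduced in isolation with a small micro-benchmark. This is a sketch under stated assumptions: `generic_is_valid` is a hypothetical stand-in for a checker that inspects `typing` constructs, not the library's `_is_valid()`, and absolute timings will vary by machine and Python version.

```python
import timeit
from typing import Any, List, Union, get_args, get_origin


def generic_is_valid(expected: Any, value: Any) -> bool:
    """Hypothetical generic checker that inspects typing constructs."""
    if expected is Any:
        return True
    origin = get_origin(expected)
    if origin is Union:
        return any(generic_is_valid(arg, value) for arg in get_args(expected))
    if origin is not None:
        # Parameterized container such as List[str]: check the container,
        # then every element against the first type argument.
        if not isinstance(value, origin):
            return False
        args = get_args(expected)
        if not args:
            return True
        return all(generic_is_valid(args[0], item) for item in value)
    return isinstance(value, expected)


def fast_is_str(value: Any) -> bool:
    """The fast path: one isinstance call and nothing else."""
    return isinstance(value, str)


def benchmark(number: int = 200_000) -> None:
    generic = timeit.timeit(lambda: generic_is_valid(str, "mycluster"), number=number)
    fast = timeit.timeit(lambda: fast_is_str("mycluster"), number=number)
    print(f"generic: {generic:.4f}s  fast: {fast:.4f}s  ratio: {generic / fast:.2f}x")


if __name__ == "__main__":
    benchmark()
```

Both functions agree on plain `str` inputs; the generic one simply does more work (origin lookup and branching) before it reaches the same `isinstance` call.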

Test case performance:
The optimization particularly excels on:

  • Simple hostname validations (74-105% faster): Most test cases involving bare hostnames see dramatic improvements
  • Bulk hostname operations (87-101% faster): Large-scale tests that parse 1000 bare hostnames show consistent speedups
  • URL parsing cases (9-25% faster): Improvements are smaller but meaningful, since URL parsing itself (`urlparse`) remains the bottleneck
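
For reference, the "% faster" figures can be recomputed from the `before -> after` annotations attached to each test. The helper below is a hypothetical sketch (the name `percent_faster` and the regular expression are not part of Codeflash); because the annotations show rounded timings, recomputed values can differ from the reported ones by a few tenths of a percent.

```python
import re

# Seconds per unit, for the units that appear in the timing annotations.
_UNITS = {"ns": 1e-9, "μs": 1e-6, "ms": 1e-3, "s": 1.0}
_PATTERN = re.compile(r"([\d.]+)(ns|μs|ms|s)\s*->\s*([\d.]+)(ns|μs|ms|s)")


def percent_faster(annotation: str) -> float:
    """Return how much faster the 'after' timing is, in percent."""
    match = _PATTERN.search(annotation)
    if match is None:
        raise ValueError(f"no 'before -> after' timing found in {annotation!r}")
    before = float(match.group(1)) * _UNITS[match.group(2)]
    after = float(match.group(3)) * _UNITS[match.group(4)]
    return (before / after - 1.0) * 100.0


if __name__ == "__main__":
    for line in ("6.09μs -> 3.17μs", "1.27ms -> 674μs", "15.5μs -> 13.4μs"):
        print(f"{line}: {percent_faster(line):.1f}% faster")
```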

The 37% overall speedup comes from roughly halving the cost of validation, which was consuming 52% of the original runtime.
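
As a rough consistency check, the figures above can be plugged into Amdahl's law. This is a sketch, not part of the Codeflash report: it assumes the 52% share and the ~21.6ms to ~10.8ms validation timings apply to the same workload as the 16.6ms to 12.1ms runtime, which the report does not state explicitly.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime gets `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Figures quoted in this report.
measured = 16.6 / 12.1             # overall runtime, before / after
validation_speedup = 21.6 / 10.8   # time in validation, before / after
predicted = amdahl_speedup(0.52, validation_speedup)

print(f"measured:  {measured - 1:.1%} faster")
print(f"predicted: {predicted - 1:.1%} faster")
```

The prediction lands at about 35%, close to the measured 37%, so the reported numbers are mutually consistent to within measurement noise.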

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 7040 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from typing import Tuple
from urllib.parse import urlparse

# imports
import pytest  # used for our unit tests
from weaviate.connect.helpers import __parse_weaviate_cloud_cluster_url


# Function to test and dependencies (minimal viable implementation for testing)
class WeaviateInvalidInputError(Exception):
    pass

class _ValidateArgument:
    def __init__(self, expected, name, value):
        self.expected = expected
        self.name = name
        self.value = value

# -------------------- UNIT TESTS --------------------

# 1. Basic Test Cases

def test_hostname_weaviate_network():
    # Basic hostname ending with .weaviate.network
    cluster_url = "mycluster.weaviate.network"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 6.09μs -> 3.17μs (92.1% faster)

def test_hostname_without_weaviate_network():
    # Hostname not ending with .weaviate.network
    cluster_url = "mycluster"
    expected = ("mycluster", "grpc-mycluster")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.23μs -> 2.14μs (98.1% faster)

def test_http_url_weaviate_network():
    # HTTP URL with .weaviate.network
    cluster_url = "http://mycluster.weaviate.network"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 15.5μs -> 13.4μs (15.7% faster)

def test_https_url_weaviate_network():
    # HTTPS URL with .weaviate.network
    cluster_url = "https://mycluster.weaviate.network"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 12.6μs -> 10.5μs (19.7% faster)

def test_http_url_without_weaviate_network():
    # HTTP URL without .weaviate.network
    cluster_url = "http://mycluster"
    expected = ("mycluster", "grpc-mycluster")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 12.1μs -> 9.85μs (22.7% faster)

def test_https_url_without_weaviate_network():
    # HTTPS URL without .weaviate.network
    cluster_url = "https://mycluster"
    expected = ("mycluster", "grpc-mycluster")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 11.7μs -> 9.53μs (23.3% faster)

# 2. Edge Test Cases

def test_hostname_with_subdomain_weaviate_network():
    # Hostname with subdomain and .weaviate.network
    cluster_url = "subdomain.mycluster.weaviate.network"
    expected = ("subdomain.mycluster.weaviate.network", "subdomain.grpc.mycluster.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.24μs -> 2.27μs (87.0% faster)

def test_http_url_with_port_and_path():
    # HTTP URL with port and path
    cluster_url = "http://mycluster.weaviate.network:8080/some/path"
    # urlparse will extract netloc as 'mycluster.weaviate.network:8080'
    expected = ("mycluster.weaviate.network:8080", "mycluster.grpc.weaviate.network:8080")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 13.2μs -> 10.8μs (22.6% faster)

def test_https_url_with_query():
    # HTTPS URL with query string
    cluster_url = "https://mycluster.weaviate.network?foo=bar"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 13.2μs -> 10.9μs (20.7% faster)

def test_hostname_with_dash():
    # Hostname with dash
    cluster_url = "my-cluster.weaviate.network"
    expected = ("my-cluster.weaviate.network", "my-cluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.16μs -> 2.34μs (77.6% faster)

def test_hostname_with_uppercase():
    # Hostname with uppercase letters
    cluster_url = "MyCluster.weaviate.network"
    expected = ("MyCluster.weaviate.network", "MyCluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.02μs -> 2.30μs (75.1% faster)

def test_hostname_with_numbers():
    # Hostname with numbers
    cluster_url = "cluster123.weaviate.network"
    expected = ("cluster123.weaviate.network", "cluster123.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.00μs -> 2.25μs (77.5% faster)

def test_hostname_with_multiple_dots():
    # Hostname containing a dot but not ending with .weaviate.network
    cluster_url = "my.cluster"
    expected = ("my.cluster", "grpc-my.cluster")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 3.60μs -> 1.97μs (82.4% faster)

def test_hostname_with_trailing_dot():
    # Hostname with trailing dot
    cluster_url = "mycluster."
    expected = ("mycluster.", "grpc-mycluster.")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 3.63μs -> 1.96μs (85.6% faster)




def test_hostname_with_only_weaviate_network():
    # Only .weaviate.network, no ident
    cluster_url = ".weaviate.network"
    # split('.', 1) will give ['', 'weaviate.network']
    expected = (".weaviate.network", ".grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 5.96μs -> 3.09μs (92.9% faster)

def test_hostname_with_long_ident():
    # Long ident before .weaviate.network
    cluster_url = "thisisaverylongclustername.weaviate.network"
    expected = ("thisisaverylongclustername.weaviate.network", "thisisaverylongclustername.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.73μs -> 2.49μs (90.2% faster)

def test_hostname_with_special_characters():
    # Hostname with special characters
    cluster_url = "my_cluster-01.weaviate.network"
    expected = ("my_cluster-01.weaviate.network", "my_cluster-01.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.46μs -> 2.43μs (83.3% faster)

def test_http_url_with_username_password():
    # HTTP URL with username and password
    cluster_url = "http://user:pass@mycluster.weaviate.network"
    # urlparse netloc: 'user:pass@mycluster.weaviate.network'
    expected = ("user:pass@mycluster.weaviate.network", "user:pass@mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 15.2μs -> 13.2μs (15.5% faster)

def test_hostname_with_multiple_weaviate_network():
    # Hostname with multiple .weaviate.network endings (should only split at first)
    cluster_url = "mycluster.weaviate.network.weaviate.network"
    expected = ("mycluster.weaviate.network.weaviate.network", "mycluster.grpc.weaviate.network.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 4.29μs -> 2.35μs (82.8% faster)

# 3. Large Scale Test Cases

def test_many_unique_clusters():
    # Test a large number of unique cluster hostnames
    for i in range(1000):
        cluster_url = f"cluster{i}.weaviate.network"
        expected = (f"cluster{i}.weaviate.network", f"cluster{i}.grpc.weaviate.network")
        codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 1.27ms -> 674μs (88.3% faster)

def test_many_non_weaviate_clusters():
    # Test a large number of hostnames not ending with .weaviate.network
    for i in range(1000):
        cluster_url = f"cluster{i}"
        expected = (f"cluster{i}", f"grpc-cluster{i}")
        codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 1.18ms -> 588μs (100% faster)

def test_many_http_urls():
    # Test a large number of HTTP URLs with .weaviate.network
    for i in range(1000):
        cluster_url = f"http://cluster{i}.weaviate.network"
        expected = (f"cluster{i}.weaviate.network", f"cluster{i}.grpc.weaviate.network")
        codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 3.85ms -> 3.17ms (21.5% faster)

def test_many_https_urls_with_ports():
    # Test a large number of HTTPS URLs with ports
    for i in range(1000):
        cluster_url = f"https://cluster{i}.weaviate.network:{8000+i}"
        expected = (f"cluster{i}.weaviate.network:{8000+i}", f"cluster{i}.grpc.weaviate.network:{8000+i}")
        codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 3.76ms -> 3.08ms (22.0% faster)

def test_large_ident_length():
    # Test hostnames with very long ident part
    ident = "a" * 500
    cluster_url = f"{ident}.weaviate.network"
    expected = (f"{ident}.weaviate.network", f"{ident}.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 5.72μs -> 3.25μs (76.2% faster)

def test_large_non_weaviate_hostname_length():
    # Test hostnames with very long name not ending in .weaviate.network
    ident = "b" * 500
    cluster_url = f"{ident}"
    expected = (ident, f"grpc-{ident}")
    codeflash_output = __parse_weaviate_cloud_cluster_url(cluster_url) # 3.62μs -> 2.00μs (80.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any, List, Sequence, Tuple, Union, get_args, get_origin
from urllib.parse import urlparse

# imports
import pytest
from weaviate.connect.helpers import __parse_weaviate_cloud_cluster_url


# --- Exception class as used in the validator ---
class WeaviateInvalidInputError(Exception):
    pass

# --- Helper classes and functions from weaviate/validator.py ---

class _ValidateArgument:
    def __init__(self, expected: List[Any], name: str, value: Any):
        self.expected = expected
        self.name = name
        self.value = value

# --- Unit tests ---

# 1. BASIC TEST CASES

def test_hostname_with_weaviate_network_suffix():
    # Test a typical weaviate cloud cluster hostname
    host = "mycluster.weaviate.network"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 5.99μs -> 3.44μs (74.1% faster)

def test_hostname_without_weaviate_network_suffix():
    # Test a hostname not ending with .weaviate.network
    host = "mycluster.example.com"
    expected = ("mycluster.example.com", "grpc-mycluster.example.com")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 4.31μs -> 2.19μs (97.4% faster)

def test_full_http_url_with_weaviate_network():
    # Test a full URL with https scheme and .weaviate.network domain
    url = "https://mycluster.weaviate.network"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 15.1μs -> 13.4μs (13.0% faster)

def test_full_http_url_without_weaviate_network():
    # Test a full URL with http scheme and custom domain
    url = "http://mycluster.example.com"
    expected = ("mycluster.example.com", "grpc-mycluster.example.com")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 12.4μs -> 9.99μs (23.9% faster)

def test_full_http_url_with_port():
    # Test a full URL with port number
    url = "https://mycluster.weaviate.network:8080"
    expected = ("mycluster.weaviate.network:8080", "mycluster.grpc.weaviate.network:8080")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 12.1μs -> 9.78μs (23.5% faster)

def test_full_http_url_with_path_and_query():
    # Test a full URL with path and query string
    url = "https://mycluster.weaviate.network/some/path?foo=bar"
    expected = ("mycluster.weaviate.network", "mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 13.3μs -> 10.9μs (22.0% faster)

# 2. EDGE TEST CASES

def test_hostname_with_multiple_subdomains():
    # Test a hostname with multiple subdomains and .weaviate.network
    host = "foo.bar.weaviate.network"
    expected = ("foo.bar.weaviate.network", "foo.grpc.bar.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 4.25μs -> 2.36μs (80.2% faster)

def test_hostname_with_dash_and_numbers():
    # Hostname with dashes and numbers
    host = "my-cluster-123.weaviate.network"
    expected = ("my-cluster-123.weaviate.network", "my-cluster-123.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 4.11μs -> 2.29μs (80.0% faster)

def test_hostname_with_trailing_dot():
    # Hostname with trailing dot (should not be valid, but let's see what happens)
    host = "mycluster.weaviate.network."
    expected = ("mycluster.weaviate.network.", "grpc-mycluster.weaviate.network.")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 3.73μs -> 1.93μs (93.9% faster)

def test_hostname_with_leading_dot():
    # Hostname with leading dot (invalid, but should be handled)
    host = ".mycluster.weaviate.network"
    expected = (".mycluster.weaviate.network", "grpc-.mycluster.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 3.97μs -> 2.25μs (75.9% faster)

def test_hostname_with_no_dot():
    # Hostname with no dot at all
    host = "localhost"
    expected = ("localhost", "grpc-localhost")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 3.66μs -> 1.79μs (105% faster)

def test_http_url_with_no_netloc():
    # URL with no netloc (e.g. http:///)
    url = "http:///"
    expected = ("", "grpc-")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 13.8μs -> 11.4μs (20.5% faster)



def test_url_with_username_and_password():
    # URL with username and password
    url = "https://user:pass@mycluster.weaviate.network"
    # urlparse's netloc keeps the user:pass@ userinfo part
    expected = ("user:pass@mycluster.weaviate.network", "user:pass@mycluster.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 16.7μs -> 13.9μs (20.0% faster)

def test_url_with_ipv4_address():
    # URL with IPv4 address
    url = "http://127.0.0.1"
    expected = ("127.0.0.1", "grpc-127.0.0.1")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 12.8μs -> 10.2μs (25.5% faster)

def test_url_with_ipv6_address():
    # URL with IPv6 address
    url = "http://[::1]"
    expected = ("[::1]", "grpc-[::1]")
    codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 27.3μs -> 24.9μs (9.66% faster)

def test_hostname_with_weaviate_network_but_no_ident():
    # Hostname is just .weaviate.network (no ident)
    host = ".weaviate.network"
    expected = (".weaviate.network", ".grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 4.36μs -> 2.38μs (83.0% faster)

def test_hostname_with_multiple_weaviate_network_suffixes():
    # Hostname with multiple .weaviate.network suffixes
    host = "foo.bar.weaviate.network.weaviate.network"
    expected = ("foo.bar.weaviate.network.weaviate.network", "foo.grpc.bar.weaviate.network.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 4.21μs -> 2.39μs (76.1% faster)

# 3. LARGE SCALE TEST CASES

def test_many_unique_hostnames_scalability():
    # Test a large number of unique hostnames for performance and correctness
    for i in range(1000):
        host = f"cluster{i}.weaviate.network"
        expected = (host, f"cluster{i}.grpc.weaviate.network")
        codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 1.29ms -> 690μs (87.0% faster)

def test_many_unique_custom_domains_scalability():
    # Test a large number of custom domains for performance and correctness
    for i in range(1000):
        host = f"cluster{i}.example.com"
        expected = (host, f"grpc-cluster{i}.example.com")
        codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 1.18ms -> 586μs (101% faster)

def test_many_full_urls_with_ports():
    # Test a large number of full URLs with ports
    for i in range(1000):
        url = f"https://cluster{i}.weaviate.network:{8000+i}"
        expected = (f"cluster{i}.weaviate.network:{8000+i}", f"cluster{i}.grpc.weaviate.network:{8000+i}")
        codeflash_output = __parse_weaviate_cloud_cluster_url(url) # 3.77ms -> 3.09ms (21.8% faster)

def test_large_hostname_length():
    # Test a single, very long hostname
    ident = "a" * 200
    host = f"{ident}.weaviate.network"
    expected = (host, f"{ident}.grpc.weaviate.network")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 5.09μs -> 2.94μs (73.1% faster)

def test_large_hostname_custom_domain():
    # Test a single, very long custom domain
    ident = "b" * 200
    host = f"{ident}.example.com"
    expected = (host, f"grpc-{ident}.example.com")
    codeflash_output = __parse_weaviate_cloud_cluster_url(host) # 3.58μs -> 1.98μs (80.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from weaviate.connect.helpers import __parse_weaviate_cloud_cluster_url

def test___parse_weaviate_cloud_cluster_url():
    __parse_weaviate_cloud_cluster_url('.weaviate.network')

def test___parse_weaviate_cloud_cluster_url_2():
    __parse_weaviate_cloud_cluster_url('')


To edit these changes, run `git checkout codeflash/optimize-__parse_weaviate_cloud_cluster_url-mh39jo7n` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 10:12
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Oct 23, 2025