codeflash-ai bot commented Oct 10, 2025

📄 24% (1.24x) speedup for FeatureRegistryClientWithOverride.parse_feature_path in google/cloud/aiplatform/utils/__init__.py

⏱️ Runtime: 4.25 milliseconds → 3.42 milliseconds (best of 297 runs)
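As a quick check, the headline figure is consistent with the measured runtimes when "speedup" means the old runtime divided by the new runtime, minus one. The per-test annotations use the same convention: 1.38μs → 587ns is reported as 136% faster, which matches up to rounding. The same two runtimes correspond to a runtime reduction of roughly 19.5%:

```latex
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} - 1
  = \frac{4.25\,\text{ms}}{3.42\,\text{ms}} - 1 \approx 0.243 \;(\approx 1.24\times),
\qquad
\text{runtime reduction} = \frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{old}}}
  = \frac{0.83\,\text{ms}}{4.25\,\text{ms}} \approx 0.195
```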

📝 Explanation and details

The optimized code achieves a 24% speedup by compiling the regular expression once, at module load time, instead of passing the pattern string to re.match() on every call.

Key optimization:

  • Pattern precompilation: The regex pattern is compiled once as _FEATURE_PATH_PATTERN at module import, so individual calls no longer have to resolve the pattern string through re's internal compile cache
  • Method change: Switched from re.match() to the precompiled pattern's .match() method
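A minimal sketch of the shape of the change, for illustration only. The regex is an assumption, since this page does not show the actual pattern, and parse_feature_path_before / parse_feature_path_after are hypothetical module-level stand-ins for the static method:

```python
import re
from typing import Dict

# Assumed pattern: the real regex in google/cloud/aiplatform/utils/__init__.py
# is not shown on this page.
_PATTERN_SRC = (
    r"^projects/(?P<project>.+?)/locations/(?P<location>.+?)"
    r"/featureGroups/(?P<feature_group>.+?)/features/(?P<feature>.+?)$"
)

# "After": compile once, when the module is imported.
_FEATURE_PATH_PATTERN = re.compile(_PATTERN_SRC)


def parse_feature_path_before(path: str) -> Dict[str, str]:
    """Hypothetical 'before' version: hands the pattern string to re.match()."""
    m = re.match(_PATTERN_SRC, path)
    return m.groupdict() if m else {}


def parse_feature_path_after(path: str) -> Dict[str, str]:
    """Hypothetical 'after' version: reuses the precompiled pattern object."""
    m = _FEATURE_PATH_PATTERN.match(path)
    return m.groupdict() if m else {}


path = "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature"
print(parse_feature_path_after(path))
# {'project': 'myproj', 'location': 'us-central1', 'feature_group': 'mygroup', 'feature': 'myfeature'}
```

Both versions return the same dictionaries; only the way the compiled pattern is obtained differs.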

Why this improves performance:
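Compiling a regular expression (parsing the pattern and generating the engine's internal matching code) is relatively expensive. However, Python's re module caches the compiled form of recently used patterns, so the original re.match(pattern, path) call was not recompiling from scratch on every invocation. It was still resolving the pattern string through that cache on every call before any matching began. The profiler attributes 79.4% of the total execution time (8.62ms out of 10.85ms) to the re.match() line, and that figure includes the matching work itself. Calling .match() on the precompiled pattern removes the per-call cache lookup but leaves the matching work unchanged, which is why the overall gain is 24% rather than the full 79.4%.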
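One way to check this reasoning locally is to time three variants: forcing a genuine recompile with re.purge(), the cached re.match() call, and the precompiled pattern. This is a rough sketch under the same assumed pattern. Absolute numbers depend on the Python version and hardware, but you would expect the first variant to be far slower than the other two, and the precompiled variant to be modestly faster than the cached call:

```python
import re
import timeit

# Assumed pattern: the real regex is not shown on this page.
PATTERN_SRC = (
    r"^projects/(?P<project>.+?)/locations/(?P<location>.+?)"
    r"/featureGroups/(?P<feature_group>.+?)/features/(?P<feature>.+?)$"
)
COMPILED = re.compile(PATTERN_SRC)
PATH = "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature"


def cold_compile():
    # Clear re's internal cache first, so re.match() must genuinely recompile.
    # (The timing also includes the small cost of re.purge() itself.)
    re.purge()
    return re.match(PATTERN_SRC, PATH)


def cached_lookup():
    # The pattern string is resolved through re's compile cache (a hit after the first call).
    return re.match(PATTERN_SRC, PATH)


def precompiled():
    # The pattern object is reused directly, with no cache lookup.
    return COMPILED.match(PATH)


def ns_per_call(fn, number: int = 2_000) -> float:
    """Average wall-clock nanoseconds per call of fn over `number` calls."""
    return timeit.timeit(fn, number=number) / number * 1e9


if __name__ == "__main__":
    for fn in (cold_compile, cached_lookup, precompiled):
        print(f"{fn.__name__:14s} {ns_per_call(fn):10.0f} ns/call")
```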

Performance characteristics by test case:

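  • Valid paths (standard parsing): 25-35% faster - the removed per-call overhead is a large share of a short, successful match
  • Cheap rejections (empty string, wrong prefix or capitalization, leading whitespace): 100-140% faster - matching ends almost immediately, so the removed overhead is most of the remaining cost
  • Other inputs the tests treat as invalid (e.g. missing or extra segments): about 24-35% faster, in line with valid paths, because the engine scans much of the string before giving up
  • Large inputs (long segment names): 6-10% faster - the same fixed saving is a smaller fraction of the longer matching work
  • Batch operations (many calls): 18-38% faster - the saving per call is constant, so the total saving grows linearly with the number of calls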
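A hedged sketch for reproducing the per-category comparison, using one representative input per category taken from the generated regression tests. The pattern is assumed, because the real one is not shown on this page, and the category names are illustrative:

```python
import re
import timeit

# Assumed pattern: the real regex is not shown on this page.
PATTERN_SRC = (
    r"^projects/(?P<project>.+?)/locations/(?P<location>.+?)"
    r"/featureGroups/(?P<feature_group>.+?)/features/(?P<feature>.+?)$"
)
COMPILED = re.compile(PATTERN_SRC)

# One representative input per category, taken from the generated tests.
CASES = {
    "valid": "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature",
    "early mismatch": "project/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature",
    "late mismatch": "projects/myproj/locations/us-central1/featureGroups/mygroup/",
    "long segment": "projects/myproj/locations/us-central1/featureGroups/mygroup/features/" + "f" * 999,
}


def compare(path: str, number: int = 20_000) -> tuple:
    """Return (ns/call via re.match with a string, ns/call via the compiled pattern)."""
    t_module = timeit.timeit(lambda: re.match(PATTERN_SRC, path), number=number)
    t_compiled = timeit.timeit(lambda: COMPILED.match(path), number=number)
    return t_module / number * 1e9, t_compiled / number * 1e9


if __name__ == "__main__":
    for name, path in CASES.items():
        before, after = compare(path)
        print(f"{name:15s} {before:8.0f} ns -> {after:8.0f} ns ({before / after - 1:+.0%})")
```

Because the saving is roughly a fixed amount per call, the printed percentage should be largest for the early mismatch and smallest for the long segment.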

This optimization is particularly effective for applications that parse many feature paths: the pattern is compiled once at import time, every subsequent call skips the per-call lookup, and the results are identical to those of the original code.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 3133 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import re
from typing import Dict

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.utils import FeatureRegistryClientWithOverride

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_parse_feature_path_basic_valid():
    # Standard valid path
    path = "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature"
    expected = {
        "project": "myproj",
        "location": "us-central1",
        "feature_group": "mygroup",
        "feature": "myfeature"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.90μs -> 2.17μs (33.3% faster)

def test_parse_feature_path_basic_valid_with_numbers_and_underscores():
    # Path with numbers and underscores
    path = "projects/proj_123/locations/location_1/featureGroups/group_2/features/feature_3"
    expected = {
        "project": "proj_123",
        "location": "location_1",
        "feature_group": "group_2",
        "feature": "feature_3"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.79μs -> 2.09μs (33.7% faster)

def test_parse_feature_path_basic_valid_with_hyphens_and_dots():
    # Path with hyphens and dots
    path = "projects/proj-abc/locations/europe-west1/featureGroups/group.name/features/feature.name"
    expected = {
        "project": "proj-abc",
        "location": "europe-west1",
        "feature_group": "group.name",
        "feature": "feature.name"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.82μs -> 2.13μs (31.9% faster)

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_parse_feature_path_empty_string():
    # Empty string should return empty dict
    path = ""
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 1.38μs -> 587ns (136% faster)

def test_parse_feature_path_missing_segments():
    # Missing segments (too short)
    path = "projects/myproj/locations/us-central1/features/myfeature"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.63μs -> 1.95μs (35.1% faster)

def test_parse_feature_path_extra_segments():
    # Extra segments (too long)
    path = "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature/extra"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 3.01μs -> 2.35μs (28.2% faster)

def test_parse_feature_path_wrong_order():
    # Wrong order of segments
    path = "projects/myproj/featureGroups/mygroup/locations/us-central1/features/myfeature"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.88μs -> 2.15μs (33.8% faster)

def test_parse_feature_path_wrong_prefix():
    # Wrong prefix
    path = "project/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 1.41μs -> 695ns (103% faster)

def test_parse_feature_path_case_sensitivity():
    # Case sensitivity check
    path = "Projects/myproj/Locations/us-central1/FeatureGroups/mygroup/Features/myfeature"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 1.41μs -> 637ns (121% faster)

def test_parse_feature_path_empty_segments():
    # Empty segment values
    path = "projects//locations//featureGroups//features/"
    expected = {
        "project": "",
        "location": "",
        "feature_group": "",
        "feature": ""
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 1.35μs -> 587ns (129% faster)

def test_parse_feature_path_special_characters():
    # Segments with special characters
    path = "projects/proj$@!/locations/loc#%/featureGroups/group^&*/features/feature()"
    expected = {
        "project": "proj$@!",
        "location": "loc#%",
        "feature_group": "group^&*",
        "feature": "feature()"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 3.22μs -> 2.60μs (23.9% faster)

def test_parse_feature_path_slashes_in_segments():
    # Segments containing slashes (should not match, as slashes separate segments)
    path = "projects/my/proj/locations/us/central1/featureGroups/my/group/features/my/feature"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.94μs -> 2.30μs (27.9% faster)

def test_parse_feature_path_partial_match():
    # Only partial match (missing 'features' segment)
    path = "projects/myproj/locations/us-central1/featureGroups/mygroup/"
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 2.92μs -> 2.21μs (32.7% faster)

def test_parse_feature_path_non_string_input():
    # str(None) == "None", which is not a feature path: should return {} without raising
    path = None
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(str(path)); result = codeflash_output # 1.34μs -> 605ns (122% faster)

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_parse_feature_path_large_project_name():
    # Large project name
    large_project = "a" * 500
    path = f"projects/{large_project}/locations/us-central1/featureGroups/mygroup/features/myfeature"
    expected = {
        "project": large_project,
        "location": "us-central1",
        "feature_group": "mygroup",
        "feature": "myfeature"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 7.25μs -> 6.58μs (10.2% faster)

def test_parse_feature_path_large_feature_group_name():
    # Large feature group name
    large_feature_group = "g" * 800
    path = f"projects/myproj/locations/us-central1/featureGroups/{large_feature_group}/features/myfeature"
    expected = {
        "project": "myproj",
        "location": "us-central1",
        "feature_group": large_feature_group,
        "feature": "myfeature"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 9.19μs -> 8.62μs (6.53% faster)

def test_parse_feature_path_large_feature_name():
    # Large feature name
    large_feature = "f" * 999
    path = f"projects/myproj/locations/us-central1/featureGroups/mygroup/features/{large_feature}"
    expected = {
        "project": "myproj",
        "location": "us-central1",
        "feature_group": "mygroup",
        "feature": large_feature
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 11.1μs -> 10.4μs (6.11% faster)

def test_parse_feature_path_many_unique_paths():
    # Test many unique paths in a loop, all should parse correctly
    for i in range(100):
        path = f"projects/proj{i}/locations/loc{i}/featureGroups/group{i}/features/feature{i}"
        expected = {
            "project": f"proj{i}",
            "location": f"loc{i}",
            "feature_group": f"group{i}",
            "feature": f"feature{i}"
        }
        codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 108μs -> 83.7μs (30.0% faster)

def test_parse_feature_path_performance_large_batch():
    # Batch test with 500 valid and 500 invalid paths
    valid_paths = [
        f"projects/p{i}/locations/l{i}/featureGroups/g{i}/features/f{i}"
        for i in range(500)
    ]
    invalid_paths = [
        f"projects/p{i}/locations/l{i}/featureGroups/g{i}/features"  # missing feature
        for i in range(500)
    ]
    # All valid paths should parse correctly
    for i, path in enumerate(valid_paths):
        expected = {
            "project": f"p{i}",
            "location": f"l{i}",
            "feature_group": f"g{i}",
            "feature": f"f{i}"
        }
        codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 470μs -> 341μs (37.7% faster)
    # All invalid paths should return empty dict
    for i, path in enumerate(invalid_paths):
        codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path); result = codeflash_output # 754μs -> 628μs (20.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import Dict

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.utils import FeatureRegistryClientWithOverride

# unit tests

# Basic Test Cases

def test_parse_feature_path_basic_valid():
    # Test with a valid, typical feature path
    path = "projects/myproject/locations/us-central1/featureGroups/group1/features/featureA"
    expected = {
        "project": "myproject",
        "location": "us-central1",
        "feature_group": "group1",
        "feature": "featureA"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 4.00μs -> 3.18μs (25.8% faster)

def test_parse_feature_path_basic_valid_with_numbers():
    # Test with numbers in project, location, feature_group, and feature
    path = "projects/123/locations/456/featureGroups/789/features/000"
    expected = {
        "project": "123",
        "location": "456",
        "feature_group": "789",
        "feature": "000"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 2.78μs -> 2.10μs (32.0% faster)

def test_parse_feature_path_basic_valid_with_special_chars():
    # Test with special characters (except slash) in each segment
    path = "projects/my.project-1/locations/us_central1/featureGroups/group-1/features/feature_A-1"
    expected = {
        "project": "my.project-1",
        "location": "us_central1",
        "feature_group": "group-1",
        "feature": "feature_A-1"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 3.06μs -> 2.29μs (33.7% faster)

# Edge Test Cases

def test_parse_feature_path_empty_string():
    # Test with empty string
    path = ""
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.43μs -> 661ns (116% faster)

def test_parse_feature_path_missing_segments():
    # Test with missing segments
    path = "projects/myproject/locations/us-central1/featureGroups/group1"
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 2.91μs -> 2.24μs (29.9% faster)

def test_parse_feature_path_extra_segments():
    # Test with extra segments (should not match)
    path = "projects/myproject/locations/us-central1/featureGroups/group1/features/featureA/extra"
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 2.91μs -> 2.35μs (24.1% faster)

def test_parse_feature_path_wrong_order():
    # Test with segments in wrong order
    path = "locations/us-central1/projects/myproject/featureGroups/group1/features/featureA"
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.39μs -> 683ns (104% faster)

def test_parse_feature_path_wrong_prefix():
    # Test with wrong prefix
    path = "project/myproject/locations/us-central1/featureGroups/group1/features/featureA"
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.45μs -> 724ns (99.6% faster)

def test_parse_feature_path_empty_segments():
    # Test with empty segments
    path = "projects//locations//featureGroups//features/"
    expected = {
        "project": "",
        "location": "",
        "feature_group": "",
        "feature": ""
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.35μs -> 629ns (114% faster)

def test_parse_feature_path_segments_with_slash():
    # Test with segment containing a slash (should break the match)
    path = "projects/my/project/locations/us-central1/featureGroups/group1/features/featureA"
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 3.17μs -> 2.56μs (23.8% faster)

def test_parse_feature_path_segments_with_reserved_words():
    # Test with reserved words as values
    path = "projects/projects/locations/locations/featureGroups/featureGroups/features/features"
    expected = {
        "project": "projects",
        "location": "locations",
        "feature_group": "featureGroups",
        "feature": "features"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 2.80μs -> 2.28μs (22.8% faster)

def test_parse_feature_path_leading_trailing_spaces():
    # Test with leading/trailing spaces in the path (should not match)
    path = "  projects/myproject/locations/us-central1/featureGroups/group1/features/featureA  "
    expected = {}
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.39μs -> 624ns (123% faster)

def test_parse_feature_path_unicode_characters():
    # Test with unicode characters in segments
    path = "projects/项目/locations/地點/featureGroups/组/features/特征"
    expected = {
        "project": "项目",
        "location": "地點",
        "feature_group": "组",
        "feature": "特征"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 3.47μs -> 2.76μs (25.6% faster)

# Large Scale Test Cases

def test_parse_feature_path_long_segment_names():
    # Test with very long segment names
    long_str = "a" * 256
    path = f"projects/{long_str}/locations/{long_str}/featureGroups/{long_str}/features/{long_str}"
    expected = {
        "project": long_str,
        "location": long_str,
        "feature_group": long_str,
        "feature": long_str
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 11.1μs -> 10.4μs (6.77% faster)

def test_parse_feature_path_many_paths():
    # Test with many valid paths to check scalability/performance
    for i in range(1000):
        path = f"projects/proj{i}/locations/loc{i}/featureGroups/group{i}/features/feat{i}"
        expected = {
            "project": f"proj{i}",
            "location": f"loc{i}",
            "feature_group": f"group{i}",
            "feature": f"feat{i}"
        }
        codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.05ms -> 803μs (31.3% faster)

def test_parse_feature_path_many_invalid_paths():
    # Test with many invalid paths to check scalability/performance
    for i in range(1000):
        path = f"projects/proj{i}/locations/loc{i}/featureGroups/group{i}/features"
        expected = {}
        codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 1.73ms -> 1.46ms (18.6% faster)

def test_parse_feature_path_large_feature_name():
    # Test with a very large feature name
    large_feature = "feature" + "X" * 900
    path = f"projects/myproject/locations/us-central1/featureGroups/group1/features/{large_feature}"
    expected = {
        "project": "myproject",
        "location": "us-central1",
        "feature_group": "group1",
        "feature": large_feature
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 11.2μs -> 10.3μs (8.69% faster)

def test_parse_feature_path_large_feature_group_name():
    # Test with a very large feature group name
    large_group = "group" + "Y" * 900
    path = f"projects/myproject/locations/us-central1/featureGroups/{large_group}/features/featureA"
    expected = {
        "project": "myproject",
        "location": "us-central1",
        "feature_group": large_group,
        "feature": "featureA"
    }
    codeflash_output = FeatureRegistryClientWithOverride.parse_feature_path(path) # 10.3μs -> 9.34μs (10.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
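The generated tests record codeflash_output but never assert against their expected dictionaries. Below is a minimal differential-test sketch that makes the equivalence check explicit. original_parse and optimized_parse are hypothetical stand-ins, and the pattern is assumed, since neither implementation is shown on this page. The functions follow pytest's test_ naming, so pytest will collect them, but they also run as plain function calls:

```python
import re
from typing import Dict, List

# Assumed pattern: the real regex is not shown on this page.
_PATTERN_SRC = (
    r"^projects/(?P<project>.+?)/locations/(?P<location>.+?)"
    r"/featureGroups/(?P<feature_group>.+?)/features/(?P<feature>.+?)$"
)
_FEATURE_PATH_PATTERN = re.compile(_PATTERN_SRC)


def original_parse(path: str) -> Dict[str, str]:
    """Hypothetical original: module-level re.match() with the pattern string."""
    m = re.match(_PATTERN_SRC, path)
    return m.groupdict() if m else {}


def optimized_parse(path: str) -> Dict[str, str]:
    """Hypothetical optimized version: precompiled pattern object."""
    m = _FEATURE_PATH_PATTERN.match(path)
    return m.groupdict() if m else {}


# Inputs drawn from the generated regression tests.
CASES: List[str] = [
    "projects/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature",
    "",
    "project/myproj/locations/us-central1/featureGroups/mygroup/features/myfeature",
    "projects/myproj/locations/us-central1/featureGroups/mygroup/",
    "projects/项目/locations/地點/featureGroups/组/features/特征",
]


def test_optimized_matches_original() -> None:
    # Differential check: both implementations must agree on every input.
    for path in CASES:
        assert optimized_parse(path) == original_parse(path), path


def test_valid_path_components() -> None:
    # Explicit assertion on the components of a well-formed path.
    assert optimized_parse(CASES[0]) == {
        "project": "myproj",
        "location": "us-central1",
        "feature_group": "mygroup",
        "feature": "myfeature",
    }


if __name__ == "__main__":
    test_optimized_matches_original()
    test_valid_path_components()
    print("ok")
```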

To edit these changes, run git checkout codeflash/optimize-FeatureRegistryClientWithOverride.parse_feature_path-mgklftfj and push.

codeflash-ai bot requested a review from mashraf-222 October 10, 2025 08:37
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label Oct 10, 2025