@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 213% (2.13x) speedup for filter_sensitive_headers in src/deepgram/extensions/core/telemetry_events.py

⏱️ Runtime: 3.64 milliseconds → 1.16 milliseconds (best of 305 runs)

📝 Explanation and details

The optimization replaces any(key_lower.startswith(prefix) for prefix in sensitive_prefixes) with the more efficient key_lower.startswith(sensitive_prefixes).

Key Change:

  • Direct tuple prefix checking: Python's str.startswith() method natively accepts a tuple of prefixes, eliminating the need for a generator expression and any() function call.
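A minimal sketch of the change, for illustration only. The names `SENSITIVE_PREFIXES`, `SENSITIVE_HEADERS`, `filter_before`, and `filter_after` are hypothetical, and the prefix and header values are assumptions inferred from the generated test docstrings in this comment, not copied from telemetry_events.py.

```python
from typing import Dict, Mapping, Optional

# Assumed values, inferred from the test docstrings; the real module may differ.
SENSITIVE_PREFIXES = ("authorization", "sec-", "cookie", "x-api-key", "x-auth")
SENSITIVE_HEADERS = {
    "authorization", "cookie", "set-cookie", "x-api-key", "x-auth-token", "bearer",
}


def filter_before(headers: Optional[Mapping[str, object]]) -> Optional[Dict[str, str]]:
    """Original shape: any() drives one startswith() call per prefix."""
    if not headers:
        return None
    kept: Dict[str, str] = {}
    for key, value in headers.items():
        key_lower = key.lower()
        if key_lower in SENSITIVE_HEADERS:
            continue
        if any(key_lower.startswith(prefix) for prefix in SENSITIVE_PREFIXES):
            continue
        kept[key] = str(value)
    return kept or None


def filter_after(headers: Optional[Mapping[str, object]]) -> Optional[Dict[str, str]]:
    """Optimized shape: a single startswith() call that receives the whole tuple."""
    if not headers:
        return None
    kept: Dict[str, str] = {}
    for key, value in headers.items():
        key_lower = key.lower()
        if key_lower in SENSITIVE_HEADERS:
            continue
        if key_lower.startswith(SENSITIVE_PREFIXES):
            continue
        kept[key] = str(value)
    return kept or None
```

Both variants should return the same result for any input, because `str.startswith()` with a tuple is true exactly when at least one element is a prefix of the string.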

Why This is Faster:

  • Eliminates generator overhead: The original code creates a generator object and iterates through it with any(), which involves Python's iterator protocol overhead
  • Reduces function calls: Instead of multiple startswith() calls wrapped in any(), there's a single startswith() call that handles the tuple internally in optimized C code
  • Better memory efficiency: No temporary generator object creation
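The overhead difference described above can be checked with a rough micro-benchmark along these lines. This is a sketch under assumptions: `PREFIXES`, `KEY`, `with_any`, `with_tuple`, and `compare` are hypothetical names, the prefix values are inferred from the test docstrings, and absolute timings will vary by machine and Python version.

```python
import timeit

# Assumed prefix tuple; the real module may use different values.
PREFIXES = ("authorization", "sec-", "cookie", "x-api-key", "x-auth")
# A key that matches no prefix, so any() has to try every prefix.
KEY = "safe-header-123"


def with_any(key: str = KEY) -> bool:
    # Builds a generator and calls startswith() once per prefix.
    return any(key.startswith(prefix) for prefix in PREFIXES)


def with_tuple(key: str = KEY) -> bool:
    # One call; the loop over the tuple happens inside startswith().
    return key.startswith(PREFIXES)


def compare(number: int = 200_000) -> tuple:
    """Return (seconds for any() variant, seconds for tuple variant)."""
    t_any = timeit.timeit(with_any, number=number)
    t_tuple = timeit.timeit(with_tuple, number=number)
    return t_any, t_tuple


if __name__ == "__main__":
    t_any, t_tuple = compare()
    print(f"any(): {t_any:.4f}s  tuple: {t_tuple:.4f}s")
```

Note that `str.startswith()` accepts a string or a tuple of strings; passing a list raises `TypeError`, so the prefixes must be kept in a tuple for this form to work.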

Performance Impact:
The line profiler shows the prefix checking line (if key_lower.startswith...) dropped from 65.3% of total runtime (13.57ms) to 23.6% (2.12ms) - a ~6.4x improvement on this specific line. This optimization is particularly effective for:

  • Large header sets: Test cases with 500-1000 headers show 200-350% speedups
  • Mixed sensitive/safe headers: Small test cases containing both types run 45-90% faster
  • Frequent prefix matching: When many headers match sensitive prefixes, the reduced overhead compounds

The overall 213% speedup shows that optimizing the most expensive operation in a tight loop (here, prefix checking) can substantially improve performance across nearly all test scenarios.
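The percentages quoted in this report appear to follow the convention (original - optimized) / optimized, which matches the per-test annotations (for example, 487μs -> 156μs is reported as 212% faster). The sketch below reproduces the headline figures under that assumption; `percent_speedup` and `runtime_ratio` are hypothetical helper names.

```python
def percent_speedup(original: float, optimized: float) -> float:
    """Percent faster, assuming (original - optimized) / optimized * 100."""
    return (original - optimized) / optimized * 100.0


def runtime_ratio(original: float, optimized: float) -> float:
    """How many times longer the original takes than the optimized version."""
    return original / optimized


# Overall runtime: 3.64 ms -> 1.16 ms
overall_percent = percent_speedup(3.64, 1.16)
overall_ratio = runtime_ratio(3.64, 1.16)

# Prefix-checking line in the line profiler: 13.57 ms -> 2.12 ms
line_ratio = runtime_ratio(13.57, 2.12)

# One of the large-scale tests: 487 microseconds -> 156 microseconds
large_test_percent = percent_speedup(487, 156)
```

Under this convention, a speedup of roughly 213% means the original run takes about 3.14 times as long as the optimized one, and the profiled line takes about 6.4 times as long.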

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 46 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from typing import Dict, Mapping

# imports
import pytest
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

# unit tests

# 1. Basic Test Cases

def test_empty_headers_none():
    """Test that None input returns None."""
    codeflash_output = filter_sensitive_headers(None) # 360ns -> 330ns (9.09% faster)

def test_empty_headers_dict():
    """Test that empty dict input returns None."""
    codeflash_output = filter_sensitive_headers({}) # 359ns -> 348ns (3.16% faster)

def test_only_safe_headers():
    """Test that safe headers are preserved."""
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Accept": "application/xml"
    }
    expected = {
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Accept": "application/xml"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.25μs -> 2.62μs (61.8% faster)

def test_only_sensitive_headers():
    """Test that only sensitive headers are removed, resulting in None."""
    headers = {
        "Authorization": "Bearer xyz",
        "Cookie": "sessionid=abc",
        "X-API-Key": "12345",
        "Set-Cookie": "foo=bar",
        "x-auth-token": "token"
    }
    codeflash_output = filter_sensitive_headers(headers) # 2.01μs -> 1.93μs (4.09% faster)

def test_mixed_headers():
    """Test that sensitive headers are removed and safe ones are kept."""
    headers = {
        "Authorization": "Bearer xyz",
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Cookie": "sessionid=abc"
    }
    expected = {
        "Content-Type": "application/json",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.72μs -> 2.43μs (53.3% faster)

def test_case_insensitivity():
    """Test that header filtering is case-insensitive."""
    headers = {
        "authorization": "Bearer xyz",
        "AUTHORIZATION": "Bearer abc",
        "Content-Type": "application/json",
        "cookie": "sessionid=abc",
        "Set-Cookie": "foo=bar",
        "X-API-KEY": "12345",
        "x-auth-token": "token",
        "User-Agent": "pytest"
    }
    expected = {
        "Content-Type": "application/json",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.99μs -> 2.64μs (51.1% faster)

def test_safe_headers_with_similar_names():
    """Test that headers with similar names but not matching sensitive rules are kept."""
    headers = {
        "Authorization-Info": "not sensitive",
        "Cookie-Policy": "strict",
        "X-API-Keys": "multiple",
        "X-Auths": "many",
        "Set-Cookier": "should stay",
        "Content-Type": "application/json"
    }
    expected = {
        "Authorization-Info": "not sensitive",
        "Cookie-Policy": "strict",
        "X-API-Keys": "multiple",
        "X-Auths": "many",
        "Set-Cookier": "should stay",
        "Content-Type": "application/json"
    }
    codeflash_output = filter_sensitive_headers(headers) # 5.58μs -> 2.90μs (92.4% faster)

# 2. Edge Test Cases

def test_header_with_empty_string_key_and_value():
    """Test that an empty string key is kept (not sensitive)."""
    headers = {
        "": "",
        "Content-Type": "application/json"
    }
    expected = {
        "": "",
        "Content-Type": "application/json"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.18μs -> 2.01μs (57.9% faster)

def test_header_with_none_value():
    """Test that None values are converted to 'None' string."""
    headers = {
        "Content-Type": None,
        "User-Agent": "pytest"
    }
    expected = {
        "Content-Type": "None",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.36μs -> 2.02μs (66.4% faster)

def test_header_with_integer_value():
    """Test that integer values are converted to string."""
    headers = {
        "Content-Length": 123,
        "User-Agent": "pytest"
    }
    expected = {
        "Content-Length": "123",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.20μs -> 2.11μs (51.8% faster)

def test_sensitive_header_with_whitespace():
    """Test that sensitive headers with leading/trailing whitespace are not filtered (since not exact match)."""
    headers = {
        " Authorization ": "Bearer xyz",
        "Cookie ": "sessionid=abc",
        "X-API-Key ": "12345",
        "Content-Type": "application/json"
    }
    expected = {
        " Authorization ": "Bearer xyz",
        "Cookie ": "sessionid=abc",
        "X-API-Key ": "12345",
        "Content-Type": "application/json"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.87μs -> 2.45μs (98.9% faster)

def test_sensitive_header_with_prefix_and_suffix():
    """Test that headers starting with sensitive prefixes are filtered, even with suffix."""
    headers = {
        "Authorization-Extra": "should be filtered",
        "Sec-WebSocket-Key": "should be filtered",
        # Starts with the 'cookie' prefix (see test_sensitive_header_cookie_prefix)
        "Cookie-Policy": "should be filtered",
        "X-API-Key-Secondary": "should be filtered",
        "X-Auth-Extra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 5.31μs -> 2.79μs (90.6% faster)

def test_sensitive_header_substring_not_filtered():
    """Test that headers containing sensitive substrings but not as prefix are kept."""
    headers = {
        "My-Authorization": "not filtered",
        "Api-Cookie": "not filtered",
        "Key-X-API": "not filtered",
        "Auth-X": "not filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "My-Authorization": "not filtered",
        "Api-Cookie": "not filtered",
        "Key-X-API": "not filtered",
        "Auth-X": "not filtered",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.85μs -> 2.72μs (78.3% faster)

def test_sensitive_header_bearer():
    """Test that 'bearer' is filtered as a sensitive header."""
    headers = {
        "Bearer": "token",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 2.64μs -> 1.85μs (42.7% faster)

def test_sensitive_header_set_cookie():
    """Test that 'Set-Cookie' is filtered as a sensitive header."""
    headers = {
        "Set-Cookie": "foo=bar",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 2.57μs -> 1.78μs (44.5% faster)

def test_sensitive_header_sec_prefix():
    """Test that headers starting with 'sec-' are filtered."""
    headers = {
        "Sec-WebSocket-Key": "should be filtered",
        "Sec-Fetch-Site": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.76μs -> 2.14μs (75.5% faster)

def test_sensitive_header_x_auth_prefix():
    """Test that headers starting with 'x-auth' are filtered."""
    headers = {
        "X-Auth-Token": "should be filtered",
        "X-Auth-Extra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.68μs -> 2.01μs (83.1% faster)

def test_sensitive_header_x_api_key_prefix():
    """Test that headers starting with 'x-api-key' are filtered."""
    headers = {
        "X-API-Key": "should be filtered",
        "X-API-Key-Secondary": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.50μs -> 2.08μs (68.1% faster)

def test_sensitive_header_cookie_prefix():
    """Test that headers starting with 'cookie' are filtered."""
    headers = {
        "Cookie": "should be filtered",
        "Cookie-Policy": "should be filtered",
        "CookieExtra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.87μs -> 2.28μs (70.1% faster)

# 3. Large Scale Test Cases

def test_large_number_of_safe_headers():
    """Test performance and correctness with 1000 safe headers."""
    headers = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    codeflash_output = filter_sensitive_headers(headers) # 487μs -> 156μs (212% faster)

def test_large_number_of_sensitive_headers():
    """Test that all sensitive headers are filtered out from a large set."""
    sensitive_names = [
        "Authorization", "Cookie", "Set-Cookie", "X-API-Key", "X-Auth-Token", "Bearer",
        "Sec-WebSocket-Key", "Sec-Fetch-Site", "X-Auth-Extra", "X-API-Key-Secondary"
    ]
    headers = {name + f"-{i}": f"value-{i}" for i, name in enumerate(sensitive_names)}
    # A suffixed name is filtered only if it still starts with a sensitive prefix.
    # Add the exact sensitive names as well.
    for name in sensitive_names:
        headers[name] = "should be filtered"
    # Add some safe headers
    for i in range(10):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    # Only safe headers should remain
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(10)}
    codeflash_output = filter_sensitive_headers(headers) # 15.7μs -> 6.71μs (134% faster)

def test_large_mixed_headers():
    """Test with 500 safe and 500 sensitive headers mixed together."""
    headers = {}
    # Add 500 safe headers
    for i in range(500):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    # Add 500 sensitive headers (using sensitive prefixes and exact names)
    for i in range(250):
        headers[f"Authorization-{i}"] = f"sensitive-{i}"
        headers[f"X-API-Key-{i}"] = f"sensitive-{i}"
    for i in range(250):
        headers["Cookie"] = "should be filtered"
        headers["Set-Cookie"] = "should be filtered"
    # Only safe headers should remain
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers) # 428μs -> 130μs (228% faster)

def test_large_headers_all_filtered():
    """Test that all headers are filtered when all are sensitive."""
    headers = {}
    for i in range(1000):
        headers[f"Authorization-{i}"] = f"value-{i}"
    codeflash_output = filter_sensitive_headers(headers) # 289μs -> 97.1μs (198% faster)

def test_large_headers_all_safe():
    """Test that all headers are kept when none are sensitive."""
    headers = {}
    for i in range(1000):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    codeflash_output = filter_sensitive_headers(headers) # 487μs -> 154μs (216% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from collections import OrderedDict
# function to test
from typing import Dict, Mapping

# imports
import pytest
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

# unit tests

# --------------------------
# BASIC TEST CASES
# --------------------------

def test_none_input_returns_none():
    # Should return None if input is None
    codeflash_output = filter_sensitive_headers(None) # 347ns -> 354ns (1.98% slower)

def test_empty_dict_returns_none():
    # Should return None if input is an empty dict
    codeflash_output = filter_sensitive_headers({}) # 344ns -> 347ns (0.865% slower)

def test_no_sensitive_headers_returns_all():
    # No sensitive headers, all should be returned as strings
    headers = {'Content-Type': 'application/json', 'Accept': 'text/html'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.95μs -> 2.47μs (60.2% faster)

def test_sensitive_header_removed():
    # Sensitive header should be removed
    headers = {'Authorization': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.90μs -> 2.06μs (41.0% faster)

def test_case_insensitive_sensitive_header():
    # Should match sensitive headers regardless of case
    headers = {'AUTHORIZATION': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.83μs -> 2.05μs (38.3% faster)

def test_sensitive_prefix_header_removed():
    # Should remove headers with sensitive prefixes (case-insensitive)
    headers = {'X-Api-Key': 'abc', 'X-Auth-Token': 'def', 'Content-Type': 'text/plain'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.89μs -> 2.06μs (40.1% faster)

def test_multiple_sensitive_and_non_sensitive():
    # Only non-sensitive headers should remain
    headers = {
        'Authorization': 'secret',
        'Cookie': 'yum',
        'Set-Cookie': 'id=1',
        'Content-Type': 'application/json',
        'Accept': 'text/html',
        'X-Api-Key': 'hidden'
    }
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.88μs -> 2.67μs (45.1% faster)

def test_value_conversion_to_string():
    # All values should be converted to strings
    headers = {'Content-Length': 123, 'Accept': True}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.54μs -> 2.28μs (55.5% faster)

# --------------------------
# EDGE TEST CASES
# --------------------------

def test_sensitive_header_within_word_not_removed():
    # Only headers that start with the sensitive prefix should be removed, not those containing it
    headers = {'My-Authorization-Header': 'public', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.33μs -> 2.06μs (61.8% faster)

def test_sensitive_header_with_spaces():
    # Headers with spaces should not be matched as sensitive
    headers = {' Authorization ': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.24μs -> 1.94μs (67.1% faster)

def test_sensitive_header_with_mixed_case_and_prefix():
    # Should match prefix regardless of case
    headers = {'x-AUTH-token': 'abc', 'Accept': 'yes'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.56μs -> 1.88μs (35.9% faster)

def test_sensitive_header_as_substring_only():
    # Should not remove headers where sensitive word is only a substring, not prefix
    headers = {'my-cookie-jar': 'open', 'Content-Type': 'text/html'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.22μs -> 2.06μs (56.5% faster)

def test_all_sensitive_headers_returns_none():
    # If all headers are sensitive, should return None
    headers = {
        'Authorization': 'secret',
        'Cookie': 'yum',
        'Set-Cookie': 'id=1',
        'X-Api-Key': 'hidden',
        'X-Auth': 'token'
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.46μs -> 2.07μs (66.9% faster)

def test_ordered_dict_input():
    # Should accept OrderedDict and preserve keys
    headers = OrderedDict([('Accept', 'a'), ('Authorization', 'b'), ('X-Api-Key', 'c')])
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.30μs -> 2.48μs (33.0% faster)


def test_header_key_is_empty_string():
    # Empty string as a key should not be filtered
    headers = {'': 'empty', 'Authorization': 'secret'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.74μs -> 2.60μs (43.5% faster)

def test_header_value_is_none():
    # None values should be converted to string 'None'
    headers = {'Accept': None, 'Authorization': 'secret'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.20μs -> 2.23μs (43.1% faster)

# --------------------------
# LARGE SCALE TEST CASES
# --------------------------

def test_large_number_of_non_sensitive_headers():
    # Should handle a large number of headers efficiently
    headers = {f'Header-{i}': f'value-{i}' for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 250μs -> 81.6μs (207% faster)

def test_large_number_of_sensitive_headers():
    # Should filter out all sensitive headers in a large set
    headers = {f'Authorization-{i}': 'secret' for i in range(500)}
    # Every key starts with the 'authorization' prefix, so all of them should be filtered out
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 147μs -> 50.3μs (192% faster)

def test_mixed_large_sensitive_and_non_sensitive_headers():
    # Mix of sensitive and non-sensitive headers
    headers = {f'Header-{i}': f'value-{i}' for i in range(500)}
    headers.update({f'Authorization-{i}': 'secret' for i in range(500)})
    headers.update({f'X-Api-Key-{i}': 'key' for i in range(200)})
    headers.update({'Accept': 'all'})
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 476μs -> 148μs (221% faster)
    # Only non-sensitive headers should remain
    expected = {f'Header-{i}': f'value-{i}' for i in range(500)}
    expected['Accept'] = 'all'

def test_large_headers_with_varied_cases():
    # Test case-insensitivity with large input
    headers = {f'X-API-KEY-{i}': 'key' for i in range(500)}
    headers.update({f'header-{i}': 'value' for i in range(500)})
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 459μs -> 130μs (251% faster)
    # All 'X-API-KEY-{i}' should be filtered out
    expected = {f'header-{i}': 'value' for i in range(500)}

def test_large_input_all_filtered_out():
    # All headers are sensitive, should return None
    headers = {f'X-Auth-{i}': 'token' for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers) # 237μs -> 54.5μs (335% faster)

def test_large_input_all_non_sensitive_with_some_empty_keys():
    # Large input, some keys are empty strings
    headers = {f'Header-{i}': f'value-{i}' for i in range(499)}
    headers[''] = 'empty'
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 243μs -> 78.3μs (211% faster)
    expected = {f'Header-{i}': f'value-{i}' for i in range(499)}
    expected[''] = 'empty'
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

def test_filter_sensitive_headers():
    filter_sensitive_headers({'': ''})

def test_filter_sensitive_headers_2():
    filter_sensitive_headers({})
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_d0k9fm5y/tmpdeqw_shz/test_concolic_coverage.py::test_filter_sensitive_headers | 2.55μs | 1.76μs | 45.0%✅ |
| codeflash_concolic_d0k9fm5y/tmpdeqw_shz/test_concolic_coverage.py::test_filter_sensitive_headers_2 | 365ns | 378ns | -3.44%⚠️ |

To edit these changes, run git checkout codeflash/optimize-filter_sensitive_headers-mh2vh80f, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 03:38
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 23, 2025