@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 432% (4.32x) speedup for with_content_type in src/deepgram/core/file.py

⏱️ Runtime : 2.85 milliseconds → 536 microseconds (best of 111 runs)

📝 Explanation and details

The optimization achieves a 432% speedup by eliminating two major performance bottlenecks:

1. Removed expensive cast() calls (74% of original runtime)
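The original code wrapped each tuple unpacking in cast() purely for the type checker. At runtime cast() returns its argument unchanged, but every call still pays for a function call and for evaluating the subscripted type expression passed as its first argument: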

  • Line with cast(Tuple[Optional[str], FileContent], file) took 2.98ms (26.7% of total time)
  • Line with cast(...Mapping[str, str]], file) took 5.27ms (47.3% of total time)

The optimized version directly unpacks tuples (filename, content = file), eliminating these costly function calls entirely.

2. Cached len(file) calculation
Instead of calling len(file) multiple times in the conditional chain, the optimization calculates it once and stores it in file_len. This reduces redundant length calculations.
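The two changes above can be sketched side by side. This is a hedged reconstruction based only on the description and tests in this PR, not the actual contents of src/deepgram/core/file.py. The names `with_content_type_cast` and `with_content_type_direct` are hypothetical, and the wording of the `ValueError` message is an assumption.

```python
from typing import IO, Mapping, Optional, Tuple, Union, cast

FileContent = Union[IO[bytes], bytes, str]


def with_content_type_cast(*, file, default_content_type):
    """Assumed shape of the original: cast() on each unpack, len() per branch."""
    if isinstance(file, tuple):
        if len(file) == 2:
            filename, content = cast(Tuple[Optional[str], FileContent], file)
            return (filename, content, default_content_type)
        elif len(file) == 3:
            filename, content, content_type = cast(
                Tuple[Optional[str], FileContent, Optional[str]], file
            )
            return (filename, content, content_type or default_content_type)
        elif len(file) == 4:
            filename, content, content_type, headers = cast(
                Tuple[Optional[str], FileContent, Optional[str], Mapping[str, str]],
                file,
            )
            return (filename, content, content_type or default_content_type, headers)
        raise ValueError(f"Unexpected tuple length: {len(file)}")
    # Bare content: no filename, fall back to the default content type.
    return (None, file, default_content_type)


def with_content_type_direct(*, file, default_content_type):
    """Assumed shape of the optimized version: plain unpacking, len() computed once."""
    if isinstance(file, tuple):
        file_len = len(file)  # cached instead of recomputed in every branch
        if file_len == 2:
            filename, content = file
            return (filename, content, default_content_type)
        elif file_len == 3:
            filename, content, content_type = file
            return (filename, content, content_type or default_content_type)
        elif file_len == 4:
            filename, content, content_type, headers = file
            return (filename, content, content_type or default_content_type, headers)
        raise ValueError(f"Unexpected tuple length: {file_len}")
    return (None, file, default_content_type)
```

Both variants return the same values for every input shape; only the runtime work per call differs. Static type information is lost on the unpacked names in the second variant, which is the trade-off for dropping `cast()`.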

Performance impact by test case:

  • Tuple-based operations see 200-400% speedups: Test cases involving 2-4 element tuples (most common usage) show the biggest gains since they benefit most from removing cast() overhead
  • Simple file cases are largely unchanged: Non-tuple inputs (bytes, strings, IO objects) bypass the tuple processing entirely, so their timings move only by about 15% or less in either direction
  • Error cases improve slightly: Invalid tuple lengths benefit from the cached length calculation

The optimization is particularly effective for applications processing many file objects with metadata (filename, content-type, headers), which is the primary use case for this utility function.
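One way to check the `cast()` overhead in isolation is a small `timeit` comparison. The following is a minimal sketch, not the benchmark Codeflash ran: `unpack_with_cast`, `unpack_direct`, and `compare` are hypothetical helpers, and absolute timings will vary with the machine and Python version.

```python
import timeit
from typing import IO, Optional, Tuple, Union, cast

FileContent = Union[IO[bytes], bytes, str]
SAMPLE = ("file.wav", b"abc")


def unpack_with_cast(file):
    # The type expression is evaluated on every call before cast() runs.
    filename, content = cast(Tuple[Optional[str], FileContent], file)
    return filename, content


def unpack_direct(file):
    # Plain tuple unpacking with no extra function call.
    filename, content = file
    return filename, content


def compare(number=100_000):
    """Return (seconds with cast, seconds without) for `number` calls each."""
    t_cast = timeit.timeit(lambda: unpack_with_cast(SAMPLE), number=number)
    t_direct = timeit.timeit(lambda: unpack_direct(SAMPLE), number=number)
    return t_cast, t_direct


if __name__ == "__main__":
    t_cast, t_direct = compare()
    print(f"with cast: {t_cast:.4f}s, direct: {t_direct:.4f}s")
```

Running the comparison several times and taking the best result, as the report above does with its "best of 111 runs", reduces the influence of scheduling noise.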

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 23 Passed |
| 🌀 Generated Regression Tests | 2073 Passed |
| ⏪ Replay Tests | 9 Passed |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| unit/test_core_file.py::TestFileTyping.test_various_file_content_types | 1.18μs | 1.18μs | 0.000%✅ |
| unit/test_core_file.py::TestWithContentType.test_four_element_tuple_with_headers | 4.59μs | 828ns | 454%✅ |
| unit/test_core_file.py::TestWithContentType.test_four_element_tuple_with_none_content_type | 5.21μs | 847ns | 515%✅ |
| unit/test_core_file.py::TestWithContentType.test_invalid_tuple_length | 1.44μs | 1.41μs | 1.91%✅ |
| unit/test_core_file.py::TestWithContentType.test_io_file_content | 698ns | 697ns | 0.143%✅ |
| unit/test_core_file.py::TestWithContentType.test_simple_file_content | 554ns | 562ns | -1.42%⚠️ |
| unit/test_core_file.py::TestWithContentType.test_single_element_tuple | 1.33μs | 1.31μs | 1.68%✅ |
| unit/test_core_file.py::TestWithContentType.test_string_file_content | 587ns | 543ns | 8.10%✅ |
| unit/test_core_file.py::TestWithContentType.test_three_element_tuple_with_content_type | 3.41μs | 774ns | 340%✅ |
| unit/test_core_file.py::TestWithContentType.test_three_element_tuple_with_none_content_type | 3.01μs | 761ns | 296%✅ |
| unit/test_core_file.py::TestWithContentType.test_two_element_tuple | 3.14μs | 741ns | 323%✅ |
🌀 Generated Regression Tests and Runtime
from io import BytesIO
# function to test
# (copied from deepgram/core/file.py)
from typing import IO, Mapping, Optional, Tuple, Union, cast

# imports
import pytest  # used for our unit tests
from deepgram.core.file import with_content_type

FileContent = Union[IO[bytes], bytes, str]
File = Union[
    # file (or bytes)
    FileContent,
    # (filename, file (or bytes))
    Tuple[Optional[str], FileContent],
    # (filename, file (or bytes), content_type)
    Tuple[Optional[str], FileContent, Optional[str]],
    # (filename, file (or bytes), content_type, headers)
    Tuple[
        Optional[str],
        FileContent,
        Optional[str],
        Mapping[str, str],
    ],
]

# unit tests

# ----- BASIC TEST CASES -----

def test_bytes_file_basic():
    # Basic: file is bytes, no filename/content_type
    codeflash_output = with_content_type(file=b"abc", default_content_type="audio/wav"); result = codeflash_output # 605ns -> 712ns (15.0% slower)

def test_str_file_basic():
    # Basic: file is str, no filename/content_type
    codeflash_output = with_content_type(file="hello", default_content_type="text/plain"); result = codeflash_output # 583ns -> 605ns (3.64% slower)

def test_io_file_basic():
    # Basic: file is IO[bytes], no filename/content_type
    bio = BytesIO(b"data")
    codeflash_output = with_content_type(file=bio, default_content_type="application/octet-stream"); result = codeflash_output # 651ns -> 585ns (11.3% faster)

def test_tuple_filename_bytes():
    # Basic: file is (filename, bytes)
    codeflash_output = with_content_type(file=("file.wav", b"abc"), default_content_type="audio/wav"); result = codeflash_output # 2.49μs -> 753ns (231% faster)

def test_tuple_filename_str():
    # Basic: file is (filename, str)
    codeflash_output = with_content_type(file=("file.txt", "hello"), default_content_type="text/plain"); result = codeflash_output # 2.52μs -> 789ns (219% faster)

def test_tuple_filename_io():
    # Basic: file is (filename, IO[bytes])
    bio = BytesIO(b"abc")
    codeflash_output = with_content_type(file=("file.bin", bio), default_content_type="application/octet-stream"); result = codeflash_output # 2.44μs -> 748ns (226% faster)

def test_tuple_filename_bytes_content_type():
    # Basic: file is (filename, bytes, content_type)
    codeflash_output = with_content_type(file=("file.wav", b"abc", "audio/wav"), default_content_type="audio/mp3"); result = codeflash_output # 2.89μs -> 913ns (217% faster)

def test_tuple_filename_bytes_none_content_type():
    # Basic: file is (filename, bytes, None) -- should default
    codeflash_output = with_content_type(file=("file.wav", b"abc", None), default_content_type="audio/mp3"); result = codeflash_output # 2.86μs -> 876ns (227% faster)

def test_tuple_filename_bytes_empty_content_type():
    # Basic: file is (filename, bytes, "") -- should default
    codeflash_output = with_content_type(file=("file.wav", b"abc", ""), default_content_type="audio/mp3"); result = codeflash_output # 2.73μs -> 894ns (205% faster)

def test_tuple_filename_bytes_content_type_headers():
    # Basic: file is (filename, bytes, content_type, headers)
    headers = {"X-Test": "val"}
    codeflash_output = with_content_type(file=("file.wav", b"abc", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 4.22μs -> 911ns (363% faster)

def test_tuple_filename_bytes_none_content_type_headers():
    # Basic: file is (filename, bytes, None, headers) -- should default
    headers = {"X-Test": "val"}
    codeflash_output = with_content_type(file=("file.wav", b"abc", None, headers), default_content_type="audio/mp3"); result = codeflash_output # 3.76μs -> 892ns (322% faster)

def test_tuple_filename_bytes_empty_content_type_headers():
    # Basic: file is (filename, bytes, "", headers) -- should default
    headers = {"X-Test": "val"}
    codeflash_output = with_content_type(file=("file.wav", b"abc", "", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.79μs -> 886ns (328% faster)

def test_tuple_none_filename_bytes():
    # Basic: file is (None, bytes)
    codeflash_output = with_content_type(file=(None, b"abc"), default_content_type="audio/wav"); result = codeflash_output # 2.39μs -> 797ns (200% faster)

def test_tuple_none_filename_bytes_content_type():
    # Basic: file is (None, bytes, content_type)
    codeflash_output = with_content_type(file=(None, b"abc", "audio/wav"), default_content_type="audio/mp3"); result = codeflash_output # 2.78μs -> 877ns (217% faster)

def test_tuple_none_filename_bytes_none_content_type():
    # Basic: file is (None, bytes, None)
    codeflash_output = with_content_type(file=(None, b"abc", None), default_content_type="audio/mp3"); result = codeflash_output # 2.78μs -> 866ns (221% faster)

def test_tuple_none_filename_bytes_content_type_headers():
    # Basic: file is (None, bytes, content_type, headers)
    headers = {"X-Test": "val"}
    codeflash_output = with_content_type(file=(None, b"abc", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.89μs -> 897ns (333% faster)

def test_tuple_none_filename_bytes_none_content_type_headers():
    # Basic: file is (None, b"abc", None, headers)
    headers = {"X-Test": "val"}
    codeflash_output = with_content_type(file=(None, b"abc", None, headers), default_content_type="audio/mp3"); result = codeflash_output # 3.75μs -> 861ns (336% faster)

# ----- EDGE TEST CASES -----

def test_tuple_length_1_raises():
    # Edge: tuple of length 1 is invalid
    with pytest.raises(ValueError) as excinfo:
        with_content_type(file=(b"abc",), default_content_type="audio/wav") # 1.38μs -> 1.34μs (3.06% faster)

def test_tuple_length_5_raises():
    # Edge: tuple of length 5 is invalid
    with pytest.raises(ValueError) as excinfo:
        with_content_type(file=("f", b"abc", "audio/wav", {"X": "Y"}, "extra"), default_content_type="audio/wav") # 1.36μs -> 1.27μs (7.50% faster)

def test_empty_bytes_file():
    # Edge: file is empty bytes
    codeflash_output = with_content_type(file=b"", default_content_type="audio/wav"); result = codeflash_output # 593ns -> 679ns (12.7% slower)

def test_empty_str_file():
    # Edge: file is empty string
    codeflash_output = with_content_type(file="", default_content_type="text/plain"); result = codeflash_output # 616ns -> 635ns (2.99% slower)

def test_empty_io_file():
    # Edge: file is empty IO[bytes]
    bio = BytesIO(b"")
    codeflash_output = with_content_type(file=bio, default_content_type="application/octet-stream"); result = codeflash_output # 658ns -> 617ns (6.65% faster)

def test_filename_is_empty_string():
    # Edge: filename is empty string
    codeflash_output = with_content_type(file=("", b"abc"), default_content_type="audio/wav"); result = codeflash_output # 2.66μs -> 812ns (228% faster)

def test_filename_is_none():
    # Edge: filename is None
    codeflash_output = with_content_type(file=(None, b"abc"), default_content_type="audio/wav"); result = codeflash_output # 2.45μs -> 779ns (214% faster)

def test_content_type_is_empty_string():
    # Edge: content_type is empty string, should default
    codeflash_output = with_content_type(file=("file.wav", b"abc", ""), default_content_type="audio/mp3"); result = codeflash_output # 2.80μs -> 910ns (208% faster)

def test_content_type_is_none():
    # Edge: content_type is None, should default
    codeflash_output = with_content_type(file=("file.wav", b"abc", None), default_content_type="audio/mp3"); result = codeflash_output # 2.76μs -> 878ns (214% faster)

def test_headers_are_empty_dict():
    # Edge: headers is empty dict
    codeflash_output = with_content_type(file=("file.wav", b"abc", "audio/wav", {}), default_content_type="audio/mp3"); result = codeflash_output # 4.17μs -> 922ns (352% faster)

def test_headers_are_large_dict():
    # Edge: headers is a large dict
    headers = {f"Key{i}": f"Val{i}" for i in range(100)}
    codeflash_output = with_content_type(file=("file.wav", b"abc", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.94μs -> 908ns (334% faster)

def test_file_is_str_with_unicode():
    # Edge: file is str with unicode
    codeflash_output = with_content_type(file="こんにちは", default_content_type="text/plain; charset=utf-8"); result = codeflash_output # 594ns -> 600ns (1.00% slower)

def test_file_is_bytes_with_non_ascii():
    # Edge: file is bytes with non-ascii
    codeflash_output = with_content_type(file=b"\xff\xfe\xfd", default_content_type="application/octet-stream"); result = codeflash_output # 571ns -> 538ns (6.13% faster)

# ----- LARGE SCALE TEST CASES -----

def test_large_bytes_file():
    # Large scale: file is large bytes
    large_bytes = b"a" * 1000
    codeflash_output = with_content_type(file=large_bytes, default_content_type="audio/wav"); result = codeflash_output # 571ns -> 594ns (3.87% slower)

def test_large_str_file():
    # Large scale: file is large string
    large_str = "a" * 1000
    codeflash_output = with_content_type(file=large_str, default_content_type="text/plain"); result = codeflash_output # 540ns -> 587ns (8.01% slower)

def test_large_io_file():
    # Large scale: file is large IO[bytes]
    bio = BytesIO(b"a" * 1000)
    codeflash_output = with_content_type(file=bio, default_content_type="application/octet-stream"); result = codeflash_output # 617ns -> 644ns (4.19% slower)

def test_large_headers_dict():
    # Large scale: file is (filename, bytes, content_type, large headers)
    headers = {f"X-Header-{i}": f"Value-{i}" for i in range(1000)}
    codeflash_output = with_content_type(file=("bigfile.wav", b"abc", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 4.44μs -> 938ns (373% faster)

def test_many_files_in_loop():
    # Large scale: test many files in a loop, all should resolve correctly
    for i in range(1000):
        filename = f"file{i}.wav"
        content = bytes([i % 256])
        codeflash_output = with_content_type(file=(filename, content), default_content_type="audio/wav"); result = codeflash_output # 967μs -> 216μs (346% faster)

def test_many_headers_in_loop():
    # Large scale: test many headers in a loop
    for i in range(1000):
        headers = {f"X-{i}": f"V-{i}"}
        codeflash_output = with_content_type(file=("file.wav", b"abc", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 1.68ms -> 245μs (584% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from io import BytesIO
# function to test
# (as provided, unchanged)
from typing import IO, Mapping, Optional, Tuple, Union, cast

# imports
import pytest  # used for our unit tests
from deepgram.core.file import with_content_type

FileContent = Union[IO[bytes], bytes, str]
File = Union[
    # file (or bytes)
    FileContent,
    # (filename, file (or bytes))
    Tuple[Optional[str], FileContent],
    # (filename, file (or bytes), content_type)
    Tuple[Optional[str], FileContent, Optional[str]],
    # (filename, file (or bytes), content_type, headers)
    Tuple[
        Optional[str],
        FileContent,
        Optional[str],
        Mapping[str, str],
    ],
]

# unit tests

# ---- Basic Test Cases ----

def test_bytes_content_as_file():
    # Basic: file is bytes, should wrap with None filename and default content type
    codeflash_output = with_content_type(file=b"abc", default_content_type="audio/wav"); result = codeflash_output # 874ns -> 872ns (0.229% faster)

def test_str_content_as_file():
    # Basic: file is str, should wrap with None filename and default content type
    codeflash_output = with_content_type(file="hello", default_content_type="text/plain"); result = codeflash_output # 638ns -> 668ns (4.49% slower)

def test_io_content_as_file():
    # Basic: file is IO[bytes], should wrap with None filename and default content type
    bio = BytesIO(b"data")
    codeflash_output = with_content_type(file=bio, default_content_type="application/octet-stream"); result = codeflash_output # 701ns -> 695ns (0.863% faster)

def test_tuple_filename_and_bytes():
    # Basic: file is (filename, bytes)
    codeflash_output = with_content_type(file=("file.wav", b"audio"), default_content_type="audio/wav"); result = codeflash_output # 2.72μs -> 840ns (224% faster)

def test_tuple_filename_and_str():
    # Basic: file is (filename, str)
    codeflash_output = with_content_type(file=("data.txt", "hello"), default_content_type="text/plain"); result = codeflash_output # 2.44μs -> 841ns (191% faster)

def test_tuple_filename_and_io():
    # Basic: file is (filename, IO[bytes])
    bio = BytesIO(b"abc")
    codeflash_output = with_content_type(file=("file.bin", bio), default_content_type="application/octet-stream"); result = codeflash_output # 2.47μs -> 819ns (201% faster)

def test_tuple_filename_content_contenttype():
    # Basic: file is (filename, bytes, content_type)
    codeflash_output = with_content_type(file=("file.wav", b"audio", "audio/wav"), default_content_type="audio/mp3"); result = codeflash_output # 2.97μs -> 905ns (228% faster)

def test_tuple_filename_content_contenttype_none():
    # Basic: file is (filename, bytes, None content_type)
    codeflash_output = with_content_type(file=("file.wav", b"audio", None), default_content_type="audio/mp3"); result = codeflash_output # 2.79μs -> 902ns (209% faster)

def test_tuple_filename_none_content_contenttype():
    # Basic: file is (None, bytes, content_type)
    codeflash_output = with_content_type(file=(None, b"audio", "audio/wav"), default_content_type="audio/mp3"); result = codeflash_output # 2.80μs -> 875ns (220% faster)

def test_tuple_filename_content_contenttype_empty_string():
    # Edge: file is (filename, bytes, empty string content_type)
    codeflash_output = with_content_type(file=("file.wav", b"audio", ""), default_content_type="audio/mp3"); result = codeflash_output # 2.76μs -> 885ns (212% faster)

def test_tuple_filename_content_contenttype_headers():
    # Basic: file is (filename, bytes, content_type, headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=("file.wav", b"audio", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 4.23μs -> 928ns (356% faster)

def test_tuple_filename_content_none_headers():
    # Basic: file is (filename, bytes, None content_type, headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=("file.wav", b"audio", None, headers), default_content_type="audio/mp3"); result = codeflash_output # 3.87μs -> 952ns (306% faster)

def test_tuple_filename_content_empty_contenttype_headers():
    # Edge: file is (filename, bytes, empty string content_type, headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=("file.wav", b"audio", "", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.63μs -> 891ns (307% faster)

def test_tuple_none_filename_content_contenttype_headers():
    # Edge: file is (None, bytes, content_type, headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=(None, b"audio", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.69μs -> 891ns (315% faster)

def test_tuple_none_filename_content_none_contenttype_headers():
    # Edge: file is (None, bytes, None, headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=(None, b"audio", None, headers), default_content_type="audio/mp3"); result = codeflash_output # 3.68μs -> 830ns (343% faster)

def test_tuple_none_filename_content_empty_contenttype_headers():
    # Edge: file is (None, bytes, "", headers)
    headers = {"X-Header": "value"}
    codeflash_output = with_content_type(file=(None, b"audio", "", headers), default_content_type="audio/mp3"); result = codeflash_output # 3.70μs -> 852ns (334% faster)

# ---- Edge Test Cases ----

def test_tuple_length_1_raises():
    # Edge: tuple of length 1 is invalid
    with pytest.raises(ValueError) as excinfo:
        with_content_type(file=("file.wav",), default_content_type="audio/wav") # 1.39μs -> 1.35μs (3.04% faster)

def test_tuple_length_0_raises():
    # Edge: tuple of length 0 is invalid
    with pytest.raises(ValueError) as excinfo:
        with_content_type(file=(), default_content_type="audio/wav") # 1.37μs -> 1.36μs (0.220% faster)

def test_tuple_length_5_raises():
    # Edge: tuple of length 5 is invalid
    with pytest.raises(ValueError) as excinfo:
        with_content_type(file=("file.wav", b"audio", "audio/wav", {}, "extra"), default_content_type="audio/wav") # 1.32μs -> 1.32μs (0.152% slower)

def test_content_type_is_none_and_default_is_none():
    # Edge: both content_type and default_content_type are None
    codeflash_output = with_content_type(file=("file.wav", b"audio", None), default_content_type=None); result = codeflash_output # 2.94μs -> 961ns (206% faster)

def test_content_type_is_empty_and_default_is_none():
    # Edge: content_type is empty string, default_content_type is None
    codeflash_output = with_content_type(file=("file.wav", b"audio", ""), default_content_type=None); result = codeflash_output # 2.81μs -> 887ns (217% faster)

def test_file_is_empty_bytes():
    # Edge: file is empty bytes
    codeflash_output = with_content_type(file=b"", default_content_type="audio/wav"); result = codeflash_output # 640ns -> 647ns (1.08% slower)

def test_file_is_empty_str():
    # Edge: file is empty str
    codeflash_output = with_content_type(file="", default_content_type="text/plain"); result = codeflash_output # 602ns -> 610ns (1.31% slower)

def test_file_is_empty_io():
    # Edge: file is empty IO[bytes]
    bio = BytesIO(b"")
    codeflash_output = with_content_type(file=bio, default_content_type="application/octet-stream"); result = codeflash_output # 626ns -> 626ns (0.000% faster)

def test_file_is_large_bytes():
    # Large: file is large bytes object (under 1000 bytes)
    data = b"x" * 999
    codeflash_output = with_content_type(file=data, default_content_type="audio/wav"); result = codeflash_output # 561ns -> 606ns (7.43% slower)

def test_file_is_large_str():
    # Large: file is large str object (under 1000 chars)
    data = "y" * 999
    codeflash_output = with_content_type(file=data, default_content_type="text/plain"); result = codeflash_output # 575ns -> 589ns (2.38% slower)

def test_file_is_large_io():
    # Large: file is large IO[bytes] object (under 1000 bytes)
    data = BytesIO(b"z" * 999)
    codeflash_output = with_content_type(file=data, default_content_type="application/octet-stream"); result = codeflash_output # 636ns -> 661ns (3.78% slower)

def test_tuple_filename_content_contenttype_large_headers():
    # Large: headers dict with many entries (under 1000)
    headers = {f"Header-{i}": str(i) for i in range(999)}
    codeflash_output = with_content_type(file=("file.wav", b"audio", "audio/wav", headers), default_content_type="audio/mp3"); result = codeflash_output # 4.59μs -> 915ns (402% faster)

def test_tuple_filename_content_none_contenttype_large_headers():
    # Large: headers dict with many entries, content_type None
    headers = {f"Header-{i}": str(i) for i in range(999)}
    codeflash_output = with_content_type(file=("file.wav", b"audio", None, headers), default_content_type="audio/mp3"); result = codeflash_output # 4.00μs -> 852ns (370% faster)

def test_tuple_none_filename_large_content_large_headers():
    # Large: None filename, large bytes, large headers
    data = b"x" * 999
    headers = {f"Header-{i}": str(i) for i in range(999)}
    codeflash_output = with_content_type(file=(None, data, None, headers), default_content_type="audio/wav"); result = codeflash_output # 3.90μs -> 891ns (338% faster)

def test_tuple_filename_large_str_content_large_headers():
    # Large: filename, large str, large headers
    data = "y" * 999
    headers = {f"Header-{i}": str(i) for i in range(999)}
    codeflash_output = with_content_type(file=("bigfile.txt", data, None, headers), default_content_type="text/plain"); result = codeflash_output # 3.75μs -> 908ns (313% faster)

def test_tuple_filename_large_io_content_large_headers():
    # Large: filename, large IO[bytes], large headers
    data = BytesIO(b"z" * 999)
    headers = {f"Header-{i}": str(i) for i in range(999)}
    codeflash_output = with_content_type(file=("bigfile.bin", data, None, headers), default_content_type="application/octet-stream"); result = codeflash_output # 3.93μs -> 880ns (347% faster)

# ---- Robustness/Mutation Testing ----

@pytest.mark.parametrize("file,default_content_type,expected", [
    # Basic bytes
    (b"abc", "audio/wav", (None, b"abc", "audio/wav")),
    # Basic str
    ("hello", "text/plain", (None, "hello", "text/plain")),
    # Basic IO
    (BytesIO(b"data"), "application/octet-stream", (None, BytesIO(b"data"), "application/octet-stream")),  # Note: BytesIO objects are not equal unless same instance
    # Tuple (filename, content)
    (("file.wav", b"audio"), "audio/wav", ("file.wav", b"audio", "audio/wav")),
    # Tuple (filename, content, content_type)
    (("file.wav", b"audio", "audio/wav"), "audio/mp3", ("file.wav", b"audio", "audio/wav")),
    # Tuple (filename, content, None)
    (("file.wav", b"audio", None), "audio/mp3", ("file.wav", b"audio", "audio/mp3")),
    # Tuple (filename, content, "", headers)
    (("file.wav", b"audio", "", {"X": "Y"}), "audio/mp3", ("file.wav", b"audio", "audio/mp3", {"X": "Y"})),
])
def test_parametrized_basic_cases(file, default_content_type, expected):
    # For BytesIO, compare type and .getvalue() if present
    codeflash_output = with_content_type(file=file, default_content_type=default_content_type); result = codeflash_output # 14.5μs -> 5.31μs (173% faster)
    if isinstance(file, BytesIO):
        pass
    elif isinstance(file, tuple) and len(file) > 1 and isinstance(file[1], BytesIO):
        if len(file) == 4:
            pass
    else:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from deepgram.core.file import with_content_type

def test_with_content_type():
    with_content_type(file=('', '', '', {}), default_content_type='')

def test_with_content_type_2():
    with_content_type(file=('', '', ''), default_content_type='')

def test_with_content_type_3():
    with_content_type(file=('', ''), default_content_type='')

def test_with_content_type_4():
    with_content_type(file='', default_content_type='')
⏪ Replay Tests and Runtime
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_7zeygj7s/tmpxrd1qf7d/test_concolic_coverage.py::test_with_content_type | 4.16μs | 901ns | 362%✅ |
| codeflash_concolic_7zeygj7s/tmpxrd1qf7d/test_concolic_coverage.py::test_with_content_type_2 | 2.87μs | 866ns | 231%✅ |
| codeflash_concolic_7zeygj7s/tmpxrd1qf7d/test_concolic_coverage.py::test_with_content_type_3 | 2.48μs | 768ns | 223%✅ |
| codeflash_concolic_7zeygj7s/tmpxrd1qf7d/test_concolic_coverage.py::test_with_content_type_4 | 565ns | 586ns | -3.58%⚠️ |

To edit these changes, run `git checkout codeflash/optimize-with_content_type-mh4hq3mo`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 06:49
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 24, 2025