@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 14% (0.14x) speedup for AudioTranscriptionRequestStream.serialize_model in src/mistralai/models/audiotranscriptionrequeststream.py

⏱️ Runtime : 307 microseconds → 269 microseconds (best of 108 runs)
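
The headline figure is consistent with measuring the speedup as time saved relative to the optimized runtime. That formula is an assumption here, since the comment does not spell it out; the sketch below only reproduces the arithmetic from the two reported runtimes.

```python
# Reported best-of-108 runtimes, in microseconds.
original_us = 307.0
optimized_us = 269.0

# Assumed definition: time saved relative to the optimized runtime.
speedup = (original_us - optimized_us) / optimized_us

# Alternative view: fraction of the original runtime that was eliminated.
time_saved = (original_us - optimized_us) / original_us

print(f"speedup: {speedup:.1%}")        # speedup: 14.1%
print(f"time saved: {time_saved:.1%}")  # time saved: 12.4%
```

Under this reading, "14%" and "0.14x" are the same number expressed two ways, and the wall-clock reduction is closer to 12%.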

📝 Explanation and details

The optimized code achieves a 14% speedup through three small changes to the serializer's per-field bookkeeping:

  1. Lists to Sets conversion: Changed optional_fields, nullable_fields, and null_default_fields from lists to sets. This converts membership tests (k in optional_fields) from O(n) linear searches to O(1) constant-time lookups, which is particularly beneficial when checking multiple field memberships in the loop.

  2. Direct set membership test: Replaced self.__pydantic_fields_set__.intersection({n}) with n in self.__pydantic_fields_set__. The original creates a temporary set and performs an intersection operation for each field, while the optimized version uses a direct membership test - much more efficient for single-element checks.

  3. Removed unnecessary pop operation: Eliminated serialized.pop(k, None) since the popped value wasn't used. Although dict.pop is O(1) on average, dropping it still saves one method call and one dictionary mutation per field iteration.
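
The three changes above can be sketched as a before/after pair. This is a simplified, hypothetical stand-in for the serializer loop, not the SDK's actual code: the function names, the `fields` name-to-alias mapping, and the sample data are assumptions made for illustration, and only the identifiers quoted in the list (`optional_fields`, `nullable_fields`, `serialized.pop(k, None)`, `.intersection({n})`) come from the description.

```python
def serialize_before(serialized, fields, fields_set):
    # Lists: each `k in ...` test scans the list linearly.
    optional_fields = ["file", "file_url", "language", "temperature"]
    nullable_fields = ["file_url", "language", "temperature"]
    m = {}
    for n, alias in fields.items():
        k = alias or n
        val = serialized.get(k)
        serialized.pop(k, None)  # return value is discarded
        optional_nullable = k in optional_fields and k in nullable_fields
        # Builds a temporary one-element set plus a result set per field.
        is_set = fields_set.intersection({n})
        if val is not None:
            m[k] = val
        elif k not in optional_fields or (optional_nullable and is_set):
            m[k] = val
    return m


def serialize_after(serialized, fields, fields_set):
    # Sets: hash-based membership tests.
    optional_fields = {"file", "file_url", "language", "temperature"}
    nullable_fields = {"file_url", "language", "temperature"}
    m = {}
    for n, alias in fields.items():
        k = alias or n
        val = serialized.get(k)  # no pop: `serialized` is left untouched
        optional_nullable = k in optional_fields and k in nullable_fields
        is_set = n in fields_set  # direct membership test
        if val is not None:
            m[k] = val
        elif k not in optional_fields or (optional_nullable and is_set):
            m[k] = val
    return m


# Hypothetical sample data: field name -> alias (None means "no alias").
fields = {"model": None, "STREAM": "stream", "file_url": None,
          "language": None, "temperature": None}
handler_output = {"model": "m", "stream": True, "file_url": None,
                  "language": "en", "temperature": None}
explicitly_set = {"model", "language", "temperature"}

before = serialize_before(dict(handler_output), fields, explicitly_set)
after = serialize_after(dict(handler_output), fields, explicitly_set)
print(before == after)
```

One behavioral difference is worth noting in this sketch: the "before" version empties the dictionary it is given, while the "after" version leaves it intact. Removing the pop is only safe if nothing reads that dictionary afterwards, which is assumed to hold for the real serializer.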

The test results show improvements in every scenario, ranging from roughly 10% to 24% per test. Gains of 20-24% appear in several tests, including the one that explicitly sets every nullable field to None (23.6%). These optimizations are most effective for models with many optional fields, as the serialization logic performs multiple containment checks per field during the iteration loop.
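
To see how much the containment checks themselves contribute, they can be timed in isolation. The following is a minimal sketch using the standard library's `timeit`; the helper names and the `fields_set` contents are hypothetical, the field names are borrowed from the generated tests, and the absolute numbers will vary by machine and Python version. With only six fields, the list-versus-set difference may be small.

```python
import timeit

# Field names borrowed from the generated tests; the containers are illustrative.
FIELDS = ["file", "file_url", "file_id", "language",
          "temperature", "timestamp_granularities"]
optional_list = list(FIELDS)
optional_set = set(FIELDS)
fields_set = {"model", "language"}  # hypothetical "explicitly set" fields


def list_checks():
    return [k in optional_list for k in FIELDS]


def set_checks():
    return [k in optional_set for k in FIELDS]


def intersection_checks():
    # Mirrors the original pattern: a temporary set per field.
    return [bool(fields_set.intersection({k})) for k in FIELDS]


def direct_checks():
    return [k in fields_set for k in FIELDS]


def best_of(func, number=20_000, repeat=5):
    # Take the minimum over several repeats to reduce noise.
    return min(timeit.repeat(func, number=number, repeat=repeat))


timings = {f.__name__: best_of(f)
           for f in (list_checks, set_checks, intersection_checks, direct_checks)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```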

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 35 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from typing import List, Literal, Optional

import pydantic
# imports
import pytest
from mistralai.models.audiotranscriptionrequeststream import \
    AudioTranscriptionRequestStream
from mistralai.models.file import File
from mistralai.models.timestampgranularity import TimestampGranularity
from mistralai.types import UNSET, UNSET_SENTINEL, BaseModel, OptionalNullable
from mistralai.utils import (FieldMetadata, MultipartFormMetadata,
                             validate_const)
from pydantic import model_serializer
from pydantic.functional_validators import AfterValidator
from typing_extensions import Annotated


# Minimal stubs for dependencies for testing purposes
class File:
    def __init__(self, filename, content):
        self.filename = filename
        self.content = content

    def __eq__(self, other):
        return (
            isinstance(other, File)
            and self.filename == other.filename
            and self.content == other.content
        )

class TimestampGranularity(str):
    WORD = "word"
    SEGMENT = "segment"

    def __new__(cls, value):
        if value not in (cls.WORD, cls.SEGMENT):
            raise ValueError("Invalid granularity")
        return str.__new__(cls, value)

# UNSET/UNSET_SENTINEL for optional/nullable fields
class _UnsetType:
    pass
UNSET = _UnsetType()
UNSET_SENTINEL = UNSET

# OptionalNullable type
class OptionalNullable:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if isinstance(other, OptionalNullable):
            return self.value == other.value
        return self.value == other

def validate_const(val):
    def inner(x):
        if x != val:
            raise ValueError("Must be constant %r" % val)
        return x
    return inner

class FieldMetadata:
    def __init__(self, multipart=False):
        self.multipart = multipart

class MultipartFormMetadata:
    def __init__(self, file=False):
        self.file = file
from mistralai.models.audiotranscriptionrequeststream import \
    AudioTranscriptionRequestStream


# Helper function for handler (Pydantic serialization)
def dummy_handler(obj):
    # Simulate pydantic serialization: convert model to dict, respecting aliases
    d = {}
    for name, field in obj.model_fields.items():
        alias = field.alias or name
        d[alias] = getattr(obj, name)
    return d

# ------------------- UNIT TESTS -------------------

# 1. BASIC TEST CASES

def test_basic_only_required_field():
    # Only required field 'model' is set
    ats = AudioTranscriptionRequestStream(model="test-model")
    codeflash_output = ats.serialize_model(dummy_handler); result = codeflash_output # 13.3μs -> 11.0μs (21.3% faster)



def test_basic_stream_alias():
    # 'STREAM' field should appear as 'stream' in output
    ats = AudioTranscriptionRequestStream(model="foo")
    codeflash_output = ats.serialize_model(dummy_handler); result = codeflash_output # 12.9μs -> 11.0μs (17.2% faster)

# 2. EDGE TEST CASES




def test_edge_timestamp_granularities_empty_and_none():
    # timestamp_granularities=None should not appear
    ats = AudioTranscriptionRequestStream(model="test", timestamp_granularities=None)
    codeflash_output = ats.serialize_model(dummy_handler); result = codeflash_output # 12.9μs -> 10.7μs (20.1% faster)

    # timestamp_granularities set to empty list should appear
    ats2 = AudioTranscriptionRequestStream(model="test", timestamp_granularities=[])
    codeflash_output = ats2.serialize_model(dummy_handler); result2 = codeflash_output # 6.40μs -> 5.26μs (21.7% faster)

def test_edge_stream_false_raises():
    # STREAM must be True due to validate_const
    with pytest.raises(ValueError):
        AudioTranscriptionRequestStream(model="test", STREAM=False)

def test_edge_unset_fields_not_included():
    # All optional fields left unset should not appear in output
    ats = AudioTranscriptionRequestStream(model="test")
    codeflash_output = ats.serialize_model(dummy_handler); result = codeflash_output # 10.2μs -> 8.62μs (18.3% faster)
    optional_keys = {"file", "file_url", "file_id", "language", "temperature", "timestamp_granularities"}
    for k in optional_keys:
        pass


def test_edge_required_field_missing_raises():
    # model is required; omitting it should raise a ValidationError
    with pytest.raises(pydantic.ValidationError):
        AudioTranscriptionRequestStream()

def test_edge_stream_alias_and_inheritance():
    # Test that 'stream' alias is always present and correct
    ats = AudioTranscriptionRequestStream(model="test")
    codeflash_output = ats.serialize_model(dummy_handler); result = codeflash_output # 12.6μs -> 10.6μs (19.2% faster)






#------------------------------------------------
from __future__ import annotations

from typing import List, Literal, Optional

import pydantic
# imports
import pytest  # used for our unit tests
from mistralai.models.audiotranscriptionrequeststream import \
    AudioTranscriptionRequestStream
from mistralai.models.file import File
from mistralai.models.timestampgranularity import TimestampGranularity
from mistralai.types import UNSET, UNSET_SENTINEL, BaseModel, OptionalNullable
from mistralai.utils import (FieldMetadata, MultipartFormMetadata,
                             validate_const)
from pydantic import model_serializer
from pydantic.functional_validators import AfterValidator
from typing_extensions import Annotated

# unit tests

# Helper function for model_serializer handler
def passthrough_handler(obj):
    # Simulate Pydantic's dict serialization
    return obj.model_dump(by_alias=True, exclude_unset=False)

# Helper for dummy File and TimestampGranularity
class DummyFile:
    def __init__(self, name):
        self.name = name
    def model_dump(self, **kwargs):
        return {"name": self.name}

class DummyTimestampGranularity:
    def __init__(self, value):
        self.value = value
    def model_dump(self, **kwargs):
        return self.value

# Patch File and TimestampGranularity for test purposes
AudioTranscriptionRequestStream.__annotations__['file'] = Optional[DummyFile]
AudioTranscriptionRequestStream.__annotations__['timestamp_granularities'] = Optional[List[DummyTimestampGranularity]]

# ----------- BASIC TEST CASES ------------

def test_serialize_model_basic_required_only():
    # Only required field (model) provided
    obj = AudioTranscriptionRequestStream(model="whisper-1")
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 38.6μs -> 35.1μs (9.99% faster)


def test_serialize_model_basic_some_optional_fields():
    # Some optional fields set, some left unset
    obj = AudioTranscriptionRequestStream(
        model="whisper-1",
        file_url="https://example.com/audio.mp3"
    )
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 36.9μs -> 33.6μs (9.91% faster)
    # file, file_id, language, temperature, timestamp_granularities should not be present
    for k in ["file", "file_id", "language", "temperature", "timestamp_granularities"]:
        pass

# ----------- EDGE TEST CASES ------------

def test_serialize_model_edge_none_values():
    # Explicitly set nullable fields to None
    obj = AudioTranscriptionRequestStream(
        model="whisper-1",
        file=None,
        file_url=None,
        file_id=None,
        language=None,
        temperature=None,
        timestamp_granularities=None
    )
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 18.1μs -> 14.6μs (23.6% faster)


def test_serialize_model_edge_empty_strings_and_lists():
    # Set some fields to empty string or empty list
    obj = AudioTranscriptionRequestStream(
        model="whisper-1",
        file_url="",
        file_id="",
        language="",
        temperature=0.0,
        timestamp_granularities=[]
    )
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 24.2μs -> 20.4μs (19.0% faster)

def test_serialize_model_edge_false_and_zero_values():
    # Set temperature to 0 and check it's serialized
    obj = AudioTranscriptionRequestStream(
        model="whisper-1",
        temperature=0
    )
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 31.4μs -> 28.5μs (10.2% faster)

def test_serialize_model_edge_stream_alias():
    # The STREAM field uses alias "stream"
    obj = AudioTranscriptionRequestStream(model="whisper-1")
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 30.3μs -> 26.6μs (14.0% faster)

def test_serialize_model_edge_timestamp_granularities_none():
    # timestamp_granularities explicitly None
    obj = AudioTranscriptionRequestStream(model="whisper-1", timestamp_granularities=None)
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 29.2μs -> 26.3μs (11.0% faster)

def test_serialize_model_edge_timestamp_granularities_empty():
    # timestamp_granularities empty list
    obj = AudioTranscriptionRequestStream(model="whisper-1", timestamp_granularities=[])
    codeflash_output = obj.serialize_model(passthrough_handler); result = codeflash_output # 29.7μs -> 26.4μs (12.4% faster)

# ----------- LARGE SCALE TEST CASES ------------




def test_serialize_model_mutation_required_field_missing():
    # If model is missing, should raise ValidationError
    with pytest.raises(pydantic.ValidationError):
        AudioTranscriptionRequestStream()

def test_serialize_model_mutation_wrong_type():
    # If wrong type for temperature, should raise ValidationError
    with pytest.raises(pydantic.ValidationError):
        AudioTranscriptionRequestStream(model="whisper-1", temperature="hot")

def test_serialize_model_mutation_wrong_type_timestamp_granularities():
    # If timestamp_granularities is not a list, should raise ValidationError
    with pytest.raises(pydantic.ValidationError):
        AudioTranscriptionRequestStream(model="whisper-1", timestamp_granularities="segment")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-AudioTranscriptionRequestStream.serialize_model-mh4lea0f and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 08:32
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 24, 2025