
Conversation


@codeflash-ai codeflash-ai bot commented Oct 30, 2025

📄 16% (0.16x) speedup for construct_type_unchecked in src/openai/_models.py

⏱️ Runtime: 29.1 milliseconds → 25.2 milliseconds (best of 110 runs)
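For reference, the headline percentage follows directly from the two runtimes above; a quick sanity check (plain arithmetic, shown only for clarity):

```python
old_ms, new_ms = 29.1, 25.2
speedup = old_ms / new_ms - 1   # ≈ 0.155, i.e. roughly a 15-16% speedup
print(f"{speedup:.1%}")         # 15.5%
```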

📝 Explanation and details

The optimized code achieves a 15% speedup through several key micro-optimizations that reduce redundant operations and function calls:

1. Eliminated redundant get_args() calls: The original code called get_args(type_) multiple times for dict processing (_, items_type = get_args(type_)). The optimized version stores the result once and directly accesses items_type = args[1], avoiding repeated tuple unpacking.
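A minimal sketch of the idea (illustrative only, not the exact diff in `src/openai/_models.py`; the helper name `construct_dict_value` is a placeholder):

```python
from typing import Dict, get_args, get_origin

def construct_dict_value(value: object, type_: object) -> object:
    # Placeholder sketch (not the library's code): call get_args() once,
    # keep the tuple, and index it instead of re-unpacking
    # `_, items_type = get_args(type_)` wherever the value type is needed.
    if get_origin(type_) is dict and isinstance(value, dict):
        args = get_args(type_)
        items_type = args[1]  # the value type of Dict[key_type, value_type]
        # Demo coercion only; the real code recurses into construct_type_unchecked.
        return {key: items_type(item) for key, item in value.items()}
    return value

print(construct_dict_value({"a": 1, "b": 2}, Dict[str, float]))  # {'a': 1.0, 'b': 2.0}
```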

2. Added fast-path for empty containers: For both dict and list processing, the optimized code checks if not value: and returns empty containers immediately ({} or []), avoiding unnecessary comprehension overhead for empty inputs. This is particularly effective as shown in test cases like test_empty_dict() (15.3% faster) and test_empty_list() (12.8% faster).
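Roughly what that fast path looks like in spirit (a hedged sketch; the function and parameter names are illustrative, not the library's):

```python
from typing import Callable

def construct_list_value(value: object, construct_item: Callable[[object], object]) -> object:
    # Illustrative sketch only: bail out before building the comprehension
    # when the input container is empty.
    if not isinstance(value, list):
        return value
    if not value:
        return []  # fast path: no comprehension frame, no per-item dispatch
    return [construct_item(entry) for entry in value]

print(construct_list_value([], float))         # [] via the fast path
print(construct_list_value([1, 2, 3], float))  # [1.0, 2.0, 3.0]
```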

3. Optimized model construction logic: Instead of repeatedly calling getattr(type_, "construct", None) within comprehensions, the optimized code fetches the construct method once and reuses it. It also reordered the expensive is_literal_type() check after the cheaper inspect.isclass() check.
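A hedged sketch of the caching idea (the check reordered against `is_literal_type()` is simplified to `issubclass()` here, and `construct()` is pydantic's validation-skipping constructor); this is not the library's exact branch:

```python
import inspect
from pydantic import BaseModel

class Item(BaseModel):
    a: int
    b: str

def construct_model_list(type_: type, values: list) -> list:
    # Illustrative only. The cheap inspect.isclass() check runs first, and the
    # `construct` attribute is resolved once instead of via getattr() per item.
    if inspect.isclass(type_) and issubclass(type_, BaseModel):
        construct = type_.construct  # validation-free constructor, fetched once
        return [construct(**entry) for entry in values]
    return values

print(construct_model_list(Item, [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]))
```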

4. Reduced attribute lookups: By caching function references and avoiding repeated dictionary/tuple access patterns, the code minimizes Python's attribute resolution overhead.
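The general principle, shown on an unrelated toy loop (a generic illustration, not code from this PR): binding a method or builtin to a local once keeps CPython from re-resolving the lookup on every iteration.

```python
data = {str(i): i for i in range(1000)}

def slow_copy() -> dict:
    out = {}
    for key in data:
        out[key] = float(data[key])   # re-indexes `data` on every iteration
    return out

def fast_copy() -> dict:
    items = data.items()              # attribute resolved once, pairs iterated directly
    to_float = float                  # local binding avoids a global lookup per item
    return {key: to_float(item) for key, item in items}

assert slow_copy() == fast_copy()
```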

These optimizations are most effective for large-scale data processing scenarios (17-21% speedup on large lists/dicts with 1000+ elements) and container-heavy workloads where dict/list construction dominates runtime. The improvements are consistent across nested structures, making this particularly valuable for API response parsing and data serialization tasks typical in the OpenAI library.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 78 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from datetime import date, datetime, timezone
from typing import (Annotated, Any, Dict, List, Literal, Optional, Type,
                    TypeVar, Union)

# imports
import pytest
from openai._models import construct_type_unchecked
from pydantic import BaseModel

# function to test (from above)
_T = TypeVar("_T")
from openai._models import construct_type_unchecked

# ------------------- UNIT TESTS -------------------

# 1. Basic Test Cases
def test_int_to_float_basic():
    # Basic: int to float conversion
    codeflash_output = construct_type_unchecked(value=5, type_=float) # 8.17μs -> 7.28μs (12.3% faster)

def test_float_to_float_basic():
    # Basic: float to float, should be unchanged
    codeflash_output = construct_type_unchecked(value=3.14, type_=float) # 7.02μs -> 6.04μs (16.1% faster)

def test_list_of_ints_basic():
    # Basic: list of ints to list of floats
    codeflash_output = construct_type_unchecked(value=[1, 2, 3], type_=List[float]) # 16.6μs -> 15.3μs (8.38% faster)

def test_dict_basic():
    # Basic: dict of str->int to dict of str->float
    codeflash_output = construct_type_unchecked(value={"a": 1, "b": 2}, type_=Dict[str, float]) # 15.4μs -> 13.9μs (11.0% faster)

def test_literal_basic():
    # Basic: Literal type
    codeflash_output = construct_type_unchecked(value="foo", type_=Literal["foo", "bar"]) # 6.80μs -> 6.26μs (8.74% faster)
    codeflash_output = construct_type_unchecked(value="bar", type_=Literal["foo", "bar"]) # 2.79μs -> 2.69μs (3.91% faster)

def test_annotated_basic():
    # Basic: Annotated type (should act as its base type)
    codeflash_output = construct_type_unchecked(value=42, type_=Annotated[int, "meta"]) # 9.66μs -> 8.62μs (12.1% faster)


def test_union_basic():
    # Basic: Union type
    codeflash_output = construct_type_unchecked(value=1, type_=Union[int, str]) # 14.3μs -> 13.4μs (6.83% faster)
    codeflash_output = construct_type_unchecked(value="abc", type_=Union[int, str]) # 4.86μs -> 4.65μs (4.65% faster)

def test_datetime_basic():
    # Basic: datetime from string
    dt_str = "2020-01-01T12:00:00Z"
    codeflash_output = construct_type_unchecked(value=dt_str, type_=datetime); result = codeflash_output # 25.0μs -> 23.5μs (6.47% faster)

def test_date_basic():
    # Basic: date from string
    d_str = "2020-01-01"
    codeflash_output = construct_type_unchecked(value=d_str, type_=date); result = codeflash_output # 15.4μs -> 15.4μs (0.364% faster)

# 2. Edge Test Cases

def test_int_to_float_edge():
    # Edge: int that cannot be exactly represented as float
    val = 2**53 + 1
    codeflash_output = construct_type_unchecked(value=val, type_=float); result = codeflash_output # 8.12μs -> 7.84μs (3.49% faster)

def test_list_wrong_type_edge():
    # Edge: value is not a list, but type is list
    codeflash_output = construct_type_unchecked(value="notalist", type_=List[int]); result = codeflash_output # 7.37μs -> 6.96μs (5.84% faster)

def test_dict_wrong_type_edge():
    # Edge: value is not a dict, but type is dict
    codeflash_output = construct_type_unchecked(value="notadict", type_=Dict[str, int]); result = codeflash_output # 7.66μs -> 7.71μs (0.675% slower)

def test_literal_invalid_edge():
    # Edge: value not in Literal options
    codeflash_output = construct_type_unchecked(value="baz", type_=Literal["foo", "bar"]); result = codeflash_output # 6.28μs -> 6.03μs (4.26% faster)

def test_union_no_match_edge():
    # Edge: value does not match any union variant
    codeflash_output = construct_type_unchecked(value=None, type_=Union[int, str]); result = codeflash_output # 51.0μs -> 49.0μs (4.05% faster)

def test_datetime_invalid_edge():
    # Edge: invalid datetime string
    codeflash_output = construct_type_unchecked(value="notadatetime", type_=datetime); result = codeflash_output # 12.4μs -> 11.9μs (4.90% faster)

def test_date_invalid_edge():
    # Edge: invalid date string
    codeflash_output = construct_type_unchecked(value="notadate", type_=date); result = codeflash_output # 11.1μs -> 10.7μs (3.39% faster)


def test_nested_list_dict_edge():
    # Edge: deeply nested list/dict
    val = [{"a": [1, 2]}, {"a": [3, 4]}]
    type_ = List[Dict[str, List[float]]]
    codeflash_output = construct_type_unchecked(value=val, type_=type_); result = codeflash_output # 35.2μs -> 32.8μs (7.23% faster)

def test_annotated_with_metadata_edge():
    # Edge: Annotated type with metadata
    codeflash_output = construct_type_unchecked(value=123, type_=Annotated[int, "meta", "moremeta"]) # 9.32μs -> 8.28μs (12.6% faster)

# 3. Large Scale Test Cases

def test_large_list_of_ints():
    # Large: list of 1000 ints to floats
    vals = list(range(1000))
    codeflash_output = construct_type_unchecked(value=vals, type_=List[float]); result = codeflash_output # 2.07ms -> 1.75ms (18.1% faster)

def test_large_dict_of_ints():
    # Large: dict of 1000 str->int to str->float
    vals = {str(i): i for i in range(1000)}
    codeflash_output = construct_type_unchecked(value=vals, type_=Dict[str, float]); result = codeflash_output # 2.12ms -> 1.80ms (17.4% faster)

def test_large_nested_structure():
    # Large: nested structure (list of dicts of lists)
    vals = [{"a": list(range(10))} for _ in range(100)]
    type_ = List[Dict[str, List[float]]]
    codeflash_output = construct_type_unchecked(value=vals, type_=type_); result = codeflash_output # 2.56ms -> 2.18ms (17.0% faster)



def test_none_type():
    # Edge: value is None, type is Optional[int]
    codeflash_output = construct_type_unchecked(value=None, type_=Optional[int]) # 13.5μs -> 12.8μs (5.60% faster)

def test_empty_list():
    # Edge: empty list
    codeflash_output = construct_type_unchecked(value=[], type_=List[int]) # 8.36μs -> 7.41μs (12.8% faster)

def test_empty_dict():
    # Edge: empty dict
    codeflash_output = construct_type_unchecked(value={}, type_=Dict[str, int]) # 8.67μs -> 7.52μs (15.3% faster)

def test_list_of_none():
    # Edge: list of None values
    codeflash_output = construct_type_unchecked(value=[None, None], type_=List[Optional[int]]); result = codeflash_output # 17.3μs -> 16.6μs (4.31% faster)

def test_dict_of_none():
    # Edge: dict of None values
    codeflash_output = construct_type_unchecked(value={"a": None, "b": None}, type_=Dict[str, Optional[int]]); result = codeflash_output # 16.3μs -> 15.5μs (5.02% faster)

def test_nested_union():
    # Edge: nested union types
    type_ = Union[List[int], Dict[str, int]]
    codeflash_output = construct_type_unchecked(value=[1, 2, 3], type_=type_) # 10.9μs -> 10.4μs (4.78% faster)
    codeflash_output = construct_type_unchecked(value={"a": 1}, type_=type_) # 5.80μs -> 6.29μs (7.79% slower)

def test_union_with_literal():
    # Edge: Union with Literal
    type_ = Union[Literal["foo"], int]
    codeflash_output = construct_type_unchecked(value="foo", type_=type_) # 10.2μs -> 9.72μs (5.08% faster)
    codeflash_output = construct_type_unchecked(value=42, type_=type_) # 5.21μs -> 5.06μs (3.00% faster)

def test_datetime_from_unix_timestamp():
    # Edge: datetime from unix timestamp
    ts = 1609459200  # 2021-01-01T00:00:00Z
    codeflash_output = construct_type_unchecked(value=ts, type_=datetime); result = codeflash_output # 18.7μs -> 18.5μs (1.34% faster)

def test_date_from_unix_timestamp():
    # Edge: date from unix timestamp
    ts = 1609459200  # 2021-01-01T00:00:00Z
    codeflash_output = construct_type_unchecked(value=ts, type_=date); result = codeflash_output # 13.6μs -> 12.7μs (6.81% faster)


#------------------------------------------------
from datetime import date, datetime
from typing import (Annotated, Any, Dict, Generic, List, Literal, Optional,
                    TypeVar, Union)

# imports
import pytest
from openai._models import construct_type_unchecked
from typing_extensions import TypeAliasType

# --- Test Models for Pydantic scenarios ---
try:
    import pydantic

    class SimpleModel(pydantic.BaseModel):
        a: int
        b: str

    class NestedModel(pydantic.BaseModel):
        x: int
        y: SimpleModel

    class DiscriminatedFoo(pydantic.BaseModel):
        kind: Literal["foo"]
        value: str

    class DiscriminatedBar(pydantic.BaseModel):
        kind: Literal["bar"]
        value: int

    DiscriminatedUnion = Union[DiscriminatedFoo, DiscriminatedBar]

except ImportError:
    SimpleModel = None
    NestedModel = None
    DiscriminatedFoo = None
    DiscriminatedBar = None
    DiscriminatedUnion = None

# --- Basic Test Cases ---
def test_basic_int():
    # Should return value unchanged for int type
    codeflash_output = construct_type_unchecked(value=5, type_=int) # 8.31μs -> 7.86μs (5.77% faster)

def test_basic_float_from_int():
    # Should coerce int to float
    codeflash_output = construct_type_unchecked(value=7, type_=float) # 7.19μs -> 7.15μs (0.588% faster)

def test_basic_float_from_float():
    # Should return float unchanged
    codeflash_output = construct_type_unchecked(value=3.14, type_=float) # 6.87μs -> 6.35μs (8.30% faster)

def test_basic_str():
    # Should return string unchanged
    codeflash_output = construct_type_unchecked(value="hello", type_=str) # 6.56μs -> 6.04μs (8.51% faster)

def test_basic_list_of_int():
    # Should construct list of ints
    codeflash_output = construct_type_unchecked(value=[1, 2, 3], type_=List[int]) # 16.1μs -> 15.3μs (5.26% faster)

def test_basic_dict_of_str_int():
    # Should construct dict with int values
    codeflash_output = construct_type_unchecked(value={"a": 1, "b": 2}, type_=Dict[str, int]) # 15.1μs -> 14.4μs (4.42% faster)

def test_basic_union_int_str():
    # Should match first type in Union
    codeflash_output = construct_type_unchecked(value=42, type_=Union[int, str]) # 11.1μs -> 10.9μs (1.77% faster)
    codeflash_output = construct_type_unchecked(value="test", type_=Union[int, str]) # 4.87μs -> 4.57μs (6.61% faster)

def test_basic_literal():
    # Should accept only values in Literal
    codeflash_output = construct_type_unchecked(value="foo", type_=Literal["foo", "bar"]) # 6.24μs -> 5.72μs (9.09% faster)
    codeflash_output = construct_type_unchecked(value="bar", type_=Literal["foo", "bar"]) # 2.79μs -> 2.68μs (4.03% faster)
    codeflash_output = construct_type_unchecked(value="baz", type_=Literal["foo", "bar"]) # 2.24μs -> 2.08μs (7.84% faster)

def test_basic_annotated():
    # Should unwrap Annotated and treat as base type
    codeflash_output = construct_type_unchecked(value=123, type_=Annotated[int, "meta"]) # 9.46μs -> 8.52μs (11.1% faster)


def test_edge_empty_list():
    # Should handle empty list
    codeflash_output = construct_type_unchecked(value=[], type_=List[int]) # 7.56μs -> 6.70μs (12.8% faster)

def test_edge_empty_dict():
    # Should handle empty dict
    codeflash_output = construct_type_unchecked(value={}, type_=Dict[str, int]) # 8.01μs -> 6.73μs (19.0% faster)

def test_edge_list_wrong_type():
    # Should return value as-is if not a list
    codeflash_output = construct_type_unchecked(value="notalist", type_=List[int]) # 6.73μs -> 6.40μs (5.14% faster)

def test_edge_dict_wrong_type():
    # Should return value as-is if not a dict
    codeflash_output = construct_type_unchecked(value="notadict", type_=Dict[str, int]) # 6.91μs -> 6.42μs (7.50% faster)


def test_edge_float_from_non_int():
    # Should return value unchanged if not int or float
    codeflash_output = construct_type_unchecked(value="notanumber", type_=float) # 9.31μs -> 8.31μs (12.1% faster)

def test_edge_datetime_from_str():
    # Should parse valid ISO datetime string
    codeflash_output = construct_type_unchecked(value="2022-01-01T12:34:56Z", type_=datetime); dt = codeflash_output # 24.1μs -> 23.8μs (0.902% faster)

def test_edge_datetime_from_invalid_str():
    # Should return value as-is if string is not valid datetime
    codeflash_output = construct_type_unchecked(value="notadatetime", type_=datetime) # 11.7μs -> 11.1μs (5.47% faster)

def test_edge_date_from_str():
    # Should parse valid ISO date string
    codeflash_output = construct_type_unchecked(value="2022-01-01", type_=date); d = codeflash_output # 15.4μs -> 15.1μs (2.47% faster)

def test_edge_date_from_invalid_str():
    # Should return value as-is if string is not valid date
    codeflash_output = construct_type_unchecked(value="notadate", type_=date) # 11.0μs -> 10.4μs (6.25% faster)

def test_edge_pydantic_model_construct():
    # Should construct pydantic model from dict
    if SimpleModel is not None:
        codeflash_output = construct_type_unchecked(value={"a": 1, "b": "x"}, type_=SimpleModel); obj = codeflash_output # 10.6μs -> 9.26μs (14.9% faster)

def test_edge_pydantic_model_nested():
    # Should construct nested pydantic model
    if NestedModel is not None:
        codeflash_output = construct_type_unchecked(value={"x": 5, "y": {"a": 2, "b": "y"}}, type_=NestedModel); obj = codeflash_output # 9.68μs -> 8.45μs (14.6% faster)

def test_edge_union_discriminated():
    # Should construct correct discriminated union variant
    if DiscriminatedUnion is not None:
        codeflash_output = construct_type_unchecked(value={"kind": "foo", "value": "abc"}, type_=DiscriminatedUnion); foo_obj = codeflash_output # 22.0μs -> 21.3μs (3.03% faster)
        codeflash_output = construct_type_unchecked(value={"kind": "bar", "value": 123}, type_=DiscriminatedUnion); bar_obj = codeflash_output # 6.45μs -> 6.34μs (1.72% faster)

def test_edge_optional_type():
    # Should handle Optional types (Union with None)
    codeflash_output = construct_type_unchecked(value=None, type_=Optional[int]) # 8.41μs -> 8.22μs (2.25% faster)
    codeflash_output = construct_type_unchecked(value=7, type_=Optional[int]) # 3.57μs -> 3.62μs (1.24% slower)

def test_edge_list_of_dicts():
    # Should construct list of dicts
    value = [{"a": 1}, {"a": 2}]
    codeflash_output = construct_type_unchecked(value=value, type_=List[Dict[str, int]]); result = codeflash_output # 22.6μs -> 21.2μs (6.51% faster)

def test_edge_list_of_lists():
    # Should construct list of lists
    value = [[1, 2], [3, 4]]
    codeflash_output = construct_type_unchecked(value=value, type_=List[List[int]]); result = codeflash_output # 23.1μs -> 20.9μs (10.1% faster)

def test_edge_dict_of_lists():
    # Should construct dict of lists
    value = {"x": [1, 2], "y": [3, 4]}
    codeflash_output = construct_type_unchecked(value=value, type_=Dict[str, List[int]]); result = codeflash_output # 24.1μs -> 21.7μs (11.4% faster)

def test_edge_list_of_optional():
    # Should handle list of Optional[int]
    value = [1, None, 3]
    codeflash_output = construct_type_unchecked(value=value, type_=List[Optional[int]]); result = codeflash_output # 19.1μs -> 17.7μs (7.47% faster)

def test_edge_dict_with_non_str_keys():
    # Should return value as-is if dict keys are not strings
    value = {1: "a", 2: "b"}
    codeflash_output = construct_type_unchecked(value=value, type_=Dict[str, str]); result = codeflash_output # 14.6μs -> 13.3μs (10.3% faster)

# --- Large Scale Test Cases ---
def test_large_list_of_ints():
    # Should handle large lists efficiently
    large_list = list(range(1000))
    codeflash_output = construct_type_unchecked(value=large_list, type_=List[int]); result = codeflash_output # 1.98ms -> 1.65ms (20.4% faster)

def test_large_dict_of_ints():
    # Should handle large dicts efficiently
    large_dict = {str(i): i for i in range(1000)}
    codeflash_output = construct_type_unchecked(value=large_dict, type_=Dict[str, int]); result = codeflash_output # 2.04ms -> 1.73ms (17.8% faster)

def test_large_nested_structure():
    # Should handle large nested structures
    large_nested = [{"a": i, "b": str(i)} for i in range(1000)]
    if SimpleModel is not None:
        codeflash_output = construct_type_unchecked(value=large_nested, type_=List[SimpleModel]); result = codeflash_output # 3.10ms -> 2.56ms (21.2% faster)
        for i, obj in enumerate(result):
            pass

def test_large_list_of_lists():
    # Should handle large list of lists
    large_list_of_lists = [[i for i in range(10)] for _ in range(100)]
    codeflash_output = construct_type_unchecked(value=large_list_of_lists, type_=List[List[int]]); result = codeflash_output # 2.21ms -> 1.91ms (15.7% faster)

def test_large_dict_of_lists():
    # Should handle large dict of lists
    large_dict_of_lists = {str(i): [i, i+1] for i in range(500)}
    codeflash_output = construct_type_unchecked(value=large_dict_of_lists, type_=Dict[str, List[int]]); result = codeflash_output # 3.16ms -> 2.75ms (14.9% faster)

def test_large_list_of_dicts():
    # Should handle large list of dicts
    large_list_of_dicts = [{"x": i, "y": i + 1} for i in range(500)]
    codeflash_output = construct_type_unchecked(value=large_list_of_dicts, type_=List[Dict[str, int]]); result = codeflash_output # 3.26ms -> 2.77ms (17.6% faster)


def test_large_list_of_optional():
    # Should handle large list of Optional[int]
    large_list = [i if i % 2 == 0 else None for i in range(1000)]
    codeflash_output = construct_type_unchecked(value=large_list, type_=List[Optional[int]]); result = codeflash_output # 2.03ms -> 1.98ms (2.30% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-construct_type_unchecked-mhd0u03v` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 30, 2025 06:06
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 30, 2025