@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 5% (0.05x) speedup for UniversalBaseModel.json in src/cohere/core/pydantic_utilities.py

⏱️ Runtime: 6.01 milliseconds → 5.71 milliseconds (best of 76 runs)
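
The headline percentage appears consistent with computing speedup as the original runtime divided by the optimized runtime, minus one. That formula is an assumption inferred from the numbers reported here, not something the report states, and the helper name `relative_speedup` is hypothetical. A minimal sketch of the arithmetic:

```python
def relative_speedup(original: float, optimized: float) -> float:
    """Return the fractional speedup, assuming speedup = original / optimized - 1."""
    if optimized <= 0:
        raise ValueError("optimized runtime must be positive")
    return original / optimized - 1.0


# Runtimes taken from the summary line above, in milliseconds.
headline = relative_speedup(6.01, 5.71)
print(f"{headline:.1%}")  # prints 5.3%, in line with the rounded 5% headline
```

A negative result under this convention would correspond to the "slower" annotations in the per-test timings.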

📝 Explanation and details

The optimization introduces two key performance improvements:

1. **Cached default kwargs dictionary**: Instead of recreating `{"by_alias": True, "exclude_unset": True}` on every call, the optimized version pre-creates this as a module-level constant `_DEFAULT_KWARGS`. This eliminates repeated dictionary allocation and initialization.

2. **Fast path for common case**: When no custom kwargs are provided (the most frequent scenario), the code bypasses dictionary merging entirely and directly calls the underlying Pydantic method with the cached defaults. This avoids the `{**_DEFAULT_KWARGS, **kwargs}` merge operation.

**Why this leads to a speedup**: Dictionary operations in Python carry overhead, both in allocating new dict objects and in unpacking and merging them. The line profiler shows the original version spent a noticeable share of its time (5.5% + 2.3% + 2.4% + 2.2% = 12.4%) on dictionary construction alone. The optimized version reduces this to near zero for the common case.

**Test case performance patterns**: The optimization shows its strongest gains (6-7%) for simple models called with no custom parameters (such as `test_large_scale_many_instances` and `test_edge_all_optional_unset`), where the fast path is always taken. Cases that pass custom kwargs such as `by_alias=False` or `indent=2` still merge dictionaries and show no gain or a slight regression (up to about 4% in these tests), but they represent the minority of real-world usage patterns.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2089 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 75.0% |

🌀 Generated Regression Tests and Runtime
import datetime as dt
import sys
from typing import Any, ClassVar

import pydantic
# imports
import pytest  # used for our unit tests
from cohere.core.pydantic_utilities import UniversalBaseModel
from typing_extensions import TypeAlias

# --- Function to test: UniversalBaseModel.json ---
# Copied from above, minimal external dependency
IS_PYDANTIC_V2 = pydantic.VERSION.startswith("2.")

def serialize_datetime(dt_obj: dt.datetime) -> str:
    # Simulate the serialize_datetime function from cohere.core.datetime_utils
    # ISO format with Z if UTC
    if dt_obj.tzinfo is not None and dt_obj.tzinfo.utcoffset(dt_obj) == dt.timedelta(0):
        return dt_obj.replace(tzinfo=None).isoformat() + "Z"
    return dt_obj.isoformat()

# --- Unit tests ---

# Helper model for basic tests
class Person(UniversalBaseModel):
    name: str
    age: int

# Helper model with alias and optional fields
class AliasModel(UniversalBaseModel):
    first_name: str
    last_name: str
    nickname: str = None

    class Config:
        fields = {'first_name': {'alias': 'firstName'}, 'last_name': {'alias': 'lastName'}}

# Helper model with datetime field
class Event(UniversalBaseModel):
    event_name: str
    event_time: dt.datetime

# Helper model for edge cases (empty, all optional)
class EmptyModel(UniversalBaseModel):
    pass

class OptionalModel(UniversalBaseModel):
    foo: int = None
    bar: str = None

# Helper model for nested models
class Address(UniversalBaseModel):
    street: str
    city: str

class User(UniversalBaseModel):
    username: str
    address: Address

# Helper model for large scale
def create_large_model_class(n_fields):
    # Dynamically create a model with n_fields integer fields
    namespace = {f"field_{i}": (int, ...) for i in range(n_fields)}
    return type("LargeModel", (UniversalBaseModel,), namespace)

# ---- BASIC TEST CASES ----

def test_basic_json_serialization():
    # Test basic serialization of required fields
    p = Person(name="Alice", age=30)
    codeflash_output = p.json(); json_str = codeflash_output # 9.78μs -> 9.99μs (2.06% slower)

def test_basic_exclude_unset():
    # Test that unset optional fields are excluded
    a = AliasModel(first_name="Bob", last_name="Smith")
    codeflash_output = a.json(); json_str = codeflash_output # 8.19μs -> 8.20μs (0.085% slower)

def test_basic_with_set_optional():
    # Optional field set, should appear
    a = AliasModel(first_name="Bob", last_name="Smith", nickname="Bobby")
    codeflash_output = a.json(); json_str = codeflash_output # 8.36μs -> 8.39μs (0.429% slower)

def test_basic_datetime_serialization():
    # Should serialize datetime using custom encoder
    dt_obj = dt.datetime(2023, 1, 1, 12, 0, 0, tzinfo=dt.timezone.utc)
    e = Event(event_name="Lunch", event_time=dt_obj)
    codeflash_output = e.json(); json_str = codeflash_output # 11.9μs -> 11.5μs (3.30% faster)


def test_edge_empty_model():
    # Model with no fields should serialize to {}
    e = EmptyModel()
    codeflash_output = e.json(); json_str = codeflash_output # 6.05μs -> 5.95μs (1.78% faster)

def test_edge_all_optional_unset():
    # All optional, none set, should be {}
    o = OptionalModel()
    codeflash_output = o.json(); json_str = codeflash_output # 6.52μs -> 6.14μs (6.19% faster)

def test_edge_all_optional_set():
    # All optional, all set, should include fields
    o = OptionalModel(foo=42, bar="baz")
    codeflash_output = o.json(); json_str = codeflash_output # 8.31μs -> 8.42μs (1.22% slower)

def test_edge_alias_and_by_alias_false():
    # by_alias False disables aliasing
    a = AliasModel(first_name="Bob", last_name="Smith", nickname="Bobby")
    codeflash_output = a.json(by_alias=False); json_str = codeflash_output # 8.04μs -> 8.20μs (1.95% slower)

def test_edge_exclude_unset_false():
    # exclude_unset False includes unset optionals as null
    a = AliasModel(first_name="Bob", last_name="Smith")
    codeflash_output = a.json(exclude_unset=False); json_str = codeflash_output # 8.75μs -> 8.98μs (2.61% slower)

def test_edge_datetime_non_utc():
    # Non-UTC datetime should not have Z
    dt_obj = dt.datetime(2023, 1, 1, 12, 0, 0)
    e = Event(event_name="Lunch", event_time=dt_obj)
    codeflash_output = e.json(); json_str = codeflash_output # 8.91μs -> 9.43μs (5.44% slower)

def test_edge_nested_exclude_unset_false():
    # Nested model, exclude_unset False includes unset optionals in nested
    class NestedOptional(UniversalBaseModel):
        address: Address = None
    n = NestedOptional()
    codeflash_output = n.json(exclude_unset=False); json_str = codeflash_output

def test_edge_invalid_type_raises():
    # Should raise error if invalid type passed
    with pytest.raises(pydantic.ValidationError):
        Person(name=123, age="thirty")

def test_edge_custom_kwargs():
    # Should allow custom kwargs to pass through
    p = Person(name="Alice", age=30)
    codeflash_output = p.json(indent=2); json_str = codeflash_output # 14.4μs -> 15.0μs (4.07% slower)

# ---- LARGE SCALE TEST CASES ----


def test_large_scale_many_instances():
    # Serialize many instances in a loop
    N = 100
    results = []
    for i in range(N):
        p = Person(name=f"Person{i}", age=i)
        codeflash_output = p.json(); json_str = codeflash_output # 255μs -> 239μs (6.62% faster)
        results.append(json_str)


def test_large_scale_datetime_list():
    # Large list of datetimes
    class DateListModel(UniversalBaseModel):
        dates: list[dt.datetime]
    N = 200
    dts = [dt.datetime(2022, 1, 1, i % 24, 0, 0, tzinfo=dt.timezone.utc) for i in range(N)]
    m = DateListModel(dates=dts)
    codeflash_output = m.json(); json_str = codeflash_output # 73.6μs -> 77.0μs (4.32% slower)
    # All datetimes should be serialized as Z
    for i in range(N):
        pass


#------------------------------------------------
import datetime as dt
import sys
from typing import Any, ClassVar

import pydantic
# imports
import pytest
from cohere.core.pydantic_utilities import UniversalBaseModel
from typing_extensions import TypeAlias

# function to test
# (The code block below is the provided function implementation)

# This file was auto-generated by Fern from our API Definition.

# nopycln: file


def serialize_datetime(value: dt.datetime) -> str:
    """Serialize a datetime object to ISO format with Z if UTC."""
    if value.tzinfo is not None and value.tzinfo.utcoffset(value) == dt.timedelta(0):
        return value.replace(tzinfo=None).isoformat() + "Z"
    return value.isoformat()

IS_PYDANTIC_V2 = pydantic.VERSION.startswith("2.")

if IS_PYDANTIC_V2:
    class V2RootModel(UniversalBaseModel, pydantic.RootModel):  # type: ignore[misc, name-defined, type-arg]
        pass
else:
    UniversalRootModel: TypeAlias = UniversalBaseModel  # type: ignore[misc, no-redef]

# unit tests

# Helper model for testing
class TestModel(UniversalBaseModel):
    id: int
    name: str
    created: dt.datetime = None
    value: float = None

    class Config:
        # Alias for 'name'
        fields = {'name': {'alias': 'username'}}

@pytest.mark.parametrize(
    "model_kwargs, expected_json, description",
    [
        # Basic: Only required fields
        (
            {"id": 1, "name": "Alice"},
            '{"id":1,"username":"Alice"}',
            "Basic: Only required fields, no optional"
        ),
        # Basic: All fields set
        (
            {"id": 2, "name": "Bob", "created": dt.datetime(2020, 1, 1, 12, 0), "value": 42.5},
            '{"id":2,"username":"Bob","created":"2020-01-01T12:00:00","value":42.5}',
            "Basic: All fields set"
        ),
        # Basic: Optional fields set to None (should be excluded)
        (
            {"id": 3, "name": "Charlie", "created": None, "value": None},
            '{"id":3,"username":"Charlie"}',
            "Basic: Optional fields set to None"
        ),
        # Edge: Datetime with UTC tzinfo
        (
            {"id": 4, "name": "UTCUser", "created": dt.datetime(2021, 5, 4, 10, 30, tzinfo=dt.timezone.utc)},
            '{"id":4,"username":"UTCUser","created":"2021-05-04T10:30:00Z"}',
            "Edge: Datetime with UTC tzinfo"
        ),
        # Edge: Datetime with non-UTC tzinfo
        (
            {"id": 5, "name": "TZUser", "created": dt.datetime(2021, 5, 4, 10, 30, tzinfo=dt.timezone(dt.timedelta(hours=2)))},
            '{"id":5,"username":"TZUser","created":"2021-05-04T10:30:00+02:00"}',
            "Edge: Datetime with non-UTC tzinfo"
        ),
        # Edge: Large float value
        (
            {"id": 6, "name": "BigFloat", "value": 1e100},
            '{"id":6,"username":"BigFloat","value":1e+100}',
            "Edge: Large float value"
        ),
        # Edge: Empty string name
        (
            {"id": 7, "name": ""},
            '{"id":7,"username":""}',
            "Edge: Empty string name"
        ),
        # Edge: Zero value float
        (
            {"id": 8, "name": "ZeroFloat", "value": 0.0},
            '{"id":8,"username":"ZeroFloat","value":0.0}',
            "Edge: Zero value float"
        ),
        # Edge: Negative id
        (
            {"id": -1, "name": "Negative"},
            '{"id":-1,"username":"Negative"}',
            "Edge: Negative id"
        ),
        # Edge: Unicode in name
        (
            {"id": 9, "name": "ユニコード"},
            '{"id":9,"username":"ユニコード"}',
            "Edge: Unicode in name"
        ),
    ]
)
def test_json_basic_and_edge(model_kwargs, expected_json, description):
    """Test basic and edge cases for the json method."""
    model = TestModel(**model_kwargs)
    codeflash_output = model.json(); result = codeflash_output # 101μs -> 102μs (0.675% slower)

def test_json_exclude_unset_behavior():
    """Test that unset optional fields are excluded from JSON."""
    model = TestModel(id=10, name="UnsetFields")
    codeflash_output = model.json(); result = codeflash_output # 8.32μs -> 8.07μs (3.10% faster)

def test_json_by_alias_true():
    """Test that alias is used in output JSON."""
    model = TestModel(id=11, name="AliasTest")
    codeflash_output = model.json(); result = codeflash_output # 8.00μs -> 7.66μs (4.47% faster)

def test_json_by_alias_false():
    """Test that by_alias=False disables alias in output JSON."""
    model = TestModel(id=12, name="AliasTest")
    codeflash_output = model.json(by_alias=False); result = codeflash_output # 7.91μs -> 8.10μs (2.34% slower)

def test_json_exclude_unset_false():
    """Test that exclude_unset=False includes unset fields as null."""
    model = TestModel(id=13, name="UnsetFalse")
    codeflash_output = model.json(exclude_unset=False); result = codeflash_output # 10.5μs -> 10.3μs (1.65% faster)

def test_json_extra_kwargs():
    """Test that extra kwargs are passed through."""
    model = TestModel(id=14, name="ExtraKwargs", value=3.14)
    # indent for pretty printing
    codeflash_output = model.json(indent=2); result = codeflash_output # 10.6μs -> 10.7μs (1.29% slower)

def test_json_invalid_kwargs():
    """Test that invalid kwargs raise TypeError."""
    model = TestModel(id=15, name="InvalidKwarg")
    with pytest.raises(TypeError):
        # pydantic will raise for unknown kwarg
        model.json(unknown_kwarg=True) # 2.57μs -> 2.70μs (4.86% slower)

def test_json_datetime_serialization():
    """Test that datetime is serialized correctly."""
    dt_obj = dt.datetime(2022, 2, 2, 2, 2, 2)
    model = TestModel(id=16, name="DTSer", created=dt_obj)
    codeflash_output = model.json(); result = codeflash_output # 11.6μs -> 11.7μs (0.674% slower)

def test_json_datetime_utc_serialization():
    """Test that UTC datetime is serialized with Z."""
    dt_obj = dt.datetime(2022, 2, 2, 2, 2, 2, tzinfo=dt.timezone.utc)
    model = TestModel(id=17, name="DTSerUTC", created=dt_obj)
    codeflash_output = model.json(); result = codeflash_output # 11.1μs -> 10.9μs (1.39% faster)

def test_json_large_scale():
    """Test large scale: 1000 models serialized and checked."""
    models = [
        TestModel(id=i, name=f"user{i}", value=float(i))
        for i in range(1000)
    ]
    # Serialize all and check correctness
    for i, model in enumerate(models):
        codeflash_output = model.json(); result = codeflash_output # 2.60ms -> 2.47ms (5.57% faster)
        expected = f'{{"id":{i},"username":"user{i}","value":{float(i)}}}'

def test_json_large_scale_performance():
    """Test that serializing 1000 models is performant (under 2s)."""
    import time
    models = [
        TestModel(id=i, name=f"user{i}", value=float(i))
        for i in range(1000)
    ]
    start = time.time()
    for model in models:
        codeflash_output = model.json(); _ = codeflash_output # 2.58ms -> 2.43ms (5.99% faster)
    duration = time.time() - start

def test_json_root_model_v2_or_alias():
    """Test that V2RootModel or UniversalRootModel works as expected."""
    if IS_PYDANTIC_V2:
        root_model = V2RootModel([1, 2, 3])
        codeflash_output = root_model.json(); result = codeflash_output
    else:
        root_model = UniversalRootModel(id=1, name="RootAlias")
        codeflash_output = root_model.json(); result = codeflash_output

def test_json_empty_model():
    """Test behavior when all fields are optional and unset."""
    class EmptyModel(UniversalBaseModel):
        a: int = None
        b: str = None
    model = EmptyModel()
    codeflash_output = model.json(); result = codeflash_output # 9.54μs -> 9.16μs (4.06% faster)


def test_json_list_field():
    """Test serialization of list fields."""
    class ListModel(UniversalBaseModel):
        items: list[int]
    model = ListModel(items=[1, 2, 3])
    codeflash_output = model.json(); result = codeflash_output # 9.80μs -> 9.56μs (2.51% faster)

def test_json_dict_field():
    """Test serialization of dict fields."""
    class DictModel(UniversalBaseModel):
        mapping: dict[str, int]
    model = DictModel(mapping={"a": 1, "b": 2})
    codeflash_output = model.json(); result = codeflash_output # 10.2μs -> 10.1μs (1.35% faster)

def test_json_bool_field():
    """Test serialization of boolean fields."""
    class BoolModel(UniversalBaseModel):
        flag: bool
    model = BoolModel(flag=True)
    codeflash_output = model.json(); result = codeflash_output # 8.88μs -> 9.23μs (3.71% slower)

def test_json_int_edge_cases():
    """Test serialization of edge-case integer values."""
    class IntModel(UniversalBaseModel):
        value: int
    for val in [0, -1000, 2**31-1, -2**31]:
        model = IntModel(value=val)
        codeflash_output = model.json(); result = codeflash_output # 18.0μs -> 16.7μs (7.19% faster)

def test_json_float_edge_cases():
    """Test serialization of edge-case float values."""
    class FloatModel(UniversalBaseModel):
        value: float
    for val in [0.0, -1e10, 1e-10, float("inf"), float("-inf")]:
        model = FloatModel(value=val)
        codeflash_output = model.json(); result = codeflash_output # 19.8μs -> 18.4μs (7.94% faster)
        # inf/-inf are serialized as null by pydantic
        if val in [float("inf"), float("-inf")]:
            pass
        else:
            pass

def test_json_invalid_type():
    """Test that invalid types raise validation error."""
    with pytest.raises(pydantic.ValidationError):
        TestModel(id="not-an-int", name="BadType")

def test_json_missing_required():
    """Test that missing required fields raises error."""
    with pytest.raises(pydantic.ValidationError):
        TestModel(name="MissingID")

def test_json_large_nested_structure():
    """Test serialization of large nested structures."""
    class Node(UniversalBaseModel):
        value: int
        children: list["Node"] = []

    # Create a tree of depth 3, branching factor 3
    def make_tree(depth, value=0):
        if depth == 0:
            return Node(value=value)
        return Node(
            value=value,
            children=[make_tree(depth-1, value*10+i) for i in range(3)]
        )
    root = make_tree(3)
    codeflash_output = root.json(); result = codeflash_output # 78.8μs -> 80.4μs (2.00% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from cohere.core.pydantic_utilities import UniversalBaseModel

def test_UniversalBaseModel_json():
    UniversalBaseModel.json(UniversalBaseModel())
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `codeflash_concolic_yxtehl4j/tmpnigorcgz/test_concolic_coverage.py::test_UniversalBaseModel_json` | 7.17μs | 7.33μs | -2.28%⚠️ |

To edit these changes, run `git checkout codeflash/optimize-UniversalBaseModel.json-mgzpt3mk`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 20, 2025 22:36
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025