@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 25% (1.25x) speedup for validate_float in src/mistralai/utils/serializers.py

⏱️ Runtime: 2.25 milliseconds → 1.80 milliseconds (best of 74 runs)
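The reported percentages appear to be computed as (old - new) / new, which is the speedup factor minus one. The sketch below reproduces a few of the values reported in this PR under that assumption. The formula is inferred from the numbers and is not taken from Codeflash documentation. The helper name speedup_pct is hypothetical.

```python
def speedup_pct(old: float, new: float) -> float:
    """Relative speedup in percent, assumed to be (old - new) / new.

    Positive values mean "faster"; negative values mean "slower".
    """
    return (old - new) / new * 100.0


if __name__ == "__main__":
    # Overall runtime: 2.25 ms -> 1.80 ms
    print(round(speedup_pct(2.25, 1.80), 1))  # 25.0
    print(round(2.25 / 1.80, 2))              # 1.25 (speedup factor)
    # test_float_returns_float: 591 ns -> 460 ns
    print(round(speedup_pct(591, 460), 1))    # 28.5
    # test_none_returns_none in the second suite: 454 ns -> 469 ns
    print(round(speedup_pct(454, 469), 2))    # -3.2, i.e. "3.20% slower"
```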

📝 Explanation and details

The optimization replaces isinstance(f, (float, Unset)) with isinstance(f, float) or f is Unset. This change achieves a 25% speedup (2.25 ms → 1.80 ms) by avoiding the tuple-based isinstance check, which rebuilds the (float, Unset) tuple on every call.
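The PR does not show the function body. The sketch below reconstructs both versions from the behavior the generated tests exercise: None passes through, floats are returned unchanged, strings go through float(), and other types raise ValueError. The Unset class, both function names, and the error message are hypothetical stand-ins, not the SDK's actual code.

```python
from typing import Any


class Unset:
    """Hypothetical stand-in for the SDK's Unset (assumed here to be a class)."""


def validate_float_original(f: Any) -> Any:
    # Reconstructed "before" version: one isinstance call against a tuple.
    if f is None:
        return None
    if isinstance(f, (float, Unset)):  # (float, Unset) is built on every call
        return f
    if not isinstance(f, str):
        raise ValueError("Expected string")  # placeholder message
    return float(f)  # raises ValueError for unparsable strings


def validate_float_optimized(f: Any) -> Any:
    # Reconstructed "after" version: isinstance on one type, then identity test.
    if f is None:
        return None
    if isinstance(f, float) or f is Unset:
        return f
    if not isinstance(f, str):
        raise ValueError("Expected string")  # placeholder message
    return float(f)


if __name__ == "__main__":
    # Both versions agree on the None, float, and string inputs the tests use.
    for fn in (validate_float_original, validate_float_optimized):
        print(fn.__name__, fn(None), fn(3.14), fn("  7.77  "))
```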

Key Changes:

  • Split the isinstance check: Instead of checking if f is an instance of either float or Unset in a single call, the code now checks for float first, then uses identity comparison (is) for Unset.
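  • Use identity comparison for Unset: The change assumes Unset is a singleton object rather than a type, in which case f is Unset would be cheaper than an isinstance check. That assumption needs checking. The original isinstance(f, (float, Unset)) call handled string inputs without raising TypeError, which indicates that Unset is a class. If so, f is Unset matches only the class object itself, and an Unset() instance that the original code returned unchanged now falls through to the ValueError path. The generated tests do not catch this because they define their own local Unset instead of using the one validate_float checks against.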
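Here is a quick check of the Unset assumption. It uses a hypothetical Unset class and a hypothetical UNSET instance, not the SDK's real objects. When Unset is a class, the two checks disagree on its instances. When a non-type object sits in the isinstance() tuple, float inputs still pass, but a string input raises TypeError once isinstance reaches that entry.

```python
class Unset:
    """Hypothetical sentinel class standing in for the SDK's Unset."""


UNSET = Unset()  # hypothetical module-level sentinel instance

# Old check: accepts any instance of the Unset class.
print(isinstance(UNSET, (float, Unset)))            # True
# New check: accepts only the class object itself, not its instances.
print(isinstance(UNSET, float) or UNSET is Unset)   # False
print(isinstance(Unset, float) or Unset is Unset)   # True

# If Unset were a plain instance, the original tuple check would still accept
# floats (float is tried first) but would raise TypeError for a string input.
print(isinstance(3.14, (float, UNSET)))             # True
try:
    isinstance("2.718", (float, UNSET))
except TypeError as exc:
    print("TypeError:", exc)
```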

Why This is Faster:

  • Tuple construction: float and Unset are looked up by name at run time, so (float, Unset) cannot be folded into a constant. The original code builds a new tuple on every call before isinstance walks its entries.
  • Short-circuit evaluation: when f is a float, the or expression returns after a single isinstance call against one type. The original also stops at float, but only after building the tuple.
  • Identity vs isinstance: for non-float inputs, f is Unset is a single pointer comparison, whereas the original performed a full isinstance check against Unset.
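The following is a rough micro-benchmark sketch that compares the two checks in isolation. It uses a hypothetical plain-class Unset and hypothetical helper names. Absolute timings depend on the interpreter and machine. A plain class may also understate the gap for non-float inputs if the real Unset has a metaclass that defines its own __instancecheck__.

```python
import timeit


class Unset:
    """Hypothetical stand-in class; real timings depend on the SDK's Unset."""


def check_tuple(f):
    # Original style: builds (float, Unset) on every call.
    return isinstance(f, (float, Unset))


def check_split(f):
    # Optimized style: one isinstance against float, then a pointer comparison.
    return isinstance(f, float) or f is Unset


def bench(fn, value, number=100_000):
    # Best of 5 repeats: total seconds for `number` calls of fn(value).
    return min(timeit.repeat(lambda: fn(value), number=number, repeat=5))


if __name__ == "__main__":
    for value in (3.14, "2.718", None):
        t_tuple = bench(check_tuple, value)
        t_split = bench(check_split, value)
        print(f"{value!r:>8}: tuple {t_tuple:.4f}s  split {t_split:.4f}s")
```

Both helpers return the same result for floats, strings, and None. They differ only for Unset instances versus the Unset class itself.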

Performance Benefits by Test Case:

  • Float inputs: 11.4-28.5% faster per call
  • Valid string conversions: 3.8-69.2% faster
  • Strings rejected by float(): 14.8-36.4% faster
  • Invalid (non-string) types: 7.7-60.1% faster, with most calls between roughly 40% and 60%
  • None inputs: unchanged (-3.2% to +2.7%), presumably because the None check runs before the modified line
  • Large batches: 29.9-37.5% faster

Non-float inputs typically show larger relative gains than floats. This is presumably because they previously paid for a full isinstance check against Unset, whereas float inputs mainly save the tuple construction.

Correctness verification report:

⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 5481 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests and Runtime
# imports
import pytest  # used for our unit tests
from mistralai.utils.serializers import validate_float


class Unset:
    """Dummy Unset class to mimic the original.

    Note: this local class is not the Unset that validate_float checks
    against, so Unset() instances created here take the ValueError path.
    """
    pass

# unit tests

# --- Basic Test Cases ---

def test_none_returns_none():
    # Test that None input returns None
    codeflash_output = validate_float(None) # 346ns -> 337ns (2.67% faster)

def test_float_returns_float():
    # Test that a float input returns the same float
    codeflash_output = validate_float(3.14) # 591ns -> 460ns (28.5% faster)


def test_valid_string_float():
    # Test that a valid string representing a float is converted correctly
    codeflash_output = validate_float("2.718") # 2.07μs -> 1.27μs (62.7% faster)

def test_valid_string_int():
    # Test that a valid string representing an int is converted to float
    codeflash_output = validate_float("42") # 1.54μs -> 1.01μs (53.1% faster)

def test_valid_string_with_spaces():
    # Test that a string with leading/trailing spaces is converted correctly
    codeflash_output = validate_float("  7.77  ") # 1.47μs -> 963ns (52.4% faster)

# --- Edge Test Cases ---

def test_invalid_type_raises():
    # Test that passing an int raises ValueError
    with pytest.raises(ValueError):
        validate_float(123) # 1.74μs -> 1.12μs (55.4% faster)

def test_invalid_type_list_raises():
    # Test that passing a list raises ValueError
    with pytest.raises(ValueError):
        validate_float([1.1, 2.2]) # 1.56μs -> 1.00μs (55.9% faster)

def test_invalid_type_dict_raises():
    # Test that passing a dict raises ValueError
    with pytest.raises(ValueError):
        validate_float({'a': 1.1}) # 1.56μs -> 981ns (59.4% faster)

def test_empty_string_raises():
    # Test that an empty string raises ValueError from float()
    with pytest.raises(ValueError):
        validate_float("") # 2.24μs -> 1.65μs (36.4% faster)

def test_non_numeric_string_raises():
    # Test that a non-numeric string raises ValueError from float()
    with pytest.raises(ValueError):
        validate_float("hello") # 2.70μs -> 2.17μs (24.9% faster)

def test_string_nan():
    # Test that "nan" string returns float('nan')
    codeflash_output = validate_float("nan"); result = codeflash_output # 1.49μs -> 1.07μs (40.1% faster)

def test_string_inf():
    # Test that "inf" string returns float('inf')
    codeflash_output = validate_float("inf"); result = codeflash_output # 1.30μs -> 862ns (50.9% faster)

def test_string_negative_inf():
    # Test that "-inf" string returns float('-inf')
    codeflash_output = validate_float("-inf"); result = codeflash_output # 1.39μs -> 929ns (49.1% faster)

def test_string_with_exponent():
    # Test scientific notation string
    codeflash_output = validate_float("1.23e4") # 1.61μs -> 1.12μs (43.8% faster)

def test_string_with_plus_sign():
    # Test string with explicit plus sign
    codeflash_output = validate_float("+7.5") # 1.49μs -> 987ns (50.5% faster)

def test_string_with_minus_sign():
    # Test string with explicit minus sign
    codeflash_output = validate_float("-8.5") # 1.44μs -> 973ns (47.8% faster)

def test_string_with_multiple_dots_raises():
    # Test string with multiple decimal points raises ValueError
    with pytest.raises(ValueError):
        validate_float("1.2.3") # 2.33μs -> 1.79μs (29.6% faster)

def test_string_with_comma_raises():
    # Test string with comma (not a valid float) raises ValueError
    with pytest.raises(ValueError):
        validate_float("1,23") # 2.30μs -> 1.72μs (33.6% faster)

def test_string_with_unicode_digits():
    # float() accepts Unicode decimal digits (e.g. fullwidth ones), but
    # superscript digits are not decimal digits, so this should fail
    with pytest.raises(ValueError):
        validate_float("¹²³.⁴⁵")

def test_bool_raises():
    # Test that passing a boolean raises ValueError
    with pytest.raises(ValueError):
        validate_float(True) # 1.88μs -> 1.34μs (40.9% faster)

def test_bytes_raises():
    # Test that passing bytes raises ValueError
    with pytest.raises(ValueError):
        validate_float(b"123") # 1.48μs -> 948ns (56.4% faster)

# --- Large Scale Test Cases ---

def test_many_valid_strings():
    # Test conversion of many valid float strings
    floats = [str(i + 0.5) for i in range(1000)]
    for i, s in enumerate(floats):
        codeflash_output = validate_float(s) # 339μs -> 247μs (37.2% faster)

def test_many_invalid_strings():
    # Test conversion of many invalid strings
    invalids = [f"not_a_float_{i}" for i in range(1000)]
    for s in invalids:
        with pytest.raises(ValueError):
            validate_float(s)


def test_many_nones():
    # Test that many None inputs return None
    for _ in range(1000):
        codeflash_output = validate_float(None) # 132μs -> 133μs (0.352% slower)

def test_large_float_values():
    # Test very large and very small float values as strings
    codeflash_output = validate_float(str(1e308)) # 2.97μs -> 2.17μs (37.1% faster)
    codeflash_output = validate_float(str(-1e308)) # 869ns -> 721ns (20.5% faster)
    codeflash_output = validate_float(str(1e-308)) # 1.11μs -> 988ns (12.4% faster)
    codeflash_output = validate_float(str(-1e-308)) # 755ns -> 640ns (18.0% faster)

def test_large_scale_mixed_types():
    # Test a mix of valid and invalid types in a large list
    mixed = ["1.0", None, Unset(), "bad", 2.0, "3.14e2", {}, [], "inf", "-inf"]
    expected = [1.0, None, mixed[2], ValueError, 2.0, 314.0, ValueError, ValueError, float('inf'), float('-inf')]
    for inp, exp in zip(mixed, expected):
        if exp is ValueError:
            with pytest.raises(ValueError):
                validate_float(inp)
        elif isinstance(inp, Unset):
            codeflash_output = validate_float(inp)
        else:
            codeflash_output = validate_float(inp)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import pytest
from mistralai.utils.serializers import validate_float


class UnsetType:
    """A dummy Unset type to mimic the original Unset."""
    pass

# Note: this local instance is not the Unset that validate_float checks
# against, so passing it to validate_float takes the ValueError path.
Unset = UnsetType()

# unit tests

# --------------------
# BASIC TEST CASES
# --------------------

def test_none_returns_none():
    # Test that None input returns None
    codeflash_output = validate_float(None) # 454ns -> 469ns (3.20% slower)

def test_float_returns_self():
    # Test that a float input returns the same float
    codeflash_output = validate_float(3.14) # 573ns -> 472ns (21.4% faster)
    codeflash_output = validate_float(-2.0) # 230ns -> 184ns (25.0% faster)
    codeflash_output = validate_float(0.0) # 176ns -> 158ns (11.4% faster)


def test_string_float_valid():
    # Test that valid float strings are parsed correctly
    codeflash_output = validate_float("3.14") # 2.05μs -> 1.27μs (60.8% faster)
    codeflash_output = validate_float("0.0") # 584ns -> 465ns (25.6% faster)
    codeflash_output = validate_float("-2.5") # 568ns -> 405ns (40.2% faster)
    codeflash_output = validate_float("1e2") # 538ns -> 403ns (33.5% faster)
    codeflash_output = validate_float("-1e-2") # 417ns -> 296ns (40.9% faster)
    codeflash_output = validate_float("  42.0 ") # 515ns -> 394ns (30.7% faster)

def test_string_int_valid():
    # Test that integer strings are parsed correctly as floats
    codeflash_output = validate_float("7") # 1.36μs -> 878ns (55.4% faster)
    codeflash_output = validate_float("-3") # 537ns -> 405ns (32.6% faster)
    codeflash_output = validate_float("0") # 412ns -> 334ns (23.4% faster)

# --------------------
# EDGE TEST CASES
# --------------------

def test_invalid_type_raises():
    # Test that input of invalid type raises ValueError
    with pytest.raises(ValueError):
        validate_float(42) # 1.72μs -> 1.15μs (49.5% faster)
    with pytest.raises(ValueError):
        validate_float([1.1]) # 1.02μs -> 636ns (60.1% faster)
    with pytest.raises(ValueError):
        validate_float({'a': 1.1}) # 714ns -> 449ns (59.0% faster)
    with pytest.raises(ValueError):
        validate_float((1.1,)) # 533ns -> 495ns (7.68% faster)
    with pytest.raises(ValueError):
        validate_float(object()) # 668ns -> 527ns (26.8% faster)

def test_invalid_string_raises():
    # Test that invalid float strings raise ValueError
    with pytest.raises(ValueError):
        validate_float("abc") # 2.75μs -> 2.35μs (17.1% faster)
    with pytest.raises(ValueError):
        validate_float("3.14.15") # 1.26μs -> 1.02μs (24.4% faster)
    with pytest.raises(ValueError):
        validate_float("") # 926ns -> 793ns (16.8% faster)
    with pytest.raises(ValueError):
        validate_float(" ") # 860ns -> 738ns (16.5% faster)
    with pytest.raises(ValueError):
        validate_float("1,000") # 793ns -> 691ns (14.8% faster)

def test_bool_type_raises():
    # Test that bools raise ValueError (since bool is not float or str)
    with pytest.raises(ValueError):
        validate_float(True) # 1.84μs -> 1.23μs (49.6% faster)
    with pytest.raises(ValueError):
        validate_float(False) # 794ns -> 591ns (34.3% faster)

def test_string_with_leading_trailing_whitespace():
    # Test that strings with whitespace are handled correctly
    codeflash_output = validate_float("   123.45   ") # 1.54μs -> 1.11μs (39.4% faster)

def test_string_nan_inf():
    # Test that 'nan', 'inf', and '-inf' strings are parsed as special floats
    codeflash_output = validate_float("nan") # 1.45μs -> 1.06μs (36.9% faster)
    codeflash_output = validate_float("inf") # 612ns -> 422ns (45.0% faster)
    codeflash_output = validate_float("-inf") # 481ns -> 401ns (20.0% faster)

def test_string_with_plus_sign():
    # Test that strings with explicit plus sign are handled
    codeflash_output = validate_float("+1.23") # 1.35μs -> 973ns (39.1% faster)
    codeflash_output = validate_float("+0") # 562ns -> 440ns (27.7% faster)

def test_string_with_large_exponent():
    # Test that strings with large exponents are handled
    codeflash_output = validate_float("1e308") # 2.64μs -> 2.19μs (20.6% faster)
    codeflash_output = validate_float("-1e308") # 883ns -> 813ns (8.61% faster)



def test_subclass_of_float():
    # Test that subclasses of float are accepted
    class MyFloat(float): pass
    mf = MyFloat(1.23)
    codeflash_output = validate_float(mf) # 713ns -> 586ns (21.7% faster)

def test_subclass_of_str():
    # Test that subclasses of str are accepted and parsed
    class MyStr(str): pass
    ms = MyStr("4.56")
    codeflash_output = validate_float(ms) # 2.25μs -> 1.33μs (69.2% faster)


def test_many_valid_string_floats():
    # Test a large number of valid float strings
    for i in range(1000):
        s = str(i + 0.5)
        codeflash_output = validate_float(s) # 344μs -> 250μs (37.5% faster)

def test_many_invalid_types():
    # Test a large number of invalid types
    for i in range(100):
        with pytest.raises(ValueError):
            validate_float([i])
        with pytest.raises(ValueError):
            validate_float({i: i})
        with pytest.raises(ValueError):
            validate_float((i,))
        with pytest.raises(ValueError):
            validate_float(set([i]))

def test_large_numbers_as_strings():
    # Test very large and very small numbers as strings
    codeflash_output = validate_float(str(1e308)) # 2.69μs -> 2.13μs (26.3% faster)
    codeflash_output = validate_float(str(-1e308)) # 987ns -> 825ns (19.6% faster)
    codeflash_output = validate_float(str(1e-308)) # 1.07μs -> 1.03μs (3.79% faster)
    codeflash_output = validate_float(str(-1e-308)) # 705ns -> 635ns (11.0% faster)

def test_performance_large_batch():
    # Test performance on a batch of mixed valid/invalid strings
    valid = [str(i * 0.1) for i in range(500)]
    invalid = ["foo", "bar", "baz", "", " ", "1,000"] * 83  # 83*6=498
    results = []
    for v in valid:
        results.append(validate_float(v)) # 206μs -> 158μs (29.9% faster)
    for iv in invalid:
        with pytest.raises(ValueError):
            validate_float(iv)

def test_mixed_types_large_batch():
    # Test a batch of mixed types: float, str, None, Unset, invalid
    inputs = []
    expected = []
    for i in range(200):
        if i % 4 == 0:
            inputs.append(float(i))
            expected.append(float(i))
        elif i % 4 == 1:
            inputs.append(str(i + 0.5))
            expected.append(i + 0.5)
        elif i % 4 == 2:
            inputs.append(None)
            expected.append(None)
        else:
            inputs.append(Unset)
            expected.append(Unset)
    # Validate all
    for inp, exp in zip(inputs, expected):
        codeflash_output = validate_float(inp)

def test_large_scale_edge_cases():
    # Test a batch with edge-case strings and types
    edge_cases = [
        "nan", "inf", "-inf", "+0", "-0", "0.0", "1e-308", "-1e-308", "1e308", "-1e308", " 42 ",
        "3.1415926535897932384626433832795028841971",  # long float string
    ]
    for s in edge_cases:
        try:
            expected = float(s)
            codeflash_output = validate_float(s); result = codeflash_output
            # nan special case
            if s == "nan":
                pass
            else:
                pass
        except ValueError:
            with pytest.raises(ValueError):
                validate_float(s)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, check out the branch with git checkout codeflash/optimize-validate_float-mh4iyb8d, then push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 07:23
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 24, 2025