@codeflash-ai codeflash-ai bot commented Oct 31, 2025

📄 18% (0.18x) speedup for `parse_datetime` in `src/openai/_utils/_compat.py`

⏱️ Runtime: 30.6 milliseconds → 25.9 milliseconds (best of 58 runs)
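The percentages appear to be measured against the optimized runtime, not the original one. Under that reading the headline figure is reproduced exactly, and the per-test annotations in the generated tests are reproduced up to rounding of the displayed times. "0.18x" is therefore the fractional gain: the optimized code runs about 1.18 times as fast (30.6 / 25.9 ≈ 1.18).

```math
\begin{aligned}
\text{gain} &= \frac{t_{\text{orig}} - t_{\text{opt}}}{t_{\text{opt}}} = \frac{30.6 - 25.9}{25.9} \approx 0.181 \quad (\text{reported as } 18\%,\ 0.18\text{x}) \\
\text{first test:}\quad &\frac{807 - 859}{859} \approx -0.0605 \quad (\text{reported as } 6.05\%\ \text{slower})
\end{aligned}
```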

📝 Explanation and details

The optimized code gets an 18% speedup by removing dictionary work from the datetime parsing path. The original code called `match.groupdict()` and then built a second dictionary with `{k: int(v) for k, v in kw.items() if v is not None}`. The optimized code instead reads each named group directly and converts each field as it reads it.

**Specific optimizations:**

1. **Direct field extraction**: Instead of `kw = match.groupdict()` followed by a dictionary comprehension, the code reads `gd['year']`, `gd['month']`, etc. directly. If `gd` is still the `groupdict()` result, one dictionary is still built per call. What goes away is the second dictionary and the loop that filters and converts every entry.

2. **Conditional microsecond processing**: The microsecond padding (`ljust(6, "0")`) runs only when a fractional-seconds group is present, so inputs without microseconds skip that string operation.

3. **Inline integer conversions**: Each field is converted as it is read (`int(gd['year'])`). The `int()` calls themselves remain. The saving comes from dropping the comprehension's per-item loop and its `None` check. On Python versions before 3.12, it also drops the comprehension's implicit function call (PEP 709 inlined comprehensions in 3.12).

4. **Removed type annotation**: The `Dict[str, Union[None, int, timezone]]` annotation on the intermediate dictionary was dropped along with the dictionary. This is cleanup, not a speedup, because annotations on local variables inside a function are not evaluated at runtime (PEP 526).
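The sketch below contrasts the two extraction styles described in the list above. It is an illustrative stand-in, not the code in `_compat.py`. The regex is an assumption modeled on the formats the generated tests accept, and `parse_tz`, `build_via_comprehension`, `build_direct` and `parse` are hypothetical names. Both builders pad the fraction only when it is present, so the two differ only in how the fields are converted.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional

# Assumed pattern, modeled on the formats the generated tests accept:
# date, 'T' or space, time, optional fraction (digits beyond 6 are matched
# but dropped), optional 'Z' / +HH / +HHMM / +HH:MM offset.
DATETIME_RE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})[T ]"
    r"(?P<hour>\d{1,2}):(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?"
    r"(?P<tzinfo>Z|[+-]\d{2}(?::?\d{2})?)?$"
)

Groups = Dict[str, Optional[str]]


def parse_tz(value: Optional[str]) -> Optional[timezone]:
    """Turn 'Z', '+02', '+0200' or '+02:00' into a fixed-offset timezone."""
    if value is None:
        return None
    if value == "Z":
        return timezone.utc
    sign = -1 if value[0] == "-" else 1
    digits = value[1:].replace(":", "")
    minutes = int(digits[2:]) if len(digits) > 2 else 0
    return timezone(sign * timedelta(hours=int(digits[:2]), minutes=minutes))


def build_via_comprehension(gd: Groups) -> datetime:
    # "Before" shape: copy the groups, then filter and convert them
    # through a dict comprehension and unpack the result with **.
    kw = dict(gd)
    if kw["microsecond"]:
        kw["microsecond"] = kw["microsecond"].ljust(6, "0")
    tzinfo = parse_tz(kw.pop("tzinfo"))
    fields = {k: int(v) for k, v in kw.items() if v is not None}
    return datetime(**fields, tzinfo=tzinfo)


def build_direct(gd: Groups) -> datetime:
    # "After" shape: read each named group and convert it immediately.
    second, micro = gd["second"], gd["microsecond"]
    return datetime(
        int(gd["year"]), int(gd["month"]), int(gd["day"]),
        int(gd["hour"]), int(gd["minute"]),
        int(second) if second is not None else 0,
        int(micro.ljust(6, "0")) if micro else 0,  # pad only when present: "1" -> 100000
        tzinfo=parse_tz(gd["tzinfo"]),
    )


def parse(value: str, build: Callable[[Groups], datetime]) -> datetime:
    match = DATETIME_RE.match(value)
    if match is None:
        raise ValueError("invalid datetime format")
    return build(match.groupdict())
```

For example, `parse("2023-04-05T12:30:45.1-05:30", build_direct)` and the comprehension variant both return `datetime(2023, 4, 5, 12, 30, 45, 100000, tzinfo=timezone(timedelta(hours=-5, minutes=-30)))`.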

**Why this is faster:**
Building a dictionary, iterating it with `.items()`, and running a filtered comprehension each add interpreter overhead on every call. The optimization removes that work from the ISO-string path, leaving only key lookups into the group dictionary. In the generated tests, ISO-string inputs run roughly 15-28% faster.
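One rough way to see the cost of the comprehension step in isolation is a `timeit` comparison on a hand-built group dictionary. Results depend on the Python version and the machine, so no numbers are quoted here. `GD`, `via_comprehension` and `via_direct` are illustrative names, not code from the PR.

```python
import timeit
from datetime import datetime
from typing import Dict, Optional

# Illustrative stand-in for match.groupdict() on "2023-04-05T12:30:45",
# with the tz group already removed and no fractional seconds.
GD: Dict[str, Optional[str]] = {
    "year": "2023", "month": "04", "day": "05",
    "hour": "12", "minute": "30", "second": "45", "microsecond": None,
}


def via_comprehension(gd: Dict[str, Optional[str]]) -> datetime:
    # Second dict: filter out None, convert each value, then unpack with **.
    kw = {k: int(v) for k, v in gd.items() if v is not None}
    return datetime(**kw)


def via_direct(gd: Dict[str, Optional[str]]) -> datetime:
    # Read each named field and convert it immediately; no second dict.
    micro = gd["microsecond"]
    return datetime(
        int(gd["year"]), int(gd["month"]), int(gd["day"]),
        int(gd["hour"]), int(gd["minute"]), int(gd["second"]),
        int(micro.ljust(6, "0")) if micro else 0,
    )


if __name__ == "__main__":
    assert via_comprehension(GD) == via_direct(GD)
    for fn in (via_comprehension, via_direct):
        # The lambda runs inside this iteration, so late binding of fn is harmless.
        seconds = timeit.timeit(lambda: fn(GD), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s per 100k calls")
```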

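The gains are concentrated in ISO-string inputs. `datetime` objects are returned unchanged. Numeric timestamps (ints, floats, and numeric strings or bytes) are converted before the regex branch is reached, and strings that fail the regex match never reach the changed code. As a result, their timings shift by at most a few hundred nanoseconds per call in either direction.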
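Numeric inputs never reach the changed code. The generated tests pin down how they are interpreted through the `EPOCH`, `MS_WATERSHED` and `MAX_NUMBER` constants they define. The sketch below reproduces those expectations. The helper name `from_unix_number` is hypothetical, and the repeated division for very large magnitudes is an assumption that the tests do not exercise.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

# Constants as defined in the generated tests.
EPOCH = datetime(1970, 1, 1)
MS_WATERSHED = int(2e10)  # magnitudes above this are read as milliseconds
MAX_NUMBER = int(3e20)    # beyond +/- this, clamp to datetime.max / datetime.min


def from_unix_number(value: Union[int, float]) -> datetime:
    """Hypothetical helper: map a Unix timestamp to an aware UTC datetime."""
    if value > MAX_NUMBER:
        return datetime.max
    if value < -MAX_NUMBER:
        return datetime.min
    seconds = value
    # Assumption: keep dividing by 1000 until the value is in the seconds range.
    while abs(seconds) > MS_WATERSHED:
        seconds /= 1000
    return (EPOCH + timedelta(seconds=seconds)).replace(tzinfo=timezone.utc)


print(from_unix_number(1684325445))  # 2023-05-17 12:10:45+00:00
print(from_unix_number(-1000))       # 1969-12-31 23:43:20+00:00 (seconds, not ms)
```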

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 11077 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Union

# imports
import pytest
from openai._utils._compat import parse_datetime

EPOCH = datetime(1970, 1, 1)
MS_WATERSHED = int(2e10)
MAX_NUMBER = int(3e20)

# --- Unit Tests ---

# --- Basic Test Cases ---

def test_parse_datetime_with_datetime_object():
    # Should return the same datetime object unchanged
    dt = datetime(2023, 4, 5, 12, 30, 45, 123456)
    codeflash_output = parse_datetime(dt); result = codeflash_output # 807ns -> 859ns (6.05% slower)

def test_parse_datetime_iso_string_no_timezone():
    # Should parse ISO string without timezone
    s = "2023-04-05T12:30:45"
    expected = datetime(2023, 4, 5, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 12.5μs -> 10.0μs (25.2% faster)

def test_parse_datetime_iso_string_with_z_timezone():
    # Should parse ISO string with Z (UTC) timezone
    s = "2023-04-05T12:30:45Z"
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(s) # 10.7μs -> 8.67μs (23.2% faster)

def test_parse_datetime_iso_string_with_offset_timezone():
    # Should parse ISO string with +02:00 timezone
    s = "2023-04-05T12:30:45+02:00"
    tz = timezone(timedelta(hours=2))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.9μs -> 10.5μs (22.4% faster)

def test_parse_datetime_iso_string_with_offset_timezone_short():
    # Should parse ISO string with +0200 timezone (no colon)
    s = "2023-04-05T12:30:45+0200"
    tz = timezone(timedelta(hours=2))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.3μs -> 10.3μs (19.4% faster)

def test_parse_datetime_iso_string_with_negative_offset():
    # Should parse ISO string with -05:30 timezone
    s = "2023-04-05T12:30:45-05:30"
    tz = timezone(timedelta(hours=-5, minutes=-30))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.1μs -> 10.6μs (14.5% faster)

def test_parse_datetime_iso_string_with_space_separator():
    # Should parse ISO string with space instead of T
    s = "2023-04-05 12:30:45"
    expected = datetime(2023, 4, 5, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 9.79μs -> 7.82μs (25.3% faster)

def test_parse_datetime_iso_string_with_microseconds():
    # Should parse ISO string with microseconds
    s = "2023-04-05T12:30:45.123456"
    expected = datetime(2023, 4, 5, 12, 30, 45, 123456)
    codeflash_output = parse_datetime(s) # 10.8μs -> 8.72μs (24.0% faster)

def test_parse_datetime_iso_string_with_short_microseconds():
    # Should parse ISO string with short microseconds (pad to 6 digits)
    s = "2023-04-05T12:30:45.1"
    expected = datetime(2023, 4, 5, 12, 30, 45, 100000)
    codeflash_output = parse_datetime(s) # 10.7μs -> 8.39μs (27.7% faster)

def test_parse_datetime_bytes_input():
    # Should decode bytes and parse correctly
    s = b"2023-04-05T12:30:45"
    expected = datetime(2023, 4, 5, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 10.4μs -> 8.42μs (23.7% faster)

def test_parse_datetime_unix_seconds_int():
    # Should parse integer seconds since epoch
    ts = 0
    expected = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 6.64μs -> 6.55μs (1.24% faster)

def test_parse_datetime_unix_seconds_float():
    # Should parse float seconds since epoch
    ts = 1680695445.123456
    expected = EPOCH + timedelta(seconds=ts)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts); result = codeflash_output # 4.62μs -> 4.66μs (0.794% slower)

def test_parse_datetime_unix_milliseconds():
    # Should parse milliseconds since epoch (int)
    ts = 1680695445123
    expected = EPOCH + timedelta(seconds=ts/1000)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts); result = codeflash_output # 4.61μs -> 4.52μs (1.97% faster)

def test_parse_datetime_unix_seconds_as_string():
    # Should parse seconds since epoch as string
    ts = "1680695445"
    expected = EPOCH + timedelta(seconds=int(ts))
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts); result = codeflash_output # 5.12μs -> 5.11μs (0.157% faster)

def test_parse_datetime_unix_milliseconds_as_string():
    # Should parse milliseconds as string
    ts = "1680695445123"
    expected = EPOCH + timedelta(seconds=int(ts)/1000)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts); result = codeflash_output # 5.42μs -> 5.39μs (0.464% faster)

# --- Edge Test Cases ---

def test_parse_datetime_invalid_format_raises():
    # Should raise ValueError for completely invalid string
    with pytest.raises(ValueError):
        parse_datetime("not a date") # 4.90μs -> 4.90μs (0.102% slower)

def test_parse_datetime_empty_string_raises():
    # Should raise ValueError for empty string
    with pytest.raises(ValueError):
        parse_datetime("") # 3.57μs -> 3.46μs (3.29% faster)

def test_parse_datetime_invalid_date_values_raises():
    # Should raise ValueError for invalid date values (e.g. month 13)
    with pytest.raises(ValueError):
        parse_datetime("2023-13-05T12:30:45") # 12.3μs -> 10.1μs (21.6% faster)

def test_parse_datetime_invalid_time_values_raises():
    # Should raise ValueError for invalid time values (e.g. hour 25)
    with pytest.raises(ValueError):
        parse_datetime("2023-04-05T25:30:45") # 11.0μs -> 9.22μs (19.2% faster)

def test_parse_datetime_missing_time_part_raises():
    # Should raise ValueError for missing time part
    with pytest.raises(ValueError):
        parse_datetime("2023-04-05") # 3.62μs -> 3.52μs (2.90% faster)

def test_parse_datetime_missing_date_part_raises():
    # Should raise ValueError for missing date part
    with pytest.raises(ValueError):
        parse_datetime("12:30:45") # 3.47μs -> 3.46μs (0.231% faster)


def test_parse_datetime_invalid_numeric_string_raises():
    # Should raise ValueError for numeric string that can't be parsed as float
    with pytest.raises(ValueError):
        parse_datetime("123abc") # 4.64μs -> 4.93μs (5.83% slower)

def test_parse_datetime_unix_seconds_below_min():
    # Should return datetime.min for very low negative value
    ts = -MAX_NUMBER - 1
    codeflash_output = parse_datetime(ts); result = codeflash_output # 2.21μs -> 2.18μs (1.15% faster)

def test_parse_datetime_unix_seconds_above_max():
    # Should return datetime.max for very high value
    ts = MAX_NUMBER + 1
    codeflash_output = parse_datetime(ts); result = codeflash_output # 1.56μs -> 1.65μs (5.17% slower)

def test_parse_datetime_unix_seconds_near_ms_watershed():
    # Should parse correctly for value just below MS_WATERSHED
    ts = MS_WATERSHED - 1
    expected = EPOCH + timedelta(seconds=ts)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 4.01μs -> 4.12μs (2.74% slower)

def test_parse_datetime_unix_milliseconds_near_ms_watershed():
    # Should parse correctly for value just above MS_WATERSHED (treated as ms)
    ts = MS_WATERSHED + 1
    expected = EPOCH + timedelta(seconds=(ts/1000))
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 4.37μs -> 4.50μs (2.87% slower)

def test_parse_datetime_leading_trailing_spaces():
    # Should fail for leading/trailing spaces (not matched by regex)
    with pytest.raises(ValueError):
        parse_datetime(" 2023-04-05T12:30:45 ") # 4.51μs -> 4.60μs (1.96% slower)

def test_parse_datetime_time_only_with_timezone():
    # Should fail for time only with timezone
    with pytest.raises(ValueError):
        parse_datetime("12:30:45Z") # 3.69μs -> 3.78μs (2.20% slower)

def test_parse_datetime_partial_iso_string():
    # Should fail for partial ISO string
    with pytest.raises(ValueError):
        parse_datetime("2023-04-05T12") # 4.28μs -> 4.32μs (0.880% slower)

def test_parse_datetime_microsecond_padding():
    # Should pad microseconds to 6 digits
    s = "2023-04-05T12:30:45.123"
    expected = datetime(2023, 4, 5, 12, 30, 45, 123000)
    codeflash_output = parse_datetime(s) # 12.6μs -> 10.4μs (21.6% faster)

def test_parse_datetime_microsecond_max_digits():
    # Should parse microseconds with max 6 digits
    s = "2023-04-05T12:30:45.123456789"
    expected = datetime(2023, 4, 5, 12, 30, 45, 123456)
    codeflash_output = parse_datetime(s) # 10.6μs -> 8.81μs (20.3% faster)

def test_parse_datetime_negative_unix_seconds():
    # Should parse negative seconds since epoch
    ts = -1
    expected = EPOCH + timedelta(seconds=-1)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 4.11μs -> 4.21μs (2.38% slower)

def test_parse_datetime_negative_unix_milliseconds():
    # -1000 is within ±MS_WATERSHED, so it is read as seconds, not milliseconds:
    # the result is 1000 seconds before the epoch
    ts = -1000
    expected = EPOCH + timedelta(seconds=-1000)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 3.67μs -> 3.80μs (3.39% slower)

def test_parse_datetime_iso_string_with_zero_offset():
    # Should parse ISO string with +00:00 as UTC
    s = "2023-04-05T12:30:45+00:00"
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(s) # 14.6μs -> 12.6μs (15.9% faster)

def test_parse_datetime_iso_string_with_offset_minutes():
    # Should parse ISO string with +02:30 offset
    s = "2023-04-05T12:30:45+02:30"
    tz = timezone(timedelta(hours=2, minutes=30))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.5μs -> 10.2μs (23.3% faster)

def test_parse_datetime_iso_string_with_offset_minutes_short():
    # Should parse ISO string with +0230 offset (no colon)
    s = "2023-04-05T12:30:45+0230"
    tz = timezone(timedelta(hours=2, minutes=30))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.1μs -> 9.81μs (23.0% faster)

def test_parse_datetime_iso_string_with_offset_minutes_negative():
    # Should parse ISO string with -0230 offset
    s = "2023-04-05T12:30:45-0230"
    tz = timezone(timedelta(hours=-2, minutes=-30))
    expected = datetime(2023, 4, 5, 12, 30, 45, tzinfo=tz)
    codeflash_output = parse_datetime(s) # 12.2μs -> 10.1μs (20.6% faster)

# --- Large Scale Test Cases ---


def test_parse_datetime_large_list_of_iso_strings():
    # Should parse a large list of ISO strings correctly and efficiently
    results = []
    for i in range(1, 1001):
        s = f"2023-04-05T12:30:{i%60:02d}"
        expected = datetime(2023, 4, 5, 12, 30, i%60)
        codeflash_output = parse_datetime(s); dt = codeflash_output # 3.29ms -> 2.57ms (27.7% faster)
        results.append(dt)

def test_parse_datetime_large_list_of_milliseconds_strings():
    # Should parse a large list of millisecond strings correctly
    base = 1680695445000
    results = []
    for i in range(1000):
        ts = str(base + i * 1000)
        expected = EPOCH + timedelta(seconds=(int(ts)/1000))
        expected = expected.replace(tzinfo=timezone.utc)
        codeflash_output = parse_datetime(ts); dt = codeflash_output # 1.81ms -> 1.80ms (0.014% faster)
        results.append(dt)

def test_parse_datetime_large_list_of_bytes_iso_strings():
    # Should parse a large list of ISO strings in bytes
    results = []
    for i in range(1, 1001):
        s = f"2023-04-05T12:30:{i%60:02d}".encode()
        expected = datetime(2023, 4, 5, 12, 30, i%60)
        codeflash_output = parse_datetime(s); dt = codeflash_output # 3.33ms -> 2.62ms (27.2% faster)
        results.append(dt)

def test_parse_datetime_large_list_of_varied_inputs():
    # Should parse a large list of varied valid inputs
    inputs = []
    for i in range(250):
        inputs.append(f"2023-04-05T12:30:{i%60:02d}")
        inputs.append(1680695445 + i)
        inputs.append(str(1680695445 + i))
        inputs.append(b"2023-04-05T12:30:45")
    for inp in inputs:
        codeflash_output = parse_datetime(inp); dt = codeflash_output # 2.51ms -> 2.15ms (16.7% faster)
        # For bytes, always same expected
        if isinstance(inp, bytes):
            expected = datetime(2023, 4, 5, 12, 30, 45)
        elif isinstance(inp, str) and inp.isdigit():
            expected = EPOCH + timedelta(seconds=int(inp))
            expected = expected.replace(tzinfo=timezone.utc)
        elif isinstance(inp, int):
            expected = EPOCH + timedelta(seconds=inp)
            expected = expected.replace(tzinfo=timezone.utc)
        else:
            # ISO string
            m = re.match(r"2023-04-05T12:30:(\d{2})", inp)
            if m:
                sec = int(m.group(1))
                expected = datetime(2023, 4, 5, 12, 30, sec)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from datetime import datetime, timedelta, timezone

# imports
import pytest
from openai._utils._compat import parse_datetime

EPOCH = datetime(1970, 1, 1)
MS_WATERSHED = int(2e10)
MAX_NUMBER = int(3e20)

# unit tests

# ----------- BASIC TEST CASES -----------

def test_parse_datetime_from_datetime_object():
    # Should return the same object if input is already a datetime
    dt = datetime(2023, 5, 17, 12, 30, 45)
    codeflash_output = parse_datetime(dt) # 679ns -> 821ns (17.3% slower)

def test_parse_datetime_from_iso_string():
    # Basic ISO format, no timezone
    s = "2023-05-17T12:30:45"
    expected = datetime(2023, 5, 17, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 10.5μs -> 9.03μs (16.5% faster)

def test_parse_datetime_from_iso_string_with_space():
    # Accepts space as separator
    s = "2023-05-17 12:30:45"
    expected = datetime(2023, 5, 17, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 9.81μs -> 8.17μs (20.1% faster)

def test_parse_datetime_with_microseconds():
    # Microsecond parsing
    s = "2023-05-17T12:30:45.123"
    expected = datetime(2023, 5, 17, 12, 30, 45, 123000)
    codeflash_output = parse_datetime(s) # 10.8μs -> 9.09μs (19.3% faster)

def test_parse_datetime_with_full_microseconds():
    # 6-digit microsecond
    s = "2023-05-17T12:30:45.123456"
    expected = datetime(2023, 5, 17, 12, 30, 45, 123456)
    codeflash_output = parse_datetime(s) # 9.96μs -> 8.27μs (20.4% faster)

def test_parse_datetime_with_utc_timezone_z():
    # Z means UTC
    s = "2023-05-17T12:30:45Z"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(s) # 9.54μs -> 7.72μs (23.5% faster)

def test_parse_datetime_with_positive_timezone_offset():
    # +02:00 offset
    s = "2023-05-17T12:30:45+02:00"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
    codeflash_output = parse_datetime(s) # 12.4μs -> 10.9μs (14.5% faster)

def test_parse_datetime_with_negative_timezone_offset():
    # -05:30 offset
    s = "2023-05-17T12:30:45-05:30"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=-5, minutes=-30)))
    codeflash_output = parse_datetime(s) # 12.2μs -> 10.2μs (19.2% faster)

def test_parse_datetime_with_short_timezone_offset():
    # +02 offset (no minutes)
    s = "2023-05-17T12:30:45+02"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
    codeflash_output = parse_datetime(s) # 11.4μs -> 9.75μs (16.6% faster)

def test_parse_datetime_with_short_timezone_offset_negative():
    # -07 offset (no minutes)
    s = "2023-05-17T12:30:45-07"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=-7)))
    codeflash_output = parse_datetime(s) # 11.5μs -> 9.82μs (17.5% faster)

def test_parse_datetime_from_unix_seconds_int():
    # Unix timestamp (seconds); 1684325445 is 2023-05-17T12:10:45Z
    ts = 1684325445
    expected = datetime(2023, 5, 17, 12, 10, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 6.33μs -> 6.11μs (3.72% faster)

def test_parse_datetime_from_unix_seconds_float():
    # Unix timestamp (seconds, float); 1684325445.5 is 2023-05-17T12:10:45.5Z
    ts = 1684325445.5
    expected = datetime(2023, 5, 17, 12, 10, 45, 500000, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 6.24μs -> 6.63μs (5.81% slower)

def test_parse_datetime_from_unix_milliseconds():
    # Unix timestamp (milliseconds); 1684325445123 is 2023-05-17T12:10:45.123Z
    ts = 1684325445123
    expected = datetime(2023, 5, 17, 12, 10, 45, 123000, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 6.55μs -> 6.64μs (1.34% slower)

def test_parse_datetime_from_unix_seconds_as_str():
    # Unix timestamp as string; 1684325445 is 2023-05-17T12:10:45Z
    ts = "1684325445"
    expected = datetime(2023, 5, 17, 12, 10, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 6.34μs -> 6.51μs (2.67% slower)

def test_parse_datetime_from_unix_milliseconds_as_str():
    # Unix timestamp in ms as string; 1684325445123 is 2023-05-17T12:10:45.123Z
    ts = "1684325445123"
    expected = datetime(2023, 5, 17, 12, 10, 45, 123000, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 7.07μs -> 7.26μs (2.71% slower)

def test_parse_datetime_from_bytes_iso_string():
    # ISO string as bytes
    s = b"2023-05-17T12:30:45"
    expected = datetime(2023, 5, 17, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 11.6μs -> 9.86μs (17.5% faster)

def test_parse_datetime_from_bytes_unix_seconds():
    # Unix timestamp as bytes; 1684325445 is 2023-05-17T12:10:45Z
    s = b"1684325445"
    expected = datetime(2023, 5, 17, 12, 10, 45, tzinfo=timezone.utc)
    codeflash_output = parse_datetime(s) # 6.66μs -> 6.51μs (2.32% faster)

# ----------- EDGE TEST CASES -----------

def test_parse_datetime_invalid_format_raises():
    # Should raise ValueError for invalid format
    with pytest.raises(ValueError):
        parse_datetime("not-a-datetime") # 4.97μs -> 4.98μs (0.040% slower)

def test_parse_datetime_partial_date_raises():
    # Should raise ValueError for missing time
    with pytest.raises(ValueError):
        parse_datetime("2023-05-17") # 3.58μs -> 3.67μs (2.18% slower)

def test_parse_datetime_partial_time_raises():
    # Should raise ValueError for missing date
    with pytest.raises(ValueError):
        parse_datetime("12:30:45") # 3.61μs -> 3.68μs (2.09% slower)

def test_parse_datetime_invalid_numeric_type_raises():
    # Should raise TypeError for non-numeric, non-str types
    class Dummy: pass
    with pytest.raises(TypeError):
        parse_datetime(Dummy()) # 3.73μs -> 3.65μs (2.11% faster)

def test_parse_datetime_invalid_numeric_string_raises():
    # Should raise ValueError for string that can't be parsed as number or datetime
    with pytest.raises(ValueError):
        parse_datetime("notanumber") # 4.34μs -> 4.20μs (3.40% faster)

def test_parse_datetime_invalid_month_raises():
    # Month out of range
    with pytest.raises(ValueError):
        parse_datetime("2023-13-17T12:30:45") # 12.2μs -> 9.95μs (22.8% faster)

def test_parse_datetime_invalid_day_raises():
    # Day out of range
    with pytest.raises(ValueError):
        parse_datetime("2023-05-32T12:30:45") # 10.9μs -> 9.32μs (16.4% faster)

def test_parse_datetime_invalid_hour_raises():
    # Hour out of range
    with pytest.raises(ValueError):
        parse_datetime("2023-05-17T25:30:45") # 10.6μs -> 8.84μs (19.9% faster)

def test_parse_datetime_invalid_minute_raises():
    # Minute out of range
    with pytest.raises(ValueError):
        parse_datetime("2023-05-17T12:60:45") # 9.91μs -> 8.64μs (14.8% faster)

def test_parse_datetime_invalid_second_raises():
    # Second out of range
    with pytest.raises(ValueError):
        parse_datetime("2023-05-17T12:30:61") # 10.0μs -> 8.75μs (14.6% faster)

def test_parse_datetime_invalid_microsecond_length():
    # microsecond > 6 digits, should truncate
    s = "2023-05-17T12:30:45.123456789"
    # Only first 6 digits should be considered, rest ignored
    expected = datetime(2023, 5, 17, 12, 30, 45, 123456)
    codeflash_output = parse_datetime(s) # 10.7μs -> 9.11μs (17.3% faster)

def test_parse_datetime_max_number_returns_max():
    # Input above MAX_NUMBER returns datetime.max
    ts = MAX_NUMBER + 1
    codeflash_output = parse_datetime(ts) # 1.75μs -> 1.98μs (11.8% slower)

def test_parse_datetime_min_number_returns_min():
    # Input below -MAX_NUMBER returns datetime.min
    ts = -MAX_NUMBER - 1
    codeflash_output = parse_datetime(ts) # 1.74μs -> 1.98μs (12.3% slower)

def test_parse_datetime_ms_watershed_boundary():
    # Test the boundary between ms and s
    ts = MS_WATERSHED + 1
    # Should be treated as ms (converted to seconds)
    expected = EPOCH + timedelta(seconds=(MS_WATERSHED + 1) / 1000)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 4.90μs -> 4.86μs (0.781% faster)

def test_parse_datetime_negative_ms_watershed_boundary():
    # Negative ms watershed
    ts = -(MS_WATERSHED + 1)
    expected = EPOCH + timedelta(seconds=-(MS_WATERSHED + 1) / 1000)
    expected = expected.replace(tzinfo=timezone.utc)
    codeflash_output = parse_datetime(ts) # 4.49μs -> 4.68μs (4.00% slower)

def test_parse_datetime_leap_year():
    # Feb 29 on a leap year
    s = "2020-02-29T12:30:45"
    expected = datetime(2020, 2, 29, 12, 30, 45)
    codeflash_output = parse_datetime(s) # 11.5μs -> 9.27μs (24.1% faster)

def test_parse_datetime_non_leap_year_feb_29_raises():
    # Feb 29 on non-leap year should raise
    with pytest.raises(ValueError):
        parse_datetime("2019-02-29T12:30:45") # 10.5μs -> 9.06μs (16.0% faster)

def test_parse_datetime_zero_microseconds():
    # .0 microsecond should be parsed as 0
    s = "2023-05-17T12:30:45.0"
    expected = datetime(2023, 5, 17, 12, 30, 45, 0)
    codeflash_output = parse_datetime(s) # 10.9μs -> 8.91μs (22.5% faster)

def test_parse_datetime_microsecond_padding():
    # .1 should be 100000 microseconds
    s = "2023-05-17T12:30:45.1"
    expected = datetime(2023, 5, 17, 12, 30, 45, 100000)
    codeflash_output = parse_datetime(s) # 10.1μs -> 8.17μs (23.7% faster)

def test_parse_datetime_timezone_offset_no_colon():
    # +0200 instead of +02:00
    s = "2023-05-17T12:30:45+0200"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
    codeflash_output = parse_datetime(s) # 12.4μs -> 10.5μs (18.0% faster)

def test_parse_datetime_timezone_offset_no_colon_negative():
    # -0530 instead of -05:30
    s = "2023-05-17T12:30:45-0530"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=-5, minutes=-30)))
    codeflash_output = parse_datetime(s) # 12.2μs -> 10.2μs (19.9% faster)

def test_parse_datetime_timezone_offset_minutes_only():
    # +00:30 offset
    s = "2023-05-17T12:30:45+00:30"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(minutes=30)))
    codeflash_output = parse_datetime(s) # 11.8μs -> 9.63μs (22.7% faster)

def test_parse_datetime_timezone_offset_minutes_only_negative():
    # -00:45 offset
    s = "2023-05-17T12:30:45-00:45"
    expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(minutes=-45)))
    codeflash_output = parse_datetime(s) # 11.9μs -> 10.1μs (18.7% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_parse_datetime_large_batch_iso_strings():
    # Parse 1000 valid ISO strings, all different seconds
    base = datetime(2023, 5, 17, 12, 30, 0)
    for i in range(1000):
        s = f"2023-05-17T12:30:{i%60:02d}"
        expected = base.replace(second=i%60)
        codeflash_output = parse_datetime(s) # 3.25ms -> 2.55ms (27.8% faster)

def test_parse_datetime_large_batch_unix_timestamps():
    # Parse 1000 unix timestamps (seconds)
    base_ts = 1684325400  # 2023-05-17T12:10:00 UTC
    for i in range(1000):
        ts = base_ts + i
        expected = datetime(2023, 5, 17, 12, 10, 0, tzinfo=timezone.utc) + timedelta(seconds=i)
        codeflash_output = parse_datetime(ts) # 1.45ms -> 1.46ms (0.452% slower)

def test_parse_datetime_large_batch_milliseconds():
    # Parse 1000 unix timestamps (milliseconds)
    base_ts = 1684325400000  # 2023-05-17T12:10:00.000 UTC
    for i in range(1000):
        ts = base_ts + i
        expected = datetime(2023, 5, 17, 12, 10, 0, tzinfo=timezone.utc) + timedelta(milliseconds=i)
        codeflash_output = parse_datetime(ts) # 1.62ms -> 1.63ms (0.548% slower)

def test_parse_datetime_large_batch_bytes_iso_strings():
    # Parse 1000 ISO strings as bytes
    base = datetime(2023, 5, 17, 12, 30, 0)
    for i in range(1000):
        s = f"2023-05-17T12:30:{i%60:02d}".encode()
        expected = base.replace(second=i%60)
        codeflash_output = parse_datetime(s) # 3.32ms -> 2.62ms (26.6% faster)

def test_parse_datetime_large_batch_bytes_unix_timestamps():
    # Parse 1000 unix timestamps as bytes
    base_ts = 1684325400  # 2023-05-17T12:10:00 UTC
    for i in range(1000):
        ts = str(base_ts + i).encode()
        expected = datetime(2023, 5, 17, 12, 10, 0, tzinfo=timezone.utc) + timedelta(seconds=i)
        codeflash_output = parse_datetime(ts) # 1.66ms -> 1.67ms (0.379% slower)

def test_parse_datetime_large_batch_varied_timezones():
    # Parse 1000 ISO strings with varying timezones
    for i in range(1000):
        hour_offset = (i % 25) - 12  # from -12 to +12
        s = f"2023-05-17T12:30:45{'+' if hour_offset>=0 else '-'}{abs(hour_offset):02d}:00"
        expected = datetime(2023, 5, 17, 12, 30, 45, tzinfo=timezone(timedelta(hours=hour_offset)))
        codeflash_output = parse_datetime(s) # 4.25ms -> 3.54ms (20.1% faster)

def test_parse_datetime_large_batch_microseconds():
    # Parse 1000 ISO strings with microseconds
    for i in range(1000):
        us = i % 1000000
        s = f"2023-05-17T12:30:45.{us:06d}"
        expected = datetime(2023, 5, 17, 12, 30, 45, us)
        codeflash_output = parse_datetime(s) # 3.46ms -> 2.72ms (27.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
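The closing comment says that `codeflash_output` is compared between the original and optimized code. Below is a minimal sketch of that kind of differential check. `outcome` and `mismatches` are hypothetical helpers. Treating "raised the same exception type" as agreement is a design choice made here, not necessarily how Codeflash does it.

```python
from typing import Any, Callable, Iterable, List, Tuple


def outcome(fn: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Capture fn(arg) as ("ok", value) or ("raised", exception type)."""
    try:
        return ("ok", fn(arg))
    except Exception as exc:
        return ("raised", type(exc))


def mismatches(original: Callable[[Any], Any], optimized: Callable[[Any], Any],
               inputs: Iterable[Any]) -> List[Any]:
    """Return the inputs on which the two implementations disagree."""
    return [x for x in inputs if outcome(original, x) != outcome(optimized, x)]


# Toy demonstration with two stand-in functions (not parse_datetime):
print(mismatches(lambda s: int(s), lambda s: int(float(s)), ["7", "7.5", "abc"]))  # ['7.5']
```

To apply this check here, pass in the pre- and post-optimization versions of `parse_datetime` (however you import them) together with the test inputs. Note that `==` on aware datetimes compares instants, so results for the same moment with different offsets count as equal. Wrapping each callable so it returns `(value, value.utcoffset())` would catch a changed offset.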

To edit these changes, run `git checkout codeflash/optimize-parse_datetime-mhe3n0mk` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 31, 2025 00:12
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 31, 2025