@codeflash-ai codeflash-ai bot commented Oct 31, 2025

📄 17% (0.17x) speedup for parse_date in src/openai/_utils/_compat.py

⏱️ Runtime: 15.4 milliseconds → 13.1 milliseconds (best of 70 runs)

📝 Explanation and details

The optimization replaces the dictionary comprehension `{k: int(v) for k, v in match.groupdict().items()}` with direct group access using `match.group()`. Instead of building a dictionary with named keys and then unpacking it with `**kw`, the code now extracts the year, month, and day values directly and passes them as positional arguments to the `date()` constructor.
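As a rough illustration of the change described above, the sketch below puts the two post-processing styles side by side. The pattern `DATE_RE` and the helper names `build_date_via_dict` and `build_date_via_groups` are assumptions made for this example; they are modeled on the description in this PR, not copied from the diff.

```python
import re
from datetime import date

# Assumed pattern with named groups for year, month, and day
# (a hypothetical stand-in for the module's date_re).
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$")


def build_date_via_dict(match: "re.Match[str]") -> date:
    # Original style: build a temporary dict, then unpack it as keyword arguments.
    kw = {k: int(v) for k, v in match.groupdict().items()}
    return date(**kw)


def build_date_via_groups(match: "re.Match[str]") -> date:
    # Optimized style: read each named group directly and pass it positionally.
    year = int(match.group("year"))
    month = int(match.group("month"))
    day = int(match.group("day"))
    return date(year, month, day)


match = DATE_RE.match("2023-6-5")
assert match is not None
print(build_date_via_dict(match), build_date_via_groups(match))
```

Both helpers let `date()` raise `ValueError` for out-of-range values such as month 13, so any surrounding error handling should not need to change.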

This eliminates several performance bottlenecks:

  • Dictionary creation overhead: the original code creates a temporary dictionary with string keys.
  • Dictionary iteration: `match.groupdict().items()` requires iterating over all matched groups.
  • Keyword argument unpacking: `**kw` adds overhead when calling the constructor.

The line profiler shows the optimization's impact: the original dictionary comprehension took 7.99ms (16.6% of total time), while the optimized version spreads this work across three simpler lines that total about 5.27ms (11.4% of total time), a reduction of roughly 34% in this hot code path.

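The optimization is particularly effective for string parsing workloads: in the test results, inputs that match the ISO date pattern see 24-35% speedups, including well-formed but invalid dates such as `2023-02-30`. The regex matching itself (`date_re.match()`) remains unchanged, so the benefit comes purely from more efficient post-processing of the match results. Inputs that never reach that step (date and datetime objects, numeric timestamps, and strings that fail to match) show differences of a few percent in either direction, which is presumably measurement noise.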
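To get a feel for the size of the effect locally, a micro-benchmark along the following lines could be used. This is only a sketch: `DATE_RE`, `via_dict`, and `via_groups` are hypothetical stand-ins written from the description above, and the absolute numbers will differ from the ones reported here depending on the machine and Python version.

```python
import re
import timeit
from datetime import date

# Assumed pattern; a stand-in for the module's date_re.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$")


def via_dict(value: str) -> date:
    # Dict comprehension plus keyword unpacking, as in the original code path.
    match = DATE_RE.match(value)
    if match is None:
        raise ValueError("invalid date format")
    kw = {k: int(v) for k, v in match.groupdict().items()}
    return date(**kw)


def via_groups(value: str) -> date:
    # Direct group access plus positional arguments, as in the optimized path.
    match = DATE_RE.match(value)
    if match is None:
        raise ValueError("invalid date format")
    return date(
        int(match.group("year")),
        int(match.group("month")),
        int(match.group("day")),
    )


CALLS = 20_000
for fn in (via_dict, via_groups):
    # Take the best of several repeats to reduce scheduling noise.
    best = min(timeit.repeat(lambda: fn("2023-06-15"), number=CALLS, repeat=5))
    print(f"{fn.__name__}: {best / CALLS * 1e9:.0f} ns per call")
```

Taking the minimum over repeats mirrors the "best of N runs" convention used in the runtime figure above.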

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9261 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
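The report does not show how the outputs of the original and optimized functions are compared. One plausible way to check behavioural equivalence, sketched here with hypothetical helpers (`outcome`, `parse_original`, `parse_optimized`) that are not part of Codeflash or the openai package, is to record either the return value or the exception type for each input and compare the two records.

```python
import re
from datetime import date
from typing import Callable, Tuple

# Assumed pattern; a stand-in for the module's date_re.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$")


def parse_original(value: str) -> date:
    match = DATE_RE.match(value)
    if match is None:
        raise ValueError("invalid date format")
    return date(**{k: int(v) for k, v in match.groupdict().items()})


def parse_optimized(value: str) -> date:
    match = DATE_RE.match(value)
    if match is None:
        raise ValueError("invalid date format")
    return date(int(match.group("year")), int(match.group("month")), int(match.group("day")))


def outcome(fn: Callable[[str], date], value: str) -> Tuple[str, object]:
    # Capture either the result or the exception type so failures compare too.
    try:
        return ("ok", fn(value))
    except Exception as exc:
        return ("error", type(exc))


samples = ["2023-06-15", "2023-6-5", "2023-02-30", "2023-13-01", "not-a-date", ""]
for sample in samples:
    assert outcome(parse_original, sample) == outcome(parse_optimized, sample), sample
print("all samples agree")
```

Comparing exception types as well as return values matters here because many of the generated tests exercise inputs that are expected to raise.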
🌀 Generated Regression Tests and Runtime
from datetime import date, datetime, timedelta, timezone

# imports
import pytest
from openai._utils._compat import parse_date

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_parse_date_with_iso_string():
    # Standard ISO format string
    codeflash_output = parse_date("2023-06-15") # 7.60μs -> 6.11μs (24.6% faster)

def test_parse_date_with_single_digit_month_day():
    # Month and day with single digits
    codeflash_output = parse_date("2023-6-5") # 6.19μs -> 4.68μs (32.2% faster)

def test_parse_date_with_bytes():
    # Bytes input
    codeflash_output = parse_date(b"2023-06-15") # 6.80μs -> 5.16μs (31.8% faster)

def test_parse_date_with_date_object():
    # Already a date object
    d = date(2020, 1, 2)
    codeflash_output = parse_date(d) # 674ns -> 637ns (5.81% faster)

def test_parse_date_with_datetime_object():
    # Datetime object should return its date part
    dt = datetime(2020, 1, 2, 13, 14, 15)
    codeflash_output = parse_date(dt) # 960ns -> 991ns (3.13% slower)

def test_parse_date_with_unix_seconds_int():
    # Unix timestamp (seconds)
    ts = 1686825600  # 2023-06-15 UTC
    codeflash_output = parse_date(ts) # 7.57μs -> 8.23μs (8.08% slower)

def test_parse_date_with_unix_seconds_float():
    # Unix timestamp (float seconds)
    ts = 1686825600.0
    codeflash_output = parse_date(ts) # 6.97μs -> 7.08μs (1.55% slower)

def test_parse_date_with_unix_milliseconds():
    # Unix timestamp in milliseconds
    ts = 1686825600000  # 2023-06-15 UTC
    codeflash_output = parse_date(ts) # 7.09μs -> 7.02μs (1.05% faster)

def test_parse_date_with_unix_seconds_string():
    # Unix timestamp as string
    ts = "1686825600"
    codeflash_output = parse_date(ts) # 7.09μs -> 6.97μs (1.71% faster)

def test_parse_date_with_unix_milliseconds_string():
    # Unix timestamp in ms as string
    ts = "1686825600000"
    codeflash_output = parse_date(ts) # 7.17μs -> 7.14μs (0.294% faster)

def test_parse_date_with_unix_seconds_bytes():
    # Unix timestamp as bytes
    ts = b"1686825600"
    codeflash_output = parse_date(ts) # 6.52μs -> 6.71μs (2.77% slower)

# ------------------------
# Edge Test Cases
# ------------------------

def test_parse_date_invalid_format():
    # Invalid string format should raise ValueError
    with pytest.raises(ValueError):
        parse_date("15/06/2023") # 4.26μs -> 4.30μs (0.885% slower)

def test_parse_date_invalid_date_values():
    # Well-formatted but invalid date (e.g. Feb 30)
    with pytest.raises(ValueError):
        parse_date("2023-02-30") # 8.52μs -> 6.51μs (30.7% faster)

def test_parse_date_empty_string():
    # Empty string should raise ValueError
    with pytest.raises(ValueError):
        parse_date("") # 3.30μs -> 3.53μs (6.54% slower)

def test_parse_date_non_string_bytes():
    # Bytes that can't be decoded to a date
    with pytest.raises(ValueError):
        parse_date(b"notadate") # 4.52μs -> 4.74μs (4.64% slower)

def test_parse_date_non_numeric_string():
    # Non-numeric string not matching date format
    with pytest.raises(ValueError):
        parse_date("foobar") # 3.81μs -> 3.88μs (1.83% slower)



def test_parse_date_max_number():
    # Number > MAX_NUMBER returns datetime.max.date()
    max_number = int(3e21)
    codeflash_output = parse_date(max_number) # 2.53μs -> 2.62μs (3.32% slower)

def test_parse_date_min_number():
    # Number < -MAX_NUMBER returns datetime.min.date()
    min_number = -int(3e21)
    codeflash_output = parse_date(min_number) # 2.23μs -> 2.12μs (4.90% faster)

def test_parse_date_ms_watershed_boundary():
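    # Test at the MS_WATERSHED boundary; assuming the upstream check is
    # `abs(seconds) > MS_WATERSHED`, only strictly larger values are treated as ms
    ms_watershed = int(2e10)
    # Exactly at the boundary the value is still read as seconds (2603-10-11)
    codeflash_output = parse_date(ms_watershed) # 8.12μs -> 8.42μs (3.60% slower)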
    # Just below boundary, treated as seconds
    codeflash_output = parse_date(ms_watershed - 1) # 2.58μs -> 2.62μs (1.49% slower)

def test_parse_date_leading_and_trailing_spaces():
    # Spaces should cause ValueError (not stripped)
    with pytest.raises(ValueError):
        parse_date(" 2023-06-15") # 4.69μs -> 4.70μs (0.191% slower)
    with pytest.raises(ValueError):
        parse_date("2023-06-15 ") # 2.80μs -> 2.83μs (1.13% slower)

def test_parse_date_string_with_extra_characters():
    # Extra characters after date should cause ValueError
    with pytest.raises(ValueError):
        parse_date("2023-06-15T00:00:00") # 3.83μs -> 3.93μs (2.54% slower)

def test_parse_date_invalid_month_and_day():
    # Month > 12 or day > 31 should raise ValueError
    with pytest.raises(ValueError):
        parse_date("2023-13-01") # 8.46μs -> 6.40μs (32.1% faster)
    with pytest.raises(ValueError):
        parse_date("2023-12-32") # 4.12μs -> 3.28μs (25.4% faster)

def test_parse_date_negative_unix_timestamp():
    # Negative unix timestamp (before epoch)
    # -1 is 1969-12-31
    codeflash_output = parse_date(-1) # 7.01μs -> 7.74μs (9.39% slower)

def test_parse_date_zero_unix_timestamp():
    # Zero unix timestamp is the epoch
    codeflash_output = parse_date(0) # 5.68μs -> 5.81μs (2.32% slower)


def test_parse_date_many_iso_strings():
    # Test parsing 1000 valid ISO date strings
    for i in range(1, 1001):
        s = f"2023-06-{i%28+1}"  # days 1-28
        expected = date(2023, 6, i%28+1)
        codeflash_output = parse_date(s) # 1.95ms -> 1.44ms (35.4% faster)

def test_parse_date_many_unix_timestamps():
    # Test parsing 1000 unix timestamps for consecutive days
    start_ts = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())
    for i in range(1000):
        ts = start_ts + i*86400  # add i days
        expected = date(2020, 1, 1) + timedelta(days=i)
        codeflash_output = parse_date(ts) # 1.49ms -> 1.48ms (0.740% faster)

def test_parse_date_many_bytes_iso_strings():
    # Test parsing 500 valid ISO date strings as bytes
    for i in range(1, 501):
        s = f"2022-12-{i%28+1}".encode()
        expected = date(2022, 12, i%28+1)
        codeflash_output = parse_date(s) # 981μs -> 732μs (33.9% faster)

def test_parse_date_many_unix_milliseconds():
    # Test parsing 500 unix timestamps in ms for consecutive days
    start_dt = datetime(2019, 1, 1, tzinfo=timezone.utc)
    start_ts_ms = int(start_dt.timestamp() * 1000)
    for i in range(500):
        ts = start_ts_ms + i*86400*1000  # add i days in ms
        expected = date(2019, 1, 1) + timedelta(days=i)
        codeflash_output = parse_date(ts) # 801μs -> 798μs (0.421% faster)

def test_parse_date_performance_large_list():
    # Test performance on a list of 1000 mixed valid date strings and unix timestamps
    dates = []
    for i in range(500):
        dates.append(f"2021-07-{i%28+1}")
        dates.append(int(datetime(2021, 7, i%28+1, tzinfo=timezone.utc).timestamp()))
    for i, v in enumerate(dates):
        if isinstance(v, str):
            expected = date(2021, 7, int(v.split('-')[2]))
        else:
            expected = date(2021, 7, ((v - int(datetime(2021, 7, 1, tzinfo=timezone.utc).timestamp()))//86400)+1)
        codeflash_output = parse_date(v) # 1.76ms -> 1.52ms (16.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from datetime import date, datetime, timedelta, timezone

# imports
import pytest
from openai._utils._compat import parse_date

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_parse_date_with_iso_string():
    # Test standard ISO date string
    codeflash_output = parse_date("2023-06-15") # 8.28μs -> 6.52μs (26.9% faster)

def test_parse_date_with_single_digit_month_day():
    # Test single-digit month and day
    codeflash_output = parse_date("2023-6-7") # 6.94μs -> 5.23μs (32.8% faster)

def test_parse_date_with_bytes_iso_string():
    # Test bytes input
    codeflash_output = parse_date(b"2023-06-15") # 6.96μs -> 5.52μs (26.1% faster)

def test_parse_date_with_date_object():
    # Test direct date object
    d = date(2022, 1, 1)
    codeflash_output = parse_date(d) # 655ns -> 632ns (3.64% faster)

def test_parse_date_with_datetime_object():
    # Test direct datetime object
    dt = datetime(2022, 1, 1, 12, 30)
    codeflash_output = parse_date(dt) # 989ns -> 983ns (0.610% faster)

def test_parse_date_with_unix_seconds_int():
    # Test unix timestamp as int
    # 1970-01-02 00:00:00 UTC
    codeflash_output = parse_date(86400) # 6.97μs -> 7.51μs (7.15% slower)

def test_parse_date_with_unix_seconds_float():
    # Test unix timestamp as float
    # 1970-01-02 00:00:00 UTC
    codeflash_output = parse_date(86400.0) # 6.96μs -> 6.74μs (3.31% faster)

def test_parse_date_with_unix_seconds_string():
    # Test unix timestamp as string
    codeflash_output = parse_date("86400") # 6.82μs -> 6.95μs (1.89% slower)

def test_parse_date_with_unix_seconds_bytes():
    # Test unix timestamp as bytes
    codeflash_output = parse_date(b"86400") # 6.70μs -> 6.66μs (0.540% faster)

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_parse_date_with_invalid_date_string_format():
    # Test invalid date string format
    with pytest.raises(ValueError):
        parse_date("15-06-2023") # 4.32μs -> 4.09μs (5.63% faster)

def test_parse_date_with_non_date_string():
    # Test completely invalid string
    with pytest.raises(ValueError):
        parse_date("not-a-date") # 4.63μs -> 4.60μs (0.608% faster)

def test_parse_date_with_empty_string():
    # Test empty string
    with pytest.raises(ValueError):
        parse_date("") # 3.42μs -> 3.46μs (1.24% slower)

def test_parse_date_with_invalid_bytes():
    # Test invalid bytes
    with pytest.raises(ValueError):
        parse_date(b"not-a-date") # 4.42μs -> 4.51μs (2.04% slower)

def test_parse_date_with_invalid_numeric_string():
    # Test numeric string that isn't a valid date or timestamp
    with pytest.raises(ValueError):
        parse_date("9999-99-99") # 8.58μs -> 6.58μs (30.4% faster)

def test_parse_date_with_invalid_date_values():
    # Test impossible date
    with pytest.raises(ValueError):
        parse_date("2023-02-30") # 7.74μs -> 6.07μs (27.7% faster)

def test_parse_date_with_leap_year():
    # Test leap year date
    codeflash_output = parse_date("2020-02-29") # 6.68μs -> 5.09μs (31.4% faster)

def test_parse_date_with_non_leap_year_feb_29():
    # Test non-leap year Feb 29
    with pytest.raises(ValueError):
        parse_date("2021-02-29") # 7.56μs -> 5.69μs (32.9% faster)

def test_parse_date_with_negative_unix_timestamp():
    # Test negative unix timestamp (before epoch)
    codeflash_output = parse_date(-86400) # 7.89μs -> 8.00μs (1.34% slower)

def test_parse_date_with_large_unix_timestamp_seconds():
    # Test large unix timestamp (seconds)
    # 11th October 2603
    codeflash_output = parse_date(20000000000) # 6.38μs -> 6.51μs (1.94% slower)

def test_parse_date_with_large_unix_timestamp_milliseconds():
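    # Test large unix timestamp (float equal to MS_WATERSHED)
    # Read as seconds (11th October 2603), assuming only values strictly
    # greater than the watershed are treated as milliseconds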
    codeflash_output = parse_date(20000000000.0) # 6.55μs -> 6.67μs (1.84% slower)

def test_parse_date_with_max_number():
    # Test extremely large number returns datetime.max.date()
    from openai._utils._datetime_parse import MAX_NUMBER
    codeflash_output = parse_date(MAX_NUMBER + 1) # 1.82μs -> 1.80μs (1.45% faster)

def test_parse_date_with_min_number():
    # Test extremely small number returns datetime.min.date()
    from openai._utils._datetime_parse import MAX_NUMBER
    codeflash_output = parse_date(-(MAX_NUMBER + 1)) # 2.11μs -> 2.40μs (12.3% slower)

def test_parse_date_with_non_numeric_object():
    # Test passing a non-numeric, non-date, non-string object
    class Dummy: pass
    with pytest.raises(TypeError):
        parse_date(Dummy()) # 3.53μs -> 3.56μs (0.898% slower)

def test_parse_date_with_float_string():
    # Test float string as unix timestamp
    codeflash_output = parse_date("86400.0") # 8.48μs -> 8.55μs (0.795% slower)

def test_parse_date_with_leading_trailing_spaces():
    # Should fail: spaces not allowed in regex
    with pytest.raises(ValueError):
        parse_date(" 2023-06-15 ") # 4.35μs -> 4.29μs (1.45% faster)

def test_parse_date_with_non_ascii_bytes():
    # Should fail: non-ascii bytes
    with pytest.raises(ValueError):
        parse_date(b"\xff\xfe\xfd") # 8.94μs -> 8.41μs (6.33% faster)

def test_parse_date_with_none():
    # Should fail: None is not valid
    with pytest.raises(TypeError):
        parse_date(None) # 2.98μs -> 2.90μs (2.51% faster)

def test_parse_date_with_too_many_fields():
    # Should fail: too many fields in string
    with pytest.raises(ValueError):
        parse_date("2023-06-15-01") # 4.53μs -> 4.49μs (0.801% faster)

def test_parse_date_with_too_few_fields():
    # Should fail: too few fields in string
    with pytest.raises(ValueError):
        parse_date("2023-06") # 3.69μs -> 3.65μs (1.12% faster)

def test_parse_date_with_zero_month_day():
    # Should fail: month and day cannot be zero
    with pytest.raises(ValueError):
        parse_date("2023-00-00") # 8.22μs -> 6.45μs (27.5% faster)

def test_parse_date_with_negative_year():
    # Should fail: negative year
    with pytest.raises(ValueError):
        parse_date("-2023-06-15") # 3.96μs -> 3.82μs (3.80% faster)

def test_parse_date_with_float_input():
    # Test float input (unix timestamp)
    codeflash_output = parse_date(0.0) # 7.43μs -> 7.87μs (5.59% slower)

def test_parse_date_with_float_input_large():
    # Test large float input (equal to MS_WATERSHED, so presumably read as seconds)
    codeflash_output = parse_date(20000000000.0) # 6.95μs -> 7.08μs (1.84% slower)

def test_parse_date_with_float_input_small():
    # Test the same value as an int (read as seconds)
    codeflash_output = parse_date(20000000000) # 5.75μs -> 6.02μs (4.40% slower)

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_parse_date_bulk_iso_strings():
    # Test parsing 1000 ISO date strings
    for i in range(1, 1001):
        # All dates in 2023, month 1, day i % 28 + 1 (valid days)
        day = (i % 28) + 1
        s = f"2023-1-{day}"
        codeflash_output = parse_date(s) # 1.91ms -> 1.42ms (35.0% faster)

def test_parse_date_bulk_unix_timestamps():
    # Test parsing 1000 unix timestamps (days since epoch)
    for i in range(1000):
        ts = i * 86400  # days in seconds
        expected = date(1970, 1, 1) + timedelta(days=i)
        codeflash_output = parse_date(ts) # 1.46ms -> 1.46ms (0.460% faster)

def test_parse_date_bulk_bytes_iso_strings():
    # Test parsing 1000 bytes ISO date strings
    for i in range(1, 1001):
        day = (i % 28) + 1
        s = f"2023-1-{day}".encode()
        codeflash_output = parse_date(s) # 1.93ms -> 1.44ms (34.2% faster)

def test_parse_date_bulk_invalid_strings():
    # Test 1000 invalid strings
    for i in range(1000):
        s = f"invalid-{i}"
        with pytest.raises(ValueError):
            parse_date(s)


def test_parse_date_bulk_mixed_types():
    # Test 200 iterations over six input kinds (1,200 calls): date, datetime, int, float, str, bytes
    for i in range(1, 201):
        # date
        d = date(2023, 1, (i % 28) + 1)
        codeflash_output = parse_date(d) # 50.9μs -> 51.6μs (1.32% slower)
        # datetime
        dt = datetime(2023, 1, (i % 28) + 1, 12, 0)
        codeflash_output = parse_date(dt)
        # int unix timestamp
        ts = (i - 1) * 86400 # 53.7μs -> 54.4μs (1.16% slower)
        expected = date(1970, 1, 1) + timedelta(days=i - 1)
        codeflash_output = parse_date(ts)
        # float unix timestamp
        codeflash_output = parse_date(float(ts))
        # ISO string
        s = f"2023-1-{(i % 28) + 1}" # 309μs -> 310μs (0.358% slower)
        codeflash_output = parse_date(s)
        # bytes ISO string
        codeflash_output = parse_date(s.encode()) # 319μs -> 318μs (0.133% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from openai._utils._compat import parse_date

def test_parse_date():
    parse_date(0)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `codeflash_concolic_n962nf66/tmpu_9qok1p/test_concolic_coverage.py::test_parse_date` | 6.48μs | 7.00μs | -7.45%⚠️ |

To edit these changes, run `git checkout codeflash/optimize-parse_date-mhe3deav` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 31, 2025 00:05
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Oct 31, 2025