@mohammedahmed18
Contributor

@mohammedahmed18 mohammedahmed18 commented Nov 27, 2025

PR Type

Enhancement, Tests


Description

  • Add structured test diff generation

  • Capture pytest failures from stdout

  • Improve Levenshtein performance optimizations

  • Integrate mismatch handling in optimizer


Diagram Walkthrough

flowchart LR
  ParseStdout["Parse pytest stdout failures"] -- "annotate" --> TestResults["TestResults.test_failures"]
  Compare["compare_test_results returns (match, diffs)"] -- "used by" --> Optimizer["FunctionOptimizer.run_optimized_candidate"]
  Optimizer -- "on mismatch" --> Feedback["Handle >50% mismatches or attempt fix"]
  CSTExtract["Extract test source via libcst"] -- "enrich" --> Compare
  LevOpt["Levenshtein optimizations"] -- "faster distance" --> Core["Core utilities"]

File Walkthrough

Relevant files
Enhancement
functions_to_optimize.py
Optimize Levenshtein distance implementation                         

codeflash/discovery/functions_to_optimize.py

  • Add early exits for empty strings
  • Use lists for fast indexed access
  • Reuse arrays and local vars for cache locality
  • Simplify min computations and swap buffers
+30/-12 
models.py
Enrich models with test source and failures map                   

codeflash/models/models.py

  • Import libcst for CST parsing
  • Add methods to locate tests and get source
  • Extend TestResults with test_failures mapping
+29/-0   
function_optimizer.py
Integrate diff-aware comparison into optimizer flow           

codeflash/optimization/function_optimizer.py

  • Compare behavior with detailed diffs
  • Add helper to return unified mismatch Failure
  • Gate feedback loop by mismatch percentage
+21/-4   
equivalence.py
Produce structured diffs from test result comparison         

codeflash/verification/equivalence.py

  • Introduce TestDiffScope enum and TestDiff dataclass
  • Return (match, diffs) from comparison
  • Capture return/stdout/pass mismatches with context
  • Include pytest error and test source in diffs
+61/-24 
parse_test_output.py
Extract pytest failure details from stdout                             

codeflash/verification/parse_test_output.py

  • Parse pytest stdout to map test failures
  • Attach failures map to TestResults
  • Safe parsing with exception handling
+42/-0   
Tests
test_codeflash_capture.py
E2E test for structured diffs and mismatch handling           

tests/test_codeflash_capture.py

  • Add end-to-end test covering diff output
  • Validate non-matching results return diffs
  • Exercise optimizer-like flow with instrumented tests
+233/-0 
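The structured-diff flow summarized in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the enum members, the `TestDiff` fields, and the `compare_results` helper working on plain dicts are all assumptions made for the example, and only the `(match, diffs)` return shape and the empty-results rule come from the snippets quoted in this conversation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class TestDiffScope(Enum):
    # Hypothetical members, named after the mismatch kinds listed in the walkthrough.
    RETURN_VALUE = "return_value"
    STDOUT = "stdout"
    DID_PASS = "did_pass"


@dataclass(frozen=True)
class TestDiff:
    # Hypothetical fields; the real dataclass may carry different context.
    scope: TestDiffScope
    test_id: str
    original_value: Any
    candidate_value: Any
    pytest_error: Optional[str] = None


def compare_results(
    original: dict[str, dict[str, Any]],
    candidate: dict[str, dict[str, Any]],
    candidate_failures: Optional[dict[str, str]] = None,
) -> tuple[bool, list[TestDiff]]:
    """Return (match, diffs) for two result mappings keyed by a test id."""
    if not original or not candidate:
        return False, []  # as in the PR, empty results never count as equal
    failures = candidate_failures or {}  # tolerate a missing failures map
    diffs: list[TestDiff] = []
    # Simplification: only ids present on both sides are compared.
    for test_id in sorted(original.keys() & candidate.keys()):
        orig, cand = original[test_id], candidate[test_id]
        for scope in TestDiffScope:
            if orig.get(scope.value) != cand.get(scope.value):
                diffs.append(
                    TestDiff(scope, test_id, orig.get(scope.value), cand.get(scope.value), failures.get(test_id))
                )
                break  # record at most one diff per test
    return not diffs, diffs
```

Callers unpack the tuple (`match, diffs = compare_results(...)`), which is the pattern every existing call site of a bool-returning comparison would need to adopt.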

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Robustness

Accessing candidate_results.test_failures without guarding against None may raise an AttributeError; ensure test_failures is initialized or defaulted before use.

def compare_test_results(original_results: TestResults, candidate_results: TestResults) -> tuple[bool, list[TestDiff]]:
    # This is meant to be only called with test results for the first loop index
    if len(original_results) == 0 or len(candidate_results) == 0:
        return False, []  # empty test results are not equal
    original_recursion_limit = sys.getrecursionlimit()
    if original_recursion_limit < INCREASED_RECURSION_LIMIT:
        sys.setrecursionlimit(INCREASED_RECURSION_LIMIT)  # Increase recursion limit to avoid RecursionError
    test_ids_superset = original_results.get_all_unique_invocation_loop_ids().union(
        set(candidate_results.get_all_unique_invocation_loop_ids())
    )
    test_diffs: list[TestDiff] = []
    did_all_timeout: bool = True
    for test_id in test_ids_superset:
        original_test_result = original_results.get_by_unique_invocation_loop_id(test_id)
        cdd_test_result = candidate_results.get_by_unique_invocation_loop_id(test_id)
        candidate_pytest_error = candidate_results.test_failures.get(original_test_result.id.test_function_name)
        if cdd_test_result is not None and original_test_result is None:
            continue
API Change

compare_test_results now returns (match, diffs) but some other call sites may still expect a bool; verify all usages are updated to handle the tuple and the diffs list.

match, diffs = compare_test_results(baseline_results.behavior_test_results, candidate_behavior_results)
if match:
    logger.info("h3|Test results matched ✅")
    console.rule()
else:
    result_unmatched_perc = len(diffs) / len(candidate_behavior_results)
    if result_unmatched_perc > 0.5:
        # if the test unmatched percentage is greater than 50%, we can't fix it
        return self.get_results_not_matched_error()

    # with the parsed test results diff ask the llm to fix the candidate to match the test results of the original code, and run again
    # self.run_optimized_candidate(
    #     optimization_candidate_index=optimization_candidate_index,
    #     baseline_results=baseline_results,
    #     original_helper_code=original_helper_code,
    #     file_path_to_helper_classes=file_path_to_helper_classes,
    # )
    print(f"should try to fix it, diffs: {diffs}")
    return self.get_results_not_matched_error()
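The gating logic in the snippet above can be isolated into a small pure function, which makes the threshold easy to test. This is a sketch under assumptions: `mismatch_ratio`, `next_action`, and the action strings are hypothetical names, not part of the PR. It also guards the division, since the quoted `compare_test_results` returns `(False, [])` for empty results, and in that case `len(diffs) / len(candidate_behavior_results)` would appear to divide by zero.

```python
def mismatch_ratio(num_diffs: int, num_results: int) -> float:
    """Fraction of results that did not match; empty results count as fully mismatched."""
    if num_results <= 0:
        return 1.0  # avoid ZeroDivisionError and treat "nothing to compare" as unfixable
    return num_diffs / num_results


def next_action(match: bool, num_diffs: int, num_results: int, threshold: float = 0.5) -> str:
    """Decide what the optimizer should do with a candidate (hypothetical action names)."""
    if match:
        return "accept"
    if mismatch_ratio(num_diffs, num_results) > threshold:
        return "reject"  # too many mismatches to attempt a fix
    return "attempt_fix"
```

With the 0.5 threshold used in the PR, a candidate with exactly half of its results mismatched is still eligible for a fix attempt, because the comparison is strictly greater-than.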
Parsing Errors

libcst-based get_src_code lacks error handling for parse failures and invalid files; consider try/except and returning None with logging to avoid crashes.

def get_src_code(self, test_path: Path) -> Optional[str]:
    test_src = test_path.read_text(encoding="utf-8")
    module_node = cst.parse_module(test_src)

    if self.test_class_name:
        for stmt in module_node.body:
            if isinstance(stmt, cst.ClassDef) and stmt.name.value == self.test_class_name:
                func_node = self.find_func_in_class(stmt, self.test_function_name)
                if func_node:
                    return module_node.code_for_node(func_node).strip()
        # class not found
        return None

    # Otherwise, look for a top level function
    for stmt in module_node.body:
        if isinstance(stmt, cst.FunctionDef) and stmt.name.value == self.test_function_name:
            return module_node.code_for_node(stmt).strip()
    return None

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Safely access optional mapping

Guard access to test_failures since it can be None and avoid AttributeError. Also
handle missing keys safely to keep comparison robust when no failures were parsed.

codeflash/verification/equivalence.py [43]

-candidate_pytest_error = candidate_results.test_failures.get(original_test_result.id.test_function_name)
+candidate_pytest_error = None
+if getattr(candidate_results, "test_failures", None):
+    candidate_pytest_error = candidate_results.test_failures.get(original_test_result.id.test_function_name)
Suggestion importance[1-10]: 8

Why: test_failures is declared Optional in TestResults, so direct .get can raise if None; guarding prevents an AttributeError and aligns with new parsing logic.

Medium
Ensure recursion limit restoration

Preserve the recursion limit restoration even on early returns to avoid leaving the
process with a higher limit. Move recursion limit increase before any early return
or ensure restoration in all paths.

codeflash/verification/equivalence.py [30-35]

 if len(original_results) == 0 or len(candidate_results) == 0:
-    return False, []  # empty test results are not equal
+    return False, []
+original_recursion_limit = sys.getrecursionlimit()
+try:
+    if original_recursion_limit < INCREASED_RECURSION_LIMIT:
+        sys.setrecursionlimit(INCREASED_RECURSION_LIMIT)
+    # ... rest of the function body unchanged ...
+finally:
+    sys.setrecursionlimit(original_recursion_limit)
Suggestion importance[1-10]: 6

Why: Early return before saving/restoring the recursion limit can skip restoration if that logic ever moves; wrapping with try/finally improves robustness though current early return happens before any change.

Low
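The try/finally suggestion above could also be packaged as a context manager, so that restoration cannot be skipped by a later early return. This is a sketch only; `raised_recursion_limit` is a hypothetical helper name, and the PR itself keeps the logic inline.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def raised_recursion_limit(minimum: int) -> Iterator[None]:
    """Temporarily raise the recursion limit to at least `minimum`, then restore it."""
    original = sys.getrecursionlimit()
    if original < minimum:
        sys.setrecursionlimit(minimum)
    try:
        yield
    finally:
        # Runs on normal exit, early return, and exceptions alike.
        sys.setrecursionlimit(original)
```

The comparison loop would then run inside `with raised_recursion_limit(INCREASED_RECURSION_LIMIT):`, leaving the process-wide limit unchanged once the block exits.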
General
Use logger instead of print

Replace print with the existing logger to keep consistent output handling and avoid
noisy stdout in library code. Log the exception with traceback for better
diagnostics.

codeflash/verification/equivalence.py [76-87]

 try:
-    print(
-        f"File Name: {original_test_result.file_name}\n"
-        f"Test Type: {original_test_result.test_type}\n"
-        f"Verification Type: {original_test_result.verification_type}\n"
-        f"Invocation ID: {original_test_result.id}\n"
-        f"Original return value: {original_test_result.return_value}\n"
-        f"Candidate return value: {cdd_test_result.return_value}\n"
+    logger.debug(
+        "File Name: %s\nTest Type: %s\nVerification Type: %s\nInvocation ID: %s\nOriginal return value: %r\nCandidate return value: %r",
+        original_test_result.file_name,
+        original_test_result.test_type,
+        original_test_result.verification_type,
+        original_test_result.id,
+        original_test_result.return_value,
+        cdd_test_result.return_value,
     )
-except Exception as e:
-    logger.error(e)
+except Exception:
+    logger.exception("Failed to log return value comparison details")
 break
Suggestion importance[1-10]: 7

Why: Replacing print with logger.debug/exception keeps output consistent and avoids noisy stdout; the improved code accurately mirrors the existing block’s intent with better diagnostics.

Medium

@mohammedahmed18 mohammedahmed18 marked this pull request as draft November 27, 2025 14:27
Comment on lines 338 to 343
x = prev[index1]
y = prev[index1 + 1]
z = curr[index1]
min_xy = min(x, y)
min_xyz = min(z, min_xy)
curr[index1 + 1] = 1 + min_xyz

⚡️Codeflash found 73% (0.73x) speedup for levenshtein_distance in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime : 2.04 seconds -> 1.18 seconds (best of 8 runs)

📝 Explanation and details

The optimized version achieves a 73% speedup by eliminating Python's built-in min() function calls and replacing them with direct comparisons. This is a targeted micro-optimization that addresses one of the most expensive operations in the Levenshtein distance algorithm.

Key optimization:

  • Replaced min() calls with direct comparisons: The original code used min(x, y) and min(z, min_xy) which create temporary tuples and invoke Python's generic minimum function. The optimized version uses nested if statements to find the minimum value directly, avoiding function call overhead and tuple creation.

Why this provides a speedup:

  • The min() function in Python has significant overhead for small numbers of arguments, especially when called millions of times in nested loops
  • Direct comparisons (if x < y) are primitive operations that execute much faster than function calls
  • Eliminates temporary tuple creation that min() uses internally
  • Reduces the call stack depth in the inner loop
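The overhead claim in the bullets above can be checked locally with a small `timeit` harness. This is a sketch, not part of the PR: `min3_branches`, the statement strings, and `benchmark` are hypothetical names, and the measured ratio will vary with the Python version and machine, so no expected timings are given.

```python
from __future__ import annotations

import timeit


def min3_branches(x: int, y: int, z: int) -> int:
    """Smallest of three values using comparisons only, mirroring the suggested change."""
    if x < y:
        return x if x < z else z
    return y if y < z else z


# Inline statements, so neither variant pays for an extra Python-level function call.
BUILTIN_STMT = "r = 1 + min(z, min(x, y))"
BRANCH_STMT = """
if x < y:
    r = 1 + (x if x < z else z)
elif y < z:
    r = 1 + y
else:
    r = 1 + z
""".strip()


def benchmark(number: int = 200_000) -> dict[str, float]:
    """Time both variants on the same operands and return seconds per variant."""
    setup = "x, y, z = 3, 1, 2"
    return {
        "builtin": timeit.timeit(BUILTIN_STMT, setup=setup, number=number),
        "branches": timeit.timeit(BRANCH_STMT, setup=setup, number=number),
    }


if __name__ == "__main__":
    print(benchmark())
```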

Performance impact by test case type:

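  • Identical/similar short strings: 55-65% faster in the short-string tests below; long runs of a single repeated character (e.g. "a" * 1000 against a near-identical string) change by only a few percent, presumably because matching characters skip the min() branch entirely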
  • Completely different strings: 109-121% faster - maximizes benefit since every character comparison triggers the min() replacement logic
  • Large strings with many differences: 83-93% faster - compounds the per-operation savings across many iterations
  • Small strings: 15-50% faster - still benefits but overhead reduction is less pronounced

The optimization is particularly effective for the Levenshtein algorithm because the min() operation occurs in the innermost loop that executes O(n×m) times, making even small per-call improvements significant when multiplied across all iterations.
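For context, here is a minimal two-row Levenshtein sketch that places the direct-comparison branch inside the inner loop. It is an illustration under assumptions rather than the code in `functions_to_optimize.py`: the name `levenshtein_sketch` is hypothetical, and the "copy the diagonal on a match" branch is inferred from the variable names in the reviewed lines, not confirmed by the PR.

```python
def levenshtein_sketch(s1: str, s2: str) -> int:
    """Edit distance using two reusable rows and no min() calls in the hot loop."""
    len1, len2 = len(s1), len(s2)
    # Early exits: distance to an empty string is the other string's length.
    if len1 == 0:
        return len2
    if len2 == 0:
        return len1

    prev = list(range(len1 + 1))  # distances from prefixes of s1 to the empty prefix of s2
    curr = [0] * (len1 + 1)       # reused buffer for the row being filled

    for index2 in range(len2):
        char2 = s2[index2]
        curr[0] = index2 + 1
        for index1 in range(len1):
            if s1[index1] == char2:
                curr[index1 + 1] = prev[index1]  # match: carry the diagonal, no edit
            else:
                x = prev[index1]      # substitution
                y = prev[index1 + 1]  # insertion of char2
                z = curr[index1]      # deletion of s1[index1]
                if x < y:
                    curr[index1 + 1] = 1 + (x if x < z else z)
                elif y < z:
                    curr[index1 + 1] = 1 + y
                else:
                    curr[index1 + 1] = 1 + z
        prev, curr = curr, prev  # swap buffers instead of allocating a new row
    return prev[len1]
```

Because matching characters take the first branch, inputs that are mostly identical rarely reach the comparison code, which is consistent with the small gains reported for the long repeated-character tests.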

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | 148 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 96.6%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests

from codeflash.discovery.functions_to_optimize import levenshtein_distance

# unit tests

# 1. Basic Test Cases


def test_identical_strings():
    # Levenshtein distance between identical strings should be 0
    codeflash_output = levenshtein_distance("kitten", "kitten")  # 15.8μs -> 9.90μs (60.0% faster)
    codeflash_output = levenshtein_distance("", "")  # 480ns -> 441ns (8.84% faster)
    codeflash_output = levenshtein_distance("a", "a")  # 2.09μs -> 1.95μs (7.22% faster)


def test_single_insertion():
    # Inserting one character
    codeflash_output = levenshtein_distance("kitten", "kitte")  # 13.0μs -> 8.69μs (49.4% faster)
    codeflash_output = levenshtein_distance("kitte", "kitten")  # 10.1μs -> 6.12μs (65.6% faster)
    codeflash_output = levenshtein_distance("", "a")  # 421ns -> 421ns (0.000% faster)
    codeflash_output = levenshtein_distance("a", "")  # 360ns -> 361ns (0.277% slower)


def test_single_deletion():
    # Deleting one character
    codeflash_output = levenshtein_distance("kitten", "kittn")  # 12.6μs -> 8.52μs (48.5% faster)
    codeflash_output = levenshtein_distance("kittn", "kitten")  # 10.0μs -> 5.96μs (67.9% faster)


def test_single_substitution():
    # Substituting one character
    codeflash_output = levenshtein_distance("kitten", "sitten")  # 14.8μs -> 9.26μs (60.2% faster)
    codeflash_output = levenshtein_distance("kitten", "kitteb")  # 11.7μs -> 6.81μs (72.4% faster)
    codeflash_output = levenshtein_distance("a", "b")  # 2.22μs -> 1.89μs (17.5% faster)


def test_multiple_operations():
    # Multiple edits required
    codeflash_output = levenshtein_distance("kitten", "sitting")  # 16.4μs -> 10.3μs (58.8% faster)
    codeflash_output = levenshtein_distance("flaw", "lawn")  # 6.70μs -> 4.47μs (50.0% faster)


def test_empty_and_nonempty():
    # One string empty, one non-empty
    codeflash_output = levenshtein_distance("", "abc")  # 751ns -> 751ns (0.000% faster)
    codeflash_output = levenshtein_distance("abc", "")  # 431ns -> 451ns (4.43% slower)


# 2. Edge Test Cases


def test_both_empty():
    # Both strings are empty
    codeflash_output = levenshtein_distance("", "")  # 781ns -> 761ns (2.63% faster)


def test_one_char_vs_empty():
    # One string is a single character, other is empty
    codeflash_output = levenshtein_distance("a", "")  # 771ns -> 781ns (1.28% slower)
    codeflash_output = levenshtein_distance("", "z")  # 431ns -> 441ns (2.27% slower)


def test_case_sensitivity():
    # Case should matter
    codeflash_output = levenshtein_distance("abc", "Abc")  # 7.70μs -> 5.87μs (31.1% faster)
    codeflash_output = levenshtein_distance("ABC", "abc")  # 5.14μs -> 3.73μs (37.9% faster)


def test_unicode_characters():
    # Unicode characters
    codeflash_output = levenshtein_distance("café", "cafe")  # 9.39μs -> 6.81μs (37.8% faster)
    codeflash_output = levenshtein_distance("naïve", "naive")  # 9.85μs -> 5.75μs (71.3% faster)
    codeflash_output = levenshtein_distance("你好", "你")  # 3.12μs -> 2.81μs (10.7% faster)
    codeflash_output = levenshtein_distance("你好", "您好")  # 3.10μs -> 2.71μs (14.5% faster)


def test_completely_different_strings():
    # No characters in common
    codeflash_output = levenshtein_distance("abc", "xyz")  # 7.45μs -> 5.61μs (32.9% faster)
    codeflash_output = levenshtein_distance("123", "abc")  # 5.14μs -> 3.46μs (48.7% faster)


def test_prefix_and_suffix():
    # One string is a prefix or suffix of the other
    codeflash_output = levenshtein_distance("abc", "abcd")  # 7.88μs -> 6.11μs (29.0% faster)
    codeflash_output = levenshtein_distance("abcd", "abc")  # 5.18μs -> 3.78μs (37.1% faster)
    codeflash_output = levenshtein_distance("abc", "zabc")  # 5.23μs -> 3.41μs (53.6% faster)
    codeflash_output = levenshtein_distance("abc", "abcz")  # 4.87μs -> 3.19μs (52.8% faster)


def test_repeated_characters():
    # Strings with repeated characters
    codeflash_output = levenshtein_distance("aaa", "aaaa")  # 4.89μs -> 4.79μs (2.11% faster)
    codeflash_output = levenshtein_distance("aaaa", "aaa")  # 2.92μs -> 3.06μs (4.89% slower)
    codeflash_output = levenshtein_distance("aaa", "bbb")  # 5.54μs -> 3.56μs (55.7% faster)


def test_numbers_and_symbols():
    # Strings with digits and symbols
    codeflash_output = levenshtein_distance("1234", "1243")  # 8.68μs -> 6.73μs (28.9% faster)
    codeflash_output = levenshtein_distance("!@#$", "!@#")  # 5.76μs -> 4.13μs (39.6% faster)
    codeflash_output = levenshtein_distance("!@#$", "$#@!")  # 6.25μs -> 4.45μs (40.5% faster)


def test_long_identical_strings():
    # Long identical strings (edge, but also performance)
    s = "a" * 100
    codeflash_output = levenshtein_distance(s, s)  # 519μs -> 535μs (2.86% slower)


def test_long_strings_one_difference():
    # Long strings with one difference at the end
    s1 = "a" * 999 + "b"
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 60.1ms -> 59.3ms (1.27% faster)
    codeflash_output = levenshtein_distance(s2, s1)  # 60.3ms -> 59.7ms (1.11% faster)


def test_long_strings_completely_different():
    # Long completely different strings
    s1 = "a" * 500
    s2 = "b" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.1ms -> 30.4ms (121% faster)


# 3. Large Scale Test Cases


def test_large_equal_strings():
    # Large identical strings
    s = "abcde" * 200  # length 1000
    codeflash_output = levenshtein_distance(s, s)  # 242ms -> 114ms (111% faster)


def test_large_one_insertion():
    # Large string with one insertion
    s1 = "a" * 500 + "b" + "a" * 499  # length 1000
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 58.2ms -> 56.2ms (3.59% faster)


def test_large_one_substitution():
    # Large string with one substitution in the middle
    s1 = "a" * 499 + "b" + "a" * 500
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 57.9ms -> 57.2ms (1.16% faster)


def test_large_completely_different():
    # Large strings, all substitutions
    s1 = "a" * 1000
    s2 = "b" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 274ms -> 129ms (112% faster)


def test_large_half_and_half():
    # Half the string is the same, half is different
    s1 = "a" * 500 + "b" * 500
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 171ms -> 93.5ms (83.5% faster)


def test_large_with_unicode():
    # Large string with unicode characters
    s1 = "你" * 500 + "好" * 500
    s2 = "你" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 174ms -> 96.3ms (81.0% faster)


# 4. Additional Robustness Cases


@pytest.mark.parametrize(
    "s1,s2,expected",
    [
        ("", "", 0),
        ("", "abc", 3),
        ("abc", "", 3),
        ("abc", "abc", 0),
        ("abc", "ab", 1),
        ("a", "b", 1),
        ("", "a", 1),
        ("a", "", 1),
        ("kitten", "sitting", 3),
        ("flaw", "lawn", 2),
        ("intention", "execution", 5),
        ("distance", "difference", 5),
        ("abcdef", "azced", 3),
        ("short", "ports", 3),
    ],
)
def test_various_cases(s1, s2, expected):
    # Parametrized test for various scenarios
    codeflash_output = levenshtein_distance(s1, s2)  # 130μs -> 85.5μs (52.5% faster)


# 5. Commutativity property (Levenshtein distance is symmetric)
def test_commutativity():
    pairs = [
        ("kitten", "sitting"),
        ("flaw", "lawn"),
        ("abc", "xyz"),
        ("", "abc"),
        ("a" * 500, "b" * 500),
        ("abcde" * 100, "edcba" * 100),
    ]
    for s1, s2 in pairs:
        codeflash_output = levenshtein_distance(s1, s2)
        d1 = codeflash_output  # 126ms -> 58.6ms (116% faster)
        codeflash_output = levenshtein_distance(s2, s1)
        d2 = codeflash_output  # 126ms -> 58.8ms (115% faster)


# 6. Triangle inequality property
def test_triangle_inequality():
    # For Levenshtein distance, d(x,z) <= d(x,y) + d(y,z)
    triples = [("kitten", "sitting", "sittin"), ("abc", "abd", "ab"), ("a" * 100, "a" * 99 + "b", "a" * 99 + "c")]
    for x, y, z in triples:
        codeflash_output = levenshtein_distance(x, z)
        d_xz = codeflash_output  # 557μs -> 537μs (3.89% faster)
        codeflash_output = levenshtein_distance(x, y)
        d_xy = codeflash_output  # 553μs -> 532μs (3.98% faster)
        codeflash_output = levenshtein_distance(y, z)
        d_yz = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest  # used for our unit tests

from codeflash.discovery.functions_to_optimize import levenshtein_distance

# unit tests


# 1. Basic Test Cases
def test_identical_strings():
    # Identical strings should have distance 0
    codeflash_output = levenshtein_distance("kitten", "kitten")  # 14.4μs -> 9.29μs (55.1% faster)
    codeflash_output = levenshtein_distance("", "")  # 611ns -> 521ns (17.3% faster)
    codeflash_output = levenshtein_distance("a", "a")  # 2.03μs -> 1.98μs (2.52% faster)


def test_single_insertion():
    # One insertion required
    codeflash_output = levenshtein_distance("kitten", "kittena")  # 16.1μs -> 9.74μs (65.7% faster)
    codeflash_output = levenshtein_distance("abc", "abcd")  # 5.73μs -> 3.86μs (48.6% faster)


def test_single_deletion():
    # One deletion required
    codeflash_output = levenshtein_distance("kitten", "kittn")  # 12.9μs -> 8.69μs (49.0% faster)
    codeflash_output = levenshtein_distance("abcd", "abc")  # 5.71μs -> 4.03μs (41.8% faster)


def test_single_substitution():
    # One substitution required
    codeflash_output = levenshtein_distance("kitten", "kittan")  # 14.5μs -> 9.22μs (57.3% faster)
    codeflash_output = levenshtein_distance("abc", "adc")  # 4.67μs -> 3.47μs (34.7% faster)


def test_multiple_operations():
    # Multiple operations needed
    codeflash_output = levenshtein_distance("kitten", "sitting")  # 16.6μs -> 10.1μs (65.1% faster)
    codeflash_output = levenshtein_distance("flaw", "lawn")  # 6.70μs -> 4.50μs (49.0% faster)
    codeflash_output = levenshtein_distance("gumbo", "gambol")  # 10.7μs -> 6.22μs (72.6% faster)


def test_case_sensitivity():
    # Should be case-sensitive
    codeflash_output = levenshtein_distance("a", "A")  # 4.12μs -> 3.55μs (16.1% faster)
    codeflash_output = levenshtein_distance("Python", "python")  # 13.1μs -> 7.71μs (69.8% faster)


def test_completely_different_strings():
    # All characters different
    codeflash_output = levenshtein_distance("abc", "xyz")  # 7.57μs -> 5.60μs (35.2% faster)
    codeflash_output = levenshtein_distance("aaa", "bbb")  # 4.95μs -> 3.26μs (52.0% faster)


# 2. Edge Test Cases


def test_empty_strings():
    # One or both strings empty
    codeflash_output = levenshtein_distance("", "abc")  # 822ns -> 751ns (9.45% faster)
    codeflash_output = levenshtein_distance("abc", "")  # 441ns -> 460ns (4.13% slower)
    codeflash_output = levenshtein_distance("", "")  # 290ns -> 321ns (9.66% slower)


def test_one_character_strings():
    # Single character to/from empty or another char
    codeflash_output = levenshtein_distance("a", "")  # 742ns -> 771ns (3.76% slower)
    codeflash_output = levenshtein_distance("", "a")  # 431ns -> 411ns (4.87% faster)
    codeflash_output = levenshtein_distance("a", "b")  # 3.80μs -> 3.29μs (15.5% faster)


def test_unicode_strings():
    # Unicode and multi-byte characters
    codeflash_output = levenshtein_distance("café", "cafe")  # 9.28μs -> 6.86μs (35.2% faster)
    codeflash_output = levenshtein_distance("你好", "你们好")  # 4.51μs -> 3.69μs (22.3% faster)
    codeflash_output = levenshtein_distance("🙂", "🙃")  # 2.33μs -> 2.08μs (12.0% faster)
    codeflash_output = levenshtein_distance("a🙂b", "a🙃b")  # 4.81μs -> 3.54μs (36.0% faster)


def test_whitespace_and_special_chars():
    # Strings with whitespace and special characters
    codeflash_output = levenshtein_distance("a b", "ab")  # 6.26μs -> 5.17μs (21.1% faster)
    codeflash_output = levenshtein_distance("a_b", "a-b")  # 5.12μs -> 3.48μs (47.3% faster)
    codeflash_output = levenshtein_distance("hello!", "hello")  # 10.1μs -> 5.99μs (68.2% faster)


def test_long_repeated_chars():
    # Strings with repeated characters
    codeflash_output = levenshtein_distance("aaaaa", "aaaa")  # 5.47μs -> 5.39μs (1.48% faster)
    codeflash_output = levenshtein_distance("aaaaa", "bbbbb")  # 10.9μs -> 6.39μs (71.0% faster)


def test_palindromes_and_reverses():
    # Palindrome and reversed strings
    codeflash_output = levenshtein_distance("abcde", "edcba")  # 11.9μs -> 7.68μs (54.8% faster)


def test_large_difference_in_length():
    # One string much longer than the other
    codeflash_output = levenshtein_distance("a", "a" * 100)  # 25.4μs -> 25.7μs (1.09% slower)
    codeflash_output = levenshtein_distance("b" * 100, "b")  # 23.3μs -> 23.4μs (0.474% slower)


def test_strings_with_numbers():
    # Strings with numbers
    codeflash_output = levenshtein_distance("abc123", "abc124")  # 14.5μs -> 9.02μs (60.9% faster)
    codeflash_output = levenshtein_distance("12345", "54321")  # 9.13μs -> 5.82μs (56.8% faster)


# 3. Large Scale Test Cases


def test_large_identical_strings():
    # Large identical strings should have distance 0
    s = "a" * 500
    codeflash_output = levenshtein_distance(s, s)  # 13.9ms -> 13.5ms (2.37% faster)


def test_large_one_insertion():
    # Large string with one insertion
    s1 = "a" * 499
    s2 = "a" * 250 + "b" + "a" * 249
    codeflash_output = levenshtein_distance(s1, s2)  # 13.8ms -> 13.6ms (1.61% faster)


def test_large_one_deletion():
    # Large string with one deletion
    s1 = "a" * 500
    s2 = "a" * 499
    codeflash_output = levenshtein_distance(s1, s2)  # 13.7ms -> 13.5ms (1.69% faster)


def test_large_one_substitution():
    # Large string with one substitution in the middle
    s1 = "a" * 250 + "b" + "a" * 249
    s2 = "a" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 13.9ms -> 13.5ms (2.27% faster)


def test_large_completely_different():
    # Large strings, all characters different
    s1 = "a" * 500
    s2 = "b" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.2ms -> 30.7ms (119% faster)


def test_large_partial_overlap():
    # Large strings with partial overlap
    s1 = "a" * 250 + "b" * 250
    s2 = "a" * 200 + "b" * 300
    # 50 a's replaced with b's
    codeflash_output = levenshtein_distance(s1, s2)  # 41.7ms -> 21.7ms (92.6% faster)


def test_large_strings_with_unicode():
    # Large strings with unicode characters
    s1 = "é" * 500
    s2 = "e" * 500
    codeflash_output = levenshtein_distance(s1, s2)  # 67.2ms -> 30.4ms (121% faster)


def test_large_strings_with_alternating_chars():
    # Alternating characters
    s1 = "ab" * 250
    s2 = "ba" * 250
    # Every position differs, but each string is a one-character shift of the other, so the distance is 2
    codeflash_output = levenshtein_distance(s1, s2)  # 41.5ms -> 21.5ms (92.9% faster)


# 4. Additional Edge Cases


def test_nonequivalent_lengths_and_content():
    # Both length and content differ
    codeflash_output = levenshtein_distance("abcdefg", "xyz")  # 12.9μs -> 8.40μs (53.8% faster)


def test_substring():
    # One string is a substring of the other
    codeflash_output = levenshtein_distance("abcdef", "abc")  # 9.93μs -> 7.42μs (33.7% faster)
    codeflash_output = levenshtein_distance("abc", "abcdef")  # 7.66μs -> 4.98μs (53.7% faster)


def test_strings_with_tabs_and_newlines():
    # Special whitespace characters
    codeflash_output = levenshtein_distance("abc\tdef", "abcdef")  # 16.8μs -> 10.3μs (62.8% faster)
    codeflash_output = levenshtein_distance("abc\ndef", "abcdef")  # 13.7μs -> 7.80μs (76.0% faster)


def test_zero_length_and_long_string():
    # One empty, one long
    codeflash_output = levenshtein_distance("", "a" * 999)  # 912ns -> 811ns (12.5% faster)
    codeflash_output = levenshtein_distance("b" * 999, "")  # 631ns -> 541ns (16.6% faster)


# 5. Determinism and Symmetry


@pytest.mark.parametrize(
    "s1,s2",
    [
        ("kitten", "sitting"),
        ("flaw", "lawn"),
        ("", "abc"),
        ("abc", ""),
        ("abc", "cba"),
        ("abc", "abc"),
        ("", ""),
        ("a", "b"),
        ("abc123", "abc124"),
        ("a" * 500, "a" * 500),
    ],
)
def test_symmetry(s1, s2):
    # Levenshtein distance is symmetric
    codeflash_output = levenshtein_distance(s1, s2)  # 13.8ms -> 13.5ms (1.90% faster)


# 6. Type robustness


def test_non_string_inputs():
    # Should raise TypeError if input is not string
    with pytest.raises(TypeError):
        levenshtein_distance(123, "abc")
    with pytest.raises(TypeError):
        levenshtein_distance("abc", None)
    with pytest.raises(TypeError):
        levenshtein_distance(["a", "b"], "ab")
    with pytest.raises(TypeError):
        levenshtein_distance("ab", ["a", "b"])


# 7. Stress test: Large but feasible within constraints


def test_large_strings_max_size():
    # Both strings at the upper limit (1000 chars)
    s1 = "a" * 1000
    s2 = "b" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 272ms -> 130ms (109% faster)


def test_large_strings_one_char_difference():
    # 999 identical, 1 different
    s1 = "a" * 999 + "b"
    s2 = "a" * 1000
    codeflash_output = levenshtein_distance(s1, s2)  # 58.4ms -> 57.5ms (1.56% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr945-2025-11-27T14.39.26

Suggested change
-x = prev[index1]
-y = prev[index1 + 1]
-z = curr[index1]
-min_xy = min(x, y)
-min_xyz = min(z, min_xy)
-curr[index1 + 1] = 1 + min_xyz
+# Avoid min() function call overhead by using direct comparisons
+x = prev[index1]
+y = prev[index1 + 1]
+z = curr[index1]
+if x < y:
+    if x < z:
+        curr[index1 + 1] = 1 + x
+    else:
+        curr[index1 + 1] = 1 + z
+elif y < z:
+    curr[index1 + 1] = 1 + y
+else:
+    curr[index1 + 1] = 1 + z

The optimized code achieves a **15% speedup** through several targeted micro-optimizations that reduce computational overhead in the parsing loop:

**Key Optimizations:**

1. **Single-pass boundary search**: Instead of checking both conditions (`start_line != -1 and end_line != -1`) on every iteration, the optimized version uses `None` values and breaks immediately when both markers are found, eliminating redundant condition checks.

2. **Fast-path string matching**: Before calling the expensive `.startswith("_______")` method, it first checks if `line[0] == "_"`, avoiding the method call for most lines that don't start with underscores.

3. **Method lookup optimization**: Pulls `current_failure_lines.append` into a local variable to avoid repeated attribute lookups in the hot loop where failure lines are processed.

4. **Memory-efficient list management**: Uses `current_failure_lines.clear()` instead of creating new list objects (`current_failure_lines = []`), reducing object allocation pressure.

**Performance Impact:**
The optimizations show the most significant gains in large-scale scenarios:
- **Large failure sets**: 14.2% faster with 500 failures, 14.0% faster with 999 failures  
- **Large output**: 29.2% faster for single failures with 1000 lines of output
- **Complex scenarios**: 22.3% faster with 50 cases having 10 lines each

**Hot Path Context:**
Based on the function reference, `parse_test_failures_from_stdout` is called from `parse_test_results`, which appears to be part of a test optimization pipeline. The function processes pytest stdout to extract failure information, making it performance-critical when dealing with large test suites or verbose test outputs. The 15% improvement becomes meaningful when processing hundreds of test failures in CI/CD environments or during iterative code optimization workflows.
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for parse_test_failures_from_stdout in codeflash/verification/parse_test_output.py

⏱️ Runtime : 2.76 milliseconds -> 2.39 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/feedback-loop-for-unmatched-test-results).

…25-11-27T14.49.01

⚡️ Speed up function `parse_test_failures_from_stdout` by 16% in PR #945 (`feat/feedback-loop-for-unmatched-test-results`)
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 655% (6.55x) speedup for compare_test_results in codeflash/verification/equivalence.py

⏱️ Runtime : 90.0 milliseconds -> 11.9 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/feedback-loop-for-unmatched-test-results).

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
@codeflash-ai
Contributor

codeflash-ai bot commented Dec 2, 2025

This PR is now faster! 🚀 mohammed ahmed accepted my code suggestion above.
