@codeflash-ai codeflash-ai bot commented May 19, 2025

⚡️ This pull request contains optimizations for PR #217

If you approve this dependent PR, these changes will be merged into the original PR branch proper-cleanup.

This PR will be automatically closed if the original PR is merged.


📄 116% (1.16x) speedup for _strip_ansi in codeflash/code_utils/tabulate.py

⏱️ Runtime: 4.98 milliseconds → 2.30 milliseconds (best of 164 runs)

📝 Explanation and details

Here is an optimized version of _strip_ansi focused on runtime speed. The main bottleneck is the replacement step in re.sub: each match is replaced with group 4 (the link text of a hyperlink), or with an empty string when group 4 did not participate in the match, as is the case for plain ANSI codes.
This can be sped up significantly by replacing the r"\4" template, which goes through the regex engine's group resolution machinery on every match, with a replacer callback.
Every match is either an ANSI escape code (group 4 is None) or a hyperlink (group 4 holds the visible link text), so one simple function handles both cases.

Optimized code.
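A minimal sketch of the callback approach described above, not the PR's actual code. The pattern is condensed from the one shown in the generated tests; the names `strip_ansi_sketch`, `_link_text_str`, and `_link_text_bytes` are hypothetical and chosen only for illustration.

```python
import re

# Building blocks, as in the generated tests for this PR.
_esc = r"\x1b"
_csi = rf"{_esc}\["
_osc = rf"{_esc}\]"
_st = rf"{_esc}\\"

_ansi_escape_pat = rf"""
    (
        # terminal colors, etc: CSI, parameter, intermediate, final bytes
        {_csi}[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]
    |
        # terminal hyperlinks
        {_osc}8;        # OSC opening
        (\w+=\w+:?)*    # key=value params list (submatch 2)
        ;               # delimiter
        ([^{_esc}]+)    # URI (submatch 3)
        {_st}           # ST
        ([^{_esc}]+)    # link text (submatch 4)
        {_osc}8;;{_st}  # closing OSC sequence
    )
"""
_ansi_codes = re.compile(_ansi_escape_pat, re.VERBOSE)
_ansi_codes_bytes = re.compile(_ansi_escape_pat.encode("utf8"), re.VERBOSE)


def _link_text_str(match):
    # Hypothetical helper: group 4 is None for CSI matches and holds the
    # visible link text for hyperlink matches.
    return match.group(4) or ""


def _link_text_bytes(match):
    # Bytes counterpart of the helper above.
    return match.group(4) or b""


def strip_ansi_sketch(s):
    # One type check per call, then a callable replacement instead of r"\4".
    if isinstance(s, str):
        return _ansi_codes.sub(_link_text_str, s)
    return _ansi_codes_bytes.sub(_link_text_bytes, s)
```

For example, `strip_ansi_sketch("\x1b[31mred\x1b[0m")` returns `"red"`, and a hyperlink sequence collapses to its link text.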

Performance notes:

  • This avoids all regex group substitution machinery for the common ANSI case.
  • No change to visible/functional behavior.
  • No changes to external function names or signatures.
  • String and bytes cases are handled separately, so no unnecessary type checks inside tight loops.
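The performance claim can be checked locally with a small timing harness. The sketch below is an assumption-laden illustration: it uses a simplified stand-in pattern (link text is group 2 here, not group 4), and the function names and sample input are hypothetical. The relative cost of a template versus a callable may vary between Python versions, so it is worth measuring on the interpreter actually in use.

```python
import re
import timeit

# Simplified stand-in for the real pattern: group 1 is a CSI sequence,
# group 2 is the link text of an OSC 8 hyperlink without params.
PATTERN = re.compile(
    r"(\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e])"
    r"|\x1b\]8;;[^\x1b]+\x1b\\([^\x1b]+)\x1b\]8;;\x1b\\"
)


def strip_with_template(s):
    # Template replacement: an unmatched group expands to an empty string.
    return PATTERN.sub(r"\2", s)


def strip_with_callback(s):
    # Callable replacement: group 2 is None for CSI matches.
    return PATTERN.sub(lambda m: m.group(2) or "", s)


# Hypothetical sample input: 1000 colored words.
SAMPLE = "".join(f"\x1b[31mword{i}\x1b[0m " for i in range(1000))

if __name__ == "__main__":
    # Both strategies must agree before their timings are compared.
    assert strip_with_template(SAMPLE) == strip_with_callback(SAMPLE)
    for fn in (strip_with_template, strip_with_callback):
        best = min(timeit.repeat(lambda: fn(SAMPLE), number=50, repeat=5))
        print(f"{fn.__name__}: {best:.4f} s for 50 calls")
```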

Comment:
No original comments were changed or removed.
No changes made to the public interface or expected output.
All logic concerning group 4 and escape sequence removal is preserved.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 58 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
📊 Tests Coverage
🌀 Generated Regression Tests Details
import re

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.tabulate import _strip_ansi

# function to test
_esc = r"\x1b"
_csi = rf"{_esc}\["
_osc = rf"{_esc}\]"
_st = rf"{_esc}\\"

_ansi_escape_pat = rf"""
    (
        # terminal colors, etc
        {_csi}        # CSI
        [\x30-\x3f]*  # parameter bytes
        [\x20-\x2f]*  # intermediate bytes
        [\x40-\x7e]   # final byte
    |
        # terminal hyperlinks
        {_osc}8;        # OSC opening
        (\w+=\w+:?)*    # key=value params list (submatch 2)
        ;               # delimiter
        ([^{_esc}]+)    # URI - anything but ESC (submatch 3)
        {_st}           # ST
        ([^{_esc}]+)    # link text - anything but ESC (submatch 4)
        {_osc}8;;{_st}  # "closing" OSC sequence
    )
"""
_ansi_codes = re.compile(_ansi_escape_pat, re.VERBOSE)
_ansi_codes_bytes = re.compile(_ansi_escape_pat.encode("utf8"), re.VERBOSE)

# unit tests

# ----------------------
# Basic Test Cases
# ----------------------

def test_strip_ansi_no_ansi():
    # No ANSI codes, should return unchanged
    codeflash_output = _strip_ansi("hello world")

def test_strip_ansi_simple_color():
    # Simple color code
    codeflash_output = _strip_ansi("\x1b[31mred\x1b[0m")

def test_strip_ansi_multiple_codes():
    # Multiple ANSI codes in one string
    s = "\x1b[1mBold\x1b[0m and \x1b[4mUnderline\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_surrounding_text():
    # ANSI codes surrounded by normal text
    s = "before \x1b[32mgreen\x1b[0m after"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_bytes_input():
    # Accepts bytes input
    s = b"\x1b[31mred\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_mixed_codes():
    # Mixed color and style codes
    s = "\x1b[1;31mBoldRed\x1b[22;39m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_multiple_adjacent_codes():
    # Multiple adjacent ANSI codes
    s = "\x1b[31m\x1b[1mredbold\x1b[0m"
    codeflash_output = _strip_ansi(s)

# ----------------------
# Edge Test Cases
# ----------------------

def test_strip_ansi_empty_string():
    # Empty string input
    codeflash_output = _strip_ansi("")
    codeflash_output = _strip_ansi(b"")

def test_strip_ansi_only_ansi():
    # String is only ANSI code(s)
    codeflash_output = _strip_ansi("\x1b[31m\x1b[0m")
    codeflash_output = _strip_ansi(b"\x1b[31m\x1b[0m")

def test_strip_ansi_incomplete_ansi():
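    # Truncated color code (missing "m"): "r" falls in the final-byte class
    # [\x40-\x7e], so "\x1b[31r" still matches the CSI pattern and is stripped, leaving "ed"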
    s = "\x1b[31red"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_osc_hyperlink():
    # OSC 8 hyperlink sequence
    s = "\x1b]8;;https://example.com\x1b\\Link text\x1b]8;;\x1b\\"
    # Should output "Link text"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_osc_hyperlink_with_params():
    # OSC 8 hyperlink with params
    s = "\x1b]8;id=123;https://example.com\x1b\\Link\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_osc_hyperlink_bytes():
    # OSC 8 hyperlink as bytes
    s = b"\x1b]8;;https://example.com\x1b\\Link text\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_non_ascii_text():
    # Non-ASCII text should be preserved
    s = "\x1b[31mGrüße\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_nested_ansi_codes():
    # Nested ANSI codes (not really possible, but test for robustness)
    s = "A\x1b[31mB\x1b[1mC\x1b[0mD\x1b[0mE"
    # Should remove all ANSI codes, leave text
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_unsupported_escape_sequences():
    # Escape sequences that are neither CSI nor OSC 8 hyperlinks should not be stripped
    s = "\x1bXNotAnAnsi"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_partial_osc_sequence():
    # Incomplete OSC sequence should not be stripped
    s = "\x1b]8;;https://example.com\x1b\\Link text"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_osc_hyperlink_with_esc_in_text():
    # Link text containing ESC should not be matched by the regex, so not stripped
    s = "\x1b]8;;https://example.com\x1b\\Link\x1b[31mText\x1b]8;;\x1b\\"
    # Only the inner ANSI code should be stripped, not the hyperlink
    expected = "\x1b]8;;https://example.com\x1b\\LinkText\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_bytes_with_non_ascii():
    # Bytes input with non-ASCII content
    s = b"\x1b[32m\xc3\xbcber\x1b[0m"
    codeflash_output = _strip_ansi(s)


def test_strip_ansi_large_string():
    # Large string with many ANSI codes
    text = "word"
    ansi = "\x1b[31m"
    reset = "\x1b[0m"
    # Create a string of 1000 words, each wrapped in ANSI code
    s = "".join(f"{ansi}{text}{reset} " for _ in range(1000))
    expected = " ".join([text] * 1000) + " "
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_large_bytes():
    # Large bytes input with many ANSI codes
    text = b"word"
    ansi = b"\x1b[31m"
    reset = b"\x1b[0m"
    s = b"".join([ansi + text + reset + b" " for _ in range(1000)])
    expected = b" ".join([text] * 1000) + b" "
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_long_osc_hyperlink():
    # Very long OSC 8 hyperlink
    uri = "https://example.com/" + "a" * 900
    link_text = "L" * 50
    s = f"\x1b]8;;{uri}\x1b\\{link_text}\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_many_hyperlinks():
    # Many OSC 8 hyperlinks in one string
    s = "".join(f"\x1b]8;;https://x.com/{i}\x1b\\Link{i}\x1b]8;;\x1b\\" for i in range(100))
    expected = "".join(f"Link{i}" for i in range(100))
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_large_mixed_content():
    # Large string with interleaved ANSI codes, OSC 8 hyperlinks, and plain text
    ansi = "\x1b[31m"
    reset = "\x1b[0m"
    osc = "\x1b]8;;https://example.com\x1b\\"
    osc_close = "\x1b]8;;\x1b\\"
    s = ""
    expected = ""
    for i in range(500):
        s += f"{ansi}red{i}{reset} plain{i} {osc}Link{i}{osc_close}"
        expected += f"red{i} plain{i} Link{i}"
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import re

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.tabulate import _strip_ansi

# function to test
_esc = r"\x1b"
_csi = rf"{_esc}\["
_osc = rf"{_esc}\]"
_st = rf"{_esc}\\"

_ansi_escape_pat = rf"""
    (
        # terminal colors, etc
        {_csi}        # CSI
        [\x30-\x3f]*  # parameter bytes
        [\x20-\x2f]*  # intermediate bytes
        [\x40-\x7e]   # final byte
    |
        # terminal hyperlinks
        {_osc}8;        # OSC opening
        (\w+=\w+:?)*    # key=value params list (submatch 2)
        ;               # delimiter
        ([^{_esc}]+)    # URI - anything but ESC (submatch 3)
        {_st}           # ST
        ([^{_esc}]+)    # link text - anything but ESC (submatch 4)
        {_osc}8;;{_st}  # "closing" OSC sequence
    )
"""
_ansi_codes = re.compile(_ansi_escape_pat, re.VERBOSE)
_ansi_codes_bytes = re.compile(_ansi_escape_pat.encode("utf8"), re.VERBOSE)

# unit tests

# -----------------------
# Basic Test Cases
# -----------------------

def test_strip_ansi_no_ansi():
    # String with no ANSI codes should remain unchanged
    codeflash_output = _strip_ansi("Hello, World!")

def test_strip_ansi_simple_csi():
    # Basic ANSI color code should be removed
    codeflash_output = _strip_ansi("\x1b[31mRed Text\x1b[0m")

def test_strip_ansi_multiple_csi():
    # Multiple ANSI codes in one string
    s = "\x1b[32mGreen\x1b[0m and \x1b[34mBlue\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_csi_in_middle():
    # ANSI code in the middle of text
    s = "Start\x1b[1mBold\x1b[0mEnd"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_bytes_input():
    # Bytes input with ANSI codes
    s = b"\x1b[31mRed\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_non_ascii_characters():
    # String with Unicode and ANSI
    s = "\x1b[31mR\u00e9d\x1b[0m"
    codeflash_output = _strip_ansi(s)

# -----------------------
# Edge Test Cases
# -----------------------

def test_strip_ansi_empty_string():
    # Empty string input
    codeflash_output = _strip_ansi("")

def test_strip_ansi_empty_bytes():
    # Empty bytes input
    codeflash_output = _strip_ansi(b"")

def test_strip_ansi_only_ansi():
    # String that is only ANSI code
    codeflash_output = _strip_ansi("\x1b[0m")

def test_strip_ansi_only_ansi_bytes():
    # Bytes that are only ANSI code
    codeflash_output = _strip_ansi(b"\x1b[0m")

def test_strip_ansi_incomplete_ansi_sequence():
    # Truncated sequence: "a" falls in the final-byte class [\x40-\x7e],
    # so "\x1b[31a" still matches the CSI pattern and is removed, leaving "abcbc"
    s = "abc\x1b[31abc"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_malformed_ansi_sequence():
    # Malformed sequence: "b" is accepted as a final byte, so "\x1b[b" is removed, leaving "badadmtext"
    s = "bad\x1b[badmtext"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_nested_sequences():
    # Nested or repeated ANSI codes
    s = "\x1b[1m\x1b[31mRed\x1b[0m\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_adjacent_sequences():
    # Adjacent ANSI codes
    s = "\x1b[1m\x1b[31mRed\x1b[0m"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_terminal_hyperlink():
    # OSC 8 hyperlink sequence should be replaced with link text
    s = "\x1b]8;;https://example.com\x1b\\Example Link\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_terminal_hyperlink_and_text():
    # OSC 8 with text before and after
    s = "Click \x1b]8;;https://example.com\x1b\\here\x1b]8;;\x1b\\ please"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_osc_params():
    # OSC 8 with params
    s = "\x1b]8;key=val;https://example.com\x1b\\Link\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_osc_params_and_text():
    # OSC 8 with params and surrounding text
    s = "Go to \x1b]8;key=val;https://example.com\x1b\\site\x1b]8;;\x1b\\!"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_bytes_with_hyperlink():
    # Bytes input with OSC 8 hyperlink
    s = b"\x1b]8;;https://example.com\x1b\\Link\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_escapes_in_text():
    # Text containing literal \x1b, but not an ANSI sequence
    s = "ESC char: \x1b not a code"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_similar_but_invalid_sequences():
    # A lone OSC 8 closing sequence has no URI or link text, so it does not match and is left as-is
    s = "\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_multiple_hyperlinks():
    # Multiple hyperlinks in the same string
    s = "\x1b]8;;https://a.com\x1b\\A\x1b]8;;\x1b\\ and \x1b]8;;https://b.com\x1b\\B\x1b]8;;\x1b\\"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_non_printable_chars():
    # String with non-printable characters and ANSI
    s = "\x1b[31mRed\x1b[0m\x07"
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_with_newlines_and_tabs():
    # String with newlines and tabs and ANSI
    s = "\x1b[32mGreen\nText\tHere\x1b[0m"
    codeflash_output = _strip_ansi(s)

# -----------------------
# Large Scale Test Cases
# -----------------------

def test_strip_ansi_long_string_no_ansi():
    # Very long string without ANSI should be unchanged
    s = "A" * 1000
    codeflash_output = _strip_ansi(s)

def test_strip_ansi_long_string_with_ansi_everywhere():
    # Very long string with ANSI code every 10 characters
    base = "abcdefghij"
    ansi = "\x1b[31m"
    s = "".join(ansi + base for _ in range(100))
    expected = base * 100
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_long_string_with_hyperlinks():
    # Long string with many hyperlinks
    link = "\x1b]8;;https://example.com\x1b\\L\x1b]8;;\x1b\\"
    s = link * 500
    expected = "L" * 500
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_large_bytes_with_ansi():
    # Large bytes object with many ANSI codes
    ansi = b"\x1b[32m"
    base = b"0123456789"
    s = b"".join(ansi + base for _ in range(100))
    expected = base * 100
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_large_bytes_with_hyperlinks():
    # Large bytes object with many OSC 8 hyperlinks
    link = b"\x1b]8;;https://example.com\x1b\\L\x1b]8;;\x1b\\"
    s = link * 500
    expected = b"L" * 500
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

def test_strip_ansi_large_mixed_content():
    # Large string mixing ANSI, OSC, and normal text
    ansi = "\x1b[31m"
    osc = "\x1b]8;;https://x.com\x1b\\X\x1b]8;;\x1b\\"
    chunk = ansi + "A" * 5 + osc + "B" * 5
    s = chunk * 100
    expected = "A" * 5 + "X" + "B" * 5
    expected = expected * 100
    codeflash_output = _strip_ansi(s)
    assert codeflash_output == expected

# -----------------------
# Determinism and Type Preservation
# -----------------------

def test_strip_ansi_type_preservation_str():
    # Output should be str if input is str
    s = "\x1b[31mRed\x1b[0m"
    codeflash_output = _strip_ansi(s)
    assert isinstance(codeflash_output, str)

def test_strip_ansi_type_preservation_bytes():
    # Output should be bytes if input is bytes
    s = b"\x1b[31mRed\x1b[0m"
    codeflash_output = _strip_ansi(s)
    assert isinstance(codeflash_output, bytes)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-pr217-2025-05-19T04.39.46 and push.


KRRT7 and others added 6 commits May 18, 2025 23:22
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 19, 2025
@KRRT7 KRRT7 force-pushed the proper-cleanup branch 3 times, most recently from 48716a1 to 0ba52ea Compare May 21, 2025 01:40
Base automatically changed from proper-cleanup to main May 21, 2025 05:35