Skip to content

Conversation

@Saga4
Copy link
Contributor

@Saga4 Saga4 commented Jun 4, 2025

User description

  • check large diffs with black, and skipp formatting in such case (after optimizing)
  • new line
  • better log messages
  • remove unnecessary check
  • new line
  • remove unused comment
  • ** the max lines for formatting changes to 100**
  • refactoring
  • refactoring and improvements
  • added black as dev dependency
  • made some refactor changes that codeflash suggested
  • remove unused function
  • formatting & using internal black dep
  • fix black import issue
  • handle formatting files with no formatting issues
  • fix_duplication_suggestion_issue

PR Type

Enhancement, Tests


Description

  • Add Black dev dependency and safe formatting checks

  • Add sample files for formatting scenarios (none/few/many)

  • Add tests for skip/apply formatting scenarios

  • Improve function blocklist skipping logic


Changes walkthrough 📝

Relevant files
Tests
4 files
few_formatting_errors.py
Add sample file with few formatting errors                             
+47/-0   
many_formatting_errors.py
Add sample file with many formatting errors                           
+147/-0 
no_formatting_errors.py
Add sample file with no formatting errors                               
+71/-0   
test_formatter.py
Add tests for formatting diff scenarios                                   
+74/-0   
Enhancement
1 files
formatter.py
Add diff-based safe formatting checks                                       
+43/-4   
Bug fix
1 files
functions_to_optimize.py
Improve blocklist skipping logic                                                 
+3/-1     
Dependencies
1 files
pyproject.toml
Add Black as development dependency                                           
+1/-0     

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    github-actions bot commented Jun 4, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Function

    The formatter.py module does not define sort_imports which is imported in tests, causing import errors.

    from typing import TYPE_CHECKING, Optional
    
    import isort
    
    from codeflash.cli_cmds.console import console, logger
    
    if TYPE_CHECKING:
    Syntax Error

    The line self. address = address has invalid attribute syntax and will cause a parse error.

    self.address=address;self.city=city;self.state=state;self.zip_code=zip_code
    Indentation

    In the blocklist skipping logic, comment and code indentation may be misaligned, potentially altering control flow.

    if (
        function.file_path.name in blocklist_funcs
        and function.qualified_name in blocklist_funcs[function.file_path.name]
    ):
        # This function is in blocklist, we can skip it
        blocklist_funcs_removed_count += 1
        continue
    # This function is NOT in blocklist. we can keep it
    functions_tmp.append(function)

    @github-actions
    Copy link

    github-actions bot commented Jun 4, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix attribute assignment syntax

    Remove the unexpected spaces in the attribute assignment to fix the syntax error and
    correctly assign to self.address.

    code_to_optimize/few_formatting_errors.py [19]

    -self.   address = address
    +self.address = address
    Suggestion importance[1-10]: 9

    __

    Why: The extra spaces in self. address = address cause a syntax error preventing the class from loading.

    High
    Propagate missing formatter error

    Re-raise the caught FileNotFoundError so that missing formatter errors propagate and
    match expected test behavior.

    codeflash/code_utils/formatter.py [74-75]

     except FileNotFoundError as e:
         from rich.panel import Panel
    +    raise
    Suggestion importance[1-10]: 8

    __

    Why: Without re-raising, a FileNotFoundError from a missing formatter is swallowed, breaking tests that expect this exception.

    Medium
    General
    Restrict safe check to Black

    Restrict the is_safe_to_format check to only apply for the Black formatter, allowing
    other formatters (like exit) to still be executed.

    codeflash/code_utils/formatter.py [60-61]

    -if formatter_name == "disabled" or not is_safe_to_format(filepath=str(path), content=file_content):
    +if formatter_name == "disabled" or (formatter_name == "black" and not is_safe_to_format(filepath=str(path), content=file_content)):
         return file_content
    Suggestion importance[1-10]: 7

    __

    Why: The current guard blocks all formatter commands, but it should only prevent Black formatting, so non-Black formatters like ruff can still run.

    Medium

    @Saga4 Saga4 closed this Jun 4, 2025
    codeflash-ai bot added a commit that referenced this pull request Jun 4, 2025
    …/fix_duplication_suggestion_issue`)
    
    Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance.
    
    You do **not** need to build the list of lines; simply scan and count qualifying lines in a single pass.
    
    Optimized version.
    
    
    
    **Changes made:**
    - Inlined the `is_diff_line` logic for better performance (function call overhead avoided).
    - Used a running integer (`count`) instead of a list, so memory use and processing are reduced.
    - Avoided creating an unnecessary list and removed the nested function.
    
    **This is as fast and memory-efficient as this logic gets in idiomatic Python.**
    Comment on lines +27 to +35
    lines = diff_output.split("\n")

    def is_diff_line(line: str) -> bool:
    return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))

    diff_lines = [line for line in lines if is_diff_line(line)]
    return len(diff_lines)


    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚡️Codeflash found 16% (0.16x) speedup for get_diff_lines_count in codeflash/code_utils/formatter.py

    ⏱️ Runtime : 2.65 milliseconds 2.29 milliseconds (best of 263 runs)

    📝 Explanation and details

    Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance.

    You do not need to build the list of lines; simply scan and count qualifying lines in a single pass.

    Optimized version.

    Changes made:

    • Inlined the is_diff_line logic for better performance (function call overhead avoided).
    • Used a running integer (count) instead of a list, so memory use and processing are reduced.
    • Avoided creating an unnecessary list and removed the nested function.

    This is as fast and memory-efficient as this logic gets in idiomatic Python.

    Correctness verification report:

    Test Status
    ⚙️ Existing Unit Tests 🔘 None Found
    🌀 Generated Regression Tests 61 Passed
    ⏪ Replay Tests 🔘 None Found
    🔎 Concolic Coverage Tests 1 Passed
    📊 Tests Coverage 100.0%
    🌀 Generated Regression Tests Details
    from __future__ import annotations
    
    # imports
    import pytest  # used for our unit tests
    from codeflash.code_utils.formatter import get_diff_lines_count
    
    # unit tests
    
    # ------------------------
    # Basic Test Cases
    # ------------------------
    
    def test_empty_string_returns_zero():
        # No lines at all
        codeflash_output = get_diff_lines_count("")
    
    def test_no_diff_lines():
        # No lines starting with + or -
        diff = " context line 1\n context line 2"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_single_addition_line():
        # One line added
        diff = "+added line"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_single_deletion_line():
        # One line deleted
        diff = "-deleted line"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_mixed_add_and_delete():
        # Mixed + and - lines
        diff = "+added\n-context\n+another add\n-context2\n-deleted"
        codeflash_output = get_diff_lines_count(diff)  # 2 additions, 2 deletions
    
    def test_ignores_diff_headers():
        # Ignore lines like --- and +++
        diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-context\n-deleted"
        codeflash_output = get_diff_lines_count(diff)  # +added, -deleted
    
    def test_only_diff_headers():
        # Only diff headers, no actual changes
        diff = "--- a/file.txt\n+++ b/file.txt"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_context_lines_with_leading_spaces():
        # Lines with spaces before + or - are not counted
        diff = " +not counted\n -not counted\n+counted\n-counted"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_multiple_changes_and_headers():
        # Headers and multiple diff lines
        diff = (
            "--- a/foo.txt\n"
            "+++ b/foo.txt\n"
            "@@ -1,3 +1,4 @@\n"
            " line 1\n"
            "+added 1\n"
            " line 2\n"
            "-deleted 1\n"
            " line 3\n"
            "+added 2\n"
            "-deleted 2"
        )
        codeflash_output = get_diff_lines_count(diff)  # +added 1, -deleted 1, +added 2, -deleted 2
    
    # ------------------------
    # Edge Test Cases
    # ------------------------
    
    def test_lines_with_plus_and_minus_in_middle():
        # Lines with + or - not at start should not be counted
        diff = "context + not counted\ncontext - not counted"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_only_plus_and_minus():
        # Lines that are just "+" or "-" are counted
        diff = "+\n-\n---\n+++"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_trailing_spaces():
        # Lines with trailing spaces after + or - are counted
        diff = "+added \n-deleted \n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_multiple_plus_or_minus():
        # Lines starting with ++ or -- but not +++ or ---
        diff = "++double plus\n--double minus\n+++triple plus\n---triple minus"
        codeflash_output = get_diff_lines_count(diff)  # only ++double plus, --double minus
    
    def test_lines_with_tabs():
        # Lines with tabs after + or - are counted
        diff = "+\tadded with tab\n-\tdeleted with tab"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_unicode_characters():
        # Lines with unicode characters
        diff = "+añadido\n-удалено\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_empty_lines_between():
        # Empty lines should not affect the count
        diff = "+add1\n\n-add2\n\n\n+add3"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_only_newlines():
        # Only newlines, no diff lines
        diff = "\n\n\n"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_mixed_line_endings():
        # Test with mixed \n and \r\n endings
        diff = "+add1\r\n-context\r\n-add2\n+add3"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_leading_and_trailing_whitespace():
        # Only lines that start with + or - (no spaces before) are counted
        diff = " +not counted\n- counted\n+counted\n-counted"
        codeflash_output = get_diff_lines_count(diff)
    
    # ------------------------
    # Large Scale Test Cases
    # ------------------------
    
    def test_large_number_of_additions():
        # 1000 addition lines
        diff = "\n".join(["+line{}".format(i) for i in range(1000)])
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_number_of_deletions():
        # 1000 deletion lines
        diff = "\n".join(["-line{}".format(i) for i in range(1000)])
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_mixed_diff():
        # 500 additions, 500 deletions, 500 context lines, 2 headers
        additions = ["+add{}".format(i) for i in range(500)]
        deletions = ["-del{}".format(i) for i in range(500)]
        context = [" context{}".format(i) for i in range(500)]
        headers = ["--- a/file.txt", "+++ b/file.txt"]
        diff = "\n".join(headers + additions + deletions + context)
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_diff_with_headers_and_noise():
        # 400 additions, 400 deletions, 100 context, 100 lines starting with +++ or ---
        additions = ["+add{}".format(i) for i in range(400)]
        deletions = ["-del{}".format(i) for i in range(400)]
        context = [" context{}".format(i) for i in range(100)]
        noise = ["+++ noise{}".format(i) for i in range(50)] + ["--- noise{}".format(i) for i in range(50)]
        diff = "\n".join(noise + additions + deletions + context)
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_diff_with_varied_line_types():
        # 300 additions, 300 deletions, 200 context, 100 lines with leading spaces
        additions = ["+add{}".format(i) for i in range(300)]
        deletions = ["-del{}".format(i) for i in range(300)]
        context = [" context{}".format(i) for i in range(200)]
        leading_spaces = [" +notcounted{}".format(i) for i in range(50)] + [" -notcounted{}".format(i) for i in range(50)]
        diff = "\n".join(additions + deletions + context + leading_spaces)
        codeflash_output = get_diff_lines_count(diff)
    
    # ------------------------
    # Additional Edge Cases for Robustness
    # ------------------------
    
    def test_diff_with_only_headers_and_context():
        # Only headers and context lines
        diff = "--- a/file.txt\n+++ b/file.txt\n context1\n context2"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_plus_and_minus_in_middle_of_line():
        # + or - in the middle should not count
        diff = "context + not counted\ncontext - not counted\n+counted\n-counted"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_empty_lines_and_whitespace():
        # Empty lines and whitespace-only lines
        diff = "\n \n\t\n+add\n-del\n"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_non_ascii_and_binary_like_lines():
        # Non-ASCII and binary-like content
        diff = "+\x00\x01\n-\x02\x03\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_at_at_lines():
        # Lines starting with @@ should not be counted
        diff = "@@ -1,3 +1,4 @@\n+add\n-del"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_tricky_headers():
        # Lines like ---- or ++++ (more than 3) are not headers and should be counted
        diff = "---- not header\n++++ not header\n+++ header\n--- header"
        codeflash_output = get_diff_lines_count(diff)  # ---- not header, ++++ not header
    
    def test_diff_with_mixed_case_headers():
        # Headers are always lowercase (per diff), so +++A is not a header
        diff = "+++A\n---B\n+add\n-del"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_windows_line_endings():
        # Windows CRLF endings
        diff = "+add1\r\n-context\r\n-add2\r\n+add3\r\n"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_leading_newlines():
        # Leading newlines before diff lines
        diff = "\n\n+add\n-del"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_with_trailing_newlines():
        # Trailing newlines after diff lines
        diff = "+add\n-del\n\n\n"
        codeflash_output = get_diff_lines_count(diff)
    # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
    
    from __future__ import annotations
    
    # imports
    import pytest  # used for our unit tests
    from codeflash.code_utils.formatter import get_diff_lines_count
    
    # unit tests
    
    # -------------------- BASIC TEST CASES --------------------
    
    def test_empty_string_returns_zero():
        # No diff lines in empty input
        codeflash_output = get_diff_lines_count("")
    
    def test_no_diff_lines():
        # Input contains lines but none are diff lines
        diff = "context line\nanother context line"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_single_added_line():
        # Single added line
        diff = "+added line"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_single_removed_line():
        # Single removed line
        diff = "-removed line"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_mixed_diff_lines():
        # Mixed added and removed lines
        diff = "+added\n-removed\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_ignores_diff_headers():
        # Should not count diff headers like '+++', '---'
        diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-removed\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_multiple_diff_lines_with_context():
        # Multiple diff lines interleaved with context
        diff = (
            " context1\n"
            "+added1\n"
            " context2\n"
            "-removed1\n"
            "+added2\n"
            " context3\n"
            "-removed2"
        )
        codeflash_output = get_diff_lines_count(diff)
    
    def test_diff_lines_with_leading_spaces():
        # Only lines starting *exactly* with '+' or '-' count
        diff = " +not counted\n-added\n -not counted\n+added"
        codeflash_output = get_diff_lines_count(diff)
    
    # -------------------- EDGE TEST CASES --------------------
    
    def test_only_diff_headers():
        # Only diff headers, should return 0
        diff = "--- a/file\n+++ b/file"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_multiple_plus_minus():
        # Lines starting with multiple '+' or '-' but not '+++' or '---'
        diff = "++not header\n--not header\n+++ header\n--- header"
        # '++not header' and '--not header' should count, '+++ header' and '--- header' should not
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_only_plus_minus():
        # Lines that are just '+' or '-'
        diff = "+\n-\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_tabs_and_spaces():
        # Tabs or spaces after + or - are fine, but not before
        diff = "+\tadded with tab\n- removed with space"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_unicode_characters():
        # Unicode should not affect counting
        diff = "+äöü\n-你好\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_trailing_newlines():
        # Trailing newlines should not affect count
        diff = "+added\n-removed\n"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_only_newlines():
        # Only newlines, no diff lines
        diff = "\n\n\n"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_mixed_line_endings():
        # Mixed \n and \r\n endings
        diff = "+added\r\n-removed\n context\r\n"
        codeflash_output = get_diff_lines_count(diff.replace('\r\n', '\n'))
    
    def test_lines_with_just_plus_or_minus_and_spaces():
        # Lines like '+ ' or '- ' should count
        diff = "+ \n- \n context"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_plus_minus_in_middle():
        # Only lines starting with + or - count
        diff = "context +added\ncontext -removed"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_plus_minus_but_not_at_start():
        # Should not count if + or - is not at the start
        diff = " context+added\n context-removed"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_plus_minus_but_too_short_for_header():
        # Lines like '++', '--' should count, but '+++' and '---' should not
        diff = "++\n--\n+++\n---"
        codeflash_output = get_diff_lines_count(diff)
    
    def test_lines_with_plus_minus_and_numbers():
        # Should count as diff lines
        diff = "+1\n-2\n context"
        codeflash_output = get_diff_lines_count(diff)
    
    # -------------------- LARGE SCALE TEST CASES --------------------
    
    def test_large_number_of_added_lines():
        # 1000 added lines
        diff = "\n".join(["+line{}".format(i) for i in range(1000)])
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_number_of_removed_lines():
        # 1000 removed lines
        diff = "\n".join(["-line{}".format(i) for i in range(1000)])
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_mixed_diff_and_context():
        # 500 added, 500 removed, 500 context lines
        added = ["+a{}".format(i) for i in range(500)]
        removed = ["-r{}".format(i) for i in range(500)]
        context = [" context{}".format(i) for i in range(500)]
        # Interleave them
        lines = []
        for i in range(500):
            lines.append(added[i])
            lines.append(removed[i])
            lines.append(context[i])
        diff = "\n".join(lines)
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_with_headers_and_diff_lines():
        # 100 diff headers, 800 diff lines, 100 context lines
        headers = ["--- a/file{}".format(i) for i in range(50)] + ["+++ b/file{}".format(i) for i in range(50)]
        diff_lines = ["+add{}".format(i) for i in range(400)] + ["-rem{}".format(i) for i in range(400)]
        context = [" context{}".format(i) for i in range(100)]
        lines = headers + diff_lines + context
        diff = "\n".join(lines)
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_all_headers():
        # 1000 header lines, should return 0
        headers = ["+++ file{}".format(i) for i in range(500)] + ["--- file{}".format(i) for i in range(500)]
        diff = "\n".join(headers)
        codeflash_output = get_diff_lines_count(diff)
    
    def test_large_randomized_diff_lines():
        # Mix of diff, context, and header lines
        lines = []
        for i in range(333):
            lines.append("+add{}".format(i))
            lines.append(" context{}".format(i))
            lines.append("--- file{}".format(i))
            lines.append("-rem{}".format(i))
            lines.append("+++ file{}".format(i))
        diff = "\n".join(lines)
        # Only +add and -rem lines count, so 333*2 = 666
        codeflash_output = get_diff_lines_count(diff)
    # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
    
    from codeflash.code_utils.formatter import get_diff_lines_count
    
    def test_get_diff_lines_count():
        get_diff_lines_count('')

    To test or edit this optimization locally git merge codeflash/optimize-pr284-2025-06-04T20.10.23

    Suggested change
    lines = diff_output.split("\n")
    def is_diff_line(line: str) -> bool:
    return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    diff_lines = [line for line in lines if is_diff_line(line)]
    return len(diff_lines)
    # Count only the diff lines as per is_diff_line logic, but avoid unnecessary function call and list allocation.
    count = 0
    for line in diff_output.split("\n"):
    if line and line[0] in "+-" and not (line.startswith("+++") or line.startswith("---")):
    count += 1
    return count

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants