saga4/fix duplication suggestion issue #284

Saga4 · 2025-06-04T20:02:05Z

User description

check large diffs with black, and skipp formatting in such case (after optimizing)
new line
better log messages
remove unnecessary check
new line
remove unused comment
** the max lines for formatting changes to 100**
refactoring
refactoring and improvements
added black as dev dependency
made some refactor changes that codeflash suggested
remove unused function
formatting & using internal black dep
fix black import issue
handle formatting files with no formatting issues
fix_duplication_suggestion_issue

PR Type

Enhancement, Tests

Description

Add Black dev dependency and safe formatting checks
Add sample files for formatting scenarios (none/few/many)
Add tests for skip/apply formatting scenarios
Improve function blocklist skipping logic

Changes walkthrough 📝

Relevant files

Tests

4 files

few_formatting_errors.py `Add sample file with few formatting errors`	+47/-0
many_formatting_errors.py `Add sample file with many formatting errors`	+147/-0
no_formatting_errors.py `Add sample file with no formatting errors`	+71/-0
test_formatter.py `Add tests for formatting diff scenarios`	+74/-0

Enhancement

1 files

formatter.py `Add diff-based safe formatting checks`	+43/-4

Bug fix

1 files

functions_to_optimize.py `Improve blocklist skipping logic`	+3/-1

Dependencies

1 files

pyproject.toml `Add Black as development dependency`	+1/-0

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

…r optimizing)

github-actions · 2025-06-04T20:03:21Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Missing Function The `formatter.py` module does not define `sort_imports` which is imported in tests, causing import errors. from typing import TYPE_CHECKING, Optional import isort from codeflash.cli_cmds.console import console, logger if TYPE_CHECKING: Syntax Error The line `self. address = address` has invalid attribute syntax and will cause a parse error. self.address=address;self.city=city;self.state=state;self.zip_code=zip_code Indentation In the blocklist skipping logic, comment and code indentation may be misaligned, potentially altering control flow. if ( function.file_path.name in blocklist_funcs and function.qualified_name in blocklist_funcs[function.file_path.name] ): # This function is in blocklist, we can skip it blocklist_funcs_removed_count += 1 continue # This function is NOT in blocklist. we can keep it functions_tmp.append(function)

github-actions · 2025-06-04T20:04:35Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix attribute assignment syntax Remove the unexpected spaces in the attribute assignment to fix the syntax error and correctly assign to `self.address`. code_to_optimize/few_formatting_errors.py [19] -self. address = address +self.address = address Suggestion importance[1-10]: 9 __ Why: The extra spaces in `self. address = address` cause a syntax error preventing the class from loading.	High
Possible issue	Propagate missing formatter error Re-raise the caught `FileNotFoundError` so that missing formatter errors propagate and match expected test behavior. codeflash/code_utils/formatter.py [74-75] except FileNotFoundError as e: from rich.panel import Panel + raise Suggestion importance[1-10]: 8 __ Why: Without re-raising, a `FileNotFoundError` from a missing formatter is swallowed, breaking tests that expect this exception.	Medium
General	Restrict safe check to Black Restrict the `is_safe_to_format` check to only apply for the Black formatter, allowing other formatters (like `exit`) to still be executed. codeflash/code_utils/formatter.py [60-61] -if formatter_name == "disabled" or not is_safe_to_format(filepath=str(path), content=file_content): +if formatter_name == "disabled" or (formatter_name == "black" and not is_safe_to_format(filepath=str(path), content=file_content)): return file_content Suggestion importance[1-10]: 7 __ Why: The current guard blocks all formatter commands, but it should only prevent Black formatting, so non-Black formatters like `ruff` can still run.	Medium

…/fix_duplication_suggestion_issue`) Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance. You do **not** need to build the list of lines; simply scan and count qualifying lines in a single pass. Optimized version. **Changes made:** - Inlined the `is_diff_line` logic for better performance (function call overhead avoided). - Used a running integer (`count`) instead of a list, so memory use and processing are reduced. - Avoided creating an unnecessary list and removed the nested function. **This is as fast and memory-efficient as this logic gets in idiomatic Python.**

codeflash-ai · 2025-06-04T20:10:29Z

codeflash/code_utils/formatter.py

+    lines = diff_output.split("\n")
+
+    def is_diff_line(line: str) -> bool:
+        return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
+
+    diff_lines = [line for line in lines if is_diff_line(line)]
+    return len(diff_lines)
+
+


⚡️Codeflash found 16% (0.16x) speedup for get_diff_lines_count in codeflash/code_utils/formatter.py

⏱️ Runtime : 2.65 milliseconds → 2.29 milliseconds (best of 263 runs)

📝 Explanation and details

Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance.

You do not need to build the list of lines; simply scan and count qualifying lines in a single pass.

Optimized version.

Changes made:

Inlined the is_diff_line logic for better performance (function call overhead avoided).

Used a running integer (count) instead of a list, so memory use and processing are reduced.

Avoided creating an unnecessary list and removed the nested function.

This is as fast and memory-efficient as this logic gets in idiomatic Python.

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests 🔘 None Found

🌀 Generated Regression Tests ✅ 61 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests ✅ 1 Passed

📊 Tests Coverage 100.0%

🌀 Generated Regression Tests Details

from __future__ import annotations # imports import pytest # used for our unit tests from codeflash.code_utils.formatter import get_diff_lines_count # unit tests # ------------------------ # Basic Test Cases # ------------------------ def test_empty_string_returns_zero(): # No lines at all codeflash_output = get_diff_lines_count("") def test_no_diff_lines(): # No lines starting with + or - diff = " context line 1\n context line 2" codeflash_output = get_diff_lines_count(diff) def test_single_addition_line(): # One line added diff = "+added line" codeflash_output = get_diff_lines_count(diff) def test_single_deletion_line(): # One line deleted diff = "-deleted line" codeflash_output = get_diff_lines_count(diff) def test_mixed_add_and_delete(): # Mixed + and - lines diff = "+added\n-context\n+another add\n-context2\n-deleted" codeflash_output = get_diff_lines_count(diff) # 2 additions, 2 deletions def test_ignores_diff_headers(): # Ignore lines like --- and +++ diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-context\n-deleted" codeflash_output = get_diff_lines_count(diff) # +added, -deleted def test_only_diff_headers(): # Only diff headers, no actual changes diff = "--- a/file.txt\n+++ b/file.txt" codeflash_output = get_diff_lines_count(diff) def test_context_lines_with_leading_spaces(): # Lines with spaces before + or - are not counted diff = " +not counted\n -not counted\n+counted\n-counted" codeflash_output = get_diff_lines_count(diff) def test_multiple_changes_and_headers(): # Headers and multiple diff lines diff = ( "--- a/foo.txt\n" "+++ b/foo.txt\n" "@@ -1,3 +1,4 @@\n" " line 1\n" "+added 1\n" " line 2\n" "-deleted 1\n" " line 3\n" "+added 2\n" "-deleted 2" ) codeflash_output = get_diff_lines_count(diff) # +added 1, -deleted 1, +added 2, -deleted 2 # ------------------------ # Edge Test Cases # ------------------------ def test_lines_with_plus_and_minus_in_middle(): # Lines with + or - not at start should not be counted diff = "context + not counted\ncontext - not counted" codeflash_output = get_diff_lines_count(diff) def test_lines_with_only_plus_and_minus(): # Lines that are just "+" or "-" are counted diff = "+\n-\n---\n+++" codeflash_output = get_diff_lines_count(diff) def test_lines_with_trailing_spaces(): # Lines with trailing spaces after + or - are counted diff = "+added \n-deleted \n context" codeflash_output = get_diff_lines_count(diff) def test_lines_with_multiple_plus_or_minus(): # Lines starting with ++ or -- but not +++ or --- diff = "++double plus\n--double minus\n+++triple plus\n---triple minus" codeflash_output = get_diff_lines_count(diff) # only ++double plus, --double minus def test_lines_with_tabs(): # Lines with tabs after + or - are counted diff = "+\tadded with tab\n-\tdeleted with tab" codeflash_output = get_diff_lines_count(diff) def test_lines_with_unicode_characters(): # Lines with unicode characters diff = "+añadido\n-удалено\n context" codeflash_output = get_diff_lines_count(diff) def test_lines_with_empty_lines_between(): # Empty lines should not affect the count diff = "+add1\n\n-add2\n\n\n+add3" codeflash_output = get_diff_lines_count(diff) def test_lines_with_only_newlines(): # Only newlines, no diff lines diff = "\n\n\n" codeflash_output = get_diff_lines_count(diff) def test_lines_with_mixed_line_endings(): # Test with mixed \n and \r\n endings diff = "+add1\r\n-context\r\n-add2\n+add3" codeflash_output = get_diff_lines_count(diff) def test_lines_with_leading_and_trailing_whitespace(): # Only lines that start with + or - (no spaces before) are counted diff = " +not counted\n- counted\n+counted\n-counted" codeflash_output = get_diff_lines_count(diff) # ------------------------ # Large Scale Test Cases # ------------------------ def test_large_number_of_additions(): # 1000 addition lines diff = "\n".join(["+line{}".format(i) for i in range(1000)]) codeflash_output = get_diff_lines_count(diff) def test_large_number_of_deletions(): # 1000 deletion lines diff = "\n".join(["-line{}".format(i) for i in range(1000)]) codeflash_output = get_diff_lines_count(diff) def test_large_mixed_diff(): # 500 additions, 500 deletions, 500 context lines, 2 headers additions = ["+add{}".format(i) for i in range(500)] deletions = ["-del{}".format(i) for i in range(500)] context = [" context{}".format(i) for i in range(500)] headers = ["--- a/file.txt", "+++ b/file.txt"] diff = "\n".join(headers + additions + deletions + context) codeflash_output = get_diff_lines_count(diff) def test_large_diff_with_headers_and_noise(): # 400 additions, 400 deletions, 100 context, 100 lines starting with +++ or --- additions = ["+add{}".format(i) for i in range(400)] deletions = ["-del{}".format(i) for i in range(400)] context = [" context{}".format(i) for i in range(100)] noise = ["+++ noise{}".format(i) for i in range(50)] + ["--- noise{}".format(i) for i in range(50)] diff = "\n".join(noise + additions + deletions + context) codeflash_output = get_diff_lines_count(diff) def test_large_diff_with_varied_line_types(): # 300 additions, 300 deletions, 200 context, 100 lines with leading spaces additions = ["+add{}".format(i) for i in range(300)] deletions = ["-del{}".format(i) for i in range(300)] context = [" context{}".format(i) for i in range(200)] leading_spaces = [" +notcounted{}".format(i) for i in range(50)] + [" -notcounted{}".format(i) for i in range(50)] diff = "\n".join(additions + deletions + context + leading_spaces) codeflash_output = get_diff_lines_count(diff) # ------------------------ # Additional Edge Cases for Robustness # ------------------------ def test_diff_with_only_headers_and_context(): # Only headers and context lines diff = "--- a/file.txt\n+++ b/file.txt\n context1\n context2" codeflash_output = get_diff_lines_count(diff) def test_diff_with_plus_and_minus_in_middle_of_line(): # + or - in the middle should not count diff = "context + not counted\ncontext - not counted\n+counted\n-counted" codeflash_output = get_diff_lines_count(diff) def test_diff_with_empty_lines_and_whitespace(): # Empty lines and whitespace-only lines diff = "\n \n\t\n+add\n-del\n" codeflash_output = get_diff_lines_count(diff) def test_diff_with_non_ascii_and_binary_like_lines(): # Non-ASCII and binary-like content diff = "+\x00\x01\n-\x02\x03\n context" codeflash_output = get_diff_lines_count(diff) def test_diff_with_at_at_lines(): # Lines starting with @@ should not be counted diff = "@@ -1,3 +1,4 @@\n+add\n-del" codeflash_output = get_diff_lines_count(diff) def test_diff_with_tricky_headers(): # Lines like ---- or ++++ (more than 3) are not headers and should be counted diff = "---- not header\n++++ not header\n+++ header\n--- header" codeflash_output = get_diff_lines_count(diff) # ---- not header, ++++ not header def test_diff_with_mixed_case_headers(): # Headers are always lowercase (per diff), so +++A is not a header diff = "+++A\n---B\n+add\n-del" codeflash_output = get_diff_lines_count(diff) def test_diff_with_windows_line_endings(): # Windows CRLF endings diff = "+add1\r\n-context\r\n-add2\r\n+add3\r\n" codeflash_output = get_diff_lines_count(diff) def test_diff_with_leading_newlines(): # Leading newlines before diff lines diff = "\n\n+add\n-del" codeflash_output = get_diff_lines_count(diff) def test_diff_with_trailing_newlines(): # Trailing newlines after diff lines diff = "+add\n-del\n\n\n" codeflash_output = get_diff_lines_count(diff) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code. from __future__ import annotations # imports import pytest # used for our unit tests from codeflash.code_utils.formatter import get_diff_lines_count # unit tests # -------------------- BASIC TEST CASES -------------------- def test_empty_string_returns_zero(): # No diff lines in empty input codeflash_output = get_diff_lines_count("") def test_no_diff_lines(): # Input contains lines but none are diff lines diff = "context line\nanother context line" codeflash_output = get_diff_lines_count(diff) def test_single_added_line(): # Single added line diff = "+added line" codeflash_output = get_diff_lines_count(diff) def test_single_removed_line(): # Single removed line diff = "-removed line" codeflash_output = get_diff_lines_count(diff) def test_mixed_diff_lines(): # Mixed added and removed lines diff = "+added\n-removed\n context" codeflash_output = get_diff_lines_count(diff) def test_ignores_diff_headers(): # Should not count diff headers like '+++', '---' diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-removed\n context" codeflash_output = get_diff_lines_count(diff) def test_multiple_diff_lines_with_context(): # Multiple diff lines interleaved with context diff = ( " context1\n" "+added1\n" " context2\n" "-removed1\n" "+added2\n" " context3\n" "-removed2" ) codeflash_output = get_diff_lines_count(diff) def test_diff_lines_with_leading_spaces(): # Only lines starting *exactly* with '+' or '-' count diff = " +not counted\n-added\n -not counted\n+added" codeflash_output = get_diff_lines_count(diff) # -------------------- EDGE TEST CASES -------------------- def test_only_diff_headers(): # Only diff headers, should return 0 diff = "--- a/file\n+++ b/file" codeflash_output = get_diff_lines_count(diff) def test_lines_with_multiple_plus_minus(): # Lines starting with multiple '+' or '-' but not '+++' or '---' diff = "++not header\n--not header\n+++ header\n--- header" # '++not header' and '--not header' should count, '+++ header' and '--- header' should not codeflash_output = get_diff_lines_count(diff) def test_lines_with_only_plus_minus(): # Lines that are just '+' or '-' diff = "+\n-\n context" codeflash_output = get_diff_lines_count(diff) def test_lines_with_tabs_and_spaces(): # Tabs or spaces after + or - are fine, but not before diff = "+\tadded with tab\n- removed with space" codeflash_output = get_diff_lines_count(diff) def test_lines_with_unicode_characters(): # Unicode should not affect counting diff = "+äöü\n-你好\n context" codeflash_output = get_diff_lines_count(diff) def test_lines_with_trailing_newlines(): # Trailing newlines should not affect count diff = "+added\n-removed\n" codeflash_output = get_diff_lines_count(diff) def test_lines_with_only_newlines(): # Only newlines, no diff lines diff = "\n\n\n" codeflash_output = get_diff_lines_count(diff) def test_lines_with_mixed_line_endings(): # Mixed \n and \r\n endings diff = "+added\r\n-removed\n context\r\n" codeflash_output = get_diff_lines_count(diff.replace('\r\n', '\n')) def test_lines_with_just_plus_or_minus_and_spaces(): # Lines like '+ ' or '- ' should count diff = "+ \n- \n context" codeflash_output = get_diff_lines_count(diff) def test_lines_with_plus_minus_in_middle(): # Only lines starting with + or - count diff = "context +added\ncontext -removed" codeflash_output = get_diff_lines_count(diff) def test_lines_with_plus_minus_but_not_at_start(): # Should not count if + or - is not at the start diff = " context+added\n context-removed" codeflash_output = get_diff_lines_count(diff) def test_lines_with_plus_minus_but_too_short_for_header(): # Lines like '++', '--' should count, but '+++' and '---' should not diff = "++\n--\n+++\n---" codeflash_output = get_diff_lines_count(diff) def test_lines_with_plus_minus_and_numbers(): # Should count as diff lines diff = "+1\n-2\n context" codeflash_output = get_diff_lines_count(diff) # -------------------- LARGE SCALE TEST CASES -------------------- def test_large_number_of_added_lines(): # 1000 added lines diff = "\n".join(["+line{}".format(i) for i in range(1000)]) codeflash_output = get_diff_lines_count(diff) def test_large_number_of_removed_lines(): # 1000 removed lines diff = "\n".join(["-line{}".format(i) for i in range(1000)]) codeflash_output = get_diff_lines_count(diff) def test_large_mixed_diff_and_context(): # 500 added, 500 removed, 500 context lines added = ["+a{}".format(i) for i in range(500)] removed = ["-r{}".format(i) for i in range(500)] context = [" context{}".format(i) for i in range(500)] # Interleave them lines = [] for i in range(500): lines.append(added[i]) lines.append(removed[i]) lines.append(context[i]) diff = "\n".join(lines) codeflash_output = get_diff_lines_count(diff) def test_large_with_headers_and_diff_lines(): # 100 diff headers, 800 diff lines, 100 context lines headers = ["--- a/file{}".format(i) for i in range(50)] + ["+++ b/file{}".format(i) for i in range(50)] diff_lines = ["+add{}".format(i) for i in range(400)] + ["-rem{}".format(i) for i in range(400)] context = [" context{}".format(i) for i in range(100)] lines = headers + diff_lines + context diff = "\n".join(lines) codeflash_output = get_diff_lines_count(diff) def test_large_all_headers(): # 1000 header lines, should return 0 headers = ["+++ file{}".format(i) for i in range(500)] + ["--- file{}".format(i) for i in range(500)] diff = "\n".join(headers) codeflash_output = get_diff_lines_count(diff) def test_large_randomized_diff_lines(): # Mix of diff, context, and header lines lines = [] for i in range(333): lines.append("+add{}".format(i)) lines.append(" context{}".format(i)) lines.append("--- file{}".format(i)) lines.append("-rem{}".format(i)) lines.append("+++ file{}".format(i)) diff = "\n".join(lines) # Only +add and -rem lines count, so 333*2 = 666 codeflash_output = get_diff_lines_count(diff) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code. from codeflash.code_utils.formatter import get_diff_lines_count def test_get_diff_lines_count(): get_diff_lines_count('')

To test or edit this optimization locally git merge codeflash/optimize-pr284-2025-06-04T20.10.23

Suggested change

lines = diff_output.split("\n")

def is_diff_line(line: str) -> bool:

return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))

diff_lines = [line for line in lines if is_diff_line(line)]

return len(diff_lines)

# Count only the diff lines as per is_diff_line logic, but avoid unnecessary function call and list allocation.

count = 0

for line in diff_output.split("\n"):

if line and line[0] in "+-" and not (line.startswith("+++") or line.startswith("---")):

count += 1

return count

mohammedahmed18 and others added 16 commits June 3, 2025 11:56

check large diffs with black, and skipp formatting in such case (afte…

c75bbf6

…r optimizing)

new line

5cd13ad

better log messages

1522227

remove unnecessary check

d3ca1cb

new line

dcb084a

remove unused comment

689a2d9

the max lines for formatting changes to 100

44c0f85

refactoring

73ef518

refactoring and improvements

a5343fd

added black as dev dependency

395855d

made some refactor changes that codeflash suggested

822d6cc

remove unused function

ce15022

formatting & using internal black dep

d2a8711

fix black import issue

f46b368

handle formatting files with no formatting issues

6504cc4

fix_duplication_suggestion_issue

d924b31

github-actions bot added the Review effort 3/5 label Jun 4, 2025

Merge branch 'main' into saga4/fix_duplication_suggestion_issue

620ca38

Saga4 closed this Jun 4, 2025

codeflash-ai bot reviewed Jun 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

saga4/fix duplication suggestion issue #284

saga4/fix duplication suggestion issue #284

Uh oh!

Saga4 commented Jun 4, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Jun 4, 2025

Uh oh!

github-actions bot commented Jun 4, 2025

Uh oh!

codeflash-ai bot Jun 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 61 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	✅ 1 Passed
📊 Tests Coverage	100.0%

-    lines = diff_output.split("\n")
-    def is_diff_line(line: str) -> bool:
-        return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
-    diff_lines = [line for line in lines if is_diff_line(line)]
-    return len(diff_lines)
+    # Count only the diff lines as per is_diff_line logic, but avoid unnecessary function call and list allocation.
+    count = 0
+    for line in diff_output.split("\n"):
+        if line and line[0] in "+-" and not (line.startswith("+++") or line.startswith("---")):
+            count += 1
+    return count

saga4/fix duplication suggestion issue #284

saga4/fix duplication suggestion issue #284

Uh oh!

Conversation

Saga4 commented Jun 4, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Changes walkthrough 📝

Uh oh!

github-actions bot commented Jun 4, 2025

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Jun 4, 2025

PR Code Suggestions ✨

Uh oh!

codeflash-ai bot Jun 4, 2025

Choose a reason for hiding this comment

⚡️Codeflash found 16% (0.16x) speedup for get_diff_lines_count in codeflash/code_utils/formatter.py

📝 Explanation and details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Saga4 commented Jun 4, 2025 •

edited by github-actions bot

Loading

⚡️Codeflash found 16% (0.16x) speedup for `get_diff_lines_count` in `codeflash/code_utils/formatter.py`