Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant whitespace changes in Python are ignored #587

Open
lazy opened this issue Oct 25, 2023 · 6 comments
Open

Significant whitespace changes in Python are ignored #587

lazy opened this issue Oct 25, 2023 · 6 comments
Labels
bug Something isn't working display

Comments

@lazy
Copy link

lazy commented Oct 25, 2023

image

I verified via --dump-syntax that ASTs are significantly different.
Can compare them using pr -m -t -w 160 <(difft --dump-syntax a.py) <(difft --dump-syntax b.py)

Issue duplicated as text:

$ cat a.py
def test(a, b):
    if a:
        print(a)
        if b:
            print (b)
$ cat b.py
def test(a, b):
    if a:
        print(a)
    if b:
        print (b)
$ diff -u a.py b.py
--- a.py        2023-10-25 12:42:54.671789363 +0000
+++ b.py        2023-10-25 12:43:07.762393299 +0000
@@ -1,5 +1,5 @@
 def test(a, b):
     if a:
         print(a)
-        if b:
-            print (b)
+    if b:
+        print (b)
$ difft a.py b.py
b.py --- Python
No syntactic changes.

Using the latest x64 build from releases page in github.

$ difft --version
Difftastic 0.52.0 (b07e519 2023-10-08)
@Wilfred
Copy link
Owner

Wilfred commented Oct 25, 2023

Oh my goodness, that's awful. Thanks for the report.

@Wilfred
Copy link
Owner

Wilfred commented Oct 25, 2023

This is a display issue: the diffing logic finds changes, but since Python uses indentation for blocks it's not finding anything to display in this case.

$ DFT_LOG=trace difft a.py b.py                                          
 2023-10-25T15:51:54.861Z INFO  difft::options > CLI arguments: ["a.py", "b.py"]
 2023-10-25T15:51:54.875Z INFO  difft::diff::dijkstra > LHS nodes: 16 (1 toplevel), RHS nodes: 16 (1 toplevel)
 2023-10-25T15:51:54.875Z INFO  difft::diff::dijkstra > Saw 58 vertices (a Vertex is 80 bytes), arena consumed 7424 bytes, with 33 vertices left on heap.
 2023-10-25T15:51:54.875Z DEBUG difft::diff::dijkstra > Found a path of 9 with cost 709.
 2023-10-25T15:51:54.875Z DEBUG difft::diff::dijkstra > Initial 5 items on path: [
    "line:2  ...          line:2  ...          --- 100 EnterUnchangedDelimiter { depth_difference: 0 }",
    "line:2 if            line:2  ...          --- 300 EnterNovelDelimiterRHS",

@Wilfred
Copy link
Owner

Wilfred commented Nov 11, 2023

The display logic is assuming that you can't have a structural change without having some novel tokens, which isn't true here.

Every token has a matching token on the other side, which is interpreted as no changes. See hunks::sorted_novel_positions.

@Wilfred
Copy link
Owner

Wilfred commented Nov 11, 2023

E.g. here's how delimiters are handled in the analogous JS.

Screenshot from 2023-11-11 10-59-34

@Wilfred
Copy link
Owner

Wilfred commented Nov 11, 2023

Should probably mark the first node in the changed subtree as novel when this occurs.

@ztane
Copy link

ztane commented Mar 21, 2024

The Python standard library tokenizer module (which you can test with python3 -mtokenize program.py) does produce INDENT and DEDENT tokens; however the dedent tokens it produces have always zero-width lexed content, yes, but the tokenizer marks them found at the new indent depth; however the lexed text for indent token contains the total whitespace from column 1 to the beginning of indented block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working display
Projects
None yet
Development

No branches or pull requests

3 participants