Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End of line corruption when writing bracket comment #26

Closed
ianbenz opened this issue Feb 25, 2018 · 17 comments
Closed

End of line corruption when writing bracket comment #26

ianbenz opened this issue Feb 25, 2018 · 17 comments

Comments

@ianbenz
Copy link

ianbenz commented Feb 25, 2018

I am seeing an issue since the last release when formatting files in-place with bracket comments and Windows line endings such as this CMakeLists.txt. When formatted in-place, the lines of the bracket comment pick up an extra carriage return, but the rest of the file is written correctly. The output seems correct when writing to stdout or using Unix line endings.

@cheshirekow
Copy link
Owner

cheshirekow commented Feb 25, 2018

This is not a surprise. I kind of expected it actually. The latest release changed some of the lexing rules and I wasn't sure what would happen with windows line endings. Honestly I wasn't even sure if any software still writes out windows line endings. If you need a quick fix and are comfortable editing code try changing

https://github.com/cheshirekow/cmake_format/blob/master/cmake_format/lexer.py#L90

from r"\n" to r"\r*\n". and remove the \r from line 93. I'll test this change against your listfile when I get a chance and push an update.

Also, sorry... I should have really tested that ;).

@cheshirekow
Copy link
Owner

On second thought, I guess \r?\n would be better... in which case maybe leave the \r in line 93. Also note that cmake-format will only output unix line endings. Maybe I should make windows line endings an option...

@ianbenz
Copy link
Author

ianbenz commented Feb 25, 2018

Thanks for putting up the suggestions! Unfortunately, both yielded the same output as before on my Windows machine with Python 3.6. Looking more carefully, when I feed in a file with Unix line endings, it actually gets reformatted to have all Windows endings. Maybe this is platform dependent behavior?

I maintain scripts that are formatted both ways, so it would be a really nice feature if the ending style was detected and preserved in the output.

@cheshirekow
Copy link
Owner

oh, interesting. The fact that cmake-format outputs windows line endings is very surprising to me. Maybe it's not the lexer change, but the change from plain open to codecs.open or io.open. Thanks for tryiing the suggestion and I guess I'll have to dig in more. I wonder if it has to do with unversal-newlines

@cheshirekow
Copy link
Owner

Looking into this now I'm unable to replicate the problem with python 3.5.2 on linux (even with windows line endings). I might have to spin up a virtual machine to look in windows. What version of python are you using?

And just to make sure I understand the problem, what you're seeing is that every line in the file ends with \r\r\n? Do you mind attaching the output you're getting so that I can see?

@ianbenz
Copy link
Author

ianbenz commented Feb 26, 2018

I am using Python 3.6.4 64-bit from the standard installer.

Only the lines in a block comment in a file with Windows line endings are converted to \r\r\n. The remaining lines in the file retain the \r\n ending. The other strange and possibly related issue is that all line endings in a Unix file are converted from \n to \r\n.

Let me show show you input and output from the same script with Windows and Unix endings:
CMakeLists-Windows.txt -> CMakeLists-Windows.txt
CMakeLists-Unix.txt -> CMakeLists-Unix.txt

I'm no Python expert, but I am happy to help you test any ideas on my machine.

@cheshirekow
Copy link
Owner

Ok, this is very helpful. With both 2.7.12 and 3.5.2 on linux I do not see this issue. Windows is has spent the past 15 minutes booting inside virtualbox. If it ever decides to start I'll try to replicate the problem there.

It is very interesting that only the lines within the bracket comment are corrupted. This suggests to me that it might have something to do with the interaction between codecs.open and re.scanner. Bracket comments and bracket arguments are parsed as a complete tokens, and are the only tokens other than NEWLINE that consume newlines. When they are printed to the file, they are printed verbatim.

I'll continue to investigate, but in the meantime can you try the following test for me:

python -Bm cmake_format.lexer CMakeLists-Windows.txt

You should see something like this:

Token(type=BRACKET_COMMENT, content=#[[*****************************************************************************
* Information line 1
* Information line 2
******************************************************************************]], line=1, col=0)
Token(type=NEWLINE, content=
, line=4, col=80)
Token(type=NEWLINE, content=
, line=5, col=0)
Token(type=WORD, content=project, line=6, col=0)
Token(type=LEFT_PAREN, content=(, line=6, col=7)
Token(type=WORD, content=Test, line=6, col=8)
Token(type=RIGHT_PAREN, content=), line=6, col=12)
Token(type=NEWLINE, content=
, line=6, col=13)
Token(type=WORD, content=set, line=7, col=0)
Token(type=LEFT_PAREN, content=(, line=7, col=3)
Token(type=WORD, content=var, line=7, col=4)
Token(type=WHITESPACE, content= , line=7, col=7)
Token(type=WORD, content=value, line=7, col=8)
Token(type=RIGHT_PAREN, content=), line=7, col=13)

@cheshirekow
Copy link
Owner

ah! I'm on windows now and testing and It seems to be something specific about tempfile.NamedTemporaryFile. I get different results if I open a file for writing with codecs.open and tempfile.NamedTemopraryFile. I think that's a start.

@cheshirekow
Copy link
Owner

Alright, if you just need a quick fix you can change lines 224-225 of main.py to:

      outfile = tempfile.NamedTemporaryFile(delete=False, mode='wb')
      outfile = codecs.getwriter('utf8')(outfile)

Note that this will (unfortunately) make your output files unix line endings (except for the contents of the bracket comment, which will be windows line endings verbatim.

I think that what I'll do is make line ending a setting and I'll make it so that bracket comments are at least line-ending corrected and trailing-whitespace-chomped rather than just copying them input -> output verbatim.

Also, I'll probably stop using tempfile.NamedTemporaryFile in order to get consistent behavior between -o and -i.

@ianbenz
Copy link
Author

ianbenz commented Feb 27, 2018

Great! Thanks for putting up the quick fix. It works for me with the same caveat you described.

@cheshirekow
Copy link
Owner

@iabenz if you're willing to test a change would you mind checkout out and testing the v0.3.4rc1 branch I just pushed: https://github.com/cheshirekow/cmake_format/tree/v0.3.4rc1

I'll also test on windows myself sometime soon.

@ianbenz
Copy link
Author

ianbenz commented Mar 2, 2018

Sure, I gave the new branch a try with the CMakeLists-Windows.txt and CMakeLists-Unix.txt files I had used earlier. Let me describe what I'm seeing now:

--line-ending setting Input line ending Output line ending
unix windows windows
unix unix windows
windows windows mixed
windows unix mixed

mixed output: CMakeLists-Windows-Unix.txt

@cheshirekow
Copy link
Owner

oh bummer... ok, reading the documentation for io.iopen() it looks like there's a newline parameter. If not specified it will translate to the system default. I think that is what is happening here. Thanks so much for checking out the branch and testing. I'll update with setting newline and see if that gets consistent behavior.

@cheshirekow
Copy link
Owner

cheshirekow commented Mar 5, 2018

Ok, I believe that adding newline='' to io.open fixes the problem. I've pushed a new commit to v0.3.4rc1 branch which fixes this and one other error. I tested it one windows with --line-endings unix and --line-endings windows and I believe that I'm now seeing consistent behavior. @iabenz if you don't mind could you give this change a try and let me know if the problem is fixed for you. The commit is ebac11c7d77baaa9981b96e29a6ca06ebdb52065 (head of v0.3.4rc1).

@ianbenz
Copy link
Author

ianbenz commented Mar 5, 2018

Nice! Your fix works great on my machine with all combinations as well. Thanks for tracking this down.

@ianbenz
Copy link
Author

ianbenz commented Mar 5, 2018

I know this has already been an ordeal, but would you consider adding an option like --line-ending auto? I often find it useful that clang-format will just keep the original line ending when reformatting.

@cheshirekow
Copy link
Owner

Seems easy-enough. I'll add it to the ol' TODO list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants