Handling of files with different line terminator character sequences (NL and CR) #182

Open
McSvenster opened this Issue Nov 10, 2013 · 18 comments

Comments

6 participants
McSvenster commented Nov 10, 2013

ST 3 replaces Windows line endings.

E.g.: when you have a Perl script which contains a regex to remove Windows line endings, open it with ST 3, change something and then save it, the regex is destroyed:

$text =~ s/^^M?\n?//mg;

On opening this script, the ^M (a literal carriage return, CR) is interpreted as a line break:

$text =~ s/^
?\n?//mg;

When the file is saved, the regex is changed. See the diff:

+  $text =~ s/^
+?\n?//mg;

Member

titoBouzout commented Feb 6, 2014

I cannot repro this.

McSvenster commented Feb 7, 2014

This is amazing: I just tried and could not repro this either. I'll try to find my original file and test it again.

McSvenster commented Feb 7, 2014

OK, I cannot repro this when I copy and paste this code. But when I use a real file, I can repro it. Is there a way to attach a file (other than pics)?

Member

titoBouzout commented Feb 7, 2014

Please send me the file by mail to tito.bouzoutXgmaaail.com

It would help to have very specific step-by-step instructions on how to repro this.

Also, please share the value of the "default_line_ending" setting and the syntax used to open the file.

FYI: I also tested with a file and was unable to repro this.

Member

titoBouzout commented Feb 12, 2014

I received the file, thanks! I can confirm that ST and many other editors (Notepad++, Brackets, etc.) interpret it as a newline. Maybe it is a literal newline? I'm not sure what to expect here. If I set line_endings to unix, then any Windows newline should be converted to a Unix newline.

I think I understand the problem; maybe escaping the newlines makes sense?

McSvenster commented Feb 13, 2014

Thanks for your investigation! I found another formula for the regex, so this is not a problem for the project.

But it's risky when I open files from others who code with vi. I'll check with Coda and TextMate: if they handle it correctly, maybe there is a solution. If not, then we should simply close this issue.

Anyway, I love ST and I'm sticking with it!

@titoBouzout titoBouzout added the C: i18n label Jan 2, 2015

Member

FichteFoll commented Jun 17, 2016

@McSvenster do you still have the literal file available? Please attach it to this issue (in the OP or in a comment).

@FichteFoll FichteFoll added the tagme label Jun 17, 2016

Member

titoBouzout commented Jun 17, 2016

https://dl.dropboxusercontent.com/u/9303546/SublimeText/bugs/182/test.zip

GitHub mirror: test.zip

The enclosed file is test.pl: a /usr/bin/perl script text executable.
Sublime is configured to use Unix line endings:
"default_line_ending": "unix",
When you open the file with e.g. vi, you see the correct regex in line 26.
When you open it in Sublime Text 3, the ^M in the regex is interpreted as a line ending.
If you then save the file in ST3 and re-open it in vi, the regex will be broken.
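For anyone who wants to reproduce the effect without the attachment, here is a minimal Python sketch. It writes a hypothetical stand-in for test.pl (the rest of the real file is not in this thread) with LF line endings and one literal CR inside the regex, then reads it back twice. Python's default text mode (`newline=None`, universal newlines) turns that lone CR into a line break, much like ST does, while `newline=""` returns the line endings untranslated. The helper names `write_sample` and `read_text` are made up for illustration.

```python
import os
import tempfile
from typing import Optional

# Hypothetical stand-in for test.pl: LF line endings everywhere, plus one
# literal CR byte (shown as ^M by vi) inside the regex on the second line.
CONTENT = b"#!/usr/bin/perl\n$text =~ s/^\r?\\n?//mg;\nprint $text;\n"


def write_sample(path: str) -> None:
    with open(path, "wb") as f:
        f.write(CONTENT)


def read_text(path: str, newline: Optional[str]) -> str:
    # newline=None: universal newlines, every CR, LF and CRLF becomes "\n".
    # newline="":   line endings are returned to the caller untranslated.
    with open(path, "r", encoding="utf-8", newline=newline) as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".pl")
    os.close(fd)
    try:
        write_sample(path)
        translated = read_text(path, newline=None)
        raw = read_text(path, newline="")
        print(len(translated.splitlines()))    # 4: the CR split the regex line
        print(raw == CONTENT.decode("utf-8"))  # True: the CR survived
    finally:
        os.remove(path)
```

The file itself has three lines, so the fourth line in the translated text is exactly the break that vi does not show.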

Member

FichteFoll commented Jun 19, 2016

Okay, so the original file contains a CR (carriage return) character inside the regular expression, whereas it uses LF (line feed) characters for the normal line endings.
ST treats either as a line terminator when opening the file, and the saved file will have all its line endings changed to LF, including the original CR character.

This is sane behavior, imo, since a file that does not use the same line terminator sequence throughout is absolutely insane and could almost be considered a binary file. This is also why there are escape sequences for line-terminating characters in pretty much all languages, including PCRE. The proper "solution" to this problem would be to use \r in the regex instead of a binary CR.

I'll rename this issue and leave it open, tagged as "trivial", for the developers to decide whether they want to treat this situation (normalization of files with mixed line terminators) differently.
Personally, I'd close this as "wontfix".
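To illustrate the normalization described in this comment, here is a minimal Python sketch. It assumes, as described, that CRLF, a lone CR and a lone LF are all treated as line terminators and written back as LF; the `normalize` helper is hypothetical and is not ST's actual implementation. It shows why the literal CR breaks the regex while the `\r` escape survives.

```python
import re

# Hypothetical model of the load/save normalization: CRLF, lone CR and lone LF
# are all line terminators, and every one of them is written back as LF.
# CRLF is listed first so it is consumed as one terminator, not two.
TERMINATOR = re.compile(r"\r\n|\r|\n")


def normalize(text: str, newline: str = "\n") -> str:
    """Replace every line terminator in `text` with `newline`."""
    return TERMINATOR.sub(newline, text)


# The Perl line from this issue with a literal CR byte (vi's ^M) in the regex.
literal_cr = "$text =~ s/^" + "\r" + r"?\n?//mg;" + "\n"
# The same regex written with the \r escape sequence instead.
escaped_cr = r"$text =~ s/^\r?\n?//mg;" + "\n"

print(normalize(literal_cr).splitlines())  # ['$text =~ s/^', '?\\n?//mg;']
print(normalize(escaped_cr) == escaped_cr)  # True
```

The first result reproduces the two broken lines from the diff in the original report; the escaped version contains no terminator characters inside the regex, so normalization leaves it untouched.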

@FichteFoll FichteFoll changed the title from Windows line endings are replaced to Handling of files with different line terminator character sequences (NL and CR) Jun 19, 2016

McSvenster commented Jun 20, 2016

I thought it should work the way it does in vi. But I am not an experienced coder...

Anyway, many thanks for your investigation!

@evandrocoan evandrocoan referenced this issue Nov 23, 2016

Closed

All Line Endings are converted on file save #1505

Member

FichteFoll commented Nov 23, 2016

Relevant thread with a reply from Jon (as pointed out in #1505 by @keith-hall):

from the thread you linked: https://forum.sublimetext.com/t/all-line-endings-are-converted-on-file-save/5507/11

There is no way to avoid line ending normalization, nor will there be in the future.

kakaja commented Dec 27, 2017

This phrase "There is no way to avoid line ending normalization, nor will there be in the future." dates from May 2012.
As you can see, this feature is needed by many. Use cases: working in a big team, working with PDF documents, etc. (explanations here).
Other editors allow you to avoid line ending normalization. Sublime is so flexible, and that's why it attracts so many users. And as you have seen, many stop using Sublime because this feature is missing. And I really don't want to use Eclipse or Atom! So I'm asking again that you reconsider adding this feature.
Does anyone know a plugin that helps to avoid line ending normalization?

Member

FichteFoll commented Dec 27, 2017

From what I've gathered from various comments about ST internals, removing line ending normalization would require a significant refactor of the internal data structures and would probably also impact all existing plugins that do something over multiple lines, either reading or editing text in a view.

I have no doubt that some people run into issues with this, but I'm confident that they are a small minority of the actual user base. Including yourself and counting upvotes on the issue, 4 people besides the OP have expressed their support for this issue so far.
I'm not in a position to make that decision, but if I were, I would certainly prefer to spend the time and effort this feature requires elsewhere, where more people can profit from it. Again, I'm not saying that this is not an issue, just that it has so little priority and comparatively so little gain for the amount of work required that it's just not worth it.

Considering this, you might be better off using a tool other than Sublime Text for your specific task involving mixed line ending characters.

Member

FichteFoll commented Dec 27, 2017

Also, this is certainly not something that I would try to solve with a plugin, because you're entering "play text editor yourself" territory here. You basically need to handle reading, editing and writing in the plugin itself, since otherwise all line ending information would get lost. It would be possible, since Python and the API are very powerful, but it could introduce lots of edge cases and general instability.

Edit: Actually, by using add_regions, a plugin could skip the editing part of its implementation and only handle reading and parsing the file to add the relevant regions, and later overwrite the saved file using the regions it added before. That would make it feasible again. The only thing to consider is how ST handles reloading the file when it was modified externally.
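A minimal standalone sketch of the core idea, in plain Python without the Sublime API: normalize to LF for editing, record the offsets where the file originally had a different terminator, and restore those terminators when writing. In the plugin described above, the recorded positions would be stored with `add_regions` so that ST keeps them in sync with edits; this sketch does no such tracking, so it assumes the text is only edited after the last recorded offset. `load_normalized` and `save_restored` are hypothetical names.

```python
import re
from typing import Dict, Tuple

# CRLF first so it is recorded as a single terminator.
TERMINATOR = re.compile(r"\r\n|\r|\n")


def load_normalized(raw: str) -> Tuple[str, Dict[int, str]]:
    """Return (LF-only text, {offset of an LF in that text: original terminator})."""
    parts = []
    special: Dict[int, str] = {}
    length = 0  # length of the normalized text built so far
    pos = 0     # current position in the raw text
    for m in TERMINATOR.finditer(raw):
        chunk = raw[pos:m.start()]
        parts.append(chunk)
        length += len(chunk)
        if m.group() != "\n":
            special[length] = m.group()  # remember the non-LF terminator
        parts.append("\n")
        length += 1
        pos = m.end()
    parts.append(raw[pos:])
    return "".join(parts), special


def save_restored(text: str, special: Dict[int, str]) -> str:
    """Swap the LF at each recorded offset back to its original terminator."""
    return "".join(
        special.get(i, ch) if ch == "\n" else ch for i, ch in enumerate(text)
    )


raw = "GET / HTTP/1.1\r\n\r\n\nbody\n"
text, special = load_normalized(raw)
print(repr(text))                         # 'GET / HTTP/1.1\n\n\nbody\n'
print(save_restored(text, special) == raw)  # True
```

Because the offsets index the normalized text, any insertion or deletion before them would shift them; keeping them up to date is the bookkeeping that regions would presumably take over in a real plugin.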

SwooshyCueb commented Jun 13, 2018

Quoting FichteFoll: "From what I've gathered from various comments about ST internals, removing line ending normalization would require a significant refactor of the internal data structures and would probably also impact all existing plugins that do something over multiple lines, either reading or editing text in a view."

If this is the case, the S: trivial label should probably be removed.

If my understanding is correct, I just ran into an issue with this when manually editing a diff. The files modified by the diff have different line endings, so the diff has mixed line endings. Sublime normalized the line endings in the diff and now it doesn't apply. As far as I can tell, there is no way to revert these changes without restoring from a backup or something.

Member

FichteFoll commented Jun 13, 2018

The S labels stand for severity, and I'd consider both the user impact and the number of users affected to be quite small. It could be argued that this should change to minor, but I honestly don't see this happening any time soon, if at all, due to the quoted section.

SwooshyCueb commented Jun 14, 2018

Ah yeah, if S stands for severity, then trivial is probably the correct choice, as I imagine diffs with mixed line endings aren't that common.

zoombahh commented Jun 19, 2018

Just a painful example: goreplay uses mixed line endings (\r\n for HTTP, \n for its own lines), which means these otherwise plain-text files break the moment you open and save them. While I don't agree with the format, it took me a while to realize that simply opening and saving a file would break it.

https://github.com/buger/goreplay/wiki/Saving-and-Replaying-from-file#file-format

1 d7123dasd913jfd21312dasdhas31 127345969\n
GET / HTTP/1.1\r\n
\r\n
\n
🐵🙈🙉
\n
POST /upload HTTP/1.1\r\n
Content-Length: 7\r\n
Host: www.w3.org\r\n
\r\n
a=1&b=2

(note the line feeds on the first line and around the Unicode monkey separator)

In LF mode they're invalid requests because HTTP specifies CRLF; in CRLF mode there are no requests because goreplay can't find any separators. Treating this as an LF file with separate CR characters
(as vim does) would at least allow editing it without breaking everything.
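Files like this can at least be detected before they are opened and saved. A minimal Python sketch, assuming the whole file fits in memory, that counts each kind of terminator in the raw bytes (for a real file, read them with `open(path, "rb")`). The sample is the goreplay example above written out as bytes. The helper names are hypothetical, and this only detects the problem; it does not change how ST saves the file.

```python
import re
from collections import Counter

# CRLF is matched first so it counts as one terminator, not a CR plus an LF.
TERMINATOR = re.compile(rb"\r\n|\r|\n")


def line_ending_counts(data: bytes) -> Counter:
    """Count how often each terminator (CRLF, lone CR, lone LF) occurs."""
    return Counter(m.group() for m in TERMINATOR.finditer(data))


def has_mixed_line_endings(data: bytes) -> bool:
    return len(line_ending_counts(data)) > 1


# The goreplay example from this comment: LF for its own lines, CRLF for HTTP.
SAMPLE = (
    b"1 d7123dasd913jfd21312dasdhas31 127345969\n"
    b"GET / HTTP/1.1\r\n"
    b"\r\n"
    b"\n"
    + "\U0001F435\U0001F648\U0001F649".encode("utf-8")
    + b"\n"
    b"POST /upload HTTP/1.1\r\n"
    b"Content-Length: 7\r\n"
    b"Host: www.w3.org\r\n"
    b"\r\n"
    b"a=1&b=2"
)

counts = line_ending_counts(SAMPLE)
print(counts[b"\r\n"], counts[b"\n"])  # 6 3
print(has_mixed_line_endings(SAMPLE))  # True
```

Working on bytes rather than decoded text avoids any newline translation by the reader, so the counts reflect exactly what is on disk.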
