Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTP 26 interprets CR ("\r") from standard_io as a line delimiter ("\n"). #8360

Closed
Moosieus opened this issue Apr 9, 2024 · 8 comments
Closed
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@Moosieus
Copy link

Moosieus commented Apr 9, 2024

Describe the bug:
OTP 26 treats singular CR ("\r") characters from standard_io as line delimiters, whereas OTP 25 and earlier versions do not.

  • The issue appears agnostic to using either unicode or latin1 (raw) encoding.
  • I'm able to replicate this behavior on WSL2 running Ubuntu and macOS.
  • The issue doesn't appear reading bytes one by one, such as calling Elixir's IO.binread(:standard_io, 1) (:file.read/2 within).

To Reproduce:
I've made a small repository to reproduce the bug here: https://github.com/Moosieus/crlf_test.

Expected behavior
In the test repo, given the input "Line #1\rLine #2\n" on stdin, reading a line in OTP 26 yields:

"Line #1\n"

whereas OTP 25 yields:

"Line #1\rLine #2\n"

I suppose the expected behavior here's open to discussion. Is the change in behavior intentional and/or desirable?

Affected versions
OTP 26.

@Moosieus Moosieus added the bug Issue is reported as a bug label Apr 9, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Apr 9, 2024
@garazdawi
Copy link
Contributor

Is the change in behavior intentional and/or desirable?

It was not intentional, though I'm still on the fence about whether it is desirable or not.

The problem is that we do not know if the what is writing to stdin is a pipe (as in your example) or just a very dumb terminal. If it is a pipe, then the old behaviour is correct, but if it is a very dumb terminal then the new behaviour is better.

Given that today it is very uncommon to have very dumb terminals I'm leaning towards calling this a bug and that we should not translate a single \r to \n.

@garazdawi garazdawi self-assigned this Apr 9, 2024
@Moosieus
Copy link
Author

Moosieus commented Apr 9, 2024

A bit of additional context, I came about this behavior working on the Lexical language server. There's a weird bug where the language server misinterprets LSP messages, which are delimited with \r\n.

Nominally, IO.binread(:standard_io, :line) correctly interprets the \r\n delimiters properly.

However when VSCode's under an exceptional amount of load, such as replacing references across 100+ files, it seems to "miss" one of the delimiters, leading a bad offset in reading the protocol. My working theory is that:

  • The operations has VSCode's sending tons of messages.
  • VSCode chugs somewhat writing to stdin.

The consequence is that the language client more slowly writes some_content_here\r|\n to stdin, but the line-read call occurs at the pipe's position, leaving an unexpected extra \n newline. Once the language server loses its offset, the server and client need to be restarted to work again.

All that considered, I'm inclined to agree with your take.

@RaimoNiskanen
Copy link
Contributor

For what it's worth. erl_scan treats a CR not followed by LF pretty much like SPC. The line number isn't incremented.

@garazdawi
Copy link
Contributor

The line that causes this bug is this: https://github.com/erlang/otp/blob/master/lib/kernel/src/group.erl#L1003

The comment above references "Advanced Programming in the Unix Environment, 2nd ed, Stevens, page 638", if anyone has that maybe we can find an answer as to why it is done this way :)

A possibly simple fix for the problem of data being sent as <<"some_content_here\r>>,<<"\n">> could be to ask for more data if a "\r" is seen alone at the end of a chunk.

@essen
Copy link
Contributor

essen commented Apr 10, 2024

That chapter describes special characters, which of them can have their meaning changed, and has this bit about CR:

CR
The carriage return character. We cannot change this character. This character is recognized on input
in canonical mode. When both ICANON (canonical mode) and ICRNL (map CR to NL) are set and IGNCR
(ignore CR) is not set, the CR character is translated to NL and has the same effect as a NL
character. This character is returned to the reading process (perhaps after being translated to a NL).

It might be more of a reference to the table listing and describing special characters in this same chapter, than to the behavior of CR. I don't know. This stuff goes over my head.

@frej
Copy link
Contributor

frej commented Apr 10, 2024

[@essen beat me to it]
I have access to the third edition, the chapter about terminal I/O starts with "The handling of terminal I/O is a messy area, regardless of the operating system" :), in that edition I think the table referred to is Figure 18.9.

My understanding of edit_line/3 is that it is used to provide Erlang application canonical mode input (i.e. line-at-a-time) but with the underlying terminal in non-canonical mode (character-at-a-time). If that's not the case, ignore the following.

The comment simply states that it ignores ^u, ^t, ^d, ^r, ^w which aren't very useful and/or are already defined by edlin. From https://github.com/erlang/otp/blob/master/lib/kernel/src/group.erl#L1001 it seems like we're emulating a canonical mode where IGNCR is set (we simply ignore \r), but on line https://github.com/erlang/otp/blob/master/lib/kernel/src/group.erl#L1003 we do something else. If we get a single \r we behave as is IGNCR is cleared (do not ignore \r) and translate it to a \n which implies that ICRNL is set. Now Erlang doesn't claim to emulate POSIX termios, but something is inconsistent.

@garazdawi
Copy link
Contributor

I did an attempt att fixing this in #8396, would be great if you could test it and see if it solves your problem.

Still need to write some tests before merging.

@garazdawi
Copy link
Contributor

Fix merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

6 participants