Is it a platform dependent "newline" or a traditional newline? #1092

Closed
briandfoy opened this issue Dec 31, 2016 · 15 comments
Labels
docs Documentation issue (primary issue type) update part of "docs" - indicates this is an update for an existing section; rewrite, clarification, etc.

Comments

@briandfoy
Contributor

briandfoy commented Dec 31, 2016

Does the term "newline" mean the same thing everywhere in the Perl 6 docs, and if it's to have a specialized meaning, are the docs consistent? There's a lot of historical baggage that comes with the term "newline". This is a bit muddled and I have to run. I'll try to refine it later.

I noticed in the docs for the IO role version of [say], it states:

Print the given text in human readable form, followed by a $*OUT.nl-out (platform dependent newline)

But the other places that document say are perhaps a bit looser, saying just "newline" when I think they mean "line ending" ("line separator", whatever). When I see "newline", I expect it to be the particular character you get with "\n", even though in many places that's actually LINE FEED. So I'm wrong to think that, but I think most people are wrong in the same way.

As a different niggling issue, it's not really followed by $*OUT.nl-out, but whatever .print-nl decides to return. That's the difference between playing with black magic and using the intended interface. I'd certainly push mere mortals toward the interface. .print-nl docs in IO::Handle say:

Writes a newline to the filehandle. The newline marker, which is stored
in the attribute C<$.nl-out>, defaults to C<\n> unless another marker has
been specified in the call to L<open>.

The documentation for open doesn't say anything about $*OUT. So, there's that encapsulation issue to deal with.

Then there's this version of say from Mu (which merely dispatches to another say, so kinda another definition but not really):

Prints value to $*OUT after stringification using .gist method with newline at end

And so on. I think a lot of the problem is that the same thing is documented in several places when it all comes from a common source (even with method adapters). Is there anywhere other than IO::Handle that implements the actual workings of say? And is there some other way to document roles, etc. to reveal their one true source?

But back to "newline":

There's this line in doc/Language/regexes.pod6, which seems to be the definition closest to what I expect. Although it doesn't say what the single character is, it's something. It also notes that translations take place somewhere else:

C<\n> matches a single, logical newline character. C<\n> is supposed to also
match a Windows CR LF codepoint pair; though it is unclear whether the magic
happens at the time that external data is read, or at regex match time.

The problem I foresee is that people using the various line-ending-appending routines will think they are just getting a LINE FEED when on some other system they get something else. It's sweet that Perl 6 tries to give you the local line ending, but for many of the output files I produce (or consume), I'd like to know what I'm going to get, even if that is the "platform dependent newline".

I don't think it's bad to go one way or the other as long as the docs are abundantly clear on what that is. I think that requires precise language and a decision on which one thing you want the user to mess with to discover the line ending.

@briandfoy briandfoy added the docs Documentation issue (primary issue type) label Dec 31, 2016
@samcv
Collaborator

samcv commented Dec 31, 2016

Not sure what you mean by "one true source", but it is in IO::Handle where the newline is supplied. Do you mean the source of nl-out?

@b2gills
Contributor

b2gills commented Dec 31, 2016

It's neither a platform dependent nor a traditional newline.

It is whatever is in the $!nl-out attribute of the output handle you are using ( $*OUT unless you are using the methods on an output handle )

The default value of nl-out is "\n" aka traditional.

open '-', :w returns $*OUT
open IO::Special.new('<STDOUT>') returns a new instance of what is originally in $*OUT

say $*OUT =:= open '-', :w; # True
my $crlf-out = open IO::Special.new('<STDOUT>'), :nl-out("\r\n");
$*OUT.say: 1; # "1\n"
$crlf-out.say: 1; # "1\r\n"

The docs for open should state that it can accept anything that IO::Handle.open accepts.
( I'm not sure why you have to call .new with the path, and then .open with everything else )

@briandfoy
Contributor Author

"one true source" is the code that doesn't dispatch to some deeper level to do the work.

@b2gills
Contributor

b2gills commented Jan 1, 2017

.nl-out of a handle is only set within IO::Handle.open or by assigning to .nl-out
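
A minimal sketch of both routes mentioned above, assuming a scratch file under `$*TMPDIR` (the file name is hypothetical). It only illustrates where `nl-out` can be set; the raw bytes it prints may differ by platform, so no output is claimed here.

```raku
my $path = $*TMPDIR.add("nl-out-demo.txt");   # hypothetical scratch file

# 1. Set the line ending when the handle is opened.
my $fh = open $path, :w, :nl-out("\r\n");
$fh.say: "opened with CRLF";

# 2. Change it later by assigning to the attribute.
$fh.nl-out = "\n";
$fh.say: "switched to LF";
$fh.close;

# Inspect the raw bytes that actually reached the file.
say $path.slurp(:bin);
$path.unlink;
```

Reading the file back with `:bin` avoids any decoding step, so what is printed is what was written.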

@briandfoy
Contributor Author

@b2gills I think you're discussing something other than what I'm talking about. I'm mostly concerned with how Perl 6 uses the term "newline" in the docs and whether it should be something else. I'm not at all concerned with how a handle chooses what to put at the end of a line.

say (and friends) do whatever they do, but their documentation ends up at different levels and each version is slightly different. That's what I'm aiming to clarify. I figure there's only one place where say actually does the work, and it's that behavior that should form the documentation.

@coke coke added the update part of "docs" - indicates this is an update for an existing section; rewrite, clarification, etc. label Aug 26, 2017
@JJ
Contributor

JJ commented May 4, 2018

As far as I have understood it, nl-out would match Windows \r\n

@JJ JJ modified the milestone: May SQUASHathon May 4, 2018
JJ added a commit that referenced this issue Jun 6, 2018
Which has moved since #1092 was written, which is a good argument to
try and fix issues as soon as possible.
@JJ JJ closed this as completed in 487c679 Jun 6, 2018
@jnthn
Contributor

jnthn commented Jun 6, 2018

The commit that closed the ticket doesn't seem to say much about newline handling? Anyway, some notes that can go somewhere (or be checked against the information we have):

  • \n in a string literal means Unicode codepoint 10
  • The default nl-out that is appended to a string by say is also "\n"
  • On output, when on Windows, the encoder will by default transform a \n into a \r\n when it's going to a file, process, or terminal (it won't do this on a socket, however)
  • On input, on any platform, the decoder will by default normalize \r\n into \n for input from a file, process, or terminal (again, not socket)
  • These above two points together mean that you can - socket programming aside - expect to never see a \r\n inside of your program (this is how things work in numerous other languages too)
  • The :$translate-nl named parameter exists in various places to control this transformation (we may need it in more places than it exists so far, though these days that's easy to do, since the decoder and encoder support it, so it's just plumbing)
  • A \n in the regex language is logical, and will match a \r\n

So, outside of the regex language, \n is quite straightforward in its meaning, and transformation is done at the I/O boundary.
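
The string-literal and regex points above can be checked without any I/O. This is a small sketch rather than a statement of the specification; it relies only on Raku treating CR LF as a single grapheme.

```raku
# "\n" in a string literal is one codepoint, U+000A (LINE FEED).
my $lf = "\n";
say $lf.ord;                      # 10

# CR LF is held as one grapheme made of two codepoints.
my $crlf = "\r\n";
say $crlf.chars;                  # 1
say $crlf.ords;                   # (13 10)

# In a regex, \n is a logical newline, so it matches both forms.
say so "a\nb"   ~~ / a \n b /;    # True
say so "a\r\nb" ~~ / a \n b /;    # True
```

The I/O-boundary translation in the middle bullets is platform dependent, so it is left out of this sketch.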

@jnthn jnthn reopened this Jun 6, 2018
@JJ
Contributor

JJ commented Jun 6, 2018 via email

@briandfoy
Contributor Author

Any answer should include how we can turn off that encoder behavior. Just because I'm using Windows to create a file doesn't mean I want \r\n. It's certainly not a requirement of many of the tools I use there.
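
One possible way to sidestep the encoder, offered as a sketch and not as the documented answer: encode the string yourself and write the resulting bytes. It assumes that `Str.encode` leaves `\n` untouched unless `:translate-nl` is passed, and that writing a Blob bypasses the handle's encoder; the file name is hypothetical.

```raku
my $path = $*TMPDIR.add("lf-only-demo.txt");   # hypothetical scratch file

# Encode explicitly; no :translate-nl is passed, so "\n" should stay a lone LF.
my $bytes = "first\nsecond\n".encode('utf8');

# Writing bytes instead of a string skips the string-to-bytes encoding step.
spurt $path, $bytes;

# Count CR bytes (13) in what was written; none are expected if the assumptions hold.
say $path.slurp(:bin).list.grep(13).elems;
$path.unlink;
```

This trades convenience for predictability: the caller owns the encoding, so the line ending is whatever was put in the string.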

@JJ
Contributor

JJ commented Jun 6, 2018 via email

@briandfoy
Contributor Author

As to what I want, it's always to remove ambiguity and surprises that will turn away newcomers.

When something needs this much documentation and thought, I tend to think it was the wrong choice for behavior.

@jnthn
Contributor

jnthn commented Jun 6, 2018

When something needs this much documentation and thought, I tend to think it was the wrong choice for behavior.

"This much documentation"? I listed the entire set of rules around newline handling.

The first two points specify entirely unsurprising things, and the first one I only wrote because there were suggestions it might be something different earlier in the ticket.

The second two specify rules that mean we do the right thing for most cases, so there's nothing for the programmer to think about most of the time. Those who do think about it will find they're looking at behavior like that found in many other languages anyway, so there are no distinctive surprises. (Actually, we go a bit further to remove surprises than some languages' Windows I/O, where if you happen to write out an explicit \r\n then it'll get turned into \r\r\n!)

The penultimate point specifies an override mechanism. The nature of it is a departure from the stdio-style "binary mode" approach, though I'd argue far more explicit in what it's doing, and picks terminology that doesn't introduce confusion with binary (Buf) vs string (Str) I/O, as specified by :bin.

The logical nature of \n in regexes, which is defined in terms of the Unicode definition of logical newline, is a departure from the traditional meaning of that in regexes, but Perl 6 reforms much regex syntax and semantics anyway.

@rafaelschipiura
Contributor

@briandfoy The world is complicated, and the language just works in this case. What do you propose? That the burden be put on the user so that complicated things are kept out of the language?

@JJ
Contributor

JJ commented Jun 6, 2018 via email

@rafaelschipiura
Contributor

@JJ Go ahead and create a new issue.

JJ added a commit that referenced this issue Jun 7, 2018
@JJ JJ closed this as completed in 3ad88e5 Jun 7, 2018

7 participants