New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Issue 12556 - Add persistent stdio.File.byLine #2077
Conversation
{ | ||
ByLineImpl!(Char, Terminator) impl; | ||
bool gotFront; | ||
immutable(Char)[] line; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the design would be more efficient if instead, it simply returned a Char[]
: No point in forcing the user to allocate "yet again", if he needed a mutable (but unique) buffer to begin with.
To keep things "safe by default", you could have byLineCopy
take Char = immutable(char)
as a parameter by default. This should give us a good mix of safe, but customizeable:
File myFile = ...;
string[] lines = myFile.byLineCopy().array(); //safe by copy
char[][] mutableLines = myFileCopy!char().array(); //efficient, without much overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need an extra overload to byLineCopy
to work, but is completely manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point in forcing the user to allocate "yet again", if he needed a mutable (but unique) buffer
I thought perhaps they would use plain byLine
for that and [i]dup
when ready, but you're right that byLineCopy
would be more useful if it supported mutable copies.
you could have byLineCopy take Char = immutable(char) as a parameter by default
Another option would be to take Front = string
, which might be nicer to allow passing [dw]string
instead of (immutable [dw]char)
as a template argument for multi-byte characters. Mutable copies would pass (char[])
. Not a big difference, but makes the immutable case's syntax a bit nicer.
This would need an extra overload to byLineCopy to work
Do you mean because byLine
's first template argument is Terminator
, not Char
(and byLineCopy
is consistent with that order), or am I misunderstanding you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to take Front = string , which might be nicer to allow passing [dw]string instead of (immutable [dw]char) as a template argument for multi-byte characters. Mutable copies would pass (char[]) . Not a big difference, but makes the immutable case's syntax a bit nicer.
If at all possible, the idea would be to remain consistent with vanilla byLine
. Having one that take a Char
type, while the other a String
type, would be confusing :/
Do you mean because byLine 's first template argument is Terminator , not Char (and byLineCopy is consistent with that order)
Yes, that's exactly what I'm saying. This is something I've thought of introducing in byLine
too anyways, to be able to myFile.byLine!dchar()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll make the Char = immutable char
change.
This is something I've thought of introducing in byLine too anyways, to be able to myFile.byLine!dchar().
I understand now. That would work, although I have wondered about solving this by adding an overload that requires terminator
to be passed explicitly as the first argument:
byLine(Char, Terminator)(Terminator terminator, KeepTerminator kt = KeepTerminator.no)
This would be more flexible as "\r\n"
is usually needed on Windows, and this would be clearer to people who may assume file.byLine()
is portable with line endings.
This overload could replace the other two, which could potentially be deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, my proposed overload switches the existing order of kt
and terminator
as well as Terminator
and Char
. AIUI, your overload would be:
byLine(Char = char)()
So if I wanted Windows newlines with multi-byte chars I would still have to write:
byLine!(dstring, dchar)(KeepTerminator.no, "\r\n"d);
vs:
byLine!dchar("\r\n"d);
edit: dchar -> dstring
Updated with |
Well, I think the design is correct. Any other improvements to simplify the declaration of the I think this function is both more efficient that what we can do with UFCS + map/dup, and more convenient to use. It is newbie friendly, and overall, good to have. So this gets my stamp of approval for inclusion. I would like some feedback from others, before merging anything though. @andralex ? Is Nitpick: If this is approved, I will request you give |
@monarchdodra Thanks for the review. Rebased. |
@property front() | ||
{ | ||
if (!gotFront) | ||
line = impl.front.dup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should set gotFront
to true somewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, fixed.
@@ -1720,6 +1721,129 @@ the contents may well have changed). | |||
return ByLine!(Char, Terminator)(this, keepTerminator, terminator); | |||
} | |||
|
|||
private struct ByLineCopy(Char, Terminator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this struct really necessary? Couldn't it basically just have been an alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean have byLineCopy
return a RefCounted!(ByLineCopyImpl, ...)
instance? I suppose it might be OK, but when I added the ref-counting for ByLine
@monarchdodra preferred that I didn't use alias this
, and instead added individual forwarding for each method - this avoids exposing any implementation details (such as internal types). Having ByLineCopy
as a separate struct
follows the same principle. I don't really mind either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this struct really necessary? Couldn't it basically just have been an alias?
Technically, no, it's not needed. But arguably, it leaves a hole in the type system. Also, the "end type" would be RefCounted!ByLineCopyImpl
, which could confuse users, who might think the return type was actually a ByLineCopyImpl
, and something else ref counted it. By creating a wrapper ByLineCopy
, we can keep things tidy in terms of type and interface.
ping |
auto byLineCopy(Terminator = char, Char = immutable char) | ||
(KeepTerminator keepTerminator = KeepTerminator.no, | ||
Terminator terminator = '\n') | ||
if (isScalarType!Terminator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this indentation style comes from? It looks absolutely unreadable and does not fit rest of the Phobos I have been reading so far. I'd expect something like this:
auto byLineCopy(Terminator = char, Char = immutable char)
(KeepTerminator keepTerminator = KeepTerminator.no,
Terminator terminator = '\n')
if (isScalarType!Terminator)
{
}
(actually Phobos hard line length limit is 120 as per http://dlang.org/dstyle.html so line breaks here can be reduced/removeD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied the style from byLine. It's quite common in algorithm.d to have the opening bracket on column 0. Anyway, I indented the runtime args as suggested, but kept the constraint unindented - this is more in keeping with stdio.d IMO. Less line breaks would exceed the soft limit of 80, so I prefer it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks. Looks like quite lot of stdio
code has style notably different from i.e. std.algorithm
:(
Other than code style - LGTM |
Auto-merge toggled on |
Fix Issue 12556 - Add persistent stdio.File.byLine
This syntax to produce mutable strings is not intuitive, I don't understand it:
And it seems KeepTerminator.no still doesn't work on Windows. |
The syntax was chosen to be consistent with
The unit test passes, can you be more specific? |
@ntrel: I think a syntax byLineCopy!(char, char) is confusing. And the docs don't explain well the situation. (Regarding the newlines, I was wrong, sorry, it's my "special" console on Windows that has tricked me again. On the regular Windows console the bug isn't present). |
I have changed my mind again. If I save this program on Windows with normal Windows newlines (that are 2 chars long each) with the "test.d" name:
It outputs 392 404, this shows that KeepTerminator.no is not stripping away the Windows newlines correctly (it means it's not stripping away both \n and \r every time, that chomp instead is correctly removing). |
I want to improve that syntax (as discussed here), but it's difficult to get agreement. Incidentally, putting the terminator argument first and required would solve your problem below:
You need to pass "\r\n" as the separator, otherwise it uses the default of '\n'. Personally, I don't think we should have a default if it's not portable. |
At least this needs to be written clearly in the ddocs.
Can't byLine/byLineCopy strip the newline with a bit smarter algorithm like chomp is doing? |
In this day and age we got to accept that line ending is a sequence: |
What is the problem in having platform-specific defaults? |
No problem, but it would be a breaking change. The same for making it automatically handle all newline variants. |
I think it's a welcome breaking change. |
But people may have corrected for the trailing CR with byLine() on Windows by calling |
Yes I agree, we can't do breakage that does not catch all problematic cases automatically. |
This is a bit mind-bending. |
It is theoretically possible. It will result in silent will breakage => not an option even if completely unreasonable unless some sort of verification/porting tool is provided. |
@bearophile I mostly agree but I think we'd better get things right in |
Created #2364. |
What's needed in the byLine/byLineCopy documentation is a sign: "Warning: use "\r\n" on Windows otherwise your code will not work, because of a design mistake of those functions.". But even this is not a good enough solution. I think that the breaking change (of stripping away newlines automatically in a bit smarter way using chomp) on the whole will cause less bugs than leaving the current wrong code/design untouched + comment. Sometimes breaking changes are the lesser evil. |
https://issues.dlang.org/show_bug.cgi?id=12556
Add
stdio.File.byLineCopy
, which usesByLineImpl
internally.Char
type.ByLineCopyImpl.front
is cached to avoid unnecessary allocations.byLine
for consistency.