File.byLine terminator string #9895

Closed
dlangBugzillaToGithub opened this issue Dec 26, 2010 · 9 comments

Comments

@dlangBugzillaToGithub

bearophile_hugs reported this on 2010-12-26T07:11:51Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=5378

Description

This is the signature of File.byLine:

ByLine!(Char,Terminator) byLine(Terminator = char, Char = char)
(KeepTerminator keepTerminator = KeepTerminator.no, Terminator terminator = '\x0a'); 

But on Windows the line terminators are 2 chars long (CR+LF), see:
http://en.wikipedia.org/wiki/Newline#Representations

So I think the second argument of byLine() needs to be a string.

This is code I expected to use, that currently is not accepted:

import std.stdio;
void main() {
    auto lines = File("test.txt").byLine(File.KeepTerminator.no, "\r\n");
}

----------------

A related, smaller enhancement request: on Windows I usually open files with Windows-style line terminators, and on Linux files with Unix-style ones. If possible, a better default for the second argument of byLine() would be a string constant that changes with the operating system.
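
For illustration, Phobos already exposes a platform-dependent terminator string as std.ascii.newline ("\r\n" on Windows, "\n" on POSIX). The sketch below passes it explicitly; it assumes a Phobos version in which byLine accepts a string terminator (the change discussed later in this thread), and readNativeLines is a hypothetical helper name, not part of the library.

```d
import std.ascii : newline;
import std.stdio : File;
import std.typecons : No;

// Hypothetical helper: reads all lines using the platform's native terminator.
string[] readNativeLines(File f)
{
    // Copy into a plain string so the terminator type is simply `string`.
    string terminator = newline;
    string[] lines;
    // byLine reuses its buffer between iterations, so each line is copied with idup.
    foreach (line; f.byLine(No.keepTerminator, terminator))
        lines ~= line.idup;
    return lines;
}
```

This only helps when the file really uses the native convention of the machine reading it; a Windows-style file read on Linux would still keep its trailing \r.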

----------------

A workaround is to open the file in text mode, but I don't know whether this works well when opening a Windows-style file on Linux:


import std.stdio;
void main() {
    auto lines = File("test.txt", "r").byLine();
}
@dlangBugzillaToGithub
Author

andrei (@andralex) commented on 2013-01-08T01:11:43Z

This is by design. The length name is special and defined to return size_t compulsively. You may want to choose a different name instead.

@dlangBugzillaToGithub
Author

andrei (@andralex) commented on 2013-01-08T01:12:15Z

Oops, wrong window.

@dlangBugzillaToGithub
Author

dlang-bugzilla (@CyberShadow) commented on 2013-03-16T01:40:03Z

Would it be acceptable if we special-cased byLine to strip a trailing \r if the terminator is \n?

Often, the programmer doesn't know beforehand if the line terminator of a text file will be \r\n or \n. A behavior close to that of splitLines would be more useful than forcing the programmer to choose an exact terminator sequence.
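
The suggested behavior can be approximated in user code today. The following is a minimal sketch, not the change proposed for Phobos itself: it reads with the default '\n' terminator and drops one trailing \r per line. Both stripTrailingCR and readLinesTolerant are hypothetical names introduced for this example.

```d
import std.stdio : File;

// Hypothetical helper: removes a single trailing '\r', if present.
inout(char)[] stripTrailingCR(inout(char)[] line)
{
    if (line.length > 0 && line[$ - 1] == '\r')
        return line[0 .. $ - 1];
    return line;
}

// Hypothetical helper: accepts both "\n" and "\r\n" line endings.
string[] readLinesTolerant(File f)
{
    string[] lines;
    // Default byLine: terminator '\n', terminator not kept.
    foreach (line; f.byLine())
        lines ~= stripTrailingCR(line).idup; // copy, since byLine reuses its buffer
    return lines;
}
```

Note that this handles \n and \r\n only; a file using lone \r terminators would still come back as a single line.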

@dlangBugzillaToGithub
Author

bearophile_hugs commented on 2013-03-17T17:44:06Z

(In reply to comment #3)
> Would it be acceptable if we special-cased byLine to strip a trailing \r if the
> terminator is \n?

Probably the problem presented in this issue has various solutions.

@dlangBugzillaToGithub
Author

andrej.mitrovich (@AndrejMitrovic) commented on 2013-03-18T12:11:08Z

*** Issue 9750 has been marked as a duplicate of this issue. ***

@dlangBugzillaToGithub
Author

nick (@ntrel) commented on 2013-08-08T07:18:09Z

https://github.com/D-Programming-Language/phobos/pull/1458

@dlangBugzillaToGithub
Author

github-bugzilla commented on 2013-08-11T19:19:36Z

Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/7f76586e16623894c7b6119014f76ed7bfef527e
Fix Issue 5378 - Make File.byLine accept a string terminator

Add an overload of byLine without a default argument for terminator.
Before, byLine!string tried to instantiate "string terminator = '\n'",
which is invalid.

Note: This removes the default arguments from ByLine.this, but the
constructor was never documented anyway (unlike the range primitives).

https://github.com/D-Programming-Language/phobos/commit/5576d899af510de798b0d5aaa8bd13e6caebfbce
Merge pull request #1458 from ntrel/byLine-crlf

Fix Issue 5378 - Make File.byLine accept a string terminator
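
As a rough sketch of what the commit message describes, a string terminator can be passed explicitly once this overload is available. The helper name readCrlfLines is hypothetical, and the flag is spelled No.keepTerminator from std.typecons here, which is an assumption about current Phobos rather than the spelling used in the original report.

```d
import std.stdio : File;
import std.typecons : No;

// Hypothetical helper: collects the lines of a file terminated by "\r\n".
string[] readCrlfLines(File f)
{
    string[] lines;
    // The terminator must be given explicitly; the overload has no default for it.
    // byLine reuses its buffer between iterations, so each line is copied with idup.
    foreach (line; f.byLine(No.keepTerminator, "\r\n"))
        lines ~= line.idup;
    return lines;
}
```

A final line without a terminator is still returned as-is.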

@dlangBugzillaToGithub
Author

rumbu commented on 2021-02-11T08:37:04Z

Default behaviour is unexpected, at least on Windows where most files contain lines ended in \r\n.

Use of the lineSeparator enum for cross-platform development does not guarantee that the file you are processing contains only the lineSeparator terminator.

If I ask for byLine, I expect to obtain a line and nothing else, not a line ending in another line terminator. What if my file contains some lines ending in \r and others ending in \r\n?

Default behaviour must strip terminators and must consume all known line separators; there is no point in discriminating between them:
- 0x0d
- 0x0d 0x0a
- 0x0a
- Unicode categories Zl, Zp.
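
For text that is already in memory, std.string.lineSplitter comes close to the behavior requested above: it splits on \r, \n, \r\n and the Unicode line and paragraph separators (U+2028, U+2029), among others. The sketch below is a workaround in user code, not a change to byLine's default; allLines is a hypothetical helper name.

```d
import std.array : array;
import std.string : lineSplitter;

// Hypothetical helper: splits text on every line terminator lineSplitter knows.
string[] allLines(string text)
{
    // lineSplitter yields slices of the input with terminators removed.
    return text.lineSplitter.array;
}
```

The trade-off is that the whole file has to be loaded first (for example with std.file.readText), whereas byLine streams from the file one line at a time.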

@thewilsonator thewilsonator removed P3 OS:Windows Issues Specific to Windows Arch:x86 Issues specific to x86 labels Dec 5, 2024
@thewilsonator
Contributor

This was fixed by #1458
