What do we mean by "paragraph mode"? #16787
From @jkeenan
Summary: This ticket replaces RT 133703 and proposes improved

I. Background:
In https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133703, I submitted a
#####
While poking around in the code and the test suite, I became convinced:
(a) that I didn't understand "paragraph mode" very well;
(b) that "paragraph mode" wasn't well documented;
(c) that "paragraph mode" wasn't thoroughly tested in the core
(d) that as a consequence of (b) and (c), it might contain bugs.
I spent several days working on this. I no longer think "paragraph

II. Paragraph Mode as Currently Found in the Core Distribution
From this point forward I'll treat "paragraph mode" and "setting $/ to
In the core distribution $/ (or $INPUT_RECORD_SEPARATOR) is defined in
#####
... for a thorough discussion of paragraph mode in the core
#####

III. Proposed Additional Documentation
I believe that the best place to put additional discussion of paragraph
#####
Note that perlfaq5 is maintained upstream on CPAN, so once P5P is

IV. Proposed Additional Testing
Please see the program attached:
#####
To facilitate discussion, I've written this test program in a modern
What is important for discussion now is: Do these tests thoroughly
Thank you very much.
From @jkeenan
0001-More-detailed-explanation-of-paragraph-mode.patch
From dc1d6b22a64e9ddbf003204beef033bf320cbe81 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Wed, 12 Dec 2018 16:52:00 -0500
Subject: [PATCH] More detailed explanation of "paragraph mode"
---
lib/perlfaq5.pod | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/perlfaq5.pod b/lib/perlfaq5.pod
index 7464b26..96470e7 100644
--- a/lib/perlfaq5.pod
+++ b/lib/perlfaq5.pod
@@ -1179,6 +1179,16 @@ C<"\n\n"> to accept empty paragraphs.
Note that a blank line must have no blanks in it. Thus
S<C<"fred\n \nstuff\n\n">> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
+When C<$/> is set to C<""> -- so-called I<paragraph mode> -- and the entire
+file is read in with that setting, any sequence of consecutive newlines
+C<"\n\n"> at the beginning of the file is discarded. With the exception of
+the final record in the file, each sequence of characters ending in two or
+more newlines is treated as one record and is read in to end in exactly two
+newlines. If the last record in the file ends in zero or one consecutive
+newlines, that record is read in with that number of newlines. If the last
+record ends in two or more consecutive newlines, it is read in with two
+newlines like all preceding records.
+
=head2 How can I read a single character from a file? From the keyboard?
X<getc> X<file, reading one character at a time>
--
2.17.1
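The perlfaq5 text touched by this patch distinguishes paragraph mode (`$/ = ""`) from a literal `"\n\n"` separator, and notes that a line containing blanks is not a blank line. The following is a minimal sketch of those differences, not part of the patch; `read_records` is a hypothetical helper introduced only for illustration, and the sample strings other than the two `fred` examples are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to a temporary file, read it back
# with $/ set to $separator, and return the list of records.
sub read_records {
    my ($content, $separator) = @_;

    my ($out, $filename) = tempfile(UNLINK => 1);
    binmode $out;
    print {$out} $content;
    close $out or die "Cannot close $filename: $!";

    local $/ = $separator;
    open my $in, '<', $filename or die "Cannot open $filename: $!";
    binmode $in;
    my @records = <$in>;
    close $in or die "Cannot close $filename: $!";
    return @records;
}

my $content = "\n\nalpha\n\n\n\nbeta\n";

# Paragraph mode: leading newlines are discarded and runs of newlines
# between paragraphs collapse to exactly two.
my @para = read_records($content, "");        # ("alpha\n\n", "beta\n")

# Literal separator: every "\n\n" ends a record, so empty paragraphs
# come back as records of their own.
my @literal = read_records($content, "\n\n"); # ("\n\n", "alpha\n\n", "\n\n", "beta\n")

# A line holding only a space is not a blank line in paragraph mode.
my @one = read_records("fred\n \nstuff\n\n", "");  # 1 record
my @two = read_records("fred\n\nstuff\n\n", "");   # 2 records

printf "paragraph mode: %d records; literal separator: %d records\n",
    scalar @para, scalar @literal;
printf "with blank-containing line: %d; with truly blank line: %d\n",
    scalar @one, scalar @two;
```

Temporary files are used rather than in-memory handles so that the sketch reads data the same way the attached test program does.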
From @jkeenan
use strict; # Test paragraph mode in two ways: my ($OUT, $filename, @chunks, @expected, $msg); { { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { @chunks = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { @chunks = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( { ($OUT, $filename) = open_tempfile(); @expected = ( done_testing(); sub open_tempfile { sub perform_tests { seek $IN, 0, 0;
From @tonycoz
On Wed, 12 Dec 2018 14:41:11 -0800, jkeenan@pobox.com wrote:
If the behaviour of <> with
One suggestion I'd make for the tests is to include a brief description of the test case in the is()/is_deeply() calls, since the default test failure output includes the name of the test but not the note() output. In your case you might pass a test name prefix to perform_tests() and include that as part of the name supplied to is()/is_deeply().
Tony
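The suggestion above could be sketched roughly as follows. This is an illustration under assumptions, not the code that was attached or merged: the `$prefix` argument and the test-name wording are hypothetical, and Test::More plus File::Temp stand in for whatever harness the attached program used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More;

# Sketch only: $prefix is a hypothetical extra argument. It is folded into
# every test name so that a failure report identifies the test case
# without relying on note() output.
sub perform_tests {
    my ($filename, $expected, $prefix) = @_;
    local $/ = '';    # paragraph mode

    open my $IN, '<', $filename or die "Cannot open $filename: $!";

    # List context: read every record at once.
    my @got = <$IN>;
    is_deeply(\@got, $expected, "$prefix: all records, list context");

    # Scalar context: rewind and read one record at a time.
    seek $IN, 0, 0 or die "Cannot rewind $filename: $!";
    for my $i (0 .. $#{$expected}) {
        my $record = <$IN>;
        is($record, $expected->[$i], "$prefix: record $i, scalar context");
    }
    close $IN or die "Cannot close $filename: $!";
}

# Usage: a file with two text blocks separated by three newlines.
my ($OUT, $filename) = tempfile(UNLINK => 1);
binmode $OUT;
print {$OUT} "one\n\n\ntwo\n";
close $OUT or die "Cannot close $filename: $!";

perform_tests($filename, [ "one\n\n", "two\n" ], 'three newlines between blocks');
done_testing();
```

With the prefix in place, a failing record shows up in the harness output under a name such as "three newlines between blocks: record 1, scalar context", which is enough to locate the offending block.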
The RT System itself - Status changed from 'new' to 'open'
From @jkeenan
On 12/12/18 11:11 PM, Tony Cook via RT wrote:
Please review the two new patches attached.
0001-More-specific-documentation-of-paragraph-mode.patch
Thank you very much.
From @jkeenan
0002-Thoroughly-test-paragraph-mode.patch
From 5a2ed1015aa3f39bf3a320962d519e57c22a8771 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Thu, 13 Dec 2018 18:29:29 -0500
Subject: [PATCH 2/2] Thoroughly test paragraph mode
For: RT # 133722
---
MANIFEST | 1 +
t/io/paragraph_mode.t | 504 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 505 insertions(+)
create mode 100644 t/io/paragraph_mode.t
diff --git a/MANIFEST b/MANIFEST
index 4276316980..ca5f78cdf3 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5404,6 +5404,7 @@ t/io/layers.t See if PerlIO layers work
t/io/nargv.t See if nested ARGV stuff works
t/io/open.t See if open works
t/io/openpid.t See if open works for subprocesses
+t/io/paragraph_mode.t See if paragraph mode works
t/io/perlio.t See if PerlIO works
t/io/perlio_fail.t See if bad layers fail
t/io/perlio_leaks.t See if PerlIO layers are leaking
diff --git a/t/io/paragraph_mode.t b/t/io/paragraph_mode.t
new file mode 100644
index 0000000000..edbb4cb196
--- /dev/null
+++ b/t/io/paragraph_mode.t
@@ -0,0 +1,504 @@
+#!./perl
+
+BEGIN {
+ chdir 't' if -d 't';
+ require './test.pl';
+ set_up_inc('../lib');
+}
+
+plan tests => 80;
+
+my ($OUT, $filename, @chunks, @expected, $msg);
+
+{
+ # We start with files whose "paragraphs" contain no internal newlines.
+ @chunks = (
+ join('' => ( 1..3 )),
+ join('' => ( 4..6 )),
+ join('' => ( 7..9 )),
+ 10
+ );
+
+ {
+ $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 3 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ );
+ print $OUT $chunks[3];
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ $chunks[3],
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 0 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 1 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 1,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 2 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 2,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+}
+
+{
+ # We continue with files whose "paragraphs" contain internal newlines.
+ @chunks = (
+ join('' => ( 1, 2, "\n", 3 )),
+ join('' => ( 4, 5, " \n", 6 )),
+ join('' => ( 7, 8, " \t\n", 9 )),
+ 10
+ );
+
+ {
+ $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 3 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ );
+ print $OUT $chunks[3];
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ $chunks[3],
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 0 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 1 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 1,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 2 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 2,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+}
+
+{
+ # We continue with files which start with newlines
+ # but whose "paragraphs" contain no internal newlines.
+ # We'll set our expectation that the leading newlines will get trimmed off
+ # and everything else will proceed normally.
+
+ @chunks = (
+ join('' => ( 1..3 )),
+ join('' => ( 4..6 )),
+ join('' => ( 7..9 )),
+ 10
+ );
+
+ {
+ $msg = "'Badly behaved' file: leading newlines; 3 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ );
+ print $OUT $chunks[3];
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ $chunks[3],
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Badly behaved' file: leading newlines; 0 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Badly behaved' file: leading newlines; 1 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 1,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Badly behaved' file: leading newlines; 2 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 2,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+}
+
+{
+ # We continue with files which start with newlines
+ # and whose "paragraphs" contain internal newlines.
+ # We'll set our expectation that the leading newlines will get trimmed off
+ # and everything else will proceed normally.
+
+ @chunks = (
+ join('' => ( 1, 2, "\n", 3 )),
+ join('' => ( 4, 5, " \n", 6 )),
+ join('' => ( 7, 8, " \t\n", 9 )),
+ 10
+ );
+
+ {
+ $msg = "'Very badly behaved' file: leading newlines; internal newlines; 3 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ );
+ print $OUT $chunks[3];
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ $chunks[3],
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Very badly behaved' file: leading newlines; internal newlines; 0 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Very badly behaved' file: leading newlines; internal newlines; 1 final newline";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 1,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+
+ {
+ $msg = "'Very badly behaved' file: leading newlines; internal newlines; 2 final newlines";
+
+ ($OUT, $filename) = open_tempfile();
+ print $OUT "\n\n\n";
+ print $OUT "$_\n" for (
+ $chunks[0],
+ ("") x 1,
+ $chunks[1],
+ ("") x 2,
+ $chunks[2],
+ ("") x 3,
+ $chunks[3],
+ ("") x 2,
+ );
+ close $OUT or die;
+
+ @expected = (
+ "$chunks[0]\n\n",
+ "$chunks[1]\n\n",
+ "$chunks[2]\n\n",
+ "$chunks[3]\n\n",
+ );
+ local $/ = '';
+ perform_tests($filename, \@expected, $msg);
+ }
+}
+
+########## SUBROUTINES ##########
+
+sub open_tempfile {
+ my $filename = tempfile();
+ open my $OUT, '>', $filename or die;
+ binmode $OUT;
+ return ($OUT, $filename);
+}
+
+sub perform_tests {
+ my ($filename, $expected, $msg) = @_;
+ open my $IN, '<', $filename or die;
+ my @got = <$IN>;
+ my $success = 1;
+ for (my $i=0; $i<=$#${expected}; $i++) {
+ if ($got[$i] ne $expected->[$i]) {
+ $success = 0;
+ last;
+ }
+ }
+ ok($success, $msg);
+
+ seek $IN, 0, 0;
+ for (my $i=0; $i<=$#${expected}; $i++) {
+ is(<$IN>, $expected->[$i], "Got expected record $i");
+ }
+ close $IN or die;
+}
--
2.17.1
From @jkeenan
0001-More-specific-documentation-of-paragraph-mode.patch
From efd60cd2d95b880edd52d8f2402154d6e9423665 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Thu, 13 Dec 2018 17:42:42 -0500
Subject: [PATCH 1/2] More specific documentation of paragraph mode.
For: RT # 133722
---
pod/perlvar.pod | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index 5faea28062..03b2215b66 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -1487,6 +1487,44 @@ the next paragraph, even if it's a newline.
Remember: the value of C<$/> is a string, not a regex. B<awk> has to
be better for something. :-)
+Setting C<$/> to an empty string -- the so-called I<paragraph mode> -- merits
+special attention. When C<$/> is set to C<""> and the entire file is read in
+with that setting, any sequence of consecutive newlines C<"\n\n"> at the
+beginning of the file is discarded. With the exception of the final record in
+the file, each sequence of characters ending in two or more newlines is
+treated as one record and is read in to end in exactly two newlines. If the
+last record in the file ends in zero or one consecutive newlines, that record
+is read in with that number of newlines. If the last record ends in two or
+more consecutive newlines, it is read in with two newlines like all preceding
+records.
+
+Suppose we wrote the following string to a file:
+
+ my $string = "\n\n\n";
+ $string .= "alpha beta\ngamma delta\n\n\n";
+ $string .= "epsilon zeta eta\n\n";
+ $string .= "theta\n";
+
+ my $file = 'simple_file.txt';
+ open my $OUT, '>', $file or die;
+ print $OUT $string;
+ close $OUT or die;
+
+Now we read that file in paragraph mode:
+
+ local $/ = ""; # paragraph mode
+ open my $IN, '<', $file or die;
+ my @records = <$IN>;
+ close $IN or die;
+
+C<@records> will consist of these 3 strings:
+
+ (
+ "alpha beta\ngamma delta\n\n",
+ "epsilon zeta eta\n\n",
+ "theta\n",
+ )
+
Setting C<$/> to a reference to an integer, scalar containing an
integer, or scalar that's convertible to an integer will attempt to
read records instead of lines, with the maximum record size being the
--
2.17.1
From @tonycoz
On Thu, 13 Dec 2018 15:44:48 -0800, jkeenan@pobox.com wrote:
That's fine.
Tony
From @jkeenan
On Wed, 19 Dec 2018 03:09:34 GMT, tonyc wrote:
Merged to blead in commits 440af01 and bf8c368. Will monitor for a few days.
From @jkeenan
On Wed, 19 Dec 2018 14:46:07 GMT, jkeenan wrote:
No failures observed; resolving ticket.
@jkeenan - Status changed from 'open' to 'pending release'
From @khwilliamson
Thank you for filing this report. You have helped make Perl better.
With the release today of Perl 5.30.0, this and 160 other issues have been
Perl 5.30.0 may be downloaded via:
If you find that the problem persists, feel free to reopen this ticket.
@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#133722 (status was 'resolved')
Searchable as RT133722$