This commit fixes here-docs in single-line re-evals in files (as opposed to evals) and here-docs in single-line quote-like operators inside re-evals. In both cases, the here-doc parser has to look into an outer lexing scope to find the here-doc body. And in both cases it was stomping on PL_linestr (the current line buffer) while PL_sublex_info.re_eval_start was pointing to an offset in that buffer. (re_eval_start is used to construct the string to include in the regexp’s stringification once the lexer reaches the end of the re-eval.) Fixing this entails moving re_eval_start and re_eval_str to PL_parser->lex_shared, making the pre-localised values visible. This is so that the code that peeks into an outer linestr buffer to steal the here-doc body can set up re_eval_str in the right scope. (re_eval_str is used to store the re-eval text when the here-doc parser has no choice but to modify linestr; see also commit db44426.) It also entails making the stream-based parser (i.e., that reads from an input stream) leave PL_linestr alone, instead of clobbering it and then reconstructing part of it afterwards.
Unfortunately, PL_parser->linestr and PL_parser->bufptr are both part of the API, so we can’t just move them to PL_parser->lex_shared. Instead, we have to copy them in sublex_push, to make them visible to inner lexing scopes. This allows the SvIVX(PL_linestr) and SvNVX(PL_linestr) hack to be removed. It should also speed things up slightly. We are already allocating PL_parser->lex_shared in sublex_push, so there should be no need to upgrade PL_linestr to SvNVX as well. I was pleasantly surprised to see how the here-doc code seemed to shrink all by itself when modified to account. PL_sublex_info.super_bufptr is also superseded by the addition of ->ls_bufptr to the LEXSHARED struct. Its old values when localised were not visible, being stashed away on the savestack, so it was harder to use.
PL_parser->herelines needs to be visible to inner lexing scopes, which also need to have their own copy of it, so that the here-doc parser can modify the right herelines variable corresponding to the PL_linestr from which it is stealing its body. (A subsequent commit will take care of that.)
The line numbers for operators after a here-doc marker on the same line were off by the length of the here-doc. This is because the here-doc parser would artificially increase the line number as it went, because it was stealing lines out of the input stream. Instead, we can record the number of lines in the here-doc, and add it to the line number the next time we need to increment it. This also fixes the line numbers after s//<<END/e to the end of the file, which were off because the line number adjusted by the <<END was localised to the s///. Since herelines is visible to inner lexing scopes, the outer lexing scope can see changes made by the inner one. The lack of localisation does cause problems with line numbers inside quote-like operators (but they were off by one already), which will be addressed in subsequent commits.
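The deferred line counting described above can be sketched as follows. This is a minimal illustration, not the code in toke.c: struct mini_lexer, note_heredoc() and next_line() are hypothetical names, and only the herelines idea is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature lexer state; not perl's real parser struct. */
struct mini_lexer {
    unsigned long line;       /* line number reported for the current token */
    unsigned long herelines;  /* here-doc lines consumed but not yet counted */
};

/* A here-doc body of body_lines lines (terminator included) has been
 * taken out of the input.  The current line number is left alone, so
 * tokens on the rest of the marker line still report the right line. */
static void note_heredoc(struct mini_lexer *lx, unsigned long body_lines)
{
    assert(lx != NULL);
    lx->herelines += body_lines;
}

/* A real newline has been reached: count it, plus any deferred
 * here-doc lines, then reset the deferred count. */
static void next_line(struct mini_lexer *lx)
{
    assert(lx != NULL);
    lx->line += 1 + lx->herelines;
    lx->herelines = 0;
}
```

With a marker on line 1 and a three-line body (two text lines plus the terminator), anything after the marker on line 1 still sees line 1, and the next statement is reported on line 5.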
For re-evals, this is something that broke recently, post-5.16 (the jumbo fix). For other interpolating constructs, this has never worked, as far as I can tell. The lexer was losing track of PL_lex_state (aka PL_parser->lex_state) when parsing formats. Usually, the state alternates between LEX_FORMLINE (a picture line) and LEX_NORMAL (an argument line), but the LEX_NORMAL should actually be whatever the state was before the format started. This commit adds a new parser member to track the ‘normal’ state when parsing a format. It also tweaks S_scan_formline to handle multi-line buffers outside of string eval (such as happens in interpolating constructs). That bufend assignment that is removed as a result is not necessary as of a0d0e21 (perl 5.000). That very commit added a bufend assignment after the sv_gets (later filter_gets; later lex_next_chunk) further down in the loop in scan_formline.
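The ‘normal’ state tracking for formats mentioned above might look roughly like this. It is a sketch under assumptions: the enum values, struct demo_parser and its form_state member are made-up stand-ins, not the names used in the parser struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lexer states named in the message. */
enum demo_lex_state {
    DEMO_LEX_NORMAL,
    DEMO_LEX_INTERPNORMAL,   /* e.g. inside an interpolating construct */
    DEMO_LEX_FORMLINE
};

struct demo_parser {
    enum demo_lex_state lex_state;   /* current state */
    enum demo_lex_state form_state;  /* state in effect before the format */
};

/* Entering a format: remember the outer state instead of assuming
 * that it was DEMO_LEX_NORMAL. */
static void format_begin(struct demo_parser *p)
{
    assert(p != NULL);
    p->form_state = p->lex_state;
    p->lex_state = DEMO_LEX_FORMLINE;
}

/* Alternate between a picture line and an argument line; the argument
 * line is lexed in the remembered state. */
static void format_toggle(struct demo_parser *p)
{
    assert(p != NULL);
    p->lex_state = (p->lex_state == DEMO_LEX_FORMLINE)
        ? p->form_state
        : DEMO_LEX_FORMLINE;
}

/* Leaving the format: go back to the remembered state. */
static void format_end(struct demo_parser *p)
{
    assert(p != NULL);
    p->lex_state = p->form_state;
}
```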
Previously it would leave the file handle open if it was (equal to) stdin, on the assumption that this must have been because no script name was supplied on the interpreter command line, so the interpreter was defaulting to reading the script from standard input. However, if the program has closed STDIN, then the next file handle opened (for any reason) will have file descriptor 0. So in this situation, the handle that require opened to read the module would be mistaken for the above situation and left open. Effectively, this leaked a file handle. This is now fixed, by explicitly tracking from parser creation time whether it should keep the file handle open, and only setting this flag when defaulting to reading the main program from standard input. This resolves RT #37033.
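The descriptor reuse behind this leak can be demonstrated with plain POSIX calls. The sketch below is an illustration only and assumes a POSIX system with /dev/null; should_close_old() and should_close_new() are hypothetical helpers contrasting the two policies, not functions in the perl source.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close fd 0, open a file, and report which descriptor open() handed
 * out.  POSIX requires the lowest unused descriptor, so this is 0.
 * The original stdin (if any) is restored before returning. */
static int fd_after_closing_stdin(void)
{
    int saved = dup(STDIN_FILENO);   /* -1 if stdin is already closed */
    int fd;

    if (saved >= 0)
        close(STDIN_FILENO);
    fd = open("/dev/null", O_RDONLY);
    if (fd >= 0)
        close(fd);
    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);   /* put stdin back */
        close(saved);
    }
    return fd;
}

/* Old policy: guess from the descriptor number.  A handle that merely
 * happens to be fd 0 is never closed. */
static int should_close_old(int fd)
{
    return fd != STDIN_FILENO;
}

/* New policy: an explicit flag, set at parser creation only when the
 * main program is being read from standard input. */
static int should_close_new(int keep_open_flag)
{
    return !keep_open_flag;
}
```

A module opened by require after STDIN was closed lands on descriptor 0, so the old test declines to close it, while the flag-based test closes it because the flag was never set for that parser.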
lex_flags holds 4 flag bits, with multiple flag bits manipulated together at times, so they can't be split out into individual bitfields. This change permits the C compiler to generate simpler code, reducing toke.o by about 400 bytes on this platform, but doesn't change the size of the structure. lex_flags was added in commit 802a15e in August 2011, so is not in any stable release.
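As a rough illustration of why a plain byte suits flags that are manipulated together, consider the sketch below. The flag names, the U8 typedef and struct demo_parser are stand-ins chosen for the example; they are not the definitions in parser.h.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char U8;      /* stand-in for perl's U8 */

/* Hypothetical flag bits; the real names are not given above. */
#define DEMO_FLAG_A 0x01
#define DEMO_FLAG_B 0x02
#define DEMO_FLAG_C 0x04
#define DEMO_FLAG_D 0x08

struct demo_parser {
    U8 lex_flags;              /* whole byte rather than a 4-bit bitfield */
};

/* Several flags can be set with one OR on the byte. */
static void demo_set_flags(struct demo_parser *p, U8 mask)
{
    assert(p != NULL);
    p->lex_flags |= mask;
}

/* Several flags can be cleared with one AND on the byte. */
static void demo_clear_flags(struct demo_parser *p, U8 mask)
{
    assert(p != NULL);
    p->lex_flags &= (U8)~mask;
}

/* True if any flag in mask is set. */
static int demo_any_set(const struct demo_parser *p, U8 mask)
{
    assert(p != NULL);
    return (p->lex_flags & mask) != 0;
}
```

With a whole byte the compiler can load, mask and store directly, instead of extracting and re-inserting a 4-bit field.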
Before this commit:

    commit f07ec6d
    Author: Zefram <firstname.lastname@example.org>
    Date:   Wed Oct 13 19:05:19 2010 +0100

        remove filter inheritance option from lex_start

        The only uses of lex_start that had the new_filter parameter false, to make the new lexer context share source filters with the previous lexer context, were uses with rsfp null, which therefore never invoked source filters. Inheriting source filters from a logically unrelated file seems like a silly idea anyway.

string evals could inherit the same source filter space as the currently compiling code. Despite what the quoted commit message says, sharing source filters allows filters to be inherited in both directions: A source filter created when the eval is being compiled also applies to the file with which it is sharing its space. There are at least 20 CPAN distributions relying on this behaviour (or, rather, what could be considered a Test::More bug). So this commit restores the source-filter-sharing capability. It does not change the current API or make public the API for sharing source filters, as this is supposed to be a temporary stop-gap measure for 5.14.
New API function parse_label() parses a label, separate from statements. If a label has not already been lexed and queued up, it does not use yylex(), but parses the label itself at the character level, to avoid unwanted lexing past an absent optional label.
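The idea of recognising an optional label at the character level, without running the tokenizer past it, can be sketched like this. scan_optional_label() is a hypothetical helper written for illustration; it handles ASCII identifiers only and is not the implementation of parse_label().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Look for "IDENT:" at the start of s without tokenizing anything.
 * Returns the number of bytes consumed (through the colon) and stores
 * the identifier length in *name_len.  Returns 0, consuming nothing,
 * when no label is present, so the caller can lex from s as usual. */
static size_t scan_optional_label(const char *s, size_t *name_len)
{
    const char *p = s;
    const char *start;
    const char *end;

    assert(s != NULL && name_len != NULL);
    *name_len = 0;
    while (*p == ' ' || *p == '\t')
        p++;
    start = p;
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return 0;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    end = p;
    while (*p == ' ' || *p == '\t')
        p++;
    /* A single colon ends a label; "::" is a package separator. */
    if (p[0] != ':' || p[1] == ':')
        return 0;
    *name_len = (size_t)(end - start);
    return (size_t)(p + 1 - s);
}
```

For "LOOP: while (1) {}" this consumes five bytes and reports a four-character name; for "Foo::bar();" or "print 1;" it consumes nothing.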
PL_doextract had two unrelated jobs, neither best served by an interpreter global variable. The first was to track the -x command-line switch. That is replaced with a local variable in S_parse_body(). The second was to track whether the lexer is in the middle of a =pod section. That is replaced with an element in PL_parser.
…d make it reference counted. Properly solves [perl #66094]
Attached is a patch that adds a public API for the lowest layers of lexing. This is meant to provide a solid foundation for the parsing that Devel::Declare and similar modules do, and it complements the pluggable keyword mechanism. The API consists of some existing variables combined with some new functions, all marked as experimental (which making them public certainly is).
Re-order struct yy_stack_frame to save space on LP64 systems. p4raw-id: //depot/perl@31618
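The space saving from re-ordering can be illustrated with two made-up structs. The field names and types below are assumptions chosen to show the padding effect; they are not the actual members of struct yy_stack_frame.

```c
#include <stddef.h>

/* Small members interleaved with pointers: on LP64 each small member
 * is followed by padding so that the next pointer is 8-byte aligned. */
struct demo_frame_before {
    short  state;
    void  *compcv;
    int    savestack_ix;
    void  *val;
};

/* Pointers first, small members grouped at the end: the int and the
 * short share one 8-byte slot. */
struct demo_frame_after {
    void  *compcv;
    void  *val;
    int    savestack_ix;
    short  state;
};

/* Bytes saved per frame by the re-ordering on this platform. */
static size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_frame_before)
         - sizeof(struct demo_frame_after);
}
```

On a typical LP64 ABI the first layout occupies 32 bytes and the second 24; on ILP32 the two are usually the same size, which is why the saving is specific to LP64.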
p4raw-link: @31615 on //depot/perl: 503de47 p4raw-id: //depot/perl@31616
Change #22306 inadvertently made 'local $[' statement-scoped rather than block-scoped; so revert that change and add a different fix. The problem was to ensure that the savestack got popped correctly while popping errored tokens. We now record the current value of PL_savestack_ix with each pushed parser state. p4raw-id: //depot/perl@31615
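Recording the savestack index with each pushed parser state might be sketched as follows. All names here (demo_interp, demo_parse_frame and the three functions) are hypothetical; the sketch only models the index bookkeeping, not the real savestack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter state: just the savestack depth. */
struct demo_interp {
    int savestack_ix;
};

/* Hypothetical parser stack frame. */
struct demo_parse_frame {
    int state;
    int savestack_ix;   /* savestack depth when this state was pushed */
};

/* Something was saved (e.g. by a 'local'): the savestack grows. */
static void demo_save_item(struct demo_interp *in)
{
    assert(in != NULL);
    in->savestack_ix++;
}

/* Push a parser state, remembering the current savestack depth. */
static struct demo_parse_frame
demo_push_state(const struct demo_interp *in, int state)
{
    struct demo_parse_frame f;

    assert(in != NULL);
    f.state = state;
    f.savestack_ix = in->savestack_ix;
    return f;
}

/* Discard an errored state: unwind the savestack to the depth that
 * was recorded for that state, undoing only what came after it. */
static void demo_pop_errored(struct demo_interp *in,
                             const struct demo_parse_frame *f)
{
    assert(in != NULL && f != NULL);
    assert(f->savestack_ix <= in->savestack_ix);
    in->savestack_ix = f->savestack_ix;
}
```

Because each state carries its own recorded depth, discarding an errored token unwinds exactly the entries saved after that state was pushed and leaves block-scoped saves from earlier states in place.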