Some obscure Segmentation fault on a long file #20411
Comments
|
On further inspection this:
Which is used as pointer (is originally May actually be truncated somewhere I suspect highly. |
|
I also made a reproducible file (that is not copyrighted) - basically it consists of 123416 lines of Here you can get it (if it's easier - but you can also copy-paste the above line enough times). You can run my https://github.com/AnFunctionArray/cperllexer (basically |
|
#20473 is similar to your #20412 in that it changes the effective type of SSNEW(), but uses a larger type. With U32, on a 64-bit platform this could result in overwriting the start of the save stack. But I don't think this is the base cause of the problem: the save stack entries in save_magic(), and further entries generated during some of the (?{...}) code, aren't being cleaned up, so the save stack gets larger and larger. Normally this would be handled by wrapping an ENTER/LEAVE pair around the calling code, but I have vague memories of there being a reason this wasn't being done. Do you have any ideas @iabyn ? |
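To make the truncation concern concrete, here is a minimal sketch in plain C. It is not Perl's actual code: `toy_store_offset`, `toy_offset_fits` and `toy_demo` are hypothetical names, and the only assumption is that a save-stack offset ends up stored in a 32-bit unsigned field. It shows how an offset just past the 32-bit range wraps to a small value, which is how a later write through it could land near the start of the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a slot that records a save-stack offset
 * in a 32-bit unsigned field (like a U32). Not Perl's real code. */
static uint32_t toy_store_offset(uint64_t offset)
{
    /* Unsigned narrowing is well defined in C: the value is reduced
     * modulo 2^32, so the high bits are silently dropped. */
    return (uint32_t)offset;
}

/* Returns 1 if the offset survives the round trip, 0 if it was truncated. */
static int toy_offset_fits(uint64_t offset)
{
    return (uint64_t)toy_store_offset(offset) == offset;
}

/* Small self-check showing the wrap-around. */
static void toy_demo(void)
{
    uint64_t big = (UINT64_C(1) << 32) + 5;  /* just past the 32-bit range */

    assert(toy_store_offset(big) == 5);      /* wrapped to a small index */
    assert(!toy_offset_fits(big));
    assert(toy_offset_fits(12345));
}
```

Calling `toy_demo()` from a `main` exercises both the wrapped and the unwrapped case. Whether the real crash involves this kind of wrap is exactly what is still open in this ticket.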
|
Let's get the optimization patch merged. |
Nah it was the fix. I don't know if it was a mistake or not tbh. Because @tonycoz said he was going to investigate it further. I don't complain - it did fix things. |
|
@AnFunctionArray I just think we should get the optimization patch you wrote merged; my comment wasn't about this ticket. Just a reminder to @tonycoz and @khwilliamson and me to get your optimization patch reviewed and merged. |
I'm a little bit tired and read it as something like "Lets go the optimization patch merged." Sorry, and yes, I don't mind. |
|
On Tue, Nov 01, 2022 at 05:05:04PM -0700, Tony Cook wrote:
> Normally this would be handled by wrapping an ENTER/LEAVE pair around
> the calling code, but I have vague memories of there being a reason this
> wasn't being done. Do you have any ideas @iabyn ?
When code blocks were first added to regexes (before even my time!) it
was decided that 'local' should accumulate across iterations (but be
undone when backtracking) rather than being undone at the end of each
code block.
I've always hated this, as it makes it harder internally (as if code blocks
in patterns wasn't already complex enough...)
--
Standards (n). Battle insignia or tribal totems.
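A toy model of the two behaviours being contrasted, as a hedged sketch in plain C. None of this is Perl's regex engine: `toy_state`, `toy_local`, `toy_unwind` and `toy_match` are hypothetical names, and the "code block" is assumed to do the equivalent of `local $v = $v + 1` once per iteration. With `scope_per_block` set to 0, saved values accumulate across iterations and are only popped when backtracking; with it set to 1, each block is undone as soon as it ends, as an ENTER/LEAVE pair around the block would do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 64

/* Toy model only: one integer variable plus a stack of its old values. */
struct toy_state {
    int value;                  /* the "localized" variable */
    int saved[TOY_STACK_MAX];   /* old values, newest last */
    size_t depth;               /* number of saved entries */
};

/* Like `local $v = new_value`: remember the old value, then overwrite. */
static void toy_local(struct toy_state *s, int new_value)
{
    assert(s->depth < TOY_STACK_MAX);
    s->saved[s->depth++] = s->value;
    s->value = new_value;
}

/* Unwind to an earlier depth, restoring old values newest-first. */
static void toy_unwind(struct toy_state *s, size_t mark)
{
    while (s->depth > mark)
        s->value = s->saved[--s->depth];
}

/* Run `iterations` blocks that each localize value + 1, then backtrack
 * over the last `backtracked` of them. Returns the final value and
 * stores the number of entries still saved in *depth_out. */
static int toy_match(int iterations, int backtracked,
                     int scope_per_block, size_t *depth_out)
{
    struct toy_state s = { 0, {0}, 0 };
    size_t marks[TOY_STACK_MAX];
    int i;

    assert(iterations >= 0 && iterations <= TOY_STACK_MAX);
    assert(backtracked >= 0 && backtracked <= iterations);
    for (i = 0; i < iterations; i++) {
        marks[i] = s.depth;             /* savepoint before iteration i */
        toy_local(&s, s.value + 1);
        if (scope_per_block)
            toy_unwind(&s, marks[i]);   /* undone when the block ends */
    }
    if (backtracked > 0)
        toy_unwind(&s, marks[iterations - backtracked]);
    *depth_out = s.depth;
    return s.value;
}
```

In this model `toy_match(8, 4, 0, &d)` returns 4 with 4 entries still saved, while `toy_match(8, 4, 1, &d)` returns 0 with nothing saved. The accumulating mode keeps one saved entry per iteration that has not been backtracked over, which is the same shape as a save stack that keeps growing over a very long input.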
|
|
On Mon, 7 Nov 2022, 16:37 iabyn, ***@***.***> wrote:
> On Tue, Nov 01, 2022 at 05:05:04PM -0700, Tony Cook wrote:
> > Normally this would be handled by wrapping an ENTER/LEAVE pair around
> > the calling code, but I have vague memories of there being a reason this
> > wasn't being done. Do you have any ideas @iabyn ?
> When code blocks were first added to regexes (before even my time!) it
> was decided that 'local' should accumulate across iterations (but be
> undone when backtracking) rather than being undone at the end of each
> code block.
> I've always hated this, as it makes it harder internally (as if code blocks
> in patterns wasn't already complex
It sounds like you think this should be changed, should we dig into it and
see if we can change it?
Yves
|
|
On Tue, Nov 08, 2022 at 01:31:56AM -0800, Yves Orton wrote:
> On Mon, 7 Nov 2022, 16:37 iabyn, ***@***.***> wrote:
> > When code blocks were first added to regexes (before even my time!) it
> > was decided that 'local' should accumulate across iterations (but be
> > undone when backtracking) rather than being undone at the end of each
> > code block.
> >
> > I've always hated this, as it makes it harder internally (as if code blocks
> > in patterns wasn't already complex
> >
> It sounds like you think this should be changed, should we dig into it and
> see if we can change it?
Well, it's behaviour that (IIRC) is documented in the Camel Book - it's
certainly a feature not a bug. So although it has made my life hard from
time to time when messing in the internals, I've always accepted it and
worked around it. I don't think we could change it without breaking stuff.
--
"There's something wrong with our bloody ships today, Chatfield."
-- Admiral Beatty at the Battle of Jutland, 31st May 1916.
|
Changing it would certainly break a lot of my stuff, some of which is still in production. I consider it an essential feature for non-trivial recursive regexps such as grammars. |
|
@hvds can you work out a simple example script to demonstrate what this provides? I don't want or intend to break anything, but I would like to understand the intent and background here (and maybe take the time to document it somewhere). Maybe I misunderstand. My understanding is that in code like this: we do not collect locals when the block ends, but we do on backtracking. But I can't quite picture what this enables exactly. @iabyn described what is supposed to happen and said this is demonstrated in the Camel Book, but didn't mention where. If you can come up with a simple demo it would be helpful. I will review the Camel Book, but I suspect that since you care about this you can come up with an example fairly directly. |
|
@demerphq That's interesting - I personally don't think I use that. But if it works like this - does perl have destructors? Because I could possibly use this as a way to catch backtracking - currently I have this:
|
|
But maybe it could be more elegantly written - with this feature. |
|
@AnFunctionArray What do you mean by "catch backtracking"? In theory we could have a code block that executes only when traversed into via backtracking. E.g., something like this: So the |
|
@AnFunctionArray if you are interested in this stuff maybe try reaching out to me on the |
|
@demerphq I'm definitely interested in this stuff, and maybe I'll join, but I have an issue with the fact that you must be constantly online to keep up with the news there. |
I don't have access to the serious examples, those were all at work. My crossword-helper program provides some less serious examples. Throughout, we may use Here's a pattern that matches words that are an anagram of While this matches words that are an anagram of a subset of This matches an anagram of And this matches an anagram of a subset of |
|
@hvds I used to do this but I reckon it was slow so I switched to:
|
|
But I'm not sure how this relates to locals being kept until backtrack (if I understand the feature in question). |
|
@AnFunctionArray we are having a near-synchronous conversation in GitHub ticket comments; IMO p5p would make that process quite a bit more efficient. |
|
@demerphq I've written there - It's MAGnet right? |
I still haven't exactly figured out why but here is some debug info nevertheless:
The above is on commit:
Plus my optimisation patch (which btw still cuts around half of the execution time - just FYI)
But it was crashing without it as well (and on blead).
The regex is (the executed part at least):
perl -V:
It's not my RAM running out, because I have 23 GB.
Some more info (with -O0 build):