
Add limits to the size of the string repetition multiplier #23561

Open
wants to merge 2 commits into
base: blead

richardleach
Contributor

Historically, given a statement like my $x = "A" x SOMECONSTANT;, no examination of the size of the multiplier (SOMECONSTANT in this example) was done at compile time. Depending upon the constant folding behaviour, this might mean:

  • The buffer allocation needed at runtime could be clearly bigger than the system can support, but Perl would happily compile the statement and let the author find this out at runtime.
  • Constants resulting from folding could be very large and the memory taken up undesirable, especially in cases where the constant resides in cold code.

This commit adds some compile time checking such that:

  • A string size beyond or close to the likely limit of support triggers a fatal error.
  • Strings above a certain static size do not get constant folded.

Things could obviously still go bad at runtime when the multiplier isn't a simple
constant, but that's not for this PR.
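
As a rough illustration of the two checks described above (this is not the code in the PR), the compile-time decision could be sketched in C as follows. The names `classify_repeat`, `REPEAT_FATAL_LIMIT` and `REPEAT_FOLD_LIMIT` are hypothetical, the 1 MiB folding cut-off is an arbitrary assumption, and the sketch assumes the length of the left operand is also known at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define REPEAT_FATAL_LIMIT (SIZE_MAX >> 2)    /* assumed "close to the limit of support" */
#define REPEAT_FOLD_LIMIT  ((size_t)1 << 20)  /* assumed 1 MiB constant-folding cut-off */

typedef enum { REPEAT_FOLD, REPEAT_DEFER, REPEAT_REJECT } repeat_action;

/* Decide what to do with `STRING x COUNT` when the string length (in bytes)
 * and the count are both known at compile time. */
static repeat_action classify_repeat(size_t len, intmax_t count)
{
    /* Sanity check on the illustrative thresholds themselves. */
    assert(REPEAT_FOLD_LIMIT <= REPEAT_FATAL_LIMIT);

    /* A count <= 0 or an empty left operand yields the empty string. */
    if (count <= 0 || len == 0)
        return REPEAT_FOLD;

    /* Overflow-safe test for len * count > REPEAT_FATAL_LIMIT. */
    if ((uintmax_t)count > REPEAT_FATAL_LIMIT / len)
        return REPEAT_REJECT;

    /* Safe to multiply now: the product is at most REPEAT_FATAL_LIMIT. */
    size_t total = len * (size_t)count;

    /* Large but plausible strings are left for runtime instead of folded. */
    return total > REPEAT_FOLD_LIMIT ? REPEAT_DEFER : REPEAT_FOLD;
}
```

Under these assumptions, `classify_repeat(1, 10)` folds, a 2 MiB result is deferred to runtime, and a count near `INTMAX_MAX` is rejected at compile time.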

Closes #13324, closes #13793, closes #20586.

Besides general correctness checking, the arbitrary cut-off numbers are up for discussion, and please could reviewers suggest any improvements to the perldiag.pod entry that come to mind.


  • This set of changes requires a perldelta entry, and I will write one post-bikeshedding.

@richardleach richardleach force-pushed the hydahy/const_fold_repeatmax branch from 953cf3f to 3725af9 Compare August 10, 2025 21:19
@guest20

guest20 commented Aug 10, 2025

One of the mottos of perl, at least when I picked up the book, was "no internal limits".

Just to get them out of the way, I'm going to front load these:

  • "back in my day we walked 10 miles up hill in the snow both ways"
  • "no airbags, we die like men"
  • "sounds like a skill issue"
  • "What's next, do I have to get a licence to make toast in my own goddamn toaster?!"
  • etc

More constructively:

  • I don't like the idea of "random" string operators becoming fatal, so
  • I would rather that x did a lazy thing instead
  • "Unrealistically large string repetition value" is pretty subjective. My machine has an infinite amount of paper tape (in both directions)... and/or GBs of RAM/swap.
  • Heck, in the case of x, something as simple as some tie/magic with gzip or even just run-length encoding could be enough to save the memory while allowing one's program to dump out 0+[] bytes worth of y's for zip bombs or http buffer overruns or piping to apt-get

@richardleach
Contributor Author

Thanks for the feedback, @guest20. I can see that the discussion might have to balance what people could conceivably do with improving the guardrails around the patterns of usage we can observe on CPAN / other public code repositories.

  • "Unrealistically large string repetition value" is pretty subjective. My machine has an infinite amount of paper tape (in both directions)... and/or GBs of RAM/swap.

More than SIZE_MAX >> 2 usable RAM/swap and you would be happy using it?

  • Heck, in the case of x, something as simple as some tie/magic with gzip or even just run-length encoding

Yes, the user doing something funky with magic is definitely worth considering.

This PR is checking for a right operand that is CONST (and implicitly that won't have any magic attached). The left operand would therefore have to have magic attached.

Perhaps the code should check for a CONST left operand too? If we did that, it might not cover as many cases, but we could be sure of no magic - in which case, there's also a threshold of IV_MAX, because that's the biggest count that pp_repeat (currently) supports.
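
To make the interaction between the two thresholds concrete, here is a minimal sketch, assuming both operands are constants. `max_repeat_count`, `ASSUMED_IV_MAX` and `ASSUMED_SIZE_LIMIT` are hypothetical names; `ASSUMED_IV_MAX` stands in for Perl's `IV_MAX` on a build where IV is 64 bits wide, and the size limit simply reuses the `SIZE_MAX >> 2` figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Perl's IV_MAX; assumes a build where IV is a 64-bit integer. */
#define ASSUMED_IV_MAX     INT64_MAX
/* Assumed size threshold, taken from the SIZE_MAX >> 2 figure in this thread. */
#define ASSUMED_SIZE_LIMIT (SIZE_MAX >> 2)

/* Largest repetition count that stays within both limits for a constant
 * left operand of `len` bytes. */
static uint64_t max_repeat_count(size_t len)
{
    /* The sketch stores size_t quantities in uint64_t. */
    assert(sizeof(size_t) <= sizeof(uint64_t));

    const uint64_t by_iv = (uint64_t)ASSUMED_IV_MAX;

    /* An empty string repeated any number of times is still empty,
     * so only the count limit applies. */
    if (len == 0)
        return by_iv;

    const uint64_t by_size = (uint64_t)(ASSUMED_SIZE_LIMIT / len);
    return by_size < by_iv ? by_size : by_iv;
}
```

With these assumed values, the size limit is the binding one for any non-empty left operand, and the count limit only matters when the left operand is the empty string.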

@guest20

guest20 commented Aug 11, 2025

More than SIZE_MAX >> 2 usable RAM/swap and you would be happy using it?

Yes. And more specifically:

That scene from The Emperor's New Groove, in which Kuzco is llama lashed to the back of a log, about to fall over a waterfall and he says "bring it on"

Yes, the user doing something funky with magic is definitely worth considering.

No, what I mean is that x† could do the magic, to give back a "lazy string"... so a caller could =~, substr, utf-8 length etc without it needing to allocate / consume all my ram‡

This kind of lazyboi could even be suitable for constant folding, since it has a known truthyness at compile time ("" x ... on one side, or ... x 0 on the other)

__
†. the everything operator?
‡. though I don't mind if perl uses all my memory, it's mostly just being used to maintain hundreds of firefox tabs
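
For readers trying to picture the "lazy string" idea, a minimal sketch in C follows. It is purely illustrative: Perl has no such type, and `lazy_repeat`, `lazy_length`, `lazy_byte_at` and `lazy_is_empty` are hypothetical names. The point is only that length, indexing and emptiness can be answered from the pattern and the count without allocating the full buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lazy string: stores the pattern and count, never the full buffer. */
typedef struct {
    const char *pat;     /* pattern bytes (left operand) */
    size_t      patlen;  /* pattern length in bytes */
    uint64_t    count;   /* repetition count (right operand) */
} lazy_repeat;

/* Store the logical length in *out and return 1, or return 0 if the
 * length does not fit in a uint64_t. */
static int lazy_length(const lazy_repeat *s, uint64_t *out)
{
    if (s->patlen != 0 && s->count > UINT64_MAX / s->patlen)
        return 0;
    *out = (uint64_t)s->patlen * s->count;
    return 1;
}

/* Byte at logical offset i, computed without allocating anything. */
static char lazy_byte_at(const lazy_repeat *s, uint64_t i)
{
    /* i must lie inside the logical string; this form avoids overflow. */
    assert(s->patlen > 0 && i / s->patlen < s->count);
    return s->pat[i % s->patlen];
}

/* Emptiness is known from the operands alone ("" x N, or STRING x 0). */
static int lazy_is_empty(const lazy_repeat *s)
{
    return s->patlen == 0 || s->count == 0;
}
```

For example, a `lazy_repeat` holding `"ab"` with a count of 2**62 reports a length of 2**63 bytes and can return any individual byte while occupying only a few machine words.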

@book
Contributor

book commented Aug 11, 2025

I like the fact that Perl gives you enough rope to shoot yourself in the foot.

I also don't think this patch solves either of the issues mentioned in the tickets linked in the commit (#13793 and #20586).

@richardleach
Contributor Author

More than SIZE_MAX >> 2 usable RAM/swap and you would be happy using it?

Yes.

Care to share details of the platform you're running on to help me understand use cases better?

No, what I mean is that x† could do the magic, to give back a "lazy string"... so a caller could =~, substr, utf-8 length etc without it needing to allocate / consume all my ram‡

Ah, not magic in the SvMAGICAL sense, instead a redesign of the repetition operator into some kind of iterator-thing?

Last time I grepped CPAN for this, there was definitely usage where it seemed like people would expect the whole string - e.g. for preparing a buffer or some kind of initialization - and not some iterator behaviour. Maybe that would be more of a feature request for a separate operator?

though I don't mind if perl uses all my memory, it's mostly just being used to maintain hundreds of firefox tabs

That's what I use my RAM for.

@richardleach
Contributor Author

I also don't think this patch solves either of the issues mentioned in the tickets linked in the commit (#13793 and #20586).

What would you be looking for to resolve those tickets or declare them "wontfix"?

Both of those tickets are about constant folding producing huge strings, possibly in rarely-taken or even never-taken branches, and that memory use being undesirable. The options suggested there seemed to be:

  • Leave the existing behaviour alone, people get to shoot themselves in the foot and enjoy it.
  • Don't constant fold above a certain threshold [which is what this PR does for a CONST right operand at compile time - happy to warn instead of croak though]
  • Some kind of lazy constant folding at run time the first time a branch is encountered. (Not sure what this would look like on a threaded build.)
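
The third option could look roughly like the sketch below: the repeated string is built the first time the code path runs and then reused. This is an assumption-laden illustration, not a proposal for the actual op tree; `deferred_const` and `deferred_get` are hypothetical names, and the cache is deliberately not thread-safe, which is exactly the open question for threaded builds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot for a constant that is folded on first use. */
typedef struct {
    const char *pat;     /* left operand bytes */
    size_t      patlen;  /* left operand length */
    size_t      count;   /* repetition count */
    char       *folded;  /* NULL until the branch first executes */
} deferred_const;

/* Return the repeated string, building and caching it on the first call.
 * Returns NULL if allocation fails. Not thread-safe. */
static const char *deferred_get(deferred_const *c)
{
    if (c->folded == NULL) {
        /* The caller is assumed to have rejected sizes that would overflow. */
        assert(c->patlen == 0 || c->count <= (SIZE_MAX - 1) / c->patlen);

        size_t total = c->patlen * c->count;
        char *buf = malloc(total + 1);
        if (buf == NULL)
            return NULL;

        /* Copy the pattern `count` times, then NUL-terminate. */
        for (size_t i = 0; i < c->count; i++)
            memcpy(buf + i * c->patlen, c->pat, c->patlen);
        buf[total] = '\0';

        c->folded = buf;
    }
    return c->folded;
}
```

A never-taken branch then costs only the small `deferred_const` slot, while a hot branch pays the construction cost once.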

@richardleach
Contributor Author

(I've pushed a commit changing the DIE to a warning, in case that's helpful to the discussion.)

@richardleach richardleach force-pushed the hydahy/const_fold_repeatmax branch from 033a494 to fdc3bd8 Compare August 11, 2025 23:18
For discussions on Perl#23561.

perl -e 'use warnings; my $x = ($_) ? "A" x (2**62) : "Z"'

gives this on blead for me:
```
Out of memory!
panic: fold_constants JMPENV_PUSH returned 2 at -e line 1.
```

on the previous commit, it would die:
```
Unrealistically large string repetition value
```

With this commit, it just warns:
```
Unrealistically large string repetition value at -e line 1.
```

but will blow up if the repetition OP does get executed:
```
Out of memory in perl:util:safesysrealloc
```
@richardleach richardleach force-pushed the hydahy/const_fold_repeatmax branch from fdc3bd8 to 8fa7f8e Compare August 11, 2025 23:39
@guest20

guest20 commented Aug 11, 2025

Ah, not magic in the SvMAGICAL sense, instead a redesign of the repetition operator into some kind of iterator-thing?

Well, to maintain back-compat it'd have to be full of tie-magic so the caller got the corresponding face full of bytes when they print or maybe . the lazy object, etc... I was thinking of it as more of a Promise, but you might be right, it's closer to an iterator
