Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What to do with the empty regex initializer? #23822

Closed
jabraham17 opened this issue Nov 9, 2023 · 4 comments · Fixed by #23832
Closed

What to do with the empty regex initializer? #23822

jabraham17 opened this issue Nov 9, 2023 · 4 comments · Fixed by #23832

Comments

@jabraham17
Copy link
Member

jabraham17 commented Nov 9, 2023

While doing a pass through the standard modules for documented/public symbols missing docstrings, I came across this initializer in Regex.regex.

  proc init(type exprType) {
    this.exprType = exprType;
  }

This shows up in our docs, but its very easy to write surprising code that breaks with it. The following example compiles and runs, but then segfaults on writeln(r) inside c++ std::string. This is arguably a misuse of the code, in that we didn't really initialize anything so of course it segfaults, but there is no documentation on it.

use Regex;
var r = new regex(string);
writeln(r);

The only place I see this initializer being used is in the regex.chpl__deserialize. Is this useful to users? If not, can we just nodoc it? It was added in #14686 by @e-kayrakli

Also tagging @benharsh

@e-kayrakli
Copy link
Contributor

I'm ok with no-doc'ing it. chpl__deserialize's use is not a good precedent. What it does with the quasi-initialized regex is also not user-facing.

The only use case I can think of is to create a dummy empty regex for a sentinel value or a dummy return, but the way to do it should be new regex("").

@benharsh
Copy link
Member

benharsh commented Nov 9, 2023

This initializer appears to be the regex type's default initializer, so it's not quite as weird or edge-case-y as its usage by chpl__deserialize might suggest. I think documenting this initializer and putting some kind of .. warning:: could help users who still accidentally write code like the above example.

An even better next step (separate from this issue) would be to add some better checking for this case within regex's implementation and issue some better errors.

@mppf
Copy link
Member

mppf commented Nov 13, 2023

I would imagine this initializer comes up primarily with things like var x: regex; and that it's significantly more common in that form than new regex(string) etc. Generally speaking, I think it's reasonable for regex to be default-initializable; and that it should not core dump when you go to write one of those. Of course, the alternative idea is to prevent it from being default-initializable, but IMO that is not called for here.

@jabraham17
Copy link
Member Author

It seems like the general consensus is that this should stay in the public interface and be documented, and that we should fix the bug.

The bug comes from a nullptr dereference in the re2 runtime shim, where regex->regex is a nullptr. writeln triggers 1 of the possible errors. There are a few other ways to hit this. I will patch these and open a PR with the bug fix and the documentation (with a warning about possible erroneous behavior when used incorrectly)

jabraham17 added a commit that referenced this issue Nov 13, 2023
Fixes multiple possible nullptr dereferences in the RE2 runtime shim,
which could cause uninitialized `regex`s to segfault when used.

Also documents the default initializer `regex.init(type exprType)`,
which was publicly visible but had no docstring.

## Testing
- Built docs locally
- paratest with comm=none
- paratest with comm=gasnet

Resolves #23822

[Reviewed by @dlongnecke-cray]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants