Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple option values of regex literal modifiers #13252

Open
straight-shoota opened this issue Mar 31, 2023 · 5 comments
Open

Decouple option values of regex literal modifiers #13252

straight-shoota opened this issue Mar 31, 2023 · 5 comments

Comments

@straight-shoota
Copy link
Member

The compiler currently passes the integer value of the compiler's version of Regex::Options when expanding a RegexLiteral.

For example /cat/i expands to Regex.new("cat", Regex::Options.new(1)) (with Regex::Options::CASELESS = 0x00000001).

This constitutes a strong dependency between the compiler and Regex implementation because they need to agree on what value represents what. With such a tight coupling it's impossible to change the stdlib implementation (as discussed in #13152).

The compiler should use logical names to represent the modifier options instead of integer values.

I expect this can be done in a backwards-compatible way so that the compiler would also be able to build an older stdlib.

Extracted from #13152 (comment)

@straight-shoota
Copy link
Member Author

@bcardiff mentioned in #13152 (comment)

For the regex options I would suggest to add a known method by the compiler. Regex.make_options that would receive boolean named arguments.

What type is returned will only need to match the constructor.

/cat/i becomes Regex.new("cat", Regex.make_options(i: true))

The compiler can check the presence of make_options method. If not defined the current logic/values are used. The new convention would be supported by both engines and decouple the compiler from it going forward.

@straight-shoota
Copy link
Member Author

straight-shoota commented Mar 31, 2023

I think it should also be possible for the compile to compose an option value from named members.

::Regex.new("cat", ::Regex::Options::IGNORE_CASE | ::Regex::Options::MULTILINE)

That means the compiler expects the constants IGNORE_CASE, MULTILINE, and EXTENDED to be defined in Regex::Options and respond to #|. That would be a reasonable interface and doesn't require anything specific for literal integration.

However, there is some value for the regex implementation to know that a pattern comes from a literal. For example to skip UTF validation (ref #13237).
Theoretically the compiler could handle that by always passing the NO_UTF8_CHECK flag. But it's cleaner to let the implementation deal with that if we just pass it the information that it's a literal.

@bcardiff's API proposal should work for that. But we can go a step further to a single method: Regex.literal. This reduces the interface between compiler and stdlib and thus gives more freedom to the implementation.

Regex.literal("cat", i: true)

The .literal method has been discussed before (I don't recall where exactly though) and I think this would be a perfect use case for it.

@Sija
Copy link
Contributor

Sija commented Mar 31, 2023

@bcardiff's API proposal should work for that. But we can go a step further to a single method: Regex.literal. This reduces the interface between compiler and stdlib and thus gives more freedom to the implementation.

Regex.literal("cat", i: true)

The .literal method has been discussed before (I don't recall where exactly though) and I think this would be a perfect use case for it.

Why not use an array instead?

Regex.literal("cat", [:i])

@straight-shoota
Copy link
Member Author

straight-shoota commented Mar 31, 2023

I think it's good that the implementation can declare which flags it supports in the method signature and the compiler just ticks them. This could allow adding support for new modifiers without any changes to the compiler (except recognizing the syntax). It keeps backwards compatibility and produces a compile time error if the implementation does not support a given modifier.
It's a more structured approach than just passing forward the list of modifiers.

@bcardiff
Copy link
Member

👍 on Regex.literal("cat", i: true). Seems like a nice and clear API.

This was referenced Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants