Regex Literal Pitch v2 #187

Merged: 36 commits, Apr 13, 2022

hamishknight
Contributor

No description provided.

Documentation/Evolution/DelimiterSyntax.md (five outdated, resolved review threads)
@hamishknight
Contributor Author

> I vote for insisting on raw syntax here. Every little heuristic like this is a strike against elegance. It's something that every editor will have to deal with. Raw literals are an obvious extension, and are probably easier to implement.

Yeah, I think I agree. I guess it ultimately depends on what the raw syntax rules regarding backslashes end up being. If they end up treating backslashes as literal, we may not want to recommend raw syntax, as that would require changing any escape sequences already written. We would likely want the user to use an alternative spelling instead, e.g. `(?<...>)`.

However, even if this is not something we support, I still feel there is some value in the compiler at least implementing the heuristic, as it only impacts invalid code and allows us to emit a diagnostic with a fix-it to change the regex. That said, I don't have a good sense of how common the `(?'...')` syntax is in the wild.

@hamishknight changed the title from "Delimiter Syntax Pitch" to "Regex Literal Pitch v2" on Mar 29, 2022
@hamishknight marked this pull request as ready for review on April 8, 2022, 21:52
More details and word smithing.

### Named typed captures

Regex literals have their capture types statically determined by the capture groups present. This follows the same inference behavior as [the DSL][regex-dsl], and is explored in more detail in *[Strongly Typed Captures][strongly-typed-captures]*. One aspect of this that is currently unique to the literal is the ability to infer labeled tuple elements for named capture groups. For example:
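
As a rough illustration of the labeled-tuple inference described above, here is a minimal sketch. It is not the proposal's own example; it assumes a Swift 5.7 or later toolchain, and the names `dateRegex`, `year`, and `month` are made up for this illustration.

```swift
// Hypothetical example: two named capture groups in an extended-delimiter literal.
let dateRegex = #/(?<year>\d{4})-(?<month>\d{2})/#

// This annotation only compiles if the named groups were inferred as labeled
// tuple elements, so it doubles as a compile-time check of the capture type.
let typedRegex: Regex<(Substring, year: Substring, month: Substring)> = dateRegex

// wholeMatch(of:) returns nil unless the entire string matches.
let dateMatch = "2022-04".wholeMatch(of: typedRegex)

if let output = dateMatch?.output {
    // Captures are read by label rather than by positional index.
    print(output.year, output.month)
}
```

The explicit type annotation is optional; it is included here only to make the inferred capture type visible.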
Collaborator

Strongly typed captures isn't a proposal itself. Should we link the DSL proposal or else talk about it more here?

Contributor Author

We do link the DSL here ([the DSL][regex-dsl]), should it be used as the main reference for the typed capture behavior? I was hesitant to talk more about typed captures here, as there's quite a bit to cover, and I believe most of it is shared behavior with the DSL. But maybe an overview would be reasonable?

hamishknight and others added 2 commits April 11, 2022 10:43
Co-authored-by: Michael Ilseman <michael.ilseman@gmail.com>

```swift
let regex = #/
/usr/lib/modules/ # Prefix
```
Contributor Author

Note this is a slightly different regex from the single-line version, as it includes a `/` at the start. We probably ought to be consistent; what do you think? I mainly avoided it in the single-line version because GitHub syntax-colors it as a comment 😬

Collaborator

Any extra slashes are unintentional and the natural outcome of using slash as both part of a delimiter and an interior character.

It's interesting that `#//usr/local/#` and `#/user/local//#` would sensibly be lexed as comments by source tools. That's a downside of `/` we should mention too. I suppose the workaround is to escape the slash if it's at the very beginning or end, which is another little wrinkle that `#/.../#` doesn't alleviate.
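
The escaping workaround mentioned above might look roughly like this. This is a sketch under the assumption of a Swift 5.7 or later toolchain; the constant names are hypothetical, and how any particular editor or highlighter treats the unescaped forms is not verified here.

```swift
// Hypothetical workaround: escape the leading slash so the literal does not
// begin with `#//`, which comment-aware tooling may treat as a line comment.
let leadingSlash = #/\/usr/local/#

// Likewise, escape a trailing slash so the literal does not end in `//#`.
let trailingSlash = #/usr/local\//#

// Both patterns still match a literal slash at the escaped position.
let leadingMatch = "/usr/local".wholeMatch(of: leadingSlash)
let trailingMatch = "usr/local/".wholeMatch(of: trailingSlash)
```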


### Extended delimiters `#/.../#`, `##/.../##`

Backslashes may be used to write forward slashes within the regex literal, e.g. `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced octothorpes. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
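
A minimal sketch of the extended delimiters, standing in for the example that is cut off in this view. It assumes a Swift 5.7 or later toolchain, and the patterns and names (`modulePath`, `issueRef`) are invented for illustration rather than taken from the proposal.

```swift
// Hypothetical path-matching example; with plain `/.../` delimiters the same
// pattern would need each slash escaped: /usr\/lib\/modules\/([^\/]+)/
let modulePath = #/usr/lib/modules/([^/]+)/#

// With two `#` characters the literal only ends at `/##`, so the sequence
// `/#` can appear inside the pattern without escaping.
let issueRef = ##/issues/#(\d+)/##

let moduleMatch = "usr/lib/modules/5.15".wholeMatch(of: modulePath)
let issueMatch = "issues/#187".wholeMatch(of: issueRef)

if let module = moduleMatch?.output.1, let issue = issueMatch?.output.1 {
    print(module, issue)
}
```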
Contributor Author

We call `#` a pound symbol elsewhere in the proposal; should we be consistent?

Collaborator

FWIW, Unicode calls it "number sign", which is less confusable with £ (whose official name is "POUND SIGN"). How does using "number sign" consistently sound? Even in the US, we don't dial numbers that often anymore, so I expect "pound sign" to have a limited lifespan, especially in the era of hashtags.

Contributor Author

Sounds good to me!

Contributor Author

Updated to use "number sign", though a couple of occurrences were phrased as "number of ...", which sounded a bit awkward, so I changed those to use the # character directly.

Standardize on "number signs" for mentions of `#`
(though a couple of them read better as just the
character). Also change the multi-line example to
not include a `/` at the start, which matches the
single-line version.
@airspeedswift
Member

This is looking good, the only things I think might want tweaking:

  • I would spend less time on comments. It's a challenge, but not a major one
  • I think it's worth acknowledging in the alternatives considered that in some communities such as Perl, the / syntax has become less popular.

- Add Source Compatibility section
- Condense comment syntax ambiguity section
- Mention `/.../` being less popular in some
communities
And remove the old version of the pitch.
@hamishknight
Contributor Author

@swift-ci please test

@hamishknight merged commit 0338178 into apple:main on Apr 13, 2022
@hamishknight deleted the delimiter-syntax branch on April 13, 2022, 17:56