feat: backreferences, named capture groups, named backreferences #71

Closed
wants to merge 1 commit into from

PaulJPhilp
Contributor

Changed babel.config.js to support Node version 10.0 and greater (for named capture support).

Summary

Added support for named capture groups, backreferences, and named backreferences. All three functions are as defined in the Swift RegEx Builder.

I needed to make a change to babel.config.js because the default config didn't support named capture groups. This was the toughest bug I've tracked down in quite a while. I'm not confident that my solution is optimal and would appreciate a second set of eyes.
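For anyone reproducing the problem, a small runtime check can confirm whether the environment that actually runs the tests accepts named capture groups. This is only an illustrative sketch and not part of this PR; `supportsNamedGroups` is a hypothetical helper name. It uses the `RegExp` constructor so that unsupported syntax surfaces as a catchable error at runtime.

```typescript
// Hypothetical helper (not part of the PR): returns true when the runtime
// can parse and evaluate named capture groups, an ES2018 feature.
type MatchWithGroups = RegExpExecArray & { groups?: Record<string, string> };

function supportsNamedGroups(): boolean {
  try {
    // The constructor form defers parsing to runtime, so unsupported
    // syntax throws here instead of failing when the file is loaded.
    const re = new RegExp('(?<word>a)');
    const match = re.exec('a') as MatchWithGroups | null;
    return match !== null && match.groups !== undefined && match.groups.word === 'a';
  } catch {
    return false;
  }
}

console.log(supportsNamedGroups());
```

If this returns `true` in the runtime but regex literals with named groups still fail in the test suite, the transpilation step is the likely place to look.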

Test plan

Added unit tests for each of the new functions.

Added a couple of new examples showing how to use the new functions in typical use cases.

@mdjastrzebski
Member

@PaulJPhilp There is already an open PR of mine, #66, covering named captures and backreferences. It's nearly finished in terms of features; I was planning to gather some critique of the API to polish it before merging.

Let's pick the best parts of both and unify them. Please take a look at it, and I will take a careful look at your PR, and let's discuss it in the following days.

@PaulJPhilp
Contributor Author

PaulJPhilp commented Mar 12, 2024 via email

@@ -74,6 +74,41 @@ Captures, also known as capturing groups, extract and store parts of the matched
> [!NOTE]
> TS Regex Builder does not have a construct for non-capturing groups. Such groups are implicitly added when required. E.g., `zeroOrMore(["abc"])` is encoded as `(?:abc)+`.

### `backreference()`
Member

Using ordinal backreferences accurately can be problematic in more complex expressions (nesting, etc.). Therefore, I think we can drop them without losing any functionality for users trying to build maintainable regexes.


A backreference is a way to match the same text as previously matched by a capturing group.
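To illustrate the concern about ordinal backreferences, here is a minimal sketch using plain JavaScript `RegExp` objects rather than this library's API. The patterns are made up for illustration. Wrapping an expression in one extra group shifts the numbering, so `\1` ends up pointing at the wrong group, while a named backreference keeps pointing at the same one.

```typescript
// Ordinal backreference: \1 refers to the first capturing group.
const ordinal = new RegExp('(a|b)x\\1');

// Wrapping the pattern in one more group shifts the numbering. \1 now
// refers to the outer group, which has not finished matching when the
// backreference is evaluated, so it matches the empty string.
const shifted = new RegExp('((a|b)x\\1)');

// A named backreference still points at the intended group.
const named = new RegExp('((?<ch>a|b)x\\k<ch>)');

console.log(ordinal.test('axb')); // false
console.log(shifted.test('axb')); // true, the constraint is silently lost
console.log(named.test('axb'));   // false
console.log(named.test('axa'));   // true
```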

### `namedCapture()`
Member

I've considered the option of having a separate namedCapture() construct in addition to the basic capture(). After some prototyping and consulting, I've found capture(..., { name: 'aaa' }) to be better because it improves discoverability and follows the JS convention of "config" or "options" objects.
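A rough sketch of what the options-object shape could look like follows. It is a simplified, string-based illustration and not the implementation from either PR: the `CaptureOptions` type and this standalone `capture` function are hypothetical, `pattern` is assumed to already be valid regex source, and the name check is deliberately limited to ASCII identifiers.

```typescript
// Hypothetical options type; only `name` comes from the discussion above.
interface CaptureOptions {
  name?: string;
}

// Sketch: emits regex source text for a capturing group around `pattern`.
function capture(pattern: string, options: CaptureOptions = {}): string {
  if (options.name === undefined) {
    // No name given: plain (numbered) capturing group.
    return `(${pattern})`;
  }
  // Simplification: accept ASCII identifier names only.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(options.name)) {
    throw new Error(`Invalid group name: ${options.name}`);
  }
  return `(?<${options.name}>${pattern})`;
}

console.log(capture('a'));                   // (a)
console.log(capture('a', { name: 'abba' })); // (?<abba>a)
```

With this shape the call site labels the name explicitly, so the pattern and the group name cannot be confused.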


A named capturing group is a capturing group that has a name assigned to it. The group's matching result can later be identified by this name.

### `namedBackreference()`
Member

Swift Regex Builder uses the following convention for named captures/references:

let kind = Reference(Substring.self)

let regex = Capture(as: kind) {
  ChoiceOf {
    "CREDIT"
    "DEBIT"
  }
}

see: https://github.com/apple/swift-evolution/blob/main/proposals/0351-regex-builder.md#reference

It has the nice feature of connecting a reference directly to its capturing group, instead of forcing the user to repeat the name twice: once in the capture and again in the backreference.

However, in that case dropping the "back" prefix seems beneficial, as a reference becomes a "backreference" only when it is added to a regular expression. Until it is applied to a previous part of the expression ("back"), it is more of a plain reference.
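A rough TypeScript analogue of the Swift convention might look like the sketch below. All names here (`Reference`, `ref`, `captureAs`, `sameAs`) are hypothetical and chosen only for illustration, and the functions emit plain regex source strings instead of this library's element types. The point is that the name is written once, on the reference object.

```typescript
// Hypothetical reference object: it owns the group name, so the name is
// shared by the capture and by every later use of the reference.
interface Reference {
  readonly name: string;
}

function ref(name: string): Reference {
  return { name };
}

// Emits a named capturing group bound to the reference.
function captureAs(reference: Reference, pattern: string): string {
  return `(?<${reference.name}>${pattern})`;
}

// Emits a backreference to whatever the bound group matched.
function sameAs(reference: Reference): string {
  return `\\k<${reference.name}>`;
}

const kind = ref('kind');
const source = `${captureAs(kind, 'CREDIT|DEBIT')} to ${sameAs(kind)}`;
const regex = new RegExp(source);

console.log(source);                         // (?<kind>CREDIT|DEBIT) to \k<kind>
console.log(regex.test('CREDIT to CREDIT')); // true
console.log(regex.test('CREDIT to DEBIT'));  // false
```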

import { buildRegExp, digit, endOfString, namedCapture, repeat, startOfString } from '..';

// Example: dateRegex
const dateRegex = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/i;
Member

nice one, I'll add this example.
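For reference, this is roughly how the named groups from that date pattern would be read at the call site. The sketch uses a plain `RegExp` built from a string (without the `i` flag, which has no effect on digits), and `parseDate` is a hypothetical helper, not part of the PR.

```typescript
type MatchWithGroups = RegExpExecArray & { groups?: Record<string, string> };

// Same pattern as the dateRegex example, written as regex source text.
const dateSource = '^(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})$';
const datePattern = new RegExp(dateSource);

// Hypothetical helper: returns the named parts, or null when there is no match.
function parseDate(input: string): { year: string; month: string; day: string } | null {
  const match = datePattern.exec(input) as MatchWithGroups | null;
  if (match === null || match.groups === undefined) {
    return null;
  }
  const { year, month, day } = match.groups;
  return { year, month, day };
}

console.log(parseDate('2024-03-13')); // { year: '2024', month: '03', day: '13' }
console.log(parseDate('2024-3-13'));  // null
```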


const usernameRegex = buildRegExp([startOfString, username, endOfString]);

test('Matching the Username component.', () => {
Member

mdjastrzebski commented Mar 13, 2024

I'll merge this with the existing email example, as they're quite similar.

Note: I am planning to add some frequently used patterns (URL, email, maybe hashtags, etc.) so that each user does not have to define them by hand. I will spec out this feature soon. I invite you to join in if you have the capacity to work on it.


describe('namedCapture RegEx matching', () => {
test('`named-capture` pattern', () => {
expect(namedCapture('a', 'abba')).toEqualRegex(/(?<abba>a)/);
Member

When seeing namedCapture('a', 'abba'), it's hard to tell which argument is the matched pattern and which is the name of the capturing group.

@PaulJPhilp
Contributor Author

PaulJPhilp commented Mar 13, 2024 via email

name: string;
}

export function namedBackreference(groupName: string): NamedBackreference {
Member

In #66 I proposed an option to skip the name, in which case Regex Builder would autogenerate a brief name for the user (ref-1, etc.). That matches Swift Regex Builder, which defines Reference without a name parameter at all. Not sure if this is worth it. wdyt?
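A minimal sketch of such name autogeneration is shown below, assuming a per-builder counter. `createRefNamer` is a hypothetical helper and not the code from #66. One detail to keep in mind: JavaScript group names must be valid identifiers, so a hyphenated name like `ref-1` would be rejected by the regex engine; the sketch generates `ref1`, `ref2`, and so on.

```typescript
// Hypothetical factory: each namer has its own counter, so generated
// names are unique within one expression being built.
function createRefNamer(prefix: string = 'ref'): (name?: string) => string {
  let counter = 0;
  return (name?: string): string => {
    // An explicit name always wins; otherwise generate the next one.
    if (name !== undefined) {
      return name;
    }
    counter += 1;
    return `${prefix}${counter}`;
  };
}

const nameFor = createRefNamer();
console.log(nameFor());       // ref1
console.log(nameFor('year')); // year
console.log(nameFor());       // ref2
```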

@PaulJPhilp
Contributor Author

PaulJPhilp commented Mar 13, 2024 via email

@PaulJPhilp PaulJPhilp closed this Mar 13, 2024
@PaulJPhilp PaulJPhilp deleted the namedgroups branch March 13, 2024 22:54