feat: named capture groups and reference (alternative approach) #78

mdjastrzebski · 2024-03-28T13:37:52Z

Summary

This is an alternative syntax for backreferences proposed by @Killavus.

// Define a capture as a variable.
const someCapture = capture([...], { name: 'some' });

const regex = buildRegExp([
  // Create a named capture using the name from `someCapture`.
  someCapture,
  // ... some other elements ...
  // Match the same text as captured by `someCapture`.
  someCapture.ref(),
]);

It requires assigning a capture group to a JS variable, which can then generate a backreference by calling the .ref() method on it.
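
To make the mechanics concrete, here is a minimal, self-contained sketch of the idea. It is a simplified model, not the implementation in this PR: `namedCapture`, the `NamedCapture` interface, and the string-based `source` field are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified model of "capture as a variable"; not the library's code.
interface NamedCapture {
  name: string;
  // Regex source of the named group, e.g. (?<word>[a-z]+)
  source: string;
  // Regex source of a backreference to this group, e.g. \k<word>
  ref(): string;
}

function namedCapture(pattern: string, options: { name: string }): NamedCapture {
  return {
    name: options.name,
    source: `(?<${options.name}>${pattern})`,
    ref: () => `\\k<${options.name}>`,
  };
}

const word = namedCapture('[a-z]+', { name: 'word' });
// Matches a word followed by a space and the same word again.
const repeated = new RegExp(`${word.source} ${word.ref()}`);

console.log(repeated.source); // (?<word>[a-z]+) \k<word>
console.log(repeated.test('hello hello')); // true
console.log(repeated.test('hello world')); // false
```

Because the backreference is produced from the capture object itself, the group name is typed only once.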

Comparison to #66

Pros:

more natural flow: from capture to reference
more compact in the main expression, as capture needs to be assigned to a separate variable

Cons:

requires assigning the capture to a separate variable, so it is not possible to define the capture inline in the main regex expression
requires assigning an explicit name to the capture, since we want to avoid creating a named capturing group with an automatically generated name when it is not needed
deviates from the Swift Regex Builder pattern: https://developer.apple.com/documentation/regexbuilder/reference

Real-life example (matching HTML opening and closing tags):

test('example: html tag matching', () => {
  const tagName = capture(
    oneOrMore(/[a-z0-9]/),
    { name: 'tag' },
  );
  const tagContent = capture(
    zeroOrMore(any, { greedy: false }),
    { name: 'content' },
  );

  const tagMatcher = buildRegExp(['<', tagName, '>', tagContent, '</', tagName.ref(), '>'], {
    ignoreCase: true,
    global: true,
  });

  expect(tagMatcher).toMatchAllNamedGroups('<a>abc</a>', [{ tag: 'a', content: 'abc' }]);
});
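
For readers who think in plain regex syntax, the tag matcher above should correspond roughly to the native pattern below. This is a hand-written approximation, not output captured from `buildRegExp`, and it assumes that `any` maps to `.` and that the non-greedy `zeroOrMore` maps to `*?`.

```typescript
// Hand-written approximation of the pattern described by the example above.
// Built from a string so that no `/` needs escaping.
const nativeTagMatcher = new RegExp('<(?<tag>[a-z0-9]+)>(?<content>.*?)</\\k<tag>>', 'gi');

const input = '<a>abc</a> <b>bold</i>';
const results: Array<{ tag: string; content: string }> = [];

let match: RegExpExecArray | null;
while ((match = nativeTagMatcher.exec(input)) !== null) {
  const groups = match.groups ?? {};
  results.push({ tag: groups.tag, content: groups.content });
}

// Only the first element matches: `<b>bold</i>` closes with a different tag name.
console.log(results); // [ { tag: 'a', content: 'abc' } ]
```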

@PaulJPhilp pls take a look at this approach vs the one in #66 .

Test plan

Added automated tests & real-life example.

wip chore: more tests feat: improved refs refactor: merge name and ref refactor: self code review refactor: tweaks refactor: rename reference to ref feat: example with html tags chore: self code review

PaulJPhilp · 2024-04-01T19:49:39Z

One advantage I see with the ts-regex-builder approach is that it can be an external DSL, separate from JS/TS. An external DSL can facilitate:

sharable RegEx components
debugging tools (like: https://swiftregex.com)
tag functions (in JS/TS)

Storing explicit refs in vars breaks this paradigm (in the Swift approach too).

Is it possible to get the benefits of the explicit refs by using implicit refs? E.g:

notes:
I have no preference between 'name' or 'as' ( or 'group', .....) so I use name/as below.
I am using "useGroup" as a placeholder only. The actual naming convention would need some
thought.

In Typescript:

const htmlTag = capture(
  zeroOrMore(alphabetical, digit),
  { name/as: 'htmlTag' },
);

const tagContent = capture(
  zeroOrMore(any, { greedy: false }),
  { name/as: 'taggedContent' },
);

const tagMatcher = buildRegExp(['<', htmlTag, '>', tagContent, '</', useGroup('htmlTag'), '>'], {
  ignoreCase: true,
  global: true,
});

Or, in my imaginary DSL:
Regex {
  '<'
  Capture(name/as: "htmlTag") {
    oneOrMore(alphabetical, digit)
  }
  '>'
  Capture(name/as: "taggedContent") {
    zeroOrMore(any, greedy: false)
  }
  '</'
  useGroup("htmlTag")
  '>'
}

As a first step, these group names could be scoped globally, meaning each name
can only be used once. It's a limitation, but perhaps better than automatic name generation.
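
One way to read the "globally scoped, each name used once" rule is as a validation pass over the declared names. The sketch below is only an illustration of that reading: `assertUniqueGroupNames` is a hypothetical helper, and `useGroup` here is a stand-in for the placeholder name used above, not an existing ts-regex-builder function.

```typescript
// Hypothetical validation pass; these helpers are illustrative only.
function assertUniqueGroupNames(names: string[]): Set<string> {
  const seen = new Set<string>();
  for (const name of names) {
    if (seen.has(name)) {
      throw new Error(`Duplicate capture group name: "${name}"`);
    }
    seen.add(name);
  }
  return seen;
}

// Resolves an implicit reference by name, failing if no such group was declared.
function useGroup(name: string, declared: Set<string>): string {
  if (!declared.has(name)) {
    throw new Error(`Unknown capture group: "${name}"`);
  }
  return `\\k<${name}>`;
}

const declared = assertUniqueGroupNames(['htmlTag', 'taggedContent']);
console.log(useGroup('htmlTag', declared)); // \k<htmlTag>
```

With string-based references, a misspelled name can only be caught by a check like this at build time, whereas a variable-based reference is checked by the compiler.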

mdjastrzebski · 2024-04-01T21:50:34Z

@PaulJPhilp Hmmm, I'm not sure I get your point. Could you explain what would be your goal(s) in terms of API design, and how the above options realize/don't realize it?

As a brief summary, here's a recap of the two approaches:

In the original PR (#66), a standalone ref("abc") was just a holder for the name ({ type: "reference", name: "abc" }). When it was assigned to a capture group with capture([...], { name: ref }), the capture group did not actually "hold" the ref in any meaningful way; it just used the ref's name when encoding the capture regex. The goal was to avoid making the user type the name twice, once in the capture group and again in the backreference.

So the following examples actually worked exactly the same from a technical point of view:

// Suggested syntax
const myRef = ref('my');
const myRegex = buildRegExp([capture([...], { name: myRef }), myRef]);

// Other option that actually worked
const myRegex = buildRegExp([capture([...], { name: "my" }), ref("my")]);

In this PR the situation is somewhat similar: .ref() is not a real reference; it just ensures that the capture and backreference names are the same. However, this PR removes the standalone ref("my") construct, so

const myRegex = buildRegExp([capture([...], { name: "my" }), ref("my")]);

would no longer be possible.
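
The equivalence of the two #66 spellings can be sketched with a simplified model. The `Ref` shape follows the `{ type: "reference", name }` object mentioned above, while `encodeCapture` and `encodeRef` are hypothetical helpers written for this illustration, not the library's encoder.

```typescript
// Simplified, hypothetical model of the #66 design described above.
interface Ref {
  type: 'reference';
  name: string;
}

function ref(name: string): Ref {
  return { type: 'reference', name };
}

// Accepts either a plain string or a ref as the group name; only the name is used.
function encodeCapture(pattern: string, options: { name: string | Ref }): string {
  const name = typeof options.name === 'string' ? options.name : options.name.name;
  return `(?<${name}>${pattern})`;
}

function encodeRef(reference: Ref): string {
  return `\\k<${reference.name}>`;
}

const myRef = ref('my');
const viaVariable = encodeCapture('\\d+', { name: myRef }) + encodeRef(myRef);
const viaString = encodeCapture('\\d+', { name: 'my' }) + encodeRef(ref('my'));

console.log(viaVariable); // (?<my>\d+)\k<my>
console.log(viaVariable === viaString); // true
```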

PaulJPhilp · 2024-04-01T23:37:02Z

Maciej: This is a different topic. As I mentioned, I am writing an article about the ts-regex-builder library. Here is a link to the first draft. I'd appreciate any feedback you can provide. ***@***.***_93037/from-ancient-hieroglyphics-to-modern-software-8bbc6012560f Thank you, Paul


mdjastrzebski · 2024-04-02T21:57:02Z

@PaulJPhilp The URL you posted seems to be obfuscated. Perhaps this is some security feature from GitHub.

PaulJPhilp · 2024-04-02T21:59:33Z

That's weird. Can you follow me on twitter so I can send it to you? Or, is there a better way? Paul


mdjastrzebski · 2024-04-03T14:59:47Z

Closing in favor of more restrictive syntax in #66.

mdjastrzebski · 2024-04-05T03:08:18Z

@PaulJPhilp I've followed you on Twitter, you can send the draft by DMs.

PaulJPhilp · 2024-04-05T04:06:16Z

Great, thanks. I'm just finishing a 2nd draft. I'll send that in the next day or two. Paul p.s. I'm going to update my PR (URL pattern) now that named capture is available. Might as well hold off on reviewing it now.


mdjastrzebski added 9 commits March 28, 2024 13:07

feat: named capture groups & backreferences

246e8fa

wip chore: more tests feat: improved refs refactor: merge name and ref refactor: self code review refactor: tweaks refactor: rename reference to ref feat: example with html tags chore: self code review

chore: tweak option names

00cd03c

refactor: tweak API

31b4990

refactor: tweaks

805a121

chore: fix exports

8ff0632

refactor: rename reference to ref

0ac8420

wip

b3dfcc3

refactor: cleanup old staff

021554a

docs: update docs

77c85f6

mdjastrzebski requested a review from Killavus March 28, 2024 13:37

mdjastrzebski closed this Apr 3, 2024
