@Raiondesu
Contributor

@Raiondesu Raiondesu commented Nov 20, 2025

Having a type-safe function is amazing and all, but sometimes all you need is just the types.

This PR exports the Parse type and introduces a separate ParseCaptures type for users who need only the regular (non-named) captures.

Motivation

Many libraries would benefit from being able to parse regexes at the type level from user-defined string types without doing any of the runtime work that typedRegExp does.
Also, many libraries that use regular expressions do not support named captures, so parsing named-capture syntax for them is wasted work. The new ParseCaptures type covers this case and simplifies DX.

Together, these improvements make it possible to integrate ts-regexp smoothly into other libraries.

@codpro2005
Owner

I'll take a look at this over the weekend. Thank you very much for your effort already!

@codpro2005
Owner

codpro2005 commented Nov 22, 2025

So, I just gave this a look and I really like the idea of exporting the Parse type! I do have some suggestions/questions:

  • Should ParseCaptures<T> include the full match as the first index to mimic the runtime behavior? It's really a question of perspective: the full match can be considered an implicit unnamed capture, or you may want to include only explicit captures. The advantage of including the full match is that each index would match its capture group number, as in the runtime match array.

  • I believe ParseCaptures<T> could be simplified to either export type ParseCaptures<T extends string> = Parse<T>['captures']; or export type ParseCaptures<T extends string> = Tail<Parse<T>['captures']>; depending on whether the full match should be included or not. This would be great for maintainability—not having to maintain 2 similar types—but I do wonder whether/how much of a negative impact on memory or performance my suggested change would have. It's worth noting that the type CaptureRecord already includes the parsed named captures, so perhaps it has a negative performance impact either way.

  • Also, perhaps it would make sense to have a separate ParseNamedCaptures<T> for consistency with other libraries that may want to handle named captures too.
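
For reference, the runtime behavior behind the first question can be sketched as follows. The pattern and the two tuple types are hand-written illustrations, not output of the library's Parse type.

```typescript
// Runtime behavior referred to above: exec() puts the full match at index 0
// and the explicit capture groups at indices 1..n.
const dateRe = /(\d{4})-(\d{2})/;
const dateMatch = dateRe.exec('2025-11'); // RegExpExecArray | null

const fullMatch = dateMatch?.[0]; // the whole matched substring
const yearPart = dateMatch?.[1];  // capture group 1
const monthPart = dateMatch?.[2]; // capture group 2

// Hand-written tuple types showing the two options under discussion
// (illustrative only; not produced by ts-regexp):
type WithFullMatch = [full: string, year: string, month: string];
type ExplicitOnly = [year: string, month: string];

// With the full match included, tuple index i corresponds to group i.
const aligned: WithFullMatch = ['2025-11', '2025', '11'];
// Without it, every index is shifted down by one.
const shifted: ExplicitOnly = ['2025', '11'];
```

Including the full match keeps `aligned[1]` in step with group 1 at runtime, while the explicit-only form needs an off-by-one adjustment.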

All of these are just ideas. I'm happy with the PR as is and can merge it already. Just leave a comment on what you think about these points and I can implement the changes accordingly.

@Raiondesu
Contributor Author

Thanks for the detailed feedback and suggestions! You raise good points.

> Should ParseCaptures<T> include the full match as the first index to mimic the runtime behavior?

When libraries forward captures from a regex to the user, it is usually for convenience: the regular expression acts as a makeshift parser, and the user defines the tokens to extract from capture groups for use in some other context, while the original input string is discarded. As a result, the following pattern appears in many libraries' code, where the first element (the full match) is omitted altogether:

// ...
// exec() returns null when nothing matches, so fall back to an empty array
const [/* full match, omitted */, ...matches] = regex.exec(str) ?? [];
// ...
return matches;

You're right, however, that this is inconsistent with the runtime behavior of the regex itself.
I propose correcting the type in favor of runtime correctness and exporting the Tail helper as a quick solution for the common use case.
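
A minimal sketch of that proposal, assuming Tail simply drops the first element of a tuple type; the library's actual definition may differ. The `explicitGroups` function and the example tuple are hypothetical names used only for illustration.

```typescript
// Assumed shape of a Tail helper: remove the first element of a tuple type.
type Tail<T extends readonly unknown[]> =
  T extends readonly [unknown, ...infer Rest] ? Rest : [];

// Hand-written example tuple: the full match followed by two explicit groups.
type ExecResult = [full: string, key: string, value: string];
// Resolves to a two-element tuple containing only the explicit groups.
type GroupsOnly = Tail<ExecResult>;

// Hypothetical runtime counterpart: return only the explicit groups,
// or an empty array when the pattern does not match.
function explicitGroups(regex: RegExp, input: string): string[] {
  const result = regex.exec(input);
  return result ? result.slice(1) : [];
}

const pair: GroupsOnly = ['a', '1'];
const found = explicitGroups(/(\w+)=(\w+)/, 'a=1');
const missing = explicitGroups(/(\w+)=(\w+)/, '???');
```

This way the exported capture type stays faithful to the runtime array, and consumers that discard the full match can opt in with `Tail<...>`.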

> I believe ParseCaptures<T> could be simplified ...

Indeed, I initially duplicated the definition from Parse<T> for performance reasons, as I didn't want TypeScript to iterate over the CaptureRecord an extra time. But now that you mention it, I see that by this stage all captures, including named ones, have already been parsed, so the (possible) extra work is negligible. My initial worries were overstated. To see any real performance gain, we'd need to skip parsing the named groups altogether at the TokenTree stage, which is a lot more complicated and not worth it at this point.
Simplifying the type then would mean making ParseCaptures<> into just an alias for Parse<>['captures'], which is still useful as:

  • it introduces an abstraction layer to maybe implement the aforementioned optimization in the future;
  • it is a convenience for the user, as they'd have no idea about the structure of the output from the Parse<> type.
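
The alias approach can be sketched roughly as below. `ParseStandIn` is a hypothetical placeholder for the library's Parse type: only the `captures` key appears in this discussion, while `namedCaptures` and `pattern` are assumed names, and `describeMatch` is an illustrative consumer.

```typescript
// Hypothetical stand-in for the result shape of Parse<T>.
// Only `captures` is named in the discussion; the other keys are assumptions.
type ParseStandIn<T extends string> = {
  pattern: T;
  captures: [full: string, ...groups: (string | undefined)[]];
  namedCaptures: Record<string, string | undefined>;
};

// Thin aliases: callers depend on these names, not on the result's structure,
// so the implementation behind them can change later.
type ParseCaptures<T extends string> = ParseStandIn<T>['captures'];
type ParseNamedCaptures<T extends string> = ParseStandIn<T>['namedCaptures'];

// A consumer typed only against the aliases.
function describeMatch<T extends string>(
  captures: ParseCaptures<T>,
  named: ParseNamedCaptures<T>,
): string {
  const [full, ...groups] = captures;
  return `${full}: ${groups.length} groups, ${Object.keys(named).length} named`;
}

const summary = describeMatch<'(a)(?<b>b)'>(['ab', 'a', 'b'], { b: 'b' });
```

Because `describeMatch` never indexes into the stand-in directly, the aliases could later be replaced by a cheaper implementation without touching consumer code.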

> Also, perhaps it would make sense to have a separate ParseNamedCaptures<T> for consistency with other libraries that may want to handle named captures too.

Absolutely, good point once again.


I'm preparing the changes as I write this response, so the update should follow shortly.

Add ParseNamedCaptures for parity
Add Tails to exported types to address the common use-case of omitting the input string from captures
@codpro2005
Owner

Oh wow, thanks for taking care of everything on your own — awesome job! Everything looks really well done, and I really appreciate the extra documentation and examples you added. I'll check it out now and aim to publish this version today.

If you come across any issues or think anything could be revisited for performance or benchmarking reasons, feel free to let me know. For now I'm more than happy with the initial state of this new feature.

@codpro2005 codpro2005 merged commit 0e51a72 into codpro2005:main Nov 23, 2025