This issue was moved to a discussion.

Shorthand character classes #1

Closed
SicroAtGit opened this issue Aug 14, 2021 · 1 comment
Labels
feature New feature

Comments

@SicroAtGit
Owner

In addition to character classes, there will also be shorthand character classes. However, I'm not quite sure yet which ones there should be and which characters they should cover.

According to this website, the different RegEx engines cover different characters in the shorthand character classes:
https://www.regular-expressions.info/shorthand.html
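As a small illustration of that variation, here is a sketch using Python's `re` module. Python is only a convenient example engine here, not the engine this issue is about, and the helper name `matches_digit` is made up for the example. By default Python's `\d` matches any Unicode decimal digit, and it is restricted to `[0-9]` only when the `re.ASCII` flag is set:

```python
import re

# A decimal digit outside the ASCII range (ARABIC-INDIC DIGIT THREE).
ARABIC_INDIC_THREE = "\u0663"


def matches_digit(ch: str, ascii_only: bool) -> bool:
    """Return True if the single character `ch` is matched by \\d."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\d", ch, flags) is not None


# Same pattern, same input, different answer depending on the mode.
unicode_result = matches_digit(ARABIC_INDIC_THREE, ascii_only=False)
ascii_result = matches_digit(ARABIC_INDIC_THREE, ascii_only=True)
```

So even within one engine, "\d for [0-9]" is a choice rather than a given.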

The current listing:

  • \d for [0-9]
  • \D for [^\d]
  • \t for the tab character
  • \r for carriage return (CR)
  • \n for linefeed (LF)
  • \f for form feed
  • \s for [ \t\r\n\f]
  • \S for [^\s]
  • \w for [A-Za-z0-9_]
  • \W for [^\w]
  • \h for [ \t]
  • \v for [\r\n\f]
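
The listing above can be written down as plain character sets, which makes it easy to check how the classes relate to each other. This is only a sketch in Python, not code from the engine. The names `POSITIVE` and `shorthand_set` are hypothetical, and taking the negated classes relative to 7-bit ASCII is an assumption, since the issue does not say which alphabet the engine works on:

```python
import string

# Assumption: negated classes are complemented against 7-bit ASCII.
ASCII = frozenset(chr(code) for code in range(128))

# The positive classes, exactly as given in the listing.
POSITIVE = {
    "d": frozenset(string.digits),                                # [0-9]
    "s": frozenset(" \t\r\n\f"),                                  # [ \t\r\n\f]
    "w": frozenset(string.ascii_letters + string.digits + "_"),   # [A-Za-z0-9_]
    "h": frozenset(" \t"),                                        # [ \t]
    "v": frozenset("\r\n\f"),                                     # [\r\n\f]
}


def shorthand_set(letter: str) -> frozenset:
    """Return the characters covered by the shorthand class \\<letter>."""
    if letter in POSITIVE:
        return POSITIVE[letter]
    if letter in ("D", "S", "W"):
        # \D, \S and \W are the complements of \d, \s and \w.
        return ASCII - POSITIVE[letter.lower()]
    raise ValueError(f"unknown shorthand class: \\{letter}")
```

With these definitions, `shorthand_set("s") == shorthand_set("h") | shorthand_set("v")` holds, so in this listing `\s` is exactly the union of `\h` and `\v`.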
SicroAtGit added the feature and help wanted labels on Aug 14, 2021
@tajmone
Contributor

tajmone commented Aug 14, 2021

According to this website, the different RegEx engines cover different characters in the shorthand character classes:
https://www.regular-expressions.info/shorthand.html

BTW, I've bought both RegexBuddy and RegexMagic from JGS (author of the website you linked), so if you need me to test some RegExs for you I'll happily do it. Both tools have a custom engine that emulates all versions of the major RegEx engines (so that you can test backward compatibility issues with any engine), plus the custom engine by JGS, which is very powerful (and also documented on the website).

One of these two programs also allows debugging a RegEx by breaking it down into its individual steps, in case you need to compare the expected behaviour of your code with the actual behaviour of other engines.

As for the shorthand classes to implement, it really depends on what your engine's goals are, which I'm guessing are mostly oriented toward lexer creation?

I'm not quite sure that \h and \v would be all that useful: vertical tabs are not used much in Western languages, and \t should suffice in place of \h. These shorthands also tend to have different meanings across engines.
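
To make the "different meanings" point concrete, here is a quick check against Python's `re` module, picked only as one example engine (the helper `compiles` is a made-up name). On Python 3.7 and later, `\h` is rejected as an unknown escape and `\v` is the single vertical tab character, whereas PCRE treats `\h` and `\v` as horizontal and vertical whitespace classes:

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \h is not a known escape in Python's re (an error since Python 3.7).
h_supported = compiles(r"\h")

# \v is a literal vertical tab (U+000B), not a "vertical whitespace" class.
v_matches_vertical_tab = re.fullmatch(r"\v", "\x0b") is not None
v_matches_linefeed = re.fullmatch(r"\v", "\n") is not None

# With re.ASCII, Python's \s also covers the vertical tab, unlike [ \t\r\n\f].
s_matches_vertical_tab = re.fullmatch(r"\s", "\x0b", re.ASCII) is not None
```

A pattern relying on `\h` or `\v` would therefore not be portable between such engines without translation.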

Some other useful shorthands can be found here:

I know that not all of the above qualify as character shorthands, since some of them are more abstract in nature, but still...

Repository owner locked and limited conversation to collaborators Aug 15, 2021
SicroAtGit removed the help wanted label on Aug 5, 2022
