Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: make helper-grammar-regex-collection more readable #282

Merged
merged 36 commits into from
Feb 17, 2017

Conversation

josephfrazier
Copy link
Member

@josephfrazier josephfrazier commented Feb 15, 2017

This uses XRegExp to allow the regular expressions in
helper-grammar-regex-collection to:

  1. be written in a whitespace-agnostic format
  2. to share common subpatterns like ['"]([^'"\s]+)['"]
  3. to only capture named groups, to avoid (?: everywhere

It takes advantage of template literals to allow the patterns to be
written as multi-line strings, but without having to double-escape
special characters.

@stefanbuck, what do you think? In my opinion, this makes the regex
syntax a bit more verbose, but much easier to read and modify.

XXX note that moving 'x' makes tests fail. Not sure why
Note that XRegExp.build automatically wraps it into a group when used as
a subpattern
It can match the empty string, and isn't captured anyway
This cuts dist/app.js from 724K back down to 580K
@stefanbuck
Copy link
Member

stefanbuck commented Feb 15, 2017

This looks awesome, well done! I'll take a closer look later.

Copy link
Member

@stefanbuck stefanbuck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing. I really like the subpattern concept, so genius. Added two minor comments, besides LGTM

const subpatterns = {};
const regex = regexBuilder(subpatterns);

subpatterns.captureQuotedWord = regex`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit odd to me to modify subpatterns after passing it to regexBuilder. Just my personal preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed up a refactor I had been working on, that addresses this by allowing normal template literal interpolation to be used, so that subpatterns isn't exposed outside of the regex tag function.

For what it's worth, this refactor was also done with the idea that I might eventually contribute it back to XRegExp. See slevithan/xregexp#103

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring, looks damn good 👍 It's great to see your contribution to other project as well. Hopefully they will respond soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Once we sort out the naming question, I'll merge it.

If XRegExp later decides to incorporate the tag function, we'll be able to use it directly instead. Otherwise, I could always publish a separate module to make the tag more generally available.

const RUST_CRATE = /(?:extern crate|use) ([^:; ]+)/g;
const PYTHON_IMPORT = /^\s*(?:import|from)\s([^\s]*)/gm;
const subpatterns = {};
const regex = regexBuilder(subpatterns);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result from regexBuilder is actually the regexBuilder, isn't it? Therefore I would prefer to rename regex to regexBuilder and regexBuilder to some different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the naming is a little strange here. My perspective on it is that regex is a function that takes a string and returns a RegExp, much like its constructor: RegExp('my pattern'), so the regex name makes sense to me.

However, regexBuilder is still a little awkward. While it does build the regex function, it still sounds like it might do the same thing as regex. Besides, I discovered there's already another regexBuilder in the codebase (lib/plugins/helper/regex-builder.js). I not sure what a better name would be, though. Some ideas:

  • regexTagBuilder
  • regexTagFunctionBuilder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just get rid of of the regexBuilder function, and have regex use the same flags every time (since the code calling it doesn't actually need to use different flags)? I just pushed up a commit showing what it could look like: a3a25ad?w=true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good and your explanation behind naming it regex makes sense to me as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just merged in the changes! I'm still working on learning why this is necessary, but I wanted to go ahead and land the rest of it, since it would conflict with future changes to the regular expressions.

EDIT: I reported a bug corresponding to the code linked above: slevithan/xregexp#162

Since we never actually use any other combination of flags, we can just
export `regex` with the flags already set, rather than exporting
`regexBuilder`.

See conversation here:
https://github.com/OctoLinker/browser-extension/pull/282/files/54d73ecb030eb4a3a3f29a5a4ccf331f4ecd0caa#r101380958
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants