Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pattern helper to make pattern-matching on ASTs more concise #2362

Merged
merged 3 commits into from
Nov 4, 2015

Conversation

alexdowad
Copy link
Contributor

While working on #2359, I used a very complex conditional with lots of calls like node.children[1].children[0].type == ... and so on. @bbatsov commented on this:

I don't like the usage of array references and children in general, as it really obscures what you're trying to do. Even I'm not sure about it, so people not familiar with parser will be extra confused.

This PR is an attempt to make that kind of code very concise. It introduces a pattern-matching language for AST nodes. Pass a pattern string to RuboCop::Pattern.new, it compiles it into a method which runs just as fast as if it was written in straight Ruby.

Here are some examples of what the pattern format looks like:

# Let's say we want to match this
Const = Class.new
# Or:
Const = Module.new

pattern = Pattern.new '(casgn _ const (send (const _ {:Class :Module}) :new ...))'
pattern.match(parser.parse(src)) # => true

Alphabetic identifiers match node types, () destructures a node, {} provides alternative patterns, any of which can match. _ is a wildcard, symbol and integer literals match themselves, ... matches any remaining children up to the end of a node.

Other features: ! negates the next part of the pattern, $ captures whatever it matches (returning it from #match if the overall match succeeds), and you can match against the last child of a node by preceding it with a ....

If you extend Astrolabe::Node with methods which have a _type? suffix, the prefix will become usable in Pattern. For example:

class Astrolabe::Node
  BASIC_LITERALS = [:str, :int, :float, :sym, :regexp, :nil, :true, :false]
  def basic_literal_type?
    BASIC_LITERALS.include?(type)
  end
end

Pattern.new '(send basic_literal ...)' # this will now work

Thoughts?

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from e05fab9 to 8af55f0 Compare October 28, 2015 20:38
@alexdowad
Copy link
Contributor Author

The Travis CI failure is nothing to do with the PR -- when testing under MRI 2.0.0, it failed to download and install the simplecov gem.

@jonas054
Copy link
Collaborator

It's a very interesting idea. I've played around with it a little bit, and I found that it produces code that's even more clear than the usual vars... = *node. I'm not sure about the performance implications, but it seems that you've thought about that, doing the main processing in the constructor.

To not lose anything in readability, we may want to allow named throw-away markers like _receiver instead of _, etc.

So... great work! It comes in at a late stage, but it's conceivable that we could gradually move away from other extraction techniques and start using the Pattern class.

@alexdowad
Copy link
Contributor Author

@jonas054 Thanks for your encouragement. I will make it support wildcards like _receiver, as you say.

The biggest thing missing right now is a way to do an "and" -- to say "both of these patterns must match at the same place in the tree".

@alexdowad
Copy link
Contributor Author

OK, we now have "named wildcards", like _receiver.

@alexdowad
Copy link
Contributor Author

A couple other adjustments which make Pattern more powerful: rather than being a wildcard itself, $ can now be prepended onto any pattern (including a wildcard) to capture what it matches.

Pattern now always returns true if the match succeeds; captures are yielded rather than returned.

@jonas054
Copy link
Collaborator

The new capture functionality (prepending $ to any pattern) is great, but I'm not convinced that yielding the captures improves usability. For example, I tried converting the line expr1, expr2 = *node in and_or.rb, and when I have to do the rest of the processing inside a block, two things change:

  • one step more indentation throughout the method
  • I must do an explicit return inside the block, because the implicit return is now the value of the block, i.e., true.

Also, it would be nice if $... gave me an array containing the rest of the nodes.

@alexdowad
Copy link
Contributor Author

Aha, I didn't think of $.... I will make that work as well.

The reason why I made it yield captures instead of returning them is that you can't use multiple assignment in a conditional. Any idea on how to make a conditional based on whether a pattern matches or not, but also get the captures into variables?

Maybe you can either use Pattern#match in a conditional, or get the captures, but not both? Or maybe each Pattern should have 2 methods, one which yields captures and one which returns them?

@alexdowad
Copy link
Contributor Author

OK, @jonas054, you can use $... now. I have added several specs to make sure it works correctly. You can also use $... with a final pattern at the end, like: (send _receiver $... int), to pick out a range of children up to, but not including, the last one.

I haven't changed the thing about yielding the captures; I'd like to hear your suggestion on how to do it better first.

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from 3e65f9a to 5171ea7 Compare October 30, 2015 13:19
@alexdowad
Copy link
Contributor Author

Another added feature: unification is now performed on named wildcards. So a pattern like (send _x :+ _x) will match 1 + 1, 2 + 2, and so on, but not 1 + 2.

@jonas054
Copy link
Collaborator

This is a diff on the previous commit, but I think you get the point. This worked for me:

       def yield_captures(captures)
         captures = (1..captures).map { |n| "capture#{n}" }.join(',')
-        "yield(#{captures}) if block_given?"
+        ['if block_given?',
+         "  yield(#{captures})",
+         %(elsif !"#{captures}".empty?),
+         "  return #{captures}",
+         'end'].join("\n            ")
       end

If a block is given, it works the same as now. If not, it returns the captures if the pattern contained anything to capture, otherwise it returns true. Failure to match always returns nil of course.

What do you think? Can you incorporate this functionality and add testing for it?

@alexdowad
Copy link
Contributor Author

@jonas054 OK, that seems like a good idea.

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from 0a55678 to 0193939 Compare October 30, 2015 13:38
@alexdowad
Copy link
Contributor Author

OK, you got it! Pattern#match now behaves as you suggested. And yes, I added tests (easy using shared examples).

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from 48b2406 to 7d388a1 Compare October 30, 2015 14:05
@jonas054
Copy link
Collaborator

To me it makes more sense to get a single node back when only one is matched, e.g.,

key = Pattern.new('(pair $_ _)').match(pair)

instead of

key, = Pattern.new('(pair $_ _)').match(pair)

@alexdowad
Copy link
Contributor Author

Another update. Now Pattern#match returns a bare value (not wrapped in an array) if you use only a single capture and don't provide a block.

@alexdowad
Copy link
Contributor Author

Coverage decreased by 0.001%! Horrors!

@jonas054
Copy link
Collaborator

Here's a failing example that I believe shows a bug:

      context 'within a sequence ending with ellipsis' do
        let(:pattern) { '(hash _ ...)' }
        let(:ruby) { '{}' }
        it_behaves_like :nonmatching
      end

It shouldn't match since the AST is (hash). The pattern (hash _) behaves correctly and doesn't match.

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 3, 2015

I finally got to take a closer look at this PR and things are looking pretty good if you ask me. I'm not sure about the performance, but the core functionality is pretty sound.

It's not necessary to get everything perfect the first time around, you know. :-) I'm guessing many good ideas will arise from actual usage of the patterns.

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 3, 2015

Another small thingy - maybe this should be named AstPattern or something like this for clarity's sake.

@alexdowad
Copy link
Contributor Author

Sure, we can do that. How about NodePattern?

I'm busy now but can do this within a couple hours.

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 3, 2015

Yeah, NodePattern sounds good.

# '(send ...)' # matches (send ...)
# '{send class}' # matches (send ...) or (class ...)
# '({send class})' # matches (send) or (class)
# '(send const)' # matches (send (const))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be # matches (send (const ...)).

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from 2e9af6d to de8bcb7 Compare November 3, 2015 08:36
@alexdowad
Copy link
Contributor Author

Renamed to NodePattern. Fixed documentation as suggested by @jonas054.

@jonas054
Copy link
Collaborator

jonas054 commented Nov 3, 2015

You should also rename the files to node_pattern.rb and node_pattern_spec.rb.

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 3, 2015

def_matcher should probably be def_node_matcher as well.

@alexdowad
Copy link
Contributor Author

And def_node_search?

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 3, 2015

Yeah - node everywhere!

@alexdowad alexdowad force-pushed the pattern branch 2 times, most recently from db50c20 to 994917b Compare November 3, 2015 16:00
@alexdowad
Copy link
Contributor Author

Updated according to latest feedback.

@jonas054
Copy link
Collaborator

jonas054 commented Nov 3, 2015

We haven't discussed def_node_search. What does it do?

@alexdowad
Copy link
Contributor Author

Sorry, I thought it would be obvious from the code. Give it a pattern string, it defines a method which searches a node's descendants and yields all the descendants which match the pattern.

@jonas054
Copy link
Collaborator

jonas054 commented Nov 3, 2015

OK. So it replaces Util#on_node then, I guess.

@alexdowad
Copy link
Contributor Author

I didn't know about #on_node. I see it has something which #def_node_search doesn't -- you can prevent it from recursing down past a certain type of node. Not sure how to deal with that.

@alexdowad
Copy link
Contributor Author

Added method documentation to def_node_matcher and def_node_search.

bbatsov added a commit that referenced this pull request Nov 4, 2015
Add Pattern helper to make pattern-matching on ASTs more concise
@bbatsov bbatsov merged commit 7b5265c into rubocop:master Nov 4, 2015
@bbatsov
Copy link
Collaborator

bbatsov commented Nov 4, 2015

I'll merge this so we can get back to #2359 and evolve this more organically down the road.

Thanks for the hard work, @alexdowad!

@jonas054
Copy link
Collaborator

jonas054 commented Nov 4, 2015

Good idea @bbatsov!
Fantastic work @alexdowad!

@bbatsov
Copy link
Collaborator

bbatsov commented Nov 4, 2015

P.S. I plan to do a release around the end of the week (e.g. Friday).

@alexdowad alexdowad deleted the pattern branch November 4, 2015 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants