
Support for Cypher DSL #110

Closed
andreasronge opened this Issue December 15, 2011 · 14 comments

Andreas Ronge

Design Goals

  • Should be easy to understand how the DSL maps to the Cypher language
  • Should make it easy to use prefixed Neo4j.rb relationships and lucene indices
  • Should have a Rubyish syntax

Using Strings

The DSL should support using strings that are sent directly to the Cypher Java Api.

Neo4j.query do
  START "foo=node(1)"
  MATCH "foo-[foo_bar]->bar"
  RETURN "bar, foo_bar"
end

It might be a bit controversial to have method names with capital letters like START, but I think it's easier to read.

Using Hashes and Bindings

The DSL also supports hashes instead of strings, and the two styles can be mixed.
Example:

Neo4j.query do
  START :foo => node(1)
  # following is same as MATCH "foo-[foo_bar]->bar"
  MATCH foo.outgoing.as(:foo_bar).to(:bar)
  RETURN bar, foo_bar
end

The symbols above define instance variables which can be used later.
For example, START :foo => node(1) will create an instance variable foo which can be used later.
This means that we can actually check the syntax of the DSL and make sure that all variables are defined.
The second line, MATCH foo.outgoing.as(:foo_bar).to(:bar), will use the instance variable foo and
define two new instance variables, foo_bar and bar, which are used in the RETURN method.
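
The variable check described above could work roughly as in the following minimal sketch. It is not the Neo4j.rb implementation: QuerySketch, query_sketch and the node helper are hypothetical names, and only START and RETURN are covered. Names declared by START are remembered, and any bare word that was never declared raises an error.

```ruby
# Hypothetical sketch: none of these class or method names come from Neo4j.rb.
class QuerySketch
  def initialize
    @vars = {}     # variable names declared so far
    @clauses = []  # generated Cypher clauses, in call order
  end

  # node(1, 2) => "node(1,2)"
  def node(*ids)
    "node(#{ids.join(',')})"
  end

  # START :foo => node(1) declares the variable foo.
  def START(bindings)
    bindings.each do |name, expr|
      @vars[name] = true
      @clauses << "START #{name}=#{expr}"
    end
  end

  def RETURN(*names)
    @clauses << "RETURN #{names.join(', ')}"
  end

  # Bare words such as `foo` end up here because no such method exists.
  def method_missing(name, *args)
    return name if args.empty? && @vars.key?(name)
    raise NameError, "undefined query variable: #{name}"
  end

  def respond_to_missing?(name, include_private = false)
    @vars.key?(name) || super
  end

  def to_s
    @clauses.join(' ')
  end
end

# Evaluates the block with a fresh QuerySketch as self.
def query_sketch(&block)
  q = QuerySketch.new
  q.instance_eval(&block)
  q
end

q = query_sketch do
  START :foo => node(1)
  RETURN foo
end
puts q.to_s
```

Using an undeclared name, for example RETURN bar without a matching START or MATCH, raises a NameError while the block is evaluated, before anything is sent to Cypher.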

Using Method Chaining

For simpler queries method chaining is supported.
Example of using method chaining instead of the block above:

Neo4j.query.start{:foo => node(1)}.match{foo.outgoing.as(:foo_bar).to(:bar)}.returns{bar, foo_bar}

Starting from any node

Method chaining will probably be used more when you already have the start node, since the query will be simpler.

node = Person.find(:name => 'andreas')
node.query.match{friends.to(:bar)}.returns{friends.since, bar}

We know the declared relationships on the node, so we don't have to specify outgoing but can instead use friends as shown in the example above.

Optional Relationships

Neo4j.query do
  START "a=node(2)"
  MATCH "a-[?]->x"
  RETURN "a,x"
end

Same as

Neo4j.query do
  START :a=>node(2)
  MATCH a.outgoing?.to(:x)
  RETURN a, x
end

With types

Neo4j.query do
  START :a=>node(2)
  MATCH a.outgoing?.as(:foo)
  RETURN a, foo
end

Using Lucene index

Neo4j.query do
  START :user => index(Person, :name => "Anakin"),
               "foo=node(42)"
  # etc ...
end

We will then know which type of index should be used (node/relationship and exact or fulltext).

Using Relationship

Neo4j.rb can have prefixed relationships, for example:

class Actor
  include Neo4j::NodeMixin
  has_n(:acted_in).to(Movie)
end

This example will prefix the acted_in relationship as Actor#acted_in

Neo4j.query do
  START :actor => :Actor(:name => "Anakin")
  MATCH actor.outgoing(Actor.acted_in).as(:foo_bar).to(:bar)  
  # or maybe this since we know it is an Actor
  MATCH actor.acted_in.as(:foo_bar).to(:bar)  
end

Multiple Relationships

The following query:

Neo4j.query do
  START "john=node:Person_exact(name = 'John')"
  MATCH "john-[:friend]->()-[:friend]->fof"
  RETURN "john, fof"
end

can also be written as this:

Neo4j.query do
  START :john => index(Person, :name => 'John')
  MATCH john.outgoing(:friend).to().outgoing(:friend).to(:fof)
  RETURN john, fof
end

Where

The following query

Neo4j.query do 
  START "user=node(5,4,1,2,3)"
  MATCH "user-[:friend]->follower"
  WHERE "follower.name =~ /S.*/"
  RETURN "user, follower.name"
end

can also be written like this

Neo4j.query do 
  START :user=>node(5,4,1,2,3)
  MATCH user.outgoing(:friend).to(:follower)
  WHERE follower.name == /S.*/
  RETURN "user, follower.name"
end

Yes, this is actually possible but maybe a bit crazy.
The follower object implements the method_missing method and returns an object which overloads
the == operator on the object.

This also means that we get validation of the RegExp (but it might have a different syntax in Java, so we should also allow a String as a regexp somehow).
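
A minimal sketch of the method_missing plus == trick, to show that it really is possible in plain Ruby. All class names here (NodeVarSketch, PropertySketch, ConditionSketch) are hypothetical and not part of Neo4j.rb; the sketch only renders a WHERE condition as a string.

```ruby
# Hypothetical names throughout; this is not the Neo4j.rb implementation.
class ConditionSketch
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

class PropertySketch
  def initialize(path)
    @path = path
  end

  # `==` is an ordinary method in Ruby, so it can return a condition
  # object instead of true/false.
  def ==(other)
    case other
    when Regexp then ConditionSketch.new("#{@path} =~ /#{other.source}/")
    when String then ConditionSketch.new("#{@path} = \"#{other}\"")
    else ConditionSketch.new("#{@path} = #{other}")
    end
  end
end

class NodeVarSketch
  def initialize(name)
    @name = name
  end

  # follower.name is not a real method; build a property reference instead.
  # Conversion methods such as to_ary are left alone.
  def method_missing(prop, *args)
    return super if !args.empty? || prop.to_s.start_with?("to_")
    PropertySketch.new("#{@name}.#{prop}")
  end

  def respond_to_missing?(prop, include_private = false)
    !prop.to_s.start_with?("to_") || super
  end
end

follower = NodeVarSketch.new(:follower)
puts((follower.name == /S.*/).to_s)  # follower.name =~ /S.*/
```

One cost of this design is that a redefined == no longer returns a boolean, so such objects should not be compared or looked up in collections like ordinary values.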

Setting depth

Neo4j.query do
  START :actor => :Actor(:name => "Anakin")
  MATCH actor.outgoing(Actor.acted_in).as(:foo_bar).to(:bar).depth(:any)
  # or 
  #  MATCH actor.acted_in.as(:foo_bar).to(:bar).depth(1..3)
end

This is just an early draft. Feedback is very welcome!

Andres Taylor

Looks really cool. Great job!

Dmytrii Nagirniak
Collaborator

What I'm missing now is the ability to start the query from a known node.

So instead of Neo4j.query{ START :user => node(user.neo_id) } I'd rather write user.query....

The other thing is that this DSL seems to be too verbose. Cypher is a one-liner most of the time.

This DSL advocates multiple lines even for the simplest queries.

I would prefer something like blog.query.match { posts(:p).comments(:c) }.return { c }.

Andreas Ronge

Yes, we should support starting from any node as well. I've updated the issue above.
I'm not sure that Cypher queries will be one-liners most of the time.
Here is an example of a typical cypher query I use in a project:

  START n=node:admin_Facility_exact("name:*")
  MATCH (n)<-[:subfacilities*0..3]-()<-[:uses]-(p), (n)<-[:uses]-(o)<-[:member_of]-(p2)
  WHERE p._classname = "Person" AND p.number != p2.number
  RETURN n.name, p.name, p2.name

This also brings up another issue: WHERE p._classname = "Person" is probably going to be a common thing to do.
We should make it easier to express that.

I think we want to chain methods when you already have a start node, but use the block DSL when you don't have a start node or need to write more complex queries.

Andreas Ronge

It should also be easier to express things like the line below and avoid using lucene index to find all instances of a class.

START n=node:admin_Facility_exact("name:*")

Instead we can use our class node (rule node) and traverse to find all instances.

Maybe it can be expressed like this instead:
Facility.query_all{..}

Dmytrii Nagirniak
Collaborator

Ok. It all makes sense. But my 2 cents:

  • WHERE p._classname = "Person": I'm also pretty sure it will be a common use case.
  • We should NOT use ALL UPPER CASE as it is supposed to be a constant in Ruby. 100% against it. It doesn't make sense to go against the language as Cypher itself is not case-sensitive (at least you can write lowercase start, where, return etc).
  • Facility.query_all {...} - I think we should provide the entry-point into the DSL the same way no matter where you start. I would think Neo4j.query..., my_facility.query..., Facility.query... etc. would be reasonable.

Also my 2 cents re DSL:

  • ...match{foo.outgoing.as(:foo_bar)} I would rather write ...match{ foo.foo_bar } (the foo_bar indicates outgoing relationship by default, but we could change it to incoming: foo.foo_bar(:in)).

The foo_bar should be generated based on the existing relationship on the Model. So if you misspell it, it would raise.

But we shouldn't probably try to make it perfect the first time.
We'll see more common patterns only when we start using it (I don't have enough of those yet).

Cheers.

Dmytrii Nagirniak
Collaborator

Another thing to remember is that we should be able to pass in parameters natively: blog.query.where(since) { |since| created_at >= since }

There's a lot to learn from the similar API in squeel.

Andreas Ronge

Yes, that would be nice.
But it requires a lot of operator overloading, which is a bit limited (not all Ruby operators can be overloaded).
{ created_at >= since } should be translated into a string "something.created_at >= since".
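
A rough sketch of that translation, under stated assumptions: ExprSketch is a hypothetical class, symbols are rendered as Cypher parameters in the {name} form, and & and | stand in for AND and OR because Ruby does not allow &&, ||, and, or, not to be redefined.

```ruby
# Hypothetical sketch; class and method names are illustrative only.
class ExprSketch
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Comparison operators are plain methods and can be overloaded.
  %w[>= <= > <].each do |op|
    define_method(op) do |other|
      ExprSketch.new("#{text} #{op} #{ExprSketch.render(other)}")
    end
  end

  # `&&` and `||` cannot be overloaded, so `&` and `|` are used instead.
  def &(other)
    ExprSketch.new("(#{text}) AND (#{other.text})")
  end

  def |(other)
    ExprSketch.new("(#{text}) OR (#{other.text})")
  end

  # Symbols become query parameters, other values are inlined as literals.
  def self.render(value)
    case value
    when ExprSketch then value.text
    when Symbol then "{#{value}}"
    else value.inspect
    end
  end
end

created_at = ExprSketch.new("post.created_at")
rating = ExprSketch.new("post.rating")
puts ((created_at >= :since) & (rating > 3)).text
```

Because & binds tighter than the comparison operators in Ruby, each comparison has to be wrapped in parentheses, which is one of the readability costs of this approach.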

Have to do some more thinking and study the squeel syntax.

Regarding .match{foo.outgoing.as(:foo_bar)} that means traverse any outgoing relationship, not just the foo_bar relationship types.
But we should support foo.foo_bar as well.

Yes, I know having method names in upper case is controversial. But we sort of build our own language and I just thought it was easier to read. But I'm willing to change to lower case anyway, since I know it might upset people :-)

Andreas Ronge

Regarding Facility.query_all - we can instead add a query method on traversals, e.g. Facility.all.query which means traverse from the Rule/Class node with the _all relationship type. (http://neo4j.rubyforge.org/guides/rules_and_functions.html)
If you create your own rules (like scope in active record) it's possible to combine rules with Cypher.

For example

class Facility < Neo4j::Model
  rule(:ready) { ...}
  rule(:used) { !used_by.empty? }
end

To query all Facilities that are both ready and used we can do this:
Facility.ready.used.query{ ...}
This will also narrow down the scope of the traversals and make them faster.

Andreas Ronge

How about using the >>, << and <=> operators ?
Example

Neo4j.query do
  # Same as: start "foo=node(1)"
  start(:foo) = node(1)

  # Related nodes: match (n) -- (x) 
  match foo <=> x

  # match "foo-[:foo_bar]->bar"
  match foo >> [:foo_bar] >> :bar # or foo >> ":foo_bar" >> bar

  # match "foo-[foo_bar]->bar"
  match foo >> [any, :foo_bar] >> :bar  # or foo >> "foo_bar" >> bar

  # match "foo-[*]->bar"
  match foo >> [any] >> :bar # same as foo >> [] >> :bar, or foo >> bar

  # Variable length relationships
  # match a-[:KNOWS*1..3]->x
  match a >> [:knows, 1..3] >> x # or a >> ":knows*1..3" >> x

  # MATCH (n)-[r:friends]->()
  match n >> [:r, :friends] # (bind outgoing friends to variable r)

  # match (a)-[:KNOWS]->(b)-[:KNOWS]->(c)
  match a >> [:knows] >> b >> [:knows] >> c
end
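
To show that the >> form is implementable, here is a minimal sketch. PathSketch is a hypothetical class, not the neo4j-core implementation, and it only handles two of the forms above: a typed relationship such as [:knows] and a variable-length one such as [:knows, 1..3]. A node following another node directly is joined with -->.

```ruby
# Hypothetical sketch; not the neo4j-core implementation.
class PathSketch
  def initialize(start)
    @parts = [start.to_s]
  end

  # Arrays are read as relationship specs, anything else as a node name.
  def >>(other)
    if other.is_a?(Array)
      @parts << "-[#{PathSketch.rel(other)}]->"
    else
      # No relationship spec was given, so match any outgoing relationship.
      @parts << "-->" unless @parts.last.end_with?("->")
      @parts << other.to_s
    end
    self
  end

  def to_s
    @parts.join
  end

  # [:knows] => ":knows", [:knows, 1..3] => ":knows*1..3"
  def self.rel(spec)
    type, range = spec
    text = ":#{type}"
    text += "*#{range.min}..#{range.max}" if range.is_a?(Range)
    text
  end
end

path = PathSketch.new(:a) >> [:knows] >> :b >> [:knows] >> :c
puts path.to_s  # a-[:knows]->b-[:knows]->c
```

Since >> is left-associative, the chain is built from left to right on the same object, so the start node has to be wrapped once and everything after it can stay a plain symbol or array.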

Dmytrii Nagirniak
Collaborator

I like it. We could also allow simple strings and relations:

match "a-[:plain_cypher_string]->b"
match a >> Person.friends >> x

But we can also use > together with >> to convey a slightly different meaning. (Maybe > would mean a depth of 1, while >> would mean unlimited, or similar.)

Not thinking about the details right now, just the DSL.

Also something like this would be nice:

# self is a Company model

def all_companies
  query do
    start(:u) # The value should be implied as `self`
    match c >> Company.groups >> UserGroup.participations >> User.participations >> u
    return distinct c
  end
end

This is actually what I currently have (the ugly):

def all_companies
  res = Neo4j.query("""
      START u=node({s})
      MATCH c-[:`#{Company.groups}`]->()-[:`#{UserGroup.participations}`]->()-[:`#{User.participations}`]->u
      RETURN distinct c
    """, 's'=>neo_id)
  res.map{|r| r['c'].wrapper }
end

Peter Ehrlich

Hello! I've been building up my own cypher library which begins to implement many of the things here. It also has some pretty cool innovations, which I'll show.

You can see the source, here: https://github.com/pehrlich/neo4j_helper/blob/master/lib/neo4j_helper/cypher.rb

# post.rb
def comments 
  self.cypher.match("(self)-[:older_comment*1..#{count}]->(comment)").returning(:comment)
end

# posts_controller.rb
def comments
  render json: Post.comments.paginate(pagination)
end

# application controller (inspired by github)
def pagination
  out = {}
  out[:per_page] = params[:per_page] if params[:per_page]
  out[:page] = params[:page] if params[:page]
  out[:skip] = params[:skip] if params[:skip] # unused
  out
end

chained methods

Allows match, where, limit, etc. Doesn't yet support duplicates. Allows order, limit, returning, and so on to be applied in any order.

Starts at self

I found myself repeating this pattern a lot: "self = node(#{self.id})". So I moved cypher to belong to the model, and made that an assumed default if the start method is not called.
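
The two ideas above (clauses callable in any order, and a default start at self) can be sketched as follows. This is not the code from neo4j_helper: ChainSketch and to_cypher are hypothetical names, and the sketch only builds the query string. Each clause is stored by kind and emitted in Cypher's fixed clause order, so a repeated call overwrites the earlier one, which matches the "no duplicates" limitation.

```ruby
# Hypothetical sketch; not the code from neo4j_helper.
class ChainSketch
  # Order in which Cypher expects the clauses.
  ORDER = [:start, :match, :where, :return, :order, :skip, :limit].freeze
  KEYWORDS = { :order => "ORDER BY" }.freeze

  def initialize(self_id)
    # Default START clause, used unless #start is called.
    @clauses = { :start => "self=node(#{self_id})" }
  end

  [:start, :match, :where, :order, :skip, :limit].each do |clause|
    define_method(clause) do |text|
      @clauses[clause] = text.to_s
      self
    end
  end

  def returning(*names)
    @clauses[:return] = names.join(", ")
    self
  end

  def to_cypher
    present = ORDER.select { |c| @clauses.key?(c) }
    present.map { |c| "#{KEYWORDS.fetch(c, c.to_s.upcase)} #{@clauses[c]}" }.join(" ")
  end
end

query = ChainSketch.new(7).limit(10).returning(:comment)
query = query.match("(self)-[:older_comment*1..3]->(comment)")
puts query.to_cypher
```

Keeping the builder lazy, so that it produces a string only when to_cypher is called, is what would let something like paginate add skip and limit after returning has already been called.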

returning content

  • Everything comes out as hashes of ruby objects with symbolized keys. No bombarding beginners with java objects.
  • There are two ways to get content out.
    • The simpler is the #mapped method, which accepts symbols and returns all the results as hashes with those symbols as keys. This would replace #returning in the above.
      • returning..paginate: With #returning, a query object is returned rather than data, allowing lazy querying.

Next Up

  • I'm not happy with how data is returned from the query. It is possible to return multiple objects from the query. The current implementation is a hack for prefetching data (i.e., what Arel's :include parameter does right). I'm going to explore two solutions:
  • The good one: storing the data behind the scenes, so that if you set returning(:comments, :post), then when you call comment.post, the post would already be fetched. I just thought of this this morning and haven't checked to see if this is already implemented.
    • The above implementation could be quite complicated under the hood, as to be perfect it would need to detect whether two differently worded queries would deliver the same results. I don't know how easy this is.
  • The ok one: Allow #returning to receive a block. This block receives a hash of the returned row, and is used to format that row into a desirable shape. I'm thinking something like this:
# return a comment for rendering by the view.  A more proper use might be instantiating a 
# container class with the comment and rel.
.returning(:comment, :rel) do |comment, rel|
    comment.to_json[:voted] = rel.voted?
end
  • Named scopes would fit very nicely, and probably be easy to make.

Regarding other stuff: I'm not so sure I'm a fan of replacing Cypher syntax with a Ruby equivalent. ALL the learning materials on the web are currently in Cypher, and as a language it is not so bad to learn or read. With a few simple methods like I've shown, it's easy to remove some of the boilerplate, leaving the user to focus on the most expressive bits. Changing these, I fear, would limit the usage of the language. It is a WIP itself and rapidly changing, and not supporting everything is as good as supporting nothing; having it forces another decision to be made and another syntax to learn for anyone starting with neo4j. But... prove me wrong!

Andreas Ronge
Owner

I've started to implement the DSL, see README https://github.com/andreasronge/neo4j-core
I think it will be great.

Andreas Ronge andreasronge closed this March 28, 2012
dre-hh

it is grreat!

Andreas Ronge

thanks
