Mixed Types [Rough Draft Concept] #3

Open
Shroder opened this Issue Dec 7, 2011 · 0 comments

@Shroder
Owner

I watched "The Graph Traversal Programming Pattern" from the Gremlin site, and I'm going to try to connect my understanding of those concepts to this topic. It was very interesting to watch; thank you for sharing, Alan.

Some parts of the presentation were a little confusing, but this is what I gathered:
Graph DBs use vertices and edges; we use the terms nodes and links. Conceptually these are the same, although the implementation is different.
Graph DBs use indices differently: there is no separate index lookup, because the index is effectively contained within the vertices themselves.
Vertices have incoming ("in") edges and outgoing ("out") edges.
Traversals are meant to be able to move through vertices of different types (e.g., move through nodes of different types by edge).
I think Marko's comparison to MySQL may be flawed. From what he described, it sounded like there was only one node table and one lookup table. Traversing that kind of model requires numerous self-joins, which would kill the database. Even if the data were broken out into separate tables, though, I imagine performance would still be very poor. Multi-level traversal is obviously a weak point for a relational database.
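
To make the in-edge/out-edge idea concrete, here is a minimal Python sketch (Python purely for illustration, not the Mesh's own API). The `Vertex`, `Edge`, `connect`, `out`, and `in_` names are hypothetical and not taken from Gremlin or the Mesh. The only point is that each vertex carries its own adjacency lists, so a traversal step reads them directly instead of consulting a separate index.

```python
class Vertex:
    """A vertex that stores its own adjacency lists (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.out_edges = []  # edges leaving this vertex
        self.in_edges = []   # edges arriving at this vertex


class Edge:
    """A labeled, directed edge from `tail` to `head`."""

    def __init__(self, label, tail, head):
        self.label = label
        self.tail = tail
        self.head = head


def connect(tail, label, head):
    """Create an edge and register it on both of its endpoints."""
    edge = Edge(label, tail, head)
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


def out(vertex, label):
    """Follow outgoing edges with the given label."""
    return [e.head for e in vertex.out_edges if e.label == label]


def in_(vertex, label):
    """Follow incoming edges with the given label, back to their tails."""
    return [e.tail for e in vertex.in_edges if e.label == label]


marko, josh, lop = Vertex("marko"), Vertex("josh"), Vertex("lop")
connect(marko, "knows", josh)
connect(josh, "created", lop)

# Two hops: marko -> knows -> created
projects = [p.name for friend in out(marko, "knows") for p in out(friend, "created")]
print(projects)  # ['lop']
```

Each hop only touches the edges stored on the vertices it has already reached, which is why multi-level traversal stays cheap in this model compared with repeated self-joins.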

Regarding Alan's e-mail on Gremlin and moving to Java: I think it's a good idea to push in that direction. One thing I'm curious about is whether there is any room for a traditional RDBMS in the world of Graph DBs. InfoGraph comes to mind, but I don't know how they are using external graphs with their own. Either way, I hope the Mesh remains conceptually independent of any particular database. I know that's not where it is at the moment.

It looks like the API already has some support for edges.

For example:

gremlin> // let's traverse to marko's 30+ year old friends' created projects
gremlin> v.out('knows').filter{it.age > 30}.out('created').name
==>ripple
==>lop

So here we get the names of all projects created by Marko's friends who are over 30. In the Mesh we would have something like this:

$marko = new Node("People", 1234);
$marko->People("gt:age:30", "rel:knows", "dir:LTR")->Projects("rel:created", "dir:RTL");
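
As a rough illustration of what both chains compute, here is a small Python sketch of the same three steps: follow `knows`, filter by age, follow `created`. The data is an assumption chosen so the result matches the Gremlin output quoted above, and `out` and `friends_projects` are hypothetical helpers, not part of Gremlin or the Mesh.

```python
# Assumed toy data, picked to reproduce the Gremlin output quoted above.
ages = {"marko": 29, "vadas": 27, "josh": 32}

# Outgoing edges keyed by (vertex, edge label).
out_edges = {
    ("marko", "knows"): ["vadas", "josh"],
    ("josh", "created"): ["ripple", "lop"],
}


def out(vertices, label):
    """Collect the heads of all `label` edges leaving the given vertices."""
    result = []
    for vertex in vertices:
        result.extend(out_edges.get((vertex, label), []))
    return result


def friends_projects(start, min_age):
    """Projects created by the friends of `start` who are older than `min_age`."""
    friends = out([start], "knows")                   # v.out('knows')
    older = [f for f in friends if ages[f] > min_age]  # .filter{it.age > 30}
    return out(older, "created")                      # .out('created')


print(friends_projects("marko", 30))  # ['ripple', 'lop']
```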

Where this becomes a problem is when we're dealing with multiple node types. Say we're writing an app for Amazon where we want to get all media that someone likes, and for some reason we have our media split up into different tables named Books, Magazines, CDs, and DVDs (otherwise this would be really easy). This is much more complicated with the Mesh API, since it is focused on types rather than on how they relate. We could either create a new API to handle relationships or work it into the current API.

If we altered the existing API, it may just be a matter of adding the ability to link to a collection of mixed types, or of creating a dummy node type. The edge filter is already there, so we don't need to worry about implementing it; however, I am suggesting a change to the plugin, as shown in the following example.

// What feels most natural:
$marko->People("gt:age:30", "edge:out:LIKE")->Media;
// But it could also be:
$marko->People("gt:age:30", "edge:out:LIKE")->Collection($predefinedMediaCollection);

The collection could be defined on the fly by:
1.) Registering the custom cluster as a type

$cluster = new Cluster("Books", "Magazines", "CDs", "DVDs");
// or, with a filter per type:
$cluster[] = new Cluster("Books", "eq:genre:SCI");
$cluster[] = new Cluster("Magazines", "eq:genre:GOVT");

// $marko is a single People node, as in the earlier example
$marko = new Node("People", 1234);
$marko->registerType("Media", $cluster);
$marko->People("gt:age:30", "edge:out:LIKE")->Media;
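
One way to picture what registering a cluster as a type might do internally is the Python sketch below. It is only a model under stated assumptions: the `Cluster` and `TypeRegistry` classes, the dictionary node layout, the sample titles, and the `SciFiBooks` type name are all hypothetical and do not describe the Mesh's actual implementation. A cluster is treated as a set of concrete type names, each with an optional filter, and the registered name simply selects the nodes the cluster accepts.

```python
class Cluster:
    """A group of concrete node types, each with an optional filter (hypothetical)."""

    def __init__(self, *members):
        # Each member is a type name, or a (type name, predicate) pair.
        self.filters = {}
        for member in members:
            if isinstance(member, str):
                self.filters[member] = None
            else:
                type_name, predicate = member
                self.filters[type_name] = predicate

    def accepts(self, node):
        """True if the node's type belongs to the cluster and passes its filter."""
        if node["type"] not in self.filters:
            return False
        predicate = self.filters[node["type"]]
        return predicate is None or predicate(node)


class TypeRegistry:
    """Maps a virtual type name such as "Media" to a cluster (hypothetical)."""

    def __init__(self):
        self.clusters = {}

    def register_type(self, name, cluster):
        self.clusters[name] = cluster

    def select(self, name, nodes):
        """Keep only the nodes that the named cluster accepts."""
        cluster = self.clusters[name]
        return [node for node in nodes if cluster.accepts(node)]


# Nodes assumed to have been reached over a LIKE edge.
liked = [
    {"type": "Books", "title": "Dune", "genre": "SCI"},
    {"type": "Books", "title": "Emma", "genre": "ROM"},
    {"type": "DVDs", "title": "Alien", "genre": "SCI"},
    {"type": "People", "title": "josh", "genre": None},
]

registry = TypeRegistry()
registry.register_type("Media", Cluster("Books", "Magazines", "CDs", "DVDs"))
registry.register_type("SciFiBooks", Cluster(("Books", lambda n: n["genre"] == "SCI")))

print([n["title"] for n in registry.select("Media", liked)])       # ['Dune', 'Emma', 'Alien']
print([n["title"] for n in registry.select("SciFiBooks", liked)])  # ['Dune']
```

Keeping the per-type filter inside the cluster means the chain itself only needs the registered name, which matches the "most natural" form of the call.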

2.) Defining the mixed-type cluster in the chain:

$marko->People("gt:age:30", "edge:out:LIKE")->Cluster("...");

This still leaves the question of how to fetch all vertices at the other end of an edge, whatever their type. Something like this could be used:

$marko->People("gt:age:30", "edge:out:LIKE")->Any;
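
A possible reading of `Any`, sketched in Python under the assumption that it ignores node type entirely and returns whatever sits at the other end of the edge. The edge list, the sample titles, and the `any_over_edge` helper are hypothetical. Grouping the result by type is one design option, so the caller can still tell Books from DVDs afterwards.

```python
from collections import defaultdict

# Hypothetical edge list: (tail, edge label, head node).
edges = [
    ("josh", "LIKE", {"type": "Books", "title": "Dune"}),
    ("josh", "LIKE", {"type": "DVDs", "title": "Alien"}),
    ("josh", "OWNS", {"type": "CDs", "title": "Kind of Blue"}),
    ("vadas", "LIKE", {"type": "Magazines", "title": "Wired"}),
]


def any_over_edge(source, label):
    """Return every node reached from `source` over `label`, grouped by node type."""
    grouped = defaultdict(list)
    for tail, edge_label, head in edges:
        if tail == source and edge_label == label:
            grouped[head["type"]].append(head["title"])
    return dict(grouped)


print(any_over_edge("josh", "LIKE"))  # {'Books': ['Dune'], 'DVDs': ['Alien']}
```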