Skip to content

Basic Graph Traversals

xedin edited this page Sep 13, 2010 · 51 revisions

Gremlin serves as a host language to XPath 1.0. Gremlin wraps XPath with language constructs that make it Turing-complete as well as oriented towards graph traversals. XPath is generalized from its intended use for traversing a document object model (tree structure). Gremlin applies XPath to graphs with an arbitrary topology. This section will present basic graph traversals by way of examples on the simple property graph diagrammed below.

gremlin> $_g := tg:open()                         
==>tinkergraph[vertices:0]
gremlin> g:load('data/graph-example-1.xml')       
==>true
gremlin> $_ := g:id('1')
==>v[1]

In Gremlin, there exists a root list. This list can be a singleton. For example, assume a root list with vertex 1 from the above diagram as its only element. The root can be queried in Gremlin using a directory structure-like notation. More specifically, XPath notation. For example, the current root is identified with the following expression.

gremlin> .
==>v[1]

The symbol v denotes that the element is a vertex and 1 denotes the elements unique identifier. To determine all of the outgoing edges from the root, the following statement suffices.

gremlin> ./outE
==>e[7][1--knows-->2]
==>e[8][1--knows-->4]
==>e[9][1--created-->3]

As a convenience, Gremlin prints the outgoing and incoming vertex identifiers along with the edge label. To acquire the vertices at the head of these edges (known as the incoming vertices), apply another segment in the path.

gremlin> ./outE/inV
==>v[2]
==>v[3]
==>v[4]

It is important to note that in Gremlin, vertices are adjacent to edges and edges are adjacent to vertices. The reason for this will become apparent later when making use of element properties in path expressions. The reserved terms for denoting adjacency selection are outE, inE, bothE, outV, inV, and bothV (see Cheat Sheet). The components of a property graph are diagrammed in the example sub-graph below.

The process of traversing a graph, in this manner, can continue indefinitely (granted, if there are loops in the graph).

gremlin> ./outE/inV/outE/inV
==>v[3]
==>v[5]

Moreover, it is possible to make use of Gremlin’s language statements to repeat patterns. For example, the previous example can be denoted as follows.

gremlin> repeat 2
           $_ := ./outE/inV
           end
==>v[3]
==>v[5]
gremlin> .
==>v[3]
==>v[5]

The variable $_ is a reserved variable that denotes the root list. In this way, the root list can be redefined.

If the Gremlin graph data structure was only a directed graph, then outgoing/incoming edges and outgoing/incoming vertices would be the limits of what could be expressed. However, given that vertices and edges can have properties, it is possible to use these properties within a path expression. For example, suppose you want to know the name of vertex 1.

gremlin> ./@name
==>marko

The @name construct denotes the property key name and returns the value of that key (see Known Bugs). The first component of the path is vertex 1. Thus, the name of vertex 1 is “marko.” Another, more complex example that uses vertex and edge properties is to determine the name of the vertices that vertex 1 knows and that are older than 30 years of age, is expressed as such.

gremlin> ./outE[@label='knows']/inV[@age > 30]/@name
==>josh

In this expression, the [ ] notation serves to filter results of previous step in the path (see Path Functions). Thus, ./outE is filtered to only those edges that have a label of “knows.” With respect to the diagrammed graph, this leaves only two edges. Next, the incoming vertices at the head of these two edges are determined and then filtered to only those whose age property is greater than 30. Given the diagram, this only leaves vertex 4. In the final segment of the path expression, the name of vertex 4 is selected and what is returned is “josh.”

To conclude, lets do a more complicate graph traversal that uses backtracking and an in-line regular expression.

gremlin> ./outE[@label='knows']/inV[@age > 21]/@name[matches(.,'jo.{2}|JO.{2}')]/../@age
==>32

With the root vertex being vertex 1, this path expression returns the age of those vertices that vertex 1 knows, are older than 21, and whose names are 4 characters and start with a ‘jo’ or ‘JO’. While contrived, it demonstrates regular expression matching using the function boolean matches(string, string) on string properties as well as backtracking /.. to a vertex previously visited.

This expression does the same thing without backtracking. Both are provided in order to demonstrate the many ways in which to express the same thing.

gremlin> ./outE[@label='knows']/inV[@age > 21 and matches(@name, 'jo.{2}|JO.{2}')]/@age   
==>32