# Configure the Gremlin server
Run the cell below before starting. It tells the notebook how to connect to the Gremlin server.

In [None]:
%%capture output
%%graph_notebook_config
{
    "host": "127.0.0.1",
    "port": 8182,
    "ssl": false,
    "gremlin" : {
       "message_serializer": "graphbinary"
    }
}

# Gremlin 101

We are going to cover some more advanced queries and concepts in this notebook.

Please refer to https://www.kelvinlawrence.net/book/PracticalGremlin.html for more information on using the air routes dataset.

The dataset provided here may differ slightly from the one used in the book, so your results may differ slightly too.

Start by loading the air routes dataset. This notebook comes with a small version of the dataset mounted in the Docker instance for ease of use.

In [None]:
%%gremlin

g.V().drop().iterate()
g.io("/opt/aerospike-firefly/air-routes-small-latest.graphml").with(IO.reader, IO.graphml).read().iterate()

return "Success"

# Common pitfalls
Many operations that are routine in a relational database look reasonable in a graph database, but are not.

In [None]:
%%gremlin

// This is a common mistake. It returns the number of vertices in the graph, but on a large graph it will time out and bog down the system.
g.V().count().next()

In [None]:
%%gremlin

// This is another common mistake. It returns the number of vertices with the label 'airport', but to do so it must first fetch every vertex and then filter on the label. On a large graph this will time out and bog down the system.
g.V().hasLabel("airport").count().next()

In [None]:
%%gremlin

// This is a third common mistake. Without an index on 'code', the system must load all vertices and edges into memory. On a large graph this will time out and bog down the system.
g.V().has("code", "SFO").next()

But wait: if we can't run that last query, how do we even look up an airport?
- This Docker container has a secondary index on the `code` property, which makes that query both possible and fast.
- Our dataset is also small, so all of these queries run without issue here. Be wary of running them against a large dataset.
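
To make the difference concrete, here is a minimal sketch in plain Python (not Gremlin) of a lookup with and without a secondary index. The vertex list, the `code_index` dict, and both helper functions are hypothetical stand-ins for illustration only; they do not reflect how the server stores or indexes data.

```python
# Toy model (plain Python, not Gremlin): why has("code", "SFO") benefits from an index.
# The vertex data below is hypothetical and only for illustration.
vertices = [
    {"id": 1, "label": "airport", "code": "SFO"},
    {"id": 2, "label": "airport", "code": "AUS"},
    {"id": 3, "label": "airport", "code": "LHR"},
    {"id": 4, "label": "country", "code": "US"},
]


def find_by_scan(code):
    """Without an index: look at every vertex and count how many were touched."""
    touched = 0
    matches = []
    for v in vertices:
        touched += 1
        if v["code"] == code:
            matches.append(v)
    return matches, touched


# A secondary index is modelled here as a dict from property value to vertices.
code_index = {}
for v in vertices:
    code_index.setdefault(v["code"], []).append(v)


def find_by_index(code):
    """With an index: jump straight to the matching vertices."""
    matches = code_index.get(code, [])
    return matches, len(matches)


scan_matches, scan_touched = find_by_scan("SFO")
index_matches, index_touched = find_by_index("SFO")
print("scan touched:", scan_touched, "index touched:", index_touched)
```

Both lookups return the same vertex, but the scan touches every vertex while the indexed lookup touches only the match. The cost of the scan grows with the size of the graph.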

# Basic queries

## Getting data from a vertex

In [None]:
%%gremlin

// Get all data from a vertex
g.V().has("code", "SFO").elementMap().toList()

In [None]:
%%gremlin

// Get just the country
g.V().has("code", "SFO").values("country").next()

In [None]:
%%gremlin

// Get the label
g.V().has("code", "SFO").label().next()

In [None]:
%%gremlin

// Count the edges
g.V().has("code", "SFO").bothE().count().toList()

In [None]:
%%gremlin

// Get the edge data
g.V().has("code", "SFO").bothE().elementMap().toList()

## Let's fly somewhere from San Francisco


In [None]:
%%gremlin

// List the code of every airport you can fly to directly from SFO
g.V().has("code", "SFO").out().values("code").toList()

In [None]:
%%gremlin

// The same query, also filtering on the 'airport' label
g.V().hasLabel("airport").has("code", "SFO").out().values("code").toList()

## Let's find some paths from San Francisco

In [None]:
%%gremlin

// Let's find all the paths from SFO to AUS that have only 1 connection
g.V().has("code", "SFO").out().out().has("code", "AUS").path().by("code").toList()

In [None]:
%%gremlin

// Let's find all the paths from SFO to AUS that have only 1 connection and also report the distance for each hop.
g.V().has("code", "SFO").outE().inV().outE().inV().has("code", "AUS").path().by("code").by("dist").toList()

In [None]:
%%gremlin

// Let's find all airports we can get to from SFO with 1 connection and see how far each flight is.
g.V().has("code", "SFO").outE().inV().outE().inV().path().by("code").by("dist").toList()

# Repeat Queries
`repeat()` is a commonly used step. It lets you repeat part of a traversal a fixed number of times or until a condition is met.

In [None]:
%%gremlin

// You can repeat a block in a query a number of times:
g.V().has("code", "SFO").repeat(out()).times(2).path().by("code").toList()

In [None]:
%%gremlin

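// Adding a time limit makes this safer. until() keeps going until AUS is reached, and timeLimit() (in milliseconds) stops the search from running unbounded.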
g.V().has("code", "SFO").repeat(timeLimit(20).out()).until(has("code", "AUS")).path().by("code").toList()

In [None]:
%%gremlin

// Advanced repeats with emit(), which also emits the traversers from each iteration rather than only the final one. Look online for more information if you need to emit data during the repeat block.
g.V().has("code", "SFO").repeat(out().simplePath()).emit().times(3).has('code','AUS').
       limit(5).path().by('code').toList()
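
The behaviour of `repeat()` can be sketched in plain Python (not Gremlin) on a toy graph. The routes below are hypothetical and not taken from the air routes dataset, and the helper names are made up for this illustration. The second helper uses a hop cap (`max_hops`) as a simple stand-in for a bound on the search; it is not the same mechanism as `timeLimit()`.

```python
# Toy model (plain Python, not Gremlin) of repeat(out()) with path().
# The routes are hypothetical, not taken from the air routes dataset.
routes = {
    "SFO": ["DFW", "LAX"],
    "DFW": ["AUS", "SFO"],
    "LAX": ["AUS"],
    "AUS": [],
}


def repeat_out(start, times):
    """Return every path reached by following outgoing routes `times` times."""
    paths = [[start]]
    for _ in range(times):
        paths = [path + [nxt] for path in paths for nxt in routes[path[-1]]]
    return paths


def repeat_out_simple_until(start, target, max_hops):
    """Follow outgoing routes without revisiting an airport; collect paths ending at `target`."""
    found = []
    frontier = [[start]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for nxt in routes[path[-1]]:
                if nxt in path:
                    # Like simplePath(): drop paths that revisit a vertex.
                    continue
                new_path = path + [nxt]
                if nxt == target:
                    # Like until(has("code", target)): stop extending this path.
                    found.append(new_path)
                else:
                    next_frontier.append(new_path)
        frontier = next_frontier
    return found


two_hops = repeat_out("SFO", 2)
to_aus = repeat_out_simple_until("SFO", "AUS", 3)
print(two_hops)
print(to_aus)
```

Note that the plain two-hop version includes the path that returns to SFO, while the version that mimics `simplePath()` drops it.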

# Remove duplicates
Stepping out twice from SFO returns duplicates, because the same airport can be reached through more than one connection. You can remove them with `dedup()`.

In [None]:
%%gremlin

g.V().has("code", "SFO").out().out().count().toList()

In [None]:
%%gremlin

// Remove the duplicates from the query above with the dedup() step
g.V().has("code", "SFO").out().out().dedup().count().toList()
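
As a rough illustration of where the duplicates come from, here is a plain-Python sketch (not Gremlin) on a toy graph. The routes are hypothetical and the `out` helper is made up for this example.

```python
# Toy model (plain Python, not Gremlin) of out().out() with and without dedup().
# The routes are hypothetical, not taken from the air routes dataset.
routes = {
    "SFO": ["DFW", "LAX"],
    "DFW": ["AUS", "SFO"],
    "LAX": ["AUS"],
    "AUS": [],
}


def out(codes):
    """One out() step: every destination of every airport in `codes`."""
    return [nxt for code in codes for nxt in routes[code]]


two_hops = out(out(["SFO"]))
# dict.fromkeys keeps the first occurrence of each code, preserving order.
deduped = list(dict.fromkeys(two_hops))

print(len(two_hops), two_hops)
print(len(deduped), deduped)
```

AUS appears twice in the raw result because it is reachable through both DFW and LAX; after de-duplication it appears once.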

# Anonymous Traversals
An anonymous traversal is a traversal that is not started from `g`. Instead, it is passed as an argument to a step inside another traversal, without being bound to a variable.

We have actually used this before, but without calling it out. Let's look at an example.

In [None]:
%%gremlin

// The first line is familiar. After that, fold() collects all the results into a single list.
// project() then builds a map with the provided keys.
// Each by() specifies how the value for the corresponding key is computed.
// The arguments to the by() calls are themselves traversals: these are anonymous traversals.
// The first counts all the vertices in the list; the second counts the vertices in the list whose country property is US.
// project() is a slow step, but it can be helpful.
g.V().has("code", "SFO").out().out().dedup().
    fold().
    project("totalAirportCountFromSFO", "USAirportCountFromSFO").
        by(unfold().count()).
        by(unfold().has("country", "US").count()).toList()
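
The shape of the `fold()` / `project()` / `by()` combination can be sketched in plain Python (not Gremlin). The airport records and the `project` helper below are hypothetical, made up for this illustration; each keyword argument plays the role of one key together with its `by()` modulator.

```python
# Toy model (plain Python, not Gremlin) of fold().project(...).by(...).by(...).
# The airport records are hypothetical, not taken from the air routes dataset.
airports = [
    {"code": "AUS", "country": "US"},
    {"code": "DFW", "country": "US"},
    {"code": "LHR", "country": "UK"},
]


def project(value, **by):
    """Build a map with one entry per key, each computed by its own function."""
    return {key: fn(value) for key, fn in by.items()}


# fold(): many traversers become a single traverser holding one list.
folded = [airports]

result = [
    project(
        airport_list,
        # Like by(unfold().count())
        totalAirportCount=len,
        # Like by(unfold().has("country", "US").count())
        USAirportCount=lambda lst: sum(1 for a in lst if a["country"] == "US"),
    )
    for airport_list in folded
]
print(result)
```

Because the list was folded first, the result is a single map rather than one map per airport.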



# Profiling a query
Let's profile the query and see what it looks like. `profile()` is useful for seeing how a query is executed and where it spends its time.

In [None]:
%%gremlin

g.V().has("code", "SFO").out().out().
    dedup().fold().
    project("totalAirportCountFromSFO", "USAirportCountFromSFO").
        by(unfold().count()).
        by(unfold().has("country", "US").count()).profile()

# How do I work through a query that isn't working?

Debugging Gremlin is a common problem, and it is not easy.

Here are some tips:
- Start with something that works and keep adding more of the traversal until it stops working.
- Use `profile()` when it stops working. It reports the traverser count at each step, which shows where a filter may be wrong or where the traversal goes astray.

In [None]:
%%gremlin

// Referring to the query above, we would start with:
g.V().has("code", "SFO").next()

// If that works we'd try
g.V().has("code", "SFO").out().toList()

// If that works we'd try
g.V().has("code", "SFO").out().out().toList()

// If that works we'd try
g.V().has("code", "SFO").out().out().dedup().toList()

// Let's assume it stopped working here. We'd then profile it and see what the traverser count is at each step.
g.V().has("code", "SFO").out().out().dedup().profile()
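
The idea of reading traverser counts step by step can be sketched in plain Python (not Gremlin). The routes and helper functions are hypothetical, and real `profile()` output contains more information and may report counts differently; this only illustrates how a per-step count points at the step where results go wrong.

```python
# Toy model (plain Python, not Gremlin) of checking traverser counts per step.
# The routes are hypothetical, not taken from the air routes dataset.
routes = {
    "SFO": ["DFW", "LAX"],
    "DFW": ["AUS", "SFO"],
    "LAX": ["AUS"],
    "AUS": [],
}


def out(codes):
    """One out() step: every destination of every airport in `codes`."""
    return [nxt for code in codes for nxt in routes[code]]


def dedup(codes):
    """Keep the first occurrence of each code, preserving order."""
    return list(dict.fromkeys(codes))


# The traversal is built up one step at a time, recording the count after each.
steps = [("out", out), ("out", out), ("dedup", dedup)]
traversers = ["SFO"]
counts = [("start", len(traversers))]
for name, step in steps:
    traversers = step(traversers)
    counts.append((name, len(traversers)))

for name, count in counts:
    print(f"{name:>6}: {count} traversers")
```

A step whose count unexpectedly drops to zero, or grows far more than expected, is the first place to look.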