# Querying our Graph

We did all the hard work to build this graph, so let's get to the fun part: querying it. We're going to write some simple queries first to build up our knowledge, then jump into more complex ones.

## Query Anatomy

Before we get into the functional parts of a query, we need to go over what the boilerplate looks like. Here is a minimal sample query.

```
CREATE QUERY sample_query() FOR GRAPH Patents {
    PRINT "Hello World";
}
```

Note that we have a CREATE statement defining what we're creating (QUERY). After that comes the name of the query, `sample_query`, and then within the parentheses go any inputs for our query. A query does not need inputs, so we'll leave them empty for this sample and get to them later on. Next, we specify which graph this query references with the `FOR GRAPH Patents` segment.

A query can have as many PRINT statements as desired and at most one RETURN statement. Anything specified in a PRINT statement will be included in the JSON output of your query. Values in the RETURN are not included in the JSON output, but they can be used by another query that calls this one as a sub-query. Again, we'll get to those more advanced use cases later.

### Creating a Query

We can create a query through the GraphStudio interface, via the REST interface, or through a connector like pyTigerGraph. I'm going to use pyTigerGraph to create this query.

In [1]:
!pip install pyTigerGraph -q

In [None]:
import pyTigerGraph as tg

# connection parameters
# hostName is the TigerGraph solution URL
hostName = "https://patent-free.i.tgcloud.io"
graphName = "Patents"
userName = "tigergraph"
password = "tigergraph"

# establish the connection to the TigerGraph Solution
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password)

# set the name of the graph that we want to connect to
conn.graphname = graphName

# create a secret
secret = conn.createSecret()
# use the secret to get a token
authToken = conn.getToken(secret)[0]

# connect to graph with token
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName, apiToken=authToken)

#### Installing a Query

This next cell will create and install our query. Installing a query takes about 1-2 minutes and will create an endpoint that we can use to call the installed query. We can also run queries without installing them first. This is called **Interpreted Mode**. There are some limitations to running a query in **Interpreted Mode**, but those are typically more complex operations that we don't have to worry about yet.

Together, the next two cells CREATE, INSTALL, and RUN the query.

In [6]:
conn.gsql("""
USE GRAPH Patents
CREATE QUERY sample_query() FOR GRAPH Patents {
    PRINT "Hello World";
}
""")

conn.gsql("""
USE GRAPH Patents
INSTALL QUERY sample_query
""")

'Using graph \'Patents\'\nStart installing queries, about 1 minute ...\nsample_query query: curl -X GET \'https://127.0.0.1:9000/query/Patents/sample_query\'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.\nSelect \'m1\' as compile server, now connecting ...\nNode \'m1\' is prepared as compile server.\n\nQuery installation finished.'

In [7]:
conn.runInstalledQuery("sample_query")

[{'"Hello World"': 'Hello World'}]

We can see that the output of our query is that of our PRINT statement. PRINT also uses the name of whatever is being printed as the KEY in the JSON. In this query we're not printing a variable but a string literal, so the KEY is the literal expression itself, quotes included (`"Hello World"`).

#### Running an Interpreted Query

If you just want to quickly try out a query without waiting for it to install, then you can run it as an Interpreted query. The limitations of an Interpreted query are documented [here](https://docs.tigergraph.com/gsql-ref/current/querying/appendix-query/interpreted-gsql-limitations). Here's how we run an Interpreted query from pyTigerGraph.

In [12]:
conn.runInterpretedQuery("""
INTERPRET QUERY interpret_query() FOR GRAPH Patents {
    PRINT "Hello World" AS Output;
}
""")

[{'Output': 'Hello World'}]

Note that we have a pyTigerGraph function specifically for running interpreted queries. The only change that needs to be made to the query itself is changing `CREATE QUERY` to `INTERPRET QUERY`.

I also added an `AS` clause to the print line to show that you can change the JSON key of any PRINT statement.
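Both pyTigerGraph calls above return a list with one dictionary per PRINT statement, keyed by the printed expression or its `AS` alias. Once a query has several PRINTs, it can be handy to merge them into a single dictionary. This is a minimal sketch based only on the output shape shown in this notebook; `merge_prints` is a hypothetical helper, not part of pyTigerGraph, and the second example's values are made up.

```python
def merge_prints(results):
    """Merge pyTigerGraph's list of per-PRINT dictionaries into one dictionary.

    If two PRINT statements use the same JSON key, the later one wins.
    """
    merged = {}
    for printed in results:
        merged.update(printed)
    return merged


# Output copied from the interpreted query above
single = merge_prints([{'Output': 'Hello World'}])
print(single["Output"])  # Hello World

# Illustrative shape for a query with two PRINT ... AS statements (made-up values)
combined = merge_prints([{'Output': 'Hello World'}, {'total': 3}])
print(combined)  # {'Output': 'Hello World', 'total': 3}
```

Merging is convenient because the `AS` alias, not the list position, then becomes the way you look up each printed value.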

We're going to start off by reviewing the operations that we can use in a SELECT statement and the order in which they execute. This chart shows those operations in order (top to bottom). I'll break down what each operation does in detail as it comes up, so for now treat this image as a reference.

<img src="images/select_statement_data_flow.png" width="500px" />

## SELECT Statement

**Pattern:**
```
result_vertex_set = SELECT alias_vertex_set
```

**Example:**
```
result = SELECT s
```

On its own, the SELECT doesn't do much, but there is one important thing to note: both the **result** and the **alias** are VERTEX_SETs. Because they are sets, each vertex can appear in them at most once. This isn't very relevant right now, but it will be important to remember later on.

## FROM Clause

This is where we specify from what we are selecting, and the pattern that it must follow. We'll start off simple with just selecting one type of Vertex.

**Pattern:**
```
source_vertex_set = {vertex_type.*}
result_vertex_set = SELECT alias_vertex_set FROM source_vertex_set:alias_vertex_set
```

**Example:** 
```
start = {Inventor.*};
result = SELECT s FROM start:s;
```

### Specifying a vertex_set

Let's take a look at this more closely. Before the SELECT statement is the line `start = {Inventor.*}`. This line is declaring a vertex_set. It is needed because our SELECT statement needs a *source_vertex_set*.

`Inventor.*` specifies all vertices of type Inventor, and wrapping a collection of vertices in `{}` converts it into a vertex_set.

We then use that `start` vertex_set as the source_vertex_set in the SELECT statement. `start:s` means that we are assigning the alias `s` to the *vertex_set* `start` for the duration of the SELECT statement. Any time we refer to `s`, we are referring to the contents of the `start` vertex_set, which, in this case, is all of our Inventor vertices.

`SELECT s` means that we are selecting the vertex set by the alias `s` and assigning that to our result_vertex_set. The result of this query will be a vertex_set containing all the vertices in the `start` vertex_set, which is again, all of our Inventor vertices.

Technically we don't need a SELECT statement for an operation like this. `start` is already a vertex_set containing all Inventor vertices which is what our SELECT will output. Let's take a look at pattern matching to get the most value out of the SELECT statement.

### Pattern Matching

**Pattern:**
```
source_vertex_set = {vertex_type.*}
result_vertex_set = SELECT alias_vertex_set FROM source_vertex_set:alias_vertex_set - (edge_type:edge_type_alias) - destination_vertex_type:destination_alias
```

**Example:** 
```
start = {Inventor.*};
result = SELECT a FROM start:s - (filed_application:f) - Application:a;
```

This example SELECT will return a vertex_set of Application vertices.

Let's take a more in-depth look at the syntax of what's going on here.

- `start:s` - we've seen this before: we start from the `start` vertex_set, aliased as `s`
- ` - (filed_application:f) - Application:a` - this is an edge traversal

` - () - ` is the pattern for an edge traversal. On either side of it are the source and destination vertex_sets and their aliases, and inside the `()` are the edge_type and edge_type_alias.

I'm going to strip out the aliases from the example so we can look more closely at what's going on. Remember that `start` is the same as all Inventor vertices.

`start - (filed_application) - Application`

We're starting from any **Inventor** vertex and following any **filed_application** edges connected to those vertices that end at an **Application** vertex. In other words, this query traverses the **filed_application** edge between the **Inventor** vertex_type and the **Application** vertex_type.

The aliases in the query are optional except for one: the alias of the set you are trying to capture. That alias can go on any *vertex_set* position in the pattern.

For example:

- `SELECT s FROM start:s - (filed_application) - Application` - will SELECT the vertex_set of Inventors
- `SELECT a FROM start - (filed_application) - Application:a` - will SELECT the vertex_set of Applications connected via filed_application edges to the start set of Inventors
- `SELECT e FROM start - (filed_application:e) - Application` - is NOT VALID because the result of a SELECT statement must be a vertex_set, and `e` is an alias for an edge_set

On its own, this isn't the most useful. Again, we could just make a vertex_set of `{Application.*}` to get all Application vertices, but this is the core building block of all of our future queries.
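To make the traversal concrete, here is a rough plain-Python analogy (not how TigerGraph executes it): model the filed_application edges as (Inventor, Application) pairs and collect one end of every edge whose Inventor is in `start`. Because the results are sets, an Application filed by two Inventors appears only once, which is the vertex_set uniqueness mentioned earlier. All IDs here are hypothetical toy data.

```python
# Hypothetical Inventor IDs and (Inventor, Application) filed_application edges
inventors = {"inv1", "inv2", "inv3"}
filed_application = [
    ("inv1", "app1"),
    ("inv2", "app1"),   # app1 has two inventors
    ("inv2", "app2"),
]

# start = {Inventor.*};
start = set(inventors)

# SELECT a FROM start:s - (filed_application) - Application:a;
result = {a for (s, a) in filed_application if s in start}

# SELECT s FROM start:s - (filed_application) - Application;
# only Inventors that take part in at least one match are kept (inv3 has no edge)
selected_inventors = {s for (s, a) in filed_application if s in start}

print(sorted(result))              # ['app1', 'app2']
print(sorted(selected_inventors))  # ['inv1', 'inv2']
```

Note how the choice of alias in the SELECT decides which end of the pattern ends up in the result, exactly like the `SELECT s` and `SELECT a` bullets above.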

## WHERE Clause

With what we know so far, we've just been selecting all of a particular type of vertex. The WHERE clause will allow us to filter our selections and begin to do some proper graph work.

**Pattern:**
```
result_vertex_set = SELECT alias_vertex_set FROM source_vertex_set:alias_vertex_set WHERE alias_vertex_set.attribute == comparison_value
```

**Example:** 
```
start = {Inventor.*};
result = SELECT s FROM start:s WHERE s.first == "Joseph";
```

We're not limited to `==` here. Make sure to [check out the documentation](https://docs.tigergraph.com/gsql-ref/current/querying/select-statement/readme#_where_clause) to see the different comparison operators you can use with different value types.

The above example is pretty straightforward: we start with a vertex_set of all Inventors, then select from it any Inventors whose *first* attribute is equal to the string "Joseph". This is handy for filtering our Inventors, but what if we only want to query Applications submitted by Inventors with the first name Joseph?

```
start = {Inventor.*};
result = SELECT a FROM start:s - () - Application:a
    WHERE s.first == "Joseph";
```

The above SELECT statement does just that. `result` will be equal to a vertex_set containing any Application vertices that are connected to an Inventor vertex whose `first` attribute is equal to "Joseph". Let's see it in action.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY joseph_applications() FOR GRAPH Patents {
    start = {Inventor.*};
    result = SELECT a FROM start:s - () - Application:a
        WHERE s.first == "Joseph";
    
    PRINT result;
}
""")

That's good: we're printing the vertex_set of Application vertices. But what if we also want to see the list of Inventor vertices where `first` == "Joseph"? Let's try what seems like it should work at first. (You're supposed to get an error here.)

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY joseph_applications() FOR GRAPH Patents {
    start = {Inventor.*};
    result = SELECT a FROM start:s - () - Application:a
        WHERE s.first == "Joseph";
    
    PRINT result;
    PRINT s;
}
""")

`An undefined variable s in current scope` - **Our vertex_set_aliases are only available within the body of the SELECT statement.**

That is important. The `PRINT s` statement is after the SELECT statement, not within it, so `s` is no longer available. Moving the PRINT inside the SELECT does not work and would not be recommended anyway.

This is because `s` refers to an individual vertex from within the `start` vertex_set, not to the whole set. Think of the SELECT statement as a big for loop over the source_vertex_set. Under the hood it's a much more complicated and parallelized process than that, but the analogy holds.

To distill it down to its simplest form, **the SELECT is a for loop over every edge matching the pattern specified in the FROM clause**.

Each instance of a vertex_set_alias (`s` or `a` in our example) represents the Vertex or Edge at a given point in the pattern for a particular match of the pattern.

In the example, `s` will always be an Inventor vertex with the first name Joseph, and `a` will always be an Application vertex connected by an edge to an `s` vertex.

To prove this, and to answer the initial question of listing our Inventors named Joseph as well as any matching Applications, we have to introduce our next clause.
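In the meantime, here is the for-loop mental model written out in Python. It is a simplified sequential analogy (TigerGraph runs the matches in parallel) over hypothetical toy data. The WHERE clause becomes an `if`, and because the loop lives inside a function, only `result` leaves it while `s` and `a` stay local, much like the aliases stay inside the SELECT.

```python
# Hypothetical Inventor records and (Inventor, Application) edges
inventors = {
    "inv1": {"first": "Joseph", "last": "Smith"},
    "inv2": {"first": "Ada", "last": "Jones"},
    "inv3": {"first": "Joseph", "last": "Brown"},
}
edges = [("inv1", "app1"), ("inv1", "app2"), ("inv2", "app3"), ("inv3", "app2")]


def joseph_applications(inventors, edges):
    start = set(inventors)                       # start = {Inventor.*};
    result = set()                               # the result vertex_set
    for s, a in edges:                           # one iteration per edge in the pattern
        if s not in start:
            continue
        if inventors[s]["first"] == "Joseph":    # WHERE s.first == "Joseph"
            result.add(a)                        # SELECT a (duplicates collapse)
    return result                                # s and a do not exist past this point


print(sorted(joseph_applications(inventors, edges)))  # ['app1', 'app2']
```

`app2` is matched twice (through inv1 and inv3) but appears once in the result, since the result is a set.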

## ACCUM Clauses

Here is where we get to really start having some fun. Accumulators are one of the most powerful (and satisfying) tools the GSQL query language has to offer.

Remember how I said that the SELECT is a big for loop? Accumulators essentially provide functionality per iteration of that for loop. That doesn't sound very sophisticated, but remember that this for loop is actually massively parallel and most of our "loops" are actually run simultaneously across multiple threads.

Let's start simple and count how many Applications filed by a Joseph we retrieve.

In [22]:
conn.runInterpretedQuery("""
INTERPRET QUERY accumulators() FOR GRAPH Patents {
    
    SumAccum<INT> @@applications;
  
    start = {Inventor.*};
    result = SELECT a FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            @@applications += 1;
    
    PRINT @@applications AS num_Applications;
}
""")

[{'num_Applications': 30}]

### Global Accumulators

We have some new bits of code to talk about here, but it's all pretty straightforward.

`SumAccum<INT> @@applications;` - This line is just declaring our accumulator. Let's break it down.

- `SumAccum` - the type of accumulator (see all types [here](https://docs.tigergraph.com/gsql-ref/current/querying/accumulators))
- `<INT>` - the type that the accumulator stores
- `@@applications` - the name of the accumulator. **Global accumulators start with `@@`; local accumulators start with `@`** (more on local accumulators later)

Putting that together, we're creating a global SumAccum that stores integers and is named `@@applications`. (A SumAccum stores the sum of all the values added to it.)

To actually add values to that accumulator, we need to reference it from within an ACCUM clause in our SELECT statement. For our example, that looks like this:
```
ACCUM
    @@applications += 1;
```

- `ACCUM` - signifies that what comes next is an accumulator operation
- `@@applications += 1;` - is the actual accumulation operation

In this case, for each edge matching the pattern, we add `1` to our `@@applications` accumulator. That means we're counting matching edges, not Inventors: a Joseph who filed several Applications is counted once per Application, so this SumAccum can't tell us how many Josephs there are.

Let's explore how we would do that.

In [23]:
conn.runInterpretedQuery("""
INTERPRET QUERY accumulators() FOR GRAPH Patents {
    
    SumAccum<INT> @@applications;
    SetAccum<Vertex<Inventor>> @@inventors;
  
    start = {Inventor.*};
    result = SELECT a FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            @@applications += 1,
            @@inventors += s;
    
    PRINT @@applications AS num_Applications;
    PRINT @@inventors.size() AS num_Inventors;
}
""")

[{'num_Applications': 30}, {'num_Inventors': 28}]

This follows the same format as our SumAccum, except we're using a SetAccum. A SetAccum stores only unique values of its specified type.

In this case, our SetAccum only accepts vertices of the vertex type Inventor; that's what `<Vertex<Inventor>>` specifies. Technically we don't need to name the vertex type (`<Vertex>` is valid), but doing so prevents us from accidentally adding unwanted vertices to the accumulator should we make a mistake when writing a query.

Lastly, `.size()` gets the number of entries in the SetAccum. 
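Here is a plain-Python analogy of the two global accumulators: an integer that grows by 1 per matching edge (SumAccum) and a set of Inventors (SetAccum). The (Inventor, Application) pairs are hypothetical; because one Joseph filed two Applications, the edge count ends up larger than the number of distinct Inventors, the same effect as the 30 vs 28 in the output above.

```python
# Hypothetical (Inventor, Application) pairs that passed the WHERE clause
matches = [
    ("inv1", "app1"),
    ("inv1", "app2"),   # same Joseph, second Application
    ("inv3", "app4"),
]

applications = 0     # SumAccum<INT> @@applications;
inventors = set()    # SetAccum<Vertex<Inventor>> @@inventors;

for s, a in matches:          # ACCUM runs once per matching edge
    applications += 1         # @@applications += 1
    inventors.add(s)          # @@inventors += s  (a repeat is simply ignored)

print(applications, len(inventors))  # 3 2   (len() plays the role of .size())
```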

### Local Accumulators

What happens now if we want to group any applications that a Joseph has submitted with the Inventor who submitted them? Local accumulators are accumulators that "attach" to a vertex for the duration of a query. Essentially the accumulator will be referenced like an attribute on the vertex it is linked to.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY local_accumulators() FOR GRAPH Patents {
    
    SetAccum<Vertex<Application>> @applications;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            s.@applications += a;
    
    PRINT result;
}
""")

The local accumulator is declared just like the global one except with only one `@` symbol before the variable name.

Because this accumulator is linked to a vertex or edge, we need to specify that when we add a value to it. This is done by referencing the accumulator as if it were an attribute of the vertex or edge it is attached to. `s.@applications`

From the query output, we can see that `@applications` is printed as if it were an attribute of the vertex it is attached to, and that it lists the vertex ID of each Application added to it.
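One way to picture a local accumulator is as a dictionary keyed by vertex: each Inventor gets its own SetAccum, and `s.@applications += a` adds to the entry for the current `s`. A minimal sketch with hypothetical IDs; it models the idea, not TigerGraph's storage.

```python
from collections import defaultdict

# Hypothetical (Inventor, Application) pairs that passed the WHERE clause
matches = [("inv1", "app1"), ("inv1", "app2"), ("inv3", "app4")]

applications = defaultdict(set)   # SetAccum<Vertex<Application>> @applications;
for s, a in matches:              # ACCUM, once per matching edge
    applications[s].add(a)        # s.@applications += a

print({v: sorted(apps) for v, apps in applications.items()})
# {'inv1': ['app1', 'app2'], 'inv3': ['app4']}
```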

There are many different types of accumulators to explore, so I highly recommend you check them out in our [documentation](https://docs.tigergraph.com/gsql-ref/current/querying/accumulators).

## POST-ACCUM Clause

POST-ACCUM is similar to ACCUM, with one key difference. ACCUM is like a for loop over each edge that matches the pattern; POST-ACCUM is a for loop over each distinct vertex. This is necessary for scenarios where you want an operation to happen only once per vertex, such as updating or deleting the vertex.

Let's take a look at ACCUM and POST-ACCUM side by side to see the difference.

In [31]:
conn.runInterpretedQuery("""
INTERPRET QUERY post_accumulators() FOR GRAPH Patents {
    
    SumAccum<INT> @@accum_inventors;
    SumAccum<INT> @@post_accum_inventors;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            @@accum_inventors += 1
        POST-ACCUM
            @@post_accum_inventors += 1;
    
    PRINT @@accum_inventors;
    PRINT @@post_accum_inventors;
}
""")

[{'@@accum_inventors': 30}, {'@@post_accum_inventors': 28}]

The two accumulators hold different values because of where they were updated: once per matching edge in ACCUM, and once per distinct Inventor in POST-ACCUM. Additionally, you can't rely on reading an accumulator's updated value from within an ACCUM clause. If you want to read the values of accumulators set in the ACCUM clause, do it in the POST-ACCUM clause.
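The two phases can be sketched in Python under a simplifying assumption: ACCUM loops over the matching edges, and only after it finishes does POST-ACCUM loop once over each distinct selected vertex, where the accumulated values are complete and safe to read. The data is hypothetical, and the real clauses run in parallel.

```python
# Hypothetical (Inventor, Application) pairs that passed the WHERE clause
matches = [("inv1", "app1"), ("inv1", "app2"), ("inv3", "app4")]

accum_inventors = 0          # SumAccum<INT> @@accum_inventors
post_accum_inventors = 0     # SumAccum<INT> @@post_accum_inventors
app_count = {}               # per-Inventor counter, like a local SumAccum<INT>

# ACCUM phase: one iteration per matching edge
for s, _a in matches:
    accum_inventors += 1
    app_count[s] = app_count.get(s, 0) + 1

# POST-ACCUM phase: one iteration per distinct s vertex, after ACCUM has finished,
# so the per-Inventor totals are complete and can be read
summary = []
for s in dict.fromkeys(s for s, _a in matches):   # de-duplicates, keeps order
    post_accum_inventors += 1
    summary.append((s, app_count[s]))

print(accum_inventors, post_accum_inventors)  # 3 2
print(summary)                                # [('inv1', 2), ('inv3', 1)]
```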

## HAVING Clause

This is a quick one. The HAVING clause is basically another WHERE clause that is able to filter on the values of accumulators. Because the HAVING clause takes place after the ACCUM and POST-ACCUM clauses, it can access the values of accumulators.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY having_clause() FOR GRAPH Patents {
    
    SumAccum<INT> @applications;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            s.@applications += 1
        HAVING
            s.@applications > 1;
    
    PRINT result;
}
""")

The above query uses a local SumAccum (`@applications`) to keep track of the number of Applications each Inventor has filed, and then the HAVING clause filters the result_vertex_set down to only the Inventors with more than one Application.
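Because HAVING runs after the accumulators are complete, it behaves like a filter over the per-vertex totals. A WHERE clause couldn't do this, since the counts aren't known yet when WHERE is evaluated. A minimal Python sketch with hypothetical data:

```python
from collections import Counter

# Hypothetical (Inventor, Application) pairs that passed the WHERE clause
matches = [("inv1", "app1"), ("inv1", "app2"), ("inv3", "app4")]

applications = Counter(s for s, _a in matches)               # ACCUM s.@applications += 1
result = {s for s, n in applications.items() if n > 1}       # HAVING s.@applications > 1

print(result)  # {'inv1'}
```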

## ORDER BY Clause

One of the consequences of doing the SELECT statement massively in parallel is that we don't know what order vertices or edges will be processed in. Because of this, running a query multiple times may result in the order of vertices or edges in your output being different each run.

To counter this, we can use the ORDER BY clause, which orders the result_vertex_set by one or more sorting criteria. Let's have a look.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY order_by() FOR GRAPH Patents {
    
    SumAccum<INT> @applications;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            s.@applications += 1
        ORDER BY
            s.@applications DESC, s.last ASC;
    
    PRINT result;
}
""")

The above query will return a vertex_set of Inventors named Joseph ordered by how many applications they have filed, and secondarily ordered alphabetically by last name.
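Mixing a descending and an ascending criterion is the part that's easy to get wrong, so here is the same ordering as a Python sort over hypothetical Inventor records. Negating the count sorts it descending while the last name stays ascending; that trick only works for numeric keys.

```python
# Hypothetical Inventors with their accumulated application counts
inventors = [
    {"id": "inv1", "last": "Smith", "applications": 2},
    {"id": "inv2", "last": "Brown", "applications": 1},
    {"id": "inv3", "last": "Adams", "applications": 2},
]

# ORDER BY s.@applications DESC, s.last ASC
ordered = sorted(inventors, key=lambda v: (-v["applications"], v["last"]))

print([v["id"] for v in ordered])  # ['inv3', 'inv1', 'inv2']
```

The two Inventors tied at 2 Applications are separated by last name, which is exactly what the secondary `s.last ASC` criterion is for.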

ORDER BY pairs closely with our next clause.

## LIMIT Clause

Exactly as its name implies, the LIMIT clause sets the maximum number of vertices allowed in the result_vertex_set. As we discussed, without an ORDER BY clause the output order of a query isn't guaranteed, so putting a limit on that unordered set could include different vertices each time the query is run. This is why ORDER BY is highly recommended whenever you use the LIMIT clause.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY limit_offset() FOR GRAPH Patents {
    
    SumAccum<INT> @applications;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            s.@applications += 1
        ORDER BY
            s.@applications DESC, s.last ASC
        LIMIT 3;
    
    PRINT result;
}
""")

As we can see, only 3 results are returned because we set the LIMIT to 3. The LIMIT clause also has a neat built-in feature that makes operations like pagination easy.

### LIMIT with OFFSET

The LIMIT clause also lets you specify an offset at which to start the limit. For example, the query above printed the top 3 Inventors from our set, but what if I wanted to see the next 3 Inventors on the list?

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY limit_offset() FOR GRAPH Patents {
    
    SumAccum<INT> @applications;
  
    start = {Inventor.*};
    result = SELECT s FROM start:s - () - Application:a
        WHERE s.first == "Joseph"
        ACCUM
            s.@applications += 1
        ORDER BY
            s.@applications DESC, s.last ASC
        LIMIT 3 OFFSET 3;
    
    PRINT result;
}
""")

That's all it takes: OFFSET specifies the position in the result_vertex_set where the limit starts. You can also write the offset as part of the LIMIT clause itself; for example, `LIMIT 3, 3` is identical to `LIMIT 3 OFFSET 3`.
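On an already-ordered result, LIMIT and OFFSET amount to a slice, which is why they make pagination easy: page `n` (counting from 0) of size `k` starts at offset `n * k`. A minimal Python sketch; `limit_offset` is a hypothetical helper and the ordered list stands in for the output of the ORDER BY step.

```python
def limit_offset(ordered, limit, offset=0):
    """Mimic LIMIT <limit> OFFSET <offset> on an already-ordered list."""
    return ordered[offset:offset + limit]


# Hypothetical result of the ORDER BY step
ordered = ["inv1", "inv2", "inv3", "inv4", "inv5", "inv6", "inv7"]

print(limit_offset(ordered, 3))            # ['inv1', 'inv2', 'inv3']   LIMIT 3
print(limit_offset(ordered, 3, offset=3))  # ['inv4', 'inv5', 'inv6']   LIMIT 3 OFFSET 3

# Pagination: page n (0-based) of size k starts at offset n * k
page_size = 3
print(limit_offset(ordered, page_size, offset=2 * page_size))  # ['inv7']
```

The last page is simply shorter; without a stable ORDER BY, though, the same page could contain different vertices on each run.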

## Query Inputs

Up until now, all of our queries have been hardcoded to look for "Joseph". In the real world, we want that to be a dynamic input so that one query can search for any name. Just like functions in your favorite programming language, GSQL queries can have inputs. These inputs can be any of the [data types](https://docs.tigergraph.com/gsql-ref/current/querying/data-types) that TigerGraph supports.

Let's make a String input for the first name that we want to search on our Inventors.

In [55]:
inputName = "Samuel"

conn.runInterpretedQuery("""
INTERPRET QUERY named_applications(STRING inName) FOR GRAPH Patents {
    start = {Inventor.*};
    result = SELECT s FROM start:s
        WHERE s.first == inName
        LIMIT 3;
    
    PRINT result;
}
""",
    params={ "inName": inputName }
)

[{'result': [{'v_id': 'Samuel,Danishefsky J',
    'v_type': 'Inventor',
    'attributes': {'id': 'Samuel,Danishefsky J',
     'first': 'Samuel',
     'middle': 'J',
     'last': 'Danishefsky'}},
   {'v_id': 'Samuel,Tenembaum Sergio',
    'v_type': 'Inventor',
    'attributes': {'id': 'Samuel,Tenembaum Sergio',
     'first': 'Samuel',
     'middle': 'Sergio',
     'last': 'Tenembaum'}},
   {'v_id': 'Samuel,Brodor ',
    'v_type': 'Inventor',
    'attributes': {'id': 'Samuel,Brodor ',
     'first': 'Samuel',
     'middle': '',
     'last': 'Brodor'}}]}]

The first thing to notice here is that our input definition now sits within the `()` after the query name. The definition is straightforward: `variable_type variable_name`, followed by a comma and any additional inputs. The input name is then available as a variable within the query, and you can reference it like a variable in any other language.

We also need to specify the input in our `conn.runInterpretedQuery()` function. That function actually has two inputs even though we had only been using the query body input so far. The second input is used to feed in query inputs or `params`. 

It's easiest to structure these params as a dictionary of **query_input_name: input_value**. You can see this in the above example.

Feel free to change the `inputName` and see how that affects the query results.

## Multi-hop Patterns

So far we have only explored single-hop SELECT statements, meaning we only traverse one edge in the SELECT. Our previous queries followed the pattern `Inventor - () - Application`, traversing the edge between Inventor and Application. Sometimes we want to traverse more than one edge, for example to find all Applications that were filed by an Inventor from a particular City.

Application doesn't have an edge that links it directly to a City, but Inventor does. We can start at a City, traverse to any Inventors connected to that City, then traverse to any Applications connected to that Inventor all in one SELECT.

The pattern would look like this:
`City - () - Inventor - () - Application`

In [None]:
inputName = "Boston"

conn.runInterpretedQuery("""
INTERPRET QUERY named_applications(VERTEX<City> inCity) FOR GRAPH Patents SYNTAX v2 {
    
    start = {inCity};
    result = SELECT a FROM start:s - () - Inventor - () - Application:a;
    
    PRINT result;
}
""", params={ "inCity": inputName }
)

The main thing to note here is that we have to use `SYNTAX v2`, specified right after our `FOR GRAPH` declaration. From there, we can write our multi-hop pattern the same way we write a single-hop pattern. With the edge names explicitly labeled, this is what the pattern looks like:

`start:s - (from_city) - Inventor - (filed_application) - Application:a`

Another important line to note here is `start = {inCity}`. You'll note from earlier that our SELECT statement operates on vertex_sets. That means that any time we want to use a vertex input in a SELECT statement, it must be in the vertex_set format. `{}` converts a single vertex or SetAccum of vertices into a vertex_set.
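Here is the two-hop traversal sketched hop by hop in Python, over hypothetical edge lists (the `from_city` pairs are stored as (City, Inventor) here purely for convenience). GSQL matches the whole path in a single SELECT; this sketch only shows which vertices end up in the result. Wrapping the single input in `{in_city}` mirrors `start = {inCity}`.

```python
# Hypothetical edges: (City, Inventor) and (Inventor, Application)
from_city = [("Boston", "inv1"), ("Boston", "inv2"), ("Paris", "inv3")]
filed_application = [("inv1", "app1"), ("inv2", "app1"), ("inv2", "app2"), ("inv3", "app3")]


def applications_from_city(in_city):
    start = {in_city}                                              # start = {inCity};
    hop1 = {inv for city, inv in from_city if city in start}       # City - () - Inventor
    return {app for inv, app in filed_application if inv in hop1}  # Inventor - () - Application:a


print(sorted(applications_from_city("Boston")))  # ['app1', 'app2']
```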

## Directed Traversals

So far we haven't traversed any directional edges. The pattern for directional edges is almost identical to what we've been using so far, but with an added direction specifier. Let's take a look.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY directional_traversal() FOR GRAPH Patents {

    start = {Application.*};

    result = SELECT a FROM start:s - (has_parent) -> Application:a;

    PRINT result;
}
""")

The subtle difference here is the ` - () -> ` pattern for traversing a directed edge in its forward direction. To traverse the edge in its reverse direction, you would need to use the name of the reverse edge type, if your directed edge was defined with one.

## Repeated Patterns

In the above example, we are looking for the parent Application to an Application. What if we wanted to find the grandparent, or great-grandparent? We could just keep writing out the pattern for the desired number of traversals: `start:s - () -> Application - () -> Application:a ...` but that looks sloppy and there's a better way.

`start:s - (has_parent>*) - Application:a` matches every Application reachable from a start Application by following a chain of has_parent edges. The `>` signifies direction and the `*` tells the pattern to repeat the edge as many times as it can. `*3` would specify exactly 3 traversals deep. You can learn more about repeating patterns [here](https://docs.tigergraph.com/gsql-ref/current/tutorials/pattern-matching/repeating-a-pattern). To use repeating patterns, you will need to use `SYNTAX v2`.

Let's put together what we've learned so far to find the Applications with the longest chains of parent Applications.

Our query will need to iterate over each Application and count the has_parent chains that start from it. That count then needs to be stored so that we can sort on it. We've done all of these things before, so this shouldn't be too hard.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY count_parents() FOR GRAPH Patents SYNTAX v2{
    
    SumAccum<INT> @parents;
    start = {Application.*};

    result = SELECT s FROM start:s - (has_parent>*) - Application:a
        ACCUM
            s.@parents += 1
        ORDER BY
            s.@parents DESC
        LIMIT 3;

    PRINT result;
}
""")

## The Order of a Query

This is something we touched on briefly when designing the schema: anything we might want to filter on at query time should be its own vertex type. That way, a query can start at the filter vertex and only touch the vertices connected to it.

I want to write a query that finds the most common USPCClass among all of our Applications. We could start from Application like we have in previous queries, but here we may benefit from starting at USPCClass. It's possible to have an Application that isn't tied to a USPCClass, but it's impossible to have a USPCClass that isn't connected to an Application. By starting from USPCClass, we skip processing any Applications without a USPCClass, potentially making this order of traversal faster.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY common_class() FOR GRAPH Patents {

    SumAccum<INT> @applications;
    start = {USPCClass.*};

    result = SELECT s FROM start:s - () - Application:a
        ACCUM
            s.@applications += 1
        ORDER BY
            s.@applications DESC;

    PRINT result;
}
""")

## Average Time to Complete an Application

Something that we haven't taken a look at yet is working with edge attributes. The *has_code* edge contains a datetime attribute that stores when that EventCode took place. By looking at the difference between the earliest and latest datetime for an Application, we can infer how long it took to process. Let's see what a query to do that looks like.

In [None]:
conn.runInterpretedQuery("""
INTERPRET QUERY application_time() FOR GRAPH Patents {

    MinAccum<DATETIME> @first_date;
    MaxAccum<DATETIME> @last_date;
    SumAccum<FLOAT> @duration;
    AvgAccum @@avg_days;

    start = {Application.*};

    result = SELECT s FROM start:s - (has_code:e) - EventCode:c
        ACCUM
            s.@first_date += e.date,
            s.@last_date += e.date
        POST-ACCUM
            INT duration = datetime_diff(s.@last_date, s.@first_date),  //returns time in seconds between 2 dates
            FLOAT days = duration/86400.0,  //divides seconds by seconds in a day to convert to days
            @@avg_days += days,
            s.@duration += days
        ORDER BY
            s.@duration DESC
        LIMIT 5;

    PRINT @@avg_days;
    PRINT result [result.@duration];
}
""")

### MinAccum and MaxAccum

The first thing you probably noticed in this query was two new accumulators, MinAccum and MaxAccum. These store the minimum and maximum values fed into them, respectively. Here we use them to store the minimum DATETIME (oldest date) and the maximum DATETIME (most recent date) among the dates of an Application's status updates.
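The duration arithmetic is easy to sanity-check outside the database. This Python sketch uses hypothetical has_code dates: `min`/`max` play the role of MinAccum/MaxAccum, the timedelta's `total_seconds()` stands in for `datetime_diff`, dividing by 86400 converts seconds to days, and a plain mean stands in for AvgAccum.

```python
from datetime import datetime

# Hypothetical has_code edge dates for two Applications
event_dates = {
    "app1": [datetime(2020, 1, 1), datetime(2020, 3, 1), datetime(2020, 1, 15)],
    "app2": [datetime(2021, 6, 1), datetime(2021, 6, 11)],
}

durations = {}
for app, dates in event_dates.items():
    first_date = min(dates)                                 # MinAccum<DATETIME> @first_date
    last_date = max(dates)                                  # MaxAccum<DATETIME> @last_date
    seconds = (last_date - first_date).total_seconds()      # like datetime_diff(last, first)
    durations[app] = seconds / 86400.0                      # seconds -> days

avg_days = sum(durations.values()) / len(durations)        # AvgAccum @@avg_days

print(durations)  # {'app1': 60.0, 'app2': 10.0}
print(avg_days)   # 35.0
```

The order of the dates on the edges doesn't matter: the minimum and maximum are the same however they arrive, which is what lets MinAccum and MaxAccum be updated in parallel.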

### Vertex Expression Set

The other thing you should have noticed is that our result didn't print every attribute of Application, only the `@duration` accumulator attached to each vertex. This is because of the `PRINT result [result.@duration]` line: the `[]` tells PRINT to output only the attributes and accumulators listed inside it.

## Neighbors

You don't always need a full pattern match if you're looking for some basic information about your graph. The `.neighbors()` function returns all vertices connected to the starting vertex by an edge. You can also pass a list of edge types to only select vertices connected via those edge types. Let's see a quick example.

One thing to note is that the `neighbors()` function is not available in interpreted mode, so we'll have to install this query.
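Conceptually, `neighbors()` with an edge-type list is an adjacency lookup followed by a filter on edge type. A plain-Python sketch, assuming a hypothetical adjacency map; the `located_in` edge type and the `France` vertex are made up for illustration and are not part of the Patents schema.

```python
# Hypothetical adjacency: vertex -> list of (edge_type, neighbor)
adjacency = {
    "Paris": [("from_city", "inv1"), ("from_city", "inv2"), ("located_in", "France")],
}


def neighbors(vertex, edge_types=None):
    """Neighbors of `vertex`, optionally restricted to the given edge types."""
    return [n for etype, n in adjacency.get(vertex, [])
            if edge_types is None or etype in edge_types]


print(neighbors("Paris", ["from_city"]))  # ['inv1', 'inv2']
print(neighbors("Paris"))                 # ['inv1', 'inv2', 'France']
```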

In [133]:
conn.gsql("""
USE GRAPH Patents
CREATE QUERY neighbors(VERTEX<City> inCity) FOR GRAPH Patents {
    BagAccum<VERTEX> @inventors;
    start = {inCity};

    result = SELECT s FROM start:s
        POST-ACCUM
            s.@inventors = s.neighbors(["from_city"]);
    
    PRINT result;
}
INSTALL QUERY neighbors
""")

'Using graph \'Patents\'\nSuccessfully created queries: [neighbors].\nStart installing queries, about 1 minute ...\nneighbors query: curl -X GET \'https://127.0.0.1:9000/query/Patents/neighbors?inCity=VALUE\'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.\nSelect \'m1\' as compile server, now connecting ...\nNode \'m1\' is prepared as compile server.\n\nQuery installation finished.'

In [None]:
inputName = "Paris"

conn.runInstalledQuery("neighbors", params={ "inCity": inputName })

## Inventor - Attorney Pairings

Let's see if any Inventors' Applications frequently get paired up with the same Attorney. 

In [168]:
conn.runInterpretedQuery("""
INTERPRET QUERY pairings() FOR GRAPH Patents SYNTAX v2 {
    TYPEDEF TUPLE <VERTEX<Attorney> attorney, INT cnt> attorneyCount;

    HeapAccum<attorneyCount>(200, cnt DESC) @attorneys;
    SumAccum<INT> @tot_attorneys;
    start = {Inventor.*};

    results = SELECT i FROM start:i - () - Application:a - () - Attorney:t
        ACCUM
            i.@attorneys += attorneyCount(t, 1),
            i.@tot_attorneys += 1
        ORDER BY 
            i.@tot_attorneys DESC;

    PRINT results [results.@attorneys, results.@tot_attorneys];
}
""")

[{'results': [{'v_id': 'Samuel,Dufour ',
    'v_type': 'Inventor',
    'attributes': {'results.@attorneys': [{'attorney': '70451', 'cnt': 1},
      {'attorney': '58410', 'cnt': 1},
      {'attorney': '58855', 'cnt': 1},
      {'attorney': '46670', 'cnt': 1},
      {'attorney': '57871', 'cnt': 1},
      {'attorney': '32297', 'cnt': 1},
      {'attorney': '52082', 'cnt': 1},
      {'attorney': '55030', 'cnt': 1},
      {'attorney': '71691', 'cnt': 1},
      {'attorney': '68912', 'cnt': 1},
      {'attorney': '74998', 'cnt': 1},
      {'attorney': '65642', 'cnt': 1},
      {'attorney': '73513', 'cnt': 1},
      {'attorney': '61546', 'cnt': 1},
      {'attorney': '58621', 'cnt': 1},
      {'attorney': '31430', 'cnt': 1},
      {'attorney': '58474', 'cnt': 1},
      {'attorney': '72250', 'cnt': 1},
      {'attorney': '61595', 'cnt': 1},
      {'attorney': '69133', 'cnt': 1},
      {'attorney': '66619', 'cnt': 1},
      {'attorney': '66226', 'cnt': 1},
      {'attorney': '62676', 'cnt': 1},
