# Worksheet 1 - Basic Read Queries

### Exercise 1: Find the first name of Dave's friends

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the friends of Dave (i.e. traverse the `friends` edge)
* Return the friends' `first_name`

The correct answer is four results: "Jim", "Josh", "Hank", "Kelly"

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.out('friends')
.values('first_name')

### Exercise 2: Find the first name of the friends of Dave's friends

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the friends of Dave (i.e. traverse the `friends` edge)
* Find the friends of that person (i.e. traverse the `friends` edge)
* Return the friends' `first_name`

The correct answer contains three results: "Hank", "Denise", "Paras"

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.out('friends')
.out('friends')
.dedup()
.values('first_name')

### Exercise 3: Find out how the friends of Dave's friends are connected

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the friends of Dave (i.e. traverse the `friends` edge)
* Find the friends of that person (i.e. traverse the `friends` edge)
* Return the path

The correct answer contains three results:

- `Dave` -> `Josh` -> `Hank`
- `Dave` -> `Kelly` -> `Denise`
- `Dave` -> `Jim` -> `Paras`

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.out('friends')
.out('friends')
.dedup()
.path()
.by(elementMap())

### Exercise 4: Which friends should we recommend for Dave?

A common use case for graphs in social networks is recommending new connections. There is a significant amount of research in this area (example [here](https://www.science.org/doi/10.1126/sciadv.aax7310#:~:text=The%20triadic%20closure%20mechanism%20uses,features%20of%20empirical%20social%20networks)), but two prevailing mechanisms in social networks can be leveraged to provide efficient recommendations to a user. The first is homophily, the tendency of similar people to be connected. Homophily is a driving factor in many social networks; an important outcome is that people connected to you, or connected to people who are connected to you, tend to be similar to you. This leads to the second mechanism, triadic closure: creating or recommending new connections based on common friends or acquaintances.


In this exercise, we are going to leverage triadic closure to recommend friends for Dave.  To accomplish this, we will need to leverage the previously written queries but extend them to:

* Find all the friends of friends that do not have a connection to Dave

The correct answer contains three results: "Hank", "Denise", "Paras"

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave').as('dave')
.out('friends')
.out('friends')
.where(neq('dave'))
.dedup()
.values('first_name')
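
The query above only filters out Dave himself. As a sketch of a stricter reading of triadic closure, the variant below also excludes anyone who is already one of Dave's direct friends. The side-effect key `direct` is a name chosen here for illustration, not part of the data model. Because it filters more aggressively, this variant may return fewer names than the answer listed above.

```
g.V().hasLabel('person')
.has('first_name','Dave').as('dave')
// Collect Dave's direct friends into a side-effect collection (name is illustrative)
.out('friends').aggregate('direct')
.out('friends')
// Exclude Dave himself and anyone already collected as a direct friend
.where(neq('dave'))
.where(without('direct'))
.dedup()
.values('first_name')
```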

# Worksheet 2 - Loops and Repeat Queries

### Exercise 1: Find the friends of Dave's Friends using a loop.

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the friends of Dave (i.e. traverse the `friends` edge)
* Find the friends of that person (i.e. traverse the `friends` edge)
* Return the friends' `first_name`

The correct answer contains three results: "Hank", "Denise", "Paras"

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.repeat(
    out('friends')
    .simplePath()
)
.times(2)
.dedup()
.values('first_name')

### Exercise 2: Find all `person` nodes connected to Dave.

Starting at a single node and finding all connected children (a.k.a. root to leaf), or finding the parent of any child node (a.k.a. leaf to root), are two very common hierarchical graph query patterns. These queries commonly support bill of materials, information organization, or compliance use cases.

In this exercise, we will be applying that same query pattern to find the hierarchy of people within our social network.  We'll accomplish this by writing a "root to leaf" type query where the root node is our `Dave` node in the social network.

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Keep traversing the outgoing `friends` edge until there are no more outgoing `friends` edges
* Return all the paths

The correct answer has 5 results

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.repeat(
    out('friends')
)
.until(not(out('friends')))
.path()
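
The "leaf to root" pattern mentioned above is the same loop run against incoming edges. The sketch below assumes `Denise` as the starting leaf purely for illustration, and adds `simplePath()` as a guard in case the `friends` edges contain a cycle. The number of paths it returns depends on the data and is not stated here.

```
g.V().hasLabel('person')
.has('first_name','Denise')
.repeat(
    // Walk the friends edges backwards, never revisiting a node on the same path
    in('friends')
    .simplePath()
)
// Stop once a node has no incoming friends edges (a root)
.until(not(in('friends')))
.path()
```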

### Exercise 3: Find all the ways Dave and Denise are connected.

A common extension to the path traversal query we wrote in Loop-3 is to return not just "if" someone is connected but "how" they are connected.

In this exercise, we will be making a slight modification to the previous query to return "how" Dave and Denise are connected, not just that they are.

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the friends of Dave (i.e. traverse the `friends` edge)
* Keep traversing the `friends` edge until you find `Denise`
* Return the path

The correct answer has 3 results

In [None]:
%%gremlin

g.V().hasLabel('person')
.has('first_name','Dave')
.repeat(
    out('friends')
    .simplePath()
)
.until(
    has('first_name','Denise')
)
.path()
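
If only the shortest connection is of interest, one common approach is to keep the same loop and take the first path found. This is a sketch that assumes the engine expands `repeat()` level by level, so the first match is among the shortest paths. That holds for the standard TinkerPop traversal machine but is worth confirming on your graph engine.

```
g.V().hasLabel('person')
.has('first_name','Dave')
.repeat(
    out('friends')
    .simplePath()
)
.until(
    has('first_name','Denise')
)
.path()
// Keep only the first path emitted
.limit(1)
```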

# Worksheet 3 - Ordering, Functions, and Grouping

### Exercise 1: What are the 3 highest rated restaurants?

Using the data model above, write a query that will:

* Find the 3 restaurants with the highest average rating
* Find the associated `cuisine`
* Return the restaurant name, the cuisine name, and the average rating
* Order the results by average rating descending

The results for this query are:

|Restaurant name|Cuisine|Avg Rating|
|---|---|---|
|Lonely Grape|bar|5.0|
|Perryman's|bar|4.5|
|Rare Bull|steakhouse|4.333333|

In [None]:
%%gremlin

g.V()
.hasLabel('cuisine')
.in('serves')
.group()
.by(identity())
.by(in('about').values('rating').mean())
.unfold()
.order()
.by(values, desc)
.limit(3)
.unfold()
.project('restaurant name','cuisine','avg rating')
.by(select(keys).values('name'))
.by(select(keys).out('serves').values('name'))
.by(select(values))

### Exercise 2: Find the top 3 highest rated restaurants in the city where Dave lives.

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the `city` that Dave lives in
* Find the average rating of restaurants in that city
* Find the top 3 average ratings
* Return the restaurant name, address, and average rating
* Order by the average rating descending

The results for this query are:

|Restaurant name|Address|Avg Rating|
|---|---|---|
|Dave's Big Deluxe|490 Ivan Cape|4.0|
|Pick & Go|4881 Upton Falls|3.75|
|Without Chaser|01511 Casper Fall|3.5|

In [None]:
%%gremlin

g.V().has('person','first_name','Dave')
.out('lives')
.in('within')
.where(inE('about'))
.group()
.by(identity())
.by(in('about').values('rating').mean())
.unfold()
.order()
.by(values,desc)
.limit(3)
.unfold()
.project('restaurant name','address','avg rating')
.by(select(keys).values('name'))
.by(select(keys).values('address'))
.by(select(values))

### Exercise 3: Which Mexican or Chinese restaurant near Dave is the highest rated?

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the `city` that Dave lives in
* Find the restaurants in that city that serve 'Mexican' or 'Chinese' food
* Find the average rating of those restaurants
* Return the restaurant name, address, and average rating
* Order by the average rating descending
* Return the top 1 result

The results for this query are:

|Restaurant name|Address|Avg Rating|
|---|---|---|
|With Salsa|24320 Williamson Causeway|3.5|

In [None]:
%%gremlin

g.V().has('person','first_name','Dave')
.out('lives')
.in('within')
.where(out('serves').has('name',within('Mexican','Chinese')))
.where(inE('about'))
.group()
.by(identity())
.by(in('about').values('rating').mean())
.unfold()
.order()
.by(values,desc)
.limit(1)
.unfold()
.project('restaurant name','address','avg rating')
.by(select(keys).values('name'))
.by(select(keys).values('address'))
.by(select(values))

### Exercise 4: What are the top 3 restaurants, recommended by his friends, where Dave lives? (Personalized Recommendation)

Using the data model above, write a query that will:

* Find a `person` node(s) with a `first_name` of "Dave"
* Find the `city` that Dave lives in
* Find Dave's friends
* Find reviews written by Dave's friends in the city "Dave" lives in
* Find the average rating of those restaurants
* Return the restaurant name, address, and average rating
* Order by the average rating descending
* Return the top 3

The results for this query are:

|Restaurant name|Address|Avg Rating|
|---|---|---|
|Dave's Big Deluxe|490 Ivan Cape|4.0|
|With Salsa|24320 Williamson Causeway|4.0|
|Satiated|370 Hills Estates|3.666667|

In [None]:
%%gremlin

g.V().has('person','first_name','Dave').as('dave')
.out('lives')
.in('within')
.where(in('about').in('wrote').both('friends').where(eq('dave')))
.group()
.by(identity())
.by(in('about').where(in('wrote').both('friends').where(eq('dave'))).values('rating').mean())
.unfold()
.order()
.by(values,desc)
.limit(3)
.unfold()
.project('restaurant name','address','avg rating')
.by(select(keys).values('name'))
.by(select(keys).values('address'))
.by(select(values))

# Worksheet 4 - Create, Update and Delete Queries

### Exercise 1: Create a new person `Leonhard Euler`  and connect them to `Dave`.

Using the data model above, write a query that will:

* Create a new `person` node with a name of `Leonhard Euler` 
* Connect the new node to "Dave" via a `friends` edge
* Return the new connection

The result of this query is the ID of the new edge

In [None]:
%%gremlin

g.addV('person').property('name','Leonhard Euler')
 .addE('friends').to(__.V().has('person','first_name','Dave'))
 
//OR

//g
// .mergeV([(T.id):'leo', (T.label):'person', name: 'Leonhard Euler'])
// .mergeE([(T.label):'friends',(from):Merge.outV,(to):Merge.inV])
//    .option(Merge.outV, [(T.label): 'person', name: 'Leonhard Euler'])
//    .option(Merge.inV, [(T.label): 'person', first_name: 'Dave', last_name: 'Bech'])

### Exercise 2: Upsert a list of followers and add an edge to `Dave`.

Using the data model above, write a query that will:

* Given the following list:
    ```
    [{first_name: 'Taylor', last_name: 'Hall'},
    {first_name: 'Kelvin', last_name: 'Fernsby'},
    {first_name: 'Ian', last_name: 'Rochester'}]
    ```
* Add or update `person` nodes for each item in the list
* Add or update a `follows` relationship between each new node and "Dave"
* If the edge is created, write a property `creation` with a value of `created`
* If the edge already exists, write a property `creation` with a value of `updated`
* Return the new edge elements
* This query should be re-runnable without creating new nodes or edges

The results for this query are the three edge elements

In [None]:
%%gremlin

g.V().hasLabel("person")
.has("first_name","Taylor").has("last_name","Hall")
.fold().coalesce(unfold(),addV('person').property('first_name','Taylor').property('last_name','Hall'))
.V().hasLabel("person")
 .has("first_name","Kelvin").has("last_name","Fernsby")
 .fold().coalesce(unfold(),addV('person').property('first_name','Kelvin').property('last_name','Fernsby'))
.V().hasLabel("person")
 .has("first_name","Ian").has("last_name","Rochester")
 .fold().coalesce(unfold(),addV('person').property('first_name','Ian').property('last_name','Rochester'))

.V().hasLabel("person")
 .has("first_name","Taylor").has("last_name","Hall")
 .outE('follows')
     .where(inV().has('person','first_name','Dave'))
 .fold().coalesce(unfold().property('creation','updated'), 
     addE('follows').from(__.V().has('person','first_name','Taylor')).to(__.V().has('person','first_name','Dave')).property('creation','created')
 )
.V().hasLabel("person")
 .has("first_name","Kelvin").has("last_name","Fernsby")
 .outE('follows')
     .where(inV().has('person','first_name','Dave'))
 .fold().coalesce(unfold().property('creation','updated'), 
     addE('follows').from(__.V().has('person','first_name','Kelvin')).to(__.V().has('person','first_name','Dave')).property('creation','created')
 )
.V().hasLabel("person")
 .has("first_name","Ian").has("last_name","Rochester")
 .outE('follows')
     .where(inV().has('person','first_name','Dave'))
 .fold().coalesce(unfold().property('creation','updated'), 
     addE('follows').from(__.V().has('person','first_name','Ian')).to(__.V().has('person','first_name','Dave')).property('creation','created')
 )
.V().hasLabel('person')
    .outE('follows').elementMap()

### Exercise 3: Delete all `follows` edges and remove any connected nodes with no other edges.

Using the data model above, write a query that will:

* Find all the `follows` edges and connected nodes and remove the edges
* For each of the connected nodes see if they have any other edges
* If they have edges then ignore them
* If they have no edges then remove them

In [None]:
%%gremlin

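// bothE() counts edges in either direction, so a count of 1 means the follows edge is the only edge.
// drop() emits no traversers, so the edge delete runs as a side effect before the nodes are dropped.
g.E().hasLabel('follows').aggregate('edges')
.bothV()
.hasLabel('person')
.dedup()
.where(bothE().count().is(eq(1))).aggregate('nodes')
.fold()
.sideEffect(select('edges').unfold().drop())
.select('nodes').unfold().drop()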