# Intro to Graph Databases

# Solution to ORM
See the solution [in this GitHub repository](https://github.com/kaspercphbusiness/PicoOrm).

In [None]:
%%bash
docker run \
    -d --name neo4j \
    --rm \
    --publish=7474:7474 \
    --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/fancy!99Doorknob \
    neo4j

In [1]:
%%bash
docker ps -a

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                      NAMES
b856f2cdbb92        neo4j               "/sbin/tini -g -- /d…"   8 days ago          Up 8 days           0.0.0.0:7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp   neo4j
d7f725f4c199        mysql               "docker-entrypoint.s…"   8 days ago          Up 8 days           0.0.0.0:3306->3306/tcp, 33060/tcp                          my_mysql


Navigate to http://localhost:7474/browser/ to work with the browser-based Neo4j console:

![](./images/neo4j_login.png)

### A Simple Graph

The simplest graph has just a single node with some named values called Properties. 

Start by drawing a circle for the node. Add the name of the corresponding person, her job and her birthday.

  * **Nodes** are the name for data records in a graph
  * Data is stored as **Properties**
  * **Properties** are simple key/value pairs
  
  
![](./images/single_node.gv.svg)

###  Labels

Labels associate a set of nodes. Think of them as the type of your nodes.

Nodes can be grouped together by applying a Label to each member. In our social graph, we will label each node that represents a `Person`.

Apply the label `Person` to the node we created for *Odessa*

  * Color `Person` nodes green
  * A node can have zero or more labels
  * Labels do not have any properties

### More Nodes

Neo4j is schema-free, so nodes can have a mix of common and unique properties.

Like any database, storing data in Neo4j can be as simple as adding more records. We will add a few more nodes:

  * Emil has a klout score of 99 (https://en.wikipedia.org/wiki/Klout)
  * Johan, from Sweden, who is learning to surf
  * Ian, from England, who is an author
  * Rik, from Belgium, has a cat named Orval
  * Allison, from California, who surfs

Note that:

  * Similar nodes can have different properties
  * Properties can be strings, numbers, or booleans

Neo4j can store billions of nodes.

### Relationships

Relationships connect nodes in the graph. The real power of Neo4j is in connected data. To associate any two nodes, add a Relationship which describes how the records are related.

In our social graph, we simply say who `KNOWS` whom:

  * Emil `KNOWS` Johan and Ian
  * Johan `KNOWS` Ian and Rik
  * Rik and Ian `KNOWS` Allison

Note that:

  * Relationships always have a direction. That is, we model directed graphs in Neo4j.
  * Relationships always have a type
  * Relationships form patterns of data
  
![](images/simple_graph.gv.svg)

### Relationship properties

Store information shared by two nodes.

In a property graph, relationships are data records that can also contain properties. Looking more closely at Emil's relationships, note that:

  * Emil has known Johan since 2001
  * Emil rates Ian 5 (out of 5)
  * Everyone else can have similar relationship properties


![](images/simple_graph.gv.svg)
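
To make the property graph model concrete, here is a minimal in-memory sketch in plain Python. It is not how Neo4j stores data; the `Node` and `Relationship` classes are hypothetical names chosen for illustration, and only the example values come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # zero or more labels
    properties: dict = field(default_factory=dict)  # simple key/value pairs


@dataclass
class Relationship:
    start: Node   # relationships always have a direction: start -> end
    type: str     # ... and always have a type
    end: Node
    properties: dict = field(default_factory=dict)  # relationships can carry data too


emil = Node({"Person"}, {"name": "Emil", "klout": 99})
johan = Node({"Person"}, {"name": "Johan", "from": "Sweden", "learn": "surfing"})
ian = Node({"Person"}, {"name": "Ian", "from": "England", "title": "author"})

relationships = [
    Relationship(emil, "KNOWS", johan, {"since": 2001}),
    Relationship(emil, "KNOWS", ian, {"rating": 5}),
    Relationship(johan, "KNOWS", ian),
]

# Names of everyone Emil points at with a KNOWS relationship
emil_knows = [r.end.properties["name"] for r in relationships
              if r.start is emil and r.type == "KNOWS"]
print(emil_knows)
```

Note how the three nodes share the `name` key but otherwise carry different properties, which is what schema-free means in practice.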

## Intro to Cypher

*Cypher* is Neo4j's graph query language. It is purpose built for working with graph data.

It makes it easy to work with graphs as it uses:
  
  * patterns to describe graph data
  * familiar SQL-like clauses

*Cypher* is a declarative language. That is, your patterns describe **what** to find, not **how** to find it.



## The driver for Python

Install it with `pip3 install neo4j`. There is also an older package around named `neo4j-driver`.


In [2]:
import sys
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
auth = ("neo4j", "fancy!99Doorknob")
driver = GraphDatabase.driver(uri, auth=auth)

def neo(command):
    """Run a Cypher command and return its records as a list."""
    try:
        with driver.session() as session:
            # Read all records while the session is still open;
            # newer drivers refuse to read a result after the session closes.
            return list(session.run(command))
    except Exception as ex:
        print(str(ex), file=sys.stderr)
        return []

def neov(command):
    """Run a Cypher command and return each record as a list of values."""
    return [record.values() for record in neo(command)]
"done"

'done'

### Creating Nodes

Let's use Cypher to generate a small social graph.

In [3]:
neov('''
CREATE (a:Person { name: "Emil", from: "Sweden", klout: 99 })
''')

[]

That is it. The `CREATE` clause corresponds to drawing a circle on the whiteboard. You use it to create data, where:

  * `()` parentheses indicate a node
  * `a:Person` a variable `a` and label `Person` for the new node
  * `{}` braces add properties to the node
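
When property values come from user input, it is usually safer to pass them as query parameters than to paste them into the Cypher string. The sketch below only builds the statement and its parameter map; `build_create` is a hypothetical helper, not part of the driver. It relies on two things: Cypher accepts a map parameter as the property map of a created node, and labels cannot be parameterized, so the label is validated before it is interpolated.

```python
def build_create(label, properties):
    """Return a Cypher CREATE statement plus its parameter map."""
    # Labels cannot be passed as parameters, so validate before interpolating.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    query = f"CREATE (n:{label} $props)"
    return query, {"props": dict(properties)}


query, params = build_create("Person", {"name": "Emil", "from": "Sweden", "klout": 99})
print(query)
print(params)
# With a live database this pair could be sent as: session.run(query, params)
```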

## Finding nodes

To find the node representing Emil:

In [3]:
neov('''
MATCH (a:Person) 
WHERE a.name = "Emil" 
RETURN a
''')

[[<Node id=0 labels={'Person'} properties={'name': 'Emil', 'from': 'Sweden', 'klout': 99}>]]

The `MATCH` clause specifies a pattern of nodes and relationships.

  * `(a:Person)` a single node pattern with label 'Person' which will assign matches to the variable 'a'
  * `WHERE` clause to constrain the results
  * `a.name = "Emil"` compares name property to the value `"Emil"`
  * `RETURN` clause used to request particular results
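
As a rough mental model, the three clauses behave like a filter over all nodes. The plain-Python sketch below mimics that on a hand-written list; it says nothing about how Neo4j actually evaluates the query, and the `nodes` list is an assumption made for illustration.

```python
# Each node is a (labels, properties) pair
nodes = [
    ({"Person"}, {"name": "Emil", "from": "Sweden", "klout": 99}),
    ({"Person"}, {"name": "Johan", "from": "Sweden", "learn": "surfing"}),
]

# MATCH (a:Person)   -> keep nodes carrying the label Person
# WHERE a.name = ... -> keep nodes whose name property equals "Emil"
# RETURN a           -> hand back the matching nodes
matches = [props for labels, props in nodes
           if "Person" in labels and props.get("name") == "Emil"]
print(matches)
```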

To find all nodes in your database, relax the query by dropping the label and the `WHERE` clause:

In [None]:
neov('''
MATCH (a)
RETURN a;
''')

### Counting Nodes

In [None]:
neov('''
MATCH (a) 
RETURN count(*);
''')

### Create More Nodes and Relationships

In [None]:
neov('''
CREATE (js:Person { name: "Johan", from: "Sweden", learn: "surfing" })
''')

In [5]:
neov('''
MATCH (z:Person)
RETURN id(z);
''')

[]

After manually creating two nodes, one for Emil and one for Johan, we have two disconnected nodes in the dataset. You can add a `KNOWS` relationship between them by matching the nodes and creating a new relationship.

In [None]:
neov('''
MATCH (a),(b)
WHERE a.name = "Emil" AND b.name = "Johan"
CREATE (a)-[:KNOWS {since: 2001}]->(b)
''')

However, the `CREATE` clause can create many nodes and relationships at once too.

In [None]:
neov('''
MATCH (a:Person),(b:Person) 
WHERE a.name = "Emil" AND b.name = "Johan"
CREATE (ir:Person { name: "Ian", from: "England", title: "author" }),
(rvb:Person { name: "Rik", from: "Belgium", pet: "Orval" }),
(ally:Person { name: "Allison", from: "California", hobby: "surfing" }),
(a)-[:KNOWS {rating: 5}]->(ir),(b)-[:KNOWS]->(ir),(b)-[:KNOWS]->(rvb),
(ir)-[:KNOWS]->(b),(ir)-[:KNOWS]->(ally),(rvb)-[:KNOWS]->(ally)
''')

In [None]:
neov('''
MATCH (a)
RETURN a;
''')

### How to see the entire graph?

In [None]:
neov('''
MATCH (a:Person)-[r:KNOWS]-(b:Person)
RETURN a, r, b
limit 2
''')

### Pattern Matching

Patterns (think of them as ASCII-art representations of structures in the graph) describe *what* to find in the graph. For instance, a pattern can be used to find Emil's friends:

In [None]:
neov('''
MATCH (a:Person)-[r:KNOWS]-(friends)
WHERE a.name = "Emil" 
RETURN a, r, friends
limit 2
''')

  * `MATCH` clause to describe the pattern from known nodes to found nodes
  * `(a:Person)` starts the pattern with a Person (qualified by `WHERE`)
  * `-[r:KNOWS]-` matches `KNOWS` relationships (in either direction)
  * `(friends)` will be bound to Emil's friends

#### How many friends does Emil have?

In [None]:
neov('''
MATCH (a:Person)-[r:KNOWS]-(friends)
WHERE a.name = "Emil" 
RETURN a.name,count(friends)
''')
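
To see what the undirected pattern `-[r:KNOWS]-` matches, the following plain-Python sketch replays it on a hand-copied list of the `KNOWS` relationships from the `CREATE` statements above. The `neighbours` helper is a hypothetical name; it yields one result per matching relationship, just as the query returns one row per match.

```python
# Directed KNOWS relationships as (start, end), copied from the CREATE statements
knows = [
    ("Emil", "Johan"), ("Emil", "Ian"),
    ("Johan", "Ian"), ("Johan", "Rik"),
    ("Ian", "Johan"), ("Ian", "Allison"),
    ("Rik", "Allison"),
]


def neighbours(name):
    """Yield the other end of every KNOWS relationship touching `name`, ignoring direction."""
    for start, end in knows:
        if start == name:
            yield end
        elif end == name:
            yield start


friends = list(neighbours("Emil"))
print(friends, len(friends))
```

For Johan the same helper yields Ian twice, because there is one relationship in each direction between them. In Cypher, `count(DISTINCT friends)` would count each person only once.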

### Recommendations

Using patterns and pattern matching we can quickly generate recommendations. For example, Johan is learning to surf, so he may want to find a new friend who already does:

*Notice:* Recommendation is not a Neo4j concept. It is the usual kind of *recommendation* you know from web shops: "Customers who liked X also liked Y".

In [None]:
neov('''
MATCH (a:Person)-[:KNOWS]-()-[:KNOWS]-(surfer)
WHERE a.name = "Johan" AND surfer.hobby = "surfing"
RETURN DISTINCT surfer
''')

  * `()` empty parentheses to ignore these nodes
  * `DISTINCT` because more than one path will match the pattern
  * `surfer` will contain Allison, a friend of a friend who surfs
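
The two-hop pattern can be replayed in plain Python to see why `DISTINCT` is needed. This is only a sketch over a hand-copied list of the `KNOWS` relationships created earlier; `steps` and `hobbies` are hypothetical names. It also mimics Cypher's rule that one relationship is never traversed twice within a single pattern match.

```python
knows = [
    ("Emil", "Johan"), ("Emil", "Ian"),
    ("Johan", "Ian"), ("Johan", "Rik"),
    ("Ian", "Johan"), ("Ian", "Allison"),
    ("Rik", "Allison"),
]
hobbies = {"Allison": "surfing"}  # only Allison has hobby = "surfing"


def steps(name, used=()):
    """Yield (relationship index, other person) for each KNOWS relationship touching `name`."""
    for i, (start, end) in enumerate(knows):
        if i in used:
            continue  # do not reuse a relationship within one match
        if start == name:
            yield i, end
        elif end == name:
            yield i, start


surfers = set()  # a set plays the role of DISTINCT
for first, friend in steps("Johan"):
    for _, candidate in steps(friend, used=(first,)):
        if hobbies.get(candidate) == "surfing":
            surfers.add(candidate)

print(surfers)
```

Allison is reached over several paths (through Ian and through Rik), so without the set she would appear more than once.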

### Deleting Nodes and Relations


You can delete nodes with the `DELETE` clause.

```cypher
MATCH (a)
DELETE a
```

However, you cannot delete nodes that still have relationships. Trying to do so results in an error message similar to the following:

```
Cannot delete node<512>, because it still has relationships. To delete this node, you must first delete its relationships.
```

Consequently, either delete the relationships first or do both steps in one query:

```cypher
MATCH (a)-[r]-(b)
DELETE a,b,r
```

In [4]:
# delete all nodes with label Person
neov('''
MATCH (g:Person)
DETACH DELETE g
''')

[]
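
The difference between `DELETE` and `DETACH DELETE` can be sketched with two Python sets. This is a toy model under simplifying assumptions, not Neo4j's implementation, and the `delete` function is a hypothetical name.

```python
nodes = {"Emil", "Johan"}
relationships = {("Emil", "KNOWS", "Johan")}


def delete(name, detach=False):
    """Remove a node; refuse if it still has relationships unless detach is set."""
    attached = {r for r in relationships if name in (r[0], r[2])}
    if attached and not detach:
        raise RuntimeError(f"cannot delete {name}: it still has relationships")
    relationships.difference_update(attached)  # detach: drop the relationships first
    nodes.discard(name)


try:
    delete("Emil")  # like DELETE: refused while relationships exist
except RuntimeError as err:
    print(err)

delete("Emil", detach=True)  # like DETACH DELETE
print(nodes, relationships)
```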

# Your turn!

Connect to the Neo4j web client, then **read** and execute the movie graph tutorial:

```
:play movie graph
```

Then write queries that answer the following questions:

  1. How many movies from the 1990s are in our database and what are their names?
  2. How many movies between 2000 and 2010 are in our database and what are their names?
  3. Who produced "V for Vendetta"?
  4. Which movies were directed by "Lana Wachowski"?
  5. In which movies did "Carrie-Anne Moss" act?
  6. Who were coactors of "Carrie-Anne Moss"?
  7. In which way are people related to the movie "V for Vendetta"?
  8. What is the shortest path between "Carrie-Anne Moss" and "Natalie Portman"?
  9. With which other people has "Natalie Portman" already worked?
  10. Create a recommendation to find actors with whom "Natalie Portman" has never worked but her former colleagues have.

In [None]:
neov('''
CALL db.schema()
''')

##### Student Solutions

1. How many movies from the 1990s are in our database and what are their names?

```cypher
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title
```

```cypher
MATCH (a:Movie) WHERE a.released >= 1990 AND a.released <= 1999 return collect(a.title), count(a)
```

```cypher
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN count(nineties);
```

```cypher
match (a:Movie) where a.released > 1989 and a.released < 2000 return a.title;
```


2. How many movies between 2000 and 2010 are in our database and what are their names?

```cypher
MATCH (nineties:Movie) WHERE nineties.released > 2000 AND nineties.released < 2010 RETURN nineties.title
```

```cypher
MATCH (a:Movie) WHERE a.released >= 2000 AND a.released <= 2010 return collect(a.title), count(a)
```

```cypher
match (a:Movie) where a.released > 1999 and a.released < 2011 return a,count(a)
```

3. Who produced "V for Vendetta"?

```cypher
MATCH (v {title: "V for Vendetta"})<-[:PRODUCED]-(producers) RETURN producers.name
```

```cypher
MATCH (title {title: "V for Vendetta"})<-[:PRODUCED]-(producers) RETURN producers.name
```

```cypher
MATCH (a:Person)-[:PRODUCED]->(b:Movie) WHERE b.title ="V for Vendetta" RETURN a.name
```

```cypher
MATCH (m:Movie)-[:PRODUCED]-(producer) WHERE m.title = "V for Vendetta" RETURN producer
```

```cypher    
MATCH (producer:Person)-[:PRODUCED]->(movie:Movie) WHERE movie.title = "V for Vendetta" RETURN producer;
```

4. Which movies were directed by "Lana Wachowski"?

```cypher
MATCH (p {name:"Lana Wachowski"})-[:DIRECTED]->(movie) RETURN movie.title
```

```cypher
MATCH (director:Person {name: "Lana Wachowski"})-[:DIRECTED]->(movies) RETURN movies.title
```

```cypher
MATCH (a:Person)-[:DIRECTED]->(b:Movie) WHERE a.name ="Lana Wachowski" RETURN b.title
```

```cypher
MATCH (director:Person)-[r:DIRECTED]->(movie:Movie) WHERE director.name = "Lana Wachowski" RETURN movie.title;
```

```cypher
MATCH (x:Person)-[:DIRECTED]->(m:Movie) WHERE x.name = "Lana Wachowski" RETURN m
```

5. In which movies did "Carrie-Anne Moss" act?

```cypher
MATCH (p {name:"Carrie-Anne Moss"})-[:ACTED_IN]->(movie) RETURN movie.title
```

```cypher
MATCH (a:Person)-[:ACTED_IN]->(b:Movie) WHERE a.name ="Carrie-Anne Moss" return b.title
```

```cypher
MATCH (carrie:Person {name: "Carrie-Anne Moss"})-[:ACTED_IN]->(movies) RETURN movies.title
```

```cypher
MATCH (actor:Person)-[r:ACTED_IN]->(movie:Movie) WHERE actor.name = "Carrie-Anne Moss" RETURN movie.title;
```

```cypher
MATCH (x:Person)-[:ACTED_IN]->(m:Movie) WHERE x.name = "Carrie-Anne Moss" RETURN m
```

6. Who were coactors of "Carrie-Anne Moss"?

```cypher
MATCH (p {name:"Carrie-Anne Moss"})-[:ACTED_IN]->()<-[:ACTED_IN]-(coact) RETURN coact.name
```

```cypher
MATCH (a:Person)-[:ACTED_IN]-(b:Movie)<-[:ACTED_IN]-(c:Person) WHERE c.name = "Carrie-Anne Moss" return a.name
```

```cypher
MATCH (carrie:Person {name:"Carrie-Anne Moss"})-[:ACTED_IN]->(movies)<-[:ACTED_IN]-(coActors) RETURN DISTINCT coActors.name
```

```cypher
MATCH (x:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(co:Person) WHERE x.name = "Carrie-Anne Moss" RETURN co
```

7. In which way are people related to the movie "V for Vendetta"?

```cypher
MATCH (people:Person)-[relatedTo]-(v:Movie {title: "V for Vendetta"}) RETURN v, people, Type(relatedTo), relatedTo
```

```cypher
MATCH (a:Person)-[r]->(b:Movie) WHERE b.title ="V for Vendetta" return a.name,type(r)
```

```cypher
MATCH (people:Person)-[relatedTo]-(:Movie {title: "V for Vendetta"}) RETURN DISTINCT people.name, Type(relatedTo)
```

```cypher
MATCH (x:Movie)-[r]-() WHERE x.title ="V for Vendetta" RETURN DISTINCT Type(r)
```


8. What is the shortest path between "Carrie-Anne Moss" and "Natalie Portman"?

```cypher
MATCH p=shortestPath(
  (carrie:Person {name:"Carrie-Anne Moss"})-[*]-(nat:Person {name:"Natalie Portman"})
)
RETURN p
```

```cypher
MATCH (Carrie:Person { name: 'Carrie-Anne Moss' }),(Natalie:Person { name: "Natalie Portman" }), p = shortestPath((Carrie)-[*..15]-(Natalie))
RETURN p
```

```cypher
MATCH p=shortestPath((C:Person {name:"Carrie-Anne Moss"})-[*]-(D:Person{name:"Natalie Portman"})) RETURN p
```

```cypher
MATCH p = shortestpath((natalie:Person {name: "Natalie Portman"})-[r *1..5]-(carrie:Person {name: "Carrie-Anne Moss"})) RETURN collect(p), length(p), natalie, r, carrie;
```
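
Conceptually, a shortest path in terms of relationship count can be found with a breadth-first search. The sketch below runs one over a tiny hand-written graph; the adjacency list is an illustrative assumption and not an export of the movie database, so the real result may differ. `shortest_path` is a hypothetical helper name.

```python
from collections import deque

# Undirected adjacency list, mirroring the direction-less pattern -[*]-
graph = {
    "Carrie-Anne Moss": ["The Matrix"],
    "The Matrix": ["Carrie-Anne Moss", "Hugo Weaving"],
    "Hugo Weaving": ["The Matrix", "V for Vendetta"],
    "V for Vendetta": ["Hugo Weaving", "Natalie Portman"],
    "Natalie Portman": ["V for Vendetta"],
}


def shortest_path(start, goal):
    """Breadth-first search; returns the first (hence shortest) path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("Carrie-Anne Moss", "Natalie Portman")
print(path, len(path) - 1)  # the path and its number of relationships
```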

9. With which other people has "Natalie Portman" already worked?

```cypher
MATCH (p {name:"Natalie Portman"})-[]->()<-[]-(coact) RETURN coact.name
```

```cypher
MATCH (p1 {name:"Natalie Portman"})-[]->()<-[]-(p2) RETURN p2
```

```cypher
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)-[r]-(coactor:Person) WHERE actor.name = "Natalie Portman" RETURN DISTINCT coactor;
```

```cypher
MATCH (x:Person {name:"Natalie Portman"})-[]->(:Movie)<-[]-(coWorkers:Person) RETURN coWorkers
```


10. Create a recommendation to find actors with whom "Natalie Portman" has never worked but her former colleagues have.

```cypher
MATCH (p1 {name: "Natalie Portman"})-[]->()<-[]-(p2)-[]->()<-[]-(p3) WHERE NOT (p1)-[]->()<-[]-(p3) RETURN DISTINCT p3.name;

MATCH (p1:Person {name: "Natalie Portman"})-[]->(:Movie)<-[]-(p2:Person)-[]->(:Movie)<-[]-(p3:Person) WHERE NOT (p1)-[]->(:Movie)<-[]-(p3) RETURN DISTINCT p3.name;
```

```cypher
MATCH (nat:Person {name:"Natalie Portman"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors),
      (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors)
WHERE NOT (nat)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActors) AND nat <> cocoActors
RETURN cocoActors.name AS Recommended
```

```cypher
MATCH (a:Person)-[:ACTED_IN]->(b:Movie)<-[:ACTED_IN]-(coa:Person)-[:ACTED_IN]->(coab:Movie)<-[:ACTED_IN]-(ccoa:Person) WHERE a.name = "Natalie Portman" AND NOT (a)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(ccoa) AND a <> ccoa RETURN DISTINCT ccoa
```

```cypher    
MATCH (p:Person {name:"Natalie Portman"})-[]->(:Movie)<-[]-(co:Person), (co)-[]->(:Movie)<-[]-(coco)
WHERE NOT (p)-[]->(:Movie)<-[]-(coco) AND p <> coco
RETURN coco.name
```

# Spatial Queries

Neo4j versions later than 3.0 support functions for specifying points in a 2D coordinate system and for calculating the geodesic distance between two points directly. See https://neo4j.com/docs/developer-manual/current/cypher/functions/spatial/


Let's get some locations for airports...

In [None]:
neov("""
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/airports.csv" AS row
RETURN row
LIMIT 1
""")

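## Notice:
The query above *did not* add any data to the database; it only returned the first row of the CSV file.

The next query does add data to the database. Because `LOAD CSV` reads every field as a string, the latitude and longitude values are converted with `toFloat`.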


In [None]:
neov('''
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/airports.csv" AS row
WITH row
WHERE NOT row.iata_code IS NULL
MERGE (a:Airport {iata:row.iata_code,
    name: row.name,
    latt:toFloat(row.latitude_deg), 
    long:toFloat(row.longitude_deg)})
''')
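
The same import logic, written in plain Python, shows what the query does with each row: skip rows without an IATA code and convert the coordinate strings to floats. The sample rows are hypothetical; only the column names are taken from the query above.

```python
import csv
import io

# Hypothetical sample data using the column names from the query
sample = io.StringIO(
    "iata_code,name,latitude_deg,longitude_deg\n"
    "XXX,Example Airport,55.5,12.5\n"
    ",Airfield Without Code,10.0,20.0\n"
)

airports = []
for row in csv.DictReader(sample):
    if not row["iata_code"]:  # skip rows without an IATA code
        continue
    airports.append({
        "iata": row["iata_code"],
        "name": row["name"],
        "latt": float(row["latitude_deg"]),   # CSV fields arrive as strings
        "long": float(row["longitude_deg"]),
    })

print(airports)
```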

In [None]:
neov('''
match (a:Airport)
return a
limit 3
''')

# Spatial distance
To compute the distance between two points, we have to generate `Point` objects out of the latitude and longitude properties, on which we can call the `distance` function.
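
For intuition about what such a distance means, here is a plain haversine calculation in Python. It treats the Earth as a sphere with an assumed mean radius, so the result may differ slightly from what Neo4j returns. The Copenhagen coordinates are the ones used in this section; the second point is a hypothetical airport location.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


cph = (55.6867243, 12.5700724)   # Copenhagen (latitude, longitude)
airport = (51.4706, -0.461941)   # hypothetical airport coordinates
print(round(haversine(*airport, *cph)) / 1000, "km")
```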

In [None]:
# copenhagen latt:55.6867243, long:12.5700724
neov(""" 
MATCH (a:Airport)
WITH point({ longitude: a.long, latitude: a.latt }) AS aPoint,
    point({ longitude: 12.5700724, latitude: 55.6867243 }) as cph, a
WITH round(distance(aPoint, cph)) / 1000 as distance, a
ORDER BY distance DESC
RETURN DISTINCT a.iata, a.name, distance
limit 5
""")

In [None]:
neov('''
unwind [m in split("her er @joe and @billy"," ") where m starts with "@"] as x
return right(x,size(x)-1)
''')
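
The query above splits a sentence into words, keeps the words that start with `@`, and strips that first character. For comparison, the same steps in plain Python:

```python
text = "her er @joe and @billy"

# split on spaces, keep words starting with "@", then drop the "@"
mentions = [word[1:] for word in text.split(" ") if word.startswith("@")]
print(mentions)
```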

In [None]:
neov('''
CREATE (js:Person { name: "Johan", from: "Sweden", learn: "surfing", lucky:[1,2,3,4,5]})
''')

In [None]:
neov('''
match (a:Person)
return a.lucky
''')