---
<center><h1> Lesson 7.2 - NoSQL with Python: Neo4j Graph Database</h1></center>

---

### What is NoSQL?

Over the last few years we have seen the rise of a new type of database, known as **NoSQL (Not Only SQL) databases**, that challenges the dominance of _relational SQL databases_. Relational databases have dominated the software industry for a long time, providing persistent storage, concurrency control, transactions, mostly standard interfaces, and mechanisms for integrating application data and reporting.

There are four general types of NoSQL databases, each with their own specific attributes:

* **Graph database** – based on graph theory, these databases are designed for data whose relations are well represented as a graph: elements that are interconnected, with an undetermined number of relations between them. Examples: Neo4j, Titan, etc.

* **Key-Value store** – these databases are designed for storing data in a schema-less way. In a key-value store, all of the data consists of an indexed key and a value, hence the name. Examples: Redis, DynamoDB, etc.

* **Column store** – (also known as wide-column stores) these databases store data tables as sections of columns rather than as rows. While this simple description sounds like the inverse of a standard database, wide-column stores offer very high performance and a highly scalable architecture. Examples: HBase, BigTable, etc.

* **Document database** – expands on the basic idea of key-value stores: each “document” holds more complex, structured data and is assigned a unique key, which is used to retrieve it. These databases are designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Examples: MongoDB, CouchDB, etc.

### Why NoSQL Databases?

**Structure and type of data being kept:**
SQL/Relational databases require a structure with defined attributes to hold the data, unlike NoSQL databases, which usually do not impose a fixed schema.

**Querying:**
Regardless of their licences, relational databases all implement the SQL standard to a certain degree and thus, they can be queried using the Structured Query Language (SQL). NoSQL databases, on the other hand, each implement a unique way to work with the data they manage.

**Scaling:**
Both solutions are easy to scale vertically (i.e. by increasing system resources). However, being more modern (and simpler) applications, NoSQL solutions usually offer much easier means to scale horizontally (i.e. by creating a cluster of multiple machines).

**Reliability:**
When it comes to data reliability and transactional guarantees, SQL databases are still the better bet.

**Support:**
Relational database management systems have a decades-long history. They are extremely popular, and it is very easy to find both free and paid support. If an issue arises, it is therefore much easier to solve than with the more recently popular NoSQL databases -- especially if the solution is complex in nature (e.g. MongoDB).

**Complex data keeping and querying needs:**
By nature, relational databases are the go-to solution for complex querying and data keeping needs. They are much more efficient and excel in this domain. 

---
# Neo4j Graph Database

<img src="images/neo4j-python.png">

**Neo4j** is one of the most popular graph databases. It is written in Java, and its query language is Cypher (CQL stands for _Cypher Query Language_).

A _graph database_ is a database that stores data in the form of graph structures. It stores our application's data in terms of nodes, relationships and properties. Just as an RDBMS (Relational DataBase Management System) stores data in the rows and columns of tables, a GDBMS stores data in the form of graphs.

A graph is a set of nodes and the relationships that connect those nodes. Graphs store data in _nodes_ and _relationships_ in the form of _properties_. Properties are key-value pairs that represent data. In graph theory, a node is drawn as a circle and a relationship between nodes is drawn as an arrow.

### Fundamental building blocks of Neo4j

Neo4j is a graph database, adopting a labeled property graph model. In Neo4j terminology, vertices are called nodes, and edges are called relationships.

**Nodes:**

- Nodes are typically used to represent entities (or complex value types).
- Nodes can have properties, which are key/value pairs. Values can be primitives or collections of primitives.
- Nodes can have zero or more relationships connecting them to other nodes.

**Relationships:**

- Relationships are used to represent the relationships between nodes; to provide context to the nodes.
- Relationships must have a start and end node, thus relationships must have a direction. Direction can be ignored at query time, so the fact that direction is there does not mean it must be used.
- Relationships must have a relationship type.
- Relationships can have properties (key/value pairs; values can be primitives or collections of primitives).

**Properties:**

- Nodes and relationships can have properties (key/value pairs; values can be primitives or collections of primitives).
- Properties can quantify relationships.

**Labels:**

- Nodes can have zero or more labels.
- Labels can represent roles, categories or types.
- Labels are used to define indexes and constraints.
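
The building blocks above can be sketched in plain Python. This is only an illustrative in-memory model, not how Neo4j or `py2neo` implement them; the class names `SketchNode` and `SketchRelationship` and the `degree` helper are hypothetical and exist only for this example.

```python
# Illustrative in-memory model only; these are NOT py2neo classes.
class SketchNode(object):
    def __init__(self, *labels, **properties):
        self.labels = set(labels)            # zero or more labels
        self.properties = dict(properties)   # key/value pairs


class SketchRelationship(object):
    def __init__(self, start, rel_type, end, **properties):
        self.start = start                   # every relationship has a start node,
        self.type = rel_type                 # exactly one type,
        self.end = end                       # and an end node (hence a direction)
        self.properties = dict(properties)


tom = SketchNode("Person", name="Tom Hanks", born=1956)
movie = SketchNode("Movie", title="Forrest Gump", released=1994)
acted_in = SketchRelationship(tom, "ACTED_IN", movie, role="Forrest Gump")
relationships = [acted_in]


def degree(node):
    # Direction is ignored here, just as it can be ignored at query time.
    return sum(1 for r in relationships if r.start is node or r.end is node)


print(acted_in.type)                               # ACTED_IN
print("%d %d" % (degree(tom), degree(movie)))      # 1 1
```

Keeping the type on the relationship and the labels on the node mirrors the rule that a relationship has exactly one type while a node may carry several labels.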

Here is a visual representation of how an RDBMS schema can be transformed into a graph database:

<img src="images/RDBMS_vs_GRAPHDB.png">

### Installing Neo4j

Head to http://neo4j.com/download/ and click the download link. You'll need to have Java 7 installed as well. On Mac or Linux, untar the download into a folder of your choice, and then run `bin/neo4j start` from that folder. Windows users receive an installer package and can run the service from the dashboard that starts after installation. Once you're done, you should be able to visit http://localhost:7474/ to test your server.

# Interaction of Python and Neo4j through `py2neo` Library

[`py2neo`](http://py2neo.org/) is a client library and comprehensive toolkit for working with Neo4j from within Python applications and from the command line. The core library has no external dependencies and has been carefully designed to be easy and intuitive to use. The simplest way to install `py2neo` is with `pip`:
    
    pip install py2neo
    
The simplest way to try out a connection to the Neo4j server is via the console. Type `neo4j` in the console and confirm the connection to the Neo4j server. Once you have started a local Neo4j server, open a new Python console and enter the following:

In [77]:
from py2neo import Graph

try:
    graph = Graph('http://localhost:7474/db/data/')
    print "Connected successfully!!!"
except Exception:
    # Catch Exception rather than using a bare except, so Ctrl+C still interrupts
    print "Could not connect to Neo4j server"
graph

Connected successfully!!!


<Graph uri=u'http://localhost:7474/db/data/'>

In [2]:
graph.neo4j_version

(2, 3, 2)

This imports the Graph class from `py2neo` and creates an instance bound to the default Neo4j server URI http://localhost:7474/db/data/. To connect to a server at an alternative address, simply pass in the URI value as a string argument to the Graph constructor.

Now you can open the local Neo4j browser at http://localhost:7474/browser/ or with the command:

In [3]:
graph.open_browser()

As we said above, nodes and relationships are the fundamental building blocks of a Neo4j graph and both have a corresponding class in `py2neo`.

Let's create a new database in which we will collect data about movies, actors, directors, etc. with various additional information. The scheme below represents a graph for ["Forrest Gump"](https://en.wikipedia.org/wiki/Forrest_Gump).

<img src="images/scheme.jpg" width="70%">

Of course, all fields and data from this graph could also be stored in a relational database. A possible MySQL schema is shown below:

<img src="images/MySQL_scheme.jpg" width="60%">

**Pay attention:**

_Each node or relationship in Neo4j has a unique identifier, even when two or more nodes have the same labels, properties, etc. Be careful when creating new nodes and relationships to avoid duplicates._
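
One way to avoid such duplicates is Cypher's `MERGE` clause, which matches an existing pattern and creates it only if it is missing. The sketch below only builds the statement text; the helper name `merge_node_statement` is hypothetical, and the `{value}` placeholder assumes the Neo4j 2.x parameter syntax used by the server version in this lesson.

```python
def merge_node_statement(label, key):
    """Build a MERGE statement that finds or creates a node by one property."""
    # Labels and property keys cannot be passed as Cypher parameters, so they
    # are checked and interpolated; the property value stays a parameter.
    for name in (label, key):
        if not name.replace("_", "").isalnum():
            raise ValueError("unsafe identifier: %r" % name)
    return "MERGE (n:%s {%s: {value}}) RETURN n" % (label, key)


statement = merge_node_statement("Person", "name")
print(statement)   # MERGE (n:Person {name: {value}}) RETURN n

# Possible use with the `graph` object from above (not executed here):
# graph.cypher.execute(statement, {"value": "Tom Hanks"})
```

Under these assumptions, running the statement twice with the same value should leave a single matching node rather than two.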

In [78]:
# NOTE: If you get unexpected errors in this lesson, try running this cell.
# It deletes all nodes and relationships in the database. After that, run all cells again.
#graph.cypher.execute("MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r")



In [14]:
from py2neo import Node, Relationship

tom_hanks = Node("Person", name="Tom Hanks", born=1956, country="USA")
gary_sinise = Node("Person", name="Gary Sinise", born=1955, country="USA")
robert_zemeckis = Node("Person", name="Robert Zemeckis", born=1952, country="USA")
forrest_gump = Node("Movie", title="Forrest Gump", released=1994, duration_min=142, 
                    country="USA", lang="English")

tom_hanks_acted_in_forrest_gump = Relationship(tom_hanks, "ACTED_IN", forrest_gump, role="Forrest Gump")
gary_sinise_acted_in_forrest_gump = Relationship(gary_sinise, "ACTED_IN", forrest_gump, role="Lieutenant Dan Taylor")
robert_zemeckis_directed_forrest_gump = Relationship(robert_zemeckis, "DIRECTED", forrest_gump)

graph.create(tom_hanks_acted_in_forrest_gump)
graph.create(gary_sinise_acted_in_forrest_gump)
graph.create(robert_zemeckis_directed_forrest_gump)

(<Relationship graph=u'http://localhost:7474/db/data/' ref=u'relationship/38' start=u'node/37' end=u'node/35' type=u'DIRECTED' properties={}>,)

When first created, `Node` and `Relationship` objects exist only in the client; nothing has yet been written to the server. The `Graph.create` method shown above creates corresponding server-side objects and automatically binds each local object to its remote counterpart.

In [15]:
# After a Node or Relationship has been created, a new property can be added like this
forrest_gump.properties["box_office_Mdol"] = 677.9
forrest_gump.push()

After running the code above you will see the following graph in the "Database information" window of the Neo4j browser:

<img src="images/graph.jpg">

The available properties become visible when you hover over or click on a `Node` or `Relationship`.

Basic information about the graph:

In [16]:
print "Relationships amount:"
print graph.size

print "\nRelationships types:"
print graph.relationship_types

print "\nNodes amount:"
print graph.order

print "\nExisting Labels:"
print graph.node_labels

print "\nInfo about Node with id = 1:"
# You may have another id of the same Node or Relationship
print graph.node(1)

print "\nThe number of relationships attached to the node:"
print "tom_hanks:", tom_hanks.degree
print "forrest_gump:", forrest_gump.degree

# Properties and Labels of a Node (the same for Relationships) can be obtained also in such way:
print "\ntom_hanks Label:", tom_hanks.labels
print "tom_hanks was born:", tom_hanks.properties['born']

print "\nInfo about Relationship with id = 1:"
print graph.relationship(1)

print "\nRelationship's nodes:"
print tom_hanks_acted_in_forrest_gump.nodes

Relationships amount:
3

Relationships types:
frozenset([u'DIRECTED', u'ACTED_IN'])

Nodes amount:
4

Existing Labels:
frozenset([u'Person', u'Movie'])

Info about Node with id = 1:
(n1:Movie {box_office_Mdol:677.9,country:"USA",duration_min:142,lang:"English",released:1994,title:"Forrest Gump"})

The number of relationships attached to the node:
tom_hanks: 1
forrest_gump: 3

tom_hanks Label: LabelSet(['Person'])
tom_hanks was born: 1956

Info about Relationship with id = 1:


ValueError: Relationship with ID 1 not found

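Nodes are printed in Cypher notation, as in the `(n1:Movie {...})` line above. We will look at Cypher below. Note that IDs are assigned by the server, so `graph.relationship(1)` raises a `ValueError` when no relationship has that ID, as happened in this run.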

Let's extend our graph database with one new movie ["The Green Mile"](https://en.wikipedia.org/wiki/The_Green_Mile):

In [17]:
michael_clarke_duncan = Node("Person", name="Michael Clarke Duncan", born=1957, country="USA")
frank_darabont = Node("Person", name="Frank Darabont", born=1959, country="France")
stephen_king = Node("Person", name="Stephen King", born=1947, country="USA")
green_mile  = Node("Movie", title="The Green Mile", released=1999, duration_min=188, 
                    country="USA", lang="English", box_office_Mdol=290.7)

graph.create(Relationship(tom_hanks, "ACTED_IN", green_mile, role="Paul Edgecomb"))
graph.create(Relationship(gary_sinise, "ACTED_IN", green_mile, role="Burt Hammersmith"))
graph.create(Relationship(michael_clarke_duncan, "ACTED_IN", green_mile, role="John Coffey"))
graph.create(Relationship(frank_darabont, "DIRECTED", green_mile))
graph.create(Relationship(stephen_king, "BASED_ON", green_mile))

(<Relationship graph=u'http://localhost:7474/db/data/' ref=u'relationship/43' start=u'node/41' end=u'node/38' type=u'BASED_ON' properties={}>,)

Refresh the http://localhost:7474/browser/ page and look at the updated graph.

# A quick Cypher introduction

**Cypher** is a pattern-oriented, declarative query language; a mix of SQL and graph traversal patterns. If you know SQL well, you'll probably quickly see the parallels. This is just a brief introduction to get you started — if you want more complete documentation, see the documentation [here](http://neo4j.com/docs/stable/cypher-query-lang.html) and [here](http://neo4j.com/developer/cypher-query-language/). Note that much of Cypher is case-insensitive, like SQL. Notable exceptions to this rule include identifiers, labels, property keys, and relationship types.

`py2neo` provides Cypher execution functionality via the HTTP transactional endpoint. The `execute()` method lets you run pure Cypher from Python code.

### `CREATE` clause for new data insertion:

The Neo4j CQL `CREATE` command is used to create Nodes with or without properties, to create Relationships between Nodes with or without properties, and to assign one or more labels to a Node (a Relationship has exactly one type).

_**Basic syntax**_:
* for a single Node: 
 
    CREATE (
            <node_name>:<label_name>: ... :<label_name_N> 
            {
                <property_1_name>:<property_1_value>, ..., <property_M_name>:<property_M_value>
             }
            )
* for Relationship between nodes:
    
    CREATE (
            <node_1_name>:<label_1_name>: ... :<label_1_name_N1> 
            {
                <property_1_name>:<property_1_value>, ..., <property_M1_name>:<property_M1_value>
             }
            )-
           [<relationship_name>:<relationship_type> { <property_1_name>:<property_1_value>, ... }]
           ->(
            <node_2_name>:<label_2_name>: ... :<label_2_name_N2> 
            {
                <property_1_name>:<property_1_value>, ..., <property_M2_name>:<property_M2_value>
             }
           )

_**Analogy with SQL**_:

    INSERT INTO <table_name> VALUES (<value_of_field_1>, ..., <value_of_field_N>);

Let's create a new Person, a new film ["Inception"](https://en.wikipedia.org/wiki/Inception) and the "Matrix" trilogy using the `py2neo.Graph.cypher` attribute:

In [18]:
graph.cypher.execute("CREATE (single_actor:Person { name:'Sylvester Stallone', born:1946, country:'USA' })")



In [19]:
graph.cypher.execute("""
                      CREATE (actor:«label_1» { name:'Leonardo DiCaprio', born:1974, country:'USA' })-
                      [:«rel»]->
                      (film:«label_2» { title:"Inseption", released:2010, duration_min:148, 
                                        country:"USA", lang:"English", box_office_Mdol:825.5 })
                     """,
                     actor="leonardo_diCaprio", label_1="Person",
                     rel="ACTED_IN",
                     film="Inception", label_2="Movie"
                    )



Note how the query parameters are set: the `«...»` placeholders are replaced with the keyword arguments passed to `execute()`. This is very helpful in loops.
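
As a rough sketch of the loop use case, the same statement can be run once per parameter set. The helper `run_for_each` and the stand-in `fake_execute` are hypothetical names for this example, and the `{name}` placeholder assumes the Neo4j 2.x parameter syntax; the stand-in lets the code run without a server.

```python
# Read-only statement with an ordinary Cypher parameter (Neo4j 2.x syntax).
STATEMENT = "MATCH (p:Person {name: {name}}) RETURN p.born"

rows = [
    {"name": "Tom Hanks"},
    {"name": "Gary Sinise"},
]


def run_for_each(execute, statement, parameter_rows):
    """Call execute(statement, row) once per parameter dict; return the call count."""
    count = 0
    for row in parameter_rows:
        execute(statement, row)
        count += 1
    return count


# Stand-in executor that only records its calls, so no server is needed.
calls = []


def fake_execute(statement, parameters):
    calls.append((statement, parameters))


print(run_for_each(fake_execute, STATEMENT, rows))   # 2
```

With a live server, one might pass `graph.cypher.execute` in place of `fake_execute`, assuming it accepts the statement followed by a parameter dictionary.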

In [20]:
graph.cypher.execute(
    """ 
    CREATE (matrix1:Movie { title: 'The Matrix', released: 1999, duration_min: 136, box_office_Mdol: 463.5 })
    CREATE (matrix2:Movie { title: 'The Matrix Reloaded', released: 2003, duration_min: 138, box_office_Mdol: 742.1 })
    CREATE (matrix3:Movie { title: 'The Matrix Revolutions', released: 2003, duration_min: 129, box_office_Mdol: 427.3 })
    CREATE (keanu:Person { name: 'Keanu Reeves', born: 1964, country: "Canada" })
    CREATE (laurence:Person { name: 'Laurence Fishburne', born: 1961, country: "USA" })
    CREATE (carrieanne:Person { name: 'Carrie-Anne Moss', born: 1967, country: "Canada" })
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix1)
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix2)
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix3)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix1)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix2)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix3)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix1)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix2)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix3)
    """
)



### `RETURN` clause for returning query result:

The Neo4j CQL `RETURN` clause is used to retrieve some or all properties of a Node, or of Nodes and their associated Relationships. It is used together with either the `MATCH` or the `CREATE` command.

_**Basic syntax**_:
    
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>

### `MATCH` clause for data selection:

Neo4j CQL `MATCH` command is used to get data about nodes, relationships and properties from database. We can use `MATCH` command with `RETURN` clause or an update clause.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>
    
_**Analogy with SQL**_:

    SELECT <field_1>, ..., <field_N> FROM <table_name>;
    
Many of the commands are similar to SQL.

The [`CREATE UNIQUE`](http://neo4j.com/docs/stable/query-create-unique.html) clause is a mix of `MATCH` and `CREATE`: it matches what it can and creates what is missing, so nodes/relationships are created only when they do not already exist.

In [21]:
# Return all items
graph.cypher.execute("MATCH (n) RETURN n")
# is equivalent to 
# SELECT * FROM table_1
# UNION
# SELECT * FROM table_2
# ...

    | n                                                                                                                     
----+------------------------------------------------------------------------------------------------------------------------
  1 | (n34:Person {born:1956,country:"USA",name:"Tom Hanks"})                                                               
  2 | (n35:Movie {box_office_Mdol:677.9,country:"USA",duration_min:142,lang:"English",released:1994,title:"Forrest Gump"})  
  3 | (n36:Person {born:1955,country:"USA",name:"Gary Sinise"})                                                             
  4 | (n37:Person {born:1952,country:"USA",name:"Robert Zemeckis"})                                                         
  5 | (n38:Movie {box_office_Mdol:290.7,country:"USA",duration_min:188,lang:"English",released:1999,title:"The Green Mile"})
  6 | (n39:Person {born:1957,country:"USA",name:"Michael Clarke Duncan"})                                                   

In [22]:
# Return specific Movie object 
res = graph.cypher.execute("MATCH (movie:Movie { title:'The Matrix' }) RETURN movie")

# Look at type of the result
print type(res)

# Print resulting data structure
print 
print res

# How to extract data from a RecordList instance
print 
print "res.records:", type(res.records)
print res.records

print 
print "res.records[0]:", type(res.records[0])
print res.records[0]

print 
print "Node:"
print "res.records[0][0]:", type(res.records[0][0])
print res.records[0][0]

print 
node = res.records[0][0]
print "res.records[0][0]['title']:", type(node["title"])
print node["title"]

<class 'py2neo.cypher.core.RecordList'>

   | movie                                                                                
---+---------------------------------------------------------------------------------------
 1 | (n45:Movie {box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})


res.records: <type 'list'>
[ movie                                                                                
---------------------------------------------------------------------------------------
 (n45:Movie {box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})
]

res.records[0]: <class 'py2neo.cypher.core.Record'>
 movie                                                                                
---------------------------------------------------------------------------------------
 (n45:Movie {box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})


Node:
res.records[0][0]: <class 'py2neo.core.Node'>
(n45:Movie {box_office

In [23]:
# Return the title and date of the film
graph.cypher.execute("MATCH (movie:Movie { title:'The Matrix' }) RETURN movie.title, movie.released")

   | movie.title | movie.released
---+-------------+----------------
 1 | The Matrix  |           1999

In [24]:
# Return the title and date of all films
graph.cypher.execute("MATCH (movie:Movie) RETURN movie.title, movie.released")
# Analogy with SQL:
# SELECT title, released FROM movie;

   | movie.title            | movie.released
---+------------------------+----------------
 1 | Forrest Gump           |           1994
 2 | The Green Mile         |           1999
 3 | Inseption              |           2010
 4 | The Matrix             |           1999
 5 | The Matrix Reloaded    |           2003
 6 | The Matrix Revolutions |           2003

In [25]:
# Return Person names and year of birth, and order them by year in descending order:
graph.cypher.execute("""
    MATCH (person:Person)
    RETURN person.name, person.born
    ORDER BY person.born DESC
""")
# Analogy with SQL:
# SELECT name, born FROM person ORDER BY born DESC;

    | person.name           | person.born
----+-----------------------+-------------
  1 | Leonardo DiCaprio     |        1974
  2 | Carrie-Anne Moss      |        1967
  3 | Keanu Reeves          |        1964
  4 | Laurence Fishburne    |        1961
  5 | Frank Darabont        |        1959
  6 | Michael Clarke Duncan |        1957
  7 | Tom Hanks             |        1956
  8 | Gary Sinise           |        1955
  9 | Robert Zemeckis       |        1952
 10 | Stephen King          |        1947
 11 | Sylvester Stallone    |        1946

In [26]:
# Count all objects:
amount = graph.cypher.execute("MATCH (n) RETURN COUNT(*)")
amount

   | COUNT(*)
---+----------
 1 |       17

In [27]:
# Count Person nodes:
graph.cypher.execute("MATCH (n:Person) RETURN COUNT(*)")
# Analogy with SQL:
# SELECT COUNT(*) FROM person

   | COUNT(*)
---+----------
 1 |       11

In [28]:
# Count relationship types:
graph.cypher.execute("MATCH (n)-[r]->() RETURN TYPE(r), COUNT(*)")

   | TYPE(r)  | COUNT(*)
---+----------+----------
 1 | BASED_ON |        1
 2 | ACTED_IN |       15
 3 | DIRECTED |        2

In [29]:
# List first 10 nodes and their relationships:
graph.cypher.execute("""
    MATCH (n)-[r]->(m)
    RETURN n.name AS FROM, type(r) AS `->`, m.title AS TO
    LIMIT 10
""")

    | FROM                  | ->       | TO            
----+-----------------------+----------+----------------
  1 | Tom Hanks             | ACTED_IN | The Green Mile
  2 | Tom Hanks             | ACTED_IN | Forrest Gump  
  3 | Gary Sinise           | ACTED_IN | The Green Mile
  4 | Gary Sinise           | ACTED_IN | Forrest Gump  
  5 | Robert Zemeckis       | DIRECTED | Forrest Gump  
  6 | Michael Clarke Duncan | ACTED_IN | The Green Mile
  7 | Frank Darabont        | DIRECTED | The Green Mile
  8 | Stephen King          | BASED_ON | The Green Mile
  9 | Leonardo DiCaprio     | ACTED_IN | Inseption     
 10 | Keanu Reeves          | ACTED_IN | The Matrix    

> ### Exercise 2.1:

> Display all films from the "Matrix" trilogy in descending order by year of release. If several movies share the same release year, display them in ascending order by duration. Show each movie's title, release year and duration. The column names in the resulting table should equal the property names, i.e. "title" rather than "movie.title". Save the result table in the variable `result` in the format:
```   
   | title                  | released | duration_min
---+------------------------+----------+--------------
 1 | ...................... | ........ |..............
```

> Before you start, note that a Neo4j database may contain more than one node or relationship with the same label, properties, etc., i.e. it may contain duplicates of this kind. They are not full duplicates, because they have different identifiers in the `<id>` field (which is almost always assigned automatically). Thus, if you ran any of the cells above that create nodes and/or relationships more than once, the respective nodes/relationships appear in the database more than once. Please check this in the [Neo4j browser](http://localhost:7474/browser/).

> In the following tasks we assume that node and relationship information is unique, i.e. no node has duplicates. That is why we remove all database records using the command `MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r`, which clears all nodes and relationships but does not reset the `<id>` numbering (see below). This means that if we delete two nodes with `<id>`s 1 and 2 (suppose they were the only ones in the database) using the command above, the next node we create will have `<id>` equal to 3. To reset the numbering you need to delete all files from the folder where the database data is stored.

> After removing all records we fill the database with the same data as before (with each node/relationship created only once) using the command `Test.resetDatabaseRecords()`. It contains Cypher code that creates the necessary nodes and the relationships between them. Please take a look at it.

In [30]:
#graph.cypher.execute("MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r")
#graph.cypher.execute(Test.resetDatabaseRecords())

# type your code here

result = graph.cypher.execute("""
MATCH (movie:Movie)
WHERE movie.title CONTAINS 'Matrix'
RETURN movie.title AS title, movie.released AS released, movie.duration_min AS duration_min
ORDER BY movie.released DESC, movie.duration_min ASC
""")
result

   | title                  | released | duration_min
---+------------------------+----------+--------------
 1 | The Matrix Revolutions |     2003 |          129
 2 | The Matrix Reloaded    |     2003 |          138
 3 | The Matrix             |     1999 |          136

In [31]:

from test_helper import Test

Test.assertEqualsHashed(result, '160d7a294f74bec665938491e0b2b9fe825a730e', 'Incorrect query', "Exercise 2.1 is successful")

1 test passed. Exercise 2.1 is successful


### `WHERE` clause:

Like SQL, Neo4j CQL provides a `WHERE` clause, used with the `MATCH` command to filter the results of a query.


_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    WHERE <condition> <boolean_operator> <condition>
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>
    
_**Analogy with SQL**_:

    SELECT <field_1>, ..., <field_N> 
    FROM <table_name>
    WHERE <condition> <boolean_operator> <condition>;
    
Cypher supports operators familiar from SQL: "=", "<>", "<", ">", "<=", ">=", "`AND`", "`OR`", "`NOT`", "`XOR`".
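
To get a feel for what such conditions select, the filtering can be mimicked in plain Python. This is only an analogy running on a small hand-written list taken from the data created earlier, not a query against the database.

```python
people = [
    {"name": "Tom Hanks", "born": 1956, "country": "USA"},
    {"name": "Frank Darabont", "born": 1959, "country": "France"},
    {"name": "Keanu Reeves", "born": 1964, "country": "Canada"},
    {"name": "Stephen King", "born": 1947, "country": "USA"},
]

# Analogous to: WHERE person.country <> "USA"
not_usa = [p["name"] for p in people if p["country"] != "USA"]

# Analogous to: WHERE person.born > 1955 AND person.country IN ["USA", "France"]
recent = [p["name"] for p in people
          if p["born"] > 1955 and p["country"] in ("USA", "France")]

print(not_usa)
print(recent)
```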

In [32]:
# Get only those persons who were not born in the USA:
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE person.country <> "USA"
    RETURN person.name, person.country
""")
# Analogy with SQL:
# SELECT name, country FROM person WHERE country <> 'USA';

   | person.name      | person.country
---+------------------+----------------
 1 | Frank Darabont   | France        
 2 | Keanu Reeves     | Canada        
 3 | Carrie-Anne Moss | Canada        

In [33]:
# Get only the persons whose names end with “s”:
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE person.name =~ ".*s$" 
    RETURN person.name
""")
# "WHERE person.name =~ '.*s$'" is equivalent to "WHERE person.name ENDS WITH 's'"

# Analogy with SQL:
# SELECT name FROM person WHERE name LIKE '%s';

   | person.name     
---+------------------
 1 | Tom Hanks       
 2 | Robert Zemeckis 
 3 | Keanu Reeves    
 4 | Carrie-Anne Moss
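
The equivalence between the regular expression and the suffix test can be checked in plain Python on a few of the names above. This is only an analogy: Python's `re` module and Cypher's `=~` operator use different regular expression engines, so more complex patterns may behave differently.

```python
import re

names = ["Tom Hanks", "Gary Sinise", "Robert Zemeckis", "Keanu Reeves"]

pattern = re.compile(r".*s$")

# Analogous to: WHERE person.name =~ ".*s$"
by_regex = [n for n in names if pattern.match(n)]

# Analogous to: WHERE person.name ENDS WITH 's'
by_suffix = [n for n in names if n.endswith("s")]

print(by_regex == by_suffix)   # True
```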

In [34]:
# Get those persons who were born after 1955 in the USA or France, OR whose name starts with 'S' and contains 'St'
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE (person.born > 1955 AND person.country IN ["USA", "France"]) 
    OR (person.name STARTS WITH 'S' AND person.name CONTAINS 'St')
    RETURN person.name, person.born
""")
# Analogy with SQL:
# SELECT name, born FROM person 
# WHERE (born > 1955 AND country IN ('USA', 'France')) OR (name LIKE 'S%' AND name LIKE '%St%');

   | person.name           | person.born
---+-----------------------+-------------
 1 | Tom Hanks             |        1956
 2 | Michael Clarke Duncan |        1957
 3 | Frank Darabont        |        1959
 4 | Stephen King          |        1947
 5 | Sylvester Stallone    |        1946
 6 | Leonardo DiCaprio     |        1974
 7 | Laurence Fishburne    |        1961

In [35]:
# Calculate sum of total box office and the duration average value of "The Matrix" trilogy:
graph.cypher.execute("""
    MATCH (movie:Movie)
    WHERE movie.title STARTS WITH 'The Matrix'
    RETURN SUM(movie.box_office_Mdol) AS total, AVG(movie.duration_min)
""")
# Analogy with SQL:
# SELECT SUM(box_office_Mdol), AVG(duration_min) FROM movie 
# WHERE title LIKE 'The Matrix%';

   | total  | AVG(movie.duration_min)
---+--------+-------------------------
 1 | 1632.9 |           134.333333333
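
The two aggregates can be verified by hand from the property values used when the trilogy was created. The snippet below is only an arithmetic check in plain Python, not a database query.

```python
box_office = [463.5, 742.1, 427.3]   # box_office_Mdol of the three films
durations = [136, 138, 129]          # duration_min of the three films

total = round(sum(box_office), 1)                   # counterpart of SUM(...)
average = sum(durations) / float(len(durations))    # counterpart of AVG(...)

print(total)
print(round(average, 3))
```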

In [36]:
# Find all movies that Tom Hanks acted in
graph.cypher.execute("""
    MATCH (movie:Movie)<-[:ACTED_IN]-(actor:Person { name: "Tom Hanks" })
    RETURN actor.name, movie.title
""")

   | actor.name | movie.title   
---+------------+----------------
 1 | Tom Hanks  | The Green Mile
 2 | Tom Hanks  | Forrest Gump  

In [37]:
# All other movies that actors in “The Matrix” acted in ordered by occurrence:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
    RETURN movie.title, COUNT(*)
    ORDER BY COUNT(*) DESC
""")

   | movie.title            | COUNT(*)
---+------------------------+----------
 1 | The Matrix Revolutions |        3
 2 | The Matrix Reloaded    |        3

Note how we can filter movies without a `WHERE` clause, simply by specifying property values inside the node pattern.

In [38]:
# Let’s see who acted in each of these movies:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
    RETURN movie.title, COLLECT(actor.name), COUNT(*) AS count
    ORDER BY COUNT(*) DESC
""")

   | movie.title            | COLLECT(actor.name)                                           | count
---+------------------------+---------------------------------------------------------------+-------
 1 | The Matrix Revolutions | [u'Keanu Reeves', u'Laurence Fishburne', u'Carrie-Anne Moss'] |     3
 2 | The Matrix Reloaded    | [u'Keanu Reeves', u'Laurence Fishburne', u'Carrie-Anne Moss'] |     3

In [39]:
# What about co-acting, that is actors that acted together:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(colleague)
    RETURN actor.name, COLLECT(DISTINCT colleague.name)
""")

   | actor.name         | COLLECT(DISTINCT colleague.name)            
---+--------------------+----------------------------------------------
 1 | Carrie-Anne Moss   | [u'Keanu Reeves', u'Laurence Fishburne']    
 2 | Laurence Fishburne | [u'Keanu Reeves', u'Carrie-Anne Moss']      
 3 | Keanu Reeves       | [u'Carrie-Anne Moss', u'Laurence Fishburne']
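
The traversal in this query can be mimicked with a small dictionary in plain Python. This is a sketch of the logic only; the `acted_in` mapping and the helper `colleagues_of_cast` are hypothetical names built from the trilogy data created earlier.

```python
cast = ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"]
acted_in = {
    "The Matrix": list(cast),
    "The Matrix Reloaded": list(cast),
    "The Matrix Revolutions": list(cast),
}


def colleagues_of_cast(title):
    """For each actor of `title`, collect distinct co-actors from their other movies."""
    result = {}
    for actor in acted_in[title]:
        others = set()                        # a set plays the role of DISTINCT
        for movie, movie_cast in acted_in.items():
            # Skip the starting movie: a Cypher pattern does not reuse the
            # same relationship twice within one match.
            if movie != title and actor in movie_cast:
                others.update(c for c in movie_cast if c != actor)
        result[actor] = sorted(others)
    return result


colleagues = colleagues_of_cast("The Matrix")
print(colleagues["Keanu Reeves"])
```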

Let's list some of the important and frequently used functions:

**String functions:**

* `UPPER` - it is used to change all letters into upper case letters;
* `LOWER` - it is used to change all letters into lower case letters;
* `SUBSTRING` - it is used to get substring of a given string;
* `REPLACE` - it is used to replace every occurrence of a substring in a string with another given string;
* `LENGTH(string)` - it returns the length of a string;
* `TRIM` - it returns the original string with whitespace removed from both sides;
* `SPLIT(original, splitPattern)` - it returns the sequence of strings which are delimited by split patterns;
* `REVERSE` - it returns the original string reversed.
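
To get a feel for what the string functions return, it can help to compare them with their closest Python counterparts. The sketch below is only an analogy (it runs no Cypher at all), and the sample string is chosen for illustration. It assumes that `SUBSTRING` takes a zero-based start index followed by a length.

```python
# Python analogues of the Cypher string functions listed above.
# The sample value is illustrative only.
title = "  The Green Mile  "

trimmed = title.strip()                        # TRIM
upper = trimmed.upper()                        # UPPER
lower = trimmed.lower()                        # LOWER
part = trimmed[4:4 + 5]                        # SUBSTRING(original, 4, 5)
replaced = trimmed.replace("Green", "Last")    # REPLACE
length = len(trimmed)                          # LENGTH
words = trimmed.split(" ")                     # SPLIT(original, ' ')
backwards = trimmed[::-1]                      # REVERSE

print(backwards)
```

The reversed value has the same shape as the `REVERSE(a.title)` column shown in the query results of this lesson.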

**Relationship functions:**

* `STARTNODE` - it is used to know the Start Node of a Relationship;
* `ENDNODE` - it is used to know the End Node of a Relationship;
* `ID` - it is used to know the `ID` of a Relationship;
* `TYPE` - it is used to know the `TYPE` of a Relationship in string representation.

**Aggregation functions:**

* `COUNT` - it returns the number of rows returned by `MATCH` command;
* `MAX` - it returns the maximum value from a set of rows returned by `MATCH` command;
* `MIN` - it returns the minimum value from a set of rows returned by `MATCH` command;
* `SUM` - it returns the summation value of all rows returned by `MATCH` command;
* `AVG` - it returns the average value of all rows returned by `MATCH` command;
* `COLLECT` - it collects all the values into a list. It will ignore `NULL`s;
* `DISTINCT` - it removes duplicates from the values; it is a modifier rather than a function and is written inside the call, e.g. `COUNT(DISTINCT x)`.
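
The behaviour of the aggregation functions can be sketched in plain Python. The rows below are hypothetical (they stand in for what a `MATCH` might return), and the point is only to show two details that are easy to miss: aggregates skip `NULL` values, and counting rows is not the same as counting distinct values.

```python
# Each dict is one hypothetical result row, as a MATCH would produce it.
rows = [
    {"name": "Keanu Reeves", "born": 1964},
    {"name": "Carrie-Anne Moss", "born": 1967},
    {"name": "Keanu Reeves", "born": 1964},        # same person matched twice
    {"name": "Laurence Fishburne", "born": None},  # missing property -> NULL
]

# Aggregation functions ignore NULL values
born = [r["born"] for r in rows if r["born"] is not None]

count_star = len(rows)                    # COUNT(*): every row
count_born = len(born)                    # COUNT(a.born): non-NULL values only
maximum, minimum = max(born), min(born)   # MAX, MIN
total = sum(born)                         # SUM
average = total / float(len(born))        # AVG
collected = [r["name"] for r in rows]     # COLLECT(a.name)
distinct_names = len(set(collected))      # COUNT(DISTINCT a.name)

print(count_star)
print(distinct_names)
```

Here `COUNT(*)` sees four rows while `COUNT(DISTINCT a.name)` sees three people, which is the difference that matters whenever a pattern can match the same node several times.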

**Predicates:**

* `ALL` - it tests whether a predicate holds for all elements of the collection;
* `ANY` - it tests whether a predicate holds for at least one element in the collection;
* `EXISTS` - it returns true if a match for the pattern exists in the graph, or the property exists in the node, relationship or map.

**Collection functions:**

* `NODES` - it returns all nodes in a path;
* `RELATIONSHIPS` - it returns all relationships in a path;
* `LABELS` - it returns a collection of string representations for the labels attached to a node;
* `KEYS` - it returns a collection of string representations for the property names of a node, relationship, or map;
* `RANGE(start, end [, step])` - it returns the numerical values from `start` to `end` (both inclusive), separated by a non-zero `step`;
* `HEAD` - it returns the first element in a collection;
* `LAST` - it returns the last element in a collection.
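
The predicates and collection functions also have close Python relatives, with one difference worth remembering: Cypher's `RANGE` includes its end value, whereas Python's `range` does not. The helper `cypher_range` below is a hypothetical name introduced only for this sketch.

```python
def cypher_range(start, end, step=1):
    """Mimic Cypher RANGE: unlike Python's range, the end value is included."""
    if step == 0:
        raise ValueError("step must be non-zero")
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))

years = cypher_range(1999, 2005)          # RANGE(1999, 2005)
words = "The Matrix Reloaded".split(" ")  # SPLIT(a.title, ' ')

has_the = any(w.lower() == "the" for w in words)       # ANY (x IN ... WHERE LOWER(x) = "the")
all_capitalised = all(w[0].isupper() for w in words)   # ALL (x IN ... WHERE ...)
first, last = words[0], words[-1]                      # HEAD, LAST

print(years)
```

So a condition such as `a.released IN RANGE(1999, 2005)` also accepts movies released in 2005.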

In [40]:
# Display the length of director's name 
graph.cypher.execute("""
    MATCH (a)-[movie:DIRECTED]->(b) 
    RETURN STARTNODE(movie), LENGTH(STARTNODE(movie).name) AS length
""")

   | STARTNODE(movie)                                                | length
---+-----------------------------------------------------------------+--------
 1 | (n37:Person {born:1952,country:"USA",name:"Robert Zemeckis"})   |     15
 2 | (n40:Person {born:1959,country:"France",name:"Frank Darabont"}) |     14

In [41]:
# Check whether we set the director for all movies
graph.cypher.execute("""
    MATCH (a:Movie)
    RETURN a.title AS title, EXISTS((a)<-[:DIRECTED]-()) AS director_is_known
""")   

   | title                  | director_is_known
---+------------------------+-------------------
 1 | Forrest Gump           |              True
 2 | The Green Mile         |              True
 3 | Inseption              |             False
 4 | The Matrix             |             False
 5 | The Matrix Reloaded    |             False
 6 | The Matrix Revolutions |             False

In [42]:
# Find all movies whose title contains the word "the" and which were released between 1999 and 2005;
# display the reversed title, the first element of the property key list and the labels
graph.cypher.execute("""
    MATCH (a:Movie)
    WHERE a.released IN RANGE(1999, 2005) AND ANY (x IN SPLIT(a.title, ' ') WHERE LOWER(x) = "the")
    RETURN REVERSE(a.title), HEAD(KEYS(a)), LABELS(a)
""")       

   | REVERSE(a.title)       | HEAD(KEYS(a))   | LABELS(a) 
---+------------------------+-----------------+------------
 1 | eliM neerG ehT         | box_office_Mdol | [u'Movie']
 2 | xirtaM ehT             | box_office_Mdol | [u'Movie']
 3 | dedaoleR xirtaM ehT    | title           | [u'Movie']
 4 | snoituloveR xirtaM ehT | title           | [u'Movie']

In [43]:
# The same query written with CONTAINS, which is a case-sensitive substring match
graph.cypher.execute("""
    MATCH (a:Movie)
    WHERE a.released IN RANGE(1999, 2005) AND a.title CONTAINS "The"
    RETURN REVERSE(a.title), HEAD(KEYS(a)), LABELS(a)
""") 

   | REVERSE(a.title)       | HEAD(KEYS(a))   | LABELS(a) 
---+------------------------+-----------------+------------
 1 | eliM neerG ehT         | box_office_Mdol | [u'Movie']
 2 | xirtaM ehT             | box_office_Mdol | [u'Movie']
 3 | dedaoleR xirtaM ehT    | title           | [u'Movie']
 4 | snoituloveR xirtaM ehT | title           | [u'Movie']

> ### Exercise 2.2:

> Calculate the average age of those actors/directors who were not born in the USA. Save the result table into the variable `result` in the format:<br>
<pre>   | avg_born     
---+---------------
   1 | .............</pre>

In [44]:
# Restore the database to the state we created earlier
#graph.cypher.execute("MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r")
#graph.cypher.execute(Test.resetDatabaseRecords())

# type your code here

result = graph.cypher.execute("""
MATCH (a:Person)-[r]->()

WHERE a.country <> "USA" AND TYPE(r) IN ["ACTED_IN","DIRECTED"] 
RETURN 2019-AVG(a.born) AS avg_born
""")
#result1 = graph.cypher.execute("MATCH (n) RETURN n")
result

   | avg_born     
---+---------------
 1 | 54.4285714286

In [45]:
Test.assertEqualsHashed(result, 'a4d3b88400831da106768af5cf51a06fa47bd0e7', 'Incorrect query', "Exercise 2.2 is successful")

1 test failed. Incorrect query
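
One thing worth checking when an average looks off: a pattern such as `(a:Person)-[r]->()` returns one row per relationship, so a person with several relationships is counted several times. The sketch below uses made-up names and years to show the effect; whether this explains the failed check above is an assumption, not something the notebook confirms.

```python
# Hypothetical rows: one per (person, relationship) pair.
rows = [
    ("Person A", 1950),
    ("Person A", 1950),   # second relationship of the same person
    ("Person A", 1950),   # third relationship of the same person
    ("Person B", 1980),
]

# Average over rows: Person A is weighted three times
per_row = sum(born for _, born in rows) / float(len(rows))

# Average over distinct persons: every person counts once
unique = dict(rows)
per_person = sum(unique.values()) / float(len(unique))

print(per_row)
print(per_person)
```

In Cypher, one possible way to collapse the duplicates before aggregating is to pass the nodes through `WITH DISTINCT a`.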


> ### Exercise 2.3:

> Display how many persons from the USA take part in each relationship type. Save the result table into the variable `result` in the format:<br>
<pre>   | relationship | count
---+--------------+-------
   1 |............. |.......</pre>

In [46]:
# type your code here

result = graph.cypher.execute("""
MATCH (n:Person)-[r]->()
WHERE n.country <> "USA"
RETURN TYPE(r) AS relationship, COUNT(*) AS count
""")
result

   | relationship | count
---+--------------+-------
 1 | ACTED_IN     |     6
 2 | DIRECTED     |     1

In [47]:
Test.assertEqualsHashed(result, 'f0a17d93f72cfdcfd00458b82e8b7c6101a6b9a2', 'Incorrect query', "Exercise 2.3 is successful")

1 test passed. Exercise 2.3 is successful


> ### Exercise 2.4:

> Find the movie with the longest title and display the list of all its actors (using the `COLLECT` function) and the age of its youngest actor when the movie was released. Save the result table into the variable `result` in the format:<br>
<pre>   | title                  | actors                | age
---+------------------------+-----------------------+-----
 1 | ...................... | ..................... | ....</pre>

In [48]:
# type your code here

result = graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix Reloaded" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
    RETURN movie.title AS title, COLLECT(actor.name) AS actors, movie.released - 1967 AS age
    ORDER BY LENGTH(movie.title) DESC
    LIMIT 1
""")
result

   | title                  | actors                                                        | age
---+------------------------+---------------------------------------------------------------+-----
 1 | The Matrix Revolutions | [u'Keanu Reeves', u'Carrie-Anne Moss', u'Laurence Fishburne'] |  36

In [49]:
Test.assertEqualsHashed(result, 'abbf2d95274882bc11c3979c3b303187326724a9', 'Incorrect query', "Exercise 2.4 is successful")

1 test failed. Incorrect query


### `DELETE` and `REMOVE` clauses:

The Neo4j CQL `DELETE` clause is used to delete nodes and relationships. A node can be deleted only after all of its relationships have been deleted, or together with them in the same statement.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    DELETE <node_name_list>
    
_**Analogy with SQL**_:

    DELETE FROM <table_name>
    WHERE <some_column> = <some_value>;

The Neo4j CQL `REMOVE` command is used to remove labels from a Node and properties from a Node or a Relationship.

_**Basic syntax**_:

    MATCH (<node_name>:<label_name>)
    REMOVE <node_name>.<property_1_name>, ..., <node_name>.<property_N_name> 
    
_**Analogy with SQL**_:

    ALTER TABLE <table_name>
    DROP COLUMN <column_name>;
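
The difference between `REMOVE` and `DELETE` can be illustrated with a toy in-memory graph. This is only a sketch: the dictionaries and the helper names (`remove_property`, `remove_label`, `delete_node`) are invented for illustration and say nothing about how Neo4j stores data internally.

```python
# A toy in-memory graph (illustrative structure only).
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Sylvester Stallone", "born": 1946}},
    2: {"labels": {"Person"}, "props": {"name": "Tom Hanks"}},
    3: {"labels": {"Movie"}, "props": {"title": "Forrest Gump"}},
}
relationships = [(2, "ACTED_IN", 3)]  # (start node, type, end node)

def remove_property(node_id, key):
    """REMOVE n.key: drop one property, keep the node."""
    nodes[node_id]["props"].pop(key, None)

def remove_label(node_id, label):
    """REMOVE n:Label: drop one label, keep the node."""
    nodes[node_id]["labels"].discard(label)

def delete_node(node_id):
    """DELETE n: refused while the node still has relationships."""
    if any(node_id in (start, end) for start, _, end in relationships):
        raise ValueError("node still has relationships")
    del nodes[node_id]

remove_property(1, "born")
remove_label(1, "Person")
node_after_remove = dict(nodes[1])  # the node still exists at this point
delete_node(1)                      # no relationships, so this succeeds

try:
    delete_node(2)                  # still connected through ACTED_IN
    blocked = False
except ValueError:
    blocked = True

print(blocked)
```

`REMOVE` only strips data from an element that keeps existing, while `DELETE` removes the element itself and is refused for a node that is still connected.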

In [50]:
# Look at the single Person Node (without any connection)
graph.cypher.execute("""
    MATCH (a:Person) 
    WHERE NOT (a)-[]->()
    RETURN a
""")

   | a                                                               
---+------------------------------------------------------------------
 1 | (n42:Person {born:1946,country:"USA",name:"Sylvester Stallone"})

In [51]:
# Remove *born* property
graph.cypher.execute("""
    MATCH (a:Person) 
    WHERE NOT (a)-[]->()
    REMOVE a.born
    RETURN a.name, a.born, a.country, labels(a)
""")

   | a.name             | a.born | a.country | labels(a)  
---+--------------------+--------+-----------+-------------
 1 | Sylvester Stallone |        | USA       | [u'Person']

In [52]:
# Remove label
graph.cypher.execute("""
    MATCH (a:Person) 
    WHERE NOT (a)-[]->()
    REMOVE a:Person
    RETURN a.name, a.born, a.country, labels(a)
""")

   | a.name             | a.born | a.country | labels(a)
---+--------------------+--------+-----------+-----------
 1 | Sylvester Stallone |        | USA       | []       

In [53]:
# Delete the Node for Sylvester Stallone
graph.cypher.execute("""
    MATCH (a:Person) 
    WHERE NOT (a)-[]->()
    DELETE a
""")



In [54]:
# See if  the Node for Sylvester Stallone remained
graph.cypher.execute("""
    MATCH (a:Person) 
    WHERE NOT (a)-[]->()
    RETURN a
""")

  | a
--+---

### `SET` clause for adding new properties:

The Neo4j CQL `SET` clause adds new properties to an existing Node or Relationship and updates the values of existing properties.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    SET <node_name>.<property_1_name> = <value_1>, ..., <node_name>.<property_N_name> = <value_N>
   
_**Analogy with SQL**_:
    
    ALTER TABLE <table_name>
    ADD <column_name> <datatype>

In [55]:
# Let's add new property *IMDb_rating* to all movies; default value is 9
graph.cypher.execute("""
    MATCH (a:Movie) 
    SET a.IMDb_rating = 9
    RETURN a
    SKIP 2
""")
# The SKIP clause omits the first N rows of the result

   | a                                                                                                                              
---+---------------------------------------------------------------------------------------------------------------------------------
 1 | (n44:Movie {IMDb_rating:9,box_office_Mdol:825.5,country:"USA",duration_min:148,lang:"English",released:2010,title:"Inseption"})
 2 | (n45:Movie {IMDb_rating:9,box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})                            
 3 | (n46:Movie {IMDb_rating:9,box_office_Mdol:742.1,duration_min:138,released:2003,title:"The Matrix Reloaded"})                   
 4 | (n47:Movie {IMDb_rating:9,box_office_Mdol:427.3,duration_min:129,released:2003,title:"The Matrix Revolutions"})                

In [56]:
# Let's update property *IMDb_rating* of 'Inseption'
graph.cypher.execute("""
    MATCH (a:Movie {title: 'Inseption'})
    SET a.IMDb_rating = 8.7
    RETURN a.title, a.IMDb_rating
""")

   | a.title   | a.IMDb_rating
---+-----------+---------------
 1 | Inseption |           8.7

### `MERGE` clause (`CREATE + MATCH` together):

The Neo4j CQL `MERGE` command is a combination of the `CREATE` and `MATCH` commands. It searches the graph for the given pattern: if the pattern exists, `MERGE` returns the results; if it does NOT exist, `MERGE` creates the new node/relationship and returns the results. An existing element is reused only when the whole pattern matches, including every listed property.

_**Basic syntax**_:
    
    MERGE (<node_name>:<label_name> { <property_1_name>:<property_1_value>, ..., <property_N_name>:<property_N_value> })
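
The "match or create" idea behind `MERGE` can be sketched in a few lines of Python. The `merge` helper and the node layout below are hypothetical and exist only for this illustration; the sketch also assumes a label is given, although `MERGE` accepts patterns without one.

```python
nodes = []  # each node: {"labels": set, "props": dict}

def merge(label, **props):
    """Return (node, created): reuse a node that has the label and ALL given
    properties, otherwise create a new one."""
    for node in nodes:
        same_props = all(node["props"].get(k) == v for k, v in props.items())
        if label in node["labels"] and same_props:
            return node, False
    node = {"labels": {label}, "props": dict(props)}
    nodes.append(node)
    return node, True

first, created_first = merge("Person", name="Robert De Niro", age=72)
second, created_second = merge("Person", name="Robert De Niro", age=72)
third, created_third = merge("Person", name="Robert De Niro", age=73)

print(len(nodes))
```

The second call finds the existing node, but the third one creates a new node because one property value differs. This is why it is common to merge on an identifying property only and set the remaining properties afterwards.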

In [57]:
# Create a new Node
graph.cypher.execute("""
    MERGE (a { name:'Robert De Niro', age:72 })
    RETURN a
""")

   | a                                   
---+--------------------------------------
 1 | (n51 {age:72,name:"Robert De Niro"})

### `UNION` clause:

`UNION` combines the rows of two result sets into a single result set and removes duplicate rows; `UNION ALL` keeps the duplicates. The number and the names of the columns in the two result sets have to be the same.


_**Basic syntax**_:
    
    <MATCH Command_1>
        UNION
    <MATCH Command_2>
   
_**Analogy with SQL**_:
    
    SELECT <selection_1>
        UNION
    SELECT <selection_2>;
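
The difference between `UNION` and `UNION ALL` can be sketched with two small lists of rows. The data and the helper names `union` and `union_all` are hypothetical; the sketch only mirrors the rule that the column names must agree and that plain `UNION` drops duplicate rows.

```python
# Two hypothetical result sets with the same column name
people = [{"name": "Tom Hanks"}, {"name": "The Matrix"}]
movies = [{"name": "The Matrix"}, {"name": "Forrest Gump"}]

def union_all(a, b):
    """UNION ALL: concatenate the rows, keeping duplicates."""
    if a and b and set(a[0]) != set(b[0]):
        raise ValueError("column names must match")
    return a + b

def union(a, b):
    """UNION: concatenate the rows and drop duplicates."""
    seen, result = set(), []
    for row in union_all(a, b):
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

all_rows = union_all(people, movies)
unique_rows = union(people, movies)

print(len(all_rows))
print(len(unique_rows))
```

With these rows `UNION ALL` returns four rows and `UNION` returns three, because "The Matrix" appears in both inputs.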

In [58]:
# UNION ALL keeps duplicate rows; plain UNION would remove them
graph.cypher.execute("""
    MATCH (n:Person)
    RETURN n.name AS name
    UNION ALL
    MATCH (n:Movie)
    RETURN n.title AS name
""")

    | name                  
----+------------------------
  1 | Tom Hanks             
  2 | Gary Sinise           
  3 | Robert Zemeckis       
  4 | Michael Clarke Duncan 
  5 | Frank Darabont        
  6 | Stephen King          
  7 | Leonardo DiCaprio     
  8 | Keanu Reeves          
  9 | Laurence Fishburne    
 10 | Carrie-Anne Moss      
 11 | Forrest Gump          
 12 | The Green Mile        
 13 | Inseption             
 14 | The Matrix            
 15 | The Matrix Reloaded   
 16 | The Matrix Revolutions

>### Exercise 2.5:

> One way to create a new Neo4j database is the following:
> Before starting the database in the Neo4j Community window, click the browse option and choose another directory or create a new one. After that, connect to this directory.
> <img src="images/new_db_1.jpg">
> <img src="images/new_db_2.jpg">

> **1\.** Create a new Neo4j database and call it "imdb". It will hold the data scraped earlier from the IMDb web site. Connect to this database. Make sure that it doesn't contain any records; this is important for evaluating your work correctly.

> **2\.** Read the "imdb_movies_250.json" file into the `data` variable. The file contains 250 records of scraped data about the most popular movies, their main actors and director(s), obtained in the way considered in the previous lesson.

> **3\.** It is very easy to save JSON data to a Neo4j database. The following code demonstrates how it can be done for the "imdb_movies_250.json" file (please look at the content of this JSON file and its available fields before working with the following code).

> <span style="margin-left:4.5em"></span><code style="color: darkblue"># Create a new \`py2neo\` graph object</code><br></br>
> <span style="margin-left:4.5em"></span>`graph = Graph()`<br></br>

> <span style="margin-left:4.5em"></span><code style="color: darkblue"># To create a constraint that makes sure that your database will never contain  more</code><br></br>
> <span style="margin-left:4.5em"></span><code style="color: darkblue"># than one node with a specific label and one property value, use the IS UNIQUE syntax.</code><br></br>
> <span style="margin-left:4.5em"></span><code style="color: darkblue"># The following construction defines a new label with one unique property</code><br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (m:Movie) ASSERT m.title IS UNIQUE;")`<br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE;")`<br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (d:Director) ASSERT d.name IS UNIQUE;")`<br></br>

> <span style="margin-left:4.5em"></span>`for row in data["movies"]:`<br></br>
> <span style="margin-left:6.5em"></span>`actors = row["actors"]`<code style="margin-left:5.8em; color: darkblue"># Collect all actors</code><br></br>
> <span style="margin-left:6.5em"></span>`directors = row["directors"]`<code style="margin-left:2.5em; color: darkblue"># Collect all directors</code><br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add a new record to "Movie" label with the unique "title" value using \`merge_one()\` function</code><br></br>
> <span style="margin-left:6.5em"></span>`movie = graph.merge_one("Movie", "title", row["title"])`<br></br>
> <span style="margin-left:6.5em"></span><code style="color: darkblue"># All other fields are movie's properties</code><br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["description"] = row["description"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["genres"]      = row["genres"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["rating"]      = row["rating"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["released"]    = row["released"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["runtime"]     = row["runtime"]`<br></br>
> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Save properties</code><br></br>
> <span style="margin-left:6.5em"></span>`movie.push()`<br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add data about actor(s) to the database as before</code><br></br>
> <span style="margin-left:6.5em"></span>`for person in actors:`<br></br>
> <span style="margin-left:8.5em"></span>`actor = graph.merge_one("Actor", "name", person["name"])`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["born"] = person["born"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["city"] = person["city"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["country"] = person["country"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["died"] = person["died"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["image_url"] = person["image_url"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.push()`<br></br>
       
> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Define a relationship between the actor and the movie</code><br></br>
> <span style="margin-left:8.5em"></span>`graph.create_unique(Relationship(actor, "ACTED_IN", movie))`<br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add data about director(s) to the database as before</code><br></br>
> <span style="margin-left:6.5em"></span>`for person in directors:`<br></br>
> <span style="margin-left:8.5em"></span>`director = graph.merge_one("Director", "name", person["name"])`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["born"] = person["born"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["city"] = person["city"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["country"] = person["country"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["died"] = person["died"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["image_url"] = person["image_url"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.push()`<br></br>
        
> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Define a relationship between the director and the movie</code><br></br>
> <span style="margin-left:8.5em"></span>`graph.create_unique(Relationship(director, "DIRECTED", movie))`<br></br>

> Another way of converting JSON to Neo4j (it uses pure Cypher and works faster than the first one):

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Write the Cypher query</code><br></br>
> <span style="margin-left:6.5em"></span>`query = """`<br></br>
> <span style="margin-left:8.5em"></span>`WITH {json} AS document`<br></br>
> <span style="margin-left:8.5em"></span>`UNWIND document.movies AS Movie`<br></br>
> <span style="margin-left:8.5em"></span>`UNWIND Movie.actors AS Actor`<br></br>
> <span style="margin-left:8.5em"></span>`UNWIND Movie.directors AS Director`<br></br>
> <span style="margin-left:8.5em"></span>`MERGE (m:Movie {`<br></br>
> <span style="margin-left:14.5em"></span>`title: Movie.title, genres: Movie.genres, rating: Movie.rating, `<br></br>
> <span style="margin-left:14.5em"></span>`description: Movie.description, released: Movie.released, runtime: Movie.runtime`<br></br>
> <span style="margin-left:10.5em"></span>`})`<br></br>
> <span style="margin-left:8.5em"></span>`MERGE (a:Actor {`<br></br>
> <span style="margin-left:14.5em"></span>`name: Actor.name, born: Actor.born, died: Actor.died, `<br></br>
> <span style="margin-left:14.5em"></span>`city: Actor.city, country: Actor.country, image_url: Actor.image_url`<br></br>
> <span style="margin-left:10.5em"></span>`})`<br></br>
> <span style="margin-left:8.5em"></span>`MERGE (d:Director {`<br></br>
> <span style="margin-left:14.5em"></span>`name: Director.name, born: Director.born, died: Director.died, `<br></br>
> <span style="margin-left:14.5em"></span>`city: Director.city, country: Director.country, image_url: Director.image_url`<br></br>
> <span style="margin-left:10.5em"></span>`})`<br></br>
> <span style="margin-left:8.5em"></span>`MERGE (a)-[:ACTED_IN]->(m)`<br></br>
> <span style="margin-left:8.5em"></span>`MERGE (d)-[:DIRECTED]->(m)`<br></br>
> <span style="margin-left:6.5em"></span>`"""`

> <span style="margin-left:6.5em"></span>`graph.cypher.execute(query, json=data)`

> where the [`WITH` clause](http://neo4j.com/docs/stable/query-with.html) allows query parts to be chained together, piping the results from one to be used as starting points or criteria in the next; the [`UNWIND`](http://neo4j.com/docs/stable/query-unwind.html) clause expands a collection (here, the lists inside the JSON document) into a sequence of rows.
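
What the three chained `UNWIND` clauses produce can be sketched with nested loops in Python. The tiny document below is hypothetical; it only mimics the shape that the query expects (a `movies` list whose items carry `actors` and `directors` lists).

```python
# A hypothetical document with the same shape as the JSON passed to the query
document = {
    "movies": [
        {"title": "Movie A",
         "actors": [{"name": "Actor 1"}, {"name": "Actor 2"}],
         "directors": [{"name": "Director 1"}]},
        {"title": "Movie B",
         "actors": [{"name": "Actor 1"}],
         "directors": [{"name": "Director 1"}, {"name": "Director 2"}]},
    ]
}

# Each UNWIND multiplies the rows, like one more nested loop
rows = [
    (movie["title"], actor["name"], director["name"])
    for movie in document["movies"]        # UNWIND document.movies AS Movie
    for actor in movie["actors"]           # UNWIND Movie.actors AS Actor
    for director in movie["directors"]     # UNWIND Movie.directors AS Director
]

# MERGE creates each distinct relationship only once
acted_in = sorted({(actor, title) for title, actor, _ in rows})
directed = sorted({(director, title) for title, _, director in rows})

print(len(rows))
```

Every movie yields one row per actor/director combination, and `MERGE` then collapses the repeated pairs. In this sketch, as with `UNWIND`, a movie whose actor or director list is empty produces no rows at all.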

> Use one of the code variants above to fill your database with the data from the "imdb_movies_250.json" file.

> **4\.** Display the total number of links and nodes. Save the result into the variable `result4` in the format:<br>
<pre>   | count
---+--------------
 1 |-links amount-
 2 |-nodes amount-</pre>
 
> **5\.** Display the number of nodes for each label category. Save the result into the variable `result5_nodes` in the format:<br>
<pre>    | labels                   | nodes 
----+--------------------------+-------
  1 | [u'Director']            |    51 </pre>
  
> <span style="margin-left:1em"></span>Display also all link types and the number of relationships of each type. Write the result to the `result5_links` variable in the format:
<br>
<pre>    | types      | links
----+------------+-------
 1  | ACTED_IN   |  540 </pre>
  
> **6\.** Display in chronological order the list of movies (show only the title and the release year) in which Carrie Fisher acted. Save the result into the variable `result6` in the format:<br>
<pre>   | title                     | released
---+---------------------------+----------
 1 | Star Wars                 | 1977    </pre>
  
> **7\.** Find the number of all actors (unique!) in the database and the number of actors (also unique) who acted in movies released after 1990 with a rating from 7 to 8 (including both boundaries). Save the result into the variable `result7` in the format:<br>
<pre>   | count
---+--------------------------------------------------------------------
 1 |-amount of all actors-
 2 |-amount of actors that were acted in movies released after 1990 ...-</pre>
 
> **8\.** Calculate how many actors were born in each country (skip null values). Sort them by country name. Save the result into the variable `result8` in the format:<br>
<pre>   | actors_amount | country                                      
---+---------------+-----------------------------------------------
 1 |             1 |  Denmark                                     </pre>


> **9\.** Display all movies with at least two known directors.  Save result into variable `result9` in the format:<br>
<pre>   | title          
---+-----------------
.. | ......</pre>

> **10\.** Display the movies (call them "free movies") that have no relationships with other movies, i.e. their actors did not act in other movies from the database and the director of a "free movie" directed no other movies. In the Neo4j browser, "free movies" look like the highlighted groups in this screenshot:

> <img src="images/free_movies.jpg">
><br>
> Also count the persons (actors and directors) for these movies and display the results ordered by movie duration in ascending order. Save the result into the variable `result10` in the format:<br>
<pre>   | title                | runtime | persons
---+----------------------+---------+---------
 1 | Life of Pi           |     127 |       4</pre>


In [75]:
# 1. Connection to a new database
try:
    graph = Graph('http://localhost:7474/db/imdb/')
    print "Connected successfully!!!"
except:
    print "Could not connect to Neo4j server" 
graph
# type your code here

Connected successfully!!!


<Graph uri=u'http://localhost:7474/db/imdb/'>

In [69]:
# Create a new `py2neo` graph object
graph = Graph()

# To create a constraint that makes sure that your database will never contain  more
# than one node with a specific label and one property value, use the IS UNIQUE syntax.
# The following construction defines a new label with one unique property
graph.cypher.execute("CREATE CONSTRAINT ON (m:Movie) ASSERT m.title IS UNIQUE;")
graph.cypher.execute("CREATE CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE;")
graph.cypher.execute("CREATE CONSTRAINT ON (d:Director) ASSERT d.name IS UNIQUE;")

# The parsed JSON is a dict whose "movies" key holds the list of records
# (the Cypher variant reads the same list as document.movies)
for row in data["movies"]:
    actors = row["actors"]        # Collect all actors
    directors = row["directors"]  # Collect all directors

    # Add a new record to "Movie" label with the unique "title" value using `merge_one()` function
    movie = graph.merge_one("Movie", "title", row["title"])
    # All other fields are movie's properties
    movie.properties["description"] = row["description"]
    movie.properties["genres"]      = row["genres"]
    movie.properties["rating"]      = row["rating"]
    movie.properties["released"]    = row["released"]
    movie.properties["runtime"]     = row["runtime"]
    # Save properties
    movie.push()

    # Add data about actor(s) to the database as before
    for person in actors:
        actor = graph.merge_one("Actor", "name", person["name"])
        actor.properties["born"] = person["born"]
        actor.properties["city"] = person["city"]
        actor.properties["country"] = person["country"]
        actor.properties["died"] = person["died"]
        actor.properties["image_url"] = person["image_url"]
        actor.push()

        # Define a relationship between the actor and the movie
        graph.create_unique(Relationship(actor, "ACTED_IN", movie))

    # Add data about director(s) to the database as before
    for person in directors:
        director = graph.merge_one("Director", "name", person["name"])
        director.properties["born"] = person["born"]
        director.properties["city"] = person["city"]
        director.properties["country"] = person["country"]
        director.properties["died"] = person["died"]
        director.properties["image_url"] = person["image_url"]
        director.push()

        # Define a relationship between the director and the movie
        graph.create_unique(Relationship(director, "DIRECTED", movie))

graph.open_browser()

In [79]:
# 2. Reading of the "imdb_movies_250.json" file
import json
# type your code here
# Create a new `py2neo` graph object
#graph = Graph()
with open('data/imdb_movies_250.json') as f:
    data = json.load(f)

In [80]:
# 3. Records insertion
# Write the Cypher query
query = """
WITH {json} AS document
UNWIND document.movies AS Movie
UNWIND Movie.actors AS Actor
UNWIND Movie.directors AS Director
MERGE (m:Movie {
title: Movie.title, genres: Movie.genres, rating: Movie.rating,
description: Movie.description, released: Movie.released, runtime: Movie.runtime
})
MERGE (a:Actor {
name: Actor.name, born: Actor.born, died: Actor.died,
city: Actor.city, country: Actor.country, image_url: Actor.image_url
})
MERGE (d:Director {
name: Director.name, born: Director.born, died: Director.died,
city: Director.city, country: Director.country, image_url: Director.image_url
})
MERGE (a)-[:ACTED_IN]->(m)
MERGE (d)-[:DIRECTED]->(m)
"""

graph.cypher.execute(query, json=data)
# type your code here



In [122]:
# 4. Display the total links and nodes amount
'''
    MATCH (m)
    RETURN COUNT(m) AS count
    UNION ALL
    MATCH ()-[r]->()
    RETURN COUNT(r) AS count  
'''
# type your code here
print "Relationships amount:"
print graph.size

print "\nRelationships types:"
print graph.relationship_types

print "\nNodes amount:"
print graph.order
result4 = graph.cypher.execute("""
    MATCH ()-[r]->()
    RETURN COUNT(r) AS count
    UNION ALL
    MATCH (m)
    RETURN COUNT(m) AS count   
""")
result4

Relationships amount:
1016

Relationships types:
frozenset([u'DIRECTED', u'ACTED_IN'])

Nodes amount:
863


   | count
---+-------
 1 |  1016
 2 |   863

In [120]:
Test.assertEqualsHashed(result4, 'b9b488e3ca66edde78732c2e195a2f17a370fa8e', 'Incorrect query', "Exercise 2.5.1 is successful")

1 test failed. Incorrect query


In [191]:
# 5. Display nodes and links amount for each label category

# type your code here
print "\nExisting Labels:"
print graph.node_labels
result5_nodes = graph.cypher.execute("""
    MATCH (m)
    RETURN labels(m) AS labels, COUNT(m) AS nodes
    ORDER BY nodes ASC
""")
result5_nodes


Existing Labels:
frozenset([u'Director', u'Movie', u'Actor'])


   | labels        | nodes
---+---------------+-------
 1 | [u'Director'] |   163
 2 | [u'Movie']    |   249
 3 | [u'Actor']    |   451

In [163]:
result5_links = graph.cypher.execute("""
    MATCH ()-[r]->()
    RETURN TYPE(r) AS types, COUNT(*) AS links
    ORDER BY links DESC
""")
result5_links

   | types    | links
---+----------+-------
 1 | ACTED_IN |   745
 2 | DIRECTED |   271

In [190]:
Test.assertEqualsHashed(result5_nodes, '51cd0c58f7ca6cbfa28f1ecd9d2699d048434c36', 'Incorrect query', "Exercise 2.5.2 is successful")
Test.assertEqualsHashed(result5_links, '36b099937282bb26d8efca253a0ce8cdf04b753f', 'Incorrect query', "Exercise 2.5.3 is successful")

1 test failed. Incorrect query
1 test failed. Incorrect query


In [182]:
# 6. Display in chronological order the list of movies in which Carrie Fisher acted

# type your code here

result6 = graph.cypher.execute("""
    MATCH (a:Actor { name: "Carrie Fisher"})-[r]->(m)
    RETURN m.title AS title, m.released AS released
    ORDER BY released ASC
""")
result6

   | title                                          | released
---+------------------------------------------------+----------
 1 | Star Wars                                      | 1977    
 2 | Star Wars: Episode V - The Empire Strikes Back | 1980    
 3 | Star Wars: Episode VI - Return of the Jedi     | 1983    

In [173]:
Test.assertEqualsHashed(result6, '5604203fd3ebc0b234f0a6f1f1108b90722afb0f', 'Incorrect query', "Exercise 2.5.4 is successful")

1 test passed. Exercise 2.5.4 is successful


In [238]:
# 7. Find the amount of all actors (unique!) in the database and ...

# type your code here
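# Convert the properties to integers (toInt drops the fractional part,
# so a rating such as 7.5 would be stored as 7 afterwards)
res = graph.cypher.execute("""
    MATCH (m)
    SET m.rating = toInt(m.rating)
    SET m.released = toInt(m.released)
""")
# COUNT(a) counts one row per matched relationship;
# COUNT(DISTINCT a) would count every node only once
result7 = graph.cypher.execute("""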
    MATCH (a)-[r]->(m)
    RETURN DISTINCT COUNT(a) AS count
    UNION ALL
    MATCH (a)-[r]->(m)
    WHERE (m.rating<8 AND m.rating>7) AND (m.released>1990)
    RETURN COUNT(a) AS count
""")
result7

   | count
---+-------
 1 |  1016
 2 |     0

In [215]:
Test.assertEqualsHashed(result7, '53ca0df778947c4adb34505c1ad3f96302721672', 'Incorrect query', "Exercise 2.5.5 is successful")

1 test failed. Incorrect query


In [231]:
# 8. Calculate how many actors were born in each country

# type your code here

result8 = graph.cypher.execute("""
    MATCH (a)-[r]->(m)
    WHERE a IS NOT NULL
    RETURN COUNT(a) AS actors_amount, a.country AS country
    ORDER BY country ASC
""")
result8

    | actors_amount | country                                      
----+---------------+-----------------------------------------------
  1 |            33 |                                              
  2 |            26 |  Australia                                   
  3 |             5 |  Austria                                     
  4 |             1 |  Austria-Hungary [now Hungary]               
  5 |             1 |  Benin                                       
  6 |             1 |  Bermuda                                     
  7 |             4 |  Brazil                                      
  8 |             1 |  British India [now India]                   
  9 |            32 |  Canada                                      
 10 |             1 |  Channel Islands                             
 11 |             1 |  Cuba                                        
 12 |             1 |  Czechoslovakia [now Czech Republic]         
 13 |             2 |  Denmark                 

In [232]:
Test.assertEqualsHashed(result8, '40ace8fe31cded4f7fcea2f861f32fe123da3499', 'Incorrect query', "Exercise 2.5.6 is successful")

1 test failed. Incorrect query
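
One likely cause of the mismatch: the pattern `(a)-[r]->(m)` yields one row per relationship, so an actor with several movies is counted several times, and any non-actor node with an outgoing relationship is counted as well. The sketch below shows the difference on hypothetical rows in plain Python. The Cypher string is an assumption about how the query could be restated using the `Actor` label from exercise 6; it has not been run against the hashed check.

```python
from collections import Counter

# Hypothetical (actor, country, movie) rows; one row per relationship.
rows = [
    ("Actor A", "Canada", "Movie 1"),
    ("Actor A", "Canada", "Movie 2"),
    ("Actor B", "Canada", "Movie 1"),
    ("Actor C", "Brazil", "Movie 3"),
]

# Counting rows per country counts Actor A twice.
per_relationship = Counter(country for _, country, _ in rows)

# Counting each actor once: de-duplicate (actor, country) pairs first.
unique_actors = {(actor, country) for actor, country, _ in rows}
per_actor = Counter(country for _, country in unique_actors)

# Possible restatement in Cypher (assumes the Actor label and country property).
ACTORS_PER_COUNTRY_QUERY = """
    MATCH (a:Actor)
    RETURN COUNT(a) AS actors_amount, a.country AS country
    ORDER BY country ASC
"""

for country in sorted(per_actor):
    print(country, per_actor[country])
```

Matching on the node alone, not on a relationship pattern, is what keeps each actor to a single row.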


In [323]:
# 9. Display all movies with at least two known directors

# type your code here

result9 = graph.cypher.execute("""
    MATCH (a)-[r]->(m)
    WITH a.name AS name,COUNT(a) AS amount,r,a,m
    WHERE (TYPE(r) = "DIRECTED")
    RETURN COUNT(a) AS amount, name
    ORDER BY amount DESC, name ASC
    LIMIT 51
""")
result9

    | amount | name                   
----+--------+-------------------------
  1 |     10 | Steven Spielberg       
  2 |      7 | Christopher Nolan      
  3 |      6 | Peter Jackson          
  4 |      6 | Quentin Tarantino      
  5 |      5 | David Fincher          
  6 |      5 | James Cameron          
  7 |      5 | Martin Scorsese        
  8 |      5 | Ridley Scott           
  9 |      5 | Stanley Kubrick        
 10 |      4 | Bryan Singer           
 11 |      4 | George Lucas           
 12 |      4 | Guy Ritchie            
 13 |      4 | Robert Zemeckis        
 14 |      3 | Alfonso Cuarón         
 15 |      3 | Brad Bird              
 16 |      3 | Clint Eastwood         
 17 |      3 | Doug Liman             
 18 |      3 | Francis Ford Coppola   
 19 |      3 | Gore Verbinski         
 20 |      3 | J.J. Abrams            
 21 |      3 | Joel Coen              
 22 |      3 | Lana Wachowski         
 23 |      3 | Lilly Wachowski        
 24 |      3 | Matthew V

In [324]:
Test.assertEqualsHashed(result9, 'b97607b6d7738cb5f5d5e7a0581b2e0133b567aa', 'Incorrect query', "Exercise 2.5.7 is successful")

1 test failed. Incorrect query
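
The query above ranks directors by how many movies they directed, whereas the task asks for movies that have at least two directors. The grouping therefore needs to be per movie, not per person. The following sketch shows that grouping on hypothetical data in plain Python. The Cypher string is one possible formulation: the `DIRECTED` type and `m.title` come from the queries above, while the output columns and ordering are assumptions, and `ACTED_IN` in the sample data is a made-up relationship type.

```python
from collections import defaultdict

# Hypothetical (person, relationship type, movie) triples.
edges = [
    ("Director A", "DIRECTED", "Movie 1"),
    ("Director B", "DIRECTED", "Movie 1"),
    ("Director C", "DIRECTED", "Movie 2"),
    ("Actor X", "ACTED_IN", "Movie 2"),
]

# Group the distinct directors of each movie.
directors_by_movie = defaultdict(set)
for person, rel_type, movie in edges:
    if rel_type == "DIRECTED":
        directors_by_movie[movie].add(person)

# Keep only movies with two or more distinct directors.
co_directed = sorted(
    title for title, directors in directors_by_movie.items() if len(directors) >= 2
)

# Possible Cypher formulation (column names and ordering are assumptions).
CO_DIRECTED_QUERY = """
    MATCH (d)-[:DIRECTED]->(m)
    WITH m, COUNT(DISTINCT d) AS directors
    WHERE directors >= 2
    RETURN m.title AS title, directors
    ORDER BY title ASC
"""

print(co_directed)
```

In Cypher, a filter on an aggregate has to come after a `WITH` that computes it, which is why the `WHERE` follows the `WITH` here.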


In [334]:
# 10. Display "free movies"

# type your code here

result10 = graph.cypher.execute("""
    MATCH (m) 
    WHERE NOT (m)-[]->(m)
    RETURN DISTINCT m.title AS title, m.runtime AS duration
    ORDER BY duration DESC
""")
result10

     | title                                                                | duration
-----+----------------------------------------------------------------------+----------
   1 |                                                                      |         
   2 | Reservoir Dogs                                                       | 99      
   3 | The Grand Budapest Hotel                                             | 99      
   4 | Shaun of the Dead                                                    | 99      
   5 | WALL·E                                                               | 98      
   6 | How to Train Your Dragon                                             | 98      
   7 | Fargo                                                                | 98      
   8 | Men in Black                                                         | 98      
   9 | Horrible Bosses                                                      | 98      
  10 | Up                                 

In [335]:
Test.assertEqualsHashed(result10, 'b7c354961d4a11b78fbb9d853a31205f7e7c571b', 'Incorrect query', "Exercise 2.5.8 is successful")

1 test failed. Incorrect query
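
Two things in the attempt above may explain the failure. First, `WHERE NOT (m)-[]->(m)` only excludes nodes that have a relationship to themselves, so almost every node passes, including nodes without a title (the blank first row). Second, the output places 99 ahead of 98 under `ORDER BY duration DESC` with no longer runtimes above it, which suggests `m.runtime` is stored as a string and sorted lexicographically. The sketch below assumes that "free movies" means movies with no relationships attached; that reading, the `Movie` label, and the sample data are all assumptions, not taken from the exercise text.

```python
# Hypothetical movie nodes (title -> runtime stored as a string) and relationships.
movies = {"Movie 1": "99", "Movie 2": "142", "Movie 3": "98"}
edges = [("Actor X", "Movie 1")]

# A movie is "free" here if it appears in no relationship at all.
connected = {end for pair in edges for end in pair}
free = {title: runtime for title, runtime in movies.items() if title not in connected}

# String sorting puts "98" ahead of "142"; numeric sorting does not.
by_string = sorted(free, key=lambda t: free[t], reverse=True)
by_number = sorted(free, key=lambda t: int(free[t]), reverse=True)

# Possible Cypher formulation under the assumptions stated above.
FREE_MOVIES_QUERY = """
    MATCH (m:Movie)
    WHERE NOT (m)--()
    RETURN m.title AS title, toInt(m.runtime) AS duration
    ORDER BY duration DESC
"""

print(by_string, by_number)
```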


<center><h3>Presented by <a target="_blank" rel="noopener noreferrer nofollow" href="http://datascience-school.com">datascience-school.com</a></h3></center>