### Storing a graph in Neo4j

With our graph database set up, and our methods for interacting with Neo4j written, we can start to
use Python and Neo4j to store and explore our graph data.
In this section, we will look at an air travel network connecting cities in the US and Canada and analyze
its properties to find efficient routes between locations.

#### Preprocessing data
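To begin, let’s take a look at our data (sourced from Stanford University: https://snap.stanford.edu/data/reachability.html). We have two files: reachability_250.txt, a space-delimited edge list, and reachability-meta.csv, which holds the node attributes. The following code skips the first six lines of the edge list, which do not contain edges, negates the weight in the third column so that it becomes a positive travel time, and writes the result to reachability.csv: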

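In [2]:
import csv

# Read the space-delimited edge list, skipping the first six non-edge lines
with open('./data/reachability_250.txt', 'r') as txt:
    reader = csv.reader(txt, delimiter=' ')
    edges = list(reader)[6:]
    # Keep the two node IDs and negate the weight to get a positive travel time
    edges = [[edge[0], edge[1], int(edge[2]) * -1] for edge in edges]

# Write the cleaned edges to a header-less CSV for Neo4j to import
with open('./data/reachability.csv', 'w', newline='') as c:
    writer = csv.writer(c)
    writer.writerows(edges)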
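Before moving to Neo4j, it can help to confirm that the conversion did what we expect. The sketch below reads the converted file back and reports the number of edges, the number of distinct node IDs they touch, and the range of travel times. `summarize_edges` is a hypothetical helper, not part of the chapter’s code; assuming the source weights were all negative, as the sign flip suggests, the minimum travel time should now be positive.

```python
import csv


def summarize_edges(path):
    """Return basic statistics for a header-less CSV of (source, target, travel_time) rows."""
    nodes = set()
    times = []
    with open(path, newline='') as f:
        for source, target, travel_time in csv.reader(f):
            nodes.update((source, target))   # collect every node ID seen in an edge
            times.append(int(travel_time))
    return {
        'edges': len(times),
        'nodes_in_edges': len(nodes),
        'min_travel_time': min(times) if times else None,
        'max_travel_time': max(times) if times else None,
    }

# Example (hypothetical usage on the file written above):
# print(summarize_edges('./data/reachability.csv'))
```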

#### Moving nodes, edges, and properties to Neo4j

We will be using Python to write to our Neo4j database. We will need to first write some Cypher
queries that we will later get Python to send to Neo4j.
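At this stage, we should also make sure that our Neo4j database is empty; if it is not, run the query MATCH (n)
DETACH DELETE n in the Neo4j Browser window before starting. This deletes every node together with its relationships, so only run it on a database whose contents you no longer need.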

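Our Cypher scripts will read data from CSV files, which, on a local installation of Neo4j, must be placed in a specific folder that Neo4j can access: the database’s import directory. Copy reachability-meta.csv and reachability.csv into that folder before continuing; LOAD CSV resolves file:/// URLs relative to it.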
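Copying the CSV files into Neo4j’s import directory can also be scripted from Python. This is a minimal sketch, assuming you know where your installation’s import directory is: `NEO4J_IMPORT_DIR` and `copy_to_import` are hypothetical names, and the placeholder path must be replaced with the real location. The function refuses to create the directory itself, so a mistyped path fails loudly instead of silently copying the files somewhere Neo4j never looks.

```python
import shutil
from pathlib import Path

# Hypothetical placeholder: replace with the import directory of your own Neo4j installation
NEO4J_IMPORT_DIR = Path('/path/to/neo4j/import')


def copy_to_import(files, import_dir=NEO4J_IMPORT_DIR):
    """Copy each file into an existing Neo4j import directory and return the new paths."""
    import_dir = Path(import_dir)
    if not import_dir.is_dir():
        raise FileNotFoundError(f'Import directory not found: {import_dir}')
    # shutil.copy returns the destination path of each copied file
    return [Path(shutil.copy(f, import_dir)) for f in files]

# Example (hypothetical usage):
# copy_to_import(['./data/reachability-meta.csv', './data/reachability.csv'])
```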

Now that Neo4j can read the files, let’s write some Cypher to access their content. We can use
LOAD CSV WITH HEADERS to load the reachability-meta.csv data. The AS keyword binds each CSV
line to the variable row, and the rest of the query runs once for every line. Because the file has
headers, row behaves like a map keyed by column name (for example, row.name), which we reference
in CREATE to build nodes with the correct properties. LOAD CSV reads every value as a string, so
node IDs and population numbers are turned into integers with toInteger(), while latitude and
longitude are converted into floats using toFloat(). When run from the Neo4j Browser,
the following script will load the node attributes file:

In [None]:
LOAD CSV WITH HEADERS FROM 'file:///reachability-meta.csv' AS row
CREATE (city:City {
    node_id: toInteger(row.node_id),
    name: row.name,
    population: toInteger(row.metro_pop),
    latitude: toFloat(row.latitude),
    longitude: toFloat(row.longitude)
})
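To make the per-row conversions concrete, the pure-Python sketch below mimics what the script does with each line: `csv.DictReader` yields one dictionary of strings per row, just as LOAD CSV WITH HEADERS yields a map of strings, and each field is converted before it becomes a property. `parse_city_rows` is a hypothetical helper, and the sample row is made up (only its column layout follows reachability-meta.csv).

```python
import csv
import io

# Made-up sample in the same column layout as reachability-meta.csv (values are placeholders)
SAMPLE = ('node_id,name,metro_pop,latitude,longitude\n'
          '0,"Example City, XX",12345,49.05,-122.3\n')


def parse_city_rows(lines):
    """Mimic LOAD CSV WITH HEADERS: each row is a dict of strings, converted field by field."""
    cities = []
    for row in csv.DictReader(lines):
        cities.append({
            'node_id': int(row['node_id']),         # toInteger(row.node_id)
            'name': row['name'],                    # kept as a string
            'population': int(row['metro_pop']),    # toInteger(row.metro_pop)
            'latitude': float(row['latitude']),     # toFloat(row.latitude)
            'longitude': float(row['longitude']),   # toFloat(row.longitude)
        })
    return cities


print(parse_city_rows(io.StringIO(SAMPLE)))
```

One difference to keep in mind: Cypher’s toInteger() returns null for a string it cannot parse, whereas Python’s int() raises a ValueError.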

Next, we will use Cypher again to load in the edges file. Unlike our node properties file,
reachability.csv doesn’t contain headers, so we use LOAD CSV without WITH HEADERS, and each row
arrives as a list of strings. Our two MATCH clauses find the nodes at either end of the edge, using
list indexing (e.g., row[0]) to refer to the CSV columns and toInteger() to convert the IDs so that
they match the integer node_id properties we set in the previous step. Then, a MERGE clause adds an
:AIR_TRAVEL edge pointing from the first node to the second, with travel_time taken from the third
CSV column and also converted with toInteger(). Using a directed pattern ensures that each edge
points from its source to its target, as the queries we run later from Python expect:

In [None]:
LOAD CSV FROM 'file:///reachability.csv' AS row
MATCH (from:City {node_id: toInteger(row[0])})
MATCH (to:City {node_id: toInteger(row[1])})
MERGE (from)-[:AIR_TRAVEL {travel_time: toInteger(row[2])}]->(to)
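To see why MERGE rather than CREATE is used for the edges, the pure-Python sketch below mimics both. MERGE only adds an edge when no edge with the same endpoints and travel_time already exists, so re-running the import leaves the graph unchanged, whereas CREATE adds a duplicate every time. `load_edges` is a hypothetical helper and the two sample rows are made up; like plain LOAD CSV, `csv.reader` hands each row over as a list of strings.

```python
import csv
import io


def load_edges(rows, edges, use_merge=True):
    """Simulate loading header-less edge rows into a list of (from, to, travel_time) tuples.

    use_merge=True mimics MERGE: skip an edge that already exists with the same values.
    use_merge=False mimics CREATE: always append.
    """
    for row in rows:                     # row[0], row[1], row[2] are strings
        edge = (int(row[0]), int(row[1]), int(row[2]))
        if use_merge and edge in edges:
            continue
        edges.append(edge)
    return edges


data = '57,0,84\n3,0,120\n'             # made-up sample rows

merged = []
for _ in range(2):                       # run the "import" twice
    load_edges(csv.reader(io.StringIO(data)), merged)

created = []
for _ in range(2):
    load_edges(csv.reader(io.StringIO(data)), created, use_merge=False)

print(len(merged), len(created))         # prints: 2 4
```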

With our data loaded into Neo4j, we should now confirm that it is as expected and try to access it
from Python. To perform Cypher queries in Python, we will again be using the Neo4jConnect
class we prepared earlier in this chapter.

In [3]:
from graphtastic.database.neo4j import Neo4jConnect

connection = Neo4jConnect('bolt://localhost:7687', 'admin', 'testpython')
cypher = 'MATCH (n) RETURN COUNT(n)'
result = connection.query(cypher).data()
connection.close()
print(result)

[{'COUNT(n)': 456}]


The result shows that there are 456 nodes in the graph.
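Because the node-loading script creates one :City node per data row of reachability-meta.csv, this count can be cross-checked against the file itself, provided the database was empty before loading. `count_meta_rows` is a hypothetical helper; it uses `csv.DictReader` so that the header line is not counted.

```python
import csv


def count_meta_rows(path):
    """Count the data rows (excluding the header) in a CSV file such as reachability-meta.csv."""
    with open(path, newline='') as f:
        return sum(1 for _ in csv.DictReader(f))

# Example (hypothetical usage, comparing with the Neo4j result above):
# assert count_meta_rows('./data/reachability-meta.csv') == result[0]['COUNT(n)']
```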

Next, let’s ensure that an edge we know should be present in Neo4j was added correctly. We can
use the same pattern as in the node-counting example to run a Cypher query from Python, but
this time with a different query. The first line of reachability.csv is an edge from node ID 57
to node ID 0. For our test, let’s store this edge as a list of two elements, the source and target
IDs, and insert them into our Cypher query with f-strings. The doubled braces ({{ and }}) produce
the literal braces that Cypher’s property maps need, rather than being treated as f-string placeholders:

In [5]:
edge_test = [57, 0]

cypher = f'MATCH (n:City {{node_id:{edge_test[0]}}})' \
'-[r:AIR_TRAVEL]->' \
f'(m:City {{node_id:{edge_test[1]}}}) ' \
'RETURN n.name, m.name, r.travel_time'
print(cypher)

MATCH (n:City {node_id:57})-[r:AIR_TRAVEL]->(m:City {node_id:0}) RETURN n.name, m.name, r.travel_time
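Formatting values into the query string works for a quick test, but Cypher also supports query parameters, written as $name, which avoid the brace escaping and let Neo4j reuse the query plan across different IDs. The sketch below builds the same query in parameterized form; `edge_query` is a hypothetical helper. The official neo4j Python driver accepts such a pair through `session.run(query, params)`; whether the Neo4jConnect.query method from earlier forwards parameters depends on how it was written, so treat that part as an assumption.

```python
def edge_query(from_id, to_id):
    """Build a parameterized Cypher query for the AIR_TRAVEL edge between two node IDs.

    The IDs are sent separately as parameters ($from_id, $to_id) instead of being
    formatted into the string, so no brace escaping or string quoting is needed.
    """
    query = ('MATCH (n:City {node_id: $from_id})'
             '-[r:AIR_TRAVEL]->'
             '(m:City {node_id: $to_id}) '
             'RETURN n.name, m.name, r.travel_time')
    return query, {'from_id': from_id, 'to_id': to_id}


query, params = edge_query(57, 0)
print(query)
print(params)
```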


In [6]:
connection = Neo4jConnect('bolt://localhost:7687', 'admin', 'testpython')
result = connection.query(cypher).data()
connection.close()
print(result)

[{'n.name': 'Calgary, AB', 'm.name': 'Abbotsford, BC', 'r.travel_time': 84}]


From the print statement, you should see that we matched one edge, from Calgary, AB, to
Abbotsford, BC, with a travel time of 84. Try changing the node IDs in edge_test and
rerunning to check for a few more edges present in reachability.csv.
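To check several edges without editing edge_test by hand, a small helper can pick a few edges at random from reachability.csv. `sample_edges` is a hypothetical name; each [from_id, to_id] pair it returns can be used as edge_test in the query from In [5] and run as in In [6]. A fixed seed keeps the selection the same from run to run.

```python
import csv
import random


def sample_edges(path, k=5, seed=0):
    """Return up to k random [from_id, to_id] pairs from a header-less edge CSV."""
    with open(path, newline='') as f:
        edges = [[int(row[0]), int(row[1])] for row in csv.reader(f)]
    rng = random.Random(seed)            # fixed seed for a reproducible selection
    return rng.sample(edges, min(k, len(edges)))

# Example (hypothetical usage):
# for edge_test in sample_edges('./data/reachability.csv'):
#     ...build the query as in In [5] and run it as in In [6]
```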