# Advanced Neo4j Concepts


The following is based on http://neo4j.com/graphgist/6619085 and https://maxdemarzi.com/2015/08/26/modeling-airline-flights-in-neo4j/ and in particular on `:play http://guides.neo4j.com/modeling_airports`.

The original data was extracted from the US Bureau of Transportation Statistics (BTS) database at https://www.transtats.bts.gov/databases.asp?Mode_ID=1&Mode_Desc=Aviation&Subject_ID2=0.

In [None]:
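%%bash
# Pin the 3.3 series: later major versions removed or renamed syntax used below
# (CREATE CONSTRAINT ... ASSERT, CREATE INDEX ON, db.schema(), distance()).
docker run \
    -d --name neo4j \
    --rm \
    --publish=7474:7474 \
    --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/class \
    neo4j:3.3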

## Recap on Modelling in Neo4j



When modeling data, it helps to have a concrete use case for the system in mind. For example, we could start with the following question:

> As an air travel enthusiast
>
> I want to know how airports are connected
>
> So that I can find the busiest ones


Consequently, we could create the following model:

![initial_model](http://guides.neo4j.com/modeling_airports/img/initial.png)


### Manually creating the model

Before we start working with a large dataset let’s create some nodes and relationships manually. First we will create some airports:

In [14]:
CREATE (:Airport {code: "LAX"});
CREATE (:Airport {code: "LAS"});
CREATE (:Airport {code: "ABQ"});



We can find `LAX` by changing the `CREATE` to a `MATCH` and returning the matched node:

See https://www.world-airport-codes.com/ for the airport codes.

In [15]:
MATCH (lax:Airport {code: "LAX"})
RETURN lax

+-------------------------------------+
| lax                                 |
+-------------------------------------+
| (:Airport {_id_: 210, code: "LAX"}) |
+-------------------------------------+

1 row available after 10 ms, consumed after another 1 ms

### Create relationships

Now let’s create some connections between those airports.

In [16]:
MATCH (las:Airport {code: "LAS"})
MATCH (lax:Airport {code: "LAX"})
CREATE (las)-[connection:CONNECTED_TO {
  airline: "WN",
  flightNumber: "82",
  date: "2008-1-3",
  departure: 1715,
  arrival: 1820}]->(lax)





We can check that the relationship was created correctly by executing the following query:

In [17]:
MATCH connection = (las:Airport {code: "LAS"})-[:CONNECTED_TO]->(lax:Airport {code: "LAX"})
RETURN connection

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| connection                                                                                                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 211, code: "LAS"})-[:CONNECTED_TO {date: "2008-1-3", _id_: 1029, departure: 1715, airline: "WN", arrival: 1820, flightNumber: "82"}[211>210]]->(:Airport {_id_: 210, code: "LAX"}) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


### Create Relationships Idempotently

Idempotently, what is that? 

> *idempotent* ... denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself.

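In practice, an operation is idempotent if running it several times has the same effect as running it once. Cypher's `MERGE` clause behaves this way: it matches the whole pattern if it already exists and creates it only if it does not. When using `MERGE`, we only need to inline the properties that make the `CONNECTED_TO` relationship unique. In this case, that is the combination of airline, flightNumber, and date. The remaining properties are set with `ON CREATE SET`, which only runs when the relationship is newly created. To idempotently create a specific connection between airports, we can run the following query: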

In [18]:
MATCH (las:Airport {code: "LAS"})
MATCH (lax:Airport {code: "LAX"})
MERGE (las)-[connection:CONNECTED_TO { airline: "WN", flightNumber: "82", date: "2008-1-3"}]->(lax)
ON CREATE SET connection.departure = 1715, connection.arrival = 1820



Let’s try it with another connection to get the hang of it:

In [19]:
MATCH (las:Airport {code: "LAS"})
MATCH (abq:Airport {code: "ABQ"})
MERGE (las)-[connection:CONNECTED_TO { airline: "WN", flightNumber: "500", date: "2008-1-3"}]->(abq)
ON CREATE SET connection.departure = 1445, connection.arrival = 1710



Try running the query multiple times. The relationship will only be created once.
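To make the idempotency concrete, here is a small plain-Python sketch of how `MERGE` with `ON CREATE SET` behaves. It only illustrates the semantics and is not how Neo4j implements them. The function name `merge_connection` and the dictionary used as a store are hypothetical.

```python
def merge_connection(store, origin, destination, airline, flight_number, date,
                     departure, arrival):
    """Mimic MERGE ... ON CREATE SET for a CONNECTED_TO relationship.

    The key holds only the properties inlined in the MERGE pattern (plus the
    two endpoint airports); departure/arrival play the role of ON CREATE SET.
    Returns True if a new relationship was created.
    """
    key = (origin, destination, airline, flight_number, date)
    if key in store:
        # Pattern already exists: nothing is created and ON CREATE SET is skipped.
        return False
    store[key] = {"departure": departure, "arrival": arrival}
    return True


store = {}
first = merge_connection(store, "LAS", "ABQ", "WN", "500", "2008-1-3", 1445, 1710)
# Running the same "query" again, even with different ON CREATE values, changes nothing.
second = merge_connection(store, "LAS", "ABQ", "WN", "500", "2008-1-3", 9999, 9999)
print(first, second, len(store))  # True False 1
```

If `departure` were also inlined in the `MERGE` pattern, a rescheduled flight would no longer match and would become a new relationship. This is why only the identifying properties go in the pattern.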


### Find All the Connections leaving an Airport

We can now find any connections leaving LAS:

In [20]:
MATCH connection = (las:Airport {code: "LAS"})-[:CONNECTED_TO]->(:Airport)
RETURN connection

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| connection                                                                                                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 211, code: "LAS"})-[:CONNECTED_TO {date: "2008-1-3", _id_: 1049, departure: 1445, airline: "WN", arrival: 1710, flightNumber: "500"}[211>212]]->(:Airport {_id_: 212, code: "ABQ"}) |
| (:Airport {_id_: 211, code: "LAS"})-[:CONNECTED_TO {date: "2008-1-3", _id_: 1029, departure: 1715, airline: "WN", arrival: 1820, flightNumber: "82"}[211>210]]->(:Airport {_id_: 210, code: "LAX"}

### Exploring data with LOAD CSV

While we are working out the appropriate model for our dataset, it is much easier to work with a subset of the data so that we can iterate quickly. A smaller dataset containing 1,000 connections lives in `flights_1k.csv`; see https://github.com/neo4j-contrib/training/tree/master/modeling/data.

We can run the following query to see what data we’ve got to work with:

In [23]:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/flights_1k.csv" AS row
RETURN row
LIMIT 5

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         

This query:

  * loads the file `flights_1k.csv`
  * iterates over the file, referring to each line as the variable `row`
  * and returns the first 5 lines in the file

The file has lots of different fields, but the ones that will be most helpful for answering our question are `Origin`, `Dest`, and `FlightNum`.


### Importing connections and airports

Run the following query to create nodes and relationships for these connections:

In [24]:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/flights_1k.csv" AS row
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
MERGE (origin)-[connection:CONNECTED_TO {
  airline: row.UniqueCarrier,
  flightNumber: row.FlightNum,
  date: toInteger(row.Year) + "-" + toInteger(row.Month) + "-" + toInteger(row.DayofMonth)}]->(destination)
ON CREATE SET connection.departure = toInteger(row.CRSDepTime), connection.arrival = toInteger(row.CRSArrTime)



This query:

  * iterates through each row in the file
  * creates nodes with the Airport label for the origin and destination airports if they don’t already exist
  * creates a connection relationship between origin and destination airports for each row in the file

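`LOAD CSV` reads every field as a string, so numeric values have to be converted explicitly. We use the `toInteger` function for the scheduled departure and arrival times, which are then stored as integers. We also use it for year, month, and day, which are concatenated into the date string (e.g. `"2008-1-3"`).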
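The conversions can be mirrored in plain Python to check what ends up in the graph. This sketch uses `csv.DictReader`, which, like `LOAD CSV`, yields every field as a string. The two sample rows reuse the connections we created by hand earlier and include only the columns referenced by the import query. They are illustrative, not lines copied from `flights_1k.csv`.

```python
import csv
import io

# Illustrative rows using the column names referenced by the import query.
SAMPLE = """Year,Month,DayofMonth,UniqueCarrier,FlightNum,Origin,Dest,CRSDepTime,CRSArrTime
2008,1,3,WN,82,LAS,LAX,1715,1820
2008,1,3,WN,500,LAS,ABQ,1445,1710
"""


def to_connection(row):
    """Apply the same conversions as the Cypher import to one CSV row."""
    # int() plays the role of toInteger(); the date itself stays a string.
    date = f"{int(row['Year'])}-{int(row['Month'])}-{int(row['DayofMonth'])}"
    return {
        "origin": row["Origin"],
        "destination": row["Dest"],
        "airline": row["UniqueCarrier"],
        "flightNumber": row["FlightNum"],  # not converted, so it remains a string
        "date": date,
        "departure": int(row["CRSDepTime"]),
        "arrival": int(row["CRSArrTime"]),
    }


connections = [to_connection(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(connections[0]["date"], connections[0]["departure"])  # 2008-1-3 1715
```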

Now we are ready to start querying the data.

### Finding the most popular airports

We can see some of what we have imported by writing the following query, which finds the airports with the most outgoing connections.

This query:

  * finds every node with the `Airport` label
  * finds all outgoing `CONNECTED_TO` relationships
  * counts them up grouped by airport
  * returns the `Airport` nodes and the `outgoing` count in descending order by `outgoing`
  * limits the number of airports returned to `10`

In [13]:
MATCH (a:Airport)-[:CONNECTED_TO]->()
RETURN a, COUNT(*) AS outgoing
ORDER BY outgoing DESC
LIMIT 10

+-----------------------------------------------+
| a                                  | outgoing |
+-----------------------------------------------+
| (:Airport {_id_: 5, code: "LAS"})  | 241      |
| (:Airport {_id_: 12, code: "MDW"}) | 227      |
| (:Airport {_id_: 6, code: "LAX"})  | 122      |
| (:Airport {_id_: 11, code: "MCO"}) | 116      |
| (:Airport {_id_: 10, code: "MCI"}) | 72       |
| (:Airport {_id_: 15, code: "OAK"}) | 38       |
| (:Airport {_id_: 14, code: "MSY"}) | 35       |
| (:Airport {_id_: 13, code: "MHT"}) | 31       |
| (:Airport {_id_: 2, code: "ISP"})  | 28       |
| (:Airport {_id_: 4, code: "JAX"})  | 23       |
+-----------------------------------------------+

10 rows available after 64 ms, consumed after another 1 ms
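The query above is a grouped count: `COUNT(*)` aggregates the matched rows for each airport `a`. Here is the same computation in plain Python. The handful of `(origin, destination)` pairs are made up and stand in for `CONNECTED_TO` relationships.

```python
from collections import Counter

# Made-up (origin, destination) pairs standing in for CONNECTED_TO relationships.
connections = [
    ("LAS", "LAX"), ("LAS", "ABQ"), ("LAS", "MDW"),
    ("MDW", "LAS"), ("MDW", "BWI"),
    ("OAK", "LAS"),
]

# MATCH (a:Airport)-[:CONNECTED_TO]->() RETURN a, COUNT(*) AS outgoing
outgoing = Counter(origin for origin, _ in connections)

# ORDER BY outgoing DESC LIMIT 10
top = outgoing.most_common(10)
print(top)  # [('LAS', 3), ('MDW', 2), ('OAK', 1)]
```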

### Exercise: Finding connections

Now it is your turn! Try and write queries to answer the following questions:

  * Find the airports that have the most incoming connections
  * Find all the connections into Las Vegas (LAS)
  * Find all the connections from Las Vegas (LAS) to Los Angeles (LAX)

**Hint:** Refer to the Cypher refcard (http://neo4j.com/docs/stable/cypher-refcard/) for Cypher Syntax.


In [14]:
MATCH (a:Airport)<-[:CONNECTED_TO]-()
RETURN a, COUNT(*) AS incoming
ORDER BY incoming DESC
LIMIT 10

+-----------------------------------------------+
| a                                  | incoming |
+-----------------------------------------------+
| (:Airport {_id_: 12, code: "MDW"}) | 56       |
| (:Airport {_id_: 17, code: "BWI"}) | 51       |
| (:Airport {_id_: 18, code: "PHX"}) | 49       |
| (:Airport {_id_: 5, code: "LAS"})  | 48       |
| (:Airport {_id_: 15, code: "OAK"}) | 42       |
| (:Airport {_id_: 22, code: "HOU"}) | 40       |
| (:Airport {_id_: 60, code: "DAL"}) | 33       |
| (:Airport {_id_: 24, code: "BNA"}) | 32       |
| (:Airport {_id_: 11, code: "MCO"}) | 30       |
| (:Airport {_id_: 34, code: "BUR"}) | 30       |
+-----------------------------------------------+

10 rows available after 23 ms, consumed after another 1 ms

In [15]:
MATCH (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport {code: "LAS"})
RETURN origin, destination, connection
LIMIT 10

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| origin                             | destination                       | connection                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 15, code: "OAK"}) | (:Airport {_id_: 5, code: "LAS"}) | [:CONNECTED_TO {date: "2008-1-3", _id_: 998, departure: 1010, airline: "WN", arrival: 1135, flightNumber: "752"}[15>5]]  |
| (:Airport {_id_: 15, code: "OAK"}) | (:Airport {_id_: 5, code: "LAS"}) | [:CONNECTED_TO {date: "2008-1-3", _id_: 999, departure: 830, airline: "WN", arrival: 955, flightNumber: "762"}[15>5]]    |
| (:Airpor

In [16]:
MATCH (o:Airport {code: "LAS"})-[c:CONNECTED_TO]->(d:Airport {code: "LAX"})
RETURN o, d, c
LIMIT 10

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| o                                 | d                                 | c                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 5, code: "LAS"}) | (:Airport {_id_: 6, code: "LAX"}) | [:CONNECTED_TO {date: "2008-1-3", _id_: 151, departure: 1420, airline: "WN", arrival: 1525, flightNumber: "3917"}[5>6]] |
| (:Airport {_id_: 5, code: "LAS"}) | (:Airport {_id_: 6, code: "LAX"}) | [:CONNECTED_TO {date: "2008-1-3", _id_: 150, departure: 1025, airline: "WN", arrival: 1135, flightNumber: "3655"}[5>6]] |
| (:Airport {_id_: 5

## Refactoring and Profiling


### Finding specific connections


The model has worked well so far. We have been able to find the popular airports and find the connections between pairs of airports without much trouble.

What if we want to find all the occurrences of a specific connection?

> As an air travel enthusiast
>
> I want to know the schedule for a flight number
>
> So that I know when I will be able to spot those planes taking off and landing


Our next query finds all the instances of connection `WN 1016`:

In [17]:
MATCH  (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
WHERE connection.airline = "WN" AND connection.flightNumber = "1016"
RETURN origin.code, destination.code, connection.date, connection.departure, connection.arrival

+----------------------------------------------------------------------------------------------+
| origin.code | destination.code | connection.date | connection.departure | connection.arrival |
+----------------------------------------------------------------------------------------------+
| "MDW"       | "LAS"            | "2008-1-3"      | 740                  | 940                |
| "IND"       | "MDW"            | "2008-1-3"      | 715                  | 710                |
| "LAS"       | "SNA"            | "2008-1-3"      | 1010                 | 1110               |
+----------------------------------------------------------------------------------------------+

3 rows available after 24 ms, consumed after another 5 ms


It is still reasonably quick because we only have 1000 rows, but under the covers we’re actually doing a lot of unnecessary work.


We can *profile* our query by prefixing it with the `PROFILE` keyword:

> `PROFILE`
If you want to run the statement and see which operators are doing most of the work, use PROFILE. This will run your statement and keep track of how many rows pass through each operator, and how much each operator needs to interact with the storage layer to retrieve the necessary data. Please note that profiling your query uses more resources, so you should not profile unless you are actively working on a query. https://neo4j.com/docs/developer-manual/current/cypher/query-tuning/how-do-i-profile-a-query/


In [18]:
PROFILE
MATCH  (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
WHERE connection.airline = "WN" AND connection.flightNumber = "1016"
RETURN origin.code, destination.code, connection.date, connection.departure, connection.arrival

+-----------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows |
+-----------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 3.3" | "COST"  | "INTERPRETED" | 7    | 0      | 3    |
+-----------------------------------------------------------------------------------------+

+------------------+----------------+------+---------+-----------+---------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator         | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers                                                                             

What we get back is an execution plan, which describes the Cypher operators used to execute this query. You can read more about these operators in the developer manual (https://neo4j.com/docs/developer-manual/current/cypher/#execution-plans).

In this plan, the query starts with a `NodeByLabelScan` on the `:Airport` label, which means that we first scanned all the airports. Next we followed the `CONNECTED_TO` relationships to the `origin` airports, and we can see from the estimated rows count that we followed 1000 of these.

In fact, we looked at every single connection, which we can confirm by counting all the `CONNECTED_TO` relationships:


In [19]:
MATCH ()-[:CONNECTED_TO]->()
RETURN count(*)

+----------+
| count(*) |
+----------+
| 1000     |
+----------+

1 row available after 10 ms, consumed after another 0 ms

So our model is clearly not optimal: we are doing far too much work just to find the origins and destinations of one flight.

It is time to **refactor** the model. The following is based on `:play http://guides.neo4j.com/modeling_airports/02_flight.html`


### Ensuring flight uniqueness

When we refactor the model we want to make sure we only create each flight once.

Neo4j allows us to create unique constraints to ensure uniqueness across a label/property pair, but in the Community Edition used here a unique constraint can only cover a single property. We want to ensure uniqueness across several properties, so we will combine them into a single dummy property.

The combination of airline, flight number, and date makes a flight unique. As we saw in the previous section, however, some flights have multiple legs (WN 1016 flies MDW to LAS, IND to MDW, and LAS to SNA on the same day), so we need to include the departure and arrival airports as well. We will store this combined key in an `id` property with the format `{airline}{flightNumber}_{year}-{month}-{day}_{origin}_{destination}`.
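A quick plain-Python sketch of the key shows why the airports are needed. The helpers `flight_id` and `short_id` are hypothetical. `flight_id` follows the concatenation used in the Cypher refactoring query: airline and flight number, then `_`, the date, `_`, the origin, `_`, and the destination. The three legs are the `WN 1016` rows returned earlier.

```python
def flight_id(airline, flight_number, date, origin, destination):
    """Build the dummy unique key used for :Flight(id)."""
    return f"{airline}{flight_number}_{date}_{origin}_{destination}"


def short_id(airline, flight_number, date):
    """A key without the airports, for comparison."""
    return f"{airline}{flight_number}_{date}"


legs = [("MDW", "LAS"), ("IND", "MDW"), ("LAS", "SNA")]  # WN 1016 on 2008-1-3

# Airline, number, and date alone collapse all three legs into one key...
without_airports = {short_id("WN", "1016", "2008-1-3") for _ in legs}
# ...while adding origin and destination keeps every leg distinct.
with_airports = {flight_id("WN", "1016", "2008-1-3", o, d) for o, d in legs}

print(len(without_airports), len(with_airports))  # 1 3
print(flight_id("WN", "1016", "2008-1-3", "MDW", "LAS"))  # WN1016_2008-1-3_MDW_LAS
```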

Run the following query to create a unique constraint on the Flight/id label/property pair:


In [4]:
CREATE CONSTRAINT ON (f:Flight)
ASSERT f.id IS UNIQUE



### Refactoring - Creating flights

We are now ready to introduce `Flight` nodes to our data model. That is, we want to create a data model of the following kind:


![refactored_model](http://guides.neo4j.com/modeling_airports/img/flight_first_class.png)

Run the following query to create a `Flight` node for every `CONNECTED_TO` relationship. The query:

  * finds all `(origin, connection, destination)` paths
  * creates a `Flight` node if one doesn’t already exist
  * creates an `ORIGIN` relationship to the origin airport and a `DESTINATION` relationship to the destination airport

In [20]:
MATCH (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
MERGE (newFlight:Flight { id: connection.airline + connection.flightNumber + "_" + connection.date +  "_" + origin.code + "_" + destination.code }   )
ON CREATE SET newFlight.date = connection.date,
              newFlight.airline = connection.airline,
              newFlight.number = connection.flightNumber,
              newFlight.departure = connection.departure,
              newFlight.arrival = connection.arrival
MERGE (origin)<-[:ORIGIN]-(newFlight)
MERGE (newFlight)-[:DESTINATION]->(destination)
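
Conceptually, the refactoring turns each relationship into a node plus two relationships. The plain-Python sketch below models this on the three `WN 1016` connections seen earlier, with one of them duplicated on purpose. Dictionary keys stand in for the unique `:Flight(id)` constraint, and a set of `(type, flight id, airport)` triples stands in for the `ORIGIN` and `DESTINATION` relationships. It illustrates the data transformation only.

```python
connections = [
    {"origin": "MDW", "destination": "LAS", "airline": "WN", "flightNumber": "1016",
     "date": "2008-1-3", "departure": 740, "arrival": 940},
    {"origin": "IND", "destination": "MDW", "airline": "WN", "flightNumber": "1016",
     "date": "2008-1-3", "departure": 715, "arrival": 710},
    {"origin": "LAS", "destination": "SNA", "airline": "WN", "flightNumber": "1016",
     "date": "2008-1-3", "departure": 1010, "arrival": 1110},
]
connections.append(dict(connections[0]))  # a duplicate, to show the merge is idempotent

flights = {}   # Flight id -> properties (MERGE on a uniquely constrained id)
edges = set()  # (relationship type, flight id, airport code); sets ignore repeats

for c in connections:
    fid = f"{c['airline']}{c['flightNumber']}_{c['date']}_{c['origin']}_{c['destination']}"
    # setdefault only stores the properties the first time, like ON CREATE SET
    flights.setdefault(fid, {
        "date": c["date"],
        "airline": c["airline"],
        "number": c["flightNumber"],
        "departure": c["departure"],
        "arrival": c["arrival"],
    })
    edges.add(("ORIGIN", fid, c["origin"]))
    edges.add(("DESTINATION", fid, c["destination"]))

print(len(flights), len(edges))  # 3 6
```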



In [21]:
CALL db.schema()

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nodes                                                                                                                                                                                                          | relationships                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [(:Flight {name: "Flight", _id_: -4, indexes: ["number"], cons

### Find all the flights for flight number WN 1016

First, let’s create an index on `:Flight(number)` so that we can quickly find the appropriate flights.

In [22]:
CREATE INDEX ON :Flight(number)
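
Why does an index help? Here is a rough plain-Python analogy. Without an index, finding flight `1016` means checking the `number` property of every `:Flight` node. With one, the lookup goes straight to the matching nodes. The dictionary below is only a stand-in for a schema index, and the flight data is made up.

```python
from collections import defaultdict

# Made-up data: 990 flights with other numbers plus three legs of flight 1016.
flights = [{"number": str(n)} for n in range(990)]
flights += [{"number": "1016", "leg": leg} for leg in ("MDW-LAS", "IND-MDW", "LAS-SNA")]


def scan(flights, number):
    """No index: examine every flight's number property."""
    examined, hits = 0, []
    for flight in flights:
        examined += 1
        if flight["number"] == number:
            hits.append(flight)
    return hits, examined


def build_index(flights, prop):
    """Toy index: map each property value to the flights that have it."""
    index = defaultdict(list)
    for flight in flights:
        index[flight[prop]].append(flight)
    return index


hits, examined = scan(flights, "1016")
index = build_index(flights, "number")
print(len(hits), examined, len(index["1016"]))  # 3 993 3
```

Building the toy index costs one full pass. Neo4j keeps its schema indexes up to date as data is written, so each query only pays for the lookup.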



In [23]:
MATCH (origin)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination)
WHERE flight.airline = "WN" AND flight.number = "1016"
RETURN origin, destination, flight

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| origin                             | destination                        | flight                                                                                                                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 12, code: "MDW"}) | (:Airport {_id_: 5, code: "LAS"})  | (:Flight {date: "2008-1-3", number: "1016", arrival: 940, _id_: 116, id: "WN1016_2008-1-3_MDW_LAS", departure: 740, airline: "WN"})    |
| (:Airport {_id_: 1, code: "IND"})  | (:Airport {_id_: 12, code: "MDW"}) | (:Flight {date: "2008-1-3", number: "1016", arrival: 710, _id_: 238, id:

Before we delete the `CONNECTED_TO` relationships, we should profile the two versions of the query to see whether our refactoring has improved things.


In [9]:
PROFILE
MATCH (origin)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination)
WHERE flight.airline = "WN" AND flight.number = "1016"
RETURN origin, destination, flight

+-----------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows |
+-----------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 3.3" | "COST"  | "INTERPRETED" | 4    | 0      | 3    |
+-----------------------------------------------------------------------------------------+

+-----------------+----------------+------+---------+-----------+-------------------------------------------------+-----------------------------------------------------+
| Operator        | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers                                     | Other                                               |
+-----------------+----------------+------+---------+-----------+-------------------------------------------------+-----------------------------------------------------+
| +ProduceResults |          

In [10]:
PROFILE
MATCH (origin:Airport)-[flight:CONNECTED_TO]->(destination:Airport)
WHERE flight.airline = "WN" AND flight.flightNumber = "1016"
RETURN origin, destination, flight

+-----------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows |
+-----------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 3.3" | "COST"  | "INTERPRETED" | 34   | 0      | 3    |
+-----------------------------------------------------------------------------------------+

+------------------+----------------+------+---------+-----------+-----------------------------+----------------------------------------------------+
| Operator         | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers                 | Other                                              |
+------------------+----------------+------+---------+-----------+-----------------------------+----------------------------------------------------+
| +ProduceResults  |             10 |    3 |       0 |       0/0 | destination, flight, o

# Refactoring Edges

`:play http://guides.neo4j.com/modeling_airports/03_flight_booking.html`


### Flight booking

As our system evolves, we get a new requirement to satisfy:

> As a frequent traveller
> 
> I want to find flights from `origin` to `destination` on `date`
> 
> So that I can book my business flight

Before we write queries to satisfy this requirement, let’s import some more data.

### Import more flights

We initially loaded 1,000 flights. That was a fun dataset to start with, but now that we have a model we are happy with, let’s load in a bit more data.

`flights_10k.csv` contains 10,000 flights. We can run the following query to import them:

In [11]:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/flights_10k.csv" AS row
// Build the date string once, as in the first import, so that id and date always agree
WITH row, toInteger(row.Year) + "-" + toInteger(row.Month) + "-" + toInteger(row.DayofMonth) AS flightDate
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
MERGE (newFlight:Flight { id: row.UniqueCarrier + row.FlightNum + "_" + flightDate + "_" + row.Origin + "_" + row.Dest })
ON CREATE SET newFlight.date = flightDate,
              newFlight.airline = row.UniqueCarrier,
              newFlight.number = row.FlightNum,
              newFlight.departure = toInteger(row.CRSDepTime),
              newFlight.arrival = toInteger(row.CRSArrTime)
MERGE (newFlight)-[:ORIGIN]->(origin)
MERGE (newFlight)-[:DESTINATION]->(destination)



Now it is time to write a query to find available flights between two airports on a specific date.

Let’s find all the flights going from Las Vegas (LAS) to Chicago Midway International (MDW) on 3 January 2008. Run the following query:

This returns quite quickly but try prefixing it with `PROFILE`. 

What do you notice?

In [12]:
MATCH path = (o:Airport {code: "LAS"})<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(d:Airport {code: "MDW"})
WHERE flight.date = "2008-1-3"
RETURN path

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| path                                                                                                                                                                                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Airport {_id_: 5, code: "LAS"})<-[:ORIGIN {_id_: 1378}[251>5]]-(:Flight {date: "2008-1-3", number: "3232", arrival: 1845, _id_: 251, id: "WN3232_2008-1-3_LAS_MDW", departu

### Profiling the finding flights to book query

The query starts by finding `MDW` but then has to traverse all its incoming `DESTINATION` relationships and check the `date` property on the `:Flight` node at the other end of each one. The more flights an airport has, the more we have to scan through. Even though we are only working with about 10,000 flights, we should find a better way to model our data before importing any more rows.

Can you think of a way that we can change our model to avoid doing all these property lookups?


One way that we can tweak our model to be more aligned with our queries is by bundling flights by day.
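Here is a plain-Python sketch of the idea, using made-up data. Instead of checking the date of every flight into MDW, we group the flights under an airport-and-date key once, in the style of an `{airport}_{date}` id. We can then go straight to the group we need.

```python
from collections import defaultdict

# Made-up flights into MDW: 100 per day for five days.
flights_into_mdw = [
    {"id": f"F{day}-{n}", "date": f"2008-1-{day}"}
    for day in range(1, 6)
    for n in range(100)
]

# Without AirportDay: follow every DESTINATION relationship and check flight.date.
checked = 0
matches = []
for flight in flights_into_mdw:
    checked += 1
    if flight["date"] == "2008-1-3":
        matches.append(flight)

# With AirportDay: flights are bucketed under "{airport}_{date}" once, up front.
airport_days = defaultdict(list)
for flight in flights_into_mdw:
    airport_days["MDW_" + flight["date"]].append(flight)

bucket = airport_days["MDW_2008-1-3"]
print(checked, len(matches), len(bucket))  # 500 100 100
```

The query then only touches the 100 flights for the requested day rather than all 500. The cost grows with flights per day, not with the airport's total history.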


## Introducing Airport Day

We want to introduce `:AirportDay` nodes so that we do not have to scan through all the flights going from an airport when we are only interested in a subset of them.

Try and write a query to evolve our current model to include this new concept.


![](http://guides.neo4j.com/modeling_airports/img/airport_day.png)


Before we create anything let’s put a unique constraint on :AirportDay so we don’t create any duplicates:

In [24]:
CREATE CONSTRAINT ON (airportDay:AirportDay)
ASSERT airportDay.id IS UNIQUE



We’ll use the combination of the airport code and the flight date as the unique key for an `:AirportDay`:

In [25]:
MATCH (origin:Airport)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination:Airport)
MERGE (originAirportDay:AirportDay {id: origin.code + "_" + flight.date})
SET originAirportDay.date = flight.date
MERGE (destinationAirportDay:AirportDay {id: destination.code + "_" + flight.date})
SET destinationAirportDay.date = flight.date
MERGE (origin)-[:HAS_DAY]->(originAirportDay)
MERGE (flight)-[:ORIGIN]->(originAirportDay)

MERGE (flight)-[:DESTINATION]->(destinationAirportDay)
MERGE (destination)-[:HAS_DAY]->(destinationAirportDay)



### Find flights to book

Now let’s try finding those flights between Las Vegas and Chicago Midway International again. To recap, this was our original query:

In [None]:
PROFILE
MATCH path = (origin:Airport {code: "LAS"})<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination:Airport {code: "MDW"})
WHERE flight.date = "2008-1-3"
RETURN path

This is the equivalent query which makes use of `:AirportDay`

In [None]:
PROFILE
MATCH (origin:Airport {code: "LAS"})-[:HAS_DAY]->(:AirportDay {date: "2008-1-3"})<-[:ORIGIN]-(flight:Flight),
      (flight)-[:DESTINATION]->(:AirportDay {date: "2008-1-3"})<-[:HAS_DAY]-(destination:Airport {code: "MDW"})
RETURN *

`:play http://guides.neo4j.com/modeling_airports/04_specific_relationship_types.html`


  
  
  
# Modelling Guidelines


See the `Modeling Guidelines.pdf`




# Your Turn


Work with the following two tutorials:

`:play http://guides.neo4j.com/modeling_airports/04_specific_relationship_types.html`

`:play http://guides.neo4j.com/modeling_airports/05_refactoring_large_graphs.html`
  
  
  

# Spatial Queries

Neo4j versions newer than 3.0 provide functions for specifying points in a 2D coordinate system and for calculating the geodesic distance between two points directly: https://neo4j.com/docs/developer-manual/current/cypher/functions/spatial/
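For geographic (latitude/longitude) points, a geodesic distance is commonly computed with the haversine formula on a spherical Earth. The sketch below is our own illustration of that formula, not Neo4j's code. The Earth radius of 6,371 km is an assumption, and Neo4j's internal constant may differ, so expect small discrepancies against `distance()`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(round(haversine(0.0, 0.0, 0.0, 1.0)))  # 111195
```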


Let's get some locations for our airports...

In [26]:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/training/master/modeling/data/airports.csv" AS row
MATCH (a:Airport {code: row.iata_code})
SET a.latitude = toFloat(row.latitude_deg),
    a.longitude = toFloat(row.longitude_deg)



To compute the distance between two airports, we first build `Point` values from their latitude and longitude properties and then pass them to the `distance` function, which returns the distance in meters for geographic points.

In [27]:
MATCH (a:Airport)-[:CONNECTED_TO]-(b:Airport)
WITH point({ longitude: a.longitude, latitude: a.latitude }) AS aPoint, point({ longitude: b.longitude, latitude: b.latitude }) AS bPoint
RETURN DISTINCT round(distance(aPoint, bPoint)) AS distance

+-----------+
| distance  |
+-----------+
| 3320164.0 |
| 1223392.0 |
| 928123.0  |
| 1308332.0 |
| 1108681.0 |
| 2555721.0 |
| 724371.0  |
| 1337388.0 |
| 261043.0  |
| 1352184.0 |
| 829285.0  |
| 2393544.0 |
| 3668627.0 |
| 1566797.0 |
| 1228996.0 |
| 1667164.0 |
| 353700.0  |
| 1763419.0 |
| 1697665.0 |
| 1776068.0 |
| 945937.0  |
| 1074619.0 |
| 1429890.0 |
| 578376.0  |
| 292143.0  |
| 1069229.0 |
| 514672.0  |
| 1311935.0 |
| 587443.0  |
| 779705.0  |
| 874761.0  |
| 1196578.0 |
| 380420.0  |
| 1245612.0 |
| 2080941.0 |
| 1279347.0 |
| 1831537.0 |
| 3279764.0 |
| 2444623.0 |
| 3787422.0 |
| 2412267.0 |
| 655071.0  |
| 3191024.0 |
| 3384909.0 |
| 411309.0  |
| 1986882.0 |
| 2601053.0 |
| 2551718.0 |
| 3462778.0 |
| 3498330.0 |
| 781992.0  |
| 3595542.0 |
| 1219085.0 |
| 1753086.0 |
| 3692440.0 |
| 838143.0  |
| 3193032.0 |
| 359125.0  |
| 2932697.0 |
| 2845977.0 |
| 1010496.0 |
| 938974.0  |
| 1299527.0 |
| 1585616.0 |
| 1766877.0 |
| 317085.0  |
| 1229040.0 |
| 3069997.0 |
| 3798

In [28]:
MATCH (a:Airport)-[:CONNECTED_TO]-(b:Airport)
WITH point({ longitude: a.longitude, latitude: a.latitude }) AS aPoint, point({ longitude: b.longitude, latitude: b.latitude }) AS bPoint, a, b
WITH DISTINCT round(distance(aPoint, bPoint)) AS distance, a, b
ORDER BY distance DESC
// distance() returns meters for geographic points: keep pairs more than 1,000 km apart
WHERE distance / 1000 > 1000
RETURN distance, a.code, b.code

+-----------------------------+
| distance  | a.code | b.code |
+-----------------------------+
| 3798697.0 | "LAS"  | "PVD"  |
| 3798697.0 | "PVD"  | "LAS"  |
| 3787422.0 | "LAS"  | "MHT"  |
| 3787422.0 | "MHT"  | "LAS"  |
| 3692440.0 | "LAS"  | "BDL"  |
| 3692440.0 | "BDL"  | "LAS"  |
| 3668627.0 | "ISP"  | "LAS"  |
| 3668627.0 | "LAS"  | "ISP"  |
| 3665341.0 | "MHT"  | "PHX"  |
| 3665341.0 | "PHX"  | "MHT"  |
| 3595542.0 | "LAS"  | "ALB"  |
| 3595542.0 | "ALB"  | "LAS"  |
| 3498330.0 | "LAS"  | "PHL"  |
| 3498330.0 | "PHL"  | "LAS"  |
| 3462778.0 | "LAS"  | "ORF"  |
| 3462778.0 | "ORF"  | "LAS"  |
| 3384909.0 | "LAS"  | "BWI"  |
| 3384909.0 | "BWI"  | "LAS"  |
| 3320164.0 | "IAD"  | "LAS"  |
| 3320164.0 | "LAS"  | "IAD"  |
| 3279764.0 | "LAS"  | "MCO"  |
| 3279764.0 | "MCO"  | "LAS"  |
| 3256464.0 | "LAS"  | "RDU"  |
| 3256464.0 | "RDU"  | "LAS"  |
| 3193032.0 | "LAS"  | "BUF"  |
| 3193032.0 | "BUF"  | "LAS"  |
| 3191024.0 | "LAS"  | "TPA"  |
| 3191024.0 | "TPA"  | "LAS"  |
| 314910