# Graph Visualization

To visualize our graphs, we've prepared a completely rewritten version of [neo4jupyter](https://github.com/merqurio/neo4jupyter), which you can dive into [here](../graph_plot.py) if you wish.

In short, we've created two drawing functions that configure and embed small HTML pages using [vis.js](http://visjs.org) to visualize nodes and edges.

Let's start by importing our script and running through some examples.
**Again, we are connecting to our Wikipedia database**

In [1]:
from py2neo import Graph

#################################################
# Update UPDATE-ME in the connection code with
# the server you were assigned (see the schedule
# notebook), using the list of servers below.
#################################################
# Server 0 - neo4j.dsa.missouri.edu
# Server 1 - neo4j-1.dsa.missouri.edu
# Server 2 - neo4j-2.dsa.missouri.edu
# Server 3 - neo4j-3.dsa.missouri.edu
#################################################

graph = Graph("bolt://wikiread:wikireader@neo4j-1.dsa.missouri.edu:9000")

In [2]:
import sys
# Tell Python a new place it can look to import libraries
sys.path.append("..")
# The file path is ../graph_plot.py
from graph_plot import draw_r, draw_nr, small_options, large_options

<IPython.core.display.Javascript object>

In [3]:
# Let's dive right in and grab some data!
query="""
MATCH (a:Page {title: 'Neo4j'})-[r]->(b)
RETURN r
"""
data = graph.run(query).to_table()

The `draw_r` function iterates through the given relationships and draws them. 
It turns out that relationship objects contain both the start and end nodes, so nothing else is needed!

The second argument maps node labels to node titles. 
This lets us select which property is printed as the node's name. 
Here, we indicate that `Page` nodes should have their `title` property used as their label in the graph.
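
To make the label-to-property mapping concrete, here is a minimal sketch of how such a lookup could work. It uses plain Python sets and dictionaries as stand-ins for py2neo nodes, and `display_name` is a hypothetical helper for illustration, not a function from `graph_plot.py`.

```python
def display_name(labels, properties, label_map, fallback="?"):
    """Pick the text shown for a node.

    labels     -- the node's labels, e.g. {"Page"}
    properties -- the node's properties, e.g. {"title": "Neo4j"}
    label_map  -- maps a label to the property used as the node's name
    """
    for label in sorted(labels):
        prop = label_map.get(label)
        if prop is not None and prop in properties:
            return str(properties[prop])
    # No label in the mapping matched, so fall back to a placeholder
    return fallback


print(display_name({"Page"}, {"title": "Neo4j"}, {"Page": "title"}))
print(display_name({"Category"}, {"name": "Databases"}, {"Page": "title"}))
```

The second call falls back to the placeholder because the mapping has no entry for the (made-up) `Category` label.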

As it is, this will produce a simple graph with some physics enabled.
When you click on a node, the associated links will be highlighted.
Links also have _mouseover_ text indicating the pages they connect as well as their direction.

In [4]:
draw_r(data, {"Page": "title"})

Physics is controlled by the third argument and defaults to enabled.
If you've got more than 200 nodes, you'll start to experience some slowdown in your browser. 
**In larger graphs, you can lock up your browser tab.**
Make sure you have a good idea of what you're plotting before you plot it!
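
Because each row returned by our queries is a relationship, `len(data)` counts edges rather than nodes. The sketch below counts distinct endpoints instead and picks a physics flag from that count. It works on plain `(start, end)` tuples as stand-ins for relationships; `count_nodes`, `physics_flag`, and the default limit of 200 (taken from the rule of thumb above) are assumptions for illustration.

```python
def count_nodes(edges):
    """Count the distinct nodes that appear at either end of any edge."""
    nodes = set()
    for start, end in edges:
        nodes.add(start)
        nodes.add(end)
    return len(nodes)


def physics_flag(edges, limit=200):
    """Return True when the graph is small enough to animate comfortably."""
    return count_nodes(edges) <= limit


# Placeholder edges, not query results
edges = [("Neo4j", "Java"), ("Neo4j", "Sweden"), ("Sweden", "Europe")]
print(count_nodes(edges))   # 4 distinct nodes
print(physics_flag(edges))  # True, well under the limit
```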

Below is an example with physics disabled.

In [5]:
draw_r(data, {'Page': 'title'}, False)

### <span style="background:yellow">Your Turn</span>

Look up some pages and plot graphs of their connections.  
Replace the `?????` with your desired page title.

In [6]:
# M4:P3:Q1

# Lookup and plot two pages and the nodes they link to
# -----------------
query="""
MATCH (a:Page {title: 'ACID' })-[r]->(b)
RETURN r
"""
data = graph.run(query).to_table()

if not data:
    print("Page not found!")
else:
    # A length check should help prevent slowdowns:
    # physics stays on only for 150 relationships or fewer
    draw_r(data, {'Page':'title'}, len(data) <= 150)

In [7]:
# Lookup and plot two pages and the nodes they link to
# -----------------
query="""
MATCH (a:Page {title: 'SQL' })-[r]->(b)
RETURN r
"""
data = graph.run(query).to_table()

if not data:
    print("Page not found!")
else:
    # A length check should help prevent slowdowns:
    # physics stays on only for 150 relationships or fewer
    draw_r(data, {'Page':'title'}, len(data) <= 150)

This is a good start, but it gets more interesting once we start to pull in the relationships of the nodes next to our own.

Here, we get the pages linked to from `Neo4j` and turn them into two lists (we also insert the Neo4j node into both lists).
We then unwind these lists and get the relationships between them.
This lets us get all relationships between our `b` nodes as well as with our `Neo4j` node.

If we ask for `(b)-[r]->(b)` without collecting and splitting it into two lists, Neo4j will iterate through `b` values and we end up asking only for self loops.
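
A rough Python analogy for the two `UNWIND` clauses: unwinding two lists one after the other yields every (left, right) combination, much like `itertools.product`, whereas reusing a single variable only ever pairs a node with itself. The page titles below are placeholders, not query results.

```python
from itertools import product

neighbours = ["Java", "Sweden", "Graph database"]

# Two separately unwound lists: every ordered pair is considered
all_pairs = list(product(neighbours, neighbours))

# One variable used on both sides: only (b, b) pairs, i.e. self loops
self_pairs = [(b, b) for b in neighbours]

print(len(all_pairs))   # 9 candidate pairs
print(len(self_pairs))  # 3 candidate pairs
```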

In [8]:
# NOTE: the COLLECT(b)+a is the part that 
#  adds (a:Page {title: 'Neo4j'}) to the list
query="""
MATCH (a:Page {title: 'Neo4j'})-->(b)
WITH COLLECT(b)+a AS left_set, COLLECT(b)+a as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN r
"""
data = graph.run(query).to_table()

draw_r(data, {'Page':'title'})

Graphs, after all, are all about connections!
These interconnections let us see relationships between our linked nodes we may not have known about.


We can see that some pages we link to are related to graph databases while other pages are related to Java.
Locations are pushed to one portion of the graph, with different connections between them.
We can even see a hierarchy of geographical regions.
The `North America` and `Europe` pages link to each other, and then `Europe` links to `Sweden`, which has links to and from `Malmö`.

Since `Neo4j` is the basis of our query, everything tends to revolve around it.
If we remove it from the lists we collect, our peer relationship graph looks *much* easier to understand.
The location nodes are almost completely disjoint from our software and technology nodes!

In [9]:
query="""
MATCH (a:Page {title: 'Neo4j'})-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN r
"""
data = graph.run(query).to_table()

draw_r(data, {'Page':'title'})

But, since we removed our central node and are only plotting relationships, any pages that only linked to/from `Neo4j` disappear!

We can tweak this with more list shenanigans and another drawing function.
`draw_nr` will let you draw a list of nodes and a list of relationships.
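
The sketch below shows why a separate node list matters: nodes that appear in no relationship can only be found by comparing the full node list against the endpoints of the relationships. Plain strings and tuples stand in for py2neo objects, the titles are placeholders, and `isolated_nodes` is a hypothetical helper.

```python
def isolated_nodes(nodes, edges):
    """Return the nodes that are not an endpoint of any edge."""
    connected = set()
    for start, end in edges:
        connected.update((start, end))
    # Preserve the input order of the node list
    return [n for n in nodes if n not in connected]


nodes = ["Java", "Sweden", "Europe", "AGPL"]
edges = [("Sweden", "Europe"), ("Europe", "Sweden")]
print(isolated_nodes(nodes, edges))  # ['Java', 'AGPL']
```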

In [10]:
query="""
MATCH (a:Page {title: 'Neo4j'})-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

# We only get one row back from our query
# The first element is a list of nodes, the second is our relationships
draw_nr(data[0][0], data[0][1], {'Page':'title'})

Now we can see those lonely pages as well!

Once again, we've only been looking at outgoing links. Let's branch out a little more and plot the relationships between pages one hop away from Neo4j in either direction.

In [11]:
query="""
MATCH (a:Page {title: 'Neo4j'})<-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

# We only get one row back from our query
# The first element is a list of nodes, the second is our relationships
draw_nr(data[0][0], data[0][1], {'Page':'title'})

### <span style="background:yellow">Your Turn</span>

Look up two more pages and, for each, plot the relationships between the nodes zero degrees (1 hop) away from it.
Make note of any interesting relationships or clusters of pages you observe.

In [12]:
# M4:P3:Q2

# Look up another page and plot the relationships between
#  the nodes zero degrees away from it
# Note any interesting relationships or clusters of nodes you observe
#
# Note that this query can take a while on very large pages.
# -----------------
query="""
MATCH (a:Page {title: 'SQL'})<-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

if not data:
    print("Page not found!")
else:
    # Indexing data[0] would fail on an empty result, so only draw when we have a row
    draw_nr(data[0][0], data[0][1], {'Page':'title'}, len(data[0][0]) <= 150)

#### What interesting relationships can you note in your plot above?

In [13]:
# M4:P3:Q4

# Look up another page and plot the relationships between
#  the nodes zero degrees away from it
# Note any interesting relationships or clusters of nodes you observe
#
# Note that this query can take a while on very large pages.
# -----------------
query="""
MATCH (a:Page {title: 'ACID'})<-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

if not data:
    print("Page not found!")
else:
    # Indexing data[0] would fail on an empty result, so only draw when we have a row
    draw_nr(data[0][0], data[0][1], {'Page':'title'}, len(data[0][0]) <= 150)

#### What interesting relationships can you note in your plot above?

----
Let's branch out another hop.
As you may recall, `Neo4j` starts out small enough, but once we get two hops out, we've reached 3% of Wikipedia!

The fourth argument to `draw_r` (the fifth to `draw_nr`) is a configuration dictionary that lets us control [vis.js settings](http://visjs.org/docs/network/), 
and we've prepared two simple ones. 
By default, the drawing functions use `small_options`. 
Let's see what that contains:

In [14]:
small_options

{'edges': {'arrows': {'to': {'enabled': True, 'scaleFactor': 0.5}},
  'color': {'color': 'grey', 'highlight': '#404040'},
  'font': {'align': 'middle', 'size': 14},
  'smooth': {'enabled': True, 'type': 'dynamic'}},
 'nodes': {'font': {'size': '14'}, 'shape': 'ellipse', 'size': 25},
 'physics': {'enabled': False, 'solver': 'repulsion'}}

Our edges have arrows turned on at the destination end, scaled down slightly.
The edge color is set to grey, and its labels (which we aren't using) are middle aligned with size 14 font.
Edges are curved using the dynamic setting.

Our nodes are ellipses with font size 14. Ellipses are label-size controlled, so the size setting doesn't apply.

The physics section selects the repulsion solver. Its `enabled` value is overridden by the physics flag passed to the draw function, so the value printed above does not decide whether physics runs.
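
One way such an override could work is sketched below. This is an assumption about the general technique, not a description of what `graph_plot.py` actually does: `with_physics` is a hypothetical helper that copies the options before changing them, so the shared defaults stay untouched.

```python
from copy import deepcopy


def with_physics(options, physics):
    """Return a copy of `options` with the physics switch set to `physics`."""
    merged = deepcopy(options)
    merged.setdefault("physics", {})["enabled"] = bool(physics)
    return merged


base = {"physics": {"enabled": False, "solver": "repulsion"}}
tuned = with_physics(base, True)
print(tuned["physics"]["enabled"])  # True
print(base["physics"]["enabled"])   # False, the original is unchanged
```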

The `small_options` set works well for small graphs. Let's see what we have for large graphs (thousands of nodes):

In [15]:
large_options

{'edges': {'arrows': {'to': {'enabled': True, 'scaleFactor': 0.5}},
  'color': {'color': 'grey', 'highlight': '#404040'},
  'font': {'align': 'middle', 'size': 14},
  'smooth': {'enabled': True, 'type': 'dynamic'}},
 'nodes': {'font': {'background': 'white', 'size': '14'},
  'scaling': {'label': {'enabled': True, 'max': 250, 'min': 14},
   'max': 800,
   'min': 10},
  'shape': 'dot',
  'size': 25},
 'physics': {'enabled': False, 'solver': 'repulsion'}}

Compared with `small_options`, we've given node labels a white background, enabled node scaling, and switched to dots.
In the background, the drawing functions track how many edges are attached to a given node, and this can be used for scaling.

Let's enable scaling on our smaller option set and see what that looks like.
Scaling ellipses and labels is complicated, so let's just switch to dots for now.

Scaling is done by node value.
Each edge gives one value point to the nodes at either end.
Each node's percentage of the total value of all nodes is used to place it on the range between min and max.
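
The value bookkeeping described above can be sketched in a few lines. This follows the description in this section only; the real computation happens inside `graph_plot.py` and vis.js and may differ in detail. Plain `(start, end)` tuples stand in for relationships, and `node_values` and `scaled_size` are hypothetical helpers.

```python
from collections import Counter


def node_values(edges):
    """Give each node one value point per attached edge end."""
    values = Counter()
    for start, end in edges:
        values[start] += 1
        values[end] += 1
    return values


def scaled_size(value, total, smallest=10, largest=50):
    """Place a node between the smallest and largest size by its share of the total."""
    share = value / total if total else 0.0
    return smallest + share * (largest - smallest)


# Placeholder edges, not query results
edges = [("Neo4j", "Java"), ("Neo4j", "Sweden"), ("Sweden", "Europe")]
values = node_values(edges)
total = sum(values.values())
for name, value in sorted(values.items()):
    print(name, value, round(scaled_size(value, total), 1))
```

Nodes with more attached edges hold a larger share of the total and so land closer to the `largest` size.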

In [16]:
small_scale = {
    'edges': {
        'arrows': {
            'to': {
                'enabled': True,
                'scaleFactor': 0.5
            }
        },
        'color': {
            'color': 'grey',
            'highlight': '#404040'
        },
        'font': {
            'align': 'middle',
            'size': 14
        },
        'smooth': {
            'enabled': True,
            'type': 'dynamic'
        }
    },
    'nodes': {
        'font': {
            'size': '14'
        },
        'shape': 'dot',
        'size': 25,
        'scaling': {
            'min': 10,
            'max': 50
        }
    },
    'physics': {
        'enabled': True,
        'solver': 'repulsion'
    }
}

In [17]:
query="""
MATCH (a:Page {title: 'Neo4j'})<-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

# We only get one row back from our query
# The first element is a list of nodes, the second is our relationships
draw_nr(data[0][0], data[0][1], {'Page':'title'}, True, small_scale)

### <span style="background:yellow">Your Turn</span>

Experiment with various configuration options on the page of your choosing.
Shapes, colors, whatever you want.
The documentation for nodes can be found [here](http://visjs.org/docs/network/nodes.html), and edges [here](http://visjs.org/docs/network/edges.html).
Global settings can be found [here](http://visjs.org/docs/network/), and you can experiment with physics [here](http://visjs.org/examples/network/physics/physicsConfiguration.html).

You'll probably end up with a lot of JavaScript errors, but don't worry.
Rerunning the cell should fix everything.
JavaScript can be finicky, and you may have had a typo in your settings dictionary.

When in doubt, use the classic Control+Z keys in the code cell to undo changes!

In [18]:
# M4:P3:Q6

# Experiment with visualization settings and plot a page and its connections in the cell below.
# --------------------
my_options = {
    'edges': {
        'arrows': {
            'to': {
                'enabled': True,
                'scaleFactor': 0.5
            }
        },
        'color': {
              'color': 'red',
              'highlight': 'green',
        },
        'font': {
            'align': 'middle',
            'size': 14
        },
        'smooth': {
            'enabled': True,
            'type': 'dynamic'
        }
    },
    'nodes': {
        'font': {
            'background': 'white', 
            'size': '14'},
        'shape': 'dot',
        'size': 25,
        'scaling': {
            'min': 10,
            'max': 100
        }
    },
    'physics': {
        'enabled': False,
        'solver': 'repulsion'
    }
}

In [19]:
query="""
MATCH (a:Page {title: 'Neo4j'})<-->(b)
WITH COLLECT(b) AS left_set, COLLECT(b) as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
data = graph.run(query).to_table()

draw_nr(data[0][0], data[0][1], {'Page':'title'}, True, my_options)

---
Let's take a look at all the pages within two hops of `Toudai temple`, in either direction, and their interconnections.
You may recall that [Toudai temple](https://en.wikipedia.org/wiki/T%C5%8Ddai-ji) appeared in our link analysis lab as one of the pages with the fewest links.
Let's see if we can see what's going on.

**Note:** The number of links explodes beyond 2 hops, so be careful with your visualization choices!

In [20]:
from timeit import default_timer as timer
from copy import deepcopy
query="""
MATCH (a:Page {title: 'Toudai temple'})<-[*1..2]->(b)
WITH COLLECT(DISTINCT b)+a AS left_set, COLLECT(DISTINCT b)+a as right_set
UNWIND left_set AS left
UNWIND right_set AS right
MATCH (left)-[r]->(right)
RETURN left_set, collect(r)
"""
begin = timer()
data = graph.run(query).to_table()
end = timer()
print(end-begin)

# Copying complex, nested dictionaries can get messy, let's get our own fresh copy.
medium_options = deepcopy(large_options)
# Let's dial down the scaling a bit
medium_options['nodes']['scaling']['min'] = 10
medium_options['nodes']['scaling']['max'] = 50
medium_options['nodes']['scaling']['label']['min'] = 12
medium_options['nodes']['scaling']['label']['max'] = 24


draw_nr(data[0][0], data[0][1], {'Page':'title'}, False, medium_options)

1.626289292005822


These connections get pretty messy!
Notice that eight redirect pages place themselves on top of the node for `Tōdai-ji`, and one of them is `Toudai temple`!

Nothing links to `Toudai temple`, so it must exist to help with searches.
`ō` isn't very easy to type on an English keyboard!
There may still be links to the other redirects that we aren't seeing, but nothing within two hops links to them.

Even though we searched for everything one to two hops away, in reality we only got `Tōdai-ji` and everything attached to it, which is why it's at the center.

# Save your notebook, then `File > Close and Halt`

---