# D3 Graphing

-------------------------------------------------

The raw code for this Jupyter notebook is hidden by default for easier reading. The main focus of this page of the notebook is on the graphs and their interpretation. To toggle the raw code on or off, click below:

In [1]:
# Setup Code toggle button
from IPython.core.display import HTML  

HTML(''' 
<center><h3>
<a href="javascript:code_toggle()">Talk is cheap, show me the code.</a>
</h3></center>
<script>
    var code_show=true; //true -> hide code at first

    function code_toggle() {
        $('div.prompt').hide(); // always hide prompt

        if (code_show){
            $('div.input').hide();
        } else {
            $('div.input').show();
        }
        code_show = !code_show
    }
    $( document ).ready(code_toggle);
</script>
''')

In [2]:
# Setup notebook theme
from jupyterthemes import get_themes
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme(get_themes()[1])

&nbsp;

## Get the Data

The data was stored in `graph_data.json`.

&nbsp;

In [68]:
import json

data_file = '../data/graph_data.json'
with open(data_file, 'r') as infile:
    data = json.load(infile)

data

{'links': [{'source': 'ryacca', 'target': 'shebin'},
  {'source': 'shebin', 'target': 'ryacca'},
  {'source': 'ryacca', 'target': 'phonedude_mln'},
  {'source': 'shebin', 'target': 'phonedude_mln'},
  {'source': 'ycpdan', 'target': 'CassPF'},
  {'source': 'ycpdan', 'target': 'internetarchive'},
  {'source': 'ycpdan', 'target': 'ruebot'},
  {'source': 'ycpdan', 'target': 'DataG'},
  {'source': 'ycpdan', 'target': 'liblaura'},
  {'source': 'ycpdan', 'target': 'taylor_amy'},
  {'source': 'ycpdan', 'target': 'ianmilligan1'},
  {'source': 'ycpdan', 'target': 'dchud'},
  {'source': 'ycpdan', 'target': 'WebSciDL'},
  {'source': 'ycpdan', 'target': 'AVArchivist'},
  {'source': 'AVArchivist', 'target': 'ycpdan'},
  {'source': 'ycpdan', 'target': 'smalljones'},
  {'source': 'ycpdan', 'target': 'abrennr'},
  {'source': 'ycpdan', 'target': 'archiveitorg'},
  {'source': 'ycpdan', 'target': 'phonedude_mln'},
  {'source': 'BexAnnalisa', 'target': 'internetarchive'},
  {'source': 'BexAnnalisa', 'targe

&nbsp;

## Graph

The standard Python graphing libraries are ruled out when the goal is to work with D3:


*    [Matplotlib](http://matplotlib.org/)
*    [Bokeh](http://bokeh.pydata.org/en/latest/)
*    [Seaborn](http://seaborn.pydata.org/)
*    [Lightning](http://lightning-viz.org/)
*    [Plotly](https://plot.ly/)
*    [Pandas built-in plotting](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
*    [HoloViews](http://holoviews.org/)
*    [VisPy](http://vispy.org/)
*    [pygg](http://www.github.com/sirrice/pygg)

The question is:

> D3 is nontrivial to learn, so what libraries are there?

To graph the data using [D3](https://github.com/d3/d3), the Jupyter notebook can use [JavaScript magic](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/JavaScript%20Notebook%20Extensions.html) with `%%javascript`. The [Pandas](http://pandas.pydata.org/) library can export dataframes to compatible JSON with `to_json`, and `from IPython.display import Javascript` can then run the script, much like the code used to toggle code visibility at the beginning of this notebook. There is also a [D3 magic](https://github.com/ResidentMario/py_d3), `py_d3`, that can be used. But from `py_d3`:

> Force graphs don't work at all. Use ipython-d3networkx instead
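
The `to_json` route mentioned above might look roughly like the following. This is only a sketch: the two rows are copied from the link list shown earlier, and the names `links`, `links_json`, and `js_source` are made up for illustration.

```python
import json
import pandas as pd

# Two rows copied from the link list shown earlier in this notebook.
links = pd.DataFrame([
    {'source': 'ryacca', 'target': 'shebin'},
    {'source': 'shebin', 'target': 'ryacca'},
])

# orient='records' yields a JSON array of objects, one object per row.
links_json = links.to_json(orient='records')

# Splice the JSON text into a script. In a notebook this string could be
# passed to IPython.display.Javascript to run it in the browser.
js_source = "var links = " + links_json + ";\nconsole.log(links.length);"
print(js_source)
```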

The [ipython-d3networkx](https://github.com/jdfreder/ipython-d3networkx) library looks like it has not been updated for 2 years, and it is not even on [PyPI](https://pypi.org/search/?q=ipython-d3networkx). Indeed, it cannot be imported without errors about multiple packages being renamed or deprecated. So now on to [visJS2Jupyter](https://github.com/ucsd-ccbb/visJS2jupyter), a library that uses D3; maybe it is finally something useful. But it is not: even trying to import it gives an error, and there is zero documentation.

```python
import visJS2jupyter.visualizations as visualizations
```

```python
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-6-dbc230bacbc2> in <module>()
      2 import matplotlib as mpl
      3 import networkx as nx
----> 4 import visJS2jupyter.visualizations

/usr/local/lib/python3.4/dist-packages/visJS2jupyter/visualizations.py in <module>()
     17 import pandas as pd
     18 from py2cytoscape import util
---> 19 import visJS_module as visJS_module
     20 
     21 def draw_graph_overlap(G1, G2,

ImportError: No module named 'visJS_module'
```

Nobody has time to deal with that. Finally, something mature: it looks like [Altair](https://github.com/altair-viz/altair) might actually work. It uses the [Vega-Lite](https://github.com/vega/vega-lite) JSON specification and D3 v4. From a very high level this looks like:

$$
\text{Altair (Dataframe)} \rightarrow \text{Vega-Lite (JSON)} \rightarrow \text{Vega} \rightarrow \text{D3}
$$

But then Altair does not do network graphs.

&nbsp;

#### Well, looks like it is time to just roll up the sleeves and do it yourself.

It is possible to [create D3-ready JSON using networkx](https://networkx.github.io/documentation/networkx-1.10/reference/readwrite.json_graph.html). So get the data in there and dump it to the right format. `networkx` will throw a fit if the data is left as is, complaining about:

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-63-92d991454db7> in <module>()
     11                                image='profile_image_url',
     12                                source='source',
---> 13                                target='target'
     14                            )
     15                       )

/usr/local/lib/python3.4/dist-packages/networkx/readwrite/json_graph/node_link.py in node_link_graph(data, directed, multigraph, attrs)
    168             edgedata = dict((make_str(k), v) for k, v in d.items()
    169                             if k != source and k != target and k != key)
--> 170             graph.add_edge(mapping[src], mapping[tgt], ky, **edgedata)
    171     return graph

TypeError: list indices must be integers, not str
```

Well, next time I'll know. To fix this we can either do an `id` lookup for each link or just hash the usernames for a unique id. But it seems that `networkx` wants the link endpoints to be `list` indices, that is, integers numbered sequentially from zero. So add a new `index` attribute to each node, then look up that index for each link.
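
The error can be reproduced without `networkx`. The sketch below assumes, judging from the `mapping[src]` line in the traceback, that the nodes are kept in a plain list and each link endpoint is used as a position in that list. The two-node data set and the names `mapping`, `position`, and `resolved` are invented for illustration.

```python
# Hypothetical miniature of the data: two nodes and one link between them.
nodes = [{'screen_name': 'ryacca'}, {'screen_name': 'shebin'}]
link = {'source': 'ryacca', 'target': 'shebin'}

# Assumption: the nodes sit in a plain list, in file order, and each
# link endpoint is looked up by its position in that list.
mapping = [node['screen_name'] for node in nodes]

try:
    mapping[link['source']]      # a string used as a list index
    error = None
except TypeError as exc:
    error = exc                  # the same kind of error as in the traceback

# Translating each name to its position first makes the lookup valid.
position = {name: i for i, name in enumerate(mapping)}
resolved = (mapping[position[link['source']]],
            mapping[position[link['target']]])
print(type(error).__name__, resolved)
```

Indexing the list with a screen name fails, while indexing it with the position of that name succeeds, which is why the links need integer endpoints.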

&nbsp;

In [69]:
# Give every node a sequential index starting at zero, and remember
# the first index seen for each screen name
index_by_name = {}
for index, node in enumerate(data['nodes']):
    node['index'] = index
    index_by_name.setdefault(node['screen_name'], index)

# Replace the screen names in each link with the matching node indices;
# names without a matching node are left unchanged
for link in data['links']:
    link['source'] = index_by_name.get(link['source'], link['source'])
    link['target'] = index_by_name.get(link['target'], link['target'])

data

{'links': [{'source': 0, 'target': 1},
  {'source': 1, 'target': 0},
  {'source': 0, 'target': 634},
  {'source': 1, 'target': 634},
  {'source': 2, 'target': 15},
  {'source': 2, 'target': 98},
  {'source': 2, 'target': 111},
  {'source': 2, 'target': 310},
  {'source': 2, 'target': 329},
  {'source': 2, 'target': 357},
  {'source': 2, 'target': 415},
  {'source': 2, 'target': 435},
  {'source': 2, 'target': 459},
  {'source': 2, 'target': 479},
  {'source': 479, 'target': 2},
  {'source': 2, 'target': 488},
  {'source': 2, 'target': 517},
  {'source': 2, 'target': 585},
  {'source': 2, 'target': 634},
  {'source': 3, 'target': 98},
  {'source': 3, 'target': 142},
  {'source': 3, 'target': 285},
  {'source': 3, 'target': 567},
  {'source': 3, 'target': 578},
  {'source': 3, 'target': 634},
  {'source': 4, 'target': 57},
  {'source': 57, 'target': 4},
  {'source': 4, 'target': 98},
  {'source': 4, 'target': 131},
  {'source': 4, 'target': 183},
  {'source': 4, 'target': 349},
  {'sourc

In [70]:
import networkx as nx
from networkx.readwrite import json_graph

G = json_graph.node_link_graph(data,
                       directed=True,
                       attrs=dict(
                               id='id',
                               key='id',
                               name='name',
                               screen_name='screen_name',
                               image='profile_image_url',
                               source='source',
                               target='target'
                           )
                      )
# this d3 example uses the name attribute for the mouse-hover value,
# so add a name to each node
#for n in G:
#    G.node[n]['name'] = n
# convert the graph back to node-link format for serialization
d = json_graph.node_link_data(G)
# write the JSON to the file the D3 code loads, closing it when done
with open('force/force.json', 'w') as outfile:
    json.dump(d, outfile)

&nbsp;

Use the `%%javascript` magic to load the D3 requirements, then append a `div` to the cell's output `element` so the chart has somewhere to be embedded.


&nbsp;

In [91]:
%%javascript

require.config({
    paths: {
        d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min'
    }
});

element.append("<center><div id='chart'></div></center><style> circle.node { stroke: #fff; stroke-width: 1.5px; } line.link { stroke: #999; stroke-opacity: .6; } </style>");

<IPython.core.display.Javascript object>

&nbsp;

Python string formatting can be used to insert any variables into the JavaScript before it runs.
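
One way to do that substitution is sketched below, using `string.Template` together with `json.dumps` from the standard library. The `settings` values and the shortened script are placeholders for illustration, not the chart code used in this notebook.

```python
import json
from string import Template

# Hypothetical values to hand over to the JavaScript side.
settings = {'width': 800, 'height': 800, 'data_url': 'force/force.json'}

# Template uses $name placeholders, so the braces in JavaScript need no escaping.
js_template = Template("""
var w = $width,
    h = $height;
d3.json($data_url, function(json) {
  console.log(json.nodes.length);
});
""")

# json.dumps turns each Python value into a valid JavaScript literal:
# numbers stay bare, strings are quoted and escaped.
js_source = js_template.substitute(
    {key: json.dumps(value) for key, value in settings.items()}
)
print(js_source)
```

In a notebook, the resulting `js_source` string would then be passed to `Javascript(js_source)`. `Template` is chosen here because `str.format` would require doubling every brace in the script.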

&nbsp;

In [90]:
from IPython.display import Javascript

# Modified from http://mbostock.github.com/d3/ex/force.html
Javascript("""
//Constants for the SVG
var w = 800,
    h = 800,
    //Set up the colour scale
    fill = d3.scale.category20();

//Append a SVG to the body of the html page. Assign this SVG as an object to svg
var vis = d3.select("#chart")
  .append("svg:svg")
    .attr("width", w)
    .attr("height", h);

//Set up the force layout
//Creates the graph data structure out of the json data
d3.json("force/force.json", function(json) {
  var force = d3.layout.force()
      .charge(-120)
      .linkDistance(30)
      .nodes(json.nodes)
      .links(json.links)
      .size([w, h])
      .start();

  //Create all the line svgs but without locations yet
  var link = vis.selectAll("line.link")
      .data(json.links)
    .enter().append("svg:line")
      .attr("class", "link")
      .style("stroke-width", function(d) { return Math.sqrt(d.value); })
      .attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });

  //Do the same with the circles for the nodes
  var node = vis.selectAll("circle.node")
      .data(json.nodes)
    .enter().append("svg:circle")
      .attr("class", "node")
      .attr("cx", function(d) { return d.x; })
      .attr("cy", function(d) { return d.y; })
      .attr("r", 5)
      .style("fill", function(d) { return fill(d.group); })
      .call(force.drag);

  node.append("image")
      .attr("xlink:href", "https://github.com/favicon.ico")
      .attr("x", -8)
      .attr("y", -8)
      .attr("width", 16)
      .attr("height", 16);

  node.append("svg:title")
      .text(function(d) { return d.name; });

  vis.style("opacity", 1e-6)
    .transition()
      .duration(1000)
      .style("opacity", 1);

  //Now we are giving the SVGs co-ordinates - the force layout is generating 
  //the co-ordinates which this code is using to update the attributes of the SVG elements
  force.on("tick", function() {
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return d.target.x; })
        .attr("y2", function(d) { return d.target.y; });

    node.attr("cx", function(d) { return d.x; })
        .attr("cy", function(d) { return d.y; });
  });
});
""")

<IPython.core.display.Javascript object>