This notebook visualizes the interconnectedness of the 50 largest United States companies, using data from the 2015 edition of Fortune magazine's annual [Fortune 500](http://fortune.com/fortune500/) list and measurements of corporate interdependence as reported by IBM Watson's Concept Insights service. The graph was put together using `watsongraph` for querying, `pandas` for data manipulation, and `d3.js` for visualization.

This visualization is a simple proof of concept for using the IBM Watson Concept Insights service, via the `watsongraph` library, to model relationships between things. Here the things are companies, but they could just as well be blocks of cheese or planetary moons: with a little work extracting and cleaning your data, you can probably visualize it.

**Caveat**: This visualization is currently limited to the top 50 companies because of certain [scaling issues](https://github.com/ResidentMario/watsongraph/issues/8#issuecomment-173703228) in the `watsongraph` library. Once those are resolved, the sky (or the ceiling on visual clarity, whichever comes first) is the limit!

## Data Munging

In [85]:
import pandas as pd

# Use pandas DataFrame manipulations to import and slice up the data the way we want it.
frame = pd.read_csv("fortune500.csv")
frame = frame.loc[:, ['company', 'industry']]
# industries = list(set(frame['industry']))
companies = list(frame['company'])

In [40]:
from watsongraph.node import conceptualize

# Map Fortune 500's company names to their Wikipedia article titles using watsongraph.node.conceptualize.
nodes = [conceptualize(company) for company in companies]

In [86]:
# Attach the nodes to our frame.
frame['node'] = nodes
frame = frame[['company', 'node', 'industry']]

In [87]:
# Unfortunately, a few of the data points are lost during conceptualization. This is principally because the
# underlying Concept Insights graph uses an "image" of Wikipedia from 2011, which is missing some of the data from 2015.
# It's easiest to simply filter these out.
frame = frame[frame['node'].notnull()]
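
As a small illustration of the filtering step, the sketch below uses a made-up three-row frame (the company names and node values are placeholders, not real output from `conceptualize`). It shows that dropping rows with a missing `node` keeps the original index labels of the surviving rows, which matters later because the row index is what ties a company back to its position in the list.

```python
import pandas as pd

# Hypothetical stand-in for the real frame; None marks a failed lookup.
toy = pd.DataFrame({
    "company": ["Alpha Corp", "Beta Inc", "Gamma LLC"],
    "node": ["Alpha Corp", None, "Gamma (company)"],
})

# Keep only the rows whose node was resolved.
kept = toy[toy["node"].notnull()]

# The surviving rows keep their original index labels (0 and 2).
print(list(kept.index))
```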

In [103]:
from watsongraph.conceptmodel import ConceptModel

# Import these companies into a ConceptModel object.
# For now we will work with the top 50. cf. https://github.com/ResidentMario/watsongraph/issues/8
model = ConceptModel(list(frame['node'])[:50])
model.explode_edges(prune=True)

In [109]:
# On manual inspection Watson mistook Anthem the health insurance company for Anthem the band.
# Again it's easiest to filter this out. All of the other output looks good.
model.remove("Anthem (band)")

In [198]:
# Augment the model nodes with industry and rank, using the DataFrame elements.

def get_industry(company):
    return frame.loc[frame['node'] == company, 'industry'].iloc[0]

def get_rank(company):
    # Rank is the row's original (zero-based) index label plus one;
    # this relies on the CSV being ordered by rank.
    return int(frame.loc[frame['node'] == company].index[0]) + 1

model.map_property("industry", get_industry)
model.map_property("rank", get_rank)
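
To see how the two lookups behave without querying Watson, here is a minimal sketch on a hypothetical two-row frame whose index has a gap, as it would after unresolved companies are filtered out. The names `toy`, `toy_industry`, and `toy_rank` are illustrative only and are not part of `watsongraph`.

```python
import pandas as pd

# Hypothetical frame after filtering: the row with index 1 was dropped.
toy = pd.DataFrame(
    {
        "node": ["Alpha Corp", "Gamma (company)"],
        "industry": ["Retail", "Energy"],
    },
    index=[0, 2],
)

def toy_industry(node):
    # First industry value among the rows matching this node.
    return toy.loc[toy["node"] == node, "industry"].iloc[0]

def toy_rank(node):
    # Index label plus one, assuming rows were originally ordered by rank.
    return int(toy.loc[toy["node"] == node].index[0]) + 1

print(toy_industry("Gamma (company)"), toy_rank("Gamma (company)"))
```

The second company reports rank 3 rather than 2, because the rank comes from the original index label and not from the row's position in the filtered frame.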

In [320]:
import json

# Save the data to disk, to avoid rerunning all of the expensive queries above when reloading the notebook.
with open('model.json', 'w') as f:
    json.dump(model.to_json(), f, indent=4)
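
A quick sanity check on the saved file can catch problems before the visualization runs. The sketch below assumes a node-link layout with top-level `nodes` and `links` lists, where each link carries a `weight`. That layout is inferred from how the D3 code in the next section reads the file, not from the `watsongraph` documentation. The helper `summarize_graph` and the sample data are hypothetical.

```python
import json
import os
import tempfile

def summarize_graph(path):
    """Load a node-link JSON file and return (node_count, link_count)."""
    with open(path) as f:
        graph = json.load(f)
    # Assumed layout: top-level "nodes" and "links" lists.
    return len(graph["nodes"]), len(graph["links"])

# Hypothetical sample file so the sketch runs without watsongraph.
sample = {
    "nodes": [
        {"id": "Alpha Corp", "industry": "Retail", "rank": 1},
        {"id": "Gamma (company)", "industry": "Energy", "rank": 3},
    ],
    "links": [{"source": 0, "target": 1, "weight": 0.75}],
}
sample_path = os.path.join(tempfile.mkdtemp(), "sample_model.json")
with open(sample_path, "w") as f:
    json.dump(sample, f, indent=4)

print(summarize_graph(sample_path))
```

Pointing `summarize_graph` at `model.json` would report the node and link counts of the real graph, provided the assumed layout holds.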

## Visualization

In [318]:
%%javascript

require.config({
    paths: {
        d3: '//d3js.org/d3.v3.min.js',
//         d3tip: '//labratrevenge.com/d3-tip/javascripts/d3.tip.v0.6.3.js'
    }
});


In [285]:
%%html
<style>

.node {
  stroke: #fff;
  stroke-width: 1.5px;
}

.link {
  stroke: #999;
  stroke-opacity: 1;
}

</style>

In [317]:
%%javascript

// TODO: d3-tip; cf. http://bl.ocks.org/Caged/6476579
require(['d3'], function(d3){
    /*
        Jupyter notebooks save their state between runtimes
        so the chart needs to be explicitly destroyed and
        recreated every time the code is run.
    */
    $("#chart").remove();
    element.append("<div id='chart' style='text-align:center;'></div>");


    var width = 960,
        height = 500;

    var color = d3.scale.category20();

    var force = d3.layout.force()
        .charge(-120)
        .linkDistance(30)
        .size([width, height]);

    var svg = d3.select("#chart").append("svg")
        .attr("width", width)
        .attr("height", height);

    d3.json("model.json", function(error, graph) {
        if (error) throw error;

        force
            .nodes(graph.nodes)
            .links(graph.links)
            .start();

        var scale = d3.scale.linear().domain([.5, 1]).range([1, 3]);

        var color_ramp = d3.scale.linear().domain([.5,1]).range(["#ccc","#333"]);

        var link = svg.selectAll(".link")
            .data(graph.links)
            .enter().append("line")
            .attr("class", "link")
            .style("stroke-width", function(d) { return scale(d.weight); })
            .style("stroke", function(d) { return color_ramp(d.weight); });

        var node = svg.selectAll(".node")
            .data(graph.nodes)
            .enter().append("circle")
            .attr("class", "node")
            .attr("r", 5)
            .style("fill", function(d) { return color(d.industry); })
            .on("mouseover", function() {
                d3.select(this)
                .style("stroke", "#000")
            })
            .on("mouseout", function() {
                d3.select(this)
                .style("stroke", "#fff");
            })
            .call(force.drag);

        node.append("title")
            .text(function(d) { return d.id; });

        force.on("tick", function() {
            link.attr("x1", function(d) { return d.source.x; })
                .attr("y1", function(d) { return d.source.y; })
                .attr("x2", function(d) { return d.target.x; })
                .attr("y2", function(d) { return d.target.y; });

            node.attr("cx", function(d) { return d.x; })
                .attr("cy", function(d) { return d.y; });
        });
    });
});
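
The link styling above relies on a linear scale that maps edge weights in the domain [0.5, 1] onto stroke widths in the range [1, 3]. As a rough sketch of that arithmetic, here is the same mapping in Python. The function `linear_scale` is a hypothetical re-implementation for illustration, not part of D3 or `watsongraph`.

```python
def linear_scale(domain, output_range):
    """Return a function mapping domain linearly onto output_range."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(x):
        # Fraction of the way through the domain, applied as the same
        # fraction of the way through the output range.
        t = (x - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale

# Same domain and range as the stroke-width scale in the D3 code.
width = linear_scale((0.5, 1.0), (1.0, 3.0))
print(width(0.5), width(0.75), width(1.0))
```

Like D3's linear scale with default settings, this version does not clamp its input, so a weight below 0.5 would extrapolate to a width below 1.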
