# Using Interactive [d3](http://d3js.org/) Graphics in the Notebook

In conducting research, dynamic visualization can aid comprehension.  The use case I have in mind is simulation, where I want to understand how the agents interact, but the approach could certainly extend to a much broader range of applications.  Since the Notebook is just a web page, we can use JavaScript to make this happen.  I don't know much about d3 at this point, so this Notebook works through an [example](http://www.machinalis.com/blog/embedding-interactive-charts-on-an-ipython-nb/) and then serves as a scratch pad for other visualizations.

In [33]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from IPython.display import display, display_pretty, HTML, Javascript
import jinja2

## Population by State

Looks like we are going to start simple, with population by state.  The data we are using will come from the [Incorporated Places and Minor Civil Divisions Datasets](http://www.census.gov/popest/data/cities/totals/2012/SUB-EST2012.html) provided by the Census.  They capture subcounty resident populations between 2010 and 2012.  In general, we will gloss over the pandas stuff because ... well, it's a tutorial for me and I know how to use pandas.  Let's read in the data.

In [6]:
#Read in data
sub_est_2012_df = pd.read_csv(
    'http://www.census.gov/popest/data/cities/totals/2012/files/SUB-EST2012.csv',
    encoding='latin-1',
    dtype={'STATE': 'str', 'COUNTY': 'str', 'PLACE': 'str'}
)

#Subset to states
st_est12=sub_est_2012_df[sub_est_2012_df['SUMLEV']==40]

st_est12

Unnamed: 0,SUMLEV,STATE,COUNTY,PLACE,COUSUB,CONCIT,NAME,STNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012
0,40,1,0,0,0,0,Alabama,Alabama,4779736,4779745,4784762,4803689,4822023
1104,40,2,0,0,0,0,Alaska,Alaska,710231,710231,714046,723860,731449
1451,40,4,0,0,0,0,Arizona,Arizona,6392017,6392015,6410810,6467315,6553255
1672,40,5,0,0,0,0,Arkansas,Arkansas,2915918,2915919,2922750,2938582,2949131
2847,40,6,0,0,0,0,California,California,37253956,37253956,37334410,37683933,38041430
3924,40,8,0,0,0,0,Colorado,Colorado,5029196,5029196,5048472,5116302,5187582
4615,40,9,0,0,0,0,Connecticut,Connecticut,3574097,3574097,3576616,3586717,3590347
4900,40,10,0,0,0,0,Delaware,Delaware,897934,897934,899824,908137,917092
5024,40,11,0,0,0,0,District of Columbia,District of Columbia,601723,601723,604989,619020,632323
5028,40,12,0,0,0,0,Florida,Florida,18801310,18802690,18845967,19082262,19317568


It would be nice to have state abbreviations as well, so let's merge them in.  (The cell below is commented out, so the rest of this Notebook works with the unmerged columns.)

In [14]:
# state = pd.read_csv('http://www.census.gov/geo/reference/docs/state.txt', sep='|', dtype={'STATE': 'str'})
# state.drop(
#     ['STATENS'],
#     inplace=True, axis=1
# )

# st_est12 = pd.merge(st_est12, state, on='STATE')
# st_est12.drop(
#     ['SUMLEV', 'COUSUB', 'CONCIT', 'ESTIMATESBASE2010', 'POPESTIMATE2010', 'POPESTIMATE2011'],
#     inplace=True, axis=1
# )

# st_est12

Now, let's get to the d3 business.  We will start with a column chart of the five most populous states in the US.  First, let's load the external JavaScript dependencies.

In [8]:
%%javascript
require.config({
    paths: {
        d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min'
    }
});

<IPython.core.display.Javascript object>

It appears the first thing we need to do is generate a `<div>` target to house the plots.  In doing so, we can set style parameters that control the look of our chart.

In [13]:
HTML("""
<style>
.bar {
 fill: steelblue;
}
.bar:hover {
 fill: brown;
}
.axis {
 font: 10px sans-serif;
}
.axis path,
.axis line {
 fill: none;
 stroke: #000;
}
.x.axis path {
 display: none;
}
</style>
<div id="chart_d3"></div>
""")

Now we have to do a few things.  First, we must construct a template with the JS code that will render the chart.

In [38]:
st_est12_template = jinja2.Template(
"""
// Based on http://bl.ocks.org/mbostock/3885304

require(["d3"], function(d3) {
    var data = [];

    {% for row in data %}
    data.push({ 'state': '{{ row[7] }}', 'population': {{ row[13] }} });
    {% endfor %}

    d3.select("#chart_d3 svg").remove();

    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 800 - margin.left - margin.right,
        height = 400 - margin.top - margin.bottom;

    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .25);

    var y = d3.scale.linear()
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10)
        .tickFormat(d3.format('.1s'));
        
    var svg = d3.select("#chart_d3").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
        .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    x.domain(data.map(function(d) { return d.state; }));
    y.domain([0, d3.max(data, function(d) { return d.population; })]);

    svg.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + height + ")")
        .call(xAxis);

    svg.append("g")
        .attr("class", "y axis")
        .call(yAxis)
        .append("text")
        .attr("transform", "rotate(-90)")
        .attr("y", 6)
        .attr("dy", ".71em")
        .style("text-anchor", "end")
        .text("Population");

    svg.selectAll(".bar")
        .data(data)
        .enter().append("rect")
        .attr("class", "bar")
        .attr("x", function(d) { return x(d.state); })
        .attr("width", x.rangeBand())
        .attr("y", function(d) { return y(d.population); })
        .attr("height", function(d) { return height - y(d.population); });
});
"""
)
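To see what the `y` scale in the template does to the numbers, the pixel geometry can be worked out by hand.  The sketch below is plain Python, not d3: `linear_scale` is a hypothetical helper that mimics a d3 linear scale with the inverted `[height, 0]` range, and the populations are the 2012 estimates for the five most populous states.

```python
# Hypothetical helper mimicking d3.scale.linear().domain([d0, d1]).range([r0, r1]).
def linear_scale(domain, rng):
    d0, d1 = domain
    r0, r1 = rng

    def scale(value):
        # Linear interpolation from the domain onto the range.
        return r0 + (value - d0) / float(d1 - d0) * (r1 - r0)

    return scale


# Same margins as the template: 400px tall minus the top and bottom margins.
margin = {'top': 20, 'right': 20, 'bottom': 30, 'left': 40}
height = 400 - margin['top'] - margin['bottom']

# 2012 population estimates for the five most populous states.
populations = {
    'California': 38041430,
    'Texas': 26059203,
    'New York': 19570261,
    'Florida': 19317568,
    'Illinois': 12875255,
}

# SVG y grows downward, so the range is [height, 0]: zero maps to the bottom.
y = linear_scale([0, max(populations.values())], [height, 0])

# Top edge and height of each bar, as in the "y" and "height" attributes.
bars = {
    state: {'y': y(pop), 'height': height - y(pop)}
    for state, pop in populations.items()
}

for state, bar in sorted(bars.items(), key=lambda kv: kv[1]['y']):
    print('%-10s y=%6.1f height=%6.1f' % (state, bar['y'], bar['height']))
```

The largest state gets a bar that starts at the top of the plot area (`y` of 0) and spans the full height, and every other bar is scaled in proportion to it.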

To populate the `data` parameter in the template, we can use the data from our pandas DataFrame directly.  Note, however, that the template needs something it can iterate over row by row.  The `itertuples()` method provides this: it returns an iterator that yields one tuple per row, with the row index as the first element followed by the column values.  Just to see what's going on under the hood, here is the subset...

In [35]:
st_est12.sort_values('POPESTIMATE2012', ascending=False)[:5]

Unnamed: 0,SUMLEV,STATE,COUNTY,PLACE,COUSUB,CONCIT,NAME,STNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012
2847,40,6,0,0,0,0,California,California,37253956,37253956,37334410,37683933,38041430
71426,40,48,0,0,0,0,Texas,Texas,25145561,25145561,25242683,25631778,26059203
46377,40,36,0,0,0,0,New York,New York,19378102,19378104,19399242,19501616,19570261
5028,40,12,0,0,0,0,Florida,Florida,18801310,18802690,18845967,19082262,19317568
7936,40,17,0,0,0,0,Illinois,Illinois,12830632,12830632,12840459,12859752,12875255


...and here is the tuple version.

In [36]:
print(list(st_est12.sort_values('POPESTIMATE2012', ascending=False)[:5].itertuples()))

[(2847, 40, u'06', u'000', u'00000', 0, 0, u'California', u'California', u'37253956', 37253956, 37334410, 37683933, 38041430), (71426, 40, u'48', u'000', u'00000', 0, 0, u'Texas', u'Texas', u'25145561', 25145561, 25242683, 25631778, 26059203), (46377, 40, u'36', u'000', u'00000', 0, 0, u'New York', u'New York', u'19378102', 19378104, 19399242, 19501616, 19570261), (5028, 40, u'12', u'000', u'00000', 0, 0, u'Florida', u'Florida', u'18801310', 18802690, 18845967, 19082262, 19317568), (7936, 40, u'17', u'000', u'00000', 0, 0, u'Illinois', u'Illinois', u'12830632', 12830632, 12840459, 12859752, 12875255)]
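Because `itertuples()` puts the index in position 0, every column sits one position later in the tuple than it does in the column list.  Here is a small sketch of that bookkeeping, using the column names and the California row from the output above; `tuple_position` is a hypothetical helper, not part of pandas.

```python
# Column order of the DataFrame, copied from the table output.
columns = [
    'SUMLEV', 'STATE', 'COUNTY', 'PLACE', 'COUSUB', 'CONCIT', 'NAME',
    'STNAME', 'CENSUS2010POP', 'ESTIMATESBASE2010', 'POPESTIMATE2010',
    'POPESTIMATE2011', 'POPESTIMATE2012',
]


def tuple_position(column):
    """Position of a column in a row tuple whose first element is the index."""
    return columns.index(column) + 1


# First row of the tuple output (California).
row = (2847, 40, '06', '000', '00000', 0, 0, 'California', 'California',
       '37253956', 37253956, 37334410, 37683933, 38041430)

name = row[tuple_position('NAME')]
pop_2012 = row[tuple_position('POPESTIMATE2012')]
print(name, pop_2012)
```

With these positions, `row[7]` is `NAME` and `row[13]` is `POPESTIMATE2012`, while `row[12]` is the 2011 estimate.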


With this in mind, one can populate the template directly and render the chart.

In [37]:
display(Javascript(st_est12_template.render(
    data=st_est12.sort_values('POPESTIMATE2012', ascending=False)[:5].itertuples()
)))

<IPython.core.display.Javascript object>

In case you are looking for the chart, it is rendered back up in the original `<div>` target rather than below this cell.  It doesn't come out quite as expected, but all I really cared about was the interplay between Python and JS, and the IPython display tools appear to bridge that gap.
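
One possible variation, offered as a sketch rather than as what this Notebook does: instead of emitting one `data.push(...)` line per row, the records could be serialized once with the standard-library `json` module and dropped into the script as a literal.  The helper `records_to_js`, its parameters, and the two sample rows are illustrative assumptions; the positions 7 and 13 assume the tuple layout printed above.

```python
import json


def records_to_js(rows, name_pos=7, pop_pos=13, var_name='data'):
    """Build a JavaScript assignment from row tuples shaped like itertuples() output.

    name_pos and pop_pos are tuple positions (index at position 0), assumed to
    point at NAME and POPESTIMATE2012.
    """
    records = [
        {'state': row[name_pos], 'population': int(row[pop_pos])}
        for row in rows
    ]
    # json.dumps escapes quotes in names and leaves numbers unquoted.
    return 'var %s = %s;' % (var_name, json.dumps(records, sort_keys=True))


# Two sample rows copied from the tuple output.
rows = [
    (2847, 40, '06', '000', '00000', 0, 0, 'California', 'California',
     '37253956', 37253956, 37334410, 37683933, 38041430),
    (71426, 40, '48', '000', '00000', 0, 0, 'Texas', 'Texas',
     '25145561', 25145561, 25242683, 25631778, 26059203),
]

js_snippet = records_to_js(rows)
print(js_snippet)
```

The resulting string could then stand in for the loop inside the `require` callback, which keeps the quoting of names out of the template.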