This repository has been archived by the owner on Nov 20, 2021. It is now read-only.

Datagramas

A Python library for your Jupyter Notebook that helps you use and scaffold visualizations with d3.js. It works with Python 3.4 or newer.

NOTE We are currently updating this library to the 1.0.0 version. Please help us test if installation works ok :)

datagramas screenshot

Overview

Datagramas is a visualization development support tool and a visualization library at the same time. Initially I implemented it to help me develop visualizations in the context of my doctoral thesis. I was researching the mixture of algorithms and visualizations, and therefore I was always iterating over algorithm and visualization design. Hence, currently datagramas supports some visualizations that I needed to implement in my doctoral thesis, plus other examples I found to be interesting to explore.

The main objective of datagramas is to provide an environment to bootstrap visualization implementations, and use scaffolding through templates to be able to reuse the visualizations and to explore data in the Jupyter Notebook.

An important aspect of datagramas is that it works with standard scientific Python data-structures: pandas DataFrames and NetworkX graphs. By using datagramas to develop your visualization, you do not need to worry about data structures and formats. For instance, have you ever found an example visualization that seemed to be what you wanted, but the data structure used was arbitrarily chosen by the developer? And that structure was completely different from what you were using/expecting? By using datagramas there are no arbitrary choices - you use DataFrames and specify which columns will be mapped to the visualization, and that's it.

Examples / Documentation

In addition to this readme, the following notebooks serve as examples/documentation:

Initialization / Installation

First, install the python package (please note that the requirements are needed to run datagramas, but not the examples):

$ pip install -r requirements.txt
$ python setup.py install

Then make a symbolic link from your Jupyter custom folder to the datagramas libs folder:

$ cd ~/.jupyter/custom
$ ln -s ~/path_to_datagramas/datagramas/libs/ datagramas

And finally, edit the custom.js file and add the following lines (if there is no such file, create it):

require.config({
    paths: {
          "sankey": "/custom/datagramas/d3-sankey/sankey",
          "cartogram": "/custom/datagramas/d3-cartogram/cartogram",
          "d3": "/custom/datagramas/d3/d3.min",
          "leaflet": "/custom/datagramas/leaflet/leaflet",
          "topojson": "/custom/datagramas/topojson/topojson.min",
          "parsets": "/custom/datagramas/d3-parsets-1.2.4/d3.parsets",
          "datagramas": "/custom/datagramas/datagramas",
          "force_edge_bundling": "/custom/datagramas/d3-force-bundling/d3.ForceEdgeBundling",
          "legend": "/custom/datagramas/d3-legend/d3-legend.min",
          "cloud": "/custom/datagramas/d3-cloud/d3.layout.cloud",
          "cola": "/custom/datagramas/cola/cola.min",
          "d3-geo-projection": "/custom/datagramas/d3-geo-projection/d3.geo.projection.min",
          "d3-tip": "/custom/datagramas/d3-tip/index"
        },
    shim: {
      "sankey": {
        "exports": "d3.sankey", 
        "deps": ["d3"]
      }, 
      "cartogram": {
        "exports": "d3.cartogram", 
        "deps": ["d3"]
      }, 
      "cola": {
        "exports": "cola", 
        "deps": ["d3"]
      }, 
      "parsets": {
        "exports": "d3.parsets", 
        "deps": ["d3"]
      }, 
      "legend": {
        "exports": "d3.legend", 
        "deps": ["d3"]
      },
      "d3-geo-projection": {
        "exports": "d3.geo.projection",
        "deps": ["d3"]
      },
      "d3-tip": {
        "exports": "d3.tip",
        "deps": ["d3"]
      }
    }
});

require(['datagramas'], function(datagramas) {
    datagramas.add_css('/custom/datagramas/datagramas.css');
});

This code can be generated by the Python function datagramas.init_javascript_code(path='/custom/datagramas'). It makes Jupyter load the JavaScript code that datagramas needs every time you open a notebook file. If you use an older version of Jupyter Notebook, note that you will need to add the "/static" prefix to those URLs.
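To illustrate how the "/static" prefix changes the URLs, here is a minimal sketch that renders a require.config call from a dictionary of paths. The helper build_require_config is hypothetical and is not part of datagramas; the actual output of init_javascript_code may differ.

```python
import json


def build_require_config(prefix, paths, shim):
    """Render a require.config(...) call for custom.js (hypothetical helper)."""
    config = {
        # Every library path is resolved relative to the same URL prefix.
        "paths": {name: prefix + "/" + rel for name, rel in paths.items()},
        "shim": shim,
    }
    return "require.config(" + json.dumps(config, indent=4) + ");"


# A small subset of the libraries listed above.
paths = {"d3": "d3/d3.min", "sankey": "d3-sankey/sankey"}
shim = {"sankey": {"exports": "d3.sankey", "deps": ["d3"]}}

# Recent Jupyter Notebook versions serve the custom folder under /custom.
new_js = build_require_config("/custom/datagramas", paths, shim)
# Older versions need the "/static" prefix.
old_js = build_require_config("/static/custom/datagramas", paths, shim)

print(new_js)
```

Only the prefix differs between the two results, so switching Jupyter versions should only require regenerating custom.js with a different path.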

Visualization Modules

All visualizations in datagramas are regular Python modules (see the datagramas/visualizations folder). A module is composed of a configuration file (__init__.py) and several template and style files.

The Let's Make Scaffold a Barchart example notebook contains a basic visualization that showcases some of these concepts.

Currently, datagramas includes the following visualizations (in alphabetical order):

  • cartogram of a TopoJSON topology and a pandas DataFrame.
  • cartography of a Topo/GeoJSON geometry, pandas DataFrames for marks and area colors, and NetworkX graphs over the map.
  • circlepack of a NetworkX tree.
  • flow (Sankey diagram) of a NetworkX graph.
  • force directed layout of a NetworkX graph.
  • parcoords - parallel coordinates with a pandas DataFrame.
  • parsets - parallel sets with a pandas DataFrame.
  • treemap of a NetworkX tree.
  • wordcloud of a pandas DataFrame.

The Basic Notebook Examples notebook showcases the usage of most of those visualizations.

The Let's Make a Map Too (and a Cartogram!) notebook showcases the usage of cartogram and cartography.

Template Files

The following are the template files used by datagramas when rendering a visualization:

  • template.js: the main template of each visualization module. Think of this file as the body of a draw() function in a typical visualization module.
  • template.css (optional)
  • functions.js (optional)

When datagramas renders your visualization, it embeds those files into a bigger visualization that follows the reusable chart pattern by Mike Bostock.

Module Configuration

A visualization module must contain a dictionary named VISUALIZATION_CONFIG in its __init__.py file, with at least some of the following elements:

  • Options: these are values that influence how the visualization is rendered. For instance, the cartography visualization has the following options:
'options': {
    'leaflet': False,
    'background_color': False,
    'graph_bundle_links': False
}

When rendering, if you call cartography(geometry=topojson), the geometry you specified will be rendered like any other visualization: a plain SVG with a white background. But if you call cartography(geometry=topojson, leaflet=True), the visualization will be rendered as a slippy map using Leaflet.

  • Data: this element indicates which data variables will be available to the visualization. For instance, the cartogram visualization has the following setup:
'data': {
    'geometry': None,
    'area_dataframe': None,
}

This means that a cartogram can be called with a TopoJSON geometry (which you should load from a .js file) and a pandas DataFrame. In your visualization code, these variables will be available as _data_geometry and _data_area_dataframe.

  • Variables: the elements of this dictionary are directly translated into variables available in the template file. For instance, in the barchart example available above, these are the variables:
'variables': {
    'width': 960,
    'height': 500,
    'padding': {'left': 30, 'top': 20, 'right': 30, 'bottom': 30},
    'x': 'x',
    'y': 'y',
    'y_axis_ticks': 10,
    'y_label': None,
    'rotate_label': True,
}

All these variables are available in the template file, prefixed with an underscore (e.g., _width). Moreover, you can override them when rendering by using keyword arguments: barchart(dataframe=df, x='letter', y='frequency').
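The mapping from Python defaults and keyword arguments to underscore-prefixed template variables can be sketched as follows. This is an illustration only: render_variables is a hypothetical helper, and the way datagramas actually serializes values into its templates is an assumption here.

```python
import json

# A subset of the barchart defaults shown above.
DEFAULT_VARIABLES = {
    'width': 960,
    'height': 500,
    'x': 'x',
    'y': 'y',
    'y_label': None,
    'rotate_label': True,
}


def render_variables(defaults, **overrides):
    """Merge keyword overrides into the defaults and emit JS declarations."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError('unknown variables: ' + ', '.join(sorted(unknown)))
    merged = dict(defaults, **overrides)
    # json.dumps turns None into null and True into true, which are valid JS.
    return ['var _{} = {};'.format(name, json.dumps(value))
            for name, value in merged.items()]


lines = render_variables(DEFAULT_VARIABLES, x='letter', y='frequency')
print('\n'.join(lines))
```

Rejecting unknown keyword arguments is a design choice of this sketch; it catches typos such as widht=800 early instead of silently ignoring them.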

  • Auxiliary Variables: these are JavaScript variables that are available to the template code, but are not reachable from Python or the public JS interface. You can use them to maintain state in the visualization or to cache results. This is an example from the cartography visualization:
  'auxiliary': {
      # a set to save mark positions. since there are two possible sources of positions, we need to do this.
      'mark_positions',
      # the list of available features from the geometry source.
      'available_feature_ids',
      # the list of colors per area
      'area_colors'
  }  

Those variables are available as auxiliary.var_name (e.g., auxiliary.mark_positions).

  • Read-only Properties: these are JS variables that are available in Javascript through getters. For instance, in the cartography visualization you can have a Leaflet instance, among other variables:
  'read_only': {
      # leaflet
      'L',
      'map',
      # the map projection. this could be used to add other things on top of the visualization.
      'projection',
      # here we save the geometry specified - it can be either GeoJSON or TopoJSON.
      'geometry'
  }

If your reusable chart is called chart, then, from Javascript, you can access those variables (e.g., chart.L()).

  • Mapped Attributes: these are mappings between data attributes (e.g., a column in your dataframe) and visualization attributes (e.g., the radius of a circle). For instance, in the force visualization these are the mapped attributes:
  'attributes': {
      'node_ratio': {'min': 8, 'max': 16, 'value': None, 'scale': 'linear'},
      'link_opacity': {'min': 0.5, 'max': 1.0, 'value': None, 'scale': 'linear'},
      'link_width': {'min': 0.5, 'max': 1.0, 'value': None, 'scale': 'linear'},
  }

This means that, in JS, a variable named _var_name (e.g., _node_ratio) will be available. This variable is a function that, when called with a datum, returns the corresponding value according to the range and scale defined in the parameters. The scale can be linear, sqrt, or a number, which is used with d3.scale.pow().

Following the force example, in Python you can specify a node_ratio when calling the visualization in three ways (note that g is a NetworkX graph):

datagramas.force(graph=g, node_ratio=15): all nodes will have ratio 15.

datagramas.force(graph=g, node_ratio='size'): node ratio will be proportional to the size node attribute, using the default minimum and maximum values, and the default scale.

datagramas.force(graph=g, node_ratio={'value': 'size', 'scale': 'sqrt', 'max': 32}): node ratio will be proportional to the size node attribute, with sqrt scale, with a maximum value of 32.
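The three call styles above can be understood as different ways of overriding the default attribute dictionary. The following sketch is not the datagramas implementation: resolve_attribute and make_scale are hypothetical names, and treating a numeric scale as the exponent of a power scale is an assumption based on the mention of d3.scale.pow().

```python
import math

DEFAULT_NODE_RATIO = {'min': 8, 'max': 16, 'value': None, 'scale': 'linear'}


def resolve_attribute(default, spec):
    """Merge a user-supplied spec into the default attribute configuration."""
    resolved = dict(default)
    if isinstance(spec, dict):
        resolved.update(spec)
    elif spec is not None:
        # A number is a constant; a string names a data attribute.
        resolved['value'] = spec
    return resolved


def make_scale(config, domain_min, domain_max):
    """Return a function mapping a data value into [min, max].

    Assumes domain_min != domain_max.
    """
    scale = config['scale']
    if scale == 'linear':
        transform = lambda v: v
    elif scale == 'sqrt':
        transform = math.sqrt
    else:
        exponent = float(scale)  # assumption: a number is a power exponent
        transform = lambda v: v ** exponent
    lo, hi = transform(domain_min), transform(domain_max)

    def scaled(value):
        t = (transform(value) - lo) / (hi - lo)
        return config['min'] + t * (config['max'] - config['min'])

    return scaled


constant = resolve_attribute(DEFAULT_NODE_RATIO, 15)
by_name = resolve_attribute(DEFAULT_NODE_RATIO, 'size')
custom = resolve_attribute(
    DEFAULT_NODE_RATIO, {'value': 'size', 'scale': 'sqrt', 'max': 32})

# With node sizes between 0 and 100, a size of 25 lands halfway on a sqrt scale.
scale = make_scale(custom, 0, 100)
print(scale(25))
```

Keys that are not overridden, such as min in the last call, keep their default values.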

  • Colorables: these are mappings between data attributes and colors. For instance, the force visualization defines the following colorables:
  'colorables': {
      'node_color': {'value': 'steelblue', 'palette': None, 'scale': None, 'legend': False, 'n_colors': None},
      'link_color': {'value': 'grey', 'palette': None, 'scale': None, 'legend': False, 'n_colors': None}
  }

In a similar way to mapped attributes, you can specify a color directly, or by overriding the dictionary for each colorable:

datagramas.force(graph=g, link_color='purple'): all links will be colored purple.

datagramas.force(graph=g, link_color={'value': 'source.bipartite', 'palette': 'Set2', 'scale': 'ordinal'}): all links will be colored according to the source.bipartite attribute of each link (this translates to the bipartite attribute of the source node of each link - yes, you can use dot notation).

Note that datagramas cannot tell a color string apart from a column/attribute name, so a plain string is treated as a color. To color by a data attribute, you need to pass the arguments dictionary.

If the palette is a string, it must be recognized by the function seaborn.color_palette.
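As an illustration of the dot notation and the ordinal scale, here is a small sketch that resolves a dotted attribute on link dictionaries and assigns one palette color per distinct value. The helpers resolve_dotted and ordinal_colors are hypothetical, the link structure is an assumption, and the hex colors are example values rather than the output of seaborn.color_palette.

```python
def resolve_dotted(datum, path):
    """Follow a dotted path such as 'source.bipartite' through nested dicts."""
    current = datum
    for key in path.split('.'):
        current = current[key]
    return current


def ordinal_colors(values, palette):
    """Assign palette colors to distinct values in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Wrap around if there are more distinct values than colors.
            mapping[value] = palette[len(mapping) % len(palette)]
    return mapping


# Assumed link structure: each link embeds its source and target nodes.
links = [
    {'source': {'id': 'a', 'bipartite': 0}, 'target': {'id': 'x', 'bipartite': 1}},
    {'source': {'id': 'y', 'bipartite': 1}, 'target': {'id': 'b', 'bipartite': 0}},
    {'source': {'id': 'b', 'bipartite': 0}, 'target': {'id': 'x', 'bipartite': 1}},
]
palette = ['#66c2a5', '#fc8d62', '#8da0cb']  # example colors

values = [resolve_dotted(link, 'source.bipartite') for link in links]
colors = ordinal_colors(values, palette)
link_colors = [colors[v] for v in values]
print(link_colors)
```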

  • Objects: these are d3js objects wrapped in a Python class.

Extra Functions

Your __init__.py file can define auxiliary functions and attributes. In particular, datagramas supports the following function:

  • PROCESS_CONFIG(config): where config is the current instance of the VISUALIZATION_CONFIG dictionary.

Among other uses, this function could be used to handle dependencies. For instance, if you specify leaflet=True in cartography, leaflet is added as a dependency. Or, if you specify a projection name (through the projection_name variable), a d3js object is added to the current visualization objects.
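A minimal sketch of such a function, handling the leaflet case, could look like the code below. The 'dependencies' key, the in-place mutation, and the return value are assumptions made for illustration; check the cartography module for the structure datagramas actually expects.

```python
import copy

VISUALIZATION_CONFIG = {
    'options': {
        'leaflet': False,
        'background_color': False,
        'graph_bundle_links': False,
    },
    # Hypothetical key: the libraries the rendered visualization requires.
    'dependencies': {'d3', 'topojson'},
}


def PROCESS_CONFIG(config):
    """Adjust the configuration after the user options have been applied."""
    if config['options'].get('leaflet'):
        # Leaflet is only loaded when the slippy map is requested.
        config['dependencies'].add('leaflet')
    return config


# Simulate a call such as cartography(geometry=topojson, leaflet=True).
config = copy.deepcopy(VISUALIZATION_CONFIG)
config['options']['leaflet'] = True
PROCESS_CONFIG(config)
print(sorted(config['dependencies']))
```

Working on a deep copy keeps the module-level defaults untouched, so a later call without leaflet=True does not inherit the extra dependency.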

Scaffolding

So far, we have explained how datagramas allows you to code and render visualizations. They are already usable in the Jupyter Notebook, but you may also want to export a visualization as a reusable chart that you can use in your projects. For that case, datagramas includes a method called scaffold.

For example, if you look at the barchart example you will find this notebook cell:

barchart(x='letter', y='frequency').scaffold(filename='./scaffolded_barchart.js')

This line creates a file named scaffolded_barchart.js, which you can import into your projects. The chart uses the reusable chart pattern mentioned in the Template Files section. The "In the Wild" section near the end lists a couple of projects with scaffolded visualizations.

Credits

Datagramas bundles the following Javascript libraries (see the datagramas/libs subfolder):

The file datagramas/libraries.py specifies library versions and other meta-data.

Datagramas also contains snippets of code from:

  • D3 Plus: we use the color text function.
  • Utility JavaScript functions from Stack Overflow users, acknowledged in datagramas/libs/datagramas.js.

Next Steps?

In no particular order:

  • Add events to all included visualizations (currently only a few of them support events).
  • Facet data with small-multiples or visualization widgets (in a similar way to seaborn's FacetGrid).
  • Improve the legend support. Currently legend positioning is not smart, and legend activation is not automatic for charts.
  • Support other bundled layouts/plugins with d3.js.
  • Support layers in the cartography module.

About the (old) name

The first official version of Datagramas was titled "Matta" in honor of Roberto Matta. Curiously, he has a painting named "ojo con los desarrolladores" (desarrolladores is Spanish for developers).

In the Wild

  • 2|S: Los Dos Santiagos: this is a project where we scaffolded many visualizations (Sankey, TopoJSON, Force Edge Bundle) to visualize transport data in Santiago, Chile. All visualizations in the page were scaffolded with datagramas! Note: the site is in Spanish.
  • Twitter Data Portraits: this visualization was implemented in datagramas for my doctoral thesis. I needed a way to visualize Twitter profiles and the output of a recommender algorithm. Since the data used in the visualization was constantly changing (because algorithms were being developed), I needed a more dynamic way to implement the visualization than always editing JS/HTML files and then reloading everything, including re-execution of algorithms.

Versioning

Datagramas uses semantic versioning. We (will) start with 1.0.0.

Testing

There is no automated testing. However, the example notebooks pretty much cover everything. Feel free to contribute in this aspect!