Skip to content

Explode data filter

David Megginson edited this page May 12, 2017 · 3 revisions
Explode data filter form

The Explode data filter on the Recipe page allows you to convert a wide dataset (e.g. time series) to a narrow dataset (suitable for visualisations and databases), similar to the "reshape" command in the R language. For background information, see the Wikipedia article "Wide and narrow data."

The command looks for the HXL +label attribute. Every column where the +label attribute appears in the input ("wide") data appears will generate a new row in the output ("narrow") data, with the column header as one of the values. The output columns will have the +label attribute removed, and new +header and +value attributes added.

(This process is easier to understand by looking at the Example section below.)

Options

Attribute for former header text: the HXL attribute to add to the original hashtag for the column containing the former text label (defaults to +header).

Attribute for original value: the HXL attribute to add to the original hashtag for the column containing the original value (defaults to +value).

Example

Before (wide data)

District 2010 2011 2012 2013 2014 2015
#adm1 #inneed+label #inneed+label #inneed+label #inneed+label #inneed+label #inneed+label
Coast 5000 6000 6000 2500 1500 1000
Plains 0 0 1000 8000 10000 12000

After (narrow data)

District -- --
#adm1 #inneed+header #inneed+value
Coast 2010 5000
Coast 2011 6000
Coast 2012 6000
Coast 2013 2500
Coast 2014 1500
Coast 2015 1000
Plains 2010 0
Plains 2011 0
Plains 2012 1000
Plains 2013 8000
Plains 2014 10000
Plains 2015 12000

Use cases

This filter is especially useful for taking data that has been optimised for human readability (e.g. time-series data) and converting it to a format that's more suitable for automated processing, such as generating visualisations, importing into a database, etc.

Typically, you would follow the Explode data filter with a Rename column filter to change the output column headers and hashtags to something more meaningful, like this:

District Year People in need
#adm1 #date+year #inneed
Coast 2010 5000
Coast 2011 6000
Coast 2012 6000
Coast 2013 2500
Coast 2014 1500
Coast 2015 1000
Plains 2010 0
Plains 2011 0
Plains 2012 1000
Plains 2013 8000
Plains 2014 10000
Plains 2015 12000

When using the Tagger page to add hashtags to a non-HXL dataset, the Default HXL hashtag field is especially useful: if you set it to a value like "#inneed+label", it will automatically fill that in for all the time-series columns.