---
title: Data Rectangling with jq
author: Carl Boettiger
date: '2017-12-11'
categories: [R]
tags: [R, json]
---
> "Data rectangling": the process of turning highly nested data structures (e.g. JSON, XML) into a tabular format.
Data rectangling is a brilliant turn of phrase coined by Jenny Bryan (UBC, RStudio), a leader in the #rstats community. The recording or slides of Jenny's talk on the subject give a much better intro to the idea and to working with nested data in R, particularly through the `purrr` package.
As nice as `purrr` is for the task, I've recently found that the `jqr` package from Scott Chamberlain and co can be a much easier way to go about rectangling your JSON. Here's a quick comparison based on an example from the lesson Hadley and Jenny have on Data Rectangling.
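```{r include=FALSE}
library(DT)
# Print data frames as DT tables; the method name is assumed (lost in the source)
knit_print.data.frame <- function(x, ...) {
  knitr::knit_print(DT::datatable(x), ...)
}
```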
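```{r message=FALSE}
# Chunk body missing in the source; these are the packages the code below needs
library(tidyverse)   # tibble, purrr, readr and the pipe
library(repurrrsive) # gh_repos and extdata/gh_repos.json
library(jqr)
```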
## Using purrr
```{r results='asis'}
gh_flat <- gh_repos %>% flatten() # abandon nested structure and hope we didn't need it

gh_tibble <- tibble(
  name     = gh_flat %>% map_chr("name"),
  issues   = gh_flat %>% map_int("open_issues_count"),
  wiki     = gh_flat %>% map_lgl("has_wiki"),
  homepage = gh_flat %>% map_chr("homepage", .default = ""),
  owner    = gh_flat %>% map_chr(c("owner", "login"))
)

gh_tibble
```
Note we need to be explicit about missing value defaults and types.
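To see why the defaults matter, here is a minimal sketch on a hypothetical two-element list (the `repos` object below is made up for illustration and is not the `gh_repos` data). One entry has a `NULL` homepage, which is how a JSON `null` typically arrives in R.

```r
library(purrr)

# Hypothetical stand-in for parsed JSON: the second repo has no homepage
repos <- list(
  list(name = "a", homepage = "https://example.org", open_issues_count = 3L),
  list(name = "b", homepage = NULL,                  open_issues_count = 0L)
)

# Without a default, map_chr() fails because NULL is not a length-1 string
no_default <- tryCatch(
  map_chr(repos, "homepage"),
  error = function(e) "error"
)

# With a default, the missing value is filled in and the type is guaranteed
homepages <- map_chr(repos, "homepage", .default = "")

# The mapper also fixes the output type: map_int() insists on integers
issues <- map_int(repos, "open_issues_count")
```

The choice of `""` versus `NA_character_` as the default is up to you; either satisfies `map_chr()`.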
## Using jqr
Note that we can simply exploit the object typing already encoded in the data (`int`, `lgl`, `chr`).
```{r results="asis", message=FALSE}
f <- system.file("extdata/gh_repos.json", package="repurrrsive")
read_file(f) %>%
  jq('.[][] | {
       name: .name,
       issues: .open_issues_count,
       wiki: .has_wiki,
       homepage: .homepage,
       owner: .owner.login
     } ') %>%
  jqr::combine() %>% # single json file
  jsonlite::fromJSON() # assumed final step (truncated in the source): parse into a data frame
```
This example only scratches the surface of the `jq` syntax. The jq manual provides a nice overview of this intuitive syntax. `jq` can also perform a wide range of data processing on the elements, including conditionals, comparisons, regular expressions, math, and so forth. While these are great, most R users will want to learn just enough `jq` syntax to get back a nice data rectangle, and then `dplyr` can take over.
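As a rough illustration of that division of labour, the sketch below filters with a `jq` conditional and then hands the rectangle to `dplyr`. The inline JSON and the names `json`, `busy`, `busy_df` and `ranked` are hypothetical; only `jq()`, `combine()`, `jsonlite::fromJSON()` and `dplyr::arrange()` are real calls.

```r
library(jqr)

# Hypothetical inline JSON standing in for a list of repositories
json <- '[
  {"name": "alpha", "open_issues_count": 4, "has_wiki": true},
  {"name": "beta",  "open_issues_count": 0, "has_wiki": false},
  {"name": "gamma", "open_issues_count": 9, "has_wiki": true}
]'

# jq side: keep only repos with open issues, and pick two fields
busy <- jq(json, '.[] | select(.open_issues_count > 0) | {name: .name, issues: .open_issues_count}')

# Combine the pieces into one JSON array and parse it into a data frame
busy_df <- jsonlite::fromJSON(as.character(combine(busy)))

# dplyr side: ordinary data-frame verbs from here on
ranked <- dplyr::arrange(busy_df, dplyr::desc(issues))
```

Whether to filter in `jq` or in `dplyr` is mostly a matter of taste; doing it in `jq` keeps the parsed data frame smaller.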