---
title: "JSON versus YAML"
date: "`r Sys.Date()`"
output:
  workflowr::wflow_html:
    toc: false
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
JSON and YAML are popular [serialisation](https://en.wikipedia.org/wiki/Serialization) formats.

> In computing, serialization (or serialisation) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e.g. data streams over computer networks) and reconstructed later (possibly in a different computer environment).

Install the following packages:

* [jsonlite](https://github.com/jeroen/jsonlite) for parsing JSON
* [yaml](https://github.com/vubiostat/r-yaml) for parsing YAML
* [tidyjson](https://github.com/colearendt/tidyjson) for converting JSON to tidy data frames
* [rjson](https://github.com/alexcb/rjson) for parsing JSON
```{r install, eval=FALSE}
install.packages(c("jsonlite", "yaml", "tidyjson", "rjson"))
```
Load libraries.
```{r load}
library(jsonlite)
library(yaml)
library(tidyjson)
library(rjson)
```
As a first example, we will convert the built-in `women` data set, a small data frame with 15 observations of 2 variables.
```{r women}
women
```
Convert `women` to JSON using `jsonlite`.
```{r women_json}
women_json <- jsonlite::toJSON(women, pretty = TRUE)
women_json
```
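`tidyjson::read_json` does not recover the `women` records from this file. A likely cause is that `write_json` serialises its input again, so passing the already-serialised `women_json` stores the JSON as a single string.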
```{r read_json}
jsonlite::write_json(x = women_json, path = "women.json")
tidyjson::read_json(path = "women.json")
```
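For comparison, here is a minimal sketch of a file round trip that stays within `jsonlite`. `write_json` is given the data frame itself rather than an already-serialised string, and `read_json` simplifies the parsed result back to a data frame. The temporary file path is an assumption made for illustration.

```{r jsonlite_round_trip}
library(jsonlite)

# Write to a temporary file (illustrative path, not used elsewhere)
tmp_json <- tempfile(fileext = ".json")
jsonlite::write_json(x = women, path = tmp_json)

# simplifyVector = TRUE turns the array of records back into a data frame
women_back <- jsonlite::read_json(path = tmp_json, simplifyVector = TRUE)
str(women_back)
```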
`rjson::fromJSON` converts the JSON string into a list.
```{r rjson_from_json}
str(rjson::fromJSON(women_json))
```
Convert `women` to YAML.
```{r women_yaml}
women_yaml <- as.yaml(women, indent = 3)
writeLines(women_yaml)
```
## To data frame
JSON to data frame.
```{r json_to_df}
jsonlite::fromJSON(women_json)
```
YAML to data frame. This does not work for more complex data structures (see below).
```{r yaml_to_df}
yaml.load(women_yaml, handlers = list(map = function(x) as.data.frame(x)))
```
## Non-tidy data frame
A data frame containing lists.
```{r my_df}
my_df <- data.frame(
  id = 1:3,
  title = letters[1:3]
)
my_df$keywords <- list(
  c('aa', 'aaa', 'aaaa'),
  c('bb', 'bbb'),
  c('cc', 'ccc', 'cccc', 'ccccc')
)
my_df
```
Convert `my_df` to JSON.
```{r my_df_json}
my_df_json <- jsonlite::toJSON(my_df, pretty = TRUE)
my_df_json
```
Convert `my_df` to YAML.
```{r my_df_yaml}
my_df_yaml <- as.yaml(my_df, indent = 3)
writeLines(my_df_yaml)
```
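`as.yaml` writes a data frame column by column by default. As a minimal sketch of the row-wise alternative, the chunk below assumes the `column.major` argument of `as.yaml` and uses a small hypothetical data frame (`small_df`) without list columns.

```{r yaml_row_major}
library(yaml)

# Hypothetical three-row data frame, used only for illustration
small_df <- data.frame(id = 1:3, title = c("a", "b", "c"), stringsAsFactors = FALSE)

# column.major = FALSE emits one YAML mapping per row instead of one per column
small_yaml <- as.yaml(small_df, column.major = FALSE)
writeLines(small_yaml)

# Loading it back gives one list element per row
small_rows <- yaml.load(small_yaml)
length(small_rows)
```

A row-wise layout is closer to the array of records that `jsonlite::toJSON` produces for a data frame.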
## JSON to YAML and vice versa
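Converting from JSON to YAML is easy: parse the JSON with `jsonlite::fromJSON` and pass the result to `as.yaml`. The check below compares the YAML strings themselves.
```{r json_to_yaml}
# Compare the strings directly; writeLines() returns NULL, so comparing its
# results is always TRUE. Use the same indent as my_df_yaml.
identical(as.yaml(jsonlite::fromJSON(my_df_json), indent = 3), my_df_yaml)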
```
Converting from YAML to JSON for `my_df` is not as straightforward because each record has a different number of keywords.
```{r yaml_load}
my_df_list <- yaml.load(my_df_yaml)
my_df_list
```
This conversion differs from the original data frame to JSON conversion: it creates a single object, whereas the original conversion creates an array of three objects.
```{r yaml_to_json_wrong}
jsonlite::toJSON(my_df_list, pretty = TRUE)
my_df_json
```
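The two layouts can be reproduced directly with `jsonlite`. This is a minimal sketch, assuming the `dataframe` argument of `toJSON` and a small hypothetical data frame (`layout_df`). `"rows"` (the default) gives an array of objects, while `"columns"` gives a single object holding one array per column, which resembles the shape of the list returned by `yaml.load`.

```{r json_rows_vs_columns}
library(jsonlite)

# Hypothetical two-row data frame, used only for illustration
layout_df <- data.frame(id = 1:2, title = c("a", "b"), stringsAsFactors = FALSE)

# Row-wise: one JSON object per record, wrapped in an array
rows_json <- jsonlite::toJSON(layout_df, dataframe = "rows")
# Column-wise: one JSON object whose members are arrays
cols_json <- jsonlite::toJSON(layout_df, dataframe = "columns")

rows_json
cols_json
```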
I could probably write a hacky function to make the conversion but I won't.
## Parsing JSON
The [ffq](https://github.com/pachterlab/ffq) tool generates metadata in JSON:

    ffq SRX079566 > data/SRX079566.json

```{r read_ffq_json}
ffq_json <- jsonlite::read_json(path = "data/SRX079566.json", simplifyVector = TRUE)
str(ffq_json)
```
Use `rapply` (recursive apply) to flatten the nested list into a named character vector, which is convenient for plucking values.
```{r rapply_ffq}
test <- rapply(object = ffq_json, f = function(x) x)
class(test)
```
Subset the FTP links.
```{r ftp_url}
test[grepl("ftp.url\\d+$", names(test))]
```
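To see how the flattened names arise, here is a minimal sketch on a small hand-made list. The field names and URLs are hypothetical and are not taken from the ffq output. The point is only that `rapply` with its default `how = "unlist"` joins nested names with dots and numbers the elements of longer vectors.

```{r rapply_sketch}
# Hypothetical nested list standing in for parsed JSON
nested <- list(
  accession = "EXAMPLE",
  files = list(
    ftp = list(
      url = c("ftp://example.org/a.fastq.gz", "ftp://example.org/b.fastq.gz"),
      size = c(10L, 20L)
    )
  )
)

# Identity function on every leaf; the default how = "unlist" flattens the result
flat <- rapply(object = nested, f = function(x) x)
names(flat)

# Select the URL entries by name
flat[grepl("ftp\\.url\\d+$", names(flat))]
```

Because the flattened vector mixes strings and integers, every value is coerced to character.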