Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
210 lines (152 sloc) 5.8 KB
---
title: JSON-LD Vita exploration
author: Carl Boettiger
date: '2017-12-09'
categories: [R]
tags: [R, json-ld]
---
```{r setup, include=FALSE}
library(printr)
library(jsonld)
library(jsonlite)
library(jqr)
library(magrittr)
library(httr)
library(xml2)
# devtools::install_github("cboettig/rdflib@0.0.1")
library(rdflib)
```
# Using JSON Queries (JQ)
## Pure JSON
```{r}
vita <- readr::read_file("../../static/js/vita.json")
jq(vita, '."@reverse".author[] |
{ year: .dateCreated,
author: .author[] | [.givenName, .familyName] | join(" ")
}') %>% combine() %>% fromJSON()
```
### With JSON-LD frame
By first constructing a frame, we can get back a subset of the data we are interested in. This is not as powerful as a graph query, but still has aspects of schema-on-read.
```{r}
frame <-
'{
"@context": "http://schema.org",
"@type": "ScholarlyArticle",
"author": {
"@type": "Person",
"givenName": {},
"familyName": {},
"@explicit": true
},
"dateCreated": {},
"@explicit": true
}'
vita <- jsonld::jsonld_frame("../../static/js/vita.json", frame)
```
```{r}
as.character(vita) %>%
jq('."@graph"[] | {
year: .dateCreated,
author: .author[] | [.givenName, .familyName] | join(" ")
}') %>% combine() %>% fromJSON()
```
# SPARQL and RDF
A simple example
```{r message=FALSE}
"http://dx.doi.org/10.1002/ece3.2314" %>%
httr::GET(httr::add_headers(Accept="application/rdf+xml")) %>%
httr::content(as = "parsed", type = "application/xml") %>%
xml2::write_xml("ex.xml")
```
Our `rdflib` functions perform the simple task of parsing this `rdfxml` file into R (as a `redland` `rdf` class object) and then writing it back out in `jsonld` serialization:
```{r}
rdf_parse("ex.xml", "rdfxml") %>%
rdf_serialize("ex.json", "jsonld")
```
and we now have JSON file. We can clean this file up a bit by replacing the long URIs with short prefixes by "compacting" the file into a specific JSON-LD context. FOAF, OWL, and Dublin Core are all recognized by schema.org, so we need not declare them at all here. PRISM and BIBO ontologies are not, so we simply declare them as additional prefixes:
```{r}
context <-
'{ "@context": [
"http://schema.org",
{
"prism": "http://prismstandard.org/namespaces/basic/2.1/",
"bibo": "http://purl.org/ontology/bibo/"
}]
}'
json <- jsonld_compact("ex.json", context)
```
## Switching contexts and framing
```{r}
context <-
'{
"prism": "http://prismstandard.org/namespaces/basic/2.1/",
"dc": "http://purl.org/dc/terms/",
"bibo": "http://purl.org/ontology/bibo/",
"foaf": "http://xmlns.com/foaf/0.1/",
"owl": "http://www.w3.org/2002/07/owl#",
"schema": "http://schema.org/",
"schema:pageStart": "prism:startingPage",
"schema:pageEnd": "prism:endingPage",
"schema:volumeNumber": "prism:volume",
"schema:identifier": {"@id": "prism:issn", "@type": "@id"},
"schema:Periodical": "bibo:Journal",
"schema:author": "dc:creator",
"schema:isPartOf": "dc:isPartOf",
"schema:publisher": "dc:publisher",
"schema:name": "dc:title",
"schema:familyName": "foaf:familyName",
"schema:givenName": "foaf:givenName",
"schema:Person": "foaf:Person",
"schema:sameAs": {"@id": "owl:sameAs", "@type": "@id"},
"schema:Date": "xsd:date",
"schema:datePublished": {"@id": "http://purl.org/dc/terms/date", "@type": "schema:Date"}
}'
```
Compact raw JSON into this context
```{r}
jsonld_compact("ex.json", context) %>%
fromJSON(simplifyVector = FALSE) -> X
```
Now replace that context with schema.org context, a bit of a hack
```{r}
X[["@context"]] <- "http://schema.org"
X %>%
toJSON(auto_unbox = TRUE, pretty = TRUE) %>%
jsonld_compact("http://schema.org") -> Y
```
Now frame our desired results to explicitly include only the elements we request, giving the graph in the desired tree structure:
```{r}
frame <-
'{"@context": "http://schema.org",
"@graph": {
"id": {},
"name": {},
"pageStart": {},
"pageEnd": {},
"isPartOf": {
"name": {},
"identifier": {},
"@explicit": true
},
"author": [
{
"givenName": {},
"familyName": {},
"@explicit": true
}],
"@explicit": true
}
}'
jsonld_frame(Y, frame)
```
Note that the RDF has different semantic models than schema.org: for instance, `volume` is a property of the scholarly article (well, it's untyped in the RDF, but it's a property of the object described by the article DOI), while in schema.org, `volumeNumber` is a property of a `Periodical` (or `PublicationVolume`), which `hasPart`s made up of `PublicationIssue` objects, themselves `hasPart`s made up of `ScholarlyArticle`s. The whole purpose of JSON-LD functions are to respect semantics, therefore there is no way we can use JSON-LD operations to alter these semantics.
As long as we aren't changing the object structures though, we can change the vocabulary. This is really also something of a hack: we compact the original data, and then just chop off the `@context` and provide our own `@context` that gives schema.org definitions to the terms.
JSON-LD is commonly used to change key names, but this assumes that both contexts can be defined relative to the same URIs. e.g. we can say that in the context of Dublin Core, implicitly `"title": "http://schema.org/name",
or explicitly: `"https://purl.org/dc/title": "http://schema.org/name"`.
Perhaps this ought instead to be done with an ontological operation and the assertion of `sameAs` and similar relationships. Perhaps that would also permit moving between these different levels?
Note that items with specific types must be declared as such to match types expected in schema.org. Others can be captured as schema.org terms just by setting the default `@vocab`.
```{r include=FALSE}
unlink("ex.xml")
unlink("ex.nquads")
unlink("ex.json")
```