More Flexible Queries #43

Closed
knapply opened this issue Jul 16, 2020 · 0 comments
knapply commented Jul 16, 2020

We can do better than using a single JSON Pointer for query=.

json <- c(
    json1 = '{"a":[1,2,3],"b":[4,5,6]}',
    json2 = '{"c":[4,5,6],"d":[7,8,9]}'
)

If you know exactly what values you want to pull into R, you could do the following:

RcppSimdJson::fparse(
    json, 
    query = "a/1",
    error_ok = TRUE
)
#> $json1
#> [1] 2
#> 
#> $json2
#> NULL

RcppSimdJson::fparse(
    json, 
    query = "b/2",
    error_ok = TRUE
)
#> $json1
#> [1] 6
#> 
#> $json2
#> NULL

RcppSimdJson::fparse(
    json, 
    query = "c/1",
    error_ok = TRUE
)
#> $json1
#> NULL
#> 
#> $json2
#> [1] 5

RcppSimdJson::fparse(
    json, 
    query = "d/2",
    error_ok = TRUE
)
#> $json1
#> NULL
#> 
#> $json2
#> [1] 9

But that's more tedious than just parsing the whole thing and then doing some post-parse-processing (which you'd have to do anyway to clean up the NULLs).
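
To make the cleanup step concrete, here is a minimal sketch in base R. The `res` list is typed by hand to mirror the `query = "a/1"` output above (it is not produced by calling the parser here), and `found` is just an illustrative name.

```r
# Hand-typed stand-in for the result of the `query = "a/1"` call shown above.
res <- list(json1 = 2L, json2 = NULL)

# Drop the NULL entries left behind by documents that lack the queried path.
found <- Filter(Negate(is.null), res)

# Only json1 survives; json2 had nothing at "a/1".
names(found)
```

With four separate queries you would repeat this for each result and then stitch the pieces back together by document name.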

It's a total waste of potential.

We can do this instead:

queries <- list(
    query_name_for_json1 = c("a/1", "b/2"),
    query_name_for_json2 = c("c/1", "d/2")
)

RcppSimdJson::fparse(json, queries)
#> $query_name_for_json1
#> $query_name_for_json1[[1]]
#> [1] 2
#> 
#> $query_name_for_json1[[2]]
#> [1] 6
#> 
#> 
#> $query_name_for_json2
#> $query_name_for_json2[[1]]
#> [1] 5
#> 
#> $query_name_for_json2[[2]]
#> [1] 9

This would dramatically increase the amount of work that can be done before any R objects materialize and minimize the amount of post-parse-processing a user might have to do in R (and potentially eliminate it for some sane and stable JSON schemata).

The code hygiene and performance benefits are absolutely worth it.

  • Proposed API:
    • single queries "recycle" (as they do now: this wouldn't break anyone's code)
    • the length of multiple queries must match the length of json=, and they are applied in a zip-like fashion
    • nesting queries inside a list (ListOf<CharacterVector>) like the example above provides a way to apply multiple queries to each element
      • results of named queries carry the query names (also in example)
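
The rules above can be sketched in plain R to pin down the intended semantics. This is only an emulation under stated assumptions: `normalize_queries()`, `pointer_get()` and `apply_queries()` are hypothetical helpers, not part of RcppSimdJson, and they operate on already-parsed R lists rather than on JSON text. The real implementation would resolve the queries before any R objects materialize.

```r
# Hypothetical helper: expand `query` into one character vector per document.
normalize_queries <- function(query, n_json) {
    if (is.list(query)) {
        # nested form: several queries per document, one list element each
        stopifnot(length(query) == n_json)
        return(query)
    }
    stopifnot(is.character(query))
    if (length(query) == 1L) {
        # a single query recycles across every document (current behaviour)
        return(rep(list(query), n_json))
    }
    # multiple flat queries are zipped with the documents, so lengths must match
    stopifnot(length(query) == n_json)
    as.list(query)
}

# Hypothetical helper: resolve a pointer such as "a/1" against a parsed object.
# All-digit tokens are treated as zero-based indices; a missing path gives NULL.
# Assumes pointers without a leading "/", as in the examples in this issue.
pointer_get <- function(x, pointer) {
    for (token in strsplit(pointer, "/", fixed = TRUE)[[1]]) {
        if (grepl("^[0-9]+$", token)) {
            i <- as.integer(token) + 1L
            if (i > length(x)) return(NULL)
            x <- x[[i]]
        } else {
            if (is.null(names(x)) || !token %in% names(x)) return(NULL)
            x <- x[[token]]
        }
    }
    x
}

# Hypothetical helper: apply the normalized queries document by document.
apply_queries <- function(parsed, query) {
    per_doc <- normalize_queries(query, length(parsed))
    out <- Map(function(doc, qs) {
        res <- lapply(qs, function(q) pointer_get(doc, q))
        # flat queries yield one value per document; nested ones yield a list
        if (!is.list(query)) res[[1]] else res
    }, parsed, per_doc)
    # named nested queries carry their names into the result
    if (is.list(query) && !is.null(names(query))) names(out) <- names(query)
    out
}

# Stand-ins for the two parsed documents from the example above.
parsed <- list(
    json1 = list(a = 1:3, b = 4:6),
    json2 = list(c = 4:6, d = 7:9)
)

recycled <- apply_queries(parsed, "a/1")            # one query for every document
zipped   <- apply_queries(parsed, c("a/1", "c/1"))  # one query per document
nested   <- apply_queries(parsed, list(
    query_name_for_json1 = c("a/1", "b/2"),
    query_name_for_json2 = c("c/1", "d/2")
))
```

Here `recycled` keeps the NULL for `json2`, `zipped` holds one value per document, and `nested` has the same shape and names as the proposed output above.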