Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edge cases and defaults #18

Closed
knapply opened this issue Jun 11, 2020 · 12 comments
Closed

Edge cases and defaults #18

knapply opened this issue Jun 11, 2020 · 12 comments

Comments

@knapply
Copy link
Collaborator

@knapply knapply commented Jun 11, 2020

What are the edge cases {RcppSimdJson} needs to handle?

What should be the defaults (internal and user-level)?

  • {}
    • Current candidates:
      • [ ] named list(): structure(list(), .Names = character(0))
        • same as jsonlite::fromJSON()
      • [ ] NULL
        • same as jsonify::from_json()
    • Resolved by 94e571a (user passes any R object to replace empty objects with NULL as the defaut).
  • []
    • Current candidates:
      • [ ] vector(): logical(length = 0L)
      • [ ] vector(mode = "list"): list()
        • same as jsonify::from_json() and jsonlite::fromJSON()
    • Resolved by 94e571a (user passes any R object to replace empty arrays with NULL as the defaut).
  • null
    • like {} and [], allow user to pass any R object to replace single null, with NULL as the default.
@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 11, 2020

@dcooley Moving the conversation from #17 over here:

  • []: vector()
  • {}: structure(list(), .Names = character(0))

Here's the rationale for going about {} and [] this way:

{}

If we say the R representation of {"a":1} is list(a = 1L), we're really saying that its R representation is structure(list(1L), .Names = "a").

If we follow that analogy, empty object {} is structure(list(), .Names = character(0)), which print()s as named list().

JSON R Representation equivalent/print()
{"a":1} structure(list(1L), .Names = "a") list(a = 1L)
{} structure(list(), .Names = character()) named list()

[]

And similarly you're returing [] as NA_LOGICAL?

See above: [] returns Rcpp::LogicalVector(0) (vector of length 0):

RcppSimdJson:::.deserialize_json('[]')
#> logical(0)

This is really a matter of deciding which of the following represents []:

  1. vector()
  2. vector(mode = "list").

But it does raise a few other questions:

  1. Are empty arrays recursive?
  2. Are empty arrays homogeneous?
  3. Are empty arrays "just arrays"... or are they really a special case?

{jsonify} and {jsonlite} use this pattern for simplifcation:

JSON R str()
["a",3.14,1,true,null] c("a","3.14","1","TRUE",NA) chr [1:5] "a" "3.14" "1" "TRUE" NA
[3.14,1,true,null] c(3.14, 1, 1, NA) num [1:4] 3.14 1 1 NA
[1,true,null]' c(1L, 1L, NA) int [1:3] 1 1 NA
[true,null]' c(TRUE, NA) logi [1:2] TRUE NA
[null] NA logi NA
[] list() list()

RcppSimdJson:::.deserialize_json()'s current default uses rcppsimdjson::Type_Policy::anything_goes, which follows nearly the same pattern of collapsing arrays of scalars into R vectors, no matter what.

Here's the same table with the proposed default {RcppSimdJson} pattern for simplification in bold:

JSON R str()
["a",3.14,1,true,null] c("a","3.14","1","TRUE",NA) chr [1:5] "a" "3.14" "1" "TRUE" NA
[3.14,1,true,null] c(3.14, 1, 1, NA) num [1:4] 3.14 1 1 NA
[1,true,null]' c(1L, 1L, NA) int [1:3] 1 1 NA
[true,null]' c(TRUE, NA) logi [1:2] TRUE NA
[null] NA logi NA
[] logical(0) logi(0)
[] list() list()

The idea behind the proposed {RcppSimdJson} approach is to just consistently following the same pattern.

I'd say the difference is that it treats [] as non-recursive and homogeneous (thus becoming a vector) instead of a special case like {jsonify} and {jsonlite} seem to do (potentially recursive and/or homogeneous?).

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 11, 2020

🤷‍♂️ Are there parsers / linters / specs / best practices?

I like how set user-drive policy options otherwise. That is simply the most flexible.

@lemire Thoughts?

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 11, 2020

It seems @dcooley and I don't quite agree with some of the default choices {jsonlite} makes. There are compromises and edge cases no matter how you do it, so a 100% consensus isn't likely.

Jeroen Ooms' The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects is a wonderful deep dive into the a lot of the gory, R-specific details.

@dcooley
Copy link
Collaborator

@dcooley dcooley commented Jun 11, 2020

@knapply your logic all makes complete sense. I also tried to stay consistent with jsonlite, but I think I deviated away where round-trips didn't work because I wanted a consistent way to get to and from the same object. (I'm not saying I was right, but it's what I did).

toJSON( NULL )
# {}
fromJSON('{}')
# named list()
to_json( NULL )
# {}
from_json('{}')
NULL

So in summary, I don't have any issues with the defaults chosen for RcppSimdJson, but I think the round-trip is something to keep in mind if/when we can write the to json side of it.

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 11, 2020

I'll try and sort the user-passable options ASAP so we at least have maximum flexibility.

I think there are good arguments both ways to be honest.

I have a deployed shiny app that uses {jsonify} to serialize some crazy list columns so users can download spreadsheets, make modifications, and re-upload the changes. We're talking 100k+ row monstrosities that {jsonify} handles with ease.

The tinkering I had to do was all in excel-related packages to get round trips to pass identical() testing, not at all in {jsonify}. I don't think there's a perfect answer for what "right" looks here, but you can't really do better than that!

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 18, 2020

I swapped out the JSON parsing routine for an API client with .deserialize_json() for another real-world use case and realized that allowing the user to also replace single nulls with any R object would eliminate a whole step in cleaning up JSON responses from APIs that spam nulls in every object.

I probably have some variation of %||% defined in any code that touches a web API, so I don't know why I didn't think of this earlier.

  • This would
    • eliminate that the need for the user to do that at the R level
      • removing a commonly needed step in constructing data frames manually
    • provide a way to match jsonify::from_json(simplify = FALSE)'s behavior
    • be totally opt-in, as the default can just be set to NULL

So it's on the TODO list.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 23, 2020

Anything actionable here now? I may have lost track of the main storyline...

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 23, 2020

The last open checkbox/comment before yours #18 (comment) (unless someone makes a case why it's a bad idea) is the only thing remaining.

It's a matter of adding a single_null = R_NilValue parameter (single_null= is just the first thing that comes to mind) to .deserialize_json() and .load_json() and all the rcppsimdjson::deserialize::simplify_*() functions and anything calling them.

Ultimately, the returned value is here:

case simdjson::dom::element_type::NULL_VALUE:
return R_NilValue;

so that would just need to change from R_NilValue to single_null.

It's the same pattern as empty_array= and empty_object=.

I left the issue open as a reminder to myself, but it can be closed.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 24, 2020

Is what we have "good enough" to be added to master to form 0.0.6?

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jun 24, 2020

Yes.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 24, 2020

Ok, no pending merge or pr. I look over what we have and roll it up this eve or tomorrow.

@knapply
Copy link
Collaborator Author

@knapply knapply commented Jul 5, 2020

single nulls handled in 9957d60

@knapply knapply closed this Jul 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.