from_json #10
Lovely :) Fair point re a possible interface package, but I think it would be lovely to have this and expose it. This also turned into a really neat conversation with upstream, who may enjoy seeing the feature being present and providing, if you wish, extra 'test coverage' because some more user data may be coming this way.
I started playing with simdjson + Rcpp back in September(?), but I clearly didn't make much public progress: https://github.com/knapply/simdjsonr
@dcooley, I haven't spent any time on a "simplify" workflow (I just don't really use it myself), but now that I've seen the JSON Pointer part in action, I think simdjson is a total game-changer. I have no idea how I missed the Pointer functionality in RapidJSON, but I barely knew what I was doing in C++ when I was working with it more regularly (not that I know what I'm doing 7ish months later). Regardless, it has already made working with enormous (ND)JSON(L) data sets from R actually viable.
All that said, it doesn't make much sense to do my own thing elsewhere, especially since this is already on CRAN and rapport with the folks upstream has been established. The approach I'm using follows (I just dropped the files in gists instead of pushing a bunch of garbage to the old repo). I'll start aggregating things into my fork for proper PRs.
Dave, I suspect some combination of our approaches may make sense 🤷♂, but it's a pretty safe bet you've spent more time thinking about JSON (and I default to assuming my C++ code is a ticking time bomb).
https://gist.github.com/knapply/0cfda08e85ba3fa4f7e61071f83d4768
parse_json.cpp
// SIMDJSON_VERSION == 0.3.1
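#include <climits>  // INT_MAX, INT_MIN
#include <cstring>  // std::memcpy
#include <string>   // std::to_string
#include <Rcpp.h>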
#include <simdjson/simdjson.h>
#include <simdjson/simdjson.cpp>
namespace Rcpp {
template <>
inline SEXP wrap<int64_t>(const int64_t& obj) {
auto out = Rcpp::NumericVector(1);
std::memcpy(&(out[0]), &(obj), sizeof(double));
out.attr("class") = "integer64";
return out;
}
} // namespace Rcpp
namespace simdjsonr {
template <typename int_T>
inline constexpr bool is_really_int64_t(int_T);
template <>
inline constexpr bool is_really_int64_t<uint64_t>(uint64_t x) {
return x > INT_MAX - 1;
}
template <>
inline constexpr bool is_really_int64_t<int64_t>(int64_t x) {
return x > INT_MAX - 1 || x < INT_MIN + 1;
}
template <typename int_T, bool bit64_integer64, bool int_64_strings>
inline constexpr SEXP resolve_int(int_T x) {
return is_really_int64_t<int_T>(x)
? (bit64_integer64 ? Rcpp::wrap<int64_t>(x)
: int_64_strings ? Rcpp::wrap(std::to_string(x)) : Rcpp::wrap<double>(x))
: Rcpp::wrap<int>(x);
}
template <typename F>
inline SEXP build_object(dom::object&& object, F f) {
const R_xlen_t n = std::size(object);
Rcpp::List out(n);
Rcpp::CharacterVector out_names(n);
R_xlen_t i = 0;
for (auto [key, val] : object) {
out[i] = f(val);
out_names[i] = std::string(key);
i++;
}
out.attr("names") = out_names;
return out;
}
template <typename F>
inline auto build_array(dom::array&& object, F f) {
Rcpp::List out;
for (dom::element child : object) {
out.push_back(f(child));
}
return out;
}
template <bool bit64_integer64, bool int_64_strings>
SEXP dump_json(dom::element element) {
switch (element.type()) {
case dom::element_type::ARRAY:
return build_array(element, dump_json<bit64_integer64, int_64_strings>);
case dom::element_type::OBJECT:
return build_object(element, dump_json<bit64_integer64, int_64_strings>);
case dom::element_type::INT64:
return resolve_int<int64_t, bit64_integer64, int_64_strings>(element);
case dom::element_type::UINT64:
return resolve_int<uint64_t, bit64_integer64, int_64_strings>(element);
case dom::element_type::DOUBLE:
return Rcpp::wrap<double>(element);
case dom::element_type::STRING:
return Rcpp::wrap(std::string(element));
case dom::element_type::BOOL:
return Rcpp::wrap<bool>(element);
case dom::element_type::NULL_VALUE:
[[fallthrough]];
default:
return R_NilValue;
}
}
template <bool use_json_pointer>
inline constexpr simdjson::dom::element stage_element(simdjson::dom::element element,
const std::string_view& json_pointer) {
return use_json_pointer ? element.at(json_pointer) : element;
}
template <bool warning>
inline constexpr void throw_bad_parse(const char* msg) {
warning ? Rcpp::warning(msg) : Rcpp::stop(msg);
}
template <bool warning, bool use_json_pointer, bool bit64_integer64, bool int_64_strings>
SEXP parse_json(const Rcpp::CharacterVector& json, const std::string_view& json_pointer) {
const R_xlen_t n = std::size(json);
Rcpp::List out(n);
simdjson::dom::parser parser;
for (R_xlen_t i = 0; i < n; ++i) {
auto [res, error] = parser.parse(std::string_view(json[i]));
if (error) {
throw_bad_parse<warning>("parse error");
continue;
}
out[i] = dump_json<bit64_integer64, int_64_strings>(stage_element<use_json_pointer>(res, json_pointer));
}
return out;
}
inline constexpr auto parse_int64_as_integer64_stop = parse_json<false, false, true, false>;
inline constexpr auto parse_int64_as_string_stop = parse_json<false, false, false, true>;
inline constexpr auto parse_int64_as_double_stop = parse_json<false, false, false, false>;
inline constexpr auto parse_pointer_int64_as_integer64_stop = parse_json<false, true, true, false>;
inline constexpr auto parse_pointer_int64_as_string_stop = parse_json<false, true, false, true>;
inline constexpr auto parse_pointer_int64_as_double_stop = parse_json<false, true, false, false>;
inline constexpr auto parse_int64_as_integer64_warning = parse_json<true, false, true, false>;
inline constexpr auto parse_int64_as_string_warning = parse_json<true, false, false, true>;
inline constexpr auto parse_int64_as_double_warning = parse_json<true, false, false, false>;
inline constexpr auto parse_pointer_int64_as_integer64_warning = parse_json<true, true, true, false>;
inline constexpr auto parse_pointer_int64_as_string_warning = parse_json<true, true, false, true>;
inline constexpr auto parse_pointer_int64_as_double_warning = parse_json<true, true, false, false>;
} // namespace simdjsonr
//
//
// [[Rcpp::export(.parse_json_impl)]]
SEXP parse_json_impl(const Rcpp::CharacterVector& json,
const std::string& json_pointer,
const bool bit64_integer64,
const bool int_64_strings,
const bool error_on_bad_parse) {
using namespace simdjsonr;
const auto use_pointer = !json_pointer.empty();
if (error_on_bad_parse) {
if (bit64_integer64) {
return use_pointer ? parse_pointer_int64_as_integer64_stop(json, json_pointer)
: parse_int64_as_integer64_stop(json, json_pointer);
}
if (int_64_strings) {
return use_pointer ? parse_pointer_int64_as_string_stop(json, json_pointer)
: parse_int64_as_string_stop(json, json_pointer);
} else {
return use_pointer ? parse_pointer_int64_as_double_stop(json, json_pointer)
: parse_int64_as_double_stop(json, json_pointer);
}
} else {
if (bit64_integer64) {
return use_pointer ? parse_pointer_int64_as_integer64_warning(json, json_pointer)
: parse_int64_as_integer64_warning(json, json_pointer);
}
if (int_64_strings) {
return use_pointer ? parse_pointer_int64_as_string_warning(json, json_pointer)
: parse_int64_as_string_warning(json, json_pointer);
} else {
return use_pointer ? parse_pointer_int64_as_double_warning(json, json_pointer)
: parse_int64_as_double_warning(json, json_pointer);
}
}
}
... and simdjson_parse.md has the R wrapper function and some examples of what it looks like in action...
simdjson_parse <- function(x, json_pointer = "",
int64 = c("auto", "integer64", "string", "double"),
error_on_bad_parse = TRUE) {
int64 <- match.arg(int64, c("auto", "integer64", "string", "double"))
if (int64 %in% c("auto", "integer64")) {
bit64_available <- requireNamespace("bit64", quietly = TRUE)
if (int64 == "integer64" && !bit64_available) {
stop('`int64` set to `"integer64"`, but {bit64} is not installed.')
}
if (bit64_available) { # int64_t as bit64::integer64
out <- .parse_json_impl(
json = x, json_pointer,
bit64_integer64 = TRUE, int_64_strings = FALSE,
error_on_bad_parse = error_on_bad_parse
)
} else {
int64 = "string"
}
}
if (int64 == "string") { # int64_t as character
out <- .parse_json_impl(
json = x, json_pointer,
bit64_integer64 = FALSE, int_64_strings = TRUE,
error_on_bad_parse = error_on_bad_parse
)
} else if (int64 == "double") { # int64_t as double; skipped when `out` was already built as integer64
out <- .parse_json_impl(
json = x, json_pointer,
bit64_integer64 = FALSE, int_64_strings = FALSE,
error_on_bad_parse = error_on_bad_parse
)
}
if (length(out) > 1L) out else out[[1L]]
}
simdjson_parse("[]")
simdjson_parse("{}")
simdjson_parse('{"simd":["j","s","o","n"]}')
simdjson_parse(c("bad_json", '{"good_json":true}'))
simdjson_parse(c("bad_json", '{"good_json":true}'), error_on_bad_parse = FALSE)
simdjson_parse('{"ints":[1,2,3]}')
is.integer(unlist(simdjson_parse('{"ints":[1,2,3]}')))
simdjson_parse('{"big_int":1178007955838509057}')
simdjson_parse('{"big_int":2356015911677018114}', int64 = "string")
simdjson_parse('{"big_int":3534023867515527171}', int64 = "double")
simdjson_parse(
'{"big_ints":[{"a":1178007955838509057,"b":2356015911677018114,"c":[2356015911677018114,4712031823354036228]}]}',
json_pointer = "big_ints/0/c/1"
)
Benchmarking
tweet_json <- readr::read_lines("../tweetio/inst/example-data/ufc-tweet-stream.json")
test_json <- tweet_json[vapply(tweet_json, jsonlite::validate, logical(1L))]
length(test_json)
library(jsonlite)
# library(jsonify, warn.conflicts = FALSE)
bench::mark(
simdjson = simdjson <- simdjson_parse(test_json),
fairer_simdjson = fairer_simdjson <- lapply(test_json, simdjson_parse)
# jsonify = jsonify <- lapply(test_json, from_json, simplify = FALSE) # segfaults when knitting...?
,
jsonlite = jsonlite <- lapply(test_json, parse_json)
,
check = FALSE
)
Comparing Output
simdjson[[200]]$entities$user_mentions[[1]][c("id", "id_str", "indices")]
# jsonify[[200]]$entities$user_mentions[[1]][c("id", "id_str", "indices")]
jsonlite[[200]]$entities$user_mentions[[1]][c("id", "id_str", "indices")]
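The integer handling in parse_json.cpp above can be sketched outside of C++. The following is only an illustration in Python, not the gist's code: the names `is_really_int64` and `resolve_int` are borrowed from the gist, the `int64` argument mirrors the R wrapper's options, and the tuple return value is a hypothetical stand-in for the R object that would be built. It copies the gist's comparisons as written, so values at either 32-bit limit are treated as 64-bit (R reserves INT_MIN for NA_integer_).

```python
INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


def is_really_int64(x: int) -> bool:
    # Same comparisons as the gist: anything at or beyond either 32-bit
    # limit is handled as a 64-bit value. INT_MIN must be excluded from
    # plain integers because R uses it to encode NA_integer_.
    return x > INT_MAX - 1 or x < INT_MIN + 1


def resolve_int(x: int, int64: str = "double"):
    """Return a (type name, value) pair; the pair is a stand-in for an R object."""
    if not is_really_int64(x):
        return ("integer", x)
    if int64 == "integer64":
        return ("integer64", x)      # bit64::integer64 keeps all 64 bits
    if int64 == "string":
        return ("character", str(x))  # lossless, but no longer numeric
    if int64 == "double":
        return ("double", float(x))   # may lose precision above 2**53
    raise ValueError(f"unknown int64 option: {int64!r}")


big = 1178007955838509057
print(resolve_int(3))
print(resolve_int(big, "string"))
print(int(resolve_int(big, "double")[1]) == big)  # False: precision was lost
```

The last line is the reason the `"double"` option is the risky one for IDs such as tweet IDs: the value no longer round-trips once it passes through a double.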
yeah I've been focussed on getting the correct R object for the given JSON, which includes the simplification processes. And I haven't so much concentrated on performance. here are a few tests and examples. Currently some |
A quick* benchmark suggests there's some overhead I haven't accounted for, as this implementation is currently slower than
library(jsonify)
library(jsonlite)
library(RcppSimdJson)
library(microbenchmark)
js <- readLines('http://opendata.canterburymaps.govt.nz/datasets/fb00b553120b4f2fac49aa76bc8d82aa_26.geojson')
js <- paste0(js, collapse = "")
microbenchmark::microbenchmark(
jsonify = { jfy <- jsonify::from_json( js ) },
jsonlite = { jlt <- jsonlite::fromJSON( js ) },
simdjson = { sim <- RcppSimdJson::from_json( js ) },
times = 5
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# jsonify 138.4742 139.7152 140.6192 139.7596 141.3846 143.7623 5
# jsonlite 1230.4436 1232.6133 1256.6139 1251.1161 1267.9963 1300.9003 5
# simdjson 201.8796 202.3721 203.7961 202.5413 204.1732 208.0143 5
@dcooley , try benchmarking a non-geojson file: https://github.com/simdjson/simdjson/blob/master/doc/performance.md#number-parsing I'm trying to figure out how to get this to pass R CMD check with the latest simdjson (this one is still missing Now that RTools40 is considered stable, this should be viable for Windows R users. It's worth noting that a @eddelbuettel Are you opposed to |
Let me rephrase. The way I see it we have a few options:
I don't have any other package where header and use are split. We could do that but I don't yet see a really compelling reason besides "well we can". But I may miss something. In any event we can revisit... |
That's the plan. I'd just like to have a solution ready first.
The reason is flexibility. The omnipresence of JSON extends to environments and systems with all kinds of requirements; some valid, some nonsensical.
I was thinking that splitting now (while it has a minimal amount of users) would prevent disruptive headaches later. After more consideration, keeping things together is probably safer: simdjson itself is relatively young and it's clearly evolving... and I suppose copying the two amalgamated files still works in a pinch. |
Right. I still think keeping it as one is preferable, the whole may offer more. I missed that chance with CCTZ (wrapped as RcppCCTZ) and now the source lingers in three other packages for no benefit. It's somewhat suboptimal.
Please see https://github.com/simdjson/simdjson/blob/master/doc/basics.md#requirements
There is no use of stderr or abort in the main library as far as we know. If there is, please report it as a bug. |
Follow-up: abort and stderr did get back into the library. The problem is that we did not have tests. I have added such tests this time around so it should stay away. |
I said it last time, I say it again: really really appreciate that. Makes our downstream work a lot easier. |
@eddelbuettel Yes. Removing offending code is easy. Tracking new commits and checking every line to make sure that we don't fall back is harder. This sort of work needs to be automated. |
@lemire Thank you so much! |
simdjson/simdjson#893 has not yet been merged, but I took the new amalgamation for a test drive in my fork. @dcooley I pulled your changes and added I think a portion of the overhead you're seeing is coming from redundant checks. Specifically, if types are confirmed with After making a few other small modifications, RcppSimdJson builds on Linux, Mac, and Windows w/ 100% passing on the tests @dcooley referenced at #10 (comment) via Github Actions. The only R CMD Check Warnings are coming from the undocumented exports. @eddelbuettel , if you want to keep CI to only Travis + Docker, please say the word. |
"word" I looked into the alternatives, and remain content with Travis CI. |
Note that simdjson can be used with or without exceptions. We have two distinct "sub-APIs" depending on the mode you are using. It is possible to control this with macros, and it depends in part on how you compile the library (with or without exceptions). So you definitely do not have to deal with exceptions if you do not want to. It is usually the case that relying on exceptions comes with a performance overhead.
Yeah there's definitely something wrong with the way I'm using
library(jsonify)
library(jsonlite)
library(RcppSimdJson)
library(microbenchmark)
n <- 1e5
df <- data.frame(
x = 1:n
, y = sample( letters, size = n, replace = T)
)
js <- jsonify::to_json(df)
microbenchmark::microbenchmark(
jsonify = { jfy <- jsonify::from_json( js ) },
jsonlite = { jlt <- jsonlite::fromJSON( js ) },
simdjson = { sim <- RcppSimdJson::from_json( js ) },
times = 5
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# jsonify 247.4799 285.1659 335.8276 366.9796 373.1176 406.3950 5
# jsonlite 289.5658 292.8082 335.9969 300.2588 325.5345 471.8174 5
# simdjson 37025.7764 37443.0639 37941.6096 37634.0160 38023.0598 39582.1316 5
You are taking 38 seconds to parse about 2MB of data? That's just not possible. It is a bit difficult to reason in milliseconds. It is easier if you break it down in, say, GB/s. I am not an R user, so I tried to guess what the script would generate... and I implemented it in Python:
import string
lower_upper_alphabet = string.ascii_letters
import random
def randomletter():
return random.choice(lower_upper_alphabet)
print("[",end="")
for i in range(1,100000):
print("{\"id\":"+str(i)+",\"val\":\""+randomletter()+"\"},",end="")
print("{\"id\":"+str(i)+",\"val\":\""+randomletter()+"\"}",end="")
print("]",end="\n")
This generates a crazy file, which I called crazy.json. Then I ran a benchmark over it...
So we achieve ~0.75 GB/s, which is very low for simdjson, but it is a somewhat adversarial (synthetic) case. OK. So let us turn this into milliseconds. My file spans 2288896 bytes, so we have about 0.2% of a GB. We need to divide this by 0.75 GB/s to get the time in seconds, and then multiply by 1000 to get the number of milliseconds: 2288896/1000000000./0.75 * 1000, which is about 3 milliseconds. So I would expect simdjson to take about 3 milliseconds to parse this file. Of course, there may be overhead that I am not aware of... But there is no possible way that it goes up to 38 seconds.
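The conversion above, written out as a small Python sketch. The two function names are made up for illustration; the inputs are the file size and throughput quoted in this comment and the median simdjson timing from the benchmark before it. Pairing that timing with this file size is an assumption, since the R script and the Python script produce similar but not identical files.

```python
def expected_parse_ms(n_bytes: int, gb_per_s: float) -> float:
    """Time in milliseconds to process n_bytes at a given throughput."""
    gigabytes = n_bytes / 1e9         # bytes -> GB (decimal)
    seconds = gigabytes / gb_per_s    # GB / (GB/s) -> s
    return seconds * 1000.0           # s -> ms


def throughput_gb_per_s(n_bytes: int, elapsed_ms: float) -> float:
    """Inverse direction: the throughput implied by a measured time."""
    return (n_bytes / 1e9) / (elapsed_ms / 1000.0)


# About 3 ms expected for the 2288896-byte file at 0.75 GB/s.
print(round(expected_parse_ms(2_288_896, 0.75), 2))

# Throughput implied by the ~37.6 s median reported earlier, assuming
# a file of roughly the same size.
print(throughput_gb_per_s(2_288_896, 37_634.016))
```

Reading a benchmark as GB/s rather than milliseconds makes results comparable across files of different sizes, which is the point being made here.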
With I don't think it is possible to build an input JSON such that it would take 38 seconds to parse 3 MB... I would argue that no non-broken JSON parser could possibly be that slow. |
Let's keep it apples to apples. It is no longer parse speed alone. @dcooley is trying to build a data structure to return to R, and we typically have a few constraints on the way (having a limited set of types is one). So there will be copies, and in phase one there may be extra copies. Such is life. I trust Dave, who has put together amazing stuff (off JSON input) for the mapdeck viz. Let's not quite shoot with real bullets yet.
@dcooley , I think I have a reasonable workflow for the integer stuff that won't be regretted (too badly) later that I brought up in #13. I just want to confirm that's the desired direction before it invades any code. I'm still trying to grok what the
It's meant to be a clone of Of course, this is the "easy" part; you're tackling a lot more with
js <- paste0(readLines("https://github.com/zemirco/sf-city-lots-json/raw/master/citylots.json"),
collapse = "")
pryr::object_size(js)
#> 189 MB
microbenchmark::microbenchmark(
jsonlite = jsonlite::parse_json(js),
simdjson = RcppSimdJson:::.parse_json(js)
,
times = 1,
check = "identical"
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> jsonlite 4.402949 4.402949 4.402949 4.402949 4.402949 4.402949 1
#> simdjson 1.212326 1.212326 1.212326 1.212326 1.212326 1.212326 1
rcppsimdjson_dir <- "~/Documents/rcppsimdjson/inst/jsonexamples/"
json_file_paths <- dir(rcppsimdjson_dir, pattern = "\\.json$", full.names = TRUE)
names(json_file_paths) <- dir(rcppsimdjson_dir, pattern = "\\.json$")
jsons <- vapply(
json_file_paths,
function(.x) paste0(readLines(.x, warn = FALSE), collapse = ""),
character(1L)
)
bench_marks <- mapply(
function(.x, .y) {
res <- microbenchmark::microbenchmark(
jsonlite = jsonlite::parse_json(.x),
simdjson = RcppSimdJson:::.parse_json(.x)
,
times = 10,
unit = "ms",
check = "identical"
)
cat("********************** ", .y, "\n")
print(res, order = "median")
cat("\n\n")
cbind(data.frame(file_name = .y), as.data.frame(res))
},
jsons,
names(jsons),
SIMPLIFY = FALSE
)
#> ********************** apache_builds.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 0.434170 0.451185 0.5107266 0.4735555 0.592085 0.608644 10
#> jsonlite 1.142164 1.173881 1.3241093 1.2536620 1.310450 2.024043 10
#>
#>
#> ********************** canada.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 7.230551 7.456116 7.875385 7.849235 8.224713 8.706415 10
#> jsonlite 42.288008 43.653252 44.711665 44.029844 46.164748 47.662902 10
#>
#>
#> ********************** citm_catalog.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 3.638028 3.704805 4.300136 3.886545 5.053951 5.899548 10
#> jsonlite 22.725979 22.934923 23.873911 23.696750 24.524041 25.694399 10
#>
#>
#> ********************** github_events.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 0.195300 0.207428 0.2620517 0.2547240 0.280591 0.427534 10
#> jsonlite 0.918447 0.960214 0.9794883 0.9879705 1.010506 1.017453 10
#>
#>
#> ********************** gsoc-2018.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 7.289759 7.891569 8.145152 7.990608 8.276493 9.496555 10
#> jsonlite 13.240555 13.401006 14.031844 13.863335 14.754366 15.289081 10
#>
#>
#> ********************** instruments.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 0.587803 0.589676 0.694585 0.6076295 0.811805 0.892138 10
#> jsonlite 2.040344 2.153086 2.228978 2.1947955 2.283114 2.598088 10
#>
#>
#> ********************** marine_ik.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 12.97164 13.33626 13.82794 13.56599 13.6018 16.91775 10
#> jsonlite 72.88100 73.37122 76.34738 75.26905 77.1654 87.12118 10
#>
#>
#> ********************** mesh.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 2.305229 2.391161 2.618197 2.558874 2.82882 3.046341 10
#> jsonlite 19.313140 19.674100 20.896420 21.152274 21.49891 23.122117 10
#>
#>
#> ********************** mesh.pretty.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 2.875675 2.90280 3.047431 3.013965 3.160781 3.32808 10
#> jsonlite 19.739430 20.92576 25.905227 27.926010 29.914193 30.26666 10
#>
#>
#> ********************** numbers.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 0.339428 0.341582 0.3677705 0.3715365 0.377728 0.417071 10
#> jsonlite 2.934479 2.941936 3.0214713 2.9529865 2.987781 3.581713 10
#>
#>
#> ********************** random.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 2.483639 2.506914 2.905289 2.568928 3.212158 4.380286 10
#> jsonlite 10.048730 10.181996 11.190389 10.640860 12.015381 13.486832 10
#>
#>
#> ********************** twitter.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 1.767523 2.140532 2.225777 2.201807 2.332409 2.853772 10
#> jsonlite 9.487721 9.667403 9.996055 9.958649 10.020133 10.824681 10
#>
#>
#> ********************** twitterescaped.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 1.890421 1.938308 2.100133 1.983586 2.315762 2.384929 10
#> jsonlite 5.442077 5.511250 5.902488 5.638079 6.064283 7.470741 10
#>
#>
#> ********************** update-center.json
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> simdjson 2.194598 2.274436 2.625021 2.625768 2.965949 3.070396 10
#> jsonlite 9.733094 9.853983 10.574512 10.206547 10.945863 12.812568 10
df <- do.call(rbind, bench_marks)
I've done some tests on my (I've made this test small and quick 'cos I was getting annoyed waiting for it to run each time I did a test. But this result is representative of larger examples)
n <- 1e4L
df <- data.frame(x = 1L:n)
js <- jsonify::to_json( df )
microbenchmark::microbenchmark(
jsonify = { res <- jsonify:::rcpp_get_dtypes( js ) },
rcppsimd = { RcppSimdJson:::rcpp_get_dtypes( js ) }
)
# Unit: microseconds
# expr min lq mean median uq max neval
# jsonify 657.069 673.832 715.9397 697.081 730.814 1013.196 100
# rcppsimd 85469.820 86976.375 94386.8532 92468.832 98570.788 124948.499 100
@knapply FYI the
@dcooley Yea, that seems weird. Have you tried it without using
w/o .get()
// [[Rcpp::plugins(cpp17)]]
// [[Rcpp::depends(RcppSimdJson)]]
#include <simdjson.h>
#include <simdjson.cpp>
#include <Rcpp.h>
#include <unordered_set>  // std::unordered_set, used by DTypes below
using namespace simdjson;
using Rcpp::_;
typedef std::unordered_set<dom::element_type> DTypes;
template <typename T>
bool is_homogeneous(T x) {
DTypes dtypes;
for (auto value : x) {
dtypes.insert(value.type());
}
return std::size(dtypes) == 1;
}
template <>
bool is_homogeneous<dom::object>(dom::object x) {
DTypes dtypes;
for (auto [key, value] : x) {
dtypes.insert(value.type());
}
return std::size(dtypes) == 1;
}
// [[Rcpp::export(.dtypes)]]
SEXP test() {
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018,
"tire_pressure": [ 40.1, 39.9 ] },
{ "make": "Kia", "model": "Soul", "year": 2012,
"tire_pressure": [ 30.1, 31.0 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999,
"tire_pressure": [ 29.8, 30.0 ] }
] )"_padded;
dom::parser parser;
dom::element cars = parser.parse(cars_json);
return Rcpp::List::create(
_["element"] = is_homogeneous<dom::element>(cars),
_["array"] = is_homogeneous<dom::array>(cars),
_["object"] = is_homogeneous<dom::object>(cars.at(0)),
_["array"] = is_homogeneous<dom::array>(cars.at(0).at("tire_pressure"))
);
}
/*** R
.dtypes()
##> $element
##> [1] TRUE
##>
##> $array
##> [1] TRUE
##>
##> $object
##> [1] FALSE
##>
##> $array
##> [1] TRUE
*/
After walking through and playing with
test <- '{"test":[1,[2,[3]]]}'
jsonlite::fromJSON(test)
#> $test
#> $test[[1]]
#> [1] 1
#>
#> $test[[2]]
#> $test[[2]][[1]]
#> [1] 2
#>
#> $test[[2]][[2]]
#> [1] 3
jsonify::from_json(test)
#> $test
#> $test[[1]]
#> [1] 1
#>
#> $test[[2]]
#> [,1]
#> [1,] 2
#> [2,] 3
To me, there's nothing to simplify here, so I also realized that things can be simplified down to matrices, which complicates the integer handling (possible
Not really a reference, but my rule is, round-trips have to work. So if you simplified down to a matrix, you couldn't then get back to the So I'm using "simplify" to mean, the simplest structure possible, without breaking the original JSON structure. But it looks like you've found an issue |
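The round-trip rule can be checked mechanically. Below is a minimal sketch in Python rather than R, using the `{"test":[1,[2,[3]]]}` example from the previous comment. The function name `round_trips` is made up, and writing the simplified 2x1 matrix as `[[2], [3]]` (one inner list per row) is an assumption about how such a matrix would serialize back to JSON.

```python
import json


def round_trips(original_json: str, simplified) -> bool:
    """True if serializing `simplified` gives back the original JSON value."""
    return json.loads(json.dumps(simplified)) == json.loads(original_json)


original = '{"test":[1,[2,[3]]]}'

# Nested lists kept as they are: the structure survives.
kept_as_list = {"test": [1, [2, [3]]]}

# [2,[3]] collapsed into a 2x1 matrix, written out row by row (assumption).
as_matrix = {"test": [1, [[2], [3]]]}

print(round_trips(original, kept_as_list))  # True
print(round_trips(original, as_matrix))     # False
```

Under this rule a simplification step is acceptable only when the check still passes, which is why the matrix result for `[2,[3]]` counts as an issue.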
Sorry! That wasn't what I intended. 🤦♂️😬 Edit: I'll move the example over there. |
I have a deserialization routine that seems to check a lot of boxes (multiple levels of type-strictness and simplification): https://github.com/knapply/rcppsimdjson/tree/feature/deserialize I haven't quite sorted out how to best handle nested data frames. The way jsonify and jsonlite go about it seems different enough that I need to reevaluate. What I'm envisioning is being able to clone I'm sure the code has issues (C++ has been a total uphill battle for me), but the results seem promising.
json1 <- readr::read_file(
"~/Documents/rcppsimdjson/inst/jsonexamples/canada.json"
)
json2 <- readr::read_file(
"~/Documents/rcppsimdjson/inst/jsonexamples/gsoc-2018.json"
)
microbenchmark::microbenchmark(
rcppsimdjson1 = RcppSimdJson:::.deserialize_json(json1),
jsonify1 = jsonify::from_json(json1),
jsonlite = jsonlite::fromJSON(json1)
,
times = 3
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rcppsimdjson1 4.474563 5.140195 6.12159 5.805827 6.945103 8.084379 3
#> jsonify1 45.232373 45.339811 47.12165 45.447250 48.066282 50.685314 3
#> jsonlite 462.022187 467.628038 478.05169 473.233888 486.066435 498.898981 3
microbenchmark::microbenchmark(
rcppsimdjson2 = RcppSimdJson:::.deserialize_json(json2),
jsonify2 = jsonify::from_json(json2),
jsonlite2 = jsonlite::fromJSON(json2)
,
times = 3
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rcppsimdjson2 9.390968 10.85924 12.44297 12.32752 13.96897 15.61043 3
#> jsonify2 30.635981 30.80953 33.61642 30.98309 35.10664 39.23019 3
#> jsonlite2 96.001989 98.44159 108.28356 100.88118 114.42435 127.96752 3
This is what it looks like in action ...
type_policy <- list(
anything_goes = 0,
ints_as_dbl = 1,
strict = 2
)
int64_opt <- list(
double = 0,
string = 1,
integer64 = 2
)
js <- '[[1,2,3],
["4","5",null],
[1,2,3.3],
[true,false,true],
[10000000000,20000000000,30000000000]]'
RcppSimdJson:::.deserialize_json(js)
#> [,1] [,2] [,3]
#> [1,] "1" "2" "3"
#> [2,] "4" "5" NA
#> [3,] "1" "2" "3.30"
#> [4,] "TRUE" "FALSE" "TRUE"
#> [5,] "10000000000" "20000000000" "30000000000"
RcppSimdJson:::.deserialize_json(js, type_policy = type_policy$ints_as_dbl)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] "4" "5" NA
#>
#> [[3]]
#> [1] 1.0 2.0 3.3
#>
#> [[4]]
#> [1] TRUE FALSE TRUE
#>
#> [[5]]
#> [1] 1e+10 2e+10 3e+10
RcppSimdJson:::.deserialize_json(js, type_policy = type_policy$strict)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] "4" "5" NA
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 1
#>
#> [[3]][[2]]
#> [1] 2
#>
#> [[3]][[3]]
#> [1] 3.3
#>
#>
#> [[4]]
#> [1] TRUE FALSE TRUE
#>
#> [[5]]
#> [1] 1e+10 2e+10 3e+10
RcppSimdJson:::.deserialize_json(js, type_policy = type_policy$strict,
int64_r_type = int64_opt$string)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] "4" "5" NA
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 1
#>
#> [[3]][[2]]
#> [1] 2
#>
#> [[3]][[3]]
#> [1] 3.3
#>
#>
#> [[4]]
#> [1] TRUE FALSE TRUE
#>
#> [[5]]
#> [1] "10000000000" "20000000000" "30000000000"
RcppSimdJson:::.deserialize_json(js, type_policy = type_policy$strict,
int64_r_type = int64_opt$integer64)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] "4" "5" NA
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 1
#>
#> [[3]][[2]]
#> [1] 2
#>
#> [[3]][[3]]
#> [1] 3.3
#>
#>
#> [[4]]
#> [1] TRUE FALSE TRUE
#>
#> [[5]]
#> integer64
#> [1] 10000000000 20000000000 30000000000
RcppSimdJson:::.deserialize_json('[{"id":1,"val":"a"},{"id":2,"val":"b"}]')
#> id val
#> 1 1 a
#> 2 2 b
RcppSimdJson:::.deserialize_json('[{"id":1,"val":"a"},{"id":2,"val":["b","c"]}]')
#> id val
#> 1 1 a
#> 2 2 b, c
RcppSimdJson:::.deserialize_json('[{"id":1,"val":"a"},{"id":2,"val":["b","c"]}]',
json_pointer = '1/val/0')
#> [1] "b"
... and these are the types of nested data frames that still need some thought...
x <- data.frame(driver = c("Bowser", "Peach"), occupation = c("Koopa", "Princess"))
x$vehicle <- data.frame(model = c("Piranha Prowler", "Royal Racer"))
x$vehicle$stats <- data.frame(speed = c(55, 56), weight = c(67, 24), drift = c(35, 32))
js <- jsonify::to_json(x)
str(jsonlite::fromJSON(js)) # identical() to jsonify
#> 'data.frame': 2 obs. of 3 variables:
#> $ driver : chr "Bowser" "Peach"
#> $ occupation: chr "Koopa" "Princess"
#> $ vehicle :'data.frame': 2 obs. of 2 variables:
#> ..$ model: chr "Piranha Prowler" "Royal Racer"
#> ..$ stats:'data.frame': 2 obs. of 3 variables:
#> .. ..$ speed : num 55 56
#> .. ..$ weight: num 67 24
#> .. ..$ drift : num 35 32
str(RcppSimdJson:::.deserialize_json(js))
#> 'data.frame': 2 obs. of 3 variables:
#> $ driver : chr "Bowser" "Peach"
#> $ occupation: chr "Koopa" "Princess"
#> $ vehicle :List of 2
#> ..$ :List of 2
#> .. ..$ model: chr "Piranha Prowler"
#> .. ..$ stats:List of 3
#> .. .. ..$ speed : num 55
#> .. .. ..$ weight: num 67
#> .. .. ..$ drift : num 35
#> ..$ :List of 2
#> .. ..$ model: chr "Royal Racer"
#> .. ..$ stats:List of 3
#> .. .. ..$ speed : num 56
#> .. .. ..$ weight: num 24
#> .. .. ..$ drift : num 32
As someone who came to R well after data.table and dplyr came about, the multi-column data frames that jsonify/jsonlite build are completely bizarre to me. There's also this...
Warning message:
In data.table::setDT(x) :
Some columns are a multi-column type (such as a matrix column): [3]. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.
I'm not suggesting that only "enhanced" data frame users be considered; it's more that the power of RcppSimdJson is going to be in the ability to ingest yuge data sets, so being able to hand off to those packages with minimal fidgeting would be nice. With that in mind, if there's a standard (or even legacy?) use-case for them, it'd be helpful to know what that is so we can consider what options best support that. If anyone has thoughts on that, I'd love to hear them.
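For readers trying to follow the `type_policy` levels used in the examples earlier in this comment, here is a rough Python model of the decision they seem to encode. It is inferred only from the printed outputs, not from the C++ implementation, so every rule in it is an assumption; it also ignores 64-bit integers and matrix simplification. `TypePolicy`, `scalar_kind` and `vector_type` are illustrative names.

```python
import json
from enum import IntEnum


class TypePolicy(IntEnum):
    ANYTHING_GOES = 0
    INTS_AS_DBL = 1
    STRICT = 2


def scalar_kind(value):
    """Name the R vector type a single JSON scalar would map to."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "logical"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "character"
    raise TypeError(f"not a JSON scalar: {value!r}")


def vector_type(values, policy):
    """Type a flat array collapses to, or None to keep it as a list."""
    kinds = {scalar_kind(v) for v in values} - {"null"}  # null turns into NA
    if len(kinds) == 0:
        return "logical"  # assumption: an all-null array becomes logical NA
    if len(kinds) == 1:
        return next(iter(kinds))
    if policy == TypePolicy.STRICT:
        return None  # mixed types are never merged
    if kinds == {"integer", "double"}:
        return "double"  # integers promoted to doubles
    if policy == TypePolicy.ANYTHING_GOES:
        return "character"  # assumption: any other mix falls back to strings
    return None


rows = json.loads('[[1,2,3],["4","5",null],[1,2,3.3],[true,false,true]]')
for policy in TypePolicy:
    print(policy.name, [vector_type(row, policy) for row in rows])
```

The row `[1,2,3.3]` is the one that separates the levels: it stays a list under `strict` and becomes a double vector under `ints_as_dbl`, matching the outputs shown.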
The timings are very enticing. And being able to deal with 'simple' structures (but at scale) has total merit. Think |
That's 99% my use-case, but sadly they're not always simple.
The cool thing about simdjson's "JSON pointer" capability is that it will minimize the need for the insanely tedious mapping to custom structures I had to do there. I'll pull the deserialize routine over here. It is not exactly simple (the type dynamism was... rough), but more sets of eyes may help.
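To make the "JSON pointer" idea concrete, here is a small Python sketch of what resolving a pointer means, using the two pointers that appear earlier in this thread (`big_ints/0/c/1` and `1/val/0`). It is not simdjson's implementation: the helper name `at` is only borrowed from the C++ call, and it skips the `~0`/`~1` escapes and empty keys that RFC 6901 defines. Pointers are accepted with or without a leading slash, since the thread's examples omit it.

```python
import json


def at(document, pointer: str):
    """Walk `document` one pointer token at a time and return the target."""
    tokens = [t for t in pointer.split("/") if t != ""]
    current = document
    for token in tokens:
        if isinstance(current, list):
            current = current[int(token)]  # array: token is an index
        elif isinstance(current, dict):
            current = current[token]       # object: token is a key
        else:
            raise KeyError(f"cannot descend into a scalar at {token!r}")
    return current


big = json.loads(
    '{"big_ints":[{"a":1178007955838509057,"b":2356015911677018114,'
    '"c":[2356015911677018114,4712031823354036228]}]}'
)
print(at(big, "big_ints/0/c/1"))  # 4712031823354036228

records = json.loads('[{"id":1,"val":"a"},{"id":2,"val":["b","c"]}]')
print(at(records, "1/val/0"))     # b
```

Only the addressed value has to be converted to an R object, which is where the saving over mapping the whole document to custom structures comes from.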
I think as long as the underlying data relationships are maintained, then the R representation shouldn't really matter. So if there is a good way of representing nested JSON objects in a way suitable for |
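One flat representation that keeps the relationships is a separate column per nested field, which is what the data.table warning quoted earlier describes for as.data.table(). A minimal Python sketch of that idea follows, using the driver/vehicle example from above. The `flatten` helper and the dotted column names are illustrative choices, not something any of the packages discussed here is confirmed to do.

```python
def flatten(record: dict, prefix: str = "", sep: str = ".") -> dict:
    """Turn nested objects into one flat mapping with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested object
        else:
            flat[name] = value
    return flat


rows = [
    {"driver": "Bowser", "occupation": "Koopa",
     "vehicle": {"model": "Piranha Prowler",
                 "stats": {"speed": 55, "weight": 67, "drift": 35}}},
    {"driver": "Peach", "occupation": "Princess",
     "vehicle": {"model": "Royal Racer",
                 "stats": {"speed": 56, "weight": 24, "drift": 32}}},
]

for row in rows:
    print(flatten(row))
```

Each row ends up with the same six columns (`driver`, `occupation`, `vehicle.model`, `vehicle.stats.speed`, and so on), so nothing is lost and the result can be handed to a column-oriented table without multi-column types.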
That's what I'm thinking as well, but I'm wondering if any folks rely on that structure. Just food for thought. Here's a "fairer" benchmark from #17 using a bigger file.
# js <- readr::read_file("https://github.com/zemirco/sf-city-lots-json/raw/master/citylots.json")
js <- readr::read_file("~/Documents/citylots.json")
bench::mark(
rcppsimdjson = rcppsimdjson <- RcppSimdJson:::.deserialize_json(js),
jsonify = jsonify <- jsonify:::rcpp_from_json(js, simplify = T, fill_na = F),
jsonlite = jsonlite <- jsonlite:::parse_and_simplify(js, simplifyVector = T, simplifyDataFrame = T, simplifyMatrix = T)
,
filter_gc = FALSE,
check = FALSE
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 rcppsimdjson 895.88ms 895.88ms 1.12 58.6MB 0
#> 2 jsonify 7.49s 7.49s 0.134 104.5MB 0.668
#> 3 jsonlite 37.85s 37.85s 0.0264 369MB 1.24
microbenchmark::microbenchmark(
rcppsimdjson = rcppsimdjson <- RcppSimdJson:::.deserialize_json(js),
jsonify = jsonify <- jsonify:::rcpp_from_json(js, simplify = T, fill_na = F),
jsonlite = jsonlite <- jsonlite:::parse_and_simplify(js, simplifyVector = T, simplifyDataFrame = T, simplifyMatrix = T)
,
times = 1
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rcppsimdjson 887.617 887.617 887.617 887.617 887.617 887.617 1
#> jsonify 6964.756 6964.756 6964.756 6964.756 6964.756 6964.756 1
#> jsonlite 35507.670 35507.670 35507.670 35507.670 35507.670 35507.670 1
I'm not sure how accurate
It isn't surprising that the simplification process is the bottleneck, but it's much more than I expected. For comparison, this is what happens without any simplification.
bench::mark(
rcppsimdjson = rcppsimdjson <- RcppSimdJson:::.deserialize_json(js, simplify_to = 3),
jsonify = jsonify <- jsonify:::rcpp_from_json(js, simplify = F, fill_na = F),
jsonlite = jsonlite <- jsonlite:::parse_and_simplify(js, simplifyVector = F, simplifyDataFrame = F, simplifyMatrix = F)
,
filter_gc = FALSE,
check = FALSE
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 rcppsimdjson 1.11s 1.11s 0.901 13.1MB 0
#> 2 jsonify 3.27s 3.27s 0.306 13.1MB 0.612
#> 3 jsonlite 6.68s 6.68s 0.150 13.1MB 0.150 |
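As a rough illustration of why simplification adds work on top of parsing, here is a base-R sketch of the simplest case: deciding whether a list of values can collapse into an atomic vector. `simplify_scalars()` is a hypothetical helper, not how any of the three packages implement it; it only shows that every element has to be inspected (length, then type) before anything can be collapsed.

```r
# Hypothetical helper: collapse a list into an atomic vector only when every
# element is a length-1 atomic value of a single type; otherwise return the
# list unchanged.
simplify_scalars <- function(x) {
  if (!is.list(x) || length(x) == 0L) return(x)
  is_scalar <- vapply(
    x, function(el) is.atomic(el) && length(el) == 1L, logical(1L)
  )
  if (!all(is_scalar)) return(x)
  types <- unique(vapply(x, typeof, character(1L)))
  if (length(types) != 1L) return(x)
  unlist(x, use.names = FALSE)
}

simplify_scalars(list(1, 2, 3))        # collapses to a double vector
simplify_scalars(list(1, "a"))         # mixed types: stays a list
simplify_scalars(list(1, list(2, 3)))  # nested element: stays a list
```

Matrices and data frames need the same kind of checks repeated at every level of nesting, so the cost grows with the size and depth of the document.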
To move the conversation forward about what the user-facing API should look like, here's a prototype w/ some data.table-inspired functionality (text, url, file download, decompress, etc), but "vectorized" over multiple strings, URLs, and files. Don't be shy. It's meant to instigate opinions, criticism, discussion etc. (and definitely has bugs)
Early prototype
.file_extension <- function(x, dot = TRUE, ignore_zip_ext = FALSE) {
if (ignore_zip_ext) {
base_name <- sub("\\.[bgx]z2?$", "", basename(x))
} else {
base_name <- basename(x)
}
captures <- regexpr("(?<!^|[.]|/)[.]([^.]+)$", base_name, perl = TRUE)
out <- rep(NA_character_, length(x))
out[captures > 0L] <- substring(base_name[captures > 0L], captures[captures > 0L])
if (dot) out else substring(out, 2L)
}
.url_prefix <- function(x) {
vapply(x, function(.x) {
if (substring(.x, 1L, 8L) == "https://") {
"https://"
} else if ((prefix <- substring(.x, 1L, 7L)) %in% c("http://", "ftps://", "file://")) {
prefix
} else if (substring(.x, 1L, 6L) == "ftp://") {
"ftp://"
} else {
NA_character_
}
}, character(1L), USE.NAMES = FALSE)
}
.diagnose_input <- function(x, diagnose_type = TRUE) {
init <- list(
input = x,
url_prefix = .url_prefix(x),
file_ext = .file_extension(x)
)
init$compressed <- init$file_ext %in% c(".gz", ".bz", ".bz2", ".xz")
if (diagnose_type) {
if (!anyNA(init$url_prefix)) {
init$type <- "url"
} else if (!anyNA(init$file_ext)) {
init$type <- "file"
} else {
init$type <- "text"
}
}
structure(init, class = "data.frame", row.names = seq_along(x))
}
fparse <- function(input = NULL,
json_pointer = "",
input_type = c("auto", "text", "file", "url"),
empty_array = NULL,
empty_object = NULL,
max_simplify_lvl = c("data_frame", "matrix", "vector", "none"),
type_policy = c("anything_goes", "numbers", "strict"),
int64_opt = c("double", "string", "integer64"),
verbose = FALSE,
temp_dir = tempdir(),
keep_temp_files = FALSE) {
# validate arguments =========================================================
# types ----------------------------------------------------------------------
if (!is.character(json_pointer) || is.na(json_pointer) || length(json_pointer) != 1L) {
stop("`json_pointer=` must be a single, non-`NA` `character`.")
}
if (!is.character(input)) {
stop("`input=` must be a `character`.")
}
if (any(is.na(input)) || any(nchar(input) == 0L)) {
stop("`input=` contains `NA`s or empty strings.")
}
if (!dir.exists(temp_dir)) {
stop("`temp_dir=` does not exist.")
}
# prep options ===============================================================
# max_simplify_lvl -----------------------------------------------------------
if (!is.character(max_simplify_lvl) && !is.numeric(max_simplify_lvl)) {
stop("`max_simplify_lvl` must be of type `character` or `numeric`.")
}
if (is.numeric(max_simplify_lvl)) {
stopifnot(max_simplify_lvl %in% 0:3)
} else { # (is.character(max_simplify_lvl)) {
max_simplify_lvl <- switch(
match.arg(max_simplify_lvl, c("data_frame", "matrix", "vector", "none")),
data_frame = 0L,
matrix = 1L,
vector = 2L,
none = 3L,
stop("Unknown `max_simplify_lvl` argument.")
)
}
# type_policy ----------------------------------------------------------------
if (!is.character(type_policy) && !is.numeric(type_policy)) {
stop("`type_policy` must be of type `character` or `numeric`.")
}
if (is.numeric(type_policy)) {
stopifnot(type_policy %in% 0:2)
} else { # if (is.character(type_policy)) {
type_policy <- switch(
match.arg(type_policy, c("anything_goes", "numbers", "strict")),
anything_goes = 0L,
numbers = 1L,
strict = 2L,
stop("Unknown `type_policy` argument.")
)
}
# int64_opt ------------------------------------------------------------------
if (!is.character(int64_opt) && !is.numeric(int64_opt)) {
stop("`int64_opt` must be of type `character` or `numeric`.")
}
if (is.numeric(int64_opt)) {
stopifnot(int64_opt %in% 0:2)
} else { # if (is.character(int64_opt)) {
int64_opt <- switch(
match.arg(int64_opt, c("double", "string", "integer64")),
double = 0L,
string = 1L,
integer64 = 2L,
stop("Unknown `int64_opt` argument.")
)
}
if (int64_opt == 2L && !requireNamespace("bit64", quietly = TRUE)) {
stop('`int64_opt = "integer64"`, but the {bit64} package is not installed.')
}
# diagnose input_type ========================================================
input_type <- match.arg(input_type, c("auto", "text", "file", "url"))
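# auto -----------------------------------------------------------------------
if (input_type == "auto") {
if (any(substring(input, 1L, 1L) %in% c(" ", "{", "[", '"')) || any(substring(input, 1L, 4L) == "null")) {
input_type <- "text"
} else {
diagnosis <- .diagnose_input(input)
input_type <- unique(diagnosis$type)
if (length(input_type) != 1L) {
stop ("`input` should all be of the same `input_type`. Types detected:",
sprintf("\n\t- %s", input_type))
}
}
} else if (input_type != "text") {
# explicit "file" or "url": the branches below still need `diagnosis`
diagnosis <- .diagnose_input(input, diagnose_type = FALSE)
diagnosis$type <- input_type
}
# defined unconditionally so the url and file branches can always loop over it
seq_input <- seq_along(input)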
# url ------------------------------------------------------------------------
if (input_type == "url") {
for (i in seq_input) {
temp_file <- tempfile(fileext = diagnosis$file_ext[[i]], tmpdir = temp_dir)
switch(
diagnosis$url_prefix[[i]],
"https://" = ,
"ftps://" = ,
"http://" = ,
"ftp://" = download.file(diagnosis$input[[i]], destfile = temp_file, method = getOption("download.file.method", default = "auto"), quiet = !verbose),
"file://" = download.file(diagnosis$input[[i]], destfile = temp_file, method = "internal", quiet = !verbose),
stop("Unknown URL prefix")
)
diagnosis$input[[i]] <- temp_file
diagnosis$type[[i]] <- "file"
}
input_type <- unique(diagnosis$type)
stopifnot(length(input_type) == 1L)
if (!keep_temp_files) {
on.exit(unlink(diagnosis$input), add = TRUE)
}
}
# file -----------------------------------------------------------------------
input_decompressed <- FALSE
if (input_type == "file") {
if (any(diagnosis$compressed)) { # temporary... this can be done w/o materializing R strings in C++ for at least .gz, and Suggests to support others (?)
.input <- vector("character", length = length(input))
input_decompressed <- TRUE
if (verbose) message("Compressed files found. Decompressing...")
for (i in seq_input) {
if (diagnosis$compressed[[i]]) {
decomp_type <- switch(
diagnosis$file_ext[[i]],
".gz" = "gzip",
".bz" = ,
".bz2" = "bzip2",
".xz" = "xz"
,
"unknown"
)
con <- file(diagnosis$input[[i]], open = "rb")
raw_vec <- readBin(con, what = "raw", n = file.size(diagnosis$input[[i]]))
close(con)
.input[[i]] <- memDecompress(raw_vec, type = decomp_type, asChar = TRUE)
}
}
input_type <- "text"
} else {
diagnosis$input <- Sys.glob(diagnosis$input)
}
}
# set names ==================================================================
if (input_type != "text") {
.input <- diagnosis$input
}
if (input_type != "text" || input_decompressed) {
if (length(names(input))) {
names(.input) <- names(input)
} else {
names(.input) <- basename(input)
}
}
# deserialize ================================================================
switch(
input_type,
"text" = RcppSimdJson:::.deserialize_json(
json = if (input_decompressed) .input else input,
json_pointer = json_pointer,
empty_array = empty_array,
empty_object = empty_object,
simplify_to = max_simplify_lvl,
type_policy = type_policy,
int64_r_type = int64_opt
),
"file" = RcppSimdJson:::.load_json(
file_path = .input,
json_pointer = json_pointer,
empty_array = empty_array,
empty_object = empty_object,
simplify_to = max_simplify_lvl,
type_policy = type_policy,
int64_r_type = int64_opt
)
,
stop("Unknown `input_type`.")
)
}
files <- dir("~/Documents/rcppsimdjson/inst/jsonexamples/", pattern = "\\.json$", full.names = TRUE, recursive = TRUE)
urls <- c(
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/apache_builds.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/mesh.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/citm_catalog.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/canada.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/twitter.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/github_events.json",
"https://raw.githubusercontent.com/eddelbuettel/rcppsimdjson/master/inst/jsonexamples/gsoc-2018.json"
)
gz_files <- sapply(
files[1:10],
function(.x) {
R.utils::compressFile(
.x, remove = FALSE, FUN = gzfile, ext = "gz",
destname = sprintf("%s/%s%s", tempdir(), basename(.x), ".gz")
)
}, USE.NAMES = FALSE
)
json_text <- c("[1,2,3]",'[4,5,6]')
fparse(json_text)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 4 5 6
parsed_files <- fparse(files)
names(parsed_files)
#> [1] "apache_builds.json"
#> [2] "canada.json"
#> [3] "citm_catalog.json"
#> [4] "github_events.json"
#> [5] "gsoc-2018.json"
#> [6] "instruments.json"
#> [7] "marine_ik.json"
#> [8] "mesh.json"
#> [9] "mesh.pretty.json"
#> [10] "numbers.json"
#> [11] "random.json"
#> [12] "adversarial.json"
#> [13] "demo.json"
#> [14] "flatadversarial.json"
#> [15] "che-1.geo.json"
#> [16] "che-2.geo.json"
#> [17] "che-3.geo.json"
#> [18] "google_maps_api_compact_response.json"
#> [19] "google_maps_api_response.json"
#> [20] "twitter_api_compact_response.json"
#> [21] "twitter_api_response.json"
#> [22] "repeat.json"
#> [23] "smalldemo.json"
#> [24] "truenull.json"
#> [25] "twitter_timeline.json"
#> [26] "twitter.json"
#> [27] "twitterescaped.json"
#> [28] "update-center.json"
download_and_parse_files <- fparse(urls)
names(download_and_parse_files)
#> [1] "apache_builds.json" "mesh.json" "citm_catalog.json"
#> [4] "canada.json" "twitter.json" "github_events.json"
#> [7] "gsoc-2018.json"
inflate_and_parse <- fparse(gz_files)
names(inflate_and_parse)
#> [1] "apache_builds.json.gz" "canada.json.gz" "citm_catalog.json.gz"
#> [4] "github_events.json.gz" "gsoc-2018.json.gz" "instruments.json.gz"
#> [7] "marine_ik.json.gz" "mesh.json.gz" "mesh.pretty.json.gz"
#> [10] "numbers.json.gz" |
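The three option blocks in the prototype (`max_simplify_lvl`, `type_policy`, `int64_opt`) all follow the same pattern: accept either a name or an integer code and hand a zero-based integer to C++. One way to keep them from drifting apart is a single helper. This is only a sketch; `resolve_option()` is a hypothetical name and nothing like it exists in RcppSimdJson.

```r
# Hypothetical helper: accept an option name (partial matching allowed) or its
# zero-based integer code, and always return the integer code.
resolve_option <- function(value, choices, arg_name) {
  if (is.numeric(value)) {
    codes <- seq_along(choices) - 1L
    if (length(value) != 1L || !value %in% codes) {
      stop(sprintf("`%s` must be one of: %s.", arg_name, paste(codes, collapse = ", ")))
    }
    return(as.integer(value))
  }
  if (!is.character(value)) {
    stop(sprintf("`%s` must be of type `character` or `numeric`.", arg_name))
  }
  # match.arg() returns the first choice when `value` is the full default vector
  match(match.arg(value, choices), choices) - 1L
}

simplify_levels <- c("data_frame", "matrix", "vector", "none")

by_name    <- resolve_option("vector", simplify_levels, "max_simplify_lvl")
by_default <- resolve_option(simplify_levels, simplify_levels, "max_simplify_lvl")
by_code    <- resolve_option(3, simplify_levels, "max_simplify_lvl")
```

With this, each option in `fparse()` would reduce to one call, and the range check is derived from the choices rather than typed out per option.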
I think we can close this now that 0.1.0 is out. Please re-open with details if something is still amiss. |
I've been working on a prototype from_json() functionality here in my fork, which follows the exact same logic as jsonify.
A demo of its current output is
Are you happy for me to make a PR so this from_json() lives inside RcppSimdJson, or would you prefer RcppSimdJson to remain as an 'Interface' library, clear of any R clutter?
Also tagging in @knapply, who has been working on something similar and may have another implementation.