Accept json=raw() directly to skip rawToChar step on JSON received from curl #35

Closed
MichaelChirico opened this issue Jul 8, 2020 · 3 comments

@MichaelChirico MichaelChirico commented Jul 8, 2020

As identified in this Twitter thread:

https://twitter.com/michael_chirico/status/1280656819606548480

See this Gist:

https://gist.github.com/MichaelChirico/f5e09ab9f5f437bb0286e8a42941a3e1

The performance of fparse is already damn impressive, but let's see if we can't do a mite better 😎

JSON as raw can be retrieved like so:

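# Raw URL of the Gist file; its last line holds the JSON payload
gist = file.path(
  'https://gist.githubusercontent.com/MichaelChirico',
  'f5e09ab9f5f437bb0286e8a42941a3e1', 'raw',
  'ab5f767b54810b53b30841ffe7f614aa07a32be0', 'presto_json_return.R'
)
# Convert that last line to a raw vector, mimicking what curl returns
charToRaw(tail(readLines(gist), 1L))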

IINM from C++ POV this raw vector should just be a subset of a character vector...
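
To make the request concrete, here is a minimal sketch of the round trip in question. The payload is a made-up stand-in for a real response body, and the final call assumes an installed RcppSimdJson version that includes the raw vector support added for this issue; the guard simply skips it when the package is missing.

```r
# Hypothetical small payload standing in for a real HTTP response body.
json_chr <- '{"columns":["a","b"],"data":[[1,"x"],[2,"y"]]}'

# HTTP clients such as curl hand back the body as a raw vector of bytes.
json_raw <- charToRaw(json_chr)

# The step the issue proposes to skip: copying those bytes into an R string.
round_trip <- rawToChar(json_raw)
identical(round_trip, json_chr)

# Assumption: RcppSimdJson with raw vector support is installed.
# If the package is absent, the block is skipped.
if (requireNamespace("RcppSimdJson", quietly = TRUE)) {
  parsed <- RcppSimdJson::fparse(json_raw)
  str(parsed)
}
```

The raw vector and the string hold the same bytes, so passing the raw vector straight to the parser only drops the intermediate copy.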

@MichaelChirico MichaelChirico commented Jul 8, 2020

The potential kibosh for this would be encoding issues, though rawToChar also would not work for that case, so users with encoding issues can do iconv themselves I guess?
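
A rough illustration of that "do iconv themselves" route, assuming a body that arrives as Latin-1 bytes (the four bytes below are a made-up example, not taken from the Gist). A user in this situation would re-encode to UTF-8 before handing the data to a JSON parser:

```r
# Hypothetical response body: "café" encoded as Latin-1 (0xe9 is é).
latin1_raw <- as.raw(c(0x63, 0x61, 0x66, 0xe9))

# rawToChar() copies the bytes as-is; it does not validate or convert them.
latin1_chr <- rawToChar(latin1_raw)

# Re-encode to UTF-8, which is what JSON parsers generally expect.
utf8_chr <- iconv(latin1_chr, from = "latin1", to = "UTF-8")

# Back to raw if the parser is going to be fed bytes directly.
utf8_raw <- charToRaw(utf8_chr)
utf8_raw
```

The é becomes the two-byte UTF-8 sequence c3 a9, so the re-encoded raw vector is one byte longer than the input.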

@MichaelChirico MichaelChirico commented Jul 8, 2020

Still working on adapting my code to use RcppSimdJson & benchmarking, so another musing for the day --

I think a major choke point of my current code is the regular gc()s that are happening, which skipping rawToChar could potentially avert. IINM I am calling rawToChar on a huge string for every batch returned by GET, then parsing out my data & "discarding" the rather large strings (JSON objects with maybe hundreds of rows and/or columns), which by then are in the session's string cache (since rawToChar will do mkChar).

It's also something to keep in mind for benchmarking -- unless this phenomenon is captured, the benefit of dropping rawToChar might be understated.
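
One way to capture that in a benchmark, sketched with base R only: time many batches inside a single measurement, with gcFirst = FALSE, and read gc.time() before and after so collector time is visible. The payload and batch count here are arbitrary assumptions, not figures from my actual workload.

```r
# Hypothetical payload: a JSON array of 50,000 small objects, as raw bytes.
payload <- charToRaw(
  paste0("[", paste(rep('{"a":1,"b":"x"}', 5e4), collapse = ","), "]")
)
n_batches <- 20L

gc_start <- gc.time()  # cumulative time spent in the garbage collector
elapsed <- system.time(
  for (i in seq_len(n_batches)) {
    txt <- rawToChar(payload)        # the conversion under discussion
    n <- nchar(txt, type = "bytes")  # stand-in for the real parsing work
  },
  gcFirst = FALSE  # do not force a collection before timing starts
)
gc_spent <- gc.time() - gc_start

elapsed
gc_spent
```

Comparing gc_spent against elapsed gives a rough idea of how much of the loop went to collection rather than conversion; timing a single conversion in isolation would likely miss it.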

knapply added a commit that referenced this issue Jul 8, 2020
add raw vector support (plus docs, example, tests)

@eddelbuettel eddelbuettel commented Jul 8, 2020

Done in #36

knapply added a commit that referenced this issue Jul 14, 2020
reuse parser for multiple raw vectors