Rewrite core parser #26
Comments
We have decided that slightly better prep/preprocessing client-side is a better idea (see the development branch).
I agree not to pursue this. The parser that is part of the DS2 JSON package is terrible! You may consider using the JSON libname engine instead: assuming the data structures being transmitted do not require custom mapping (which I don't think they would for the intended purposes here), it would provide the simplest method for digesting them. When sending data rows to SAS you could even emulate the schema that SAS itself uses when outputting data to JSON via PROC JSON, for example.
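To make the "emulate the PROC JSON schema" idea concrete, here is a minimal sketch in Python (the language the thread later benchmarks against). The envelope key names (`SASJSONExport`, `SASTableData+NAME`) are an assumption modelled on PROC JSON's EXPORT output and should be verified against a real PROC JSON run; the helper is hypothetical, not part of h54s.

```python
import json

def rows_to_sas_json(table_name, rows):
    """Wrap a list of row dicts in a PROC JSON-like envelope.

    Key names are assumed from PROC JSON EXPORT output, not taken
    from the h54s codebase.
    """
    return json.dumps({
        "SASJSONExport": "1.0",
        "SASTableData+" + table_name.upper(): rows,
    })

payload = rows_to_sas_json("class", [
    {"Name": "Alfred", "Sex": "M", "Age": 14},
    {"Name": "Alice", "Sex": "F", "Age": 13},
])
```

Matching the shape SAS itself emits would let the server side ingest rows with no custom mapping at all.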
If you move away from your custom solution to the DS2 or JSON libname approach, that would make H54S dependent on the SAS 9.4 release, making it unusable for anyone on a SAS version below 9.4. Is it worth it?
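The version trade-off raised here could be checked programmatically. A hedged sketch (a hypothetical helper, not part of h54s) that gates the 9.4M4-only JSON libname path on a SAS version string of the shape reported by the `&SYSVLONG` automatic macro variable; the exact string format (e.g. `9.04.01M4P110916`) is an assumption:

```python
import re

def supports_json_libname(sysvlong: str) -> bool:
    """Return True if a version string like '9.04.01M4P110916'
    indicates SAS 9.4M4 or later, when the JSON libname engine
    appeared (format of the string is assumed)."""
    m = re.match(r"(\d+)\.(\d+)\.\d+M(\d+)", sysvlong)
    if not m:
        return False  # unrecognised string: assume the engine is absent
    major, minor, maint = (int(g) for g in m.groups())
    return (major, minor, maint) >= (9, 4, 4)
```

A client could use a check like this to fall back to the legacy parser on pre-9.4M4 hosts rather than failing outright.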
@vnevlev that is a valid point of concern. The DS2 package was not released until 9.4M3 (Jul 2015) and the JSON libname engine was not added until 9.4M4 (Nov 2016), which makes it somewhat 'cutting-edge' as far as Base SAS is concerned.
@FriedEgg @vnevlev Agreeing with both of you here. I think spinning up DS2 each time we want to deserialise a dataset is overkill, and we don't need the full support of the libname engine: we don't need to adhere to the strict JSON spec here, since we're able to prep our dataset at the frontend. For parsing we're testing a much simpler data step-based solution which was suggested by Bruno Müller as part of this SAS Communities thread: https://communities.sas.com/t5/SAS-Enterprise-Guide/Reading-json-file-into-SAS-using-SAS-enterprise-guide/m-p/205050/highlight/true#M15389

So we're changing a couple of fundamental parser bits here:
- different client-side preparation to mask SAS-sensitive chars (i.e. replace \" with "", some other bits), instead of relying on escape() and unescape() and the urlencode and urldecode SAS functions for masking & unmasking of special characters
- for the most part, getting rid of parameter/macro based processing, and sending data by constructing file upload bodies as part of a multipart POST payload. Parameters are only used for body metadata, not the data itself. Amongst other things, like the potential for full UTF-8 support, this change results in a very decent performance boost.

2MB of data on the old parser used to take a single thread up to 15 seconds at full blast. Using the ('new') data step method it takes 150ms-ish, with veeery little CPU and less memory:

before: https://cloud.githubusercontent.com/assets/11962123/24071822/08e85c4a-0bd3-11e7-9391-17fa6835fcf5.png
after: https://cloud.githubusercontent.com/assets/11962123/24071840/54e6c76c-0bd3-11e7-9d63-c34b30030bb7.png
(cpu is red, res mem is orange)

This looks like a parser performance boost of epic proportions, but I want to do some more real-world load testing with JMeter to see what the implications are for JVM & spawner stability. Looking very, very good at the moment though; if nothing else, it massively reduces the startup cost of the deserialiser, which tends to be spun up with each request to the server. Anecdotally, parsing the same JSON in Python:

    with open('data.json') as data_file:
        data = json.load(data_file)

...takes 3.5x longer than the new data step method.

On the downside, we're replacing our 'masked json' output macro with PROC JSON output, which only came about in 9.3. If anyone requires 9.2 support we can maintain a legacy branch that uses the old output method; it's not a huge deal.

PR to the development branch coming in the next few days. Been playing with this since October, but we've just kicked off a project requiring proper UTF8 support, so...
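The client-side masking change described above can be sketched roughly. This is an illustrative guess at the kind of substitution meant by "replace \" with \"\"", not the adapter's actual masking table, and a real implementation would cover several more characters:

```python
def mask_for_sas(json_text: str) -> str:
    """Double up escaped quotes so the payload survives SAS-side
    quoting. The single substitution here is illustrative only;
    the real adapter masks other SAS-sensitive characters too."""
    return json_text.replace('\\"', '""')

def unmask_in_sas_style(masked: str) -> str:
    """Inverse transform, as the server-side data step would apply."""
    return masked.replace('""', '\\"')
```

One design point this illustrates: a fixed substitution pair like this avoids the per-character cost of urlencode/urldecode round-trips, which is part of where the performance win comes from.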
That's good news on performance. The only thing to add: PROC JSON wasn't in base 9.3; it appeared in 9.3M1 as experimental. I remember trying to use it on 9.3M1 and it wasn't doing a very good job.
Interesting. We're running UTF-8 integration tests, and while the majority of it works great, certain characters still trip PROC JSON up. This was the last couple of characters it returned before it errored: [image]
Seems like it's doing a better job, but it might just be that we need better tests...
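On "we need better tests": one cheap way to harden the suite is to round-trip a set of characters that commonly stress JSON encoders (multi-byte UTF-8, combining marks, and characters outside the Basic Multilingual Plane, such as the 🎱 that appears later in this thread). A Python sketch of such a check; the character list is illustrative, not the project's actual test fixture:

```python
import json

# Characters that commonly trip serialisers: accented Latin, CJK,
# a combining mark, a 4-byte (non-BMP) emoji, and escaped quotes.
tricky = ["Müller", "北京", "a\u0301", "🎱", '\\" quoted \\"']

def roundtrips(value):
    """True if the value survives a JSON encode/decode cycle."""
    return json.loads(json.dumps(value, ensure_ascii=False)) == value

results = {s: roundtrips(s) for s in tricky}
```

The same fixture strings could be pushed through the SAS round trip to see which ones PROC JSON actually chokes on.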
False alarm. UTF-8 is complicated 🎱
False false alarm, it's not that complicated. You just have to specify …
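The comment above is truncated, so the exact SAS option being referred to is lost. The general fix it hints at, declaring the encoding explicitly rather than trusting platform defaults, looks like this in Python terms (an analogy only, not the SAS setting the author meant):

```python
import json
import os
import tempfile

# Write and read JSON with the encoding stated explicitly; relying
# on the platform default is what tends to mangle non-ASCII data.
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump({"name": "Müller", "emoji": "🎱"}, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    data = json.load(f)
```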
In the 2.0 release of the adapter (which we just released) we have moved away from parameters and have introduced a dependency on … Developers wishing to use h54s to build applications on older versions of SAS will still be able to use v1.0 or v0.11.3 of the adapter. Closing this issue.
Currently we're loading all macro params into a single-variable WORK dataset and using PRX functions to work through one 32k chunk at a time. DS2 might give us a way of parallelising the processing of multiple frames when datasets exceed the 32k limit.
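The 32k-chunk walk described above can be sketched as follows. The chunk size mirrors SAS's 32,767-character variable limit, and the sketch also shows why parallelising with DS2 is non-trivial: a JSON token can straddle a frame boundary, so frames cannot be parsed fully independently without carrying state between them. The helper names are illustrative, not from the h54s source.

```python
CHUNK = 32767  # mirrors the SAS character-variable limit

def iter_frames(text, size=CHUNK):
    """Yield the payload one fixed-size frame at a time, the way
    the macro-parameter text is walked chunk by chunk."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

# Reassembling the frames must reproduce the original payload; any
# per-frame parser has to handle tokens split across boundaries.
payload = '{"col":"' + "x" * 100000 + '"}'
frames = list(iter_frames(payload))
```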
Also look at the JSON package in 9.4M3.