Performance of JSON-LD framing for larger documents #248

Open · jindrichmynarz opened this issue May 29, 2018 · 9 comments

@jindrichmynarz
Contributor

I have a large JSON-LD document (24 MB expanded). Framing it gets stuck, with one CPU fully used and little memory. I have a few questions:

  • What sizes of JSON-LD documents is framing tested to work with?
  • How can I diagnose why the framing gets stuck? Is there a "verbose" mode?
  • Is there any configuration that can help processing larger JSON-LD documents?
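
For reference, a minimal way to time the framing call directly against the jsonld.js API (a sketch; the file names are placeholders):

    // Minimal timing harness for jsonld.frame; file names are placeholders.
    const fs = require('fs');
    const jsonld = require('jsonld');

    async function main() {
      const doc = JSON.parse(fs.readFileSync('document.jsonld', 'utf8'));
      const frame = JSON.parse(fs.readFileSync('frame.jsonld', 'utf8'));
      console.time('frame');
      const framed = await jsonld.frame(doc, frame);
      console.timeEnd('frame');
      // sanity check that something came back
      console.log('framed top-level keys:', Object.keys(framed).length);
    }

    main().catch(console.error);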
@davidlehn
Member

It's possible you're the first to try framing with docs that size. Our use cases have been smaller docs than that. Perhaps someone else in the community has tried larger docs?

If you're running in Node, a good first step would be to start up in debug mode and hook in the Chrome dev tools. Then you can do a quick profile, and maybe it will be obvious that some code has gone exponential.
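
A sketch of that workflow, assuming the framing is wrapped in a script (frame-test.js is a placeholder name):

    # Launch Node with the inspector attached, then open chrome://inspect
    # in Chrome and take a CPU profile:
    node --inspect-brk frame-test.js

    # Or use Node's built-in tick profiler and summarize the log afterwards:
    node --prof frame-test.js
    node --prof-process isolate-*.log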

Maybe also write a quick test using the Ruby implementation to see if it has the same problems. Perhaps there's some insight in that code on better data structures to use. https://github.com/ruby-rdf/json-ld

Is the data available somewhere for others to test?

@jindrichmynarz
Contributor Author

Thanks for the hints! I'm running the JSON-LD framing in Node via the jsonld-cli.

The data is unfortunately internal and thus unavailable. I can, however, try generating synthetic data of similar size and structure to see whether it suffers from the same problems.

@davidlehn
Member

Another thing to try is to cut the data size in half, then in half again, and so on, checking the timing at each step. I'm guessing it's not going to be a linear performance graph.

What is the structure of your dataset? If it's a collection of many similar small items that only fails when there are a lot of them, it should be fairly easy to build a similar test by algorithmically generating a data set of any size (see the sketch below). If it's something like a social graph, where the links are the problem, it may be harder to simulate.
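
A rough sketch of such a scaling test, assuming jsonld.js; the http://example.org/ vocabulary and document shape are made up for illustration:

    // Sketch: time framing over synthetic documents of increasing size to
    // check whether the curve is linear. Vocabulary and shape are made up.
    const jsonld = require('jsonld');

    function makeDoc(n) {
      const items = [];
      for (let i = 0; i < n; ++i) {
        const item = {
          '@id': `http://example.org/item/${i}`,
          '@type': 'http://example.org/Item'
        };
        if (i > 0) {
          // link each item to the previous one for some interlinking
          item['http://example.org/prev'] =
            {'@id': `http://example.org/item/${i - 1}`};
        }
        items.push(item);
      }
      return items;
    }

    (async () => {
      const frame = {'@type': 'http://example.org/Item'};
      for (const n of [500, 1000, 2000, 4000]) {
        const start = Date.now();
        await jsonld.frame(makeDoc(n), frame);
        console.log(`${n} items: ${Date.now() - start} ms`);
      }
    })().catch(console.error);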

@marek-dudas

I encountered a similar framing problem: even a 200 kB document can take several tens of seconds. Below are example data and four frames. Processing all four takes about 70 seconds in Chrome on an average Core i5. I thought I was doing something wrong, but if @jindrichmynarz also finds framing slow, maybe there actually is something suboptimal in the algorithm? Processing similar documents of sizes up to 150 kB takes just a few seconds; the problem may be the higher amount of interlinking in this one, but I haven't investigated that yet.
data.jsonld.txt
frames.txt

@jindrichmynarz, have you found any workarounds in your case?

@jindrichmynarz
Contributor Author

I haven't investigated this much more. I tried to frame the larger documents using jsonld-java, but it had similar performance problems, and while I tried profiling the code, I haven't found a clear cause. I think the key question is to what extent the poor performance is caused by the size of the input data versus its structure.

@jblemee

jblemee commented Feb 5, 2019

Hello,

With this document:

https://gist.github.com/jblemee/41a5c8fa56fffc17896d3b58f42adf43

I see 52% of my CPU time spent in the function "removeDependents" on the playground (and in my app).

(screenshot: profiler output, 2019-02-05)

Here is the function:

    var removeDependents = function removeDependents(id) {
      // get embed keys as a separate array to enable deleting keys in map;
      // note this rescans *all* embed keys on every recursive call
      var ids = Object.keys(embeds);
      for (var i = 0; i < ids.length; ++i) {
        var next = ids[i];
        // remove every embed whose parent is the removed node, then recurse
        // to remove that embed's own dependents
        if (next in embeds &&
            types.isObject(embeds[next].parent) &&
            embeds[next].parent['@id'] === id) {
          delete embeds[next];
          removeDependents(next);
        }
      }
    };

The problem looks exponential: with a quarter of the JSON it works, and each element added to the JSON list roughly doubles the execution time.
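
One possible direction, as a sketch rather than an actual patch for the library: build a reverse index from a parent's @id to its dependent embed keys once, so each removal only walks real dependents instead of rescanning every key:

    // Sketch of an indexed alternative (not the library's code): map each
    // parent @id to the ids of embeds that depend on it, built in one pass.
    function buildDependentsIndex(embeds) {
      const index = new Map();
      for (const [id, embed] of Object.entries(embeds)) {
        if (embed.parent && typeof embed.parent === 'object' &&
            '@id' in embed.parent) {
          const parentId = embed.parent['@id'];
          if (!index.has(parentId)) {
            index.set(parentId, []);
          }
          index.get(parentId).push(id);
        }
      }
      return index;
    }

    // Remove an embed's dependents by walking the index; each id is
    // visited at most once, so the total work is linear in the number
    // of embeds actually removed.
    function removeDependents(embeds, index, id) {
      for (const next of index.get(id) || []) {
        if (next in embeds) {
          delete embeds[next];
          removeDependents(embeds, index, next);
        }
      }
    }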

@davidlehn
Member

At a quick glance, I'm not sure the _removeEmbed code above is exponential by itself, but it's being called from code looping over ids, so in the worst case it probably is. Hopefully it's possible to optimize.

@happy-dev

Did we just break SoLiD? :-)

@dlongley
Member

dlongley commented Feb 6, 2019

No attempt has been made to optimize the remove embed code -- so I suspect there is much that could be gained. We'd be very happy to accept a PR that improved performance.
