Performance of JSON-LD framing for larger documents #248

Open · jindrichmynarz opened this issue May 29, 2018 · 9 comments

@jindrichmynarz
Contributor

I have a large JSON-LD document (24 MB expanded). Framing it gets stuck, with one CPU fully used and little memory. I have a few questions:

  • What sizes of JSON-LD documents is framing tested to work with?
  • How can I diagnose why the framing gets stuck? Is there a "verbose" mode?
  • Is there any configuration that can help processing larger JSON-LD documents?
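
For reference, a minimal way to time the framing call directly against the jsonld.js API (a sketch; the file names are placeholders):

    // Minimal timing harness for jsonld.frame; file names are placeholders.
    const fs = require('fs');
    const jsonld = require('jsonld');

    async function main() {
      const doc = JSON.parse(fs.readFileSync('document.jsonld', 'utf8'));
      const frame = JSON.parse(fs.readFileSync('frame.jsonld', 'utf8'));
      console.time('frame');
      const framed = await jsonld.frame(doc, frame);
      console.timeEnd('frame');
      // sanity check that something came back
      console.log('framed top-level keys:', Object.keys(framed).length);
    }

    main().catch(console.error);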
@davidlehn
Member

It's possible you're the first to try framing with docs that size. Our use cases have been smaller docs than that. Perhaps someone else in the community has tried larger docs?

If you're running in Node, a good first step would be to start up in debug mode and hook in the Chrome dev tools. Then you can do a quick profile, and maybe it will be obvious that some code has gone exponential.
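
A sketch of that workflow, assuming the framing is wrapped in a script (frame-test.js is a placeholder name):

    # Launch Node with the inspector attached, then open chrome://inspect
    # in Chrome and take a CPU profile:
    node --inspect-brk frame-test.js

    # Or use Node's built-in tick profiler and summarize the log afterwards:
    node --prof frame-test.js
    node --prof-process isolate-*.log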

Maybe also write a quick test using the Ruby implementation to see if it has the same problems. Perhaps there's some insight in that code on better data structures to use. https://github.com/ruby-rdf/json-ld

Is the data available somewhere for others to test?

@jindrichmynarz
Contributor Author

Thanks for the hints! I'm running the JSON-LD framing in Node via the jsonld-cli.

The data is unfortunately internal and thus unavailable. I can, however, try generating synthetic data of similar size and structure to see whether it suffers from the same problems.

@davidlehn
Member

Another thing to try is to cut the data size in half, then in half again, and so on, checking the timing at each step. I'm guessing it's not going to be a linear performance graph.

What is the structure of your dataset? If it's a collection of many similar small items that only fails when there are a lot of them, it should be fairly easy to build a similar test by algorithmically generating a data set of any size (see the sketch below). If it's something like a social graph, where the links are the problem, it may be harder to simulate.
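
A rough sketch of such a scaling test, assuming jsonld.js; the http://example.org/ vocabulary and document shape are made up for illustration:

    // Sketch: time framing over synthetic documents of increasing size to
    // check whether the curve is linear. Vocabulary and shape are made up.
    const jsonld = require('jsonld');

    function makeDoc(n) {
      const items = [];
      for (let i = 0; i < n; ++i) {
        const item = {
          '@id': `http://example.org/item/${i}`,
          '@type': 'http://example.org/Item'
        };
        if (i > 0) {
          // link each item to the previous one for some interlinking
          item['http://example.org/prev'] =
            {'@id': `http://example.org/item/${i - 1}`};
        }
        items.push(item);
      }
      return items;
    }

    (async () => {
      const frame = {'@type': 'http://example.org/Item'};
      for (const n of [500, 1000, 2000, 4000]) {
        const start = Date.now();
        await jsonld.frame(makeDoc(n), frame);
        console.log(`${n} items: ${Date.now() - start} ms`);
      }
    })().catch(console.error);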

@marek-dudas

I encountered a similar framing problem: even a 200 kB document can take several tens of seconds. Below are example data and four frames. Processing all four takes about 70 seconds in Chrome on an average Core i5. I thought I was doing something wrong, but if @jindrichmynarz also finds framing slow, maybe there actually is something suboptimal in the algorithm? Processing similar documents of sizes up to 150 kB takes just a few seconds; the problem may be the higher amount of interlinking in this one, but I haven't investigated that yet.
data.jsonld.txt
frames.txt

@jindrichmynarz, have you found any workarounds in your case?

@jindrichmynarz
Contributor Author

I haven't investigated this much more. I tried to frame the larger documents using jsonld-java, but it had similar performance problems, and while I tried profiling the code, I haven't found a clear cause. I think the key question is to what extent the poor performance is caused by the size of the input data versus its structure.

@jblemee

jblemee commented Feb 5, 2019

Hello,

With this document:

https://gist.github.com/jblemee/41a5c8fa56fffc17896d3b58f42adf43

I see 52% of my CPU time spent in the function "removeDependents" on the playground (and in my app).

(screenshot: profiler output, 2019-02-05)

Here is the function:

    var removeDependents = function removeDependents(id) {
      // get embed keys as a separate array to enable deleting keys in map;
      // note this rescans *all* embed keys on every recursive call
      var ids = Object.keys(embeds);
      for (var i = 0; i < ids.length; ++i) {
        var next = ids[i];
        // remove every embed whose parent is the removed node, then recurse
        // to remove that embed's own dependents
        if (next in embeds &&
            types.isObject(embeds[next].parent) &&
            embeds[next].parent['@id'] === id) {
          delete embeds[next];
          removeDependents(next);
        }
      }
    };

The problem looks exponential: with a quarter of the JSON it works, and each element added to the JSON list roughly doubles the execution time.
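
One possible direction, as a sketch rather than an actual patch for the library: build a reverse index from a parent's @id to its dependent embed keys once, so each removal only walks real dependents instead of rescanning every key:

    // Sketch of an indexed alternative (not the library's code): map each
    // parent @id to the ids of embeds that depend on it, built in one pass.
    function buildDependentsIndex(embeds) {
      const index = new Map();
      for (const [id, embed] of Object.entries(embeds)) {
        if (embed.parent && typeof embed.parent === 'object' &&
            '@id' in embed.parent) {
          const parentId = embed.parent['@id'];
          if (!index.has(parentId)) {
            index.set(parentId, []);
          }
          index.get(parentId).push(id);
        }
      }
      return index;
    }

    // Remove an embed's dependents by walking the index; each id is
    // visited at most once, so the total work is linear in the number
    // of embeds actually removed.
    function removeDependents(embeds, index, id) {
      for (const next of index.get(id) || []) {
        if (next in embeds) {
          delete embeds[next];
          removeDependents(embeds, index, next);
        }
      }
    }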

@davidlehn
Member

At a quick glance, I'm not sure the _removeEmbed code above is exponential by itself, but it's being called from code looping over ids, so in the worst case it probably is. Hopefully it's possible to optimize.

@happy-dev

Did we just break SoLiD? :-)

@dlongley
Member

dlongley commented Feb 6, 2019

No attempt has been made to optimize the remove embed code -- so I suspect there is much that could be gained. We'd be very happy to accept a PR that improved performance.
