This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

Timeout error #3

Closed
ariutta opened this issue May 5, 2014 · 2 comments


ariutta commented May 5, 2014

Hello,

Thanks for releasing this project. It seems really promising. I'm looking into using it to give biologists more options for querying the open-source data from our non-profit research group WikiPathways.org.

Right now, the software works great when I query a small subset of our data, but when I try querying a larger dataset, I get a timeout error:

```
Error: Error: ETIMEDOUT
    at Request.onResponse [as _callback] (/usr/local/share/npm/lib/node_modules/ldf-client/lib/HttpFetcher.js:77:32)
    at self.callback (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:129:22)
    at Request.EventEmitter.emit (events.js:95:17)
    at null._onTimeout (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:591:12)
    at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)
```

Since the examples demonstrate querying DBpedia, I know the software should be able to handle my data, which is 24.3 MB in size. It's currently stored as JSON-LD in an online Mongo instance here. (Caution: 24.3 MB JSON file.)

I'm thinking the problem is either

  1. using JSON-LD on Mongo instead of an actual triplestore, or
  2. putting most of the data for each pathway into an array (e.g. the entities array here) is a bad data shape for efficient queries.

I can run this query when using our pre-production SPARQL endpoint as a datasource, so I'm assuming the main problem is that the software is only intended for small datasets when using JSON-LD as the datasource.

Should I be able to use 24MB of JSON-LD as a datasource, or is that outside the intended usage of the software?

Thanks.

RubenVerborgh (Member) commented

Hi @ariutta,

I'm very curious to hear about your project. Could you drop me a line?
Perhaps we can help you with the infrastructure or make your project a featured use case.

This problem seems to be a server issue (this issue tracker is part of the client repository). 24 MB is indeed large for a JSON-LD file; I can imagine it takes the server a lot of time to filter it. The drawback of JSON-LD is that there are no streaming parsers yet (they do exist for Turtle), so the whole document has to be loaded into memory before the first triples can be emitted.

So while the volume of triples is indeed not outside the software's intended usage, it will not be efficient with JSON-LD. Indexed data structures are better at searching for specific triples, which is what basic Linked Data Fragments need. In that sense, JSON-LD is an example to get you up to speed quickly, but not intended for production use (we should document that).
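(As an aside on why indexing matters: this sketch, with made-up data and not the server's actual code, shows how an index on subject turns a { subject, ?, ? } triple-pattern request into a single lookup instead of a scan over every triple. HDT-style stores index all of subject, predicate, and object.)

```javascript
// Hypothetical triples for illustration.
const triples = [
  { s: 'ex:a', p: 'ex:p', o: '"1"' },
  { s: 'ex:b', p: 'ex:p', o: '"2"' },
  { s: 'ex:a', p: 'ex:q', o: '"3"' },
];

// Build a subject index once.
const bySubject = new Map();
for (const t of triples) {
  if (!bySubject.has(t.s)) bySubject.set(t.s, []);
  bySubject.get(t.s).push(t);
}

// The pattern { s: 'ex:a', p: ?, o: ? } is now a constant-time lookup
// (plus the size of the result), not a scan over all triples.
const matches = bySubject.get('ex:a') || [];
console.log(matches.length); // 2
```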

In particular, we currently have some internal software (that will eventually be released as open source) to host datasets with high performance, based on HDT. We might be able to give you a preview of that.

Best,

Ruben


ariutta commented May 6, 2014

Hi @RubenVerborgh, email sent. Thanks!

I'll move this ticket to the server repo.

@ariutta ariutta closed this as completed May 6, 2014