This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

Timeout error #3

Closed
ariutta opened this issue May 5, 2014 · 2 comments


ariutta commented May 5, 2014

Hello,

Thanks for releasing this project. It seems really promising. I'm looking into using it to give biologists more options for querying the open-source data from our non-profit research group WikiPathways.org.

Right now, the software works great when I query a small subset of our data, but when I try querying a larger dataset, I get a timeout error:

```
Error: Error: ETIMEDOUT
    at Request.onResponse [as _callback] (/usr/local/share/npm/lib/node_modules/ldf-client/lib/HttpFetcher.js:77:32)
    at self.callback (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:129:22)
    at Request.EventEmitter.emit (events.js:95:17)
    at null._onTimeout (/usr/local/share/npm/lib/node_modules/ldf-client/node_modules/request/request.js:591:12)
    at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)
```

Since the examples demonstrate querying DBpedia, I know the software should be able to handle my data, which is 24.3 MB in size. It's currently stored as JSON-LD in an online Mongo instance here. (Caution: 24.3 MB JSON file.)

I'm thinking the problem is either

  1. using JSON-LD on Mongo instead of an actual triplestore, or
  2. putting most of the data for each pathway into an array (e.g. the entities array here) is a bad data shape for efficient queries.

I can run this query when using our pre-production SPARQL endpoint as a datasource, so I'm assuming the main problem is that the software is only intended for small datasets when using JSON-LD as the datasource.

Should I be able to use 24MB of JSON-LD as a datasource, or is that outside the intended usage of the software?

Thanks.

RubenVerborgh (Member) commented

Hi @ariutta,

I'm very curious to hear about your project. Could you drop me a line?
Perhaps we can help you with the infrastructure or make your project a featured use case.

This problem seems to be a server issue (this issue tracker is part of the client repository). 24 MB is indeed large for a JSON-LD file; I can imagine it takes the server a lot of time to filter it. The drawback of JSON-LD is that there are no streaming parsers yet (they do exist for Turtle), so the whole document has to be loaded into memory before the first triples can be emitted.

So while the volume of triples is indeed not outside the software's intended usage, it will not be efficient with JSON-LD. Indexed data structures are better at searching for specific triples, which is what basic Linked Data Fragments need. In that sense, JSON-LD is an example to get you up to speed quickly, but not intended for production use (we should document that).
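(As an aside on why indexing matters: this sketch, with made-up data and not the server's actual code, shows how an index on subject turns a { subject, ?, ? } triple-pattern request into a single lookup instead of a scan over every triple. HDT-style stores index all of subject, predicate, and object.)

```javascript
// Hypothetical triples for illustration.
const triples = [
  { s: 'ex:a', p: 'ex:p', o: '"1"' },
  { s: 'ex:b', p: 'ex:p', o: '"2"' },
  { s: 'ex:a', p: 'ex:q', o: '"3"' },
];

// Build a subject index once.
const bySubject = new Map();
for (const t of triples) {
  if (!bySubject.has(t.s)) bySubject.set(t.s, []);
  bySubject.get(t.s).push(t);
}

// The pattern { s: 'ex:a', p: ?, o: ? } is now a constant-time lookup
// (plus the size of the result), not a scan over all triples.
const matches = bySubject.get('ex:a') || [];
console.log(matches.length); // 2
```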

In particular, we currently have some internal software (that will eventually be released as open source) to host datasets with high performance, based on HDT. We might be able to give you a preview of that.

Best,

Ruben


ariutta commented May 6, 2014

Hi @RubenVerborgh, email sent. Thanks!

I'll move this ticket to the server repo.

@ariutta ariutta closed this as completed May 6, 2014