Building index on wikidata.truthy.12Nov17 could not parse URI #27

Closed
bumuellp opened this issue Dec 6, 2017 · 2 comments

bumuellp commented Dec 6, 2017

I tried to build an index with wikidata.truthy.12Nov17/latest-truthy.non+en.nt as input and got the following error message:

bumuellp@beli:~/qlever_stuff/QLever_binaries$ ./IndexBuilderMain -i /nfs/raid5/bumuellp/qlever/wikidat_index_non+en/ -n /nfs/raid1/wikidata/wikidata.truthy.12Nov17/latest-truthy.non+en.nt -a

IndexBuilderMain, version Dec  5 2017 00:23:17

Set locale LC_CTYPE to: en_US.utf8
Wed Dec  6 18:16:29.413 - DEBUG: Configuring STXXL...
Wed Dec  6 18:16:29.414 - DEBUG: done.
Wed Dec  6 18:16:29.414 - INFO:  Making pass over NTriples /nfs/raid1/wikidata/wikidata.truthy.12Nov17/latest-truthy.non+en.nt for vocabulary.
Wed Dec  6 18:16:43.367 - ERROR: BAD INPUT STRING (Illegal URI in : _:genid1299 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .; in /home/bumuellp/qlever_stuff/QLever/src/parser/NTriplesParser.cpp, line 33, function bool NTriplesParser::getLine(std::array<std::__cxx11::basic_string<char>, 3ul>&))

I tested it on a small KB file. The problem seems to be the _:genid1299. If it is in the first or second position of a triple in an .nt file, I get this error message. But if I put it between <>, like <_:genid1299>, the index generation works.

Member

Buchhold commented Dec 6, 2017

Hey, yeah, you're right. Blank nodes are not supported yet. If you move the underscore inside the angle brackets, you create something that QLever interprets as a URI. Properly implementing support for blank nodes shouldn't be too much work, but our use cases so far have always managed without them, and I haven't had the time to read up on them properly. I'll try to do so in time, but I can't promise an ETA right now.

Altering the input is probably the easiest quick fix for now.

Member

niklas88 commented Dec 7, 2017

According to the RDF TR you should probably replace the blank nodes with proper IRIs of some known format instead of just wrapping them in <>.
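
As a rough illustration of that kind of input rewrite, here is a minimal preprocessing sketch in Python. It is not part of QLever. The base IRI http://example.org/.well-known/genid/ is a placeholder (the /.well-known/genid/ path is the convention suggested in RDF 1.1 Concepts for this purpose), the function name skolemize_line is made up for this example, and the label pattern only covers ASCII labels such as genid1299.

```python
import re

# Placeholder base IRI (assumption): pick a domain you control.
SKOLEM_BASE = "http://example.org/.well-known/genid/"

# One alternation so that IRIs and literals are consumed as whole tokens
# and a "_:" inside them is never mistaken for a blank node label.
_TOKEN = re.compile(
    r'<[^>]*>'                # IRI, left untouched
    r'|"(?:[^"\\]|\\.)*"'     # string literal, left untouched
    r'|(_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)'  # blank node (ASCII only)
)


def skolemize_line(line, base=SKOLEM_BASE):
    """Replace every blank node label in one N-Triples line with an IRI."""
    def repl(match):
        label = match.group(1)
        if label is None:
            # Matched an IRI or a literal: keep it exactly as it was.
            return match.group(0)
        # Drop the leading "_:" and wrap the rest in a proper IRI.
        return "<" + base + label[2:] + ">"

    return _TOKEN.sub(repl, line)


example = ('_:genid1299 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
           '<http://www.w3.org/2002/07/owl#Restriction> .')
print(skolemize_line(example))
```

Because the same label always maps to the same IRI, triples that share a blank node still refer to the same node after the rewrite. For a dump of this size one would stream the file line by line rather than load it into memory.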

That said, properly supporting them in the NT format doesn't sound complicated either. My guess would be that one only needs to change this function. However, it looks like it is in dire need of better documentation and comments, so it might be a bit tiresome to figure out exactly how it works. Also, splitting the work into a couple of helper functions might make a lot of sense and, with inlining, shouldn't come with a performance penalty.
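
To make the helper-function idea concrete, here is a small sketch of how the split could look. It is written in Python for brevity rather than in QLever's C++, it is not based on the actual NTriplesParser code, and all names in it (read_subject, read_predicate, read_object, parse_triple) are hypothetical. It follows the N-Triples rule that blank nodes may appear as subject or object but not as predicate, and it only handles ASCII blank node labels.

```python
import re

_IRI = re.compile(r'<[^>]*>')
_BLANK = re.compile(r'_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?')
_LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'                              # quoted string
    r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'  # optional lang tag or datatype
)


def _read(pattern, line, pos):
    """Return (term, new_pos) if pattern matches at pos, else None."""
    m = pattern.match(line, pos)
    return (m.group(0), m.end()) if m else None


def read_subject(line, pos):
    # Subject: IRI or blank node.
    return _read(_IRI, line, pos) or _read(_BLANK, line, pos)


def read_predicate(line, pos):
    # Predicate: IRI only.
    return _read(_IRI, line, pos)


def read_object(line, pos):
    # Object: IRI, blank node, or literal.
    return (_read(_IRI, line, pos) or _read(_BLANK, line, pos)
            or _read(_LITERAL, line, pos))


def _skip_ws(line, pos):
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    return pos


def parse_triple(line):
    """Split one N-Triples line into (subject, predicate, object)."""
    pos = _skip_ws(line, 0)
    terms = []
    for reader in (read_subject, read_predicate, read_object):
        result = reader(line, pos)
        if result is None:
            raise ValueError(f"illegal term at column {pos} in: {line!r}")
        term, pos = result
        terms.append(term)
        pos = _skip_ws(line, pos)
    if pos >= len(line) or line[pos] != ".":
        raise ValueError(f"missing terminating '.' in: {line!r}")
    return tuple(terms)
```

With one small reader per position, adding blank node support amounts to one extra alternative in read_subject and read_object, and the error message can point at the exact column that failed.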
