Random prefixes for blank nodes #8

pietercolpaert · 2013-07-31T18:39:37Z

Problem

I want to use raptor to concatenate RDF files. At the moment, the only thing preventing this is that blank nodes are given the same identifiers when different files are converted.

Suggested solution

Give blank nodes a random prefix per file. This way we can use something like

for file in *.rdf; do rapper -o ntriples "$file"; done > dump.n3

to concatenate a directory of RDF files into one dump.

smileygingerbread · 2017-04-26T21:46:12Z

Any progress on this issue? I'm having the same problem. Right now, with rapper it's impossible to process multiple files for the same graph. This is actually a serious bug, because many datasets are not published as one big monolithic file; rather, they're published as a collection of smaller files.

smileygingerbread · 2017-04-26T23:13:11Z

This doesn't seem to work very well. It looks like I'm still getting some bnodes with the same (random) names.

smileygingerbread · 2017-04-27T00:13:59Z

@pietercolpaert did you eventually manage to solve this issue? How?

smileygingerbread · 2017-04-29T04:57:52Z

@pietercolpaert @dajobe another option could be to use a --genid=... command-line argument instead. It doesn't require implementing any new function, only parsing an additional argument. It would work by replacing the string "genid" with the (random) one defined by the user.

What do you guys think?

pietercolpaert · 2017-04-29T08:52:49Z

Looking at my code 4 years later, it looks like random bnode names would of course not always return unique bnode names, so merging this to master would not solve any problems. Maybe it would work if we concatenated them with the current Unix time at sub-microsecond resolution?
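A minimal sketch of that idea, written in Python rather than raptor's C and not taken from the patch in this PR. The function name `make_bnode_prefix` is hypothetical. It combines a nanosecond timestamp with random bytes from the operating system, so it does not depend on how a PRNG was seeded:

```python
import os
import time


def make_bnode_prefix() -> str:
    """Return a blank node label prefix that is very unlikely to repeat.

    Hypothetical helper for illustration only; it is not part of raptor.
    """
    # Nanosecond-resolution wall-clock time, rendered as hex.
    stamp = format(time.time_ns(), "x")
    # 8 bytes from the OS entropy source, so no user-seeded PRNG is involved.
    noise = os.urandom(8).hex()
    # Start with a letter and use only alphanumerics, which keeps the
    # result usable as the start of an N-Triples blank node label.
    return "b" + stamp + "x" + noise
```

Uniqueness here is probabilistic, not guaranteed; the timestamp only reduces the chance that two runs with the same random bytes collide.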

An extra parameter also sounds like a good idea.

I suggest I close this PR and you can open a new issue referencing this PR.

smileygingerbread · 2017-04-29T15:55:25Z

@pietercolpaert

> it looks like random bnode names would of course not always return unique bnode names

Right now bnodes are generated as _:genIdN, where N is a progressive number within each parsed file, so for example _:genId1, _:genId2, _:genId3, ...
Your patch seemed to work because it replaces the "genId" string with a random one (one per file), so the new names would become _:rnd-string1, _:rnd-string2, _:rnd-string3, ... which is fine.
The problem though, or at least from my tests, is that parsing different files with your patch would generate the same random string (sometimes). So I don't know if this is a problem with the PRNG seed or whatnot.

> An extra parameter also sounds like a good idea.

yeah I like this idea as well. Basically a --genid=... parameter to replace the default "genId" string, such that the bnodes will be named _:new-string1, _:new-string2, _:new-string3, ...
I'd submit a patch for this, but I'm completely unfamiliar with the raptor source code. I'm willing to help, comment, test, even write some code for this, but somebody who knows the code should guide me through it. Is there anybody who can work on this? It shouldn't (in theory at least) be too much work: just add an additional getopt option to replace the default "genId" value with the one passed from the command line.
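Until such an option exists, the renaming described above can be approximated as a post-processing step on N-Triples output. This is a rough sketch in Python, not raptor code and not a rapper feature; the function name `prefix_bnodes` and the prefix values are hypothetical. It only touches blank nodes in subject or object position, so a literal that happens to contain "_:" is left alone:

```python
import re

# One N-Triples statement: subject, predicate, object, terminating dot.
# The subject is an IRI or a blank node; the predicate is always an IRI.
_TRIPLE = re.compile(r'^(\s*)(<[^>]*>|_:\S+)(\s+<[^>]*>\s+)(.*?)(\s*\.\s*)$')


def prefix_bnodes(line: str, prefix: str) -> str:
    """Insert `prefix` after '_:' in the subject and object of one line.

    Hypothetical helper for illustration; `prefix` should differ per file.
    """
    match = _TRIPLE.match(line)
    if match is None:
        # Blank line, comment, or anything unrecognised: pass through.
        return line
    lead, subj, pred, obj, tail = match.groups()
    if subj.startswith("_:"):
        subj = "_:" + prefix + subj[2:]
    # A literal starts with '"' and an IRI with '<', so this test only
    # matches an object that is itself a blank node.
    if obj.startswith("_:"):
        obj = "_:" + prefix + obj[2:]
    return lead + subj + pred + obj + tail
```

Applied line by line to the output of rapper -o ntriples for each input file, with a different prefix per file (for example "f1", "f2", ...), this keeps blank nodes from different files distinct in the concatenated dump.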

> you can open a new issue referencing this PR

I can't open issues on this repository.

smileygingerbread · 2017-04-30T17:04:58Z

rapper also has an -f argument, which looks like the default way to set parser/serializer options.

-f OPTION(=VALUE), --feature OPTION(=VALUE)  
                          Set parser or serializer options
                          Use `-f help' for a list of valid options

So, instead of defining a new --genid=... argument, it should be possible (and probably more appropriate as well) to add a new value to this -f option. This solution might even be simpler.

Is there any developer or maintainer reading these comments at all???

sharpaper · 2017-10-21T16:07:08Z

Any progress on this?!

sharpaper · 2017-10-21T16:18:13Z

This is a serious bug, because it makes rapper completely useless for any batch processing. This PR is already 4 years old... is rapper/librdf dead or unmaintained?

dajobe · 2020-09-30T02:20:46Z

Not landing. rapper is not a stream processor for RDF graph merging; use redland and its rdfproc tool for that.

pietercolpaert · 2020-09-30T10:02:23Z

As rapper is a command line tool that can be installed in, among others, Debian repositories, I still think this would have been a nice feature hidden behind a flag, or even as a separate rapper-concat command, that would help dataset maintainers without having to open up a software development environment just to bring some triples together.

Random prefixes for blank nodes

8d9bd8b

pietercolpaert closed this Apr 29, 2017

pietercolpaert reopened this Apr 30, 2017

dajobe closed this Sep 30, 2020
