Support the --xidmap option in Bulkload* #4917
Previously it was believed that the --store_xids flag had the same behavior as
What version of Dgraph are you using?
Have you tried reproducing the issue with the latest release?
Steps to reproduce the issue (command/config used to run Dgraph).
Update: Ignore these steps, go to my last comment below.
Expected behavior and actual result.
It should create an XID folder to be used in later imports via Live loader.
Origin of this issue: https://discuss.dgraph.io/t/bulk-loader-x-option/6115
Okay guys, after some time trying to identify the contexts, I will share my findings.
1 - In fact --store_xids and --xidmap are totally different things.
2 - There is no documentation for either flag, and only one of them is tested (there are tests for XIDMAP, but not for --store_xids). Their similar names suggest that they behave the same way, but that is not true.
Perhaps the tests have been lost over time, as this feature is very old and is related to RDFs.
The documentation needs to be updated (I can do this) and, in my opinion, both features should be shared between the two loaders. Having both features available in both tools would be very good for users.
3 - About the tools.
--xidmap: a feature that we only have in Live loader. It is very useful for building a mutation pipeline with Live loader, where we can reuse the blank nodes (and XIDs) with each new data ingestion without having to use the Upsert Block, for example.
It is very useful for those who keep consistent control over the naming of blank nodes (and XIDs too, e.g. data coming from RDF triple stores).
Live loader asks the user for a path where the mapping can be saved (I guess it is stored as posting lists, but in any case they are Badger files). That way you can reuse the XIDs with each new data ingestion just by pointing to the location of the previously saved XIDs.
According to my tests, this functionality does not exist in the Bulk loader.
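The idea behind reusing a saved mapping across ingestions can be sketched roughly as follows. This is only a toy model, not Dgraph's implementation: the `XidMap` class, the JSON file, and the sequential UID assignment are all assumptions made for illustration (the real loader stores its mapping in Badger files and gets UIDs from the cluster).

```python
import json
import os
import tempfile


class XidMap:
    """Toy persistent map from blank-node names / XIDs to numeric UIDs.

    Hypothetical helper for illustration only; Dgraph keeps this data in Badger.
    """

    def __init__(self, path):
        self.path = path
        self.uids = {}
        # Reload the mapping saved by a previous ingestion, if there is one.
        if os.path.exists(path):
            with open(path) as f:
                self.uids = json.load(f)
        self.next_uid = max(self.uids.values(), default=0) + 1

    def assign(self, xid):
        """Return the UID already known for xid, or allocate a new one."""
        if xid not in self.uids:
            self.uids[xid] = self.next_uid
            self.next_uid += 1
        return self.uids[xid]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.uids, f)


def demo():
    """Simulate two separate ingestions that share one saved mapping."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "xidmap.json")

        first_run = XidMap(path)
        alice_first = first_run.assign("_:alice")
        first_run.save()

        second_run = XidMap(path)
        alice_second = second_run.assign("_:alice")  # reused from disk
        bob = second_run.assign("_:bob")             # newly allocated
        return alice_first, alice_second, bob
```

In this sketch the second run resolves `_:alice` to the same UID as the first run without any query or upsert, which is the property that makes the saved mapping useful for repeated ingestions.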
--store_xids: a feature that exists only in Bulk loader. It takes the XID or blank node name and saves it on the node itself as a property, under the predicate "xid". e.g.:
This feature is not compatible with --xidmap.
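As a rough illustration of the behaviour described above, the sketch below takes triples keyed by blank nodes and emits one extra "xid" triple per node. The function name `with_stored_xids`, the sequential UIDs, and the exact form of the stored value (here the blank node name including its `_:` prefix) are assumptions, not the Bulk loader's actual output.

```python
def with_stored_xids(triples):
    """Emit N-Quad-like lines, adding an <xid> triple for each new subject.

    triples: list of (subject, predicate, literal) tuples whose subjects are
    blank node names or XIDs. UIDs are assigned sequentially for illustration.
    """
    uids = {}
    lines = []
    for subject, predicate, literal in triples:
        if subject not in uids:
            uids[subject] = len(uids) + 1
            # Keep the original identifier on the node under the "xid" predicate.
            lines.append(f'<{uids[subject]:#x}> <xid> "{subject}" .')
        lines.append(f'<{uids[subject]:#x}> <{predicate}> "{literal}" .')
    return lines


sample = [
    ("_:alice", "name", "Alice"),
    ("_:alice", "age", "30"),
    ("_:bob", "name", "Bob"),
]
stored = with_stored_xids(sample)
```

Because the identifier ends up as ordinary data on the node, it can later be looked up by value, which is what the external IDs approach relies on.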
At first this functionality does not seem useful, but I'm sure it is related to the external IDs approach: https://docs.dgraph.io/mutations/#external-ids
It can be useful in that case, together with the Upsert Block. But it does not help those who need to ingest large amounts of data, only small cases.
Thanks for the detailed info @MichelDiz!
So, the --store_xids option in bulkload is working after all. But the originally expected behaviour, having bulkload create an xid map the same way live loader does, is still an open problem.
Should we close this issue and open a new one?