Improve Loaders: Add feature to continue a previous load. #3279

Open
MichelDiz opened this issue Apr 10, 2019 · 2 comments
Labels
area/bulk-loader Issues related to bulk loading. area/live-loader Issues related to live loading. area/usability Issues with usability and error messages dgraph Issue or PR created by an internal Dgraph contributor. kind/feature Something completely new we should consider. status/accepted We accept to investigate/work on it.

Comments

@MichelDiz
Contributor

MichelDiz commented Apr 10, 2019

What you wanted to do

Resume a dataset load from where it stopped when a Live Loader or Bulk Loader run has been interrupted, for whatever reason.

Why that wasn't great, with examples

When a load is interrupted and I run it again, it starts from scratch. This is not the desired result. Let's avoid spending time rewriting data that is already in the DB.

@MichelDiz MichelDiz added the kind/feature Something completely new we should consider. label Apr 10, 2019
@MichelDiz MichelDiz changed the title from "Improve Live Loader: Add feature to continue a previous load." to "Improve Loaders: Add feature to continue a previous load." Apr 10, 2019
@MichelDiz
Contributor Author

MichelDiz commented Apr 11, 2019

IMPORTANT

This issue is not just about duplicate nodes caused by a load retry. You can already avoid duplicated nodes by using the --xidmap flag.

e.g.:

./dgraph live -f test.rdf,other.rdf.gz -s test.schema --xidmap ./xd

Every time you reuse the xidmap mapping files, all previously mapped blank nodes are automatically written to their already-mapped UIDs.

However, the load always starts from scratch, even though the blank nodes have already been mapped. This issue only asks for a "checkpoint" feature, to avoid spending days rewriting data that is already in the DB.
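To make the "checkpoint" idea concrete, here is a minimal sketch in Python. It is not how the Dgraph loaders work and it does not call any Dgraph API: `resumable_load`, `load_checkpoint`, `save_checkpoint`, the JSON checkpoint file, and the `send_batch` callback are all hypothetical names made up for illustration. It assumes the input is a plain line-oriented file and that progress can be tracked as a count of lines already committed.

```python
import json
import os


def load_checkpoint(path):
    """Return the number of lines already committed, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["lines_done"]


def save_checkpoint(path, lines_done):
    """Record progress atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"lines_done": lines_done}, f)
    os.replace(tmp, path)


def resumable_load(data_path, checkpoint_path, send_batch, batch_size=1000):
    """Send the lines of data_path in batches, skipping lines a previous run committed.

    send_batch is a hypothetical callback standing in for whatever actually
    writes a batch to the database; it must raise if the write fails.
    """
    done = load_checkpoint(checkpoint_path)
    batch = []
    with open(data_path) as f:
        for lineno, line in enumerate(f, start=1):
            if lineno <= done:
                continue  # already written by an earlier run
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                send_batch(batch)
                done = lineno
                save_checkpoint(checkpoint_path, done)
                batch = []
        if batch:  # flush the final, possibly short, batch
            send_batch(batch)
            done += len(batch)
            save_checkpoint(checkpoint_path, done)
    return done
```

The checkpoint is only advanced after `send_batch` returns, so a crash between the write and the checkpoint update would cause that one batch to be resent on the next run. A scheme like this therefore only limits rework to a single batch; the writes themselves still need to tolerate a replay.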

@codexnull codexnull assigned codexnull and unassigned gitlw Apr 26, 2019
@campoy campoy added area/bulk-loader Issues related to bulk loading. area/live-loader Issues related to live loading. area/usability Issues with usability and error messages labels Sep 13, 2019
@campoy campoy added the status/accepted We accept to investigate/work on it. label Sep 13, 2019
@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.


@MichelDiz MichelDiz reopened this Jul 30, 2022
@MichelDiz MichelDiz added the dgraph Issue or PR created by an internal Dgraph contributor. label Oct 25, 2022