New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed bulk recognize has assigned uid bug #2850

wants to merge 3 commits into
base: master


None yet
4 participants
Copy link

relunctance commented Dec 28, 2018

rdf eg:

<0x4c4eab0>     <md5>   "a11aa8ed3695e504517ee1c4ea6f22c4"      .

live load work well , but bulk can't recognize that has been assigned uid "0x4c4eab0" .

This change is Reviewable


This comment has been minimized.

Copy link

CLAassistant commented Dec 28, 2018

CLA assistant check
All committers have signed the CLA.

Merge pull request #1 from dgraph-io/master
merge dgraph-io/dgraph

This comment has been minimized.

Copy link

manishrjain commented Dec 28, 2018

@codexnull Can you please review this? Two things to check for:

  1. Does Live load do this?
  2. The UID that's been handed, we need to make sure that bulk loader gets an appropriate lease from Zero to include that UID.

This comment has been minimized.

Copy link

manishrjain commented Jan 3, 2019

This change does not work. And any small fixes won't work either. Say user's RDF has a pre-assigned UID X. This leads to 3 cases:

Case 1: We have assigned up to A uids. Bulk loader would in its entire run would assign up to B uids. X >> B, in which case, at the end of bulk loader, we need to ensure that UID lease > X, so we'll need to artificially jump the lease from B -> X, throwing the X - B UIDs on the floor (unused). This is annoying, but not a breaker.

Case 2: X falls between A and B, i.e. A < X < B. In that case, we need to ensure that X is not assigned to any other node in the system. This requires us to keep track of all the Xs (all predefined UIDs), and ensure that they don't get reassigned. This adds complexity.

Case 3: By the time mapper encounters X, A > X. That means X was already reassigned to some other node in the system. We can't go back and undo that assignment. So, bulk loader must now do a pre-pass through all the RDF files, looking for all Xs, before it starts mappers. That's just too much unnecessary work.

Given these cases, we decided not to support this use case, where a user wants to inject preassigned UIDs to bulk loader. In any case, bulk loader assumes a new cluster, so the idea of coming with a pre-assigned UID is wrong, to begin with.

@manishrjain manishrjain closed this Jan 3, 2019


This comment has been minimized.

Copy link

relunctance commented Jan 4, 2019

Thanks .
Your analysis is very good . many problems can be avoided , It's safe to close it.
But It can use third parties to maintain assigned UID , In this way, The user knows what uid is available.
Maybe it only applies to new clusters, not existing ones to expand group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment