New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed bulk recognize has assigned uid bug #2850

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@relunctance
Copy link

relunctance commented Dec 28, 2018

rdf eg:

<0x4c4eab0>     <md5>   "a11aa8ed3695e504517ee1c4ea6f22c4"      .

live load work well , but bulk can't recognize that has been assigned uid "0x4c4eab0" .


This change is Reviewable

@CLAassistant

This comment has been minimized.

Copy link

CLAassistant commented Dec 28, 2018

CLA assistant check
All committers have signed the CLA.

Merge pull request #1 from dgraph-io/master
merge dgraph-io/dgraph
@manishrjain

This comment has been minimized.

Copy link
Member

manishrjain commented Dec 28, 2018

@codexnull Can you please review this? Two things to check for:

  1. Does Live load do this?
  2. The UID that's been handed, we need to make sure that bulk loader gets an appropriate lease from Zero to include that UID.
@manishrjain

This comment has been minimized.

Copy link
Member

manishrjain commented Jan 3, 2019

This change does not work. And any small fixes won't work either. Say user's RDF has a pre-assigned UID X. This leads to 3 cases:

https://photos.app.goo.gl/MMpcjuMT4eEqUomf7

Case 1: We have assigned up to A uids. Bulk loader would in its entire run would assign up to B uids. X >> B, in which case, at the end of bulk loader, we need to ensure that UID lease > X, so we'll need to artificially jump the lease from B -> X, throwing the X - B UIDs on the floor (unused). This is annoying, but not a breaker.

Case 2: X falls between A and B, i.e. A < X < B. In that case, we need to ensure that X is not assigned to any other node in the system. This requires us to keep track of all the Xs (all predefined UIDs), and ensure that they don't get reassigned. This adds complexity.

Case 3: By the time mapper encounters X, A > X. That means X was already reassigned to some other node in the system. We can't go back and undo that assignment. So, bulk loader must now do a pre-pass through all the RDF files, looking for all Xs, before it starts mappers. That's just too much unnecessary work.

Given these cases, we decided not to support this use case, where a user wants to inject preassigned UIDs to bulk loader. In any case, bulk loader assumes a new cluster, so the idea of coming with a pre-assigned UID is wrong, to begin with.

@manishrjain manishrjain closed this Jan 3, 2019

@relunctance

This comment has been minimized.

Copy link

relunctance commented Jan 4, 2019

Thanks .
Your analysis is very good . many problems can be avoided , It's safe to close it.
But It can use third parties to maintain assigned UID , In this way, The user knows what uid is available.
Maybe it only applies to new clusters, not existing ones to expand group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment