-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix(bulk): Correctly append all the slices of cbuf while reduce phase #7110
Conversation
✔️ Deploy preview for dgraph-docs ready! 🔨 Explore the source changes: a2d4528 🔍 Inspect the deploy logs: https://app.netlify.com/sites/dgraph-docs/deploys/5fd248d98c35150007061a0a 😎 Browse the preview: https://deploy-preview-7110--dgraph-docs.netlify.app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @vvbalaji-dgraph)
…#7110) The bulk loader was not considering the last slice of cbuf in appendToList function. This change fixes the issue.
…#7110) The bulk loader was not considering the last slice of cbuf in appendToList function. This change fixes the issue.
This bug fix applies to the data keys as well, not just the index. This is the count of nodes per predicate from bulk loader for the 21-million movie data set before and after this PR. Some predicates were previously missing a single node. $ diff -u 1.txt 2.txt
--- 1.txt 2020-12-25 10:40:19.780857171 -0800
+++ 2.txt 2020-12-25 10:39:53.564583223 -0800
@@ -12,7 +12,7 @@
6 collection.films_in_collection
116 collections
321 company.films
- 49 company_role_or_service.companies_performing_this_role_or_service
+ 50 company_role_or_service.companies_performing_this_role_or_service
142 content_rating.country
9 content_rating.minimum_accompanied_age
121 content_rating.minimum_unaccompanied_age
@@ -31,7 +31,7 @@
1 cut.release_region
4 cut.runtime
2 cut.type_of_cut
-3508685 dgraph.type
+3508686 dgraph.type
95043 director.film
10 distribution_medium.films_distributed_in_this_medium
877 distributor.films_distributed
@@ -66,13 +66,13 @@
50 http://www.w3.org/2002/07/owl#inverseOf
240856 initial_release_date
7447 job.films_with_this_crew_job
- 191562 language
+ 191563 language
2157 location.featured_in_films
24 locations
4952 metacritic_id
89073 music
34751 music_contributor.film
-2116315 name
+2116316 name
72998 netflix_id
1 nytimes_id
386 other_companies |
Yes @danielmai, the issue was that the last slice of the buffer was not processed. It has nothing to do with the indices. |
The bulk loader was not considering the last slice of
cbuf
inappendToList
function. In condition
next >= 0 && (next < end || end == -1)
the partend == -1
ensures that the last slice is considered because for the last slice, end is-1
hencenext < end
is not true. This fixes the issue of index not being seen in a bulk loaded data, the index was missing because index key is the last key and this bug causes the last key to get missing.This change is