Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing issues in export #3682

Merged
merged 8 commits into from
Jul 19, 2019
Merged

Fixing issues in export #3682

merged 8 commits into from
Jul 19, 2019

Conversation

gitlw
Copy link

@gitlw gitlw commented Jul 17, 2019

fixes #3610 #3478

As described in #3610, there are two main issues in the export implementation

  1. the json output begins with a , with no entry before it
    The current code uses the variable exporter.counter to track the current line number in the output, and tries to insert the separator ",\n" when the line number is not 1.
    However, the problem is that the badger Stream.KeyToList function is called in parallel is multiple goroutines, and they would all be accessing the same shared exporter.counter variable, which means when checking if exporter.counter has the value of 1, other goroutines most likely has updated its value to be above 1, hence causing the first output to have an extra separator ",\n" prepended.

  2. some predicate has weird output that does not confirm to its schema. For example, the name predicate has the type string, but the output would contain entries of the name predicate with an uid value.
    Again, the reason is that the KeyToList function is called in parallel, and our current logic uses a shared exporter variable in the function. This PR changes it so that each go routine has its own local exporter variable.

I've verified that after the change, the json and rdf output are no longer messed up, and the exported rdf and json files can be re-imported into dgraph.


This change is Reviewable

@gitlw gitlw requested review from manishrjain and a team as code owners July 17, 2019 22:54
Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @gitlw and @manishrjain)


worker/export.go, line 554 at r1 (raw file):

		separator = []byte(",\n")
	case "rdf":
		// the separator for RDF should be empty since the toRDF function already

minor: comments with uppercase and periods at the end.

Also for the comments below.


worker/export.go, line 573 at r1 (raw file):

			if kv.Version == 1 { // only insert separator for data
				if hasDataBefore {

we are accessing hasBefore from multiple go routines without any sort of synchronization. Is that safe?

Copy link
Author

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)


worker/export.go, line 573 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

we are accessing hasBefore from multiple go routines without any sort of synchronization. Is that safe?

The Stream.Send function is only called from a single go routine. So it should be fine.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @manishrjain)


worker/export.go, line 573 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

The Stream.Send function is only called from a single go routine. So it should be fine.

OK. then it :lgtm:

@gitlw gitlw merged commit deff61c into master Jul 19, 2019
@gitlw gitlw deleted the gitlw/fix_export branch July 20, 2019 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Dgraph exports incorrect data in JSON and RDF formats.
2 participants