Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for JSON export #3309

Merged
merged 57 commits into from Apr 29, 2019

Conversation

@codexnull
Copy link
Contributor

codexnull commented Apr 22, 2019

Change summary:

  • Handle loading empty JSON array.
  • Add a format parameter to export request URL.
  • Update documentation to describe new feature.
  • Add basic test for JSON export.
  • Fix dumb logic error.
  • Delete unused, commented-out code.

This change is Reviewable

Javier Alvarado added 30 commits Mar 21, 2019
Javier Alvarado
… javier/json_export
Javier Alvarado
Javier Alvarado
Javier Alvarado
Javier Alvarado
Javier Alvarado
Javier Alvarado
Javier Alvarado added 9 commits Apr 19, 2019
@codexnull codexnull requested a review from dgraph-io/team Apr 23, 2019
Javier Alvarado added 3 commits Apr 23, 2019
Javier Alvarado
Javier Alvarado
Javier Alvarado
Copy link
Member

martinmr left a comment

Reviewed 15 of 15 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull)


dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):

				return
			}
		default:

is this right? It looks like this logic will reply to requests with invalid if "format" is not the first key in the array.

If multiple keys are allowed in form, then the default case should be a continue statement. If "format" is not found in r.Form, use the default format.

If only format is allowed in the form, check that r.Form is either empty or only has that key before the iteration.

Am I missing something?


worker/export.go, line 97 at r1 (raw file):

			// trim null character at end
			trimmed := strings.TrimRight(str.Value.(string), "\x00")
			buf.WriteString(strconv.Quote(trimmed))

I am guessing you changed from using buffers to just printing the strings directly because it seemed overkill.

Can you just confirm (maybe with Manish) whether there was a specific reason buffers were being used before?


worker/export.go, line 162 at r1 (raw file):

			fmt.Fprint(bp, "]")
		} else {
			if p.PostingType != pb.Posting_VALUE_LANG {

flip the cases here so that the first branch is

if p.PostingType == pb.Posting_VALUE_LANG {
  //
} else {
  //
}

worker/export.go, line 172 at r1 (raw file):

				// Copying this behavior from RDF exporter,
				// but how is this not data loss?

leave a TODO to investigate.

For now, I think the export process should inform the user that entries were ignored. Maybe the logs are enough but I sense they might be easily missed.


worker/export.go, line 444 at r1 (raw file):

		switch {
		case pk.IsSchema():

Types are not being handled here. There's a different key type for types that should have its own case. It shouldn't be too different from the schema but feel free to ask me for help.

I suppose import has a similar issue but that might be handled in a separate PR.

Copy link
Contributor Author

codexnull left a comment

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull and @martinmr)


dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

is this right? It looks like this logic will reply to requests with invalid if "format" is not the first key in the array.

If multiple keys are allowed in form, then the default case should be a continue statement. If "format" is not found in r.Form, use the default format.

If only format is allowed in the form, check that r.Form is either empty or only has that key before the iteration.

Am I missing something?

You're right. Since "format" is the only valid parameter, a request containing any other key is invalid.


worker/export.go, line 162 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

flip the cases here so that the first branch is

if p.PostingType == pb.Posting_VALUE_LANG {
  //
} else {
  //
}
</blockquote></details>

Why? Just to to use '==' instead of '!='? I like to put what I think is the more common case first. I don't mind changing it, just curious about the rationale.


<!-- Sent from Reviewable.io -->
Copy link
Member

martinmr left a comment

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull)


dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

You're right. Since "format" is the only valid parameter, a request containing any other key is invalid.

Then you should probably check that condition is valid before iterating over the map.


worker/export.go, line 162 at r1 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

Why? Just to to use '==' instead of '!='? I like to put what I think is the more common case first. I don't mind changing it, just curious about the rationale.

I think it make the code a bi more readable because it gives more emphasis that VALUE_LANG are a special case that should be handled differently. It also makes the restriction more explicit.

Copy link
Member

manishrjain left a comment

:lgtm: Had a general look. Overall structure looks OK. I did not take a deep look at the logic, i.e. whether the type conversions are correct or not. Leave it to @martinmr to have a deeper look at that and do the final approval.

Reviewed 13 of 15 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @codexnull and @martinmr)


dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Then you should probably check that condition is valid before iterating over the map.

Given what you're looking for, this way of iteration seems misleading. You could do:

if vals, ok := r.Form["format"]; ok {
  // deal with vals.
}

Even if a user sent multiple options, we can just ignore those, instead of being strict about them.


worker/export.go, line 162 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I think it make the code a bi more readable because it gives more emphasis that VALUE_LANG are a special case that should be handled differently. It also makes the restriction more explicit.

Should really be:

if p == REF { } else if p == VALUE_LANG { } else { .. }


worker/export.go, line 120 at r2 (raw file):

// fctToString convert a facet value to a string.
func fctToStr(fct *api.Facet) (string, error) {

Maybe call it facetToStr. In general, not a fan of removing vowels to shorten names.


worker/export.go, line 138 at r2 (raw file):

	var buf bytes.Buffer

	bp := &buf

How about buf := new(bytes.Buffer)? Do we need another variable bp?

Javier Alvarado
@codexnull codexnull requested a review from danielmai as a code owner Apr 25, 2019
Copy link
Contributor Author

codexnull left a comment

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @codexnull, @manishrjain, and @martinmr)


dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Given what you're looking for, this way of iteration seems misleading. You could do:

if vals, ok := r.Form["format"]; ok {
  // deal with vals.
}

Even if a user sent multiple options, we can just ignore those, instead of being strict about them.

OK, I'll change it.

As a general rule, though, I think it's better to do strict validation (especially of external input) for both security and correctness. I've run into cases where programs ran with the wrong configuration due to a typo in an argument name and, because the invalid name was silently ignored, went undetected in production for weeks or months. But I think the risk here is low.


worker/export.go, line 97 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I am guessing you changed from using buffers to just printing the strings directly because it seemed overkill.

Can you just confirm (maybe with Manish) whether there was a specific reason buffers were being used before?

I was using buf.WriteString(fmt.Sprintf(..)) and Manish suggested using fmt.Fprintf(&buf, ...) instead.


worker/export.go, line 172 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
				// Copying this behavior from RDF exporter,
				// but how is this not data loss?

leave a TODO to investigate.

For now, I think the export process should inform the user that entries were ignored. Maybe the logs are enough but I sense they might be easily missed.

Done.


worker/export.go, line 120 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Maybe call it facetToStr. In general, not a fan of removing vowels to shorten names.

Done.


worker/export.go, line 138 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

How about buf := new(bytes.Buffer)? Do we need another variable bp?

Done. Meant to do that earlier and then forgot.

Copy link
Member

martinmr left a comment

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)


worker/export.go, line 444 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Types are not being handled here. There's a different key type for types that should have its own case. It shouldn't be too different from the schema but feel free to ask me for help.

I suppose import has a similar issue but that might be handled in a separate PR.

I am marking this as resolved. The type import/export can be done in a separate PR.

Copy link
Member

martinmr left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)

Copy link
Contributor Author

codexnull left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)


worker/export.go, line 162 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Should really be:

if p == REF { } else if p == VALUE_LANG { } else { .. }

The code for the last two cases (p = VALUE | VALUE_LANG) is the same except for one print statement.

Javier Alvarado added 2 commits Apr 29, 2019
Javier Alvarado
Javier Alvarado
@codexnull codexnull merged commit 62580a1 into master Apr 29, 2019
4 of 5 checks passed
4 of 5 checks passed
code-review/reviewable 5 files, 2 discussions left (danielmai, manishrjain, martinmr)
Details
Blockade (dgraph) TeamCity build finished
Details
CI (dgraph) TeamCity build finished
Details
GolangCI No issues found!
Details
license/cla Contributor License Agreement is signed.
Details
@codexnull codexnull deleted the javier/json_export branch Apr 29, 2019
dna2github added a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
* Parse query args in export request url.

* Handle import of empty JSON arrays.

* Add format field to ExportRequest protobuf message.

* Add format arg to ExportOverNetwork().

* Add language tags to JSON live load test.

* Add note about JSON exports to documentation.

* Add export JSON test.

* Refactor out common code from JSON and RDF export tests.

* Try to put all export format related settings in one spot.

* Replace buf.WriteString()s with fmt.Fprintf()s.

* Make toRDF and toJSON methods instead of functions.

* Change all buf.Write* to fmt.Fprint* for consistency.

* Fix logic bug.

* Refactor RDF exporter to share more code with JSON exporter.

* Remove unused function.

* Add test of request arg handling.

* Set HTTP status in response.

* Changes suggested by PR review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.