Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid json when using jsonproto output in queries #18701

Closed

Conversation

chiragramani
Copy link
Contributor

@chiragramani chiragramani commented Jun 15, 2023

When using bazel query ... --output=jsonproto , the output JSON is not valid as

  1. elements are not comma separated.
  2. the outer containers don't have "[", "]".

This PR fixes the above issue.

@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Performance Issues for Performance teams labels Jun 15, 2023
@chiragramani
Copy link
Contributor Author

cc: @zhengwei143

@zhengwei143 zhengwei143 self-requested a review June 16, 2023 07:45
@meisterT
Copy link
Member

The current output follows the NDJSON (see http://ndjson.org/) format where each line a separate JSON value. The advantage of that is that you can start parsing each line individually

@zhengwei143
Copy link
Contributor

zhengwei143 commented Jun 26, 2023

Rather, the current output is concatenated json instead of ndjson - there isn't any newline separating each target proto.

I also just realized a slight discrepancy: currently --output=proto outputs a QueryResult proto, but --output=jsonproto doesn't include the QueryResult wrapper (making it more like --output=streamed_proto).

EDIT: I would suggest renaming it to --output=streamed_jsonproto to be consistent with --output=streamed_proto. That is because cquery --output=jsonproto outputs a single CqueryResult and aquery --output=jsonproto outputs a single ActionGraphContainer.

Back to query, given the current implementation, we can't nicely wrap the Target(s) in a QueryResult without a larger refactor or introducing more codepaths in QueryCommand#doQuery (which I'd rather not) because query --order_output=no runs the output formatter callback in parallel (e.g. for multiple top level targets):

return new OutputFormatterCallback<Target>() {
@Override
public void processOutput(Iterable<Target> partialResult)
throws IOException, InterruptedException {
for (Target target : partialResult) {
out.write(
jsonPrinter.print(toTargetProtoBuffer(target)).getBytes(StandardCharsets.UTF_8));
}
}
};
}

It is for this same reason that we can't wrap the Target(s) in square brackets [ ... ] as implemented here. I would suggest going with the ndjson format, but trivially fix this by adding a newline after printing the Target message in json.

}
out.write("[".getBytes(StandardCharsets.UTF_8));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #18701 (comment), instead of wrapping it in square brackets, follow the ndjson format and append a newline after the json message instead.

Copy link
Contributor

@zhengwei143 zhengwei143 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an add on, I would suggest renaming this to --output=streamed_jsonproto if you follow my other comment to output it in ndjson format.

cquery --output=jsonproto actually outputs only a single CqueryResult (the behavior was wrongly changed recently in 607d0f7 and reverted in dba9e43). So renaming it as mentioned above would ensure that it isn't inconsistent --output=jsonproto and consistent with --output=streamed_proto of the other cquery/aquery output formats.

Otherwise, if you wish to stick with --output=jsonproto and output a single QueryResult proto, you'll need to find a workaround to first aggregate the targets and wrapping them in the QueryResult before writing (especially in the case of --order_output=no).

@chiragramani
Copy link
Contributor Author

Thank you @zhengwei143 , @meisterT - updated the diff to update jsonproto -> streamed_proto and parse as per the NDJSON specs, please let me know if there is anything else needed from my end.

@zhengwei143
Copy link
Contributor

The changes look good, thanks! Just a few minor comments and we should be good to go.

@chiragramani
Copy link
Contributor Author

The changes look good, thanks! Just a few minor comments and we should be good to go.

let me know once you submit the comments and I will be happy to look into them, thanks.

@@ -20,15 +20,17 @@
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
* An output formatter that outputs a protocol buffer json representation of a query result and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate anymore since we aren't outputting a query result, but a list of targets in ndjson format. Could you update the comment here w.r.t the new output format?

@@ -1109,15 +1109,15 @@ genrule(
cmd = "echo unused > $(OUTS)",
)
EOF
bazel query --output=jsonproto --noimplicit_deps "//$pkg:bar" > output 2> "$TEST_log" \
bazel query --output=streamed_jsonproto --noimplicit_deps "//$pkg:bar" > output 2> "$TEST_log" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a few more targets to the $pkg/BUILD file, query here for //$pkg:* so that we can test that the overall output format is in ndjson, instead of a QueryResult.

@zhengwei143
Copy link
Contributor

let me know once you submit the comments and I will be happy to look into them, thanks.

Of course, I forgot to submit the comments :)

@chiragramani
Copy link
Contributor Author

chiragramani commented Jul 11, 2023

Hey @zhengwei143 , I have updated the PR however, some of the CI jobs failed and I think jq is not available on all platforms - what are your suggestions to go about it - any existing validation utilities that I can make use of which work on all the platforms?

@zhengwei143
Copy link
Contributor

Thanks for the update! Let me check to see whether it is possible to use jq.

@chiragramani
Copy link
Contributor Author

Thanks for the update! Let me check to see whether it is possible to use jq.

I removed jq and tests are all successful. I would like your help with this PR as the changes that went live in 6.3 have a bug where the output isn't in NDJSON format and isn't valid JSON either. I plan to make this change available in 6.4, please let me know if you have any other suggestions for me.

Copy link
Contributor

@zhengwei143 zhengwei143 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, just some minor changes requested and it should be good to go (sorry for not catching that one earlier).

Internally, it seems that there have been some usages of --output=jsonproto that I'll need to migrate. That might take a while but I'll get started while this review is ongoing.

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Aug 10, 2023
@brentleyjones
Copy link
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 10, 2023
@iancha1992
Copy link
Member

@bazel-io fork 6.4.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 10, 2023
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Aug 10, 2023
…put=streamed_jsonproto` implementation.

Closes bazelbuild#18701.

PiperOrigin-RevId: 555417403
Change-Id: I30eb06f734188f8511884954f43c5f5b3c0091a3
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Aug 10, 2023
…put=streamed_jsonproto` implementation.

Closes bazelbuild#18701.

PiperOrigin-RevId: 555417403
Change-Id: I30eb06f734188f8511884954f43c5f5b3c0091a3
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Aug 10, 2023
…put=streamed_jsonproto` implementation.

Closes bazelbuild#18701.

PiperOrigin-RevId: 555417403
Change-Id: I30eb06f734188f8511884954f43c5f5b3c0091a3
chiragramani added a commit to uber-common/bazel that referenced this pull request Aug 11, 2023
…put=streamed_jsonproto` implementation.

Closes bazelbuild#18701.

PiperOrigin-RevId: 555417403
Change-Id: I30eb06f734188f8511884954f43c5f5b3c0091a3
iancha1992 added a commit that referenced this pull request Aug 14, 2023
…w `--ouput=streamed_jsonproto` implementation. (#19226)

Fix valid json when using jsonproto output in queries with new
`--ouput=streamed_jsonproto` implementation.

Closes #18701.

Commit
2cd583a

PiperOrigin-RevId: 555417403
Change-Id: I30eb06f734188f8511884954f43c5f5b3c0091a3

Co-authored-by: Chirag Ramani <chirag.ramani@uber.com>
@iancha1992
Copy link
Member

The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Performance Issues for Performance teams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants