Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for invoking bazel with args from file #10796

Closed

Conversation

robbertvanginkel
Copy link
Contributor

Fixes #8609. Also see https://groups.google.com/d/msg/bazel-discuss/895-OjU2Z2A/mYlw05rpDgAJ. Allows Bazel to be invoked with all arguments coming from a file rather than using commandline arguments. For example:

$ cat args.txt
build
//...
$ bazel-bin/src/bazel @args.txt
INFO: Invocation ID: 5a582c6a-4a3c-4c16-bff1-619aba804783
...

With the current implementation: only one argfile is allowed, it should be prefixed with an @ and the first argument (argv[1]). Argfiles should be newline separated (\n), one argument per line.

TODO:

  • Decide param file location (tempfile? somewhere in output_base?).
  • Decide command_server.proto changes/semantics when using argfile.
  • Investigate special quoting needs of param file for JVM on Windows.
  • Tests
  • Documentation

Todo: Windows quoting seems to be different and happen in
WindowsEscapeArg somewhere. Need to test.
@robbertvanginkel
Copy link
Contributor Author

👋My first time contributing to the bazel client, C++ isn't my area of expertise. Although I realise the PR isn't in a completely mergeable state (missing tests/documentation), I wanted to open a PR earlier rather than later to ask for feedback (so I don't have to rewrite tests/code etc for code that might need a bit of an overhaul).

I'll leave some comments on the files I have open questions/doubts about, please give it a look!

Comment on lines 263 to +264
const std::string &command, const std::vector<std::string> &command_args,
const std::string &invocation_policy,
bool uses_param_file, const std::string &invocation_policy,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe refactor Communicate to take a const OptionProcessor *option_processor, as the first three arguments are currently:


server->Communicate(
  option_processor.GetCommand(), 
  option_processor.GetCommandArguments(), 
  option_processor.UsingParamFile()

@@ -715,6 +715,20 @@ static void RunBatchMode(

jvm_args_vector.insert(jvm_args_vector.end(), command_arguments.begin(),
command_arguments.end());
if (option_processor.UsingParamFile()) {
blaze_util::Path param_file =
blaze_util::Path(startup_options.output_base).GetRelative("params");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these param files are safe to be written to output_base, or if they need unique locations to support multiple clients using the same output_base.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want to dig around the client locking code to sort out how multiple clients would react to this.

Comment on lines +726 to +727
content = blaze_util::Quoted(*i) :
content += '\n' + blaze_util::Quoted(*i);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The java argument file format takes all whitespace as separate arguments, so we need to appropriately quote each.
https://docs.oracle.com/javase/9/tools/java.htm#JSWOR-GUID-4856361B-8BFD-4964-AE84-121F5F6CF111

Comment on lines +276 to +282
std::string Quoted(const std::string s)
{
std::ostringstream ss;
ss << std::quoted( s.c_str() );
return ss.str();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's a better way to achieve quoting that java expects the argfile to have.

Comment on lines +61 to +62

string parameter_file = 7;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a oneof with repeated bytes arg = 2;, that way it is clear for the java code which field to parse. Right now the java code attempts to parse field 2, if its empty it attempts to parse this.

I'm not sure where this proto file is used, and if this would be backward compatible change. Any tips on how to do this best welcome.

@aiuto aiuto added team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website untriaged labels Feb 15, 2020
@aiuto aiuto requested review from philwo and jin February 15, 2020 03:22
@philwo
Copy link
Member

philwo commented Feb 19, 2020

Hi @robbertvanginkel,

thanks for your contribution! This seems like a pretty cool feature to have. I'll try to find you a reviewer for this!

args = builder.build();
} catch (IOException e) {
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to modify the Java code?
I think it would be simpler to open the file and do the expansion in option_processor.cc.
(also skip the change to the proto file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after doing the expansion in the cpp client we still need to send it to the java server. As the server/client communicate over gRPC this still has a message size limit, see #8609 (comment) for some more context.

The only solution I found for that is making sure the client writes the arguments to a file and instructs the server to read that file, for which we'd need a proto change (or implicitly encode this somehow in the existing field for args).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm told that the gRPC limit is entirely artificial and we can just increase it.

Call .maxInboundMessageSize(1<<30) on NettyServerBuilder here:

NettyServerBuilder.forAddress(address).addService(this).directExecutor().build().start();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep a sensible limit on the max inbound message size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michajlo what would you consider a sensible limit? 10/100s of megabytes?

What I like about the file approach is that it doesn't impose another seemingly arbitrarily chosen limit. What I'd like to avoid is ending up with a choice that needs to be increased 6months or a year down the line. Maybe the limit should be configurable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably 10s? Generous arbitrary limits are nice because they provide some healthy pushback to prevent folks from wandering into the minefield of less well defined limits.

@pcj
Copy link
Member

pcj commented Feb 24, 2020

I don't think we should assume that the bazel client and bazel server are always running on the same machine (and sharing the same filesystem), so changing the command_server.proto is probably not a good idea.

Copy link
Contributor

@michajlo michajlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately given the client-server setup and the misc paths to starting up a server or initiating requests, param files can wind up getting tricky. This might require a larger design discussion to make sure we don't accidentally paint ourselves into any corners. I'm happy to review and help you with the design if you decide to take it on, but I'm also going to offer what might be a more direct route to solving your initial problem over on #8609.

for (const string &arg : arg_vector) {
request.add_arg(arg);
if (uses_param_file) {
blaze_util::Path param_file = output_base_.GetRelative("params");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can definitely be races between clients here - we release the client-side lock right after sending the request, so by the time the file is being parsed on the server it could have been replaced by another file.

@@ -715,6 +715,20 @@ static void RunBatchMode(

jvm_args_vector.insert(jvm_args_vector.end(), command_arguments.begin(),
command_arguments.end());
if (option_processor.UsingParamFile()) {
blaze_util::Path param_file =
blaze_util::Path(startup_options.output_base).GetRelative("params");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want to dig around the client locking code to sort out how multiple clients would react to this.

args = builder.build();
} catch (IOException e) {
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably 10s? Generous arbitrary limits are nice because they provide some healthy pushback to prevent folks from wandering into the minefield of less well defined limits.

using_param_file = true;
string filename = args[1].substr(1, args[1].length()-1);
string contents;
if (!blaze_util::ReadFile(filename, &contents)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid io in SplitCommandLine, and generally as much as possible in options processing besides for rc handling. Can this be lifted to a different abstraction layer? This also avoids mutating args, which could get confusing.

@robbertvanginkel
Copy link
Contributor Author

Thanks all for the comments! There's definitely some extra investigation to do here, I agree with @michajlo that it is probably a good idea to have a larger design discussion around it.

However, I think that might take some more time and I would like to have a reasonable resolution for #8609 soon. The alternative suggested (a --build_file option similar to the existing --query_file) seems like a pretty straightforward implementation (#10856), although a bit less generic. Please give that idea a look, I hope that can be merged a bit sooner and be a stop gap till I can spend the time working out the details for a generic bazel arguments file discussion.

robbertvanginkel added a commit to uber-common/bazel that referenced this pull request Mar 4, 2020
An alternative to bazelbuild#10796 to fix bazelbuild#8609.

As bazelbuild#8609 (comment) points out there's currently a --query_file to read a query pattern from file, but there's no analogous feature available for build/test. Generic parameter file support attempted in bazelbuild#10796 turns out to be a little more complex than expected due to bazel's client/server architecture and likely requires some design document.

This PR would fix help bazelbuild#8609 without rushing into any designs, so we can spend time properly designing generic parameter files while also alleviating the immediate problem with a known/simpler pattern.
bazel-io pushed a commit that referenced this pull request Mar 4, 2020
An alternative to #10796 to fix #8609.

As #8609 (comment) points out there's currently a `--query_file` to read a query pattern from file, but there's no analogous feature available for build/test. Generic parameter file support attempted in #10796 turns out to be a little more complex than expected due to bazel's client/server architecture and likely requires some design document.

This PR would fix help #8609 without rushing into any designs, so we can spend time properly designing generic parameter files while also alleviating the immediate problem with a  known/simpler pattern.

Closes #10856.

PiperOrigin-RevId: 298953350
@jlisee
Copy link
Contributor

jlisee commented Mar 11, 2020

With #10856 landed should this be closed?

@robbertvanginkel
Copy link
Contributor Author

Yes #10856 fixes my immediate issue for #8609. I don't have the time right now to dive into general argfile support and writeup a proposal.

Closing out this PR to reflect I'm not actively working on it. If anyone ends up looking at argfile support in the future, I hope this points them in the right direction :).

apattidb pushed a commit to databricks/bazel that referenced this pull request Apr 17, 2023
An alternative to bazelbuild#10796 to fix bazelbuild#8609.

As bazelbuild#8609 (comment) points out there's currently a `--query_file` to read a query pattern from file, but there's no analogous feature available for build/test. Generic parameter file support attempted in bazelbuild#10796 turns out to be a little more complex than expected due to bazel's client/server architecture and likely requires some design document.

This PR would fix help bazelbuild#8609 without rushing into any designs, so we can spend time properly designing generic parameter files while also alleviating the immediate problem with a  known/simpler pattern.

Closes bazelbuild#10856.

PiperOrigin-RevId: 298953350
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website untriaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for reading a list of target patterns for from a file
8 participants