Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow merging Args objects #6230

Open
keith opened this issue Sep 25, 2018 · 17 comments
Open

Allow merging Args objects #6230

keith opened this issue Sep 25, 2018 · 17 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request

Comments

@keith
Copy link
Member

keith commented Sep 25, 2018

Currently if you want to support persistent workers, you must pass a single Args object to your run action. This args file is read and passed as the WorkRequest to the persistent worker. Any other arguments are passed to the persistent worker as launch arguments.

If you build up your arguments in multiple steps into multiple Args objects, there is no way to combine them to end up passing them all in a single object.

Feature requests: what underlying problem are you trying to solve with this feature?

Currently rules_swift builds up 2 Args objects, one here and another here it then passes both objects to run. It's nice to build these up separately using the Args API, but supporting persistent workers would require combining these, instead of passing them both.

What operating system are you running Bazel on?

macOS

What's the output of bazel info release?

0.17.1

@pauldraper
Copy link
Contributor

A workaround is to include the path to the second arg file as a argument in the first, or and arg file that has paths to both arg files.

@tomlu
Copy link
Contributor

tomlu commented Sep 25, 2018

What's the reason you can't build them up as a single Args object?

@keith
Copy link
Member Author

keith commented Sep 25, 2018

A workaround is to include the path to the second arg file as a argument in the first, or and arg file that has paths to both arg files.

👍

What's the reason you can't build them up as a single Args object?

We could definitely do that, if that's the recommendation that's fine. It would just require us to pass the original args objects to the other function, this seems a bit weird because ideally they shouldn't know about each other like that. Alternatively I could change one to return a list instead of using an Args object and use that with add_all.

I thought since Args are emulating lists, it would make sense to be able to combine them for uses like this.

@tomlu
Copy link
Contributor

tomlu commented Sep 25, 2018

A workaround is to include the path to the second arg file as a argument in the first, or and arg file that has paths to both arg files.

I don't think this can work. The rule author does not know what the path is at analysis time.

Alternatively I could change one to return a list instead of using an Args object and use that with add_all.

If you use lists I imagine that you'd lose a lot of the efficiency benefit, so probably don't do that.

I thought since Args are emulating lists, it would make sense to be able to combine them for uses like this.

The natural API would be to allow nesting, but then I have to add code to respect parameter files during that nesting. This isn't so trivial.

If we merge it raises the question of what happens to the parameter file information from each of the two args objects. One of them wins? Throw both away?

You do know you can pass multiple Args objects to ctx.actions.run and friends? Does that fit the bill?

@keith
Copy link
Member Author

keith commented Sep 25, 2018

If you use lists I imagine that you'd lose a lot of the efficiency benefit, so probably don't do that.

In this case the place I would replace with a list will only ever have 3 arguments, so I'm assuming that would end up ok

If we merge it raises the question of what happens to the parameter file information from each of the two args objects. One of them wins? Throw both away?

Great question, as a consumer I would assume the Args object being merged into would "win" but I think if it's documented well either approach could work.

You do know you can pass multiple Args objects to ctx.actions.run and friends? Does that fit the bill?

For persistent workers this doesn't seem acceptable. The first file is passed as the WorkRequest but the second file is used on launch of the persistent worker, and not for each request. I don't see any documentation on persistent workers, but this is how it appears to work

@pauldraper
Copy link
Contributor

pauldraper commented Sep 25, 2018

I don't think this can work. The rule author does not know what the path is at analysis time.

They can. https://docs.bazel.build/versions/master/skylark/lib/actions.html#write

If we merge it raises the question of what happens to the parameter file information from each of the two args objects. One of them wins? Throw both away?

I imagine concatenate.

You do know you can pass multiple Args objects to ctx.actions.run and friends? Does that fit the bill?

That does, except in @keith's case of workers, where Bazel will send only one (I think?) of the arg files as a work request.

@tomlu
Copy link
Contributor

tomlu commented Sep 25, 2018

They can. https://docs.bazel.build/versions/master/skylark/lib/actions.html#write

Sure if you manage your param files manually then you can know. This creates additional actions and is generally not as efficient as if you let the system manager your param files. If you use Args#use_param_file then the system controls the path.

But as a workaround you could do this.

I imagine concatenate.

That's the nested issue right there. I have to keep track of where you inserted the nested Args object during evaluation of the param file machinery. It gets more complicated than you might think at first glance, that's why we decided not to allow this in the first place.

That does, except in @keith's case of workers, where Bazel will send only one (I think?) of the arg files as a work request.

I wonder if we could support multiple parameter files instead in the worker system? @philwo?

keith added a commit to keith/rules_swift that referenced this issue Sep 27, 2018
Since Args objects cannot be combined and this use case shouldn't be a
performance hotspot, we can instead pass these as a list to better
support persistent workers which require a single Args object.

Related issue upstream bazelbuild/bazel#6230
@keith
Copy link
Member Author

keith commented Oct 2, 2018

I've looked more into this and it looks like I may have had this wrong. This code definitely seems to group the params files and the other arguments differently, allowing you to have multiple files.

@pauldraper since you agreed with my original assumption here can you double check? I've verified that the case that caused me to open this issue wasn't setup to format as a params file, and it looks like changing it to works correctly 🤔

@tomlu
Copy link
Contributor

tomlu commented Oct 2, 2018 via email

@jin
Copy link
Member

jin commented Jan 14, 2019

@laurentlb could you add a priority to this issue, please?

@pauldraper
Copy link
Contributor

I imagine concatenate.

Sorry, I misunderstood the question when I answered this.

I'd suggest adding

args1.add_args(args2)

The parameter file information from args1 is used. The meaning of that seems obvious to me.

My preference would also be for updating the worker protocol to allow more
natural expression of commands rather than adding API that we may not be
able to make sensible.

Sure, great.

What's the reason you can't build them up as a single Args object?

That seems pretty reasonable as well. I honestly don't have a great use case in mind; I just posted the workaround.

@gdeest
Copy link

gdeest commented Jan 17, 2019

I strongly support the original request. The current design of run and run_shell taking a list of string / Args instead of a single Args object feels to me like a workaround to this very issue, where the merging is performed internally by Bazel.

Allowing one to build Args from other Args would make for a much nicer and composable API. Forcing one to build arguments from the start as a single Args object infects rules code and significantly complicates building command-line arguments, especially for long command-lines where different subsets of arguments are contributed by different functions. One is either forced to use lists for this, or to thread the args object through multiple functions, which makes code less declarative overall.

@alanfalloon
Copy link
Contributor

The current design of run and run_shell taking a list of string / Args instead of a single Args object feels to me like a workaround to this very issue, where the merging is performed internally by Bazel.

I disagree with this. The position in the list of arguments is where the param file argument will appear if you use the args.use_param_file features. So this is actually important information in the interface and not just a work-around. I use this feature in some rules to make temporary data files for my actions.

However, I'm not opposed to allowing a merge operation on Args objects. I certainly have some starlark that could have been factored more naturally by returning "args snippets".

That's the nested issue right there. I have to keep track of where you inserted the nested Args object during evaluation of the param file machinery. It gets more complicated than you might think at first glance, that's why we decided not to allow this in the first place.

I think it's reasonable to fail if the param-file features are enabled on Args that are merged. Make people set those options on the "final" Args objects. It is certainly the most conservative way to address this and simplifies the implementation somewhat by not allowing a nested automatic param-file substitution. If needed this can always be relaxed in the future.

@brandjon brandjon added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Build-Language and removed P2 We'll consider working on this in future. (Assignee optional) team-Starlark labels Feb 17, 2021
@brandjon
Copy link
Member

See #8515 for proposed syntax for args merging.

@brandjon brandjon added untriaged team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts and removed team-Build-Language labels Nov 4, 2022
@comius comius added the team-Local-Exec Issues and PRs for the Execution (Local) team label Aug 21, 2023
@comius
Copy link
Contributor

comius commented Aug 21, 2023

I think this issue should be owned by both team-Rules-API and team-Local-Exec.

The solution is not necessary to allow merging of Args. It could also be improvements to how persistent workers are invoked.

@tjgq
Copy link
Contributor

tjgq commented Aug 21, 2023

+1. Using a params file to tell apart per-worker and per-invocation args is, quite frankly, a terrible API. At Google we've had bugs introduced by mistakenly adding a per-worker arg when per-invocation was intended (b/220912711). In the presence of complex argument construction logic, often spread across files and shared between multiple actions, it's not always locally obvious that that's going to happen. Forcing per-worker args to stand on their own and per-invocation args to be in a param file also feels capricious.

The merge API might make mistakes less likely but is still error-prone. I think a better idea (although one that would require a migration) is to allow separation of per-worker and per-invocation args at the ctx.actions.(run|run_shell) level (e.g. a separate worker_args parameter).

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Oct 26, 2024
@tjgq tjgq added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request
Projects
None yet
Development

No branches or pull requests