Configurability roadmap: discussion #6431

Closed
gregestren opened this issue Oct 17, 2018 · 27 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: support / not a bug (process)

Comments

@gregestren
Contributor

gregestren commented Oct 17, 2018

This is a non-technical issue just meant as a connection point for discussion, thoughts, questions, concerns, etc. about Bazel's configurability & multiplatform work as prioritized in https://www.bazel.build/roadmaps/configuration.html.

It's also a step toward integrating our project workflow deeper into the Bazel community.

My current thinking is to maintain one of these issues and roadmaps for each year. But I'm open to suggestions for other venues like bazel-dev, etc.

I also want to have a dedicated GitHub issue for each individual roadmap item. As of this writing we're about halfway there. So individual priorities can be discussed on their own threads and this thread can cover big picture stuff.

@gregestren gregestren self-assigned this Oct 17, 2018
@laurentlb laurentlb added the team-Configurability platforms, toolchains, cquery, select(), config transitions label Oct 18, 2018
@gregestren gregestren added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed category: extensibility > configurability labels Nov 28, 2018
@gregestren
Contributor Author

FYI, I've been prototyping a different output path scheme for #6526 (this also affects platform-independent Java compilation in #6527).

The idea is to use a scheme that's more correct and scalable than what we have today, i.e. you can run bazel build //foo:all where different rules in //foo auto-configure themselves for different platforms, different custom flags (no need for --whatever_flag at the command line), and so on, and the following two properties hold:

  1. it doesn't crash
  2. it's usably fast

Neither property is really met by today's path scheme, which is why we limit auto-configuring rules.

At my current pace I'm planning to have some feedback on the viability of my prototype by EOY. I have it working with simple genrules and almost (?) have it working with simple C++ rules. I'd like to scale this up to a "realistic" project. But that quickly gets complicated since most realistic projects have a whole bunch of extra dependencies, cross-language build actions and weird corner cases that defy simple experiments.

Starting 2019 we can focus on getting some path scheme into production (whether this or something more simplistic). So my EOY goal is to have some more data to help guide that decision. And maybe write a big email about it.

This comment should arguably be on #6526. But I wanted to write it here because it enables all the other initiatives on the roadmap and deserves some visibility for that.

@gregestren gregestren changed the title Configurability 2018 roadmap: discussion Configurability roadmap: discussion Jan 28, 2019
@johnynek
Member

One question I have about this configurability is how it might relate to building with different versions of a compiler that may be slightly incompatible.

e.g. compiler v10 accepts mostly a superset of what v9 accepts, but occasionally features are removed, so v9 supports some code that v10 does not.

In this world, using v10 may mean a different set of external dependencies is selected, and it may mean that some targets would need select-like systems to pick different code for v9 than for v10.

Would this be in scope for this work?

It is highly relevant to scala (2.11, 2.12, 2.13 each have incompatibilities that mean some sources need this treatment to compile on different versions).

Secondly, scala supports multiple backends. The two most popular being jvm and js. Currently the rules_scala do not support js at all, but it would be interesting to think of the best way to support them. Maybe jvm and js are architectures? I don't know.

@katre
Member

katre commented Jan 29, 2019

It sounds like jvm and js are different target platforms (building for a browser vs building for a jvm).

Handling different compiler versions is a little trickier. Do users want to use both in the same build, or just pick one and stick with it? Can you give examples of the types of things users would need to change when changing from scala 2.12 to 2.13?

@johnynek
Member

Especially for open source libraries, you want to support many versions because your users might be using 2.11, 2.12, or 2.13. So, a library wants to be able to build with different versions with a minimum of fuss.

A second use case is migration: you are migrating your monorepo from 2.12 to 2.13. You want to easily have your CI build both versions so you can try to get both green and then throw the switch.

@katre
Member

katre commented Jan 29, 2019

Sorry, I wasn't clear about what I was asking. I agree that there is definitely a usecase for this.

What I am curious about: what types of changes would a user be making to their scala_library targets when migrating from one version to the next?

@johnynek
Member

Typically people have some files that have version-specific implementations, but export a shim API that works on both versions. Then internal to those files they will call different APIs or use different syntax.

So, the user needs to select different srcs to compile (maybe different dependencies) based on the version of the compiler.

@katre
Member

katre commented Jan 29, 2019

I see, thanks for clarifying.

This feels like a case best handled with a new flag in starlark, to select between different compiler versions. Your toolchain can use the flag to decide which specific toolchain to use, and you can provide a config_setting that users can use to declare multiple sources (or dependencies, as needed) based on compiler version.
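
To make the suggestion above concrete, here is a toy Python model of the idea: a user-defined flag carries the compiler version, named settings match on that flag, and a select-like lookup picks the version-specific shim sources. This is only a sketch of the concept, not Starlark and not how Bazel implements select(); the flag label, setting names, and file paths are all hypothetical, and real select() has richer matching rules (for example a default branch) than this model.

```python
# Toy model of flag-driven source selection; NOT Bazel's implementation.
# The flag label, setting names, and file paths are hypothetical.

SCALA_VERSION_FLAG = "//flags:scala_version"

# Each entry plays the role of a config_setting: a name plus the flag
# values that must be set for it to match.
CONFIG_SETTINGS = {
    ":scala_2_12": {SCALA_VERSION_FLAG: "2.12"},
    ":scala_2_13": {SCALA_VERSION_FLAG: "2.13"},
}


def resolve_select(branches, flags):
    """Return the value of the single branch whose setting matches `flags`."""
    matches = [
        name
        for name in branches
        if all(flags.get(k) == v for k, v in CONFIG_SETTINGS[name].items())
    ]
    if len(matches) != 1:
        # Simplification: this model requires exactly one matching branch.
        raise ValueError(f"expected exactly one matching branch, got {matches}")
    return branches[matches[0]]


def library_srcs(flags):
    """Shared API source plus a version-specific shim implementation."""
    shim = resolve_select(
        {
            ":scala_2_12": ["compat/v2_12/Compat.scala"],
            ":scala_2_13": ["compat/v2_13/Compat.scala"],
        },
        flags,
    )
    return ["Api.scala"] + shim
```

Calling library_srcs({SCALA_VERSION_FLAG: "2.13"}) yields the shared source plus the 2.13 shim, while an unsupported version raises an error instead of silently picking one.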

@gregestren
Contributor Author

Quick update: a roadmap refresh is pending: bazelbuild/bazel-website#184. I'll also do per-issue updates on each issue linked from the roadmap over the next days.

Lots of nice convergences starting to happen.

@cgruber
Contributor

cgruber commented Jan 10, 2020

Can we get another roadmap refresh?

@gregestren
Contributor Author

Yes. I understand the last refresh has been a while and I apologize for that.

Is there any theme you're most interested in?

@cgruber
Contributor

cgruber commented Jan 14, 2020

The key things would be:

  1. the pieces that impact building on one platform for a target platform, and having the cache hits match with the same artifact (assuming the actual same output) when built on a different host platform. (e.g., a pure java target built on both linux and a mac being equivalent from a caching perspective)

  2. This project's impact on native android builds (if any) as relates to native code, fat architecture specification, etc.

Note: I realize #2 may not directly be affected, but some of the issues we've found there seem kind of in this vein, and I don't know if the platform configurability system can help with some of the issues there, notably the need to propagate and configure all the NDK (native android) infrastructure merely for consuming a third_party .aar containing native code. I suspect that it's far more on the rules_android side than this project, but I thought I'd mention it in case there is any connection.

@cgruber
Contributor

cgruber commented Jan 14, 2020

#1 is the biggest, though, because it means we cannot prime developer-accessible build caches from CI machines, even when the build machine should have no impact on the resulting artifacts.
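
The caching concern in #1 can be illustrated with a toy Python model. It assumes, purely for illustration, that a cache key is a hash over the command, the input digests, and whatever configuration is considered relevant; this is not how Bazel actually computes cache keys, and every name below is hypothetical. The point is only that if host details leak into the key, a Linux CI machine and a Mac developer machine can never share a cache entry for the same pure-Java action.

```python
import hashlib
import json


def action_cache_key(command, input_digests, relevant_config):
    """Hash everything that is treated as affecting the action's output."""
    payload = json.dumps(
        {
            "command": command,
            "inputs": sorted(input_digests.items()),
            "config": sorted(relevant_config.items()),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def java_compile_key(host_os, include_host):
    """Key for a hypothetical pure-Java compile action.

    `include_host` models whether host details leak into the key.
    """
    config = {"java_language_version": "11"}
    if include_host:
        config["host_os"] = host_os
    return action_cache_key(
        ["javac", "Foo.java"],
        {"Foo.java": "abc123"},  # made-up content digest
        config,
    )
```

With include_host=False the Linux and macOS keys are identical, so a CI machine could prime the cache for developers; with include_host=True they diverge even though the output would be the same.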

@gregestren
Contributor Author

Thanks, Christian.

Can you point to any good examples of #2? I think the trimming work we're up-prioritizing can speak to that. Importantly, the more specific use cases we can identify the more easily we can prioritize that work. We're trying to articulate priorities in terms of solving tangible user needs (like "make Android builds that set the NDK faster") vs. the abstract and hard-to-calibrate backend work that makes it happen.

Re: #1: I remain personally committed to moving that forward this quarter. I continue to really want to play around with my experimental output path design and give it some actual battle testing. It got put on pause last winter when we had an unexpected temporary loss of head count and I shifted my time to basically keeping the lights on.

We're just about operating at full capacity again and actively figuring out who's focusing on what this year. That's the main reason we haven't updated the roadmap yet. Over the next week or so clearly stated goals will come out of that. Look for the refresh over that time frame.

@gregestren gregestren assigned katre and unassigned katre, gregestren and juliexxia Jan 14, 2020
@gregestren gregestren assigned katre, gregestren and juliexxia and unassigned katre Jan 14, 2020
@cgruber
Contributor

cgruber commented Jan 16, 2020

So, to characterize #2 a bit more, a simple description is: why do I need to even add the NDK when all I'm doing is including an .aar that has native code in the final binary? And then when I do, and I specify architectures, why are non-native operations (which contain only cross-platform dex/resources content) scaling slower for each architecture I configure (which speaks to re-work)?

There are a lot of ways that these situations result in massively slower builds, but fundamentally a pure-java/kotlin android_library that itself contains no native code should (a) build as fast in pure or native contexts, (b) build at the same speed when building a thin apk, or a "fat" apk for one architecture, or for N architectures (since they're completely cross-platform artifacts), and (c) have the exact same caching hashes (this may be true, I haven't verified this, but we have some ongoing investigation to validate this).

@gregestren
Contributor Author

That makes sense, thanks. I agree this invites consultation with android_rules. But they're probably going to want the trimming and inefficiency measurement tools we're working on here in configurability to help support their changes.

@gregestren
Contributor Author

FYI: a draft 2020 roadmap is pending at bazelbuild/bazel-website#215. I'm waiting for feedback from core devs before officially pushing it later this week.

Notably, for:

mid 2020 An experimental Bazel mode automatically “trims” build graphs NOT STARTED (#6524)

I can imagine a more limited but practically effective version of this we could target for as soon as March.

This would address "I'm using transitions and deps are building twice and slowing down the build" problems. Ping me if you'd like to discuss more.
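
As a rough illustration of why trimming helps with the "deps are building twice" problem, here is a toy Python model. It assumes each target can declare which flags it actually reads, and that a target is built once per distinct (label, configuration) pair. The labels, flags, and data structures are hypothetical and this is not Bazel's implementation.

```python
def trim(config, needed_flags):
    """Keep only the flags a target actually reads, as a hashable tuple."""
    return tuple(sorted((k, v) for k, v in config.items() if k in needed_flags))


def configured_targets(needed_by_label, requests, trimming):
    """Return the distinct (label, config) pairs that would be built."""
    keys = set()
    for label, config in requests:
        # Without trimming, every flag counts toward the target's identity.
        needed = needed_by_label[label] if trimming else config.keys()
        keys.add((label, trim(config, needed)))
    return keys


# A pure-Java library reached through two transitions that differ only in cpu.
NEEDED = {"//lib:pure_java": {"javacopt"}}
REQUESTS = [
    ("//lib:pure_java", {"cpu": "arm64", "javacopt": "-g"}),
    ("//lib:pure_java", {"cpu": "x86_64", "javacopt": "-g"}),
]
```

Here configured_targets(NEEDED, REQUESTS, trimming=False) contains two entries for the same library, while trimming=True collapses them into one because the library never reads cpu.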

@gregestren
Contributor Author

Just updated the roadmap. Major current themes:

  • Making Android builds compatible with platform-based toolchain resolution (which also unblocks a major reason C++ builds don't use it by default yet)
  • Convenience aliases for Starlark flags: bazel build //foo --@some_repo@//some/package/flag_defs:my_flag --> bazel build //foo --:my_flag
  • Dedicating real engineer time to better executor caching for actions that don't change when, e.g., the CPU changes

@gregestren
Contributor Author

gregestren commented Nov 24, 2021

I haven't updated the roadmap in some time but we are moving forward. I'll just update here:

  • We're focusing most effort on making the platforms API ubiquitous. Which means moving all major rules to --platforms and making this whole migration section obsolete
  • Our group is specifically focused on Android rules, which the current roadmap highlights. This is mostly invisible work now but it is moving forward. You can see some progress here but the reality is most of the current work is migrating projects, which is why the work isn't highly visible
  • It's still going to take time to complete all this platforms work and there are various balls to juggle. Specifically, we'll need to ensure the proper toolchain and platform definitions are available (shouldn't be hard), and that existing Bazel Android projects migrate successfully (which is harder). These are practical realities that will not make this deployable by a simple bit flip
  • We also need to ensure C++ platform & toolchain definitions exist. But C++ should be a lot more ready. If your project uses C++ we strongly encourage you to move it to the --platforms API now by flipping --incompatible_enable_cc_toolchain_resolution. We'll need your help because we can't foresee exactly what platforms and toolchains every user uses.
  • Apple is most complicated because the Apple rules themselves don't have the logic to support platforms. We need to work with the apple_rules owners to accomplish this. It's unclear right now who has the bandwidth to move that forward

I'm sorry this all sounds so complicated, but what it comes down to is when C++, Apple, and Android rules all support platforms, we'll consider this API deployed. C++ and Android already work behind --incompatible_* flags. Apple rules don't yet work. We can't enable C++ platforms by default until Android and Apple work, since lots of Apple/Android projects have C++ deps. We're working hard on all this but we ultimately need user support for the inevitable migration challenges.

Elsewhere on the roadmap:

  • I remain keenly interested in pushing through experimental better-caching Java compilation. It remains second priority to platformization. I'd love to get it to a point where interested users can experiment. I think we'll need a few executor hooks to the current experimental code to get to that point.
  • @sdtwigg is doing some great work to make transitions more efficient, cacheable, and correct (i.e. fewer "action conflict" errors). Issue #14023 ("Avoid different transition output directories when build settings have the same values") covers some of that work. It's ongoing through the end of this year, but transitions that set --cpu should already be more efficient by avoiding unnecessary ST-<hash>es. See for example the updated test at 75a16b7. But the best is yet to come.

Always happy to continue discussions with anyone to clarify anything. It's hard to stay on top of all the subtleties, but I really want to reaffirm that progress is happening.

@brentleyjones
Contributor

> Apple is most complicated because the Apple rules themselves don't have the logic to support platforms. We need to work with the apple_rules owners to accomplish this. It's unclear right now who has the bandwidth to move that forward.

I'm one of the maintainers of rules_apple, and I would love to help out on this front.

@gregestren
Contributor Author

Oh, awesome! I'll reach out to you to discuss more. What venue do you prefer?

@brentleyjones
Contributor

The Bazel slack works for me. Thanks!

@keith
Member

keith commented Nov 25, 2021

Also I believe folks from the apple team at google were working on this in the past. But I'm not sure where that ended up. @allevato @susinmotion

@allevato
Member

I haven't been; @kaylathar is the one to talk to about Apple rules' platform effort these days.

@kaylathar
Member

Feel free to ping me on slack - our team is actively working on platform support. You will need to make some changes to take advantage of it as we do more work :)

@katre
Member

katre commented Dec 2, 2022

Closing this as out of date, please comment if this is still an issue.

@katre katre closed this as completed Dec 2, 2022