Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support module.sc files in subfolders #3213

Merged
merged 73 commits into from
Aug 8, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jun 17, 2024

This PR implements support for per-subfolder build.sc files, named module.sc. This allows large builds to be split up into multiple build.sc files each relevant to the sub-folder that they are placed in, rather than having a multi-thousand-line build.sc in the root configuring Mill for the entire codebase.

Semantics

  1. The build.sc in the project root and all nested module.sc files are recursively within the Mill directory tree are discovered (TODO ignoring files listed in .millignore), and compiled together

  2. We ignore any subfolders that have their own build.sc file, indicating that they are the root of their own project and not part of the enclosing folder.

  3. Each foo/bar/qux/build.sc file is compiled into a millbuild.foo.bar.qux package object, with the build.sc and module.sc files being compiled into a millbuild package object (rather than a plain object in the status quo)

  4. An object blah extends Module within each foo/bar/qux/build.sc file can be referenced in code via foo.bar.qux.blah, or referenced from the command line via foo.bar.qux.blah

  5. The base modules of module.sc files do not have the MainModule tasks: init, version, clean, etc.. Only the base module of the root build.sc file has those

Design

Uniform vs Non-uniform hierarchy

One design decision here is whether a module.sc file in a subfolder foo/bar/ containing object qux{ def baz } would have their targets referenced via foo.bar.qux.baz syntax, or via some alternative e.g. foo/bar/qux.baz.

A non-uniform hierarchy foo/bar/qux.baz would be similar to how Bazel treats folders v.s. targets non-uniformly foo/bar:qux-baz, and also similar to how external modules in Mill are handled e.g. mill.idea.GenIdea/idea, as well as existing foreign modules. However, it introduces significant user-facing complexity:

  1. What's the difference between foo/bar/qux.baz vs foo/bar.qux.baz or foo/bar/qux/baz?
  2. What query syntax would we use to query targets in all nested module.sc files rather than just the top-level one e.g. __.compile?
  3. Would there be a correspondingly different way of referencing nested module.sc modules and targets in Scala code as well?

Bazel has significant complexity to handle these cases, e.g. query via ... vs :all vs *. It works, but it does complicate the user-facing semantics.

The alternative of a uniform hierarchy also has downsides:

  1. How do you go from a module name e.g. foo.bar.qux.baz to the build.sc or module.sc file in which it is defined?
  2. If a module is defined in both the root build.sc and in a nested module.sc, what happens?

I decided to go with a uniform hierarchy where everything, both in top-level build.sc and in nested module.sc, end up squashed together in a single uniform foo.bar.qux.baz hierarchy.

Package Objects

The goal of this is to try and make modules defined in nested module.sc files "look the same" as modules defined in the root build.sc. There are two possible approaches:

  1. Splice the source code of the various nested module.sc files into the top-level object build. This is possible, but very complex and error prone. Especially when it comes to reporting proper error locations in stack traces (filename/linenumber), this will likely require a custom compiler plugin similar to the LineNumberPlugin we have today

  2. Convert the objects into package objects, such that module tree defined in the root build.sc becomes synonymous with the JVM package tree. While the package objects will cause the compiler to synthesize object package { ... } wrappers, that is mostly hidden from the end user.

I decided to go with (2) because it seemed much simpler, making use of existing language features rather than trying to force the behavior we want using compiler hackery. Although package objects may go away at some point in Scala 3, they should be straightforward to replace with explicit export foo.* statements when that time comes.

Existing Foreign Modules

Mill already supports existing foo.sc files which support targets and modules being defined within them, but does not support referencing them from the command line.

I have removed the ability to define targets and modules in random foo.sc files. We should encourage people to put things in module.sc, since that would allow the user to co-locate the build logic within the folder containing the files it is related to, rather than as a bunch of loose foo.sc scripts. Removing support for modules/targets in foo.sc files greatly simplifies the desugaring of these scripts, and since we are already making a breaking change by overhauling how per-folder module.sc files work we might as well bundle this additional breakage together (rather than making multiple breaking changes in series)

build.sc/module.sc file discovery

For this implementation, I chose to make module.sc files discovered automatically by traversing the filesystem: we recursively walk the subfolders of the root build.sc project, look for any files named module.sc. We only traverse folders with module.sc files to avoid having to traverse the entire filesystem structure every time. Empty module.sc files can be used as necessary to allow module.sc files to be placed deeper in the folder tree

This matches the behavior of Bazel and SBT in discovering their BUILD/build.sbt files, and notably goes against Maven/Gradle which require submodules/subprojects to be declared in the top level build config.

This design has the following characteristics:

  1. In future, if we wish to allow mill invocations from within a subfolder, the distinction between build.sc and module.sc allows us to easily find the "enclosing" project root.

  2. It ensures that any folders containing build.sc/module.sc files that accidentally get included within a Mill build do not end up getting picked up and confusing the top-level build, because we automatically skip any subfolders containing build.sc

  3. Similarly, it ensures that adding a build.sc file "enclosing" an existing project, it would not affect Mill invocations in the inner project, because we only walk to the nearest enclosing build.sc file to find the project root

  4. We do not automatically traverse deeply into sub-folders to discover module.sc files, which means that it should be almost impossible to accidentally pick up module.sc files that happen to be on the filesystem but you did not intend to include in the build

This mechanism should do the right thing 99.9% of the time. For the last 0.1% where it doesn't do the right thing, we can add a .millignore/.config/millignore file to support ignoring things we don't want picked up, but I expect that should be a very rare edge case

Task Resolution

I have aimed to keep the logic in resolve/ mostly intact. The core change is replacing rootModule: BaseModule with baseModules: BaseModuleTree, which provides enough metadata to allow resolveDirectChildren and resolveTransitiveChildren to find BaseModules in sub-folders in addition to Module objects nested within the parent Module. Other than that, the logic should be basically unchanged, which hopefully should mitigate the risk of introducing new bugs

Compatibility

This change is not binary compatible, and the change in the .sc file desugaring is invasive enough we should consider it a major breaking change. This will need to go into Mill 0.12.x

Out-Of-Scope/TODO

  1. Running mill without a subfolder of the enclosing project. Shouldn't be hard to add given this design, but the PR is complicated enough as is that I'd like to leave it for follow up

  2. Error reporting when a module is duplicated in an enclosing object and in a nested module.sc file. Again, probably not hard to add, but can happen in a follow up

Pull request: #3213

@lefou lefou marked this pull request as draft June 18, 2024 08:52
@lefou
Copy link
Member

lefou commented Jun 19, 2024

What are the semantics when mill is run from a sub-directory? Is this handled in any way?

  1. No, Mill will as usual try to handle the directory as project.
  2. Partly, Mill will detect and warn about being executed in a sub-project.
  3. Yes, Mill will find the root and interpret all path selectors with a sub-module prefix.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jun 19, 2024

@lefou I haven't thought about that yet. I'm not sure what the best thing to do is. I'm guessing (2) or (3), just because that's what other build tools seem to do

@lefou
Copy link
Member

lefou commented Jun 19, 2024

@lefou I haven't thought about that yet. I'm not sure what the best thing to do is. I'm guessing (2) or (3), just because that's what other build tools seem to do

We should remind our single-source-(file)-of-truth concept and make split-up projects explicit.

Instead of looking up (guessing) sub-projects, we should explicitly list them in the parent project. I think we previously handled kind-of sub-projects with import @file but broke it later, but some import statement or better some usage directive would be nice. Additionally or alternatively, we could denote the root/parent project in the sub-project, so lookup is faster and more accurate, if we decide to support running from a sub-project is a good idea.

I would strongly recommend to not guess sub-project relations solely on the existence of files. Then we end up with a chaotic and non-reproducible setup (which you can experience with Gradle for example).

@lefou
Copy link
Member

lefou commented Jun 19, 2024

With an explicit sub-project configuration, it would be possible to aggregate multiple stand-alone projects (e.g. by pointing to a location outside the current project or some git submodules). As long as sub-projects don't use any resources of the aggregating project, their build results (in out/) should also be independent and we could just use the separate project-local out-directories for each aggregated project.

So, simply aggregating multiple Mill projects should not require a rebuild of each single project. But it would make re-use of external project easy and lightweight.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jun 19, 2024

I think there are a few orthogonal things here

Aggregating standalone projects

This can be done with or without explicit references between subfolder build.sc files. Arguably it's even simpler without explicit references: aggregation then becomes a matter of a standard filesystem symlink, rather than a change to a custom configuration. But either way it can be done with or without explicit references

The challenge with aggregation is the things that are "global". These include:

  • The mill-build/ meta-build folder
  • .mill-version or .config/mill-version
  • .mill-jvm-opts

There's no general solution for combining these global configs in different standalone projects when aggregating, but maybe some compromise is sufficient to be useful. Bazel has similar issues, and draws an arbitrary line of what works in aggregated projects that works well enough in practice (e.g. BUILD and WORKSPACE files are aggregated, but not .bazelrc).

These are actually similar to the problems involved with treating sub-directories as standalone projects when running mill inside of them. However, this is orthogonal to the question of explicit references between subprojects.

Explicit References Between Subprojects

This is something that different build tools handle differently:

  1. Bazel and its family (Buck, Pants) has no explicit references between subfolder build files
  2. SBT has no explicit references between subproject build files, but optionally you can define things in .scala files and explicitly reference those in your root build.sbt
  3. Gradle has explicit references from the root settings.gradle file on the sub-projects
  4. Maven has explicit references from the root pom.xml file on the sub-projects

So clearly all different approaches can work. I think my current inclination is to go the Bazel/SBT route rather than the Gradle/Maven route. For two reasons:

  1. Unlike handling of the meta-build which is O(1) boilerplate, explicit references to submodules is O(n) boilerplate. This is not a problem when n is small and you only have a few subprojects, but I'd like to try and optimize Mill for when n is large. Maybe not O(1000s) like you see in Bazel projects, but at least O(10s) to O(100s), at which point the boilerplate from the root "registry" importing the subprojects becomes significant (not just lines of code, but also merge conflicts, etc.).

  2. My experience with Bazel is that "put a build file in a subfolder to configure the build for that subfolder" is a very easy workflow for developers to understand. Forcing people to add a reference to a top-level registry maven/gradle-style isn't necessary. Nobody has issues with understanding the "look for enclosing folder with a build file to see where the build config is" workflow, both humans and tools

Existing support for foreign modules

Existing "foreign modules" work, but they cannot serve the purpose of these nested build.sc files: they cannot be referenced from the command-line, and can only be referenced if imported and defined in the root build.sc. While this works, I don't see many people making good use of them: none of com.lihaoyi, scala-cli, coursier, use it, but chipsalliance/chisel does. I think that's for good reason:

  1. There are many ways to organize your foreign modules, and so the relationship between the foreign module code and the subfolders in the projects are arbitrary, which makes it non-obvious how to set them up initially and non-obvious how the read/understand the code later

  2. The boilerplate for importing and registering the foreign module logic in build.sc is tedious and non-obvious. In theory we can already do SBT-style explicit registration of foreign module code in the root build.sc, and it works just as easily as doing so in SBT, but it's so much more work than dumping everything in the same file that people (including myself) don't

In effect, the current way foreign modules and import $files work in Mill is very similar to SBT's "define stuff in Scala files and then use them in the root build.sc" workflow, which I don't think is sufficiently easy. e.g. digging through the chipsalliance/chisel Mill config, it's non-obvious how the code in the various helper traits in their foreign modules correspond to folders on disk, even though I could in theory guess at the naming convention or grep/jump around the references in code and figure it out

My goal for this effort is to solve both these issues:

  1. There should only be one obvious place to put a subfolder build.sc file related to a folder: in that folder. And there should only be one obvious place you go to find the code/config later: in the nearest build.sc file in the enclosing folders

  2. Moving logic into subfolder build.sc files should be easy: create file and copy/paste. In my experience, that workflow in Bazel is a lot easier for non-experts to work with than the workflow in SBT where you need to subsequently register your new logic in the root build.sbt


I only started looking into this a few days ago, so it's still pretty rough. Hopefully it'll become more concrete as the implementation progresses

@lefou
Copy link
Member

lefou commented Jun 25, 2024

@lihaoyi Thanks for you thorough explanations.

I think a link from the sub-project to the root or parent project would be the best solution.

  • It's analogue to the import $meta._ ($meta.mill-build). We could use a import $parent._ ($parent.^) or a using directive //> using mill.parent ...
  • It's explicit
  • We don't need a .millignore
  • It scales
  • It's also more or less copy-and-paste stable. E.g. you can duplicate a sub-directory and use the copy as-is.
  • It also helps to detect the sub-module status when mill is run from that directory, so it's explicit where the root and a potential meta-build and other settings will be looked for.

The only downside of the sub-project idea as a whole is the lookup which we need to do with every Mill invocation.

Maybe, we can settle with a single enabler-option in the root module, so we only look for sub-projects if it is enabled. (//> using mill.scanSubProjects true).

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 7, 2024

@lolgab I've fixed the resolution logic. Took a bit of surgery, but in the end managed to do it without too much invasive refactoring

@lefou The semantics of import $meta, import $ivy, etc. are unchanged from before. Before this, you could already put these things in various foo.sc files you import $file-ed, and MillBuildRootModule would aggregate all of them to use. We can certainly specify them further and lock down unwanted usage, but I'd like to separate that from this PR.

I think it's probably not worth keeping binary compatibility for this PR. We are planning to break compatibility anyway at some point in the next few months to release 0.12.x, and this PR definitely needs to go into 0.12.x, so this PR can be the first of those breaking changes. We don't need to merge it yet, until we start having further compat-breaking PRs targeting 0.12.x (e.g. Scala 3.x), at which point we can cut a 0.11.x branch for maintenance releases and break bincompat in main. Dogfooding the 0.12.x-RCs, publishing downstream Mill plugins, testing 0.12.x-RC across com-lihaoyi, etc. is going to need to happen anyway, so I don't think it's worth putting in a special effort to keep this one PR binary-compatible when further PRs are going to break stuff anyway

@lihaoyi lihaoyi added the compat-breaker This PR is good to merge, but breaks compatibility and needs to wait till we prepare a major release label Jul 14, 2024
@lefou
Copy link
Member

lefou commented Jul 16, 2024

I wonder how likely is it, that this PR breaks compatibility?

It now needs explicit enablement via a $-import. That means, without it, Mill behavior won't change at all. Also we didn't change any user facing API. That means, this PR should not impact any existing project.

But being able to split up builds without considering any other breaking change is a huge benefit for some of the larger projects I witnessed.

Maybe, we could release this useful feature in a 0.11.x release after all?

The only breaking change I do recognize is the removed foreign module support. We could cut a new release in which we mark that feature as deprecated or encourage users to reach out to us, so we have some better feeling how many projects are affected.

The safest approach would be to cut a binary compatible 0.12, so the ecosystem still works, but we can use the new feature.

As a reminder. Changing Mill versions isn't a problem. Changing ABI is what is costly for the Mill ecosystem, as all plugins need to be prepared and re-released for it.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 16, 2024

It is technically possible to maintain compatibility, but I don't think it's worth it. It would involve duplicating large swathes of code: basically the entire resolution logic, the MillBootstrapRootModule codegen logic, and all associated tests (assuming we actually want to ensure both code paths work!). That is a ton of copy-paste that will really muck up the codebase.

The thing about breaking bincompat with 0.12.x is that we're going to need to do it anyway: Scala 3 support will involve breaking bincompat, bumping uPickle to 4.x will involve breaking bincompat, as will many other changes in the 0.12.x discussion #3209. Given that we're going to need minimum 2 releases anyway (0.12.0-RC1 and 0.12.0 final), it costs us nothing to include this PR as part of 0.12.0-RC1, and saves us a whole lot of trouble trying to maintain compatibility in the codebase

The options are basically:

  1. Release this PR as 0.11.9 under a flag, break bincompat in 0.12.0 with all the other changes
  2. Release this PR as a binary compatible 0.12.0, break bincompat in 0.13.0 with all the other changes
  3. Release this PR together with all the other changes when breaking bincompat in 0.12.0
  4. Fork 0.11.x off of main, have this PR target both branches (e.g. main first then backported to 0.11.x after)

Of these options:

  • I don't like (1) because sneaking in such a huge change in a point release feels wrong. This is a major change to how Mill build scripts are compiled and executed, and maintaining strict compatibility if the flag is not enabled will involve tons of copy-paste

  • I don't like (2) because it forces users to update twice, since 0.12.0 and 0.13.0 both feel like major releases that will need to go through separate validation not just for us but also for every user performing the upgrade

  • For (3), only users interested in trying out the RC need to upgrade twice, the rest just upgrade once when 0.12.0 final is out. This PR is not so special we need to get it into a stable release early, before all the other 0.12.0 changes have landed.

  • (4) would involve duplicating some PRs (e.g. main + backport), but would decouple the main and 0.11.x release cadence, so we can merge stuff into main and backport to 0.11.x asynchronously on-demand

This is probably the point where we should consider cutting a 0.11.x maintenance branch from main, which aims to target 0.12.0. That way we can do whatever we want in main, and backport/cherry-pick whatever we want to 0.11.x (e.g. this PR). Fundamentally the "huge old project" requirements and "small new project" requirements differ, and forking a maintenance branch would make that distinction official and allow each to progress at its own pace. Also, the slower pace of development in 0.11.x would mean we can be more free with duplicating stuff and taking on technical debt to maintain compatibility, since there won't be that much forward development to be impacted

We don't need to do it immediately, but we should plan to do it in the near future (e.g. 31st July?), so we have some time to release 0.12.0-RC1 by Q4 and 0.12.0 final by EOY

@lihaoyi lihaoyi added this to the 0.12.0 milestone Jul 18, 2024
@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 18, 2024

I'm hitting a bincompat issue here with mill.RootModule. I want to get rid of this alias, because it mixes in MainModule, whereas I want the RootModules for nested module.sc files to not mix in MainModule, and so I want to do import mill.main.{RootModuleForeign => RootModule} to pick a version without that mixin.

But if I get rid of the alias, that breaks binary compatibility. Not sure whether there's some way around that.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 18, 2024

For now I just re-enabled MainModule methods in sub-folder module.sc files. We can clean it up further when we break bincompat in 0.13.0

@lefou
Copy link
Member

lefou commented Jul 18, 2024

You can't get rid of a public type in a binary compatible way. But wouldn't it better to use a SubModule instead of RootModule to better match semantics?

@lefou
Copy link
Member

lefou commented Jul 18, 2024

If we consider running mill from submodules a thing, we probably want a way to use resolve, show, clean and all the other commands. Maybe, with the introduction of a SubModule, we can filter them out when found in a nested tree, so resolve on the root level will not show all MainModule-commands on all sub-projects.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 18, 2024

The terminology is a bit mixed up, but Root could mean the root of the file, rather than the root of the entire project. Similarly, submodule could mean modules nested im the same file rather than in subfolders. But I admit it is ambiguous and don't have a solution that doesnt add a bunch of verbosity

You're right about running Mill in subfolders. Not sure what the solution there is, maybe these should be treated differently than normal tasks and always be available regardless of what folder you are in. But the details of that design can probably wait till later, since we'll be breaking compat in 0.13.0 in any case so we will have a chance to tweak things

@DamianReeves
Copy link

I think something is broken with traversal when using the new nested module feature:

❯ ./mill -i showNamed __.sources
[build.sc] [#10] [55/59] compile 
[#10] [info] compiling 1 Scala source to /Users/milluser/code/repos/github/external/finos/morphir/out/mill-build/compile.dest/classes ...
[#10] [info] done compiling
[1/1] showNamed 
1 targets failed
showNamed scala.MatchError:  (of class build.package$)
    mill.resolve.ResolveCore$.$anonfun$resolveDirectChildren0$1(ResolveCore.scala:375)
    scala.util.Either.map(Either.scala:390)
    mill.resolve.ResolveCore$.resolveDirectChildren0(ResolveCore.scala:375)
    mill.resolve.ResolveCore$.resolveDirectChildren0(ResolveCore.scala:362)
    mill.resolve.ResolveCore$.$anonfun$instantiateModule0$1(ResolveCore.scala:212)
    scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)
    scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)
    scala.collection.immutable.List.foldLeft(List.scala:79)
    mill.resolve.ResolveCore$.instantiateModule0(ResolveCore.scala:205)
    mill.resolve.ResolveCore$.instantiateModule(ResolveCore.scala:195)
    mill.resolve.Resolve$Tasks$.$anonfun$handleResolved$2(Resolve.scala:45)
    scala.collection.immutable.List.map(List.scala:251)
    scala.collection.immutable.List.map(List.scala:79)
    mill.resolve.Resolve$Tasks$.handleResolved(Resolve.scala:42)
    mill.resolve.Resolve.$anonfun$resolveNonEmptyAndHandle$3(Resolve.scala:272)
    scala.util.Either.flatMap(Either.scala:360)
    mill.resolve.Resolve.resolveNonEmptyAndHandle(Resolve.scala:272)
    mill.resolve.Resolve.resolveNonEmptyAndHandle$(Resolve.scala:242)
    mill.resolve.Resolve$Tasks$.resolveNonEmptyAndHandle(Resolve.scala:34)
    mill.resolve.Resolve.$anonfun$resolve0$4(Resolve.scala:226)
    scala.util.Either.map(Either.scala:390)
    mill.resolve.Resolve.$anonfun$resolve0$3(Resolve.scala:225)
    scala.collection.immutable.List.map(List.scala:247)
    scala.collection.immutable.List.map(List.scala:79)
    mill.resolve.Resolve.$anonfun$resolve0$2(Resolve.scala:224)
    scala.collection.immutable.List.map(List.scala:247)
    scala.collection.immutable.List.map(List.scala:79)
    mill.resolve.Resolve.$anonfun$resolve0$1(Resolve.scala:223)
    scala.util.Either.flatMap(Either.scala:360)
    mill.resolve.Resolve.resolve0(Resolve.scala:222)
    mill.resolve.Resolve.resolve0$(Resolve.scala:216)
    mill.resolve.Resolve$Tasks$.resolve0(Resolve.scala:34)
    mill.resolve.Resolve.resolve(Resolve.scala:213)
    mill.resolve.Resolve.resolve$(Resolve.scala:208)
    mill.resolve.Resolve$Tasks$.resolve(Resolve.scala:34)
    mill.main.RunScript$.$anonfun$evaluateTasksNamed$1(RunScript.scala:24)
    scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
    mill.main.RunScript$.evaluateTasksNamed(RunScript.scala:24)
    mill.main.MainModule$.mill$main$MainModule$$show0(MainModule.scala:42)
    mill.main.MainModule.$anonfun$showNamed$1(MainModule.scala:307)
    mill.define.Task$TraverseCtx.evaluate(Task.scala:71)

I get that against this project: https://github.com/finos/morphir/tree/build-changes-plus-cli-introduction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat-breaker This PR is good to merge, but breaks compatibility and needs to wait till we prepare a major release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants