Proposal for "sub-projects". #1233

Open
KristofferC opened this issue Jun 19, 2019 · 55 comments · Fixed by #1242

Comments

@KristofferC
Sponsor Member

I have had an idea for a while which I call "sub-projects" and we discussed it on the pkg-dev call yesterday. The post here is to summarize that discussion.

  1. What problem does sub-projects try to solve?

    There are cases where we use multiple Project.toml files in a package. One common scenario is documentation, where there is typically a Project containing Documenter.jl and the package itself (which has a relative path in the Manifest.toml). The documentation Manifest.toml file contains the full resolved state (independent of the "main" Manifest.toml). The problem is that, with time, it is very likely that the versions of dependencies in the docs Manifest.toml drift away from the versions of dependencies in the main Manifest.toml. This is likely not desired since things like doctests might pass with the docs Manifest.toml but might not pass when the main Manifest.toml is used.
    The same applies to having a test-specific Project / Manifest, but there it is arguably even worse because now you are not sure that the tests that are run are representative of running the package with the main Project.toml active. What we want here is to be able to use the main Manifest.toml, but give some additional dependencies that are only used for docs / testing.

    Another problem area is shown by looking at the "model-zoo" for machine learning models in Julia: https://github.com/FluxML/model-zoo. Each model has a separate Project / Manifest and to run a model you set that as the active project and then include the model. The problem here is that each model can potentially use vastly different versions of packages that all models have in common (Flux, NNlib, CuArrays etc). Actually using code from the model zoo then becomes very annoying since it is hard to get your own project into the same state as the one the models run in. What we want here is to be able to give a set of packages at a fixed version (Flux, NNlib, CuArrays etc), have all models run on those versions, but also have each model add some extra dependencies because it needs to do something special (e.g. read image files).

    So to summarize, the core issue is that there is no way to "incrementally" add a chunk of a dependency graph to an existing project. If you want to add extra dependencies in a scenario, you need a full copy of Project.toml / Manifest.toml and this will eventually lead to divergence between versions in that Manifest and the main-project.

  2. What is a sub-project?

    A sub-project is in essence an incremental addition of packages to an already existing "main-project". There needs to be some way to identify the main-project from the sub-project. Right now, the details of how this is done are not important, but we could envision a main-project = ".." entry in the sub-project Project.toml that gives a relative path to the main-project, which here is one directory above.

    The core property of a sub-project is that when you resolve it, versions for dependencies in the manifest for the main-project are fixed. In other words, the resolved state of a sub-project is only an incremental addition to the existing dependency graph that is set by the main-project. That means that the compat info in a sub-project must be consistent (resolvable) with the existing versions in the main project.

    This would allow us to have a test or documentation project which is simply a sub-project of the main project. Since the versions of the dependencies are forced to be the same in the sub-project, we know (modulo type piracy and similar issues) that the tests we run in these sub-projects will work with the manifest in the main-project.

  3. Implementation questions:

    1. How should sub-projects be identified?
      Firstly, it is desirable to be able to see that a project is a sub-project "locally" (i.e. by only looking at the directory of the sub-project).
      Thus, we want to have some information in the sub-project to show that it, in fact, is a subproject. One proposal is to have a main-project = $path_to_main_project entry in the Project.toml.

      Relevant for point 3.2 is also whether the main project should have some mapping to sub-projects. It feels annoying to have to specify both main-project in the sub-project and a list of sub-projects in the main-project, so preferably that can be avoided.

    2. What should resolve do in a main-project in the presence of sub-projects?

      If we re-resolve the main-project (upon e.g. an update), the main Manifest will change. The sub-manifests are now "out of sync" with the main-manifest, so they are potentially in a non-resolvable state. This is bad. One possible solution is that, if any sub-projects exist, resolving the main-project also resolves and updates all sub-projects. If any of these resolves fail, the resolve is rejected. That would keep all sub-manifests in sync at all times.

    3. What changes are needed to code-loading?

      Sub-projects are different from main-projects in that they only specify additional dependencies outside the main-project. This has some problems when it comes to the current implementation of code-loading in Julia.
      As an example, in the case where package A is in the main project and package B depending on A is in the sub-project, activating the subproject and loading B will error because we cannot find A in the current project. Code loading needs to know that it should look in the main-project for the UUID to A.

      A related issue is what should go into the sub-manifest.
      There are two choices. Either the full Manifest is stored or only the addition of dependencies that comes from the sub-project is stored.
      In isolation, the latter choice is clearly preferable since it doesn't repeat any redundant information. This might however mean that we need to slightly complicate the code loading to also deal with partial manifests. Since it seems we might need to touch code loading anyway, I think only storing the extra info is the way to go.
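
As an illustration only, the lookup described in 3.3 could be sketched like this in Python. This is a toy model, not how Julia's code loading is implemented: the manifest layout, the UUID strings, and the function names `lookup_entry` and `load_order` are all made up for the example. It assumes the sub-manifest stores only the extra packages and that anything missing is looked up in the main manifest.

```python
# Toy model of code loading with a partial sub-manifest.
# All names and data structures here are hypothetical.

def lookup_entry(uuid, sub_manifest, main_manifest):
    """Find a package entry, checking the sub-manifest first and then
    falling back to the main manifest."""
    for manifest in (sub_manifest, main_manifest):
        if uuid in manifest:
            return manifest[uuid]
    raise LookupError(f"no manifest entry for {uuid}")


def load_order(uuid, sub_manifest, main_manifest, seen=None):
    """Return package names in the order they would be loaded
    (dependencies before dependents)."""
    seen = [] if seen is None else seen
    entry = lookup_entry(uuid, sub_manifest, main_manifest)
    for dep_uuid in entry["deps"].values():
        load_order(dep_uuid, sub_manifest, main_manifest, seen)
    if entry["name"] not in seen:
        seen.append(entry["name"])
    return seen


# Package A lives in the main project; B (which depends on A) is only
# listed in the sub-project, so the sub-manifest contains just B.
main_manifest = {"uuid-A": {"name": "A", "deps": {}}}
sub_manifest = {"uuid-B": {"name": "B", "deps": {"A": "uuid-A"}}}

print(load_order("uuid-B", sub_manifest, main_manifest))
```

Without the fallback (for example, passing an empty main manifest), looking up A fails, which mirrors the error described above.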

@tpapp
Contributor

tpapp commented Jun 19, 2019

Relevant for point 3.2 is also if the main project should have some mapping to sub-projects.

A reasonable default could be all **/Project.toml in the directory tree of the original Project.toml. This would work out of the box for documentation sub-projects, and also for the model zoo example.

If necessary, we could have a syntax for excluding from and adding to this, but I would save that for later.

@KristofferC
Sponsor Member Author

A reasonable default could be all **/Project.toml in the directory tree of the original Project.toml. This would work out of the box for documentation sub-projects, and also for the model zoo example.

It would then be all Project.toml in subdirectories that have a main-project = entry that points to that main-project. It works but since there is no explicit mapping you need to look for them all the time. Might be no problem but if the project contains a huge number of folders and files, it might be slow?

@DilumAluthge
Member

Out of curiosity, what is the status on this?

Unfortunately I don’t have time to implement this myself, but if someone else is working on it, I’d be happy to help test a prototype.

@tkf
Member

tkf commented Jan 8, 2020

The problem is that, with time, it is very likely that the version of dependencies in the docs Manifest.toml drift away from the version of dependencies in the main Manifest.toml.

I think this is sometimes a desired property. I'm using test/environments/jl10/Manifest.toml for running CI with julia 1.0 and test/Manifest.toml for the latest julia (see JuliaFolds/Transducers.jl#116). This is required because many packages started to drop julia < 1.3 due to the new artifacts facility. Another scenario is testing with the oldest compatible upstream packages. If all sub-projects have to share dependencies, it becomes impossible to test packages in such scenarios.

@KristofferC
Sponsor Member Author

Read this now https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html, which seems to be a quite similar thing.

@KristofferC
Sponsor Member Author

KristofferC commented Mar 3, 2020

Some updates here after talking with @fredrikekre

  • There should be one Manifest, and that is next to the main project. That collects the full dependency graph.
  • The main project lists the subprojects, subprojects = ["SubProject"].
  • A subproject can only live one level deeper than the main project.
  • A subproject has a normal Project.toml and becomes a subproject if the path above it has a Project.toml with a subprojects entry pointing to it.
  • The resolver collects dependencies from the main project and all sub projects, same for compat, runs the resolver and outputs it to the main Manifest.
  • Code loading would have to be updated to look for the main project in case it doesn't find the uuid for a package at a top-level package, or if it doesn't find a manifest next to the current project.
  • A sub project shouldn't have other sub projects.

@00vareladavid
Contributor

The resolver collects dependencies from the main project and all sub projects, same for compat, runs the resolver and outputs it to the main Manifest.

This is the really tricky part. With only one subproject, things work fine. The problem is 2+ subprojects. Say A is the root project and has B and C as subprojects. We want to express "A depends on B and C, but not simultaneously". The "not simultaneous" will place fewer restrictions on the resolver, but AFAIU there is no easy way to express that. Have you put any thought into this problem?

@KristofferC
Sponsor Member Author

Everything gets resolved in one step. You collect all deps and all compat, and then run the resolver. The result gets put in the main Manifest.toml. I don't get what "We want to express 'A depends on B and C, but not simultaneously'" means.

@00vareladavid
Contributor

Say subproject A restricts dependency X to version 1, but subproject B restricts dependency X to version 2. This is not a problem because only one subproject is active at a time, but if you shove everything into the resolver at once, it will error.

Perhaps this will not be a problem in practice?

@KristofferC
Sponsor Member Author

KristofferC commented Mar 3, 2020

It is a problem and will not resolve. All your subprojects need to be compatible. One of the major points is to be able to use one manifest for all subprojects.
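
To make the single-step resolve and the conflict case concrete, here is a minimal Python sketch. It is a toy, not Pkg's resolver: the function `resolve_together`, the package names `X` and `Extra`, and the data layout are hypothetical. Versions are tuples, compat is an explicit set of allowed versions, and transitive dependencies are ignored.

```python
# Toy single-step resolve: every name and data structure here is hypothetical.

def resolve_together(projects, available):
    """Merge deps and compat from all projects and pick one version per
    package (the highest one that every project allows)."""
    allowed = {}
    for project in projects:
        for name in project["deps"]:
            # Start from all known versions, then narrow by each compat entry.
            candidates = allowed.setdefault(name, set(available[name]))
            if name in project["compat"]:
                candidates &= project["compat"][name]
    manifest = {}
    for name in sorted(allowed):
        if not allowed[name]:
            raise ValueError(f"no version of {name} satisfies all projects")
        manifest[name] = max(allowed[name])
    return manifest


available = {"X": {(1, 0, 0), (2, 0, 0)}, "Extra": {(0, 3, 0)}}
main = {"deps": {"X"}, "compat": {}}
sub_a = {"deps": {"X", "Extra"}, "compat": {"X": {(1, 0, 0)}}}
sub_b = {"deps": {"X"}, "compat": {"X": {(2, 0, 0)}}}

# Main project plus one subproject: resolves into a single shared manifest.
print(resolve_together([main, sub_a], available))

# Two subprojects with incompatible bounds on X: the joint resolve fails,
# even though only one subproject would be active at a time.
try:
    resolve_together([main, sub_a, sub_b], available)
except ValueError as err:
    print("resolve failed:", err)
```

The failure in the second call is the price of sharing one manifest: all subprojects must agree on a version of every package they have in common.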

@00vareladavid
Contributor

Gotcha, just making sure. It might be annoying, but it seems like potentially just an edge case. In any case I see no clear solution to that problem.

@davidanthoff

I like it! In particular that there is only one Manifest.toml, that will make supporting subprojects sooo much easier in LanguageServer.jl.

  • A subproject can only live one level deeper than the main project.
  • A subproject has a normal Project.toml and becomes a subproject if the path above it has a Project.toml with a subprojects entry pointing to it.

Could the sub-project Project.toml have an entry parentproject in it instead that points to the parent Project.toml, and then the one level deeper constraint could be dropped?

One thing that has turned out to be really tricky in the LS implementation is situations where the meaning of a file like Project.toml depends on surrounding context. For example, if Project.toml just had a flag in it that indicated whether this is a Project.toml for a package or not, things would be a lot easier than what we do now (where we try to figure out the answer to that question by looking at various other stuff). This here strikes me as a similar situation: it would be a lot easier for us if we could just look at the content of a Project.toml, and purely based on the content figure out whether this is a sub-project, rather than having to deduce that from the surrounding files.

@KristofferC
Sponsor Member Author

Could the sub-project Project.toml have an entry parentproject in it instead that points to the parent Project.toml, and then the one level deeper constraint could be dropped?

In that case, we would just keep searching upwards, having to define a bidirectional mapping would be too annoying imo. But it seems best to be conservative at first.

This here strikes me as a similar situation: it would be a lot easier for us if we could just look at the content of a Project.toml, and purely based on the content figure out whether this is a sub-project, rather than having to deduce that from the surrounding files.

The reason for this is that if you have a set of packages that you want to share a manifest, you just put a Project.toml in the directory above, list the packages as subprojects, and you are done. Having to go into each package and add some subpackage flag everywhere would be annoying.
I don't really think that would be hard to implement.

@goretkin
Contributor

goretkin commented Mar 3, 2020

There should be one Manifest

Would this change the current advice about not checking in a Manifest.toml for a "library" package? I mentioned in #1714 (comment) that I think it's beneficial to have, even for a library package, a Manifest.toml for the tests, including for test-specific dependencies. There could also be perf-benchmarking specific dependencies.

@tkf
Member

tkf commented Mar 23, 2020

There should be one Manifest

In addition to the concern @goretkin raised, I'd like to mention that this makes it difficult for package authors to reliably test the package with different sets of dependencies (e.g., additionally testing against the oldest compatible dependencies). Supporting multiple upstream versions is important for avoiding fragmentation of the ecosystem.

@KristofferC
Sponsor Member Author

that I think it's beneficial to have, even for a library package, a Manifest.toml for the tests, including for test-specific dependencies. There could also be perf-benchmarking specific dependencies.

You would have a sub-project for test and one for benchmarks. All of this would resolve into one manifest.

I'd like to mention that this makes it difficult for package authors to reliably test the package with different sets of dependencies (e.g., additionally testing against the oldest compatible dependencies).

Why? You would need to do something along the lines of pkg> resolve --strategy=oldest before you run the tests.

@goretkin
Contributor

You would have a sub-project for test and one for benchmarks. All of this would resolve into one manifest.

If I'm understanding correctly, my response is that I don't want the Manifest.toml to be resolved from scratch. I would like to specify (at least partially) the exact dependencies.

@jlperla

jlperla commented Sep 28, 2020

If I understand what is being proposed: another use case for this is sets of lecture notes/notebooks where only some packages are used in some lectures.

That is, imagine I have a base set of packages I use for all of my lecture notes, and then additional packages used only for a particular lecture. For example:

/lectures
    Project.toml  # has Plots.jl, etc.
    Manifest.toml  # Nice default for projects without special requirements.
    /differential_equations
        Project.toml  # Adds in DifferentialEquations, Flux, etc.
        Manifest.toml  # presumably is a manifest supporting the nested inheritance?
        sciml_notebook.ipynb  # activates the local Project when run
    /bayesian
        Project.toml  # adds in Turing
        Manifest.toml  # compatible with the /lectures/Project.toml and /lectures/bayesian/Project.toml superset
        bayesian.ipynb
    /intro
        intro.ipynb  # activates the default /lectures/Project.toml

This would make maintenance far easier. With huge books and lecture notes, it is otherwise a colossal pain in the ass to bump a package version in each Project.toml and test that there are no regressions.

What I ended up doing, out of practical necessity, was just to have a superset Project.toml file in the main directory, but it is decidedly an anti-pattern for many reasons.

@StefanKarpinski
Sponsor Member

Seems better for a book or lecture series to have a single global manifest that has all the dependencies you use anywhere with a single set of mutually compatible versions that work for everything.

@jlperla

jlperla commented Oct 6, 2020

@StefanKarpinski Yeah, that is what I did.

But the problem is that Julia has so many cool packages that if you take the superset of what you want to use in a book or set of lectures, what happens is:

  • If you update one package you are in a whack-a-mole where it upgrades things in other lectures due to the dependency graph, and breaks something else. This is the nature of Julia with its chaotic and innovative packages, and I don't see it as a fundamental problem (as long as you can keep to smaller Manifest files!)
  • If you try to add in a new dependency, there is a decent chance that somewhere in the graph of dependencies you have some sort of unsatisfiability issue. And as the superset of packages used in your lectures gets bigger, the probability of that approaches 1.
  • If someone wants to use a single lecture, then when they ]instantiate, it has to do everything all at once for the whole book/lecture series. Which can take a long time if you start using all of the cool things julia has to offer for machine learning/differential equations/etc.

None of these are completely insurmountable, but it is a maintenance burden and means you end up with the Manifest.toml of the lectures being far from the latest released versions of packages because it is too painful to update it very often.

It also might be that with the newer CompatHelper/etc. stuff and some more stability in the dependencies that things are less painful than they used to be.

@goretkin
Contributor

goretkin commented Nov 1, 2020

Maybe this is already clear to others in the thread, but I think this is the clearest way to explain what I hope for

  • toplevel/Project.toml expresses package dependencies
  • toplevel/test/Project.toml expresses test-specific dependencies only (i.e. not redundant with toplevel/Project.toml, but used in conjunction with it. e.g. toplevel/test/Project.toml [deps] lists the project toplevel/Project.toml name and UUID)
  • toplevel/test/Manifest.toml that captures the resolution of the test project (package deps and test-specific deps)

The specific paths are not so important to me. Those three files would be version controlled.

It's not important to me that toplevel/Manifest.toml and toplevel/test/Manifest.toml can be in any way reconciled with one another.

@StefanKarpinski
Sponsor Member

I think the things we need to make this really useful and start switching to it are:

  • when resolving, resolve main project by itself first
  • then resolve each subproject, taking the manifest of the main project as fixed
  • for simplicity, probably best to just save complete subproject manifest in test/Manifest.toml
  • probably fine for the subproject to explicitly depend on the main project

So in short, the subproject by itself looks like a standalone project and can mostly be treated as one. The exception is when resolving, we'll want to make sure to resolve the superproject first as described above.
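
A rough Python sketch of the two-stage resolve outlined above, for illustration only. This is not Pkg's resolver: `pick_versions`, `resolve_with_subprojects`, the package names `X` and `TestDep`, and the data layout are hypothetical. Versions are tuples, compat is an explicit set of allowed versions, and transitive dependencies are ignored.

```python
# Toy two-stage resolve: every name and data structure here is hypothetical.

def pick_versions(deps, compat, available, fixed=None):
    """Pick the highest allowed version for each dep, keeping any
    versions in `fixed` unchanged."""
    fixed = dict(fixed or {})
    manifest = dict(fixed)  # complete manifest: starts from the fixed entries
    for name in sorted(deps):
        allowed = compat.get(name, available[name])
        if name in fixed:
            if fixed[name] not in allowed:
                raise ValueError(f"{name} is fixed at {fixed[name]} by the main project")
            continue
        candidates = available[name] & allowed
        if not candidates:
            raise ValueError(f"no allowed version of {name}")
        manifest[name] = max(candidates)
    return manifest


def resolve_with_subprojects(main, subprojects, available):
    """Resolve the main project by itself, then each subproject with the
    main manifest taken as fixed."""
    main_manifest = pick_versions(main["deps"], main["compat"], available)
    sub_manifests = {
        path: pick_versions(sub["deps"], sub["compat"], available, fixed=main_manifest)
        for path, sub in subprojects.items()
    }
    return main_manifest, sub_manifests


available = {"X": {(1, 0, 0), (2, 0, 0)}, "TestDep": {(0, 2, 0), (0, 3, 0)}}
main = {"deps": {"X"}, "compat": {}}
test = {"deps": {"X", "TestDep"}, "compat": {"X": {(1, 0, 0), (2, 0, 0)}}}

main_manifest, sub_manifests = resolve_with_subprojects(main, {"test": test}, available)
print(main_manifest)
print(sub_manifests["test"])
```

In this sketch the subproject manifest is stored in full (main entries plus the extras), matching the "save complete subproject manifest" point. A subproject whose compat excludes the version already chosen by the main project fails to resolve rather than moving the main project's versions.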

In terms of UI to make this useful, being able to activate a subproject environment would take care of most of it. An orthogonal but useful feature would be being able to manipulate non-active environments by referring to them with some syntax in commands. For example (straw man syntax): add ./test TestDep. Note that while this would be especially useful for subprojects, it would not need to be limited to them; you could, for example, be in a development project and do add @v1.6 DevTool to add a dev tool to the @v1.6 environment. In order to really make this work, we'd want unambiguous syntax for all kinds of environments, but I think we could work that out.

@jlperla

jlperla commented Nov 2, 2020

I find the current test dependencies in the Project.toml file to be imperfect but workable if just writing code and doing a ]test.

The real pain point right now is the interaction with the tooling and project/manifest activation during development in an IDE, i.e. --project=@. can't activate the test dependencies, so if you ctrl-enter through a unit test while tweaking stuff, it doesn't work. Then if you use test/Project.toml and test/Manifest.toml it is very cumbersome to keep things in sync, and opening up the project in vscode doesn't activate it, so manual steps are required. For this, I am not really thinking about big packages, but rather the sorts of smaller packages where you really want a Manifest.toml checked into source code control for reproducibility of a paper.

I say this to reiterate that syncing up with what @davidanthoff is discussing above is especially important for these sorts of use cases.

@StefanKarpinski
Sponsor Member

Part of the point of subprojects is that it would be possible to activate a subproject environment. I don't know that involving more people in the discussion here will further the goal of getting something that works instead of the current state of perpetual "we're still using an old abandoned targets feature that definitely doesn't work, but no one has agreed to how subprojects should work, so we're stuck with nothing that really works". Instead, I think we should just pick a thing and make some kind of progress here.

@jlperla

jlperla commented Nov 2, 2020

I don't know that involving more people in the discussion here will further the goal of getting something that works

Probably true. Especially if the definition of "that works" includes GUI implementation in vscode. Regardless, I am the least useful person in the discussion so this is my last comment.

@StefanKarpinski
Sponsor Member

Anything that we do that works well can have a GUI interface put on top of it, so I don't think much feedback is necessary in that direction.

goerz added a commit to JuliaQuantumControl/QuantumPropagators.jl that referenced this issue Oct 17, 2021
QuantumControlBase extends QuantumPropagators, but QuantumPropagators
does not depend on QuantumControlBase except for testing... Thus the
dependency should be only in test/Project.toml and docs/Project.toml,
but not in the main Project.toml.

This also makes it necessary to not use Pkg.test() for local
development: There's just no way to dev-install a local test-dependency
without that dependency being in the main Project.toml. So anything
that's only a test-dependency, but not a dependency of the package
itself, will break.  That's actually how the circular dependency got
into Project.toml in the first place.

On CI, we don't have to worry about this as much: the testing will
modify the main Project.toml, but we throw away the checkout after the
CI run finishes, so it works.

All of these problems are because Pkg.test() has some buggy behavior
if there is a test/Manifest.toml (which would be the appropriate place
to record the local dev-installation of test dependencies):
JuliaLang/Pkg.jl#1585
JuliaLang/Pkg.jl#1233
@ufechner7
Contributor

Any update on this proposal? What is missing?

@MarkNahabedian

See #3297, which proposes that the "extras" and "targets" fields in Project.toml be generalized so that they could be used for "docs" and other targets, not just "test" and "build", which are special-cased in Operations.jl.

@Roger-luo

Just an idea: if we allow a subproject, it would be nice to allow it inside src and load the corresponding module lazily, e.g. Package/Project.toml and Package/src/A/Project.toml. If the user only calls using Package, it will not load module A, so A can act like an extension but within the provided package. It will only load A if one types using Package.A

This would help reduce the development work of trying to split packages into Base, Core etc. which is not necessary anymore, and one can just type XXX.Base instead of using XXX.XXXBase. It should remove the usage of meta-packages in a lot of places too.

@Roger-luo

I'd like to add a few use cases to this proposal. We maintain a somewhat large project with multiple de facto "sub-packages" because we want to reduce the overhead for those who only use one of them. Bumping breaking versions of a package is quite painful at the moment. Take the following example.

The Bloqade packages as a whole want a breaking release 0.2.0 because of several changes within the sub-packages. Currently this requires bumping the version of BloqadeExpr to 0.2.0, updating the corresponding compat of the other sub-packages, then bumping the next dependency, and so on. The packages have to be released in a particular order, which makes for a rather convoluted release process. You may ask: in this case, why don't you just put all of them in one package? We don't because then a user who only needs one of the features would have to load everything, most of which they don't need.

So one feature I'd hope to have is to allow sub-packages to share the same version number with the main package, while lazily loading the corresponding module explicitly. So this means the source code will ship with the main package, only loading is different.
