Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for recipe and package immutability #16

Merged
merged 4 commits into from
Jan 28, 2021

Conversation

memsharded
Copy link
Member

Packages and recipes, identified by their complete reference including the revision, all artifact bare file contents (conanfile.py, source files, compiled libraries and executables, etc) stored in the Conan cache and in servers are always immutable.

So far, in Conan 1.X packages and recipes are mutable without revisions and there are corner cases, even with revisions, in which the immutability is not achieve. This PR proposes to define all stored recipes and packages as immutable (which the already approved proposal of using revisions greatly help). Please read carefully the proposal and looking forward your feedback.


  • Upvote 👍 or downvote 👎 to show acceptance or not to the proposal (other reactions will be ignored)
    • Please, use 👀 to acknowledge you have read it, but it doesn't affect your workflow
  • Comment and reviews to suggest changes to all (or part) of the proposal.

@rasjani
Copy link

rasjani commented Dec 18, 2020

Sorry if this isnt relevant but here it goes. I havent following up on usage of user/channel (is that just cci convention to drop those?) and relation of this proposal but it at least raises a question of how "promotion" of a package from one user/channel to another user/channel would be affected with this change ?

@DoDoENT
Copy link

DoDoENT commented Dec 18, 2020

I'm not completely sure that I understand this - what is currently not immutable? The fact that admins can delete the package from the Artifactory and upload the new one or overwrite the existing one? I don't know how you can prevent this, except by disabling the possibility to delete packages from Artifactory...

Regarding the revisions, if the default algorithm for revisions would be the hash of the contents, then only those affected would be those that use revision_mode = "scm".

Also:

The conan copy command will be removed, as it can mutate a package reference.

I don't understand this. How can conan copy mutate a package reference? I thought it leaves the original intact and creates a copy of the package under new user/channel. By definition, this is a new package with new references, etc.

Also, why will conan copy be removed? What alternative should be used?

@memsharded
Copy link
Member Author

Sorry if this isnt relevant but here it goes. I havent following up on usage of user/channel (is that just cci convention to drop those?) and relation of this proposal but it at least raises a question of how "promotion" of a package from one user/channel to another user/channel would be affected with this change ?

Hi @rasjani Yes, dropping user/channel is mostly for ConanCenter. Users are still encouraged to use them, specially the "username" part, to identify their own packages created by themselves.

Regarding the promotion, that is right, you can read in the proposal, that moving one package from one user/channel to another user/channel violates the immutability. This will be removed in Conan 2.0 (including the conan copy command). We have learned (and it is a know best practice in devops and package management) that it works much better if the immutability is respected, and promotions are run between repositories. The mechanism we are already recommending is copying packages from one repository to another. This keeps immutability intact and implements much more robustly promotion pipelines.

@memsharded
Copy link
Member Author

I don't understand this. How can conan copy mutate a package reference? I thought it leaves the original intact and creates a copy of the package under new user/channel. By definition, this is a new package with new references, etc.

As long as you can have (and many recipes do have) conditionals and logic in the recipe that depends on user/channel, like checking self.user or self.channel, this breaks the package. A typical case is propagating the user/channel part to dependencies in requires. Doing this you can have a package with a given user and channel, that did use dependencies to build from another user and channel, that is an "impossible" state now. Even the internal "conaninfo.txt" stores the old, invalid references.

Also, why will conan copy be removed? What alternative should be used?

Promotion between repositories. This is a proven, well known best practice for other package managers, and we have been already been designing Conan latest features, like the lockfiles, or the new CI flows (soon to be launched in the Academy trainings), to follow this principle. Some Conan users are already using it successfully, and we are also applying it ourselves at a huge scale in ConanCenter.

@memsharded
Copy link
Member Author

Just for historic reasons: the user/channel part is mostly a legacy from the time Conan servers only had 1 repository and there was only 1 non moderated central repository, and a way was needed to allow different users to upload their own packages for the same library. This proved to not work very well, and the new moderated and centralized in Github repo way to contribute to ConanCenter is way more successful.

@DoDoENT
Copy link

DoDoENT commented Dec 18, 2020

A typical case is propagating the user/channel part to dependencies in requires. Doing this you can have a package with a given user and channel, that did use dependencies to build from another user and channel, that is an "impossible" state now. Even the internal "conaninfo.txt" stores the old, invalid references.

🤔 , so if I understood correctly, it would not be possible anymore for package to depend on both protoc/<version>@user/channel (build-requires) and protobuf/<same-version>@same-user/same-channel (and similar packages that have both runtime library and code generator tool)? Or did I get something wrong?

@ytimenkov
Copy link

Could this be clarified a bit with regard to the source of hashing? As I understand SCM is used only for recipe revision, while (binary) package revision is always calculated based on the contents, right?

This can be calculated only after package is built.

In this case I think it is quite important to at least mention that there could be different binary packages with exactly the same settings but different hashes. For example, if compiler flags are different, like -O2 vs -O3 or slightly different compiler was used (since compiler.verison is just major.minor, so updating patch-version may flip few bits...).

Or if I get this wrong and SCM revision used for package revision too we get the opposite situation, where a change in produced binary won't create a new package (same: if there was some compiler codegen issue fixed in patch upgrade).

In any case I think it's important to mention your research on reproducible builds.

@memsharded
Copy link
Member Author

so if I understood correctly, it would not be possible anymore for package to depend on both protoc/@user/channel (build-requires) and protobuf/@same-user/same-channel (and similar packages that have both runtime library and code generator tool)? Or did I get something wrong?

No, that shouldn't be related to this, that is about the build-host contexts, and that is expected to still be possible. Actually, using the build-host contexts will also probably be proposed for Conan 2.0 as the default behavior. I might be missing something.

@memsharded
Copy link
Member Author

In this case I think it is quite important to at least mention that there could be different binary packages with exactly the same settings but different hashes. For example, if compiler flags are different, like -O2 vs -O3 or slightly different compiler was used (since compiler.verison is just major.minor, so updating patch-version may flip few bits...).

Yes, but not sure if I understand. This is exactly the purpose of the "package-revisions". Getting a new, different package revision (for the same recipe version, revision, dependencies, settings, etc), only happens if you force another build of the same thing, even if absolutely nothing changed, and you either have a change in the environment that is not modeled by Conan at all, or your builds are not reproducible.

You are right:

  • Recipe revision: By default is the recipe contents (recipe + source code, or coordinates to source code). It can also be configured to follow the SCM commit. In principle both are tied, any change done in the recipe or source code implies a change in SCM, and even if in the other direction is not strictly true, in practice it is the same, except some cases in which the developers still want to create a new recipe revision even if nothing changed in recipe or source code.

  • Package revision: Always the hash of the contents. A package revision always belong to a given recipe revision + package-id, it is not something that can be detached from it.

Does this clarify a bit? Maybe add those items to the doc?

@jasal82
Copy link

jasal82 commented Dec 18, 2020

Will it still be possible to mix dependencies from release and develop (stable and unstable) repositories? This is a mandatory requirement for us. We should at least have the possibility to select the remotes on a per-requirement basis then. An even better solution would be to make Conan aware of the versioning workflow so that there is a bijection between SCM branches/tags and the associated packages. Currently we're providing this bijection manually by mapping user/channel to branches/tags. I don't see how this could be mapped to Conan remotes.

@ytimenkov
Copy link

Does this clarify a bit?

Yes. I think I was thinking more about Conan being distributed so technically it's possible that different repositories have different package revisions. Especially if something like --build=missing is used together with scm mode for recipe: 2 different people may build from the same recipe and upload to different remotes. Or same remote which will cause overwrite...

I think my main concern is that it is not possible to know hash in advance to skip the build and if buidls are not totally reproducible just accidentally triggering CI may result in catastrophic rebuilds of all downstream packages.

Lockfiles are very close connected to this feature as they can prevent unintended rebuilds (or provide consistency) but building CI around lockfiles is much more work (not everyone may be willing to invest into).

Maybe add those items to the doc?

I think it's a good idea.

@memsharded
Copy link
Member Author

Yes. I think I was thinking more about Conan being distributed so technically it's possible that different repositories have different package revisions. Especially if something like --build=missing is used together with scm mode for recipe: 2 different people may build from the same recipe and upload to different remotes. Or same remote which will cause overwrite...

Not really an overwrite. 2 things can happen:

  • Builds are not deterministic. A new package revision will happen in each build agent. The timestamp of the upload will define which is "latest" to be resolved.

  • Build are deterministic: The same package revision will be computed, and then the second upload will be skipped, not an overwrite (or it would be an overwrite, but as it is the same, it wouldn't matter)

I think my main concern is that it is not possible to know hash in advance to skip the build and if buidls are not totally reproducible just accidentally triggering CI may result in catastrophic rebuilds of all downstream packages.

I think in the general case there are 2 "safe" CI approaches, based on the "package_id_mode":

  • default_package_id_mode=recipe_revision_mode. It means: If something changes in the source code of a package, including the recipe or the source code itself, then it will be a new recipe revision, and all consumers of this package will compute new package_ids and then will require to be rebuilt. This is "matematically" exact at the source code level.

  • default_package_id_mode=package_revision_mode. It means: if anything changes, from the source code, to the binary, I don't care if it is just a useless timestamp of the binary because it is not a deterministic build, the package revision will change, and all consumers of that package will have a new package_id and need to rebuild.

Possibly, the first one is more suitable to normal development, in which we assume that changes are in sources, and the environment is quite controlled, (CI, docker images). When something breaks in the environment badly, then a new build can still be forced manually. So the second one seems to make more sense just for very extreme cases, with environment variability or something like that.

@ytimenkov
Copy link

Hmmm... Didn't think about it from this perspective.

So if package revision of the upstream package defines package id of a downstream one means we can't simply forecast package id for the final app until all dependencies are built. Not sure which consequences this has...

Another important thing to keep in mind is ability to install dependencies for development and but get "missing binary package" (for "enterprise ci" case). Then again, lockfiles can help but they should either be checked in or downloaded from ci. Both options are so-so...

@michaelmaguire
Copy link

michaelmaguire commented Dec 18, 2020

The mechanism we are already recommending is copying packages from one repository to another. This keeps immutability intact and implements much more robustly promotion pipelines.

Will jFrog be adding support for atomic moves of multiple packages to avoid possible cases where another user might see 'half' newly promoted packages while promotion copy package-by-package is in progress?

(see e.g. conan-io/docs#1935 "Lockfile CI documentation glosses over possible difficulties in atomic Artifactory copy of multiple packages" )

@michaelmaguire
Copy link

michaelmaguire commented Dec 18, 2020

Just for historic reasons: the user/channel part is mostly a legacy from the time Conan servers only had 1 repository and there was only 1 non moderated central repository, and a way was needed to allow different users to upload their own packages for the same library. This proved to not work very well, and the new moderated and centralized in Github repo way to contribute to ConanCenter is way more successful.

If Conan is going to move more towards separate repo usage, I hope that jFrog will add better, finer-grained permissioning support for Artifactory repo creation. In larger organizations, the teams writing CI jobs are (rightly so) not allowed the Artifactory Admin credentials that would allow for repo creation on-the-fly during builds or to create new "distributions".

Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
@foundry-markf
Copy link

We have learned (and it is a know best practice in devops and package management) that it works much better if the immutability is respected, and promotions are run between repositories. The mechanism we are already recommending is copying packages from one repository to another.

Can anyone point me to the documentation that describes how to copy packages from one repository to another, please?
Preferrably in a scriptable way.

I like the idea of building packages in an isolated repository, testing them there, and only once happy, transferring them to the 'production' repository in one fell swoop.

@michaelmaguire
Copy link

michaelmaguire commented Dec 22, 2020

@foundry-markf

Can anyone point me to the documentation that describes how to copy packages from one repository to another, please?
Preferrably in a scriptable way.

https://www.jfrog.com/confluence/display/JFROG/Scripted+Pipeline+Syntax#ScriptedPipelineSyntax-PromotingBuildsinArtifactory

Also see the jfrog rt command line tool:

https://www.jfrog.com/confluence/display/CLI/CLI+for+JFrog+Artifactory

But see my notes in #16 (comment) about the current lack of atomicity if you need to 'promote' multiple packages at once from one repo to another.

@Lizabeth-Roberts-Bose
Copy link

I am a little confused on this one and I think there are two use cases to break down:

  1. Allowing packages and recipes in artifactory to be mutable
  2. Allowing locally created packages to be mutable

The main reason I want to separate these two use cases is because if you, as a developer, are working with multiple recipes you will want to be able to create packages to test things locally before uploading anything to artifactory.

In this case I think it's important to allow packages to be mutable because typically things like the packages version (at least in our case) as well as user/channel won't change between these conan create calls, so the packages need to be overwritten in cache or things won't work. Perhaps revisions can solve this but I haven't been able to get this to work well locally and conan sometimes gets confused and doesn't pick the correct package.

The second use case is allowing a package to be overwritten in artifactory. This one I think should not be allowed and the idea of making anything already uploaded to artifactory (which by the way, in our case only CI is allowed to do this, no users have upload permissions) immutable I think makes sense.

My only real confusion here is what happens to the conan upload --force flag? Can we no longer force upload and overwrite recipes/packages with this proposal or am I missing something?

@memsharded
Copy link
Member Author

In this case I think it's important to allow packages to be mutable because typically things like the packages version (at least in our case) as well as user/channel won't change between these conan create calls, so the packages need to be overwritten in cache or things won't work. Perhaps revisions can solve this but I haven't been able to get this to work well locally and conan sometimes gets confused and doesn't pick the correct package.

Maybe this is a side effect of the current limitation of the Conan cache, that can only contains 1 revision at a time? This is a known problem, as revisions are being bumped out of the cache and replaced with a new one. Yes, indeed, going this way would mean that we need to improve the cache, so it can host multiple revisions simultaneously, and in that way, developers could be doing modifications to packages, creating more revisions without interfering with other projects that are using some given revisions of those packages.

My only real confusion here is what happens to the conan upload --force flag? Can we no longer force upload and overwrite recipes/packages with this proposal or am I missing something?

Most likely it means: if the revision you are uploading already exists in the server, remove it, and re-upload (making it the latest revision available). Would be necessary only for extreme cases, to fix something that is broken, as the revisions themselves should allow for a very efficient upload flow (if the revision is in the server no need to do anything, no need to compress/zip files, no need to check checksums of .tgz, as was being done for old V1 without revisions)

@Lizabeth-Roberts-Bose
Copy link

Maybe this is a side effect of the current limitation of the Conan cache

Thank you for the clarification! If this is the case, I think this is something that may need to be fixed, or a work around in place as part of conan 2.0. Being able to develop multiple recipes locally at the same time, with complex interdependencies is critical for us.

One key problem we have always had, that is similar to this is that because nothing in cache is changing as we locally change code and run a new conan create command, using build=missing doesn't pick up the change and so changes don't propagate locally without essentially clearing the cache out. Perhaps this change can address this somehow by always creating a new revision in cache that conan recognizes and propagates to other dependencies

@memsharded
Copy link
Member Author

One key problem we have always had, that is similar to this is that because nothing in cache is changing as we locally change code and run a new conan create command, using build=missing doesn't pick up the change and so changes don't propagate locally without essentially clearing the cache out. Perhaps this change can address this somehow by always creating a new revision in cache that conan recognizes and propagates to other dependencies

yes, I think a Conan cache that can store multiple revisions for the same recipe would allow that flow, and the --build=missing with recipe_revision_mode, for example, should work without any issues.

@DoDoENT
Copy link

DoDoENT commented Dec 30, 2020

I'm just curious about using packages that have local modifications in cache. Of course that this is not something that should be done, but can often simplify cross-package debugging.

For example:

  • create a header-only package liba that will be used from application
  • consume it from the application and discover that there is a bug in liba
  • edit the header directly in Conan cache and iterate with the application until the bug is resolved
  • copy the fixed headers back from conan cache into liba's repository and commit, so CI can make an official package

I know that the official Conan way of doing that is:

  • edit the header in liba repository and create new package
  • test in the application

However, this is very time consuming, as creating package can be quite slow when lots of headers need to be copied (e.g. when fixing bugs in boost or eigen) - it's much faster to edit directly in cache and then copy the edits back to the repository once finished.

This has nothing to do with servers nor with how conan works - I'm just curious that if the package revision will be calculated based on package's source, will the application be able to consume the liba after it was locally edited? This is currently possible in conan v1.x.

@memsharded
Copy link
Member Author

edit the header directly in Conan cache and iterate with the application until the bug is resolved

Local modifications in the cache will always be forbidden and leading to undefined behavior, binary incompatibilities and loss of data.

We are working towards a better experience doing local development, if you are editing a source file and want to test it from downstream, you need to put it in editable mode. Current efforts are oriented to make the "editable" layout definition easier and more user friendly (even having a built-in default that is good for many cases).

@DoDoENT
Copy link

DoDoENT commented Dec 30, 2020

Current efforts are oriented to make the "editable" layout definition easier and more user friendly (even having a built-in default that is good for many cases).

I like that. Currently, putting packages into editable mode is a lot of hassle and not practical for most use-cases (especially when I need to test different platforms (e.g. iOS, Android, Emscripten, MacOS) in parallel). By exporting the package into local cache I at least have the chance of properly testing the package and package_id methods from the downstream.

Yes, I know about conan package that tests the packaging, but it does not test the package_id as it does not calculate any hashes.

I am looking forward to the ideas and proposals for making local cross-package development easier.

@foundry-markf
Copy link

This comment is interesting, particularly about needing to manually map from SCM branches/tags to users/channels. We've discussed this internally, as we would like to make a closer association with say, long-lived feature branches and how to thus associate them with packages created so references both don't collide, and are identifiable.

Also, recipe revisions seem to follow a linear history, so don't allow for branching in the SCM sense.

I don't see a response in the conversation about whether Conan package references/revisions can be made aware (even if opt-in) to SCM branching/tagging workflows. Has there been any thoughts on this possibility?

@memsharded
Copy link
Member Author

@foundry-markf @jasal82

An even better solution would be to make Conan aware of the versioning workflow so that there is a bijection between SCM branches/tags and the associated packages. Currently we're providing this bijection manually by mapping user/channel to branches/tags. I don't see how this could be mapped to Conan remotes.

This is what we are doing in ConanCenter already: we are mapping every branch/PR to a repository in Artifactory. We put the binaries of that branch there, to not mix them with the production ones. As it might be difficult (Artifactory repo creation permissions) for some teams, it is also possible to have a temporary "tmp" or "builds" repository, and associate every branch/PR with a lockfile. This is what we are doing in the CI/CD examples in the docs and very soon in the CI/CD training.

@foundry-markf
Copy link

@memsharded thanks for the response.

This is what we are doing in the CI/CD examples in the docs

Could you provide a link to the docs referred to, please? My Google-fu is failing to find them.
I looked over https://docs.conan.io/en/latest/integrations/ci.html but it does not mention the SCM branch mapping to Artifactory remotes, and lockfiles.

@memsharded
Copy link
Member Author

@memsharded
Copy link
Member Author

We are merging as approved this proposal, with some modifications to take into account the feedback received. Check the last commits about it:

  • Added clarifications about the recipe and package revisions
  • A new cache (with a new layout) will be implemented that will allow to host multiple revisions
  • We will move the concern of the more controlled or atomic promotions on the server side to the relevant server teams, and work with them to keep improving this aspect.

Thanks very much all for all the feedback, we will be opening new proposals soon!

@memsharded memsharded merged commit bf96169 into conan-io:main Jan 28, 2021
@memsharded memsharded deleted the proposal/immutability branch January 28, 2021 16:10
@climblinne
Copy link

We use the package_id at the moment, to identify the application with it's dependency. Now you also introduce the package_revision for the exact binary output files.
How would be the libraries be referenced in the application package_id?

  • be referenced by pkg/version@user/channel#recipe_revision:package_id
  • or be referenced by pkg/version@user/channel#recipe_revision:package_id#package_revision ?

@memsharded
Copy link
Member Author

Hi @climblinne

Can you clarify what you mean with How would be the libraries be referenced in the application package_id?

The complete "coordinates" of a given binary package containing the artifacts, headers, libs, etc will be pkg/version@user/channel#recipe_revision:package_id#package_revision. Every pkg/version@user/channel#recipe_revision:package_id "package_id" can contain multiple package revisions, a different one for each build if the build is not fully deterministic. This would be a much less frequent case than changing recipe revisions. Package revisions will account for a re-build when something changed in the system. Lets say that you had to upgrade your build machines or something, nothing really changed in Conan or source code, but you still want to rebuild the binaries. You will get 2 package revisions, the older build, built with the previous machines setup, and a new revision of the binaries built with the upgraded machines setup. Let me know if this clarifies something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants