initial hotfix script #170
Conversation
I have written this in a way that can be used with the
I've added this to the meeting agenda (https://conda-forge.hackpad.com/).
@mcg1969, what do you guys do when hot-fixing a package? I noticed that …
Yes, the … Of course, this causes some issues, such as when someone downloads a bunch of tarballs and does … There is no similar facility on …
Somehow, changing metadata also feels like "not producing reproducible envs"…
I can understand the thought, but the fact is that if I do a …
In fact, if the package is altered in the repo, then there is currently no way to reproduce the environment at the MD5 level, because the old package is gone. We've been talking about a way to address this. Basically, we'd never actually remove a package when it is overwritten; rather, we would simply move it aside to a URL that is accessible by MD5 signature. With some care you could even design the web server to handle the very URLs that
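The move-aside idea described above could look something like the sketch below. This is my own illustration, not an existing conda or anaconda.org facility; the directory layout and function name are invented.

```python
import hashlib
import os
import shutil
import tempfile

def archive_by_md5(pkg_path, archive_root):
    """Instead of deleting an overwritten package, move it aside to a
    path derived from its MD5 digest, so old environments could still
    be re-created by hash. The layout here is hypothetical."""
    with open(pkg_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    dest_dir = os.path.join(archive_root, digest)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(pkg_path))
    shutil.move(pkg_path, dest)
    return digest, dest

# Demo with a throwaway file standing in for a package tarball
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo-1.0-0.tar.bz2")
with open(pkg, "wb") as f:
    f.write(b"not a real package")
digest, dest = archive_by_md5(pkg, os.path.join(root, "archive"))
print(dest)
```

A web server pointed at `archive_root` could then serve each superseded tarball at a URL containing its MD5, which is what would make MD5-level reproduction possible even after an overwrite.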
That sounds nice. Just try to avoid getting into those messy URLs that PyPI added. 😄
Isn't that what the build number is for: change the package, add a new build, upload both to the repo? If the user uses the old package, it should fail, and if the new one, it should work. Reproducibility at work... [Edit: just to make sure: IMO reproducibility also means that failures and problems are reproducible. Fixing such situations with metadata fixes is nice for end users, but it's not "reproducible builds/envs".]
No, there are genuine situations where backporting metadata fixes is required. Bumping build numbers doesn't do it. Ask @jakirkham, who has been banging on Continuum to alter our …
New build numbers can be used to define a new set of requirements for a package. For example, two builds of …
It is true it can be fixed this way, but we ultimately have practical limits. It would not have been practical to ask someone to rebuild …
Two cases where bumping build numbers is insufficient.
Metadata only influences how an environment is created and modified; it has no functional influence. I can blow away the entire …
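To make the mechanics concrete: a hotfix in this spirit rewrites only `info/index.json` inside the package tarball, leaving the payload files byte-for-byte untouched. Below is a minimal sketch; the toy package contents and the `boost` pinning are invented for illustration, and the actual script in this PR may work differently.

```python
import io
import json
import os
import tarfile
import tempfile

def hotfix_depends(src_path, dst_path, new_depends):
    """Copy a conda .tar.bz2 package, replacing only the 'depends'
    list in info/index.json; every other member is copied verbatim."""
    with tarfile.open(src_path, "r:bz2") as src, \
         tarfile.open(dst_path, "w:bz2") as dst:
        for member in src.getmembers():
            if not member.isfile():
                dst.addfile(member)
                continue
            blob = src.extractfile(member).read()
            if member.name == "info/index.json":
                meta = json.loads(blob.decode("utf-8"))
                meta["depends"] = new_depends
                blob = json.dumps(meta, indent=2).encode("utf-8")
                member.size = len(blob)
            dst.addfile(member, io.BytesIO(blob))

# Demo on a tiny synthetic package with an unpinned boost dependency
tmp = tempfile.mkdtemp()
orig = os.path.join(tmp, "demo-1.0-0.tar.bz2")
fixed = os.path.join(tmp, "demo-1.0-0.hotfixed.tar.bz2")
with tarfile.open(orig, "w:bz2") as t:
    blob = json.dumps({"name": "demo", "depends": ["boost"]}).encode()
    info = tarfile.TarInfo("info/index.json")
    info.size = len(blob)
    t.addfile(info, io.BytesIO(blob))

hotfix_depends(orig, fixed, ["boost 1.60.*"])
with tarfile.open(fixed, "r:bz2") as t:
    fixed_depends = json.loads(
        t.extractfile("info/index.json").read())["depends"]
print(fixed_depends)  # ['boost 1.60.*']
```

Because only the metadata member changes, the solver's view of the package changes while the installed files do not, which is exactly the point made above.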
Linux distros have the same problem, and I'm sure they work with patch+rebuild:
IMO the need for such a mechanism is an indicator that something in the workflow isn't right yet: "it's impractical to rebuild gcc" or "we need to indicate higher dependencies" indicates that builds are not as fast or easy as they should be, and the need for "forcing packages out" seems to indicate that the dependency resolution mechanism isn't working as well as it should (= how it works in Linux distros). In an ideal world, manually fixing dependencies so that "what conda sees" is different from "what I see in the recipe" should not be needed. Re reproducibility: ok, convinced that it is not necessary.
We cannot force people to do
Yes, this is good---for new installations. The metadata hotfix is for people who already have the package installed, to prevent them from breaking their installations. Again, the scenario:
Ok... I don't really have an opinion on A. This seems to be a problem due to the rolling releases of conda (or rather the defaults channel). This doesn't seem to be a problem for real "releases" in Linux distros, because they come with an additional security channel which only gets patched versions of the ones in the upstream channel. Re the second: IMO that is just a bug in B not declaring the right dependency and should be fixed by updating the recipe. I think the main problem here is that it's a rolling release, so all packages are available, even broken ones, and all package versions are used to resolve dependencies. So conda might end up with a solution which requires the broken package. This looks like a problem (and a "need for") in conda itself ("no way to mark a package as 'should be replaced' or even 'no good'"), and "update the metadata of existing packages" seems to be the workaround. If conda-forge gets bigger this will become a problem there, because conda-forge does not have this workaround and so needs to do security patches itself...
@JanSchulz This is not about security fixes (at least this is not why I am proposing this). This is about preventing the conda dependency resolver from creating broken environments. And I am having enormous issues with packages that are uploaded with wrong metadata. It's so much pain that I have just stopped working on conda-forge until I see those packages either deleted (which is really bad IMO) or hot-fixed. P.S. I was thinking of writing the script in a way that takes a file like this as input:

…

and then you can fix packages in fewer runs.
@JanSchulz: you cannot claim that package B is broken just because its upgrade to version 2.0 breaks another package. No package maintainer can promise to maintain perfect forward- and backward-compatibility across major upgrades. If any package is broken, it is A: or rather, its dependency metadata. If the maintainer of A had perfect foresight, he would have specified a dependency of … The user of package A, however, has no interest in placing blame on A or B. They want a fix. Metadata hotfixes are always going to be, in some cases, the right way to go about it. I'm done going back and forth on this. @183amir, you know of course I fully support the need to hotfix metadata in a case like this.
I'm a little confused here -- what problem is being addressed? conda has the wonderful concept of a "build version", so if you need to update a package, even just for metadata, you can increment the build number and push a new version. I'm missing what problems this doesn't solve that hot-patching would.
The problem is we may decide to pin a dependency later (that was previously unpinned). However, the unpinned packages remain on conda-forge. In some situations, the solver may cough these old packages back up in an attempt to meet user requirements. As a result, broken environments are created. To me, this is clearly a package problem and it needs to be fixed somehow. Here are the options as I see them.
While 1 works, it is a bit time-consuming to do manually, especially with … 2 is nice, but it is tricky ATM to select which "broken" package one wants. Maybe there are tweaks we can make to conda (channel association? something else?) to help. Also, there are different kinds of broken. Maybe "unpinned" or similar would be better? 3 is effective assuming we know how something needs to be hot-fixed. It also enters this controversial territory. IMHO the user wants to be able to reproduce a working environment. So hot-fixing pinnings to accomplish that seems sensible, though we are doing this through force overwriting. There are ways for this to possibly go wrong, but that is only worth talking about once we are happy with the general idea.
On Wed, Jun 29, 2016 at 9:25 AM, jakirkham notifications@github.com wrote:
(showing my major ignorance of the solver) In any case, this sounds like a bug in the solver, though I know how hard … -CHB Christopher Barker, Ph.D., Emergency Response Division
See conda/conda#2219. Basically, newer builds can serve two purposes: to indicate that the previous build was bad, or to specify a new/different configuration of the build (like libgdal, which can be built with a number of different options enabled or disabled).
IMO there is a 4th option: let the resolver (in default mode) only consider the latest available version of a package. If a screwup happens, the package maintainer can upload a new package version with the right dependency versions, and conda will only use that. conda is interesting compared to Linux distributions, which only keep the latest version around; in conda, older versions remain actually available to install. IMO a user should be protected from this: a package with a lower version than the latest available should not be installed without a) an explicit package version given to the install command, and b) if this forces a lower version of a dependency to be installed, it should warn the user. This way you can still do reproducible envs/builds by specifying all versions, but you still get all the benefits of the normal model of "buggy packages get fixed by uploading a newer fixed version". The current situation is IMO a security problem for conda-forge, as in case conda-forge ever ends up with a full distribution (= can be used without …
@JanSchulz I understand what you are saying, but what do you think goes wrong if we hotfix package dependencies? What bad thing will happen if we hotfix a package's dependencies? For example, a package is uploaded and depends on …
Also, as far as I am aware, Arch Linux (for example, a rolling release distribution) does not support installing old versions of packages, while conda supports that. You should always upgrade all packages in Arch Linux, not just one.
This will just confuse users with warnings all over the place, because eventually … With your proposal, we need to rebuild all of our packages again (…
The main thing is that it means there are now two truths: one in the package recipe (which the users see and might use to reproduce the package, and which is used to build the next version of the package) and one in what conda sees. IMO this should be avoided, because the fixer now has to remember to fix two places: the (internal?) recipe and via the hotfix magic. What I also see is that this "hotfixing" patches over an issue in conda itself (due to the "rolling, but all versions are kept available" nature of conda, or rather conda-forge) which will surface in conda-forge, which does not have such hotfixing capabilities. I'm not sure what the implications of giving conda-forge admins (or each maintainer?) the ability to hotfix would be. Currently I can transparently see what gets uploaded into the conda-forge distribution; how would "maintainer xyz adds a dependency on package 'kill_my_home_dir' via hotfix script" be surfaced in the github repo?
Yes they should both hotfix old ones and fix the recipe so that new ones will be correct. This can be documented so that everyone gets it right.
That's a question of trust and not related to this pull. The admins can already do whatever they want with packages that are uploaded to anaconda.org.
Can those of you who are in favor of hotfixing packages with this script please upvote this comment, and those who disagree downvote it? (with emojis 👍 👎)
I'm not ready to vote on this script outright as is, @183amir, but I think you know my opinion on hot-fixing now, if it was not already clear, which is 👍.
One practical suggestion: save off the original …
Maybe make it a dictionary, with timestamps as keys. |
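A sketch of that suggestion: before overwriting `depends`, stash the previous value in a history dict keyed by timestamp, so every earlier state of the metadata survives repeated hotfixes. The `_original_depends` key name is my assumption, not part of any conda format.

```python
import time

def patch_with_history(index_json, new_depends, now=None):
    """Return a patched copy of index.json metadata, saving the prior
    'depends' under a timestamp-keyed history dict (a sketch of the
    suggestion above; the '_original_depends' key name is invented)."""
    stamp = str(int(now if now is not None else time.time()))
    patched = dict(index_json)
    history = dict(patched.get("_original_depends", {}))
    history[stamp] = patched.get("depends", [])
    patched["_original_depends"] = history
    patched["depends"] = new_depends
    return patched

meta = {"name": "demo", "version": "1.0", "depends": ["boost"]}
once = patch_with_history(meta, ["boost 1.60.*"], now=1467200000)
# A later second hotfix keeps both historical entries
twice = patch_with_history(once, ["boost >=1.60,<1.61"], now=1467300000)
print(twice["_original_depends"])
```

With this shape, a later audit (or rollback) can recover what the metadata said at any point, which addresses part of the reproducibility objection.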
Since the hotfix point is somewhat controversial (though I'm glad to see it is getting such support), could I ask that we save this script as is, but consider a less controversial approach? What if we simply label the unpinned packages with some label (not feeling very creative ATM, so …
I guess it won't work for people (like me) who would try to create old environments, unless you already know exactly what package you want to install.
But I guess since those packages are a real pain and are leaking into our builds, let's move them to another label (…
As you can see, almost all of …
I was thinking that maybe we can add the md5sum of all folders except the …
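Assuming the elided folder is the package's `info/` metadata directory, the idea would be a digest over payload files only, so it stays stable across metadata-only hotfixes. A sketch, with a toy package builder for the demo:

```python
import hashlib
import io
import json
import os
import tarfile
import tempfile

def content_md5(pkg_path):
    """MD5 over every file in the package except those under info/,
    so the digest survives metadata-only hotfixes (assumes the elided
    folder above is `info`)."""
    h = hashlib.md5()
    with tarfile.open(pkg_path, "r:bz2") as t:
        for member in sorted(t.getmembers(), key=lambda m: m.name):
            if member.name.startswith("info/") or not member.isfile():
                continue
            h.update(member.name.encode("utf-8"))
            h.update(t.extractfile(member).read())
    return h.hexdigest()

def make_pkg(path, depends):
    """Build a toy package: one payload file plus info/index.json."""
    with tarfile.open(path, "w:bz2") as t:
        for name, blob in [
            ("lib/libdemo.so", b"payload bytes"),
            ("info/index.json", json.dumps({"depends": depends}).encode()),
        ]:
            ti = tarfile.TarInfo(name)
            ti.size = len(blob)
            t.addfile(ti, io.BytesIO(blob))

tmp = tempfile.mkdtemp()
a = os.path.join(tmp, "a.tar.bz2")
b = os.path.join(tmp, "b.tar.bz2")
make_pkg(a, ["boost"])         # original metadata
make_pkg(b, ["boost 1.60.*"])  # hotfixed metadata, same payload
print(content_md5(a) == content_md5(b))  # True
```

The two packages differ only in `info/index.json`, so their content digests match even though their whole-file md5sums would not.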
Could I please get a review of this pull? (There are new commits.) Also please see: #191
So I feel your frustration, @183amir. One of the reasons we are in the position of having a lingering decision is that there is no coherent strategy for dealing with the problem raised in #126. In general, the only way that a community can arrive at such a decision is if there is a formal enhancement proposal process which considers all of the implications in a balanced way, as well as discussing mitigation strategies for issues that are likely to come up. To my knowledge nobody has put together such a document, and therefore we would simply be making a decision in the dark if we were to go ahead with hot-fixing - it is for this reason that the unpalatable approach of deleting distributions has persisted. Given your need for a solution to this, are you interested in getting started with an enhancement proposal for managing distributions that are fundamentally broken, which takes this objective look at the leading options? From this we can put together a proposal, and finally move forwards to implementing the strategy as a community.
See #191 (comment) for something that might help get you started with that document.
@ocefpaf I am answering you here:
This will never be implemented in
Okay. I think I will put it in my first comment in this pull request. |
That would be a great start, thank you. It is always hard to write these things once a piece of code has been written, but it would be extremely helpful if the analysis was objective and didn't necessarily lean towards any one solution. With that, we can find solutions to the problems, and ultimately choose the "best" one. Thank you @183amir.
I would not be so certain of that. It may take time, or someone from the community, but that is bad in other situations beyond our discussion here and I believe it will be fixed one day.
Okay, I updated my first comment and wrote a summary of the situation and possible resolutions. I tried to write it objectively, but it also has my opinions in it. So if you want your arguments added there, please mention them again here and I will update my first comment.
@conda-forge/core how do I go ahead with this? You asked me to update the first comment and explain the situation, which I did, but I have not gotten any feedback since then.
@183amir -- the best way would be to go to the hackpad for the conda-forge meetings and add it to the agenda: https://conda-forge.hackpad.com/conda-forge-meetings-2YkV96cvxPG The next meeting is still a couple of weeks out, but might as well add it now. The core team is also working on a procedure for enhancement proposals. |
It's already been added. |
I am not sure that this remains relevant. I'll close this in 24 hrs if there are no objections. Please feel free to reopen.
This PR is stale, and @scopatz's 24 hrs are now ~1334 hours, so closing this.
The issue was originally raised in #126, but a lot of discussion has happened here. Here is a summary of the problem and the different solutions to it.
The problem:
When we package a program, we make mistakes, but the packages get published on anaconda.org and people start using them. We had `boost 1.60.0`, and all packages that were linking against it were listing `boost` only in their dependencies. But when we decided to update `boost` to `boost 1.61.0` (Update to 1.61 boost-feedstock#9), we had to look for all packages that were linking against `boost` and pin `boost` on version `1.60.*` there (since `boost` breaks API compatibility with each minor version). This resulted in a large but necessary effort of pinning `boost` in other feedstocks. This included not only feedstocks that listed `boost` as a dependency, like conda-forge/bob.blitz-feedstock@342fcf2, but also other feedstocks which were linking against `boost` but were not listing `boost` as a dependency, like conda-forge/bob.core-feedstock@cf03ad6.

This is all fine, but the problem is that those old packages which did not pin `boost` are also available in the channel, and the conda dependency resolver will pick those packages and create an environment with them with any `boost` version. Something like `conda create -n test boost=1.62.0 bob.blitz_built_with_boost_1.60_but_not_pinned` will be created and results in a broken environment. An example can be seen here: https://circleci.com/gh/conda-forge/staged-recipes/5795

There are 4 main solutions to this:
**Solution 1: deleting the old packages (current solution)**

- **pros**: security is slightly higher compared to other approaches; easier to do.
- **cons**: it takes away reproducibility in a very hard way. We end up deleting a lot of packages, since this happens often and will happen more as we continue to upgrade our stack.

**Solution 2: hotfixing packages** with the script I provided here.

- **pros**: does not delete old packages and lets you use these packages in the future as you may want to.
- **cons**: the md5sum of packages will change. If people are attempting to achieve reproducibility down to the MD5 level, they will be unable to do so. (And yes, we know people who are doing this.)
- **concerns**: …
- **possible workarounds**: …

**Solution 3: change conda to use only the latest build when creating an environment, unless explicitly specified.**

- **pros**: that's what build numbers are for.
- **cons**: conda is not a rolling release distribution, and you are promised that you can create old environments today and they still work (at least in the defaults channel). I am not talking about reproducibility here; I am talking about creating new environments with an old dependency (say numpy 0.6) and letting conda create a working environment for me. I don't want warnings and a broken environment at the end; I want something that actually works. With this proposal, we would need to rebuild all of our packages again (numpy 0.6, for example) to reflect broken dependency changes with a new build number. So if some dependency gets updated in the future and it was not pinned and we could not foresee it, we need to release a new build of numpy 0.1 through 1.11 just to make sure that the unpinned dependency gets pinned in ALL numpy versions.

**Solution 4: archive broken packages under another label in the channel.**

- **pros**: not controversial; reproducibility with environment.yml files is still possible.
- **cons**: those packages cannot be used anymore in a normal way. We still have no means to verify an archived package's integrity unless you had its md5 before.