Updating anaconda.org metadata #126

Open
jakirkham opened this issue May 7, 2016 · 23 comments

@jakirkham
Member

@jakirkham jakirkham commented May 7, 2016

Sometimes the metadata of a package changes, for instance its license and/or summary. When this happens, we can update the feedstock by re-rendering. However, I'm unclear whether there is a way to update the metadata on anaconda.org.

If there is a way to do this, I'd be curious how one goes about it. If this can't be done by others (e.g. it requires a token), it would be nice to find a way to do it automatically (possibly as part of the team update script or similar). It might be nice to automate this even if there are no such requirements. If there is no way to do this at all, it would be good to figure one out.

@msarahan

Member

@msarahan msarahan commented May 8, 2016

I have pointed out this issue to the anaconda.org team - hopefully they can give you some insight here.

@mcg1969

@mcg1969 mcg1969 commented May 8, 2016

anaconda.org updates its repodata.json indices programmatically, upon upload of a new package. I do not believe it is possible to hotfix metadata there without force-uploading the corrected package. (The rules are different for defaults because it is hosted elsewhere and mirrored on anaconda.org.)

Kale and I were talking about the logistics of doing this. Currently we do push metadata hotfixes to defaults, which means the metadata contained in the tarballs can be out of date. In most cases conda correctly gives priority to the repodata.json data; conda index is the one current exception I can think of, and that needs to be fixed.
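To illustrate the priority rule described above, here is a minimal sketch that treats a package's embedded info/index.json and its repodata.json entry as plain dicts. The helper `effective_metadata` and the package `foo` are hypothetical, not part of conda's API; the point is only that when the two disagree, the channel index wins.

```python
import json


def effective_metadata(tarball_index: dict, repodata_entry: dict) -> dict:
    """Metadata a client should act on (hypothetical helper, not conda's API).

    Fields from the channel's repodata.json entry override the ones embedded
    in the tarball's info/index.json, which may be stale after a hotfix.
    """
    merged = dict(tarball_index)
    merged.update(repodata_entry)
    return merged


# Metadata embedded in the tarball when it was built (hypothetical package).
tarball_index = {"name": "foo", "version": "1.0", "build": "0", "depends": ["boost"]}
# Hotfixed entry in repodata.json carrying a corrected dependency pin.
repodata_entry = {"name": "foo", "version": "1.0", "build": "0", "depends": ["boost 1.60.*"]}

print(json.dumps(effective_metadata(tarball_index, repodata_entry)["depends"]))
```

A tool that reads only the tarball (as conda index did at the time) would see the stale `["boost"]` instead.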

The right thing to do is to update the tarballs as well, but this changes the MD5 signature, which poses a problem for people who are so careful about reproducibility that they track and verify those signatures. @kalefranz and I discussed what it would look like to handle this use case. In short, when we force-upload a new tarball, the old one would be moved aside by appending its MD5 signature to the filename. Furthermore, we would incorporate a history into the repodata index, so that it would be known that these older files exist and what changed between them.

@ijstokes

@ijstokes ijstokes commented May 9, 2016

I may be misunderstanding the proposal, however in general a package found at a particular canonical URL should never change once put into Anaconda Cloud (or Anaconda Enterprise). If there is a problem with it then a new one (new version) can be uploaded, and optionally the old one could be removed. But it is a bad idea to have Anaconda Cloud return two different versions of the package at two different points in time -- which is what would be possible if one could edit package meta-data after it had been uploaded, if I am understanding correctly the proposal here.

@msarahan

Member

@msarahan msarahan commented May 9, 2016

I think this is less about a single package file (which I agree should not change) and more about things like the license, description, doc url, dev url, etc - primarily the "about" section.

@mcg1969

@mcg1969 mcg1969 commented May 9, 2016

That's right. I agree that changing the functional content of a package merits a version or build bump, but metadata-only changes do not necessarily need one. In fact, there are situations where dependencies should, or even must, be hotfixed.

@mcg1969

@mcg1969 mcg1969 commented May 9, 2016

And @ijstokes, for the record, we do metadata hotfixes and force uploads, for better or worse, in the defaults channel. Thus it simply isn't the case in practice that the URLs (as we currently offer them) are "canonical". The proposal Kale and I discussed above will help us rectify that, but we do so not by requiring that the package-version-build.tar.bz2 URL itself be canonical. Rather, every uploaded package would be stored in a filename with its MD5 hash appended; e.g., package-version-build.tar.bz2-hash. That would be a canonical URL. Then package-version-build.tar.bz2 would alias to the latest revision of that package. If a package-version-build.tar.bz2 file is removed from the index, the hash-suffixed version would remain.
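The layout proposed above can be sketched as a toy in-memory store. The class and method names are hypothetical; the sketch assumes only that every upload is kept under its filename with the MD5 hex digest appended, and that the plain filename is an alias pointing at the latest upload.

```python
import hashlib


class PackageStore:
    """Toy store for the proposed hash-suffixed layout (hypothetical).

    Every upload is kept under '<filename>-<md5>', which never changes.
    The plain '<filename>' is only an alias to the latest revision.
    """

    def __init__(self):
        self.blobs = {}    # canonical, hash-suffixed name -> package bytes
        self.aliases = {}  # plain filename -> canonical name of latest upload

    def upload(self, filename: str, data: bytes) -> str:
        canonical = f"{filename}-{hashlib.md5(data).hexdigest()}"
        self.blobs[canonical] = data
        self.aliases[filename] = canonical  # re-point the alias; old blob stays
        return canonical

    def remove_from_index(self, filename: str) -> None:
        # Drop the alias only; the hash-suffixed file remains retrievable.
        self.aliases.pop(filename, None)

    def get(self, name: str) -> bytes:
        # Resolve a plain filename through its alias; canonical names pass through.
        return self.blobs[self.aliases.get(name, name)]


store = PackageStore()
old = store.upload("foo-1.0-0.tar.bz2", b"original build")
new = store.upload("foo-1.0-0.tar.bz2", b"hotfixed build")
```

After the second upload, `store.get("foo-1.0-0.tar.bz2")` returns the hotfixed bytes, while `store.get(old)` still returns the original, so anyone who recorded the old MD5 can fetch exactly what they verified.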

@jakirkham

Member Author

@jakirkham jakirkham commented Jun 2, 2016

My concern was fixing license and URL data, as @msarahan correctly deduced.

That said, there are definitely use cases for hotfixing the dependency pinnings of packages. See issue #111, where @183amir recently proposed exactly this solution in a discussion with @pelson.

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Yep. There are several scenarios one can consider where hotfixing the dependency list is important.

@183amir

Contributor

@183amir 183amir commented Jun 3, 2016

I am having a lot of problems with packages that are not pinned! I have tried deleting some of them from anaconda.org, but I think the best solution is to change their metadata.
How do we feel about this?

@183amir

Contributor

@183amir 183amir commented Jun 3, 2016

Like this one:

conda execute conda-forge.github.io/scripts/list_deps.py conda-forge --package bob.blitz --dependencies boost 
Fetching package metadata: ..
bob.blitz 2.0.8 np110py27_0 None   : boost
bob.blitz 2.0.8 np110py27_1 None   : boost
bob.blitz 2.0.8 np110py27_2 None   : boost
bob.blitz 2.0.8 np110py27_3 None   : boost
bob.blitz 2.0.8 np110py27_4 None   : boost
bob.blitz 2.0.8 np110py27_5 None   : boost
bob.blitz 2.0.8 np110py27_6 None   : boost
bob.blitz 2.0.8 np110py27_7 None   : boost 1.60.*
bob.blitz 2.0.8 np110py27_8 None   : boost 1.61.*
bob.blitz 2.0.8 np110py34_0 None   : boost
bob.blitz 2.0.8 np110py34_1 None   : boost
bob.blitz 2.0.8 np110py34_2 None   : boost
bob.blitz 2.0.8 np110py34_3 None   : boost
bob.blitz 2.0.8 np110py34_4 None   : boost
bob.blitz 2.0.8 np110py34_5 None   : boost
bob.blitz 2.0.8 np110py34_6 None   : boost
bob.blitz 2.0.8 np110py34_7 None   : boost 1.60.*
bob.blitz 2.0.8 np110py34_8 None   : boost 1.61.*
bob.blitz 2.0.8 np110py35_0 None   : boost
bob.blitz 2.0.8 np110py35_1 None   : boost
bob.blitz 2.0.8 np110py35_2 None   : boost
bob.blitz 2.0.8 np110py35_3 None   : boost
bob.blitz 2.0.8 np110py35_4 None   : boost
bob.blitz 2.0.8 np110py35_5 None   : boost
bob.blitz 2.0.8 np110py35_6 None   : boost
bob.blitz 2.0.8 np110py35_7 None   : boost 1.60.*
bob.blitz 2.0.8 np110py35_8 None   : boost 1.61.*
bob.blitz 2.0.8 np111py27_0 None   : boost
bob.blitz 2.0.8 np111py27_1 None   : boost
bob.blitz 2.0.8 np111py27_2 None   : boost
bob.blitz 2.0.8 np111py27_3 None   : boost
bob.blitz 2.0.8 np111py27_4 None   : boost
bob.blitz 2.0.8 np111py27_5 None   : boost
bob.blitz 2.0.8 np111py27_6 None   : boost
bob.blitz 2.0.8 np111py27_7 None   : boost 1.60.*
bob.blitz 2.0.8 np111py27_8 None   : boost 1.61.*
bob.blitz 2.0.8 np111py34_0 None   : boost
bob.blitz 2.0.8 np111py34_1 None   : boost
bob.blitz 2.0.8 np111py34_2 None   : boost
bob.blitz 2.0.8 np111py34_3 None   : boost
bob.blitz 2.0.8 np111py34_4 None   : boost
bob.blitz 2.0.8 np111py34_5 None   : boost
bob.blitz 2.0.8 np111py34_6 None   : boost
bob.blitz 2.0.8 np111py34_7 None   : boost 1.60.*
bob.blitz 2.0.8 np111py34_8 None   : boost 1.61.*
bob.blitz 2.0.8 np111py35_0 None   : boost
bob.blitz 2.0.8 np111py35_1 None   : boost
bob.blitz 2.0.8 np111py35_2 None   : boost
bob.blitz 2.0.8 np111py35_3 None   : boost
bob.blitz 2.0.8 np111py35_4 None   : boost
bob.blitz 2.0.8 np111py35_5 None   : boost
bob.blitz 2.0.8 np111py35_6 None   : boost
bob.blitz 2.0.8 np111py35_7 None   : boost 1.60.*
bob.blitz 2.0.8 np111py35_8 None   : boost 1.61.*

Basically all the boost packages should be pinned on boost 1.60.*.
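A minimal sketch for finding builds that still need the hotfix, assuming the standard repodata.json layout (a `packages` mapping from filename to a record with `name` and `depends`). The helper name `unpinned` is illustrative, and the two sample records mirror builds 6 and 7 from the listing above with all other dependencies omitted.

```python
def unpinned(repodata: dict, package: str, dep: str) -> list:
    """Filenames of `package` builds whose `dep` requirement has no version constraint."""
    hits = []
    for fn, rec in sorted(repodata.get("packages", {}).items()):
        if rec.get("name") != package:
            continue
        for spec in rec.get("depends", []):
            parts = spec.split()
            # A bare name such as "boost" means any version is accepted.
            if parts and parts[0] == dep and len(parts) == 1:
                hits.append(fn)
    return hits


repodata = {
    "packages": {
        "bob.blitz-2.0.8-np110py27_6.tar.bz2": {
            "name": "bob.blitz", "version": "2.0.8", "build": "np110py27_6",
            "depends": ["boost"],
        },
        "bob.blitz-2.0.8-np110py27_7.tar.bz2": {
            "name": "bob.blitz", "version": "2.0.8", "build": "np110py27_7",
            "depends": ["boost 1.60.*"],
        },
    }
}

print(unpinned(repodata, "bob.blitz", "boost"))
```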

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Basically all the boost packages should be pinned on boost 1.60.*.

@183amir, why is that? If boost 1.60 works, shouldn't boost 1.61 work as well?

The problem here is not the unpinned versions but the overpinned ones. What I'm seeing here is exactly the kind of problem that I've been concerned about with conda-forge's efforts to "automatically" pin packages. Pinning this precisely is bound to cause problems for people.

For instance, if boost 1.60 is already installed, conda is going to resist upgrading bob.blitz to its latest build, because it doesn't want to upgrade boost to 1.61 unless it has to. Oh sure, conda will upgrade it if you explicitly ask it to, and schedule an upgrade to boost 1.61 in the process. But it will never be part of a second-level upgrade. That is: if package foo depends on bob.blitz, and you do conda upgrade foo, it will not update bob.blitz to build 8, because that would require upgrading boost as well. And conda always seeks to minimize the number of unspecified dependencies that it upgrades.

Dependencies need to be as weak as possible, but no weaker, otherwise you are going to invite conflicts like this. And instead of doing wildcard pinning, you should be considering inequality pinning; e.g., boost >=1.60. And they need to be consistent across builds as much as possible.
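To make the wildcard versus inequality difference concrete, here is a deliberately simplified matcher. It is not conda's version-spec implementation: it only understands an empty spec, `X.Y.*` as a prefix match, and `>=X.Y`, and it assumes purely numeric dotted versions.

```python
def parse(version: str) -> tuple:
    # "1.60.0" -> (1, 60, 0); numeric components only.
    return tuple(int(part) for part in version.split("."))


def matches(version: str, spec: str) -> bool:
    """Simplified stand-in for conda version matching (not conda's code)."""
    if not spec:
        return True  # unconstrained
    if spec.startswith(">="):
        return parse(version) >= parse(spec[2:])
    if spec.endswith(".*"):
        prefix = parse(spec[:-2])
        return parse(version)[:len(prefix)] == prefix
    raise ValueError(f"unsupported spec: {spec}")


for v in ["1.59.0", "1.60.0", "1.61.0"]:
    print(v, matches(v, "1.60.*"), matches(v, ">=1.60"))
```

The wildcard admits only 1.60.x, while the inequality also admits 1.61, which is why it leaves the solver more room. As the later comments point out, that extra room is only safe when the newer version is actually ABI-compatible.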

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Even worse: if someone already has boost 1.59 installed, and then they do conda install bob.blitz, they are quite likely going to be given builds 6 or earlier, because those are the only versions compatible with their currently installed version of boost.

EDIT: no, this will upgrade boost, never mind. But here's a scenario that does what I'm thinking: boost 1.59 is installed, but bob.blitz is not. The user does conda install foo, which depends on bob.blitz. They will be served build 6, because that allows the dependencies to be fully satisfied without upgrading boost.
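A toy model of the scenario above, using the boost requirements listed earlier for bob.blitz 2.0.8 builds 0 to 8. It assumes the solver simply picks the highest build whose boost requirement the installed boost already satisfies; conda's real solver weighs far more than this, and `best_build_without_upgrade` is a hypothetical name.

```python
# Boost requirement per bob.blitz 2.0.8 build number, as in the listing above.
BOOST_SPEC = {n: "" for n in range(7)}  # builds 0-6: unpinned
BOOST_SPEC[7] = "1.60.*"
BOOST_SPEC[8] = "1.61.*"


def satisfied(installed: str, spec: str) -> bool:
    # '' means unconstrained; 'X.Y.*' is treated as a prefix match (simplified).
    if not spec:
        return True
    prefix = spec[:-2].split(".")
    return installed.split(".")[:len(prefix)] == prefix


def best_build_without_upgrade(installed_boost: str) -> int:
    """Highest build usable without touching the installed boost (toy model)."""
    ok = [n for n, spec in BOOST_SPEC.items() if satisfied(installed_boost, spec)]
    return max(ok)


print(best_build_without_upgrade("1.59.0"))  # 6
print(best_build_without_upgrade("1.60.1"))  # 7
```

With boost 1.59 installed, the unpinned build 6 is the newest one that fits, which is exactly the stale build the user ends up with.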

@183amir

Contributor

@183amir 183amir commented Jun 3, 2016

Dear Michael, bob packages (at least) look for exactly that version of boost. So even though you can install this package alongside boost 1.61, it will crash. I have seen this crash more than once in our builds.

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Wow, that's a serious inconvenience. Well, then I certainly agree that the metadata needs to be corrected to reflect that, across all builds.

@183amir

Contributor

@183amir 183amir commented Jun 3, 2016

Yes. It is also stated in our wiki (https://github.com/conda-forge/staged-recipes/wiki/Pinned-dependencies) that boost should be pinned like that. 👍

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Very good! Then it really is just a matter of what to do about retroactive fixes.

@183amir

Contributor

@183amir 183amir commented Jun 3, 2016

Yes! I was thinking of downloading the packages, changing "info/index.json", and replacing them. Will changing only that file be enough?
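A minimal sketch of the approach proposed here, using Python's standard tarfile and json modules. It assumes a conda .tar.bz2 package whose run dependencies live in the `depends` list of info/index.json; `hotfix_depends` is a hypothetical name. It leaves other metadata (such as a bundled recipe) untouched, and the rewritten archive will have a different MD5 than the original.

```python
import io
import json
import tarfile


def hotfix_depends(src: str, dst: str, new_depends: list) -> None:
    """Copy a .tar.bz2 package, replacing 'depends' in info/index.json (sketch)."""
    with tarfile.open(src, "r:bz2") as tin, tarfile.open(dst, "w:bz2") as tout:
        for member in tin.getmembers():
            # Directories, symlinks, etc. are copied without a file object.
            fobj = tin.extractfile(member) if member.isfile() else None
            if member.name == "info/index.json":
                index = json.load(fobj)
                index["depends"] = new_depends
                data = json.dumps(index, indent=2, sort_keys=True).encode("utf-8")
                member.size = len(data)  # header must match the new payload
                fobj = io.BytesIO(data)
            tout.addfile(member, fobj)
```

Usage would be along the lines of `hotfix_depends("bob.blitz-2.0.8-np110py27_6.tar.bz2", "fixed.tar.bz2", ["boost 1.60.*"])`, after which the fixed file is force-uploaded under the original filename.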

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

I believe so, yes.

We've been talking about this question a lot internally, because anaconda.org users don't have the same flexibility we at Continuum have with the "defaults" channel. For instance, on defaults, we can hotfix just the repodata.json file without having to change the package. That's because defaults is a static site. In contrast, anaconda.org updates repodata.json automatically when a package is added to or removed from a channel. So the only way to do metadata hotfixes on anaconda.org is by updating the package.
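Because anaconda.org derives repodata.json from the uploaded packages, a hotfix has to live inside the tarball. The sketch below, assuming a local directory of .tar.bz2 packages, shows the general shape of such an index rebuild; `build_repodata` is a hypothetical name, and real indexers (conda index, anaconda.org) record more fields than shown.

```python
import hashlib
import json
import os
import tarfile


def build_repodata(channel_dir: str) -> dict:
    """Regenerate a repodata.json-style index from the packages in a directory.

    Each entry is read from the tarball's own info/index.json, so the index
    cannot carry a fix that the tarball itself lacks.
    """
    packages = {}
    for fn in sorted(os.listdir(channel_dir)):
        if not fn.endswith(".tar.bz2"):
            continue
        path = os.path.join(channel_dir, fn)
        with tarfile.open(path, "r:bz2") as t:
            record = json.load(t.extractfile("info/index.json"))
        with open(path, "rb") as f:
            record["md5"] = hashlib.md5(f.read()).hexdigest()
        record["size"] = os.path.getsize(path)
        packages[fn] = record
    return {"packages": packages}
```

Any edit made only to the generated index would be overwritten the next time a package upload triggers a rebuild.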

Well, you might argue that we should update the package, even on defaults, when we change repodata.json. I am inclined to agree except for one wrinkle: changing the package alters its MD5 signature. So by replacing the package you are effectively removing a package from the repo that someone else might have already downloaded and is depending on. If they're attempting to achieve reproducibility down to the MD5 level, they will be unable to do so. (And yes, we know people who are doing this.)

One thing we're considering is that instead of overwriting a package when a replacement is uploaded, we move the old package aside by appending its MD5 prefix to the URL. That way it is no longer available in the index, but still available for explicit request.

@jakirkham

Member Author

@jakirkham jakirkham commented Jun 3, 2016

Sorry @mcg1969, but Boost breaks ABI with nearly every minor version; see this report. This doesn't always happen, but without generating our own ABI reports (#150), we cannot determine a wider range that is acceptable for Boost.

In short, I disagree that this is a case of overpinning.
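Under the policy argued here, the run requirement follows from the Boost version a package was built against. A hypothetical helper, `minor_pin`, assuming plain dotted numeric versions:

```python
def minor_pin(version: str) -> str:
    """Turn a build-time version into an 'X.Y.*' run requirement (hypothetical).

    Boost may break ABI on any minor release, so a package built against
    1.60.x is only trusted with other 1.60.x releases.
    """
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}.*"


print(f"boost {minor_pin('1.60.0')}")  # boost 1.60.*
```

If ABI reports later showed that a range of minor versions is compatible, this is the single place a looser spec such as an inequality could be substituted.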

@mcg1969

@mcg1969 mcg1969 commented Jun 3, 2016

Fair enough @jakirkham! I mean, @183amir convinced me it was necessary for bob.blitz, but I was unaware this was a problem primarily caused by boost. Thanks for the info!

@jankatins

Contributor

@jankatins jankatins commented Jun 3, 2016

See also

  • #157 -> package native libs in packages which include the version/SONAME, so that multiple versions can be installed.
  • conda/conda-build#966 (comment) -> Including the minimum version in the lib package itself so that it can be used to set the package dependency at build time

@jakirkham

Member Author

@jakirkham jakirkham commented Jun 3, 2016

Sure, we have been trying to do our homework with these. 😄 There may be others that we need to revisit, though. What we have started doing (very recently, I might add) is writing the pinnings that seem acceptable for the stack in one central location. From this, PRs are generated for feedstocks to make them compliant. This way we can try to ensure the same pinnings are applied across the stack.
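A minimal sketch of checking a feedstock's requirements against a central pin list, assuming a simple name-to-spec mapping. `CENTRAL_PINS`, its contents, and `noncompliant` are hypothetical and do not reflect the actual conda-forge tooling or file format.

```python
# Hypothetical central pin list; the real file and its format may differ.
CENTRAL_PINS = {"boost": "1.60.*", "numpy": "1.11.*"}


def noncompliant(requirements: list) -> dict:
    """Map each requirement that disagrees with CENTRAL_PINS to the pin it should use."""
    fixes = {}
    for req in requirements:
        name, _, spec = req.partition(" ")
        wanted = CENTRAL_PINS.get(name)
        if wanted is not None and spec.strip() != wanted:
            fixes[req] = f"{name} {wanted}"
    return fixes


print(noncompliant(["python", "boost", "numpy 1.11.*"]))
# {'boost': 'boost 1.60.*'}
```

A bot generating compliance PRs would apply each suggested replacement to the recipe; packages absent from the central list are left alone.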

@183amir

Contributor

@183amir 183amir commented Jun 7, 2016

@conda-forge/core I wrote an initial script to hotfix dependencies, please take a look at it so that we can go forward with this as soon as possible: #170
