Version parsing/ordering in conda #2071

groutr · 2016-02-12T23:28:26Z

Currently, version parsing and ordering are accomplished in conda via the MatchSpec, VersionSpec, VersionOrder, and Package classes.

I thought I would open discussion on using PyPA's packaging module to handle parsing/ordering/comparing versions in conda. There are a few pros and cons that need to be considered though.

Documentation: https://packaging.pypa.io/en/latest/
Repository: https://github.com/pypa/packaging

Pros:

- packaging implements PEP440 and PEP508. The current versioning classes in conda seem to implement the older PEP386.
- Code reuse is usually a good idea
- Maintained by PyPA, who also maintain pip
- Useful abstractions for dealing with versions and version ordering
- Code is neatly organized and well documented
- Covers as many versioning schemes as I can think of

Cons:

- packaging depends on six and pyparsing. Conda strives to have as few external dependencies as it can.
- Initial implementation might take a lot of work, but hopefully not too much. The version handling classes have been in conda for a long time. Rooting out all of the assumptions and peculiarities throughout the conda codebase might take time.
- Could possibly break other code if other people have depended on conda's internal classes. Hopefully, this is a non-issue, but it's listed here for completeness' sake.
- The way conda does things works well enough already

This issue is for discussion. Is this a desirable change for the conda community? Why or why not? There are also a few other libraries that handle version parsing and ordering, though none seem to be as mature as packaging.


msarahan · 2016-02-13T00:01:05Z

Unifying with PyPA would be very good, and I also like getting rid of code that we are maintaining. We should do rigorous regression testing, but overall I support this idea.

mcg1969 · 2016-02-13T00:46:34Z

👍

jakirkham · 2016-02-27T20:13:39Z

This seems like an interesting idea. Generally, I think I am in favor.

I just wonder how this will affect some other cases conda tries to address, like build numbers, build strings, or cases where we allow 4 numbers in the version string. Some of these questions can only be answered by having a POC to look at, though the 4th version number is something I am a little worried about.

Personally, adding pure Python dependencies doesn't feel like a bad idea to me. As long as they all get packaged in the miniconda and anaconda installers, it seems fine. These particular ones could be useful in solving other issues that we might have (simplifying Python 2/3 compat, working with grammars more simply in versioning, and possibly other things on the build side).

jakirkham · 2016-02-27T20:14:49Z

@stuarteberg, might be of interest.

@ukoethe, I think you would be pretty interested in this and how it shakes out given your existing use cases.

stuarteberg · 2016-02-27T20:58:28Z

This is a nice idea, but we should approach it with extreme caution, as you've noted.

Initial implementation might take a lot of work, but hopefully not too much. The version handling classes have been in conda for a long time.

That code isn't as old as you might think. Last fall, @ukoethe did a ton of work to fix issues in conda's version ordering scheme. He and @asmeurer hashed out most of the details in #1601, but if you really want to catch up, you should probably read everything in this list. (Have fun...)

packaging implements PEP440 and PEP508. The current versioning classes in conda seem to implement the older PEP386.

See the above-mentioned #1601 for discussion of conda's compliance with PEP440. In particular, this comment and this comment. (But there is no mention of PEP508, as far as I can tell.)

code reuse is usually a good idea

maintained by PyPA, who also maintain pip

Indeed, these are laudable motivations. If an externally managed library from a reliable source really does cover all of our use-cases, then I guess we must admit that the amount of previous effort that's been sunk into this issue is not so relevant (even if it would be frustrating to discard it).

Covers as many versioning schemes as I can think of

Of course, it is critical to keep in mind that conda supports multiple languages, and therefore has slightly different requirements than PyPA. For instance, #1652 was implemented specifically for compatibility with R's silly versioning scheme.

PS -- PyPA? I might be biased, but as far as I'm concerned, the "authority" on packaging python packages at this point is the Anaconda project. Maybe they should be using our code, not the other way around... :-P

kalefranz · 2016-02-27T21:43:44Z

This is a great discussion. Right now I'm not opposed to using some of this same code. I wouldn't want to make it an external dependency, though; in this case I'd rather explicitly vendor it in so we can have tight control over it.

@stuarteberg Thanks for all of those back references. They're especially useful for me.

@groutr Correct me if I'm wrong, but I think creating this ticket was in part motivated by our discussion about conda needing better data modeling. If that's the core of this issue, then it's something I also feel strongly is critically important. Conda's needs have several layers of generalization beyond PyPA's, though, so I'm leery about using packaging as a foundation. I do agree that conda needs to be compatible with PEP440 and PEP508 to the extent possible.

msarahan · 2016-02-28T01:52:47Z

@stuarteberg thanks for the reminder about conda being more general than the ecosystem that the PyPA oversees. I forget that sometimes, but we definitely need to keep that in mind. Conda is more than pip, in concept and in actuality.

ukoethe · 2016-02-28T20:07:34Z

As @stuarteberg said, I thoroughly revised conda's version comparison just a few months ago. It is clean, carefully tested and well documented code that mostly conforms to PEP 440 and gives good error messages for syntactically incorrect version strings. The remaining differences to PEP 440 are deliberate:

- While PEP 440 rejects non-conforming version numbers, conda parses and compares them with essentially the same rules as the conforming ones. This is especially important for non-Python (e.g. C/C++ and R) packages that are outside of PEP 440's scope. I also needed this capability to work around the limitations of conda's feature resolution by means of version tags like 1.2.3.vc11 (rejected by PEP 440), but this may no longer be necessary after the release of conda 4.
- PEP 440 allows version components to be separated by '.', '_' and '-'. conda disallows the latter character because it already has a different meaning here.
- PEP 440 requires the letter 'c' to be equivalent to 'rc', and 'r' to be equivalent to 'rev' and 'post'. conda treats 'c' and 'r' as normal characters for three reasons:
  - The PEP 440 convention is incompatible with existing conventions outside the Python world. For example, openssl counts versions with letters and requires c < r < rc. PEP 440 implies the order c == rc < r and disallows characters d, e, ... entirely.
  - In practice, equating 'c' with 'rc' and 'r' with 'rev' is a non-issue as long as all packages adhere to self-consistent versioning. That is, packages should not switch arbitrarily between 'c' and 'rc' or 'r' and 'rev', but stick to the convention originally chosen. Since version numbers are never compared across packages, different conventions will never occur in the same comparison expression.
  - The PEP 440 special cases may result in unpleasant surprises for users that don't read the small print to the end, i.e. for almost everyone.
- PEP 440 ignores version component separators that are immediately followed by a letter, whereas conda always respects them. Thus, PEP 440 interprets 1.2.dev2 as 1.2dev2, resulting in the order 1.2a1 > 1.2.dev2. In contrast, conda interprets 1.2.dev2 as 1.2.0dev2, resulting in the order 1.2a1 < 1.2.dev2 (due to the rule that 1.2a* precedes 1.2.0*). I recall that the PEP 440 convention had certain undesirable side effects, but can't remember exactly which. In practice, this is again a non-problem when version numbers are self-consistent in the sense that letters are always added to the same version component.
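
A simplified Python sketch may help make the dev-release difference concrete. It assumes the conda-style rules described above (split on '.' and '_', split each component into digit and letter runs, insert 0 before a component that starts with a letter), plus an assumed ranking of 'dev' < other strings < integers < 'post', modeled on conda's documented ordering. It is not conda's actual `VersionOrder`: epochs, local versions, the trailing-underscore special case and wildcards are omitted, and `parse`, `rank` and `compare` are hypothetical names.

```python
import re
from itertools import zip_longest


def parse(version: str) -> list:
    """Split a version into components made of int and lower-case str runs."""
    components = []
    for part in re.split(r"[._]", version.lower()):
        runs = [int(r) if r.isdigit() else r for r in re.findall(r"\d+|\D+", part)]
        if runs and isinstance(runs[0], str):
            runs.insert(0, 0)  # '1.2.dev2' is read as '1.2.0dev2'
        components.append(runs)
    return components


def rank(item):
    """Assumed ranking: 'dev' < other strings < integers < 'post'."""
    if item == "dev":
        return (0, "")
    if item == "post":
        return (3, "")
    if isinstance(item, str):
        return (1, item)
    return (2, item)


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1; missing components and runs are padded with 0."""
    for ca, cb in zip_longest(parse(a), parse(b), fillvalue=[0]):
        for x, y in zip_longest(ca, cb, fillvalue=0):
            rx, ry = rank(x), rank(y)
            if rx != ry:
                return -1 if rx < ry else 1
    return 0
```

Under these assumptions `compare("1.2a1", "1.2.dev2")` returns -1, i.e. 1.2a1 sorts first, which is the opposite of the PEP 440 order quoted above, while c < r < rc falls out of plain string comparison of the letter runs.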

A small omission of conda's code is that rev is not equivalent to post, but this is trivial to correct.
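
As a rough illustration of how small that fix could be: if each version component is tokenized into digit and letter runs, mapping 'rev' to 'post' is a one-entry alias table, while 'c' and 'r' are deliberately left alone. `ALIASES` and `tokenize` are hypothetical names for this sketch, not conda code.

```python
import re

# Hypothetical alias table: only 'rev' is folded into 'post'.
# 'c' and 'r' stay ordinary letters, as argued in the comment above.
ALIASES = {"rev": "post"}


def tokenize(component: str) -> list:
    """Split one version component into int and lower-case str runs, applying ALIASES."""
    runs = []
    for run in re.findall(r"\d+|\D+", component.lower()):
        runs.append(int(run) if run.isdigit() else ALIASES.get(run, run))
    return runs
```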

Regarding PEP 508: According to the documentation, conda does not currently support the operators === and ~=. If desired, these operators should be easy to add without resorting to a major new library. I haven't yet checked the other changes proposed in PEP 508.
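
For reference, PEP 440 defines `~= V.N` (compatible release) as equivalent to `>= V.N, == V.*`, and `===` (arbitrary equality) as plain string equality. The sketch below shows those semantics for purely numeric release versions only; it ignores pre-, post- and dev-releases, epochs and local versions, and the function names are hypothetical rather than part of conda's MatchSpec.

```python
def release(version: str) -> tuple:
    """Parse a purely numeric release like '1.4.5' into (1, 4, 5)."""
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple, b: tuple) -> tuple:
    """Pad the shorter tuple with zeros so that 1.4 and 1.4.0 compare equal."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def compatible_release(candidate: str, spec: str) -> bool:
    """candidate ~= spec, i.e. candidate >= spec and candidate == spec[:-1].*"""
    spec_t = release(spec)
    if len(spec_t) < 2:
        raise ValueError("~= needs at least two release segments")
    cand, spec_p = _pad(release(candidate), spec_t)
    prefix = spec_t[:-1]
    return cand >= spec_p and cand[: len(prefix)] == prefix


def arbitrary_equal(candidate: str, spec: str) -> bool:
    """candidate === spec: no version semantics at all, just string equality."""
    return candidate == spec
```

For example, `~= 2.2` accepts 2.2 and 2.9.1 but rejects 3.0, and `=== 1.0` matches "1.0" but not "1.0.0".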

ukoethe · 2016-02-28T21:18:54Z

I'd also like to suggest that work on build string syntax and semantics should be a much higher priority than work on version comparison. This would vastly improve conda's ability to reason about binary compatibility of compiled packages. Observe that version numbers are an attribute of the source code. While version numbers have implications for binary compatibility (especially when semantic versioning is used), neither concept completely covers the other. Conda already recognizes this by encoding important binary properties of a package into the build string, e.g. py27_vc11_1, but this mechanism appears to be entirely ad hoc. This wouldn't be a problem if build strings were merely informative, but they do in fact take part in version comparison as a tertiary comparison criterion (see conda/resolve.py lines 388 and 396). Since build string order is just lexicographic order, and the internal structure of build strings is poorly standardized, the outcome of these comparisons is very fragile.
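
A quick Python illustration of that fragility: with plain lexicographic order, a two-digit component such as vc11 sorts before a one-digit one such as vc9. The `natural_key` function is a hypothetical alternative that compares digit runs numerically; it only makes the contrast visible and is not conda's behavior.

```python
import re


def natural_key(build: str) -> list:
    """Compare digit runs as integers and everything else as text."""
    return [(0, int(t)) if t.isdigit() else (1, t) for t in re.findall(r"\d+|\D+", build)]


builds = ["py27_vc9_1", "py27_vc11_1"]
lexicographic = sorted(builds)             # ['py27_vc11_1', 'py27_vc9_1']
natural = sorted(builds, key=natural_key)  # ['py27_vc9_1', 'py27_vc11_1']
```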

Even build number comparison (the secondary criterion) is affected by poor build string standardization, because build number parsing fails in certain situations: If the code has not changed in the meantime, the build number 2 is correctly extracted from a build string consisting of a number and a git hash like 2_3af2853, but if the hash happens to contain only digits like 2_12345, the build number becomes 212345. Conda works around this behavior by prefixing hashes with the letter g, but this is again ad-hoc and poorly documented.
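
One way to make the build number unambiguous is to take it from a fixed position instead of collecting digits. The sketch assumes a hypothetical convention matching the examples above, where the build number is the leading run of digits before the first underscore and everything after it is an opaque tag; under that rule 2_12345 still yields build number 2. `BUILD_RE` and `split_build` are illustrative names only.

```python
import re

# Hypothetical convention: <build number>[_<opaque tag>], e.g. '2_3af2853'.
BUILD_RE = re.compile(r"^(?P<number>\d+)(?:_(?P<tag>.*))?$")


def split_build(build: str) -> tuple:
    """Return (build_number, tag); the tag is never scanned for digits."""
    match = BUILD_RE.match(build)
    if match is None:
        raise ValueError(f"build string {build!r} does not start with a build number")
    return int(match.group("number")), match.group("tag") or ""
```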

I propose to define syntax and semantics of build strings with comparable expressivity (but different meaning) as version numbers, and to implement build string comparison with the same sophistication as version comparison. This would empower recipe designers to express constraints on binary package properties as precisely as is already possible with version numbers for source properties.

mcg1969 · 2016-03-01T14:36:51Z

I can't agree, @ukoethe, with regard to build strings. I don't believe build strings should be used for anything but disambiguation of filenames, and we should not be using filenames at all to determine information about a package.

I think the problems that you're seeking to address with regards to binary compatibility are best addressed through dependencies and perhaps through a provides-type mechanism as found in other package managers, which would enable multiple packages with different names to serve the same dependency purpose.
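
A toy Python model of the provides idea, in the spirit of the Provides field in Debian and RPM packages: each package may declare extra names it satisfies, and a dependency is matched either by name or by any provided name. The `Package` class, its `provides` field, `candidates`, and the package versions in the example index are hypothetical and illustrative, not part of conda's metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    provides: frozenset = frozenset()  # hypothetical: extra names this package satisfies


def candidates(requirement: str, index: list) -> list:
    """All packages that satisfy `requirement` by their own name or via `provides`."""
    return [p for p in index if p.name == requirement or requirement in p.provides]


# Illustrative index: two differently named packages serve the same 'blas' purpose.
index = [
    Package("openblas", "1.0", frozenset({"blas"})),
    Package("mkl", "1.0", frozenset({"blas"})),
    Package("numpy", "1.0"),
]
```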

mcg1969 · 2016-03-01T14:40:59Z

On the question of version numbers, though, I appreciate everyone educating me about the history and intent of the current VersionOrder class. I am relatively new to the conda project and rely on the knowledge of those who have come before me.

We agreed yesterday we should tread very carefully on any major revisions to version ordering. In that sense, it might be best if we close this issue to reassure everyone that we're not going to be stepping on toes.

That said, one idea we had (for the long term, mind you) is to move to a plugin architecture for version ordering. That is to say, we should make it easy to support multiple version ordering schemes in the code, and augment the metadata to allow packages to specify which version ordering scheme they require.

ukoethe · 2016-03-01T17:31:04Z

I can't agree, @ukoethe, with regard to build strings.

That's very reasonable as long as the desired functionality is achieved in a different way. But then the build string should be eliminated from version comparison entirely, because the code at conda/resolve.py, lines 388 and 396 is very suspicious and contradicts the documentation (http://conda.pydata.org/docs/spec.html#package-metadata says that the build string is not used).

I'm also not sure about the future role of the build number (technically a part of the build string). Are higher build numbers simply shadowing lower ones, or is the rule more complex? For example, if build 1 defines a feature, but build 2 doesn't, which one will version resolution consider if the feature is being tracked? If build numbers are to remain a part of the build string and continue to take part in version resolution (which I think will be the case), there should at least be an unambiguous syntax to identify the build number in the build string, and improved documentation.

plugin architecture for version ordering

This is a good idea, although I'm not sure a full-fledged plugin architecture is necessary. It will probably be sufficient to provide a small, fixed set of schemes to pick from. Plugins always imply the possibility that a user doesn't have the plugin, which worsens the installation experience.
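
A minimal sketch of the small, fixed set of schemes: a built-in registry maps scheme names to sort keys, and a package would name its scheme in its metadata. The scheme names, `VERSION_SCHEMES`, `sort_versions`, and any metadata field that would select a scheme are hypothetical; only two trivial schemes are shown.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def numeric_key(version: str) -> Tuple[int, ...]:
    """Dotted integers; trailing zeros are dropped so that 1.0 == 1.0.0."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def string_key(version: str) -> str:
    """Plain case-insensitive text order."""
    return version.lower()


# Fixed, built-in set of schemes: nothing extra to install, unknown names are errors.
VERSION_SCHEMES: Dict[str, Callable[[str], object]] = {
    "numeric": numeric_key,
    "string": string_key,
}


def sort_versions(versions: Iterable[str], scheme: str = "numeric") -> List[str]:
    try:
        key = VERSION_SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"unknown version scheme {scheme!r}") from None
    return sorted(versions, key=key)
```

Keeping the registry closed inside conda avoids the missing-plugin problem mentioned above; a plugin system would instead populate the dictionary from packages discovered at runtime.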

mcg1969 · 2016-03-01T18:04:55Z

Indeed, build strings are being removed from comparison.

mcg1969 · 2016-03-01T18:05:28Z

The removal of file name information from the logic is a work in progress.

jakirkham · 2016-03-01T19:10:05Z

I think I agree that build strings should be removed from comparison. It isn't clear to me how comparison should work for build strings, so until we understand what it should be, if anything, simple removal looks like the right path.

groutr · 2016-03-12T01:49:56Z

I'm glad to see the lively discussion here. I learned a lot from this discussion. @kalefranz, I agree that conda desperately needs to coalesce its ways of internally representing bits of data.

The idea that @mcg1969 has about a pluggable interface for version comparisons is interesting. Are there formal versioning schemes the majority of software follows, or is it more per project preferences? I'm already aware of PEP440 for Python, SemVer, and even/odd releases. I imagine each language community has a common "best practice" versioning scheme. Are there others?
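
As one data point for such an interface, SemVer 2.0.0 specifies precedence precisely: major, minor and patch are compared numerically; a pre-release ranks below the corresponding release; pre-release identifiers are compared field by field, numerically if all digits and in ASCII order otherwise, with numeric identifiers ranking lower and a longer set ranking higher when all earlier fields are equal; build metadata is ignored. A Python sort key following those rules might look like this (the name `semver_key` is hypothetical, and the sketch does not reject leading zeros in pre-release identifiers, which SemVer forbids).

```python
import re

SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"  # optional pre-release
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$"  # optional build metadata, ignored
)


def semver_key(version: str) -> tuple:
    match = SEMVER_RE.match(version)
    if match is None:
        raise ValueError(f"not a SemVer 2.0.0 version: {version!r}")
    major, minor, patch, pre = match.groups()
    if pre is None:
        pre_key = (1,)  # a normal release outranks any of its pre-releases
    else:
        # Numeric identifiers (0, n) rank below alphanumeric ones (1, s).
        ids = tuple((0, int(i)) if i.isdigit() else (1, i) for i in pre.split("."))
        pre_key = (0, ids)
    return (int(major), int(minor), int(patch), pre_key)
```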

jakirkham · 2016-03-21T12:14:08Z

Something to consider (though a bit off topic): we may want more than just pluggable version systems. In particular, we may want pluggable languages. Versioning is one aspect, but there are others, like integrating with the language's package manager (if it has one) and the ability to generate a simple recipe using conda skeleton. This will likely yield some very nicely refactored code and hopefully make conda more extensible.

On these points, I think @alexbw would provide valuable input to this conversation given that he has recently worked on adding Lua support to conda and conda build. He may have some thoughts on how this refactoring should proceed to make the process of adding a language simpler.

AnneTheAgile · 2016-07-29T19:45:38Z

FYI, required by this ticket:
Cannot pass environment.yml file to conda create. #2124

github-actions · 2021-09-12T06:01:09Z

Hi there, thank you for your contribution to Conda!

This issue has been automatically locked since it has not had recent activity after it was closed.

Please open a new issue if needed.

groutr added the 2 - Needs Discussion label Feb 12, 2016

msarahan mentioned this issue Feb 23, 2016

Cannot pass environment.yml file to conda create. #2124 (Closed)

kalefranz added Data Model and removed 2 - Needs Discussion labels Mar 10, 2016

ukoethe mentioned this issue Mar 29, 2016

WIP: Build customization conda/conda-build#848 (Closed)

msarahan mentioned this issue Jul 28, 2016

__conda_buildnum__.txt feature is broken (and its deprecation is not documented) conda/conda-build#1023 (Closed)

kalefranz added the pending::discussion label (contains some ongoing discussion that needs to be resolved prior to proceeding) May 4, 2017

kalefranz added the tag:important-discussion label May 12, 2017

This was referenced May 12, 2017:

Conda gives "Malformed version string" for versions with wildcard after underscore in 4.2.0 #3964 (Closed)

Exclude Pre-Releases #5021 (Open)

kalefranz closed this as completed Jun 2, 2017

conda deleted a comment from swevrywhere Nov 13, 2017

This was referenced Jan 26, 2018:

conda install does not pick latest development builds from anaconda #5675 (Closed)

Conda confused about version ordering? #5916 (Closed)

version 2.0.0~alpha0 is not a valid conda version conda-forge/pyside2-feedstock#14 (Closed)

jakirkham mentioned this issue Mar 26, 2018

linter valid version numbers conda-forge/conda-smithy#722 (Closed)

ColCarroll mentioned this issue Jun 1, 2018

Theano dependency in conda-forge pymc-devs/pymc#2995 (Closed)

kalefranz mentioned this issue Jan 31, 2019

VersionOrder __lt__() gives incorrect results #8194 (Closed)

github-actions bot added the locked [bot] label (locked due to inactivity) Sep 12, 2021

github-actions bot locked as resolved and limited conversation to collaborators Sep 12, 2021
