Version parsing/ordering in conda #2071

Closed
groutr opened this issue Feb 12, 2016 · 19 comments
Labels
locked [bot] locked due to inactivity pending::discussion contains some ongoing discussion that needs to be resolved prior to proceeding

Comments

@groutr
Contributor

groutr commented Feb 12, 2016

Currently, version parsing and ordering are accomplished in conda via the MatchSpec, VersionSpec, VersionOrder, and Package classes.

I thought I would open discussion on using PyPA's packaging module to handle parsing/ordering/comparing versions in conda. There are a few pros and cons that need to be considered though.

Documentation: https://packaging.pypa.io/en/latest/
Repository: https://github.com/pypa/packaging

Pros:

  • packaging implements PEP440 and PEP508. The current versioning classes in conda seem to implement the older PEP386.
  • code reuse is usually a good idea
  • maintained by PyPA, who also maintain pip
  • Useful abstractions for dealing with versions and version ordering
  • Code is neatly organized and well documented
  • Covers as many versioning schemes as I can think of

Cons:

  • packaging depends on six and pyparsing. Conda strives to have as few external dependencies as it can.
  • Initial implementation might take a lot of work, but hopefully not too much. The version handling classes have been in conda for a long time. Rooting out all of the assumptions and peculiarities throughout the conda codebase might take time.
  • Could possibly break other code if other people have depended on conda's internal classes. Hopefully, this is a non-issue, but it's listed here for completeness' sake.
  • The way conda does things works well enough already

This issue is for discussion. Is this a desirable change for the conda community? Why or why not? There are also a few other libraries that handle version parsing and ordering, though none seem to be as mature as packaging.

@msarahan
Contributor

Unifying with PyPA would be very good, and I also like getting rid of code that we are maintaining. We should do rigorous regression testing, but overall I support this idea.

@mcg1969
Contributor

mcg1969 commented Feb 13, 2016

👍

@jakirkham
Member

This seems like an interesting idea. Generally, I think I am in favor.

I just wonder how this will affect some other cases conda tries to address like build numbers, build strings, or cases where we allow 4 numbers in the version string. Some of these can only be known by having a POC to look at. Though the 4th version number is something I am a little worried about.

Personally, adding pure Python dependencies doesn't feel like a bad idea to me. As long as they all get packaged in the miniconda and anaconda install, then it seems fine. These particular ones could be useful at solving other issues that we might have (simplifying Python 2/3 compat, working with grammars more simply in versioning, and other things possibly on the build side).

@jakirkham
Member

@stuarteberg, might be of interest.

@ukoethe, I think you would be pretty interested in this and how it shakes out given your existing use cases.

@stuarteberg
Contributor

This is a nice idea, but we should approach it with extreme caution, as you've noted.

Initial implementation might take a lot of work, but hopefully not too much. The version handling classes have been in conda for a long time.

That code isn't as old as you might think. Last fall, @ukoethe did a ton of work to fix issues in conda's version ordering scheme. He and @asmeurer hashed out most of the details in #1601, but if you really want to catch up, you should probably read everything in this list. (Have fun...)

  • packaging implements PEP440 and PEP508. The current versioning classes in conda seem to implement the older PEP386.

See the above-mentioned #1601 for discussion of conda's compliance with PEP440. In particular, this comment and this comment. (But there is no mention of PEP508, as far as I can tell.)

  • code reuse is usually a good idea
  • maintained by PyPA, who also maintain pip

Indeed, these are laudable motivations. If an externally managed library from a reliable source really does cover all of our use-cases, then I guess we must admit that the amount of previous effort that's been sunk into this issue is not so relevant (even if it would be frustrating to discard it).

  • Covers as many versioning schemes as I can think of

Of course, it is critical to keep in mind that conda supports multiple languages, and therefore has slightly different requirements than PyPA. For instance, #1652 was implemented specifically for compatibility with R's silly versioning scheme.

PS -- PyPA? I might be biased, but as far as I'm concerned, the "authority" on packaging python packages at this point is the Anaconda project. Maybe they should be using our code, not the other way around... :-P

@kalefranz
Contributor

This is a great discussion. I'm right now not opposed to using some of this same code. I wouldn't want to make it an external dependency, but in this case explicitly vendored in so we can have tight control over it.

@stuarteberg Thanks for all of those back references. They're especially useful for me.

@groutr Correct me if I'm wrong, but I think creating this ticket was in part motivated by our discussion about conda needing better data modeling. If that's the core of this issue, then it's something I also feel strongly about being critically important. Conda's needs have several layers of generalization beyond PyPA's though, so I'm leery about using packaging as a foundation. I do agree that conda needs to be compatible with PEP440 and PEP508 to the extent possible.

@msarahan
Contributor

@stuarteberg thanks for the reminder about conda being more general than the ecosystem that the PyPA oversees. I forget that sometimes, but we definitely need to keep that in mind. Conda is more than pip, in concept and in actuality.

@ukoethe
Contributor

ukoethe commented Feb 28, 2016

As @stuarteberg said, I thoroughly revised conda's version comparison just a few months ago. It is clean, carefully tested and well documented code that mostly conforms to PEP 440 and gives good error messages for syntactically incorrect version strings. The remaining differences to PEP 440 are deliberate:

  • While PEP 440 rejects non-conforming version numbers, conda parses and compares them with essentially the same rules as the conforming ones. This is especially important for non-Python (e.g. C/C++ and R) packages that are outside of PEP 440's scope. I also needed this capability to work around the limitations of conda's feature resolution by means of version tags like 1.2.3.vc11 (rejected by PEP 440), but this may no longer be necessary after release of conda 4.
  • PEP 440 allows version components to be separated by '.', '_' and '-'. conda disallows the latter character because it already has a different meaning here.
  • PEP 440 requires the letter 'c' to be equivalent to 'rc' and 'r' to be equivalent to 'rev' and 'post'. conda treats 'c' and 'r' as normal characters for three reasons:
    • The PEP 440 convention is incompatible with existing conventions outside the Python world. For example, openssl counts versions with letters and requires c < r < rc. PEP 440 implies the order c == rc < r and disallows characters d, e, ... entirely.
    • In practice, equating 'c' with 'rc' and 'r' with 'rev' is a non-issue as long as all packages adhere to self-consistent versioning. That is, packages should not switch arbitrarily between 'c' and 'rc' or 'r' and 'rev', but stick to the convention originally chosen. Since version numbers are never compared across packages, different conventions will never occur in the same comparison expression.
    • The PEP 440 special cases may result in unpleasant surprises for users that don't read the small print to the end, i.e. for almost everyone.
  • PEP 440 ignores version component separators that are immediately followed by a letter, whereas conda always respects version components. Thus, PEP 440 interprets 1.2.dev2 as 1.2dev2, resulting in the order 1.2a1 > 1.2.dev2. In contrast, conda interprets 1.2.dev2 as 1.2.0dev2, resulting in the order 1.2a1 < 1.2.dev2 (due to the rule that 1.2a* precedes 1.2.0*). I recall that the PEP 440 convention had certain undesirable side effects, but can't remember exactly which. In practice, this is again a non-problem when version numbers are self-consistent in the sense that letters are always added to the same version component.

A small omission of conda's code is that rev is not equivalent to post, but this is trivial to correct.
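The component rules described in the list above can be illustrated with a toy comparator. This is only a minimal sketch and not conda's VersionOrder: the function names parse and compare are made up for illustration, and special handling of 'dev' and 'post', underscores, and error reporting is left out. It assumes just three rules taken from the description: components are always split on '.', a component that starts with a letter gets an implicit leading 0, and letters sort before numbers.

```python
import re
from itertools import zip_longest


def parse(version):
    """Split a version into components, each a list of ints and strings."""
    components = []
    for part in version.lower().split("."):
        # Runs of digits become ints, runs of letters stay strings.
        # Any other character is silently dropped in this toy version.
        tokens = [int(t) if t.isdigit() else t
                  for t in re.findall(r"\d+|[a-z]+", part)]
        # "dev2" is read as "0dev2", so 1.2.dev2 means 1.2.0dev2.
        if tokens and isinstance(tokens[0], str):
            tokens.insert(0, 0)
        components.append(tokens)
    return components


def _key(token):
    # Strings sort before numbers; the tag avoids comparing str with int.
    return (0, token) if isinstance(token, str) else (1, token)


def compare(a, b):
    """Return -1, 0 or 1, padding missing components and tokens with 0."""
    for ca, cb in zip_longest(parse(a), parse(b), fillvalue=[0]):
        for ta, tb in zip_longest(ca, cb, fillvalue=0):
            ka, kb = _key(ta), _key(tb)
            if ka != kb:
                return -1 if ka < kb else 1
    return 0


print(compare("1.2a1", "1.2.dev2"))   # 1.2a* precedes 1.2.0*
print(compare("1.0.1c", "1.0.1r"))    # plain letters: c < r
print(compare("1.0.1r", "1.0.1rc"))   # plain letters: r < rc
```

Because 'c', 'r' and 'rc' are compared as ordinary strings here, the openssl-style order c < r < rc falls out without any special cases.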

Regarding PEP 508: According to the documentation, conda does not currently support the operators === and ~=. If desired, these operators should be easy to add without resorting to a major new library. I didn't yet check the other changes proposed in PEP 508.
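To give a sense of how small the ~= addition could be: PEP 440 defines the compatible release clause ~= V.N as shorthand for >= V.N, == V.*. The helper below is a hypothetical sketch (expand_compatible_release is not an existing conda function) that rewrites the operator into that pair of clauses, written in PEP 440's own notation. It deliberately handles only plain numeric release segments; suffixes such as .post3 or .dev1 need extra care and are rejected here.

```python
def expand_compatible_release(version):
    """Rewrite '~=version' as '>=version,==prefix.*' (numeric versions only)."""
    parts = version.split(".")
    if len(parts) < 2:
        # PEP 440 does not allow ~= with a single-segment version.
        raise ValueError("~= needs at least two release segments")
    if not all(p.isdigit() for p in parts):
        # Pre/post/dev suffixes are out of scope for this sketch.
        raise ValueError("only plain numeric versions are handled")
    prefix = ".".join(parts[:-1])
    return ">={0},=={1}.*".format(version, prefix)


print(expand_compatible_release("2.2"))
print(expand_compatible_release("1.4.5"))
```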

@ukoethe
Contributor

ukoethe commented Feb 28, 2016

I'd also like to suggest that work on build string syntax and semantics should be a much higher priority than work on version comparison. This would vastly improve conda's ability to reason about binary compatibility of compiled packages. Observe that version numbers are an attribute of the source code. While version numbers have implications on binary compatibility (especially when semantic versioning is used), neither concept completely covers the other. Conda already recognizes this by encoding important binary properties of a package into the build string, e.g. py27_vc11_1, but this mechanism appears to be entirely ad-hoc. This wouldn't be a problem if build strings were merely informative, but they do in fact take part in version comparison as a tertiary comparison criterion (see conda/resolve.py lines 388 and 396). Since build string order is just lexicographic order, and the internal structure of build strings is poorly standardized, the outcome of these comparisons is very fragile.
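The fragility of lexicographic ordering is easy to reproduce. The sketch below uses invented build strings in the style of py27_vc11_1 and compares plain string sorting with a "natural" sort that treats digit runs as numbers. The natural_key helper is an illustration only, not something conda provides.

```python
import re

# Invented build strings for illustration.
builds = ["py27_vc11_2", "py27_vc11_10", "py27_vc9_1"]


def natural_key(build):
    """Sort key that compares digit runs numerically, other text as strings."""
    tokens = re.split(r"(\d+)", build)
    # Tag each token so ints and strings are never compared directly.
    return tuple((1, int(t)) if t.isdigit() else (0, t) for t in tokens)


lexicographic = sorted(builds)
natural = sorted(builds, key=natural_key)

print(lexicographic)  # vc11 sorts before vc9, and _10 before _2
print(natural)        # vc9 < vc11, and _2 < _10
```

Plain string order places vc11 before vc9 and build 10 before build 2, which is rarely what a recipe author intends.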

Even build number comparison (the secondary criterion) is affected by poor build string standardization, because build number parsing fails in certain situations: If the code has not changed in the meantime, the build number 2 is correctly extracted from a build string consisting of a number and a git hash like 2_3af2853, but if the hash happens to contain only digits like 2_12345, the build number becomes 212345. Conda works around this behavior by prefixing hashes with the letter g, but this is again ad-hoc and poorly documented.
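The failure mode can be reconstructed with a hypothetical parser. The code below is not taken from conda; naive_build_number is just one plausible way to get the behavior described, and strict_build_number shows the kind of unambiguous rule that a defined syntax would allow. Both assume build strings of the form number, underscore, hash.

```python
def naive_build_number(build_string):
    """Hypothetical parser that reproduces the described misbehavior."""
    squeezed = build_string.replace("_", "")
    if squeezed.isdigit():
        # An all-digit hash gets glued onto the build number.
        return int(squeezed)
    return int(build_string.split("_")[0])


def strict_build_number(build_string):
    """Only the text before the first underscore counts as the build number."""
    return int(build_string.split("_", 1)[0])


print(naive_build_number("2_3af2853"))   # hash contains letters
print(naive_build_number("2_12345"))     # hash contains only digits
print(naive_build_number("2_g12345"))    # the 'g' prefix workaround
print(strict_build_number("2_12345"))
```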

I propose to define syntax and semantics of build strings with comparable expressivity (but different meaning) as version numbers, and to implement build string comparison with the same sophistication as version comparison. This would empower recipe designers to express constraints on binary package properties as precisely as is already possible with version numbers for source properties.

@mcg1969
Contributor

mcg1969 commented Mar 1, 2016

I can't agree, @ukoethe, with regard to build strings. I don't believe build strings should be used for anything but disambiguation of filenames---and we should not be using filenames at all to determine the information about a package.

I think the problems that you're seeking to address with regards to binary compatibility are best addressed through dependencies and perhaps through a provides-type mechanism as found in other package managers, which would enable multiple packages with different names to serve the same dependency purpose.
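As a rough illustration of what a provides-type mechanism could look like, the sketch below builds an index from dependency names to the packages that can satisfy them. Everything here is hypothetical: the "provides" field, the record layout, and the example package names are assumptions for illustration and do not describe conda's metadata.

```python
from collections import defaultdict

# Hypothetical package records; "provides" is an assumed metadata field.
PACKAGES = [
    {"name": "openblas", "provides": ["blas"]},
    {"name": "mkl", "provides": ["blas"]},
    {"name": "numpy", "provides": []},
]


def build_provider_index(packages):
    """Map every real or virtual name to the packages that satisfy it."""
    index = defaultdict(list)
    for pkg in packages:
        index[pkg["name"]].append(pkg["name"])   # a package provides itself
        for virtual in pkg["provides"]:
            index[virtual].append(pkg["name"])
    return dict(index)


def candidates(index, dependency):
    """Return all packages that could serve the given dependency name."""
    return index.get(dependency, [])


index = build_provider_index(PACKAGES)
print(candidates(index, "blas"))
print(candidates(index, "numpy"))
```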

@mcg1969
Contributor

mcg1969 commented Mar 1, 2016

On the question of version numbers, though, I appreciate everyone's education about the history and intent of the current VersionOrder class. I am relatively new to the conda project and rely on the knowledge of those who have come before me.

We agreed yesterday we should tread very carefully on any major revisions to version ordering. In that sense, it might be best if we close this issue to reassure everyone that we're not going to be stepping on toes.

That said, one idea we thought of---for the long term, mind you---is to move to a plugin architecture for version ordering. That is to say, we should make it easy to support multiple version ordering schemes in the code, and augment the metadata to allow packages to specify which version ordering scheme they require.
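A minimal sketch of that idea, under the assumption that each package's metadata would carry a scheme name: a registry maps scheme names to sort-key functions, and sorting looks the key function up by name. The scheme names, the SCHEMES registry, and sort_versions are all invented for illustration.

```python
def numeric_key(version):
    """Compare dot-separated integer components numerically."""
    return tuple(int(part) for part in version.split("."))


def lexical_key(version):
    """Compare version strings as plain text."""
    return version


# Hypothetical registry; a real design would hold full ordering schemes.
SCHEMES = {
    "numeric": numeric_key,
    "lexical": lexical_key,
}


def sort_versions(versions, scheme="numeric"):
    """Sort versions with the ordering scheme named in package metadata."""
    try:
        key = SCHEMES[scheme]
    except KeyError:
        raise ValueError("unknown version scheme: " + scheme)
    return sorted(versions, key=key)


print(sort_versions(["1.10", "1.9"]))                    # numeric order
print(sort_versions(["1.10", "1.9"], scheme="lexical"))  # string order
```

A fixed dictionary like this keeps the set of schemes closed; an open plugin system would populate the registry at runtime instead.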

@ukoethe
Contributor

ukoethe commented Mar 1, 2016

I can't agree, @ukoethe, with regard to build strings.

That's very reasonable as long as the desired functionality is achieved in a different way. But then the build string should be eliminated from version comparison entirely, because the code at conda/resolve.py, lines 388 and 396 is very suspicious and contradicts the documentation (http://conda.pydata.org/docs/spec.html#package-metadata says that the build string is not used).

I'm also not sure about the future role of the build number (technically a part of the build string). Are higher build numbers simply shadowing lower ones, or is the rule more complex? For example, if build 1 defines a feature, but build 2 doesn't, which one will version resolution consider if the feature is being tracked? If build numbers are to remain a part of the build string and continue to take part in version resolution (which I think will be the case), there should at least be an unambiguous syntax to identify the build number in the build string, and improved documentation.

plugin architecture for version ordering

This is a good idea, although I'm not sure if a full-fledged plugin architecture is necessary. Probably it will be sufficient to provide a small and fixed set of schemes to pick from. Plugins always imply the possibility that a user doesn't have the plugin, which worsens the installation experience.

@mcg1969
Contributor

mcg1969 commented Mar 1, 2016

Indeed, build strings are being removed from comparison.

@mcg1969
Contributor

mcg1969 commented Mar 1, 2016

The removal of file name information from the logic is a work in progress.

@jakirkham
Member

I think I agree that build strings should be removed from comparison. It isn't clear to me how a comparison should work in the case of build strings so until we understand what that should be, if any, simple removal looks like the right path.

@groutr
Contributor Author

groutr commented Mar 12, 2016

I'm glad to see the lively discussion here. I learned a lot from this discussion. @kalefranz, I agree that conda desperately needs to coalesce its ways of internally representing bits of data.

The idea that @mcg1969 has about a pluggable interface for version comparisons is interesting. Are there formal versioning schemes the majority of software follows, or is it more per project preferences? I'm already aware of PEP440 for Python, SemVer, and even/odd releases. I imagine each language community has a common "best practice" versioning scheme. Are there others?

@jakirkham
Member

Something to consider (though a bit off topic): we may want more than just pluggable version systems. In particular, we may want pluggable languages. Versioning is one aspect, though there are others, like integrating with the language's package manager (if it has one) and the ability to generate a simple recipe using conda skeleton. This will likely yield some very nicely refactored code and hopefully make conda more extensible.

On these points, I think @alexbw would provide valuable input to this conversation given that he has recently worked on adding Lua support to conda and conda build. He may have some thoughts on how this refactoring should proceed to make the process of adding a language simpler.

@AnneTheAgile

fyi, required by this ticket;
Cannot pass environment.yml file to conda create. #2124

@kalefranz kalefranz added the pending::discussion contains some ongoing discussion that needs to be resolved prior to proceeding label May 4, 2017
@conda conda deleted a comment from swevrywhere Nov 13, 2017
@github-actions

Hi there, thank you for your contribution to Conda!

This issue has been automatically locked since it has not had recent activity after it was closed.

Please open a new issue if needed.

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Sep 12, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 12, 2021
8 participants