Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate Versioneer #263

Closed
wants to merge 2 commits into from
Closed

Conversation

ebenolson
Copy link
Member

This adds Versioneer to automatically set lasagne.__version__ based on the git tag and commit.

If merged, should also create a tag git tag 0.1-dev or similar.

@ebenolson ebenolson mentioned this pull request May 22, 2015
@benanne
Copy link
Member

benanne commented May 22, 2015

I only glanced over it but this looks... really complicated. Do we really need something like this to solve this problem? Are there no simpler solutions that satisfy our needs?

@dnouri
Copy link
Member

dnouri commented May 22, 2015

I'd say: Does this problem that we're trying to solve here really exist? I'm totally fine with versions out of Git just having a ".dev" at the end. That's always worked fine for me.

It's easy enough to ask people to run git log -n1 to find out which Git version they're working with -- if it's really needed. Hopefully soon, users will mostly use proper releases anyway.

Btw, here's how I set __version__ in another project: https://github.com/ottogroup/palladium/blob/master/palladium/__init__.py

@ebenolson
Copy link
Member Author

@benanne there is a simpler alternative here https://github.com/clearclaw/pyver but it's GPL, not public domain. However, while this looks complicated since the source is embedded, the actual setup only took 10 minutes and I doubt it would ever need to be touched again.

@dnouri given the number of users that don't even know if they're using nolearn or lasagne, I think anything that makes it simpler to get information is good. Many people's knowledge of git ends at git clone.

Also, this makes one less thing to remember when creating a release.

Somewhat OT, but do you really think people are going to mainly use release versions? It seems like Lasagne's intended audience will want the cutting edge.

@benanne
Copy link
Member

benanne commented May 22, 2015

I think it's a bit iffy to copy and paste a bunch of code into the library like that though, updating it will be a mess. It adds a lot of complexity to the codebase for something I consider of fairly minor importance, hence my reservations. Of course you could argue that this is somewhat separate from the main code base, but it's still hundreds of lines of Python sitting in our repo :)

Somewhat OT, but do you really think people are going to mainly use release versions? It seems like Lasagne's intended audience will want the cutting edge.

That is a good point. I think the situation will be somewhat similar to Theano, where most casual users probably use a release but the most active contingent of the user base will use master from git.

@ebenolson
Copy link
Member Author

I think it's a bit iffy to copy and paste a bunch of code into the library like that though, updating it will be a mess. It adds a lot of complexity to the codebase for something I consider of fairly minor importance, hence my reservations. Of course you could argue that this is somewhat separate from the main code base, but it's still hundreds of lines of Python sitting in our repo :)

The big files are autogenerated by the versioneer setup script, I would say it's more like adding something to requirements.txt than copy-pasting.

You're right though, it does add a lot of lines for something pretty minor. I felt this was less work and less brittle than hacking something similar together ourselves, but like @dnouri says, this problem may not really need solving.

@f0k
Copy link
Member

f0k commented May 23, 2015

Regarding @dnouri's comment from the original thread:

I think you want to check that you have the right version at install time, not at runtime (when it's too late).

Nope, at runtime. I want to be able to write code that works on different versions of (bleeding-edge) Lasagne. That would also make it easier to keep bleeding-edge nolearn in sync with bleeding-edge lasagne. Granted, in most cases, you could just test whether a particular feature or keyword argument is there, but I think it would make life easier sometimes.

Also, how would you check that 715c461 is newer than 92f501f?

That's exactly what versioneer solves. Quoting their README:

The default "pep440" style yields strings like 0.11, 0.11+2.g1076c97, or 0.11+2.g1076c97.dirty.

So I could add a check like if lasagne.__version__ >= '0.1+23' to check if the version is at least 23 commits after the last release.

Eben said:

given the number of users that don't even know if they're using nolearn or lasagne

:)

Somewhat OT, but do you really think people are going to mainly use release versions? It seems like Lasagne's intended audience will want the cutting edge.

That's what I thought as well. We could try to "release early, release often", but we probably don't want to do that after every single feature we add.

Sander said:

I think it's a bit iffy to copy and paste a bunch of code into the library like that though, updating it will be a mess.

I don't think the versioneer.py should be in there at all, just the _version.py that was autogenerated once and does not need to be updated ever again. I'm not sure why it uses versioneer.get_version() in setup.py instead of _version.get_versions()['version']. In any case, _version.py is still pretty long, we can see if there's an easier way to achieve the same. Numpy/Scipy/Matplotlib and Theano also have some versioning magic for checkouts, I think.

@benanne
Copy link
Member

benanne commented May 23, 2015

Theano also have some versioning magic for checkouts, I think.

They do, but it only works if you python setup.py install it I think, otherwise the version shows up as 'unknown'. At least it did last time I checked.

@ebenolson
Copy link
Member Author

Another argument - if someone installs from git then deletes the source, or uses pip install git+http://github.com/Lasagne/Lasagne.git, it won't help to tell them to run git log -n1.

Most of the time the advice will be "install the latest github version", but if someone does encounter a real bug, it would be nice if we could determine what code they were running.

@ebenolson
Copy link
Member Author

Perhaps this would be more suitable?

https://github.com/ioam/param/blob/master/param/version.py

It requires manually updating the version string for each release, but it's a lot simpler and still provides revision information.

@f0k
Copy link
Member

f0k commented May 23, 2015

It requires manually updating the version string for each release

Which they sell as as a feature. They check whether the declared version in setup.py matches the version in __version__ and the latest v*.* git tag and refuse uploading to pip if there is a mismatch.

but it's a lot simpler and still provides revision information.

Even for git archives, it seems. Not sure if that would include all the pip install git:... forms, though, versioneer might have an edge here because it immortalizes the version as a fixed string on setup.py install. Also I don't really like that it makes __version__ a Version class instance instead of a string, but that seems easy to change.

We're getting closer... I'd still very much like a 0.11+2.g1076c97 style version independent of how it's installed. The basic ingredients seem to be git describe --long --match 'v*.* --dirty (to get a version string from a git checkout), the export-subst git attribute (to materialize a version string in git archive), and some solution for setup.py install (to rescue a version string from a git checkout into an installed version).

Oh, and by the way, another reason for having a descriptive __version__ was to be able to make #262 work correctly. This needs at least a distinction between "release x.y" and "latest master" (which is probably the minimum we'd all agree on supporting anyway, but which could be solved with a simple string in __init__.py).

@dnouri
Copy link
Member

dnouri commented May 23, 2015

Regarding @dnouri's comment from the original thread:

I think you want to check that you have the right version at install
time, not at runtime (when it's too late).

Nope, at runtime. I want to be able to write code that works on
different versions of (bleeding-edge) Lasagne.

You can specify that your code works with different versions of Lasagne
fine with both setup.py and requirements.txt. (By the way, here's a
pretty good post explaining the difference between the two.)

The problem with doing a runtime check is that it's too late. At
install time you can install the required version if it's not there, at
runtime you can just bail out.

That would also make it easier to keep bleeding-edge nolearn in sync
with bleeding-edge lasagne.

I don't know how?

The issue with keeping nolearn in sync with Lasagne is that the latter
introduces breaking changes in Git (and that's ok). So what I do is I
say: this version of nolearn is known to work with this Git version of
Lasagne (through requirements.txt). If you want to use a newer version
of Lasagne then you're on your own. (No way I'll guarantee that every
latest commit in Git doesn't break things.) This will be more relaxed
once there's Lasagne releases and maybe a more conservative deprecation
policy.

The default "pep440" style yields strings like 0.11,
0.11+2.g1076c97, or 0.11+2.g1076c97.dirty.

So I could add a check like if lasagne.version >= '0.1+23' to
check if the version is at least 23 commits after the last release.

Yeah I guess I don't really think allowing ranges of versions from Git
is useful.

It seems contradictory that on the one hand you're using "bleeding
edge", but on the other hand you're very exact with checking versions.
Maybe better to just make a Lasagne release instead. :-) (Or, in the
meantime, use one "known good version" in requirements.txt.)

Somewhat OT, but do you really think people are going to mainly use
release versions? It seems like Lasagne's intended audience will
want the cutting edge.

That's what I thought as well. We could try to "release early,
release often", but we probably don't want to do that after every
single feature we add.

I'm not someone who uses every new feature immediately. Feeling really
conservative now. :-)

Still a big -1 from me for any of this (IMO) overcomplicated versions
stuff. Let's just stick with the standard ways of dealing with
dependencies. It seems to work fine for everyone else, too (and we're
not that special ;-).

@f0k
Copy link
Member

f0k commented May 24, 2015

You can specify that your code works with different versions of Lasagne fine with both setup.py and requirements.txt.

Let's say I've got Lasagne installed with python setup.py develop, and that other code that depends on Lasagne doesn't even have a setup.py, but is just part of an experiment specification (similar to a Pylearn2 config file). Should I have a requirements.txt for every config file to record the commit of Lasagne I had when writing it? Should my training script collect the different requirements.txt files and do something with it before running? It would seem more convenient to me to add some "minimum-version" check or version switch to the config files that need it. (Granted, I'm not sure that would be very often, but my crystal ball tells me it would be useful some time.)

At install time you can install the required version if it's not there, at runtime you can just bail out.

You can also be compatible with multiple versions via a if version < ...: do this; else: do that. (Granted, that's only useful when working with an unreleased version that may introduce breaking changes. To react on non-breaking changes like additions to the library, it's easier to directly check if a particular class or function is defined.)

I'm not someone who uses every new feature immediately.

Say what?! :) No, that's not what I was implying. It's enough that in our target audience, there will always be someone who wants the new feature we just added (say, some idea from a recent paper), so there will always be users that will use a version of Lasasgne between releases. And it wouldn't be bad to have __version__ produce useful output for them, especially when they find bugs.

Still a big -1 from me for any of this (IMO) overcomplicated versions stuff.

Yes, I've sensed that :)

Soo... I don't feel that strongly about pep440-style versions, I just thought they'd be very nice to have. If the majority is against it, we can just have a fixed __version__ = 'something' line in __init__.py that's set to a version number for every release, and to 'bleeding-edge' for every commit that's not a release. Basically, that's saying: "Either use one of the releases, or use the latest git commit and don't fall behind".

@ebenolson
Copy link
Member Author

Still a big -1 from me for any of this (IMO) overcomplicated versions
stuff. Let's just stick with the standard ways of dealing with
dependencies. It seems to work fine for everyone else, too (and we're
not that special ;-).

Plenty of other projects do something similar, including numpy, scipy and Theano.

@ebenolson ebenolson closed this May 24, 2015
@ebenolson ebenolson reopened this May 24, 2015
@ebenolson
Copy link
Member Author

Oops, clicked close by accident. Anyway, I don't really get why this is considered complicated. It's developed by an active external project, was simple to configure and works as advertised. And if it does break, what real harm does it do?

@benanne
Copy link
Member

benanne commented May 24, 2015

it adds a ton of code for a relatively minor feature, that's my main concern. Other than that I'm not really bothered by whatever versioning scheme we end up using.

If it was only _version.py and not the versioneer.py that would be a little more reasonable.

@ebenolson
Copy link
Member Author

The issue with keeping nolearn in sync with Lasagne is that the latter
introduces breaking changes in Git (and that's ok). So what I do is I
say: this version of nolearn is known to work with this Git version of
Lasagne (through requirements.txt). If you want to use a newer version
of Lasagne then you're on your own. (No way I'll guarantee that every
latest commit in Git doesn't break things.) This will be more relaxed
once there's Lasagne releases and maybe a more conservative deprecation
policy.

With something like this in place, nolearn could check on import that the user had a known good Lasagne commit and throw an exception or warning otherwise.

Lasagne has gone from zero to a pretty featureful library without a single release, I just have a hard time imagining the schedule will change any time soon. Honestly I don't think the traditional release model is that useful for a project like this which is used more for experimenting than building production systems.

@benanne
Copy link
Member

benanne commented May 24, 2015

I think most people who use it in combination with nolearn would prefer to stick with a released version though. They may not be the primary target audience but I still think putting out some releases now and then would be a good thing to do. As I mentioned before I see it a little bit like the Theano situation, where there are sporadic releases but the most active part of the user base uses master from git.

@f0k
Copy link
Member

f0k commented May 24, 2015

but the most active part of the user base uses master from git.

But not the largest part. I was dumbfounded to see how many of my colleagues were using Theano 0.6.0 just because nobody had told them otherwise. We might want to encourage users to use the latest master in our installation instructions even after we have a release, or at least make it sound less scary than in the Theano instructions.

@dnouri
Copy link
Member

dnouri commented May 26, 2015

Jan writes:

Let's say I've got Lasagne installed with python setup.py develop,
and that other code that depends on Lasagne doesn't even have a
setup.py, but is just part of an experiment specification (similar to
a Pylearn2 config file). Should I have a requirements.txt for every
config file to record the commit of Lasagne I had when writing it?

I'd say clearly yes if you want to make sure your experiment is truly
reproducible, even in three months' time. With only a minimum
requirement, or similarly, with "latest Git", it's too easy to break
things otherwise. Even in very subtle ways, e.g. if Lasagne changed
the default nonlinearity.

This is what Sander's team writes in their ndsb docs:

We recommend installing the following commit, which our code is known
to work with: ​f445b71

If they said "at least f445b71", their instructions would likely stop
working soon.

Should my training script collect the different requirements.txt
files and do something with it before running?

I'd tell your users to run pip install -r requirements.txt before
running the training script (or else, rely on them to know what they're
doing).

pip freeze should help with writing the requirements file. It allows
you to collect all requirements at the time you write the experiment in
one file.

Soo... I don't feel that strongly about pep440-style versions, I just
thought they'd be very nice to have.

The wording here seems wrong. This issue is not a matter of being
pep440 compatible or not, i.e. we're perfectly pep440 compatible
without the git sub-versions feature suggested here.
If the majority is against it, we can just have a fixed version =
'something' line in init.py that's set to a version number for
every release, and to 'bleeding-edge' for every commit that's not a
release.

I think the better thing to do here is to use the following in
init.py so that it's in sync with what version is defined in
setup.py:

import pkg_resources
version = pkg_resources.get_distribution("Lasagne").version

And then in setup.py follow pep440, where proper release version
numbers are e.g. '1.0' and development versions (i.e. our versions in
Git) have e.g. '1.0.dev' (which would be the development version that
comes before 1.0, according to the PEP). The end result wouldn't be
much different to what you're suggesting, it's just more in line with
what tools like pip would expect. (A version that's called
'bleeding-edge' is broken as far as pip and pep440 are concerned.)

Eben writes:

With something like this in place, nolearn could check on import that
the user had a known good Lasagne commit and throw an exception or
warning otherwise.

That's correct. Though it also means this user ignored the
installation instructions (which seems to happen frequently enough).
I'd rather fix this at the documentation level and explain: Use pip
install -r some-requirements.txt instead of throwing together random
versions from Git, or maybe random versions that you happen to have
installed (which may also be too new).

@f0k
Copy link
Member

f0k commented May 26, 2015

The wording here seems wrong. This issue is not a matter of being pep440 compatible or not, i.e. we're perfectly pep440 compatible without the git sub-versions feature suggested here.

Yes, sorry, I meant using the local version identifier to include the number of commits and commit SHA1 for versions in between releases, as in the 0.11+2.g1076c97.dirty style.

pip freeze should help with writing the requirements file. It allows you to collect all requirements at the time you write the experiment in one file.

Hmm, given that I should also record the version of Theano, probably that's the cleanest way. However, if I rely on Theano not changing in incompatible ways and lasagne.__version__ was including the commit SHA1, I could just record that along with the model files, which would be a little more convenient than invoking and parsing subprocess.check_output(['pip', 'freeze']) or figuring out the equivalent sequence of pip function calls.

And then in setup.py follow pep440, where proper release version numbers are e.g. '1.0' and development versions (i.e. our versions in Git) have e.g. '1.0.dev' (which would be the development version that comes before 1.0, according to the PEP).

So we should change that in our current setup.py (it currently misses the dot).

The end result wouldn't be much different to what you're suggesting, it's just more in line with what tools like pip would expect.

👍

That's correct. Though it also means this user ignored the installation instructions (which seems to happen frequently enough).

The world isn't perfect. Even with updated documentation, people will use whatever version they have installed, and report bugs. We should provide an easy way for them to let us know which version they have installed. pip freeze | grep Lasagne would work, but place some additional burden on us (such as running git describe to make sense of the SHA1 commit id). I still don't think 0.11+2.g1076c97.dirty would be superfluous, but I agree that we shouldn't include a thousand code lines for that functionality.

So... until we can find a simpler "versioneer" variant (the one from param seems to miss some features) that lowers the barrier, do you want to update the version in setup.py to 0.1.dev and add a __version__ = snippet to __init__.py? We can then update #262 to link to the latest master whenever __version__ contains dev, and to a specific version otherwise. (We cannot link to the correct commit for development versions with this versioning scheme.) That will at least give us something for the first release.

@benanne
Copy link
Member

benanne commented May 26, 2015

Are we doing 1.0 or 0.1 by the way? Both seems to be used interchangeably in this thead, but we should probably decide :) I think 0.1 looks a little more "modest" and it follows the same convention as Theano, so I have a very slight preference for that but I don't care much either way.

@ebenolson
Copy link
Member Author

What about copying the revelant bits from Theano?

It looks like basically the same thing used by numpy and scipy.

@f0k
Copy link
Member

f0k commented May 26, 2015

What about copying the revelant bits from Theano?

That only sets the version to whatever you had when running setup.py. When you install with python setup.py develop, the version doesn't get updated when you update the source tree.

Are we doing 1.0 or 0.1 by the way? Both seems to be used interchangeably in this thead

No, 1.0 and 1.0.dev was just an example by Daniel to illustrate pep440-compatible development release versions. I guess we should stick to 0.1 :)

@benanne
Copy link
Member

benanne commented Jun 10, 2015

I guess we've decided not to do this for now? It's superseded by #272, right? Or did I misunderstand that? If it is I'll close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants