Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] install.upgrade: add pre-upgrade sanity check #891

Closed
wants to merge 2 commits into from

Conversation

smithfarm
Copy link
Contributor

Add a pre-upgrade sanity check to the install.upgrade task. This sanity check:

  1. determines the version of Ceph installed on the remote ("installed_version")
  2. determines the version of Ceph in the target gitbuilder repo
    ("upgrade_version")
  3. if "installed_version" is greater than "upgrade_version", fail immediately

Since the upgrade itself cannot succeed if upgrade_version < installed_version,
it's better to fail before attempting the upgrade.

Before this patch, we were failing in the verify_package_version() sanity check
which is run after attempting (and failing) to upgrade the packages.

Fixes: http://tracker.ceph.com/issues/16521
Signed-off-by: Nathan Cutler ncutler@suse.com

@smithfarm smithfarm changed the title install.upgrade: add pre-upgrade sanity check [DNM] install.upgrade: add pre-upgrade sanity check Jun 29, 2016
@zmc
Copy link
Member

zmc commented Jun 29, 2016

I like the idea here, but I don't see how it would cause a job to fail; it looks like it just adds debugging output. Maybe I'm being thick this morning, but could you explain please?

Edit: I didn't notice the DNM at first :)

@vasukulkarni
Copy link
Contributor

If someone wants to try a downgrade test later, will it affect that path?

@smithfarm smithfarm changed the title [DNM] install.upgrade: add pre-upgrade sanity check install.upgrade: add pre-upgrade sanity check Jun 29, 2016
@smithfarm
Copy link
Contributor Author

@vasukulkarni I'm not aware of any downgrade tests in teuthology. If, hypothetically speaking, such a test were to be implemented, this function upgrade_common() could not be used because it fails in a downgrade scenario.

@smithfarm
Copy link
Contributor Author

@zmc Yeah, I opened the PR early. Now it's ready for review I think.

@smithfarm
Copy link
Contributor Author

@yuriw Care to review?

@smithfarm
Copy link
Contributor Author

Changelog:

  • added exception thrower
  • fixed whitespace

@smithfarm
Copy link
Contributor Author

Changelog:

  • fixed syntax error(!)

@smithfarm
Copy link
Contributor Author

Test running at http://pulpito.ceph.com/smithfarm-2016-06-29_12:02:35-upgrade:jewel-x:stress-split-kraken-distro-basic-vps/

Unfortunately, master was merged into kraken so the exception won't trigger.

i=installed_version,
u=upgrade_version
))
if installed_version > upgrade_version:
Copy link
Member

@zmc zmc Jun 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type are these values?

Edit: don't mean to be snarky :)

What I'm getting at is, for example, _get_gitbuilder_project(ctx, remote, node).version is going to be something this, which is currently 11.0.0-111.gbe55cfc. Will the above comparison work properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are strings. This is the output from the debug message added in this commit:

2016-06-29T12:46:48.267 INFO:teuthology.task.install:Ceph rpm upgrade from 10.2.2 to 11.0.0

I believe the comparison will work just fine, but do correct me if I'm wrong!

Copy link
Member

@zmc zmc Jun 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My basic tests didn't look terribly promising:

>>> '11.0.0-111.gbe55cfc' > '11.0.0-111.gbe55cfc'
False
>>> '11.0.0-111.gbe55cfc' > '10.0.0-111.gbe55cfc'
True
>>> '11.0.0-111.gbe55cfc' > '9.0.0-111.gbe55cfc'
False
>>> '10.0.0-111.gbe55cfc' > '9.0.0-111.gbe55cfc'
False

Might want to look into a more robust version comparison
Edit: bonus points if you split any/all of this out into a separate function and write unit tests!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I do any extra work, please explain why the actual values of installed_version and upgrade_version are "10.2.2" and "11.0.0" respectively? If you don't believe me, see the log in http://pulpito.ceph.com/smithfarm-2016-06-29_12:02:35-upgrade:jewel-x:stress-split-kraken-distro-basic-vps/ (search for "upgrade from")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but I see your point - '9.0.0' will indeed be greater than '10.0.0'

OK, back to the drawing board :-)

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

@zmc I modified it to use StrictVersion for the version number comparison. See, for example:

>>> from distutils.version import StrictVersion
>>> StrictVersion('10.0.0') > StrictVersion('9.0.0')
True

Of course, if there are any circumstances under which the values will be as you indicated, this will fail:

>>> StrictVersion('10.0.0-111.gbe55cfc') > StrictVersion('9.0.0-111.gbe55cfc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/distutils/version.py", line 40, in __init__
    self.parse(vstring)
  File "/usr/lib64/python2.7/distutils/version.py", line 107, in parse
    raise ValueError, "invalid version number '%s'" % vstring
ValueError: invalid version number '10.0.0-111.gbe55cfc'

If you like, I can use re to pinch off just the x.y.z part. However, in my tests so far both values are showing up as just "x.y.z" and not "x.y.z-something" so I wonder if this extra string processing is justified.

I guess it all comes down to how robust you think this needs to be.

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

@zmc For rpm GitbuilderProject.version should always just be x.y.z because that part of the string is already being extracted here: https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L699-L702

Will test [*] to see if this holds for deb as well as rpm.

[*] http://pulpito.ceph.com/smithfarm-2016-06-29_14:29:51-upgrade:jewel-x:stress-split-kraken-distro-basic-vps/

@smithfarm
Copy link
Contributor Author

@zmc So, if you think it's possible that the strings might be "x.y.z-somefoo" or "vx.y.z-somefoo" instead of "x.y.z", I can write a filter function that takes a string, tests whether it is "x.y.z" or "x.y.z-somefoo" or "vx.y.z-somefoo", and returns the string "x.y.z" on success or null on failure. Then I can modify the logic so the comparison is performed only if both strings pass this filter function.

Would that make this PR more palatable?

@zmc
Copy link
Member

zmc commented Jun 29, 2016

@smithfarm well, my example from before has a value of 11.0.0-111.gbe55cfc, so the current version of this PR will break very many tests. In fact, even this is problematic:

>>> StrictVersion('11.0.0-111') > StrictVersion('9.0.0-111')
[...]
ValueError: invalid version number '11.0.0-111'

I doubt if we can safely throw away the -111. I'm leaning more and more towards having unit tests here. Have you tried LooseVersion ?

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

So, the package version for deb is indeed "x.y.z-foo"

2016-06-29T14:53:57.081 INFO:teuthology.task.install:Ceph deb upgrade from 10.2.2-63-g8542898-1trusty to 11.0.0-82-g3ad5b63-1trusty

whereas for rpm it is just "x.y.z"

2016-06-29T12:46:48.267 INFO:teuthology.task.install:Ceph rpm upgrade from 10.2.2 to 11.0.0

@smithfarm
Copy link
Contributor Author

Which means for deb we can try to compare on the -111 part but for rpm we cannot.

Note that I was not intending for this sanity check to catch cases like "11.0.0-111" vs. "11.0.0-112" - the failure scenario this is designed to catch is when, e.g., jewel is at 10.2.2 and master is at 10.2.0 (which we were facing in http://tracker.ceph.com/issues/14995 until Sage added the "v11.0.0" tag to master)

@zmc
Copy link
Member

zmc commented Jun 29, 2016

I don't understand your first statement. Why can't we compare that?

And, similarly, re: the scope of the feature: why wouldn't we want to test the release portion (-111) ?

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

@zmc Because #891 (comment) - the quoted lines are from actual logs from actual tests of this PR branch.

Here is the reason: https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L699-L702

@smithfarm
Copy link
Contributor Author

So first we would have to fix how teuthology handles rpm version numbers so it matches deb, then we could compare on the -111

@smithfarm
Copy link
Contributor Author

And, indeed, LooseVersion looks like a better fit!

>>> LooseVersion('10.2.2-63-g8542898-1trusty') < LooseVersion('11.0.0-82-g3ad5b63-1trusty')
True

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

Since the failure scenario I'm checking for is missing version tags in the git repo, and these tags always have the form "vx.y.z", I am only interested in comparing on the the x.y.z part and failing if the source is greater than the target. If they are equal, or the source is less than the target, that is "sane enough" for the test to continue IMO.

If somehow the source were "10.0.0-111" and the target were "10.0.0-110", the upgrade will fail, but that is a different (and purely hypothetical) failure scenario - nothing to do with a missing "vx.y.z" tag in the target branch.

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

Changelog:

  • rebase
  • use LooseVersion for the version comparison, as it properly handles both rpm version numbers (e.g. 10.2.0) and deb version numbers (e.g. 10.2.2-63-g8542898-1trusty)

@zmc
Copy link
Member

zmc commented Jun 29, 2016

Just a minor correction: the RPM version numbers are not what you think they are (see the example I posted a couple times)

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

If that were the case, it would be a major correction, but I beg to differ. The rpm version numbers really are just "x.y.z". Have you examined #891 (comment) (the quoted lines are from actual logs from actual tests of this PR branch) and https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L699-L702 where the "git describe" output is munged down to just "x.y.z" ?

@zmc
Copy link
Member

zmc commented Jun 29, 2016

Okay, here's my example a third time. Please click that link.

Packages in that repo are named e.g. ceph-11.0.0-118.g13c13c7.x86_64.rpm

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

Zack, we're going around in circles here. I saw your example, but GitbuilderProject.version() still returns just "11.0.0" in your example.

This is proven by #891 (comment)

@smithfarm
Copy link
Contributor Author

Also, I think this is correct behavior, because in rpm the version really is just "11.0.0" - the "-111..." part is the release.

@smithfarm
Copy link
Contributor Author

smithfarm commented Jun 29, 2016

The point is that LooseVersion handles both rpm and deb version numbers properly, and there is no known failure scenario that would cause an upgrade suite to attempt to upgrade from a higher to a lower release (within the same version).

Anyway, let's sleep on it and revisit tomorrow :-)

@smithfarm
Copy link
Contributor Author

BTW LooseVersion() properly handles the version number format from your example.

@zmc
Copy link
Member

zmc commented Jun 29, 2016

Ah, I see your point now. It may or may not be a bug that GbP.version omits the release for rpms, but includes the debian_revision for debs. In any case, for this scope it's probably moot as LooseVersion does the right thing in all cases (that I tried)

@smithfarm
Copy link
Contributor Author

Do you still want me to write unit tests? I am doing so right now but I don't know how to test them.

@smithfarm
Copy link
Contributor Author

Let me know if you want me to explore this path further or if I can drop these last two commits.

@ktdreyer
Copy link
Member

In my tests internal to Red Hat I use the Python rpm library with rpm.labelCompare(). Can we do the same thing here?

@yuriw
Copy link
Contributor

yuriw commented Jun 30, 2016

@smithfarm sorry I missed your ping, but @zmc is much better person to review this PR

@smithfarm smithfarm changed the title install.upgrade: add pre-upgrade sanity check [DNM] install.upgrade: add pre-upgrade sanity check Jun 30, 2016
@smithfarm
Copy link
Contributor Author

@ktdreyer We could implement separate comparison logic for rpm and deb, yes, but I guess it's not worth the trouble if, as we suspect, LooseVersion works for both.

@zmc I found out that teuthology unit tests only work in Ubuntu. Now I have a Docker container set up for this purpose and can run the tests. Running them on this PR, I saw that it breaks test_upgrade_common(), so I'm reinstating DNM until I can straighten that out.

@zmc
Copy link
Member

zmc commented Jul 1, 2016

@smithfarm re: unit tests "only on Ubuntu" - they at least also work on MacOS, even. Would you please file a ticket? I want them to work everywhere.

jan--f pushed a commit to jan--f/teuthology that referenced this pull request Jan 25, 2017
Fixes regression introduced by ceph#891

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Add a pre-upgrade sanity check to the install.upgrade task. This sanity check:

1. determines the version of Ceph installed on the remote ("installed_version")
2. determines the version of Ceph in the target gitbuilder repo
("upgrade_version")
3. if "installed_version" is greater than "upgrade_version", fail immediately

Since the upgrade itself cannot succeed if upgrade_version < installed_version,
it's better to fail before attempting the upgrade.

Before this patch, we were failing in the verify_package_version() sanity check
which is run after attempting (and failing) to upgrade the packages.

Fixes: http://tracker.ceph.com/issues/16521
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
@smithfarm smithfarm closed this Jan 25, 2017
jan--f pushed a commit to jan--f/teuthology that referenced this pull request Jan 25, 2017
Fixes regression introduced by ceph#891

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit c84bde3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants