introduce an upgrade framework and related testing #659

Merged: 5 commits into canonical:master on Nov 17, 2020

@OddBloke OddBloke commented Nov 6, 2020

Proposed Commit Message

introduce an upgrade framework and related testing

This commit does the following:

  • introduces the cloudinit.persistence module, containing
    CloudInitPickleMixin which provides lightweight versioning of
    objects' pickled representations (and associated testing)
  • introduces a basic upgrade testing framework (in
    cloudinit.tests.test_upgrade) which unpickles pickles from previous
    versions of cloud-init (stored in tests/data/old_pickles) and tests
    invariants that the current cloud-init codebase expects
  • uses the versioning framework to address an upgrade issue where
    Distro.networking could get into an unexpected state, and uses the
    upgrade testing framework to confirm that the issue is addressed
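
For readers who want to see the idea end to end, here is a minimal sketch of the versioning approach and the upgrade scenario described above. It is not the code from this branch: VersionedPickleMixin, Networking and both Distro classes are hypothetical stand-ins, and the first Distro definition exists only to produce a pickle in the "old" layout. The __setstate__ body follows the lines visible in the review diff; the __getstate__ body is an assumption.

```python
import pickle


class VersionedPickleMixin:
    """Hypothetical stand-in for CloudInitPickleMixin (not the real code)."""

    _ci_pkl_version = 0  # subclasses bump this when their state layout changes

    def __getstate__(self):
        # Persist the instance state plus the class's current version.
        state = self.__dict__.copy()
        state["_ci_pkl_version"] = self._ci_pkl_version
        return state

    def __setstate__(self, state):
        # Pickles written before versioning carry no version: treat as 0.
        version = state.pop("_ci_pkl_version", 0)
        self.__dict__.update(state)
        self._unpickle(version)

    def _unpickle(self, ci_pkl_version):
        """Upgrade hook; subclasses repair state from older versions here."""


class Networking:
    """Placeholder for a distro's networking helper."""


class Distro:
    """Old layout: no mixin and no ``networking`` attribute."""

    def __init__(self, name):
        self.name = name


# Stands in for an obj.pkl written by an older release.
old_pickle = pickle.dumps(Distro("ubuntu"))


class Distro(VersionedPickleMixin):  # deliberately replaces the old class
    """New layout: ``networking`` is expected on every instance."""

    _ci_pkl_version = 1
    networking_cls = Networking

    def __init__(self, name):
        self.name = name
        self.networking = self.networking_cls()

    def _unpickle(self, ci_pkl_version):
        # __init__ does not run on unpickle, so old state lacks ``networking``.
        if not hasattr(self, "networking"):
            self.networking = self.networking_cls()


# Loading the old pickle with the new class repairs the missing attribute.
upgraded = pickle.loads(old_pickle)
assert isinstance(upgraded.networking, Networking)
assert "_ci_pkl_version" not in upgraded.__dict__
```

The last three lines are the kind of invariant an upgrade test can check against each stored pickle: the object loads with the current code, and the attributes the current code relies on are present.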

Additional Context

This work was started in #609, and a lot of design discussion happened over there.

Test Steps

Upgrade from pre-refactor cloud-init

ubuntu:ffae848ee5a0 is a focal image containing a pre-networking-refactor cloud-init (which will exhibit this bug); it is launched as reproducer:

$ lxc launch ubuntu:ffae848ee5a0 reproducer
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking_cls"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'Distro' object has no attribute 'networking_cls'
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'Distro' object has no attribute 'networking'

$ lxc exec reproducer -- apt update
...

$ lxc exec reproducer -- apt install cloud-init
...

# networking_cls is present, networking is not!
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking_cls"
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'Distro' object has no attribute 'networking'

reproducer2 is launched from the same image, and has a deb built from this branch (instead of the archive) installed into it:

$ lxc launch ubuntu:ffae848ee5a0 reproducer2
$ lxc file push cloud-init_20.3-89-g07443698-1\~bddeb\~20.04.1_all.deb reproducer2/
$ lxc exec reproducer2 -- apt install /cloud-init_20.3-89-g07443698-1\~bddeb\~20.04.1_all.deb
...
# Both networking_cls and networking are present, as expected
$ lxc exec reproducer2 -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking_cls"
$ lxc exec reproducer2 -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking"

Upgrade from post-refactor cloud-init

Following the same steps as for reproducer2 above, we can see that both .networking_cls and .networking remain available:

$ lxc launch ubuntu:f reproducer
$ lxc file push cloud-init_20.3-89-g07443698-1\~bddeb\~20.04.1_all.deb reproducer/
$ lxc exec reproducer -- apt install /cloud-init_20.3-89-g07443698-1\~bddeb\~20.04.1_all.deb
...
# Both networking_cls and networking are present, as expected
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking_cls"
$ lxc exec reproducer -- python3 -c "from cloudinit.stages import _pkl_load; obj = _pkl_load('/var/lib/cloud/instance/obj.pkl'); obj.distro.networking"

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

Member

@TheRealFalcon TheRealFalcon left a comment

Good work!

Collaborator

@blackboxsw blackboxsw left a comment

Looks really good. One question inline and will approve.

    """

    def __getstate__(self):
        return self.__dict__
Collaborator

Is it worth an assert "_ci_pkl_version" not in self.__dict__?

Collaborator Author

_ci_pkl_version will never be in __dict__ unless it's explicitly added to instance state: I don't think we'd ever trigger that assert here without modifying this test subclass to set such instance state. Am I missing a case this could cover?

Collaborator

This is not currently triggered, but if we didn't pop _ci_pkl_version off the state dict in __setstate__, no tests would fail either.

diff --git a/cloudinit/persistence.py b/cloudinit/persistence.py
index 85aa79df3..543af4da2 100644
--- a/cloudinit/persistence.py
+++ b/cloudinit/persistence.py
@@ -49,7 +49,7 @@ class CloudInitPickleMixin:
         See https://docs.python.org/3/library/pickle.html#object.__setstate__
         for further background.
         """
-        version = state.pop("_ci_pkl_version", 0)
+        version = state.get("_ci_pkl_version", 0)
         self.__dict__.update(state)
         self._unpickle(version)
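
To make the concern concrete, here is a small sketch of what the suggested assert would catch. PopOnLoad and GetOnLoad are hypothetical classes, not code from this branch; they differ only in whether __setstate__ pops or merely reads the version key. With the get variant the key ends up in the instance's __dict__, where it shadows the class attribute, so a later bump of the class version would not be reflected when that object is pickled again.

```python
import pickle


class PopOnLoad:
    """Hypothetical class mirroring the pop-based ``__setstate__``."""

    _ci_pkl_version = 1

    def __init__(self):
        self.value = 42

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_ci_pkl_version"] = self._ci_pkl_version
        return state

    def __setstate__(self, state):
        state.pop("_ci_pkl_version", 0)  # version key is removed from state
        self.__dict__.update(state)


class GetOnLoad(PopOnLoad):
    """Same class, but using the ``get`` variant from the diff."""

    def __setstate__(self, state):
        state.get("_ci_pkl_version", 0)  # version key stays in state
        self.__dict__.update(state)


def roundtrip(obj):
    return pickle.loads(pickle.dumps(obj))


def version_leaked(obj):
    """The condition the suggested assert would guard against."""
    return "_ci_pkl_version" in obj.__dict__


clean = roundtrip(PopOnLoad())
leaky = roundtrip(GetOnLoad())
assert not version_leaked(clean)
assert version_leaked(leaky)

# Simulate a later release bumping the class-level version.
PopOnLoad._ci_pkl_version = 2
assert clean.__getstate__()["_ci_pkl_version"] == 2
# The leaked instance attribute shadows the class attribute: stale version.
assert leaky.__getstate__()["_ci_pkl_version"] == 1
```

A test asserting that version_leaked() is false after a round trip would fail for the get variant, which is the gap this comment points out.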
 

@raharper
Collaborator

Thanks for working on this. It looks quite straightforward.

Do we have any obj.pkl upgrade items in debian/* that could be dropped with this framework? Azure has a systemd dropin config to rm the obj.pkl to ensure networking is regenerated on every boot; and I thought there was an issue around ec2-classic VMs and networking; I know I tried at one time to update existing obj.pkl with the EventType.BOOT flag to regenerate network-config on each boot there (when shutting down ec2-classic instances, the MAC of the nic is lost and changes on startup).

@OddBloke
Collaborator Author

> Do we have any obj.pkl upgrade items in debian/* that could be dropped with this framework?

A quick look around the {post,pre}inst doesn't reveal anything immediately obvious: there is some handling of obj.pkl there, but it looks like that's all to do with moving it around wholesale, rather than handling upgrade issues in the state it contains.

> Azure has a systemd dropin config to rm the obj.pkl to ensure networking is regenerated on every boot

I would expect this to be replaceable with the new framework: DataSourceAzure could implement _unpickle which sets the appropriate state so this happens automatically. (Of course, it's possible that there is now more code which relies on this obj.pkl removal, so some testing would be required to ensure that we aren't regressing something unexpected by dropping the drop-in.)

> and I thought there was an issue around ec2-classic VMs and networking; I know I tried at one time to update existing obj.pkl with the EventType.BOOT flag to regenerate network-config on each boot there (when shutting down ec2-classic instances, the MAC of the nic is lost and changes on startup).

This issue certainly sounds familiar, and I think we'd be able to do something similar to the Azure case too.

Thanks for the review!

This introduces a utility mixin class, ``CloudInitPickleMixin`` which
implements the pickling magic methods (``__getstate__`` and
``__setstate__``) to add a version to the state persisted for classes to
which it is added.
@OddBloke
Collaborator Author

Updated description with the manual testing I have performed.

Collaborator

@blackboxsw blackboxsw left a comment

LGTM! Address my final testing assert nit if you feel it is needed.

Thanks again for the responses, I do agree that any pickle upgrade path testing uncovered by this framework will be much more accessible with our simpler integration tests.

@OddBloke OddBloke merged commit 4f2da1c into canonical:master Nov 17, 2020
This was referenced May 12, 2023