Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MIME policy failure on python version upgrade (SC-201) #934

Merged
merged 5 commits into from
Jul 15, 2021

Conversation

TheRealFalcon
Copy link
Member

Proposed Commit Message

Fix MIME policy failure on python version upgrade

Python 3.6 added a new `policy` attribute to `MIMEMultipart`.
MIMEMultipart may be part of the cached object pickle of a datasource.
Upgrading from an old version of python to 3.6+ will cause the
datasource to be invalid after pickle load.

This commit uses the upgrade framework to attempt to access the mime
message and fail early (thus discarding the cache) if we cannot.
Commit 78e89b03 should fix this issue more generally.

Additional Context

See #857. One downside of this approach is we'll still get an ugly traceback in the log. It doesn't indicate a failure of cloud-init, just a failure of reading the pickle. E.g.,:

2021-07-01 21:10:09,497 - handlers.py[DEBUG]: start: init-local/check-cache: attempting to read from cache [check]
2021-07-01 21:10:09,497 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2021-07-01 21:10:09,497 - util.py[DEBUG]: Read 5635 bytes from /var/lib/cloud/instance/obj.pkl
2021-07-01 21:10:09,507 - util.py[WARNING]: Failed loading pickled blob from /var/lib/cloud/instance/obj.pkl
2021-07-01 21:10:09,507 - util.py[DEBUG]: Failed loading pickled blob from /var/lib/cloud/instance/obj.pkl
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 1072, in _pkl_load
    return pickle.loads(pickle_contents)
  File "/usr/lib/python3/dist-packages/cloudinit/persistence.py", line 54, in __setstate__
    self._unpickle(version)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 248, in _unpickle
    str(self.userdata)
  File "/usr/lib/python3.9/email/message.py", line 135, in __str__
    return self.as_string()
  File "/usr/lib/python3.9/email/message.py", line 152, in as_string
    policy = self.policy if policy is None else policy
AttributeError: 'MIMEMultipart' object has no attribute 'policy'
2021-07-01 21:10:09,508 - stages.py[DEBUG]: no cache found
2021-07-01 21:10:09,508 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: no cache found
2021-07-01 21:10:09,508 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance
2021-07-01 21:10:09,510 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
2021-07-01 21:10:09,510 - __init__.py[DEBUG]: Looking for data source in: ['NoCloud', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']
2021-07-01 21:10:09,512 - __init__.py[DEBUG]: Searching for local data source in: ['DataSourceNoCloud']
2021-07-01 21:10:09,512 - handlers.py[DEBUG]: start: init-local/search-NoCloud: searching for local data from DataSourceNoCloud
2021-07-01 21:10:09,512 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloud'>
2021-07-01 21:10:09,512 - __init__.py[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance

Test Steps

Integration test added at tests/integration_tests/test_persistence.py (using a hacked together pickle), but this should be verified manually to ensure that the original issue is actually fixed.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

Python 3.6 added a new `policy` attribute to `MIMEMultipart`.
MIMEMultipart may be part of the cached object pickle of a datasource.
Upgrading from an old version of python to 3.6+ will cause the
datasource to be invalid after pickle load.

This commit uses the upgrade framework to attempt to access the mime
message and fail early (thus discarding the cache) if we cannot.
Commit 78e89b0 should fix this issue more generally.
@TheRealFalcon TheRealFalcon changed the title Fix MIME policy failure on python version upgrade Fix MIME policy failure on python version upgrade (SC-201) Jul 1, 2021
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, but I really don't like us leaving around Tracebacks in cloud-init logs (this is also something we typically grep for now in upgrade tests etc as a smell that something has gone wrong).

What do you think about better handling of the Datasource._unpickle with a try/except AttributeError and raise a custom DatasourcUnpickleUserDataErrror. Then our _pkl_load can field this case and avoid logging a nasty traceback in logs.

Here's a diff with integration test changes if you think it makes sense.

diff --git a/cloudinit/sources/__init__.py b/cloudinit/sources/__init__.py
index 14880747..13f3dadc 100644
--- a/cloudinit/sources/__init__.py
+++ b/cloudinit/sources/__init__.py
@@ -75,6 +75,10 @@ NetworkConfigSource = namedtuple('NetworkConfigSource',
                                  _NETCFG_SOURCE_NAMES)(*_NETCFG_SOURCE_NAMES)
 
 
+class DatasourceUnpickleUserDataError(Exception):
+    """Raised when userdata is unable to be unpickled due to python upgrades"""
+
+
 class DataSourceNotFoundException(Exception):
     pass
 
@@ -245,7 +249,14 @@ class DataSource(CloudInitPickleMixin, metaclass=abc.ABCMeta):
             # Calling str() on the userdata will attempt to access this
             # policy attribute. This will raise an exception, causing
             # the pickle load to fail, so cloud-init will discard the cache
-            str(self.userdata)
+            try:
+                str(self.userdata)
+            except AttributeError as e:
+                LOG.debug(
+                    "Unable to unpickle datasource: %s."
+                    " Ignoring current cache.", e
+                )
+                raise DatasourceUnpickleUserDataError()
 
     def __str__(self):
         return type_utils.obj_name(self)
diff --git a/cloudinit/stages.py b/cloudinit/stages.py
index 3688be2e..20b1ffbd 100644
--- a/cloudinit/stages.py
+++ b/cloudinit/stages.py
@@ -28,7 +28,10 @@ from cloudinit.event import (
     EventType,
     userdata_to_events,
 )
-from cloudinit.sources import NetworkConfigSource
+from cloudinit.sources import (
+    DatasourceUnpickleUserDataError,
+    NetworkConfigSource,
+)
 
 from cloudinit import cloud
 from cloudinit import config
@@ -1070,6 +1073,8 @@ def _pkl_load(fname):
         return None
     try:
         return pickle.loads(pickle_contents)
+    except sources.DatasourceUnpickleUserDataError:
+        return None
     except Exception:
         util.logexc(LOG, "Failed loading pickled blob from %s", fname)
         return None
diff --git a/tests/integration_tests/modules/test_persistence.py b/tests/integration_tests/modules/test_persistence.py
index 9e46768c..fd450e24 100644
--- a/tests/integration_tests/modules/test_persistence.py
+++ b/tests/integration_tests/modules/test_persistence.py
@@ -22,8 +22,7 @@ def test_log_message_on_missing_version_file(client: IntegrationInstance):
     assert client.execute('cloud-init status --wait').ok
     log = client.read_from_file('/var/log/cloud-init.log')
     verify_ordered_items_in_text([
-        'Failed loading pickled blob from /var/lib/cloud/instance/obj.pkl',
-        "'MIMEMultipart' object has no attribute 'policy",
+        "Unable to unpickle datasource: 'MIMEMultipart' object has no attribute 'policy'. Ignoring current cache.",
         'no cache found',
         'Searching for local data source',
         'SUCCESS: found local data from DataSourceNoCloud'

@TheRealFalcon
Copy link
Member Author

Thanks @blackboxsw , your changes are very helpful.

@rjschwei and @Moustafa-Moustafa , will this fix the issue you were seeing? I verified it via somewhat contrived means. Is there a way you can verify it works for you?

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We'll let @Moustafa-Moustafa and/or @rjschwei respond whether this is good enough to "solve" the upgrade problem they saw.

@rjschwei
Copy link
Contributor

@TheRealFalcon LGTM. I do not have a real world use case either.

@blackboxsw blackboxsw merged commit eacb035 into canonical:main Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants