Fix MIME policy failure on python version upgrade (SC-201) #934
Conversation
Python 3.6 added a new `policy` attribute to `MIMEMultipart`. MIMEMultipart may be part of a datasource's cached object pickle. Upgrading from an older Python version to 3.6+ will cause the datasource to be invalid after pickle load. This commit uses the upgrade framework to attempt to access the MIME message and fail early (thus discarding the cache) if we cannot. Commit 78e89b0 should fix this issue more generally.
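A minimal standalone sketch of the failure mode (not cloud-init code; `Message` and `render` are hypothetical stand-ins for `MIMEMultipart` and its policy-dependent methods): pickle restores only the instance `__dict__`, so an attribute introduced by a newer class version is simply missing on objects unpickled from an older version's blob.

```python
import pickle

# Simulate the old class version: no 'policy' attribute set in __init__.
class Message:
    def __init__(self):
        self.payload = "hello"

old_blob = pickle.dumps(Message())

# Simulate the upgraded class version: new code reads self.policy, but
# unpickling only restores the instance __dict__ captured above.
class Message:  # redefinition mirrors a library upgrade
    def __init__(self):
        self.payload = "hello"
        self.policy = "default"

    def render(self):
        return f"{self.policy}: {self.payload}"

msg = pickle.loads(old_blob)  # new class, old instance __dict__
try:
    msg.render()
except AttributeError as e:
    print(e)  # 'Message' object has no attribute 'policy'
```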
Looks pretty good, but I really don't like us leaving tracebacks in cloud-init logs (this is also something we typically grep for in upgrade tests, etc., as a smell that something has gone wrong).
What do you think about handling this in Datasource._unpickle with a try/except AttributeError that raises a custom DatasourceUnpickleUserDataError? Then our _pkl_load can handle that case and avoid logging a nasty traceback.
Here's a diff with integration test changes if you think it makes sense.
diff --git a/cloudinit/sources/__init__.py b/cloudinit/sources/__init__.py
index 14880747..13f3dadc 100644
--- a/cloudinit/sources/__init__.py
+++ b/cloudinit/sources/__init__.py
@@ -75,6 +75,10 @@ NetworkConfigSource = namedtuple('NetworkConfigSource',
_NETCFG_SOURCE_NAMES)(*_NETCFG_SOURCE_NAMES)
+class DatasourceUnpickleUserDataError(Exception):
+ """Raised when userdata is unable to be unpickled due to python upgrades"""
+
+
class DataSourceNotFoundException(Exception):
pass
@@ -245,7 +249,14 @@ class DataSource(CloudInitPickleMixin, metaclass=abc.ABCMeta):
# Calling str() on the userdata will attempt to access this
# policy attribute. This will raise an exception, causing
# the pickle load to fail, so cloud-init will discard the cache
- str(self.userdata)
+ try:
+ str(self.userdata)
+ except AttributeError as e:
+ LOG.debug(
+ "Unable to unpickle datasource: %s."
+ " Ignoring current cache.", e
+ )
+ raise DatasourceUnpickleUserDataError()
def __str__(self):
return type_utils.obj_name(self)
diff --git a/cloudinit/stages.py b/cloudinit/stages.py
index 3688be2e..20b1ffbd 100644
--- a/cloudinit/stages.py
+++ b/cloudinit/stages.py
@@ -28,7 +28,10 @@ from cloudinit.event import (
EventType,
userdata_to_events,
)
-from cloudinit.sources import NetworkConfigSource
+from cloudinit.sources import (
+ DatasourceUnpickleUserDataError,
+ NetworkConfigSource,
+)
from cloudinit import cloud
from cloudinit import config
@@ -1070,6 +1073,8 @@ def _pkl_load(fname):
return None
try:
return pickle.loads(pickle_contents)
+    except DatasourceUnpickleUserDataError:
+ return None
except Exception:
util.logexc(LOG, "Failed loading pickled blob from %s", fname)
return None
diff --git a/tests/integration_tests/modules/test_persistence.py b/tests/integration_tests/modules/test_persistence.py
index 9e46768c..fd450e24 100644
--- a/tests/integration_tests/modules/test_persistence.py
+++ b/tests/integration_tests/modules/test_persistence.py
@@ -22,8 +22,7 @@ def test_log_message_on_missing_version_file(client: IntegrationInstance):
assert client.execute('cloud-init status --wait').ok
log = client.read_from_file('/var/log/cloud-init.log')
verify_ordered_items_in_text([
- 'Failed loading pickled blob from /var/lib/cloud/instance/obj.pkl',
- "'MIMEMultipart' object has no attribute 'policy",
+ "Unable to unpickle datasource: 'MIMEMultipart' object has no attribute 'policy'. Ignoring current cache.",
'no cache found',
'Searching for local data source',
'SUCCESS: found local data from DataSourceNoCloud'
Thanks @blackboxsw, your changes are very helpful. @rjschwei and @Moustafa-Moustafa, will this fix the issue you were seeing? I verified it via somewhat contrived means. Is there a way you can verify it works for you?
LGTM! We'll let @Moustafa-Moustafa and/or @rjschwei respond whether this is good enough to "solve" the upgrade problem they saw.
@TheRealFalcon LGTM. I do not have a real-world use case either.
Proposed Commit Message
Additional Context
See #857. One downside of this approach is we'll still get an ugly traceback in the log. It doesn't indicate a failure of cloud-init, just a failure of reading the pickle. E.g.,:
Test Steps
Integration test added at tests/integration_tests/test_persistence.py (using a hacked together pickle), but this should be verified manually to ensure that the original issue is actually fixed.
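For manual verification, one way to reproduce the broken cache without downgrading Python is to strip the `policy` attribute from a `MIMEMultipart` instance before pickling, mimicking a blob written by Python < 3.6 (a sketch, not the actual integration-test fixture):

```python
import pickle
from email.mime.multipart import MIMEMultipart

# Build a MIMEMultipart, then delete the instance 'policy' attribute to
# mimic a pickle produced before Python 3.6 introduced it.
msg = MIMEMultipart()
del msg.__dict__["policy"]
blob = pickle.dumps(msg)

# Unpickling succeeds, but rendering the message fails, just as it does
# when cloud-init calls str(self.userdata) during _unpickle.
restored = pickle.loads(blob)
try:
    str(restored)
except AttributeError as e:
    print(e)  # 'MIMEMultipart' object has no attribute 'policy'
```

Writing such a blob to `/var/lib/cloud/instance/obj.pkl` should then trigger the new early-discard path on the next boot.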
Checklist: