cloud-init does not respect declared MIME types in multipart archives #3764

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 13 comments
Labels
launchpad Migrated from Launchpad priority Fix soon

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1888822

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2020-08-25T19:31:59.685827+00:00
date_created = 2020-07-24T09:49:15.112105+00:00
date_fix_committed = 2020-08-25T19:31:59.685827+00:00
date_fix_released = 2020-08-25T19:31:59.685827+00:00
id = 1888822
importance = critical
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1888822
milestone = None
owner = rcvanvo
owner_name = Robert Van Voorhees
private = False
status = fix_released
submitter = rcvanvo
submitter_name = Robert Van Voorhees
tags = []
duplicates = []

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-07-24T09:49:15.112105+00:00

In #290 we landed a change to user-data processing which expanded the set of MIME types we would consider signifying "unknown content" to include many (if not all) of the MIME types we would normally expect to be used in user-data multipart archives[0].

This means that every part is now assigned its MIME type based on the first line of its content; the declared MIME types are ignored.

In the specific reported case, a "text/cloud-boothook" part started with #!, which is appropriate and correct, but was therefore detected as "text/x-shellscript" due to this bug.

[0] Specifically, it was expanded to include all the values in the dict at https://github.com/canonical/cloud-init/blob/master/cloudinit/handlers/__init__.py#L43-L54
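To make the failure mode above concrete, here is a minimal sketch of first-line type detection. The names `PREFIX_TO_MIME`, `sniff_type` and `buggy_effective_type` are hypothetical stand-ins, and the prefix map is only an assumed subset of the dict referenced in [0]; this is not cloud-init's actual code.

```python
# Hypothetical, simplified stand-ins for cloud-init internals.
# The real prefix map has more entries than this assumed subset.
PREFIX_TO_MIME = {
    "#!": "text/x-shellscript",
    "#cloud-config": "text/cloud-config",
    "#cloud-boothook": "text/cloud-boothook",
    "#include": "text/x-include-url",
}


def sniff_type(payload):
    """Guess a MIME type from the start of the payload, or return None."""
    # Try longer prefixes first so a more specific marker wins.
    for prefix in sorted(PREFIX_TO_MIME, key=len, reverse=True):
        if payload.startswith(prefix):
            return PREFIX_TO_MIME[prefix]
    return None


def buggy_effective_type(declared, payload):
    """Regressed behaviour: every known declared type is re-detected."""
    if declared in PREFIX_TO_MIME.values():
        return sniff_type(payload) or declared
    return declared


boothook = "#!/bin/bash\necho 'fetch secrets'\n"
# The declared type is text/cloud-boothook, but the "#!" first line wins.
print(buggy_effective_type("text/cloud-boothook", boothook))
```

With this logic the declared "text/cloud-boothook" is replaced by "text/x-shellscript", which matches the reported symptom.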

[Original Report]

In the upstream Kubernetes project Cluster API, specifically the Cluster API AWS Provider, the cloud-init script securely downloads a file from AWS Secrets Manager, saves that file to a well-known location, and then restarts the cloud-init service through systemd. After cloud-init is restarted, it resolves the secrets file (which had previously not been there) and executes its commands.

This worked fine on versions of cloud-init up to and including 19.4-33-gbb4131a2-0ubuntu1~18.04.1. After upgrading to 20.2-45-g5f7825e2-0ubuntu1~18.04.1, the secrets file is never resolved again.

Some other information:

  • cloud-init is definitely successfully running twice based on systemd and cloud-init-output.
  • The /var/lib/cloud/instance/user-data.txt does show the reference to the well-known file at /etc/secret-userdata.txt
  • The "resolved" version of user-data at /var/lib/cloud/instance/user-data.txt.i does not include the resolved file. Deleting this file and then restarting cloud-init does not solve the problem: the file is regenerated, again without the secrets content.

Is there another command that is now required if you plan on restarting cloud-init for another execution where files are now present that were previously not?

  1. Cloud Provider: AWS
  2. Upstream issue: "cloud-init v20.2-45-g5f7825e2-0ubuntu1~18.04.1 never runs /etc/secret-userdata.txt" (kubernetes-sigs/cluster-api-provider-aws#1839). Instructions to recreate can be found in that issue, including 2 public AMIs.

@ubuntu-server-builder added launchpad Migrated from Launchpad priority Fix soon labels May 12, 2023

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-07-24T09:49:15.112105+00:00

Launchpad attachments: results of cloud-init log collector

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-07-24T09:50:00.743853+00:00

user-data.txt
Launchpad attachments: The user-data that is gzipped and passed to the instance

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-07-24T09:50:32.170358+00:00

resolved user-data missing the secrets file.
Launchpad attachments: user-data.txt.i

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-07-24T09:53:29.769821+00:00

After running "cloud-init clean", cloud-init will hang when run again.

Launchpad user detiber(detiber) wrote on 2020-07-28T16:49:01.566316+00:00

I believe I've tracked the issue down to the following PR: #290

It looks like, because we declare the boothook using only the content type, that content type is being overridden with x-shellscript by the following code:

    if ctype_orig in TYPE_NEEDED or (ctype_orig in
                                     INCLUDE_MAP.values()):
        ctype = find_ctype(payload)
    if ctype is None:
        ctype = ctype_orig

I don't believe this behavior is correct since it is overriding correctly set content types with different content types (in this case overriding text/cloud-boothook with text/x-shellscript).

Launchpad user Ryan Harper(raharper) wrote on 2020-07-28T18:22:09.697467+00:00

Do you have a collect-logs from a successful run on 19.4? The logs included cover two days (2020-07-23 and 2020-07-24, the former using 19.4 and the latter using 20.1).

Launchpad user Dan Watkins(oddbloke) wrote on 2020-07-28T19:28:29.008901+00:00

Hi Robert, detiber,

Thanks for using cloud-init, for filing a bug, and for the triage! Ryan and I have been chatting on IRC (feel free to join us in #cloud-init on Freenode) and we agree this is a regression. Apologies!

Some older platforms always pass user-data in MIME multipart archives which use "text/x-shellscript" for the part (even if the user is passing "#cloud-config" user-data). The commit you've identified mistakenly means that for every part with a MIME type we know about, we will use the first line of that part's content to determine its type, ignoring the MIME type. The first line of your boothook is "#!", which maps to x-shellscript. This in turn means that it runs later in boot, and everything else falls apart as a result.

The initial fix we've identified is to only use the content to determine the true MIME type if the given MIME type is x-shellscript. This relies on the fact that if an x-shellscript part does not start with a #!, then cloud-init will fail to execute it; it follows that every currently-functional x-shellscript MIME part starts with #!. This means that we will always detect true x-shellscript parts as x-shellscript from their content. And it follows, in turn, that we can safely always use the content of x-shellscript parts to determine their type.

(The reason we cannot do the same for other MIME types is because they do not have the same "detection roundtrip" guarantee.)
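The rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the actual fix: `PREFIX_TO_MIME`, `sniff_type` and `effective_type` are hypothetical names, the map is a small assumed subset, and the handling of "unknown content" types is left out for brevity.

```python
SHELLSCRIPT = "text/x-shellscript"

# Hypothetical subset of the prefix-to-type map; not the full cloud-init dict.
PREFIX_TO_MIME = {
    "#!": SHELLSCRIPT,
    "#cloud-config": "text/cloud-config",
    "#cloud-boothook": "text/cloud-boothook",
}


def sniff_type(payload):
    """Guess a MIME type from the start of the payload, or return None."""
    for prefix, mime in PREFIX_TO_MIME.items():
        if payload.startswith(prefix):
            return mime
    return None


def effective_type(declared, payload):
    """Let content override the declaration only for x-shellscript parts."""
    if declared == SHELLSCRIPT:
        return sniff_type(payload) or declared
    return declared


# A boothook that starts with "#!" keeps its declared type.
print(effective_type("text/cloud-boothook", "#!/bin/sh\necho hi\n"))
# Cloud-config wrapped as x-shellscript by an older platform is still detected.
print(effective_type(SHELLSCRIPT, "#cloud-config\npackages: []\n"))
# A genuine shell script round-trips to x-shellscript.
print(effective_type(SHELLSCRIPT, "#!/bin/sh\necho hi\n"))
```

The third call illustrates the "detection roundtrip" property: a working x-shellscript part starts with #!, so sniffing its content returns the type it already had.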

In the meantime, if you modify your generated boothook to start with "#cloud-boothook", it will be correctly detected and handled.
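As an illustration of that workaround, the sketch below builds a multipart archive with Python's standard email package, with the boothook part both declared as text/cloud-boothook and starting with the "#cloud-boothook" marker. The script body and the filename "boothook.sh" are made up for the example; how your tooling generates user-data is an assumption here.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical boothook body; the first line is the marker that
# content-based detection looks for.
boothook_body = "#cloud-boothook\n#!/bin/bash\necho 'fetch secrets'\n"

archive = MIMEMultipart()
part = MIMEText(boothook_body, "cloud-boothook")  # text/cloud-boothook
part.add_header("Content-Disposition", "attachment", filename="boothook.sh")
archive.attach(part)

user_data = archive.as_string()

# Parse the archive back to check the declared type and the first line.
parsed = message_from_string(user_data)
for p in parsed.walk():
    if p.is_multipart():
        continue
    print(p.get_content_type(), repr(p.get_payload().splitlines()[0]))
```

With both the declared type and the first line pointing at the same handler, the part should be classified as a boothook whether the declared type or the content is consulted.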

Thanks, and apologies, again!

Dan

Launchpad user Ryan Harper(raharper) wrote on 2020-07-28T21:02:40.224642+00:00

#511

Launchpad user Robert Van Voorhees(rcvanvo) wrote on 2020-08-10T14:45:03.773986+00:00

Are there next steps or anything that could happen to address this PR?

Launchpad user Ryan Harper(raharper) wrote on 2020-08-10T15:02:11.103102+00:00

Robert,

Thanks for following up. The PR is waiting on a maintainer review to approve for landing.

Launchpad user James Falcon(falcojr) wrote on 2020-08-25T19:32:01.414888+00:00

This bug is believed to be fixed in cloud-init in version 20.3. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Launchpad user Naadir Jeewa(randomvariable) wrote on 2020-10-09T18:12:29.969918+00:00

Hi there,

I think this might be broken again with 20.3, or at least we added the recommended workaround with #cloud-boothook, and machines with 20.3 don't execute it anymore.

Launchpad user Naadir Jeewa(randomvariable) wrote on 2020-10-09T18:58:12.576493+00:00

Actually, found out we need to set ERROR_ON_USER_DATA_FAILURE=False
