Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ec2): multipart user data #11843

Merged
merged 7 commits into from Mar 8, 2021
Merged

feat(ec2): multipart user data #11843

merged 7 commits into from Mar 8, 2021

Conversation

@rsmogura
Copy link
Contributor

@rsmogura rsmogura commented Dec 2, 2020

Add support for multiparat (MIME) user data for Linux environments. This
type is more versatile type of user data, and some AWS service
(i.e. AWS Batch) requires it in order to customize the launch
behaviour.

Change was tested in integ environment to check if all
user data parts has been executed correctly and with proper charset
encoding.

fixes #8315


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

@gitpod-io gitpod-io bot commented Dec 2, 2020

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Dec 3, 2020

I think this has to be work in progress for a moment (soft review is welcome) - I analysed integration test results and the approach with auto generating boundary is not perfect when the token is used inside part body, it can lead to different hashes used as boundary and change of user data in turn.

@rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Dec 14, 2020

I don't really understand the design.

Why isn't it just a self-contained MimeUserData extends UserData class? Why do we have to turn original UserData class into an IMultipartUserDataPartProducer ?

And generally the boundary is just a large pseudorandom number which we'll assume won't occur in the various parts. Getting a hash from the part contents seems unnecessarily complicated?

Let's start by defining what parts there are. I assume you'll have a "commands" part (the "old" UserData, so to speak) and then a bunch of "attachments", right? Which are basically just files to stick somewhere on the filesystem?

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Dec 14, 2020

SeeHi Ricco,

I got your point. I'll try to clarify some "whys" so maybe it will be better understable and we could go with this or something better.

Let me start from last part.

Multipart User Data is more like archive than a message with attachments.

Each script is a separate part (attachment), executed during different phases. Many scripts can be executed for the same phase.

(There's more kinds of attachments than script types).

Thus I thought that the Multipart has to inherit from UserData, but it doesn't allow adding commands (to what part or type? it's just archive)

In order to add parts and use existing classes (LinuxUserData)
there's IMultipartUserDataPartProducer. Part requires additional attributes to be rendered (like type), besides body. And there's at least two hooks where scripts can be added.

This allows reusing the current command like user data for Multipart user data.


Let me give a use case:
In Multipart user data I want to have two part with following content-types

cloud-boothook - to reconfigure docker and increase size of docker volume
x-shellscript - to I.e. register in ECS or system manager

So in this case Multipart will be archive for shell scripts, executed by cloud unit.

(Worth of mentioning but not implementing now is that there's more types supported by cloud init: like part handlers, include url, and more - in future every such type could get own class implementing IMultipartUserDataPartProducer)

So that's very nice to have good design review.

@rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Jan 6, 2021

This allows reusing the current command like user data for Multipart user data.

I see. I don't mind that at all, I think you are right. But then we still have a couple of choices:

UserData implements IMultipart

image

The different UserData implementations that can be reused implement IMultipart

image

There's an integration class that implements IMultipart

image


Out of these, I think the last one is most self-contained, so has my preference. There might be some syntactic overhead there, but we can hide that with convenience methods:

multipart.addUserDataPart(new LinuxUserData());

class MultipartUserData {
  public addPart(part: IMultipart) { ... }

  public addUserDataPart(userData: UserData) {
    this.addPart(new UserDataMultipart(userData));
  }
}

That way we don't need to pollute the existing UserData classes with multipart-specifics.

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Jan 7, 2021

Agree. The last idea is really good, instead of bloating code with interfaces just use adaptor.

And later we could think how to solve separator issue - however fallback will be to specify separator as option to Multipart.

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Jan 21, 2021

Hi @rix0rrr can you check now?

Copy link
Contributor

@rix0rrr rix0rrr left a comment

I like where this is going!

I have some requests around naming and simplifying the implementation a little, but otherwise great job!

Also looking for a real life example of using more than 1 part :)

packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ec2/lib/user-data.ts Outdated Show resolved Hide resolved
@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Feb 17, 2021

Thank you. And sorry for late answer I missed notification.

I'll take a look at it.

I think real life example could be setting size for Docker container disk in batch environment:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/

@yashda
Copy link

@yashda yashda commented Feb 25, 2021

We have a similar requirement which is another real life example of using more than 1 part:

  1. Configure the output/log file for cloud-init using cloud-config directives on CentOS images.
  2. Execute user-data script(bash) on startup of the Instance.

These changes would help us a lot in simplifying the user-data handling.

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Mar 1, 2021

I want to test it with Windows machines, too. I'm bit concerned about line ending characters.

Radek Smogura and others added 4 commits Dec 2, 2020
Add support for multiparat (MIME) user data for Linux environments. This
type is more versatile type of user data, and some AWS service
(i.e. AWS Batch) requires it in order to customize the launch
behaviour.

Change was tested in integ environment to check if all
user data parts has been executed correctly and with proper charset
encoding.

fixes #8315
* Remove `IMultipartUserDataPartProducer`
* Add `MultipartUserDataPart` & `IMultipart`
* Concrete types to represent raw part and UserData wrapper can be created with
`MultipartUserDataPart.fromUserData` & `MultipartUserDataPart.fromRawBody`
* Removed auto-generation of separator (as with tokens hash codes can differ when tokens are not resolved)
- remove `MultipartContentType`
- remove `MultipartUserDataPartWrapperOptions`
- remove `IMultipart`
- rename `MultipartUserDataPart` -> `MultipartBody`
- other removals
- restructure other classes
- moved part rendering to part class
- set default separator to hard codeded string
- added validation of boundry
@rsmogura rsmogura force-pushed the rsmogura:issue-8315 branch from 68a350d to f50d10b Mar 4, 2021
@mergify mergify bot dismissed rix0rrr’s stale review Mar 4, 2021

Pull request has been modified.

@rsmogura
Copy link
Contributor Author

@rsmogura rsmogura commented Mar 4, 2021

Hi @rix0rrr

I made a bigger refactor of code, I removed few methods, and followed tips. I think it's more clean right now, please check.

In context of Windows machines, I think multipart is not supported for Windows (at least I could not run any kind of multipart data on Windows, and documentation as well does not mention that multipart is supported for Windows EC2).


Minor thing which concerns me. MIME RFC as it's telnet based suggests CRLF as new line, however cloud-init is fine with \n (I made number of tests here, however I think it's worth to call it out)

Copy link
Contributor

@rix0rrr rix0rrr left a comment

Thanks! This is looking great. Just some small tweaks.

protected static readonly DEFAULT_CONTENT_TYPE = 'text/x-shellscript; charset="utf-8"';

/** The body of this MIME part. */
public abstract get body(): string | undefined;

This comment has been minimized.

@rix0rrr

rix0rrr Mar 8, 2021
Contributor

I think I'd prefer you don't declare these abstract members, and just implement renderBodyPart() twice--once for the two concrete types of classes.

It may feel like unnecessary duplication, but the actual amount of duplication won't be that bad and we'll have a good reduction in case analysis too (fewer ifs).

This comment has been minimized.

@rsmogura

rsmogura Mar 11, 2021
Author Contributor

Now I got the point. I thought that main concern was about options interfaces, so I moved towards abstracts getters and template method pattern.

Thanks, that's a good comment.


return this;
}

This comment has been minimized.

@rix0rrr

rix0rrr Mar 8, 2021
Contributor

Why get rid of the addUserDataPart(userData, contentType) here? I rather liked it as a convenience method.

(Of course it's not strictly necessary, but it reads nicer than the current alternative)

@mergify mergify bot dismissed rix0rrr’s stale review Mar 8, 2021

Pull request has been modified.

@rix0rrr rix0rrr changed the title feat(ec2): introduce multipart user data feat(ec2): multipart user data Mar 8, 2021
@rix0rrr
rix0rrr approved these changes Mar 8, 2021
@mergify
Copy link
Contributor

@mergify mergify bot commented Mar 8, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation commented Mar 8, 2021

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: bc6e06c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

@mergify mergify bot commented Mar 8, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

This was referenced Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants