-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for hybrid environments #12
Conversation
|
|
|
||
if [[ -s "${USER_DATA}" ]] \ | ||
&& [[ ! -s "${SSM_AGENT_LOCAL_STATE_DIR}/registration" ]] \ | ||
&& jq --exit-status '.ssm' "${USER_DATA}" &>/dev/null ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically fetch_from_json
; we'd probably want to fail here, too, if .ssm
was empty inside.
I think it might be easier to use set -e
for the whole script. Then you could use fetch_from_json
here too. I think then you could remove the set -e
in prepare_persistent_state_dir
, make fetch_from_json
return 1 instead of exit 1, and change the main call to if ! amazon-ssm-agent -bla
instead of checking $?
afterward, and it'd be fine? If there are any lines we expect could fail, we could use if ! bla; log explanation; exit 1; fi
for extra clarity, but I wouldn't expect the simple mkdir/chmod/rmdir/ln to fail.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new changes look good to me. The main thing I'm hoping to see is more testing of failure cases - #12 (comment). I want to make sure it's going to be possible to troubleshoot issues with activations using console output. (Also a couple nit comments open.)
For some context on the last piece, @tjkirch and I were concerned that if we printed the whole log file it might print errors from previous boots. In my testing, I found this not to be the case as the log file was fresh each time. |
Updated the PR description to include the output of several failure cases. (Click each case to reveal EC2 console output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! 🧩
Updated the PR description to include the SSM Agent changes since 3.0.732.0 (the version we included in v0.4.2) |
"activation-id": "foo", | ||
"activation-code": "bar", | ||
"region":"us-west-2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names aren't consistent with the user data used in the admin container (foo-bar
vs foo_bar
). Was this intentional? As much as we can, I'd prefer that the "convention" stays consistent for users. This aligns with Bottlerocket's user-data.toml
naming (which uses skewers) so one could argue that this one is the "right way" and the admin is in the wrong.. but we already have the admin change out, so I'm more inclined to align with that user data as this user data and the admin's, to me, roughly feels like it'll be treated as same namespace and with similar conventions by end users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the other settings I see when I run apiclient -u /settings
I would say that the admin container is the weird one, and that foo-bar
is the "correct" convention, but I do not have strong feelings. I think the it might be worth getting some additional opinions. I think the admin container is the way it is because the authorized keys are typically stored in a file named authorized_keys
, but that may have been a poor decision on my part. To make this conundrum even more fun, when you generate an activation token with aws ssm create-activation
it returns JSON with a FooBar
convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion, we're going to keep the admin container convention and use foo_bar
(snake_case) for the settings inside host-container user data in general. I'll add this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of sticking with the skewer (foo-bar
) implementation here and adding a backlog item to fix the admin container to accept the skewer also.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please set the permission bits on the script before adding - and then.. ship it!
🚢 👍🏽
This change allows users running on-premises or in a hybrid environment to register with SSM as a managed-instance. SSM's local state directory (/var/lib/amazon/ssm) has moved to a persistent directory in /.bottlerocket/host-containers/current/. Activations parameters are parsed from a base64-encoded JSON block in the user-data setting. Example JSON: { "ssm":{ "activation-id":"foo", "activation-code":"bar", "region":"us-west-2", } }
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a suggest edit for the README - script looks great!
Once you've created your JSON, you'll need to base64-encode it and put it in the control host container's `user-data` setting in your [instance user data](https://github.com/bottlerocket-os/bottlerocket#using-user-data). | ||
|
||
For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a stab at rewording this because it took a pass or two to full parse the sentence. Now that its split up I think this conveys the same and reads a bit easier (IMO) - what do you think about this instead?
Once you've created your JSON, you'll need to base64-encode it and put it in the control host container's `user-data` setting in your [instance user data](https://github.com/bottlerocket-os/bottlerocket#using-user-data). | |
For example: | |
Once you've created your JSON, you'll need to provide it via the control container's user data. | |
The host-container `user-data` must be base64-encoded and provided - in its encoded form - using your [instance's user data](https://github.com/bottlerocket-os/bottlerocket#using-user-data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the original, but I appreciate that you're looking at the documentation 😃
|
||
``` | ||
[settings.host-containers.control] | ||
# ex: echo '{"ssm":{"activation-id":"foo","activation-code":"bar","region":"us-west-2"}}' | base64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my system, I need base64 -w0
to avoid newlines in the output.
# ex: echo '{"ssm":{"activation-id":"foo","activation-code":"bar","region":"us-west-2"}}' | base64 | |
# ex: echo '{"ssm":{"activation-id":"foo","activation-code":"bar","region":"us-west-2"}}' | base64 -w0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we avoided it since not all systems may necessarily have -w
. (MacOS, for example, does not)
Ran the tests again both with and without a hybrid activation in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧇
Issue number:
N/A
Description of changes:
This change allows users running on-premises or in a hybrid environment to register with SSM as a managed-instance. SSM's local state directory (
/var/lib/amazon/ssm
) has moved to a persistent directory in/.bottlerocket/host-containers/current/
. Activations parameters are parsed from a base64-encoded JSON block in user-data.Example JSON:
Also bumped the SSM Agent version to 3.0.882.0.
And bumped container version to v0.5.0.
Testing done:
Created an ssm hybrid activation.
Launched
aws-ecs-1
ami with added[settings.host-containers.control] user-data
.Instance connected to ecs cluster.
Test task deployed successfully.
Check AWS Systems Manager to see if my activation was used.
Fetched instance-id from AWS Systems Manager and verified it began with
mi-
prefix.Connected to control container via ssm session.
Verified that
host-containers.control.user-data
contained a base64-encoded block.Disabled the
control
container and enabled acurrent
container.Verified that /.bottlerocket/host-containers/current/ contained an ssm directory.
Enabled and launched admin container.
Connected to admin container via ssh.
Ran
sudo sheltie
to verify root shell was still available.Checked for failed systemd units.
Launched additional instance with no user data and verified everything still works.
Launched additional instances with purposefully malformed user data detailed below.
Some possible error messages:
Bad Region: us-weast-2
Bad Activation ID: character missing when copy/pasting
Empty Element: "activation-code":""
Missing Element: no region specified
SSM Agent changes since v0.4.2:
3.0.882.0
3.0.854.0
3.0.755.0
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.