Add windows server 2019 packer template #546
Over here, we had an entertaining time setting up a local user account to run the bk agent service as (we chose NSSM as the manager). https://serverfault.com/questions/946882/how-to-programmatically-cause-a-new-windows-users-profile-to-be-created is relevant re: creating the user profile directory.
packer-windows/buildkite-ami.json
Outdated
"spot_price": "auto",
"spot_price_auto_product": "Windows (Amazon VPC)",
"user_data_file":"scripts/ec2-userdata.ps1",
"communicator": "winrm",
I'm reasonably sure that it is now possible to use openssh-win32 and packer with great success. https://operator-error.com/2018/04/16/windows-amis-with-even/
This is interesting. I can see value in adding ssh access and enabling the AuthorizedUsersUrl stack setting, which would automatically configure the .ssh/authorized_keys file.
Thanks a lot @petemounce for the link about setting up a local user account. We backed out of our initial effort to run the agent using a |
We're actually provisioning our images (in GCE) via ansible, and so there we're using the lookup plugin - https://github.com/azavea/ansible-buildkite-agent/blob/develop/tasks/install-on-Windows.yml#L2-L14 has an example. We create an ansible user for packer to use as follows, and delete it after a successful provisioning run before sysprep. I don't think at the time I had come across the post I'm linking above.
$ErrorActionPreference = 'Stop'; # stop on all errors
$Count = Get-Random -min 24 -max 32
$TempPassword = -join ((65..90) + (97..122) + (48..57) | Get-Random -Count $Count | % {[char]$_})
$UserName = "packeransible"
write-output "Making $UserName ..."
New-LocalUser -Name $UserName -PasswordNeverExpires -Password ($TempPassword | ConvertTo-SecureString -AsPlainText -Force) | out-null
write-output "Adding to Administrators ..."
Add-LocalGroupMember -Group "Administrators" -Member $UserName | out-null
write-output "Saving password to file ..."
set-content -path "$($env:WINDIR)/temp/host.password.txt" -value $TempPassword -NoNewLine
write-output "Finished."
We do that to work around packer-at-the-time not making the WinRMPassword available to its ansible provisioner. That's fixed now. Edit: I misunderstood you. We create a randomised password for the buildkite-agent user, don't record it anywhere, and that's fine (for us).
We use windows' new openssh package to run Through a painful trial and error process, I learned how to use One other piece of painfully won information is that the user who will be using the key(s) needs to be the one to load them - I wasn't able to load them as one user on behalf of the buildkite-agent user.
$write_to = "the path to the file on disk"
# https://superuser.com/questions/1296024/windows-ssh-permissions-for-private-key-are-too-open
# https://github.com/PowerShell/Win32-OpenSSH/wiki/Security-protection-of-various-files-in-Win32-OpenSSH
Write-Host "Setting filesystem permissions on key to allow it to be loaded to ssh-agent."
Write-Host "Giving ownership to $($username), running this script as $($env:username) (should match!)"
& icacls "$write_to"
& icacls "$write_to" /c /t /inheritance:d
& icacls "$write_to" /c /t /grant "$($username):F"
& icacls "$write_to" /c /t /remove Administrator BUILTIN\Administrators BUILTIN Everyone System Users
& icacls "$write_to"
Write-Host "Loading key to agent..."
& ssh-add "$($write_to)"
if ($LASTEXITCODE -ne 0) {
throw "Failed to load key to ssh-agent."
}
# illustrate success
& ssh-add -L
# so the key material is not left on disk at rest, remove it.
Remove-Item "$write_to" -force
plugins-path="C:\buildkite-agent\plugins"
experiment="${Env:BUILDKITE_AGENT_EXPERIMENTS}"
priority=%n
shell=powershell
I'd like to know how people feel about using shell=powershell. Does anyone think using the default cmd.exe would be better?
On Windows, PowerShell is the choice I'd expect.
I guess it depends on what versions of Windows this template supports. If we want to be able to support older versions of Windows where PowerShell's availability and/or stability is suspect, we might want to make it configurable?
I'd prefer CMD.exe, just because we don't yet support PowerShell officially in the agent.
I want to make a note here that this PR installs lifecycled and configures a handler script, but I'm not sure that it really does anything useful. My understanding is that when lifecycled receives a termination notification, it postpones the instance termination while it runs the handler script to gracefully stop the buildkite agents. The graceful agent shutdown allows the agents to finish any running jobs. During my testing on Windows the agent stopped immediately without finishing the running job. Can someone confirm that the Windows agent isn't able to gracefully stop? If it can't, should we just remove lifecycled from this PR?
@petemounce the implementation we have in this Windows AMI PR uses the same preinstalled bash hooks and I like this implementation because it allows us to reuse as much pre-existing work as possible. Please let me know if you think we're missing anything in our implementation.
That sounds comprehensive. I'm not doing any of that because where I am uses GCP & vault, so had more wiring to do.
That makes sense. We're transitioning to vault too but I'll keep this implementation the way it is for the sake of keeping it compatible with elastic-ci-stack-for-aws.
Sounds good to me. I'm sure at some stage someone will make a vault-integration for the AWS secrets-management thing(s??).
Sorry for the slow response, have been on vacation, just catching up now! This looks awesome, will review in more depth. ❤️
I'm torn on whether we should try and have this in the same repo. On one hand, there is a single spot to have things, but on the other hand it means a lot of config that might not apply to windows and it raises the bar for adding new features on the linux side. The other option is to move this code over to a lightweight elastic stack focused on windows at https://github.com/buildkite/elastic-ci-stack-for-aws-windows. Thoughts folks?
I think it's a better UX for contributors to have things in the same repo.
I don't think it's necessary for Windows & Linux to be in sync, features-wise, just because they're in the same place. Internally, CI-with-buildkite docs have a feature-matrix - we describe what we offer, then we have a tick or not for each platform, and we fill them in. Sets expectations fine, and shows progress.
My other concern is it slows down iteration speed even further as we need to wait for CI for windows and linux. I guess we can do the mono-repo thing of detecting changes in subpaths.
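For what it's worth, the "detect changes in subpaths" idea could look roughly like the sketch below. It is only an illustration: the function name `Get-AffectedPlatform` and the `packer/linux/` and `packer/windows/` prefixes are assumptions made up for the example, not the repository's actual layout. Anything changed outside those prefixes is treated as shared and triggers every platform.

```powershell
# Hypothetical helper: map changed file paths to the platforms whose CI should run.
# The directory prefixes are illustrative assumptions, not the repo's real layout.
function Get-AffectedPlatform {
    param(
        [string[]]$ChangedPath
    )

    $prefixes = [ordered]@{
        'linux'   = 'packer/linux/'
        'windows' = 'packer/windows/'
    }

    # Collect every platform that has at least one changed file under its prefix.
    $affected = New-Object System.Collections.Generic.List[string]
    foreach ($platform in $prefixes.Keys) {
        $prefix = $prefixes[$platform]
        $hit = $ChangedPath | Where-Object { $_.StartsWith($prefix) }
        if ($hit) { $affected.Add($platform) }
    }

    # A change outside every known prefix (shared templates, docs, scripts)
    # is assumed to affect all platforms.
    $shared = $ChangedPath | Where-Object {
        $path = $_
        -not ($prefixes.Values | Where-Object { $path.StartsWith($_) })
    }
    if ($shared) { return @($prefixes.Keys) }

    return $affected.ToArray()
}

# In CI the list would typically come from something like:
#   git diff --name-only <base>...HEAD
Get-AffectedPlatform -ChangedPath @('packer/windows/buildkite-ami.json')
```

A pipeline step could then skip the Linux build when only `windows` comes back, which keeps the single-repo layout without making every change wait on both platforms.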
I'm leaning towards same repo presently, just thinking it through.
The other advantage to having it in the same repo is that this project can now say it's "Multiplatform" (Amazon Linux and Windows). For each parameter description, we might need to include one of "(Linux and Windows)", "(Linux only)", or "(Windows only)" type thing?
My preference would be to have things in a single repository as well. Having something like
Yup, cool, I agree, let's have this in the one repo. @jeremiahsnapp could we move things into a
@lox I relocated the packer templates and squashed a bunch of the commits. Let me know if there's anything else I can do to help.
@lox just wanted to check in on where we are with this. I've been staging some work internally with hopes to consume this.
The plan at this stage is to get this merged in soon, but we're debating whether we want to do a major release first with the new fast autoscaling stuff. Will update soon.
Signed-off-by: Jeremiah Snapp <jeremiah@chef.io>
@lox I updated this to be compatible with the 4.3.1 stack so it works with the new lambda scaling as well as the git mirror experiment option. It also uses @petemounce I also used some of your code example in your comments to create a
@jeremiahsnapp why grant admin?
@petemounce I'm still developing our Windows AMIs for our testing purposes and I think some of our tests are needing admin privilege but I might just not know of alternative solutions to our needs yet. Do you think it would be worth having it as a non-admin user by default and adding a cloudformation parameter that would allow us to choose to make it an admin user during instance startup if we wanted? Similar to the
Personally; yes, definitely.
Yeah, I reckon that would be a good idea.
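A minimal sketch of the non-admin-by-default idea, assuming a boolean that instance startup would receive from a stack parameter. The function names and the `-GrantAdmin` switch are hypothetical and are not taken from this PR; only `Add-LocalGroupMember` is the same cmdlet already used in the snippet earlier in this conversation.

```powershell
# Hypothetical: decide which local groups the agent user belongs to.
# $GrantAdmin is assumed to be fed from a stack parameter at instance startup.
function Get-AgentUserGroup {
    param(
        [bool]$GrantAdmin = $false
    )

    # Non-admin by default: only the built-in Users group.
    $groups = @('Users')
    if ($GrantAdmin) { $groups += 'Administrators' }
    return $groups
}

# Hypothetical wrapper that applies the decision to a local account.
function Add-AgentUserToGroup {
    param(
        [string]$UserName,
        [bool]$GrantAdmin = $false
    )

    foreach ($group in (Get-AgentUserGroup -GrantAdmin $GrantAdmin)) {
        # Fails loudly if the group or user does not exist.
        Add-LocalGroupMember -Group $group -Member $UserName -ErrorAction Stop
    }
}
```

Keeping the decision (`Get-AgentUserGroup`) separate from the side effect makes the default easy to check without touching real accounts.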
Signed-off-by: Jeremiah Snapp <jeremiah@chef.io>
Ok @lox and @petemounce, I added
Ok, lemme get a point release out today for the last of the 4.x series and then let's get this merged into master.
Merging this in, thanks for all your hard work @jeremiahsnapp! 💪🏻
FWIW, I've decided to remove the optional Windows Administrator setting in favour of always adding the user to the Admin group. The docker socket wasn't accessible to non-administrators, and with access to the docker socket you effectively have root access anyway.
Would it be reasonable to instead include the access to the docker socket in the flag, so it's possible to run without admin and, in so doing, not have the ability to use docker?
Yeah, I'll give it a try.
This adds a packer template that creates a Windows Server 2019 (with docker installed) AMI that is fully functional with elastic-ci-stack-for-aws cloudformation. It even works with Buildkite's docker plugins.
The following lists the few things I identified as missing when compared with the existing Amazon Linux 2 packer template.
- buildkite-agent user account is not created
- AuthorizedUsersUrl cloudformation setting does nothing
- BuildkiteAdditionalSudoPermissions cloudformation setting does nothing because it has no context in windows
- EnableDockerUserNamespaceRemap cloudformation setting does nothing because docker userns-remap functionality only works on linux
- bk-check-disk-space.sh script (equivalent windows script is not created)
- fix-buildkite-agent-builds-permissions script (equivalent windows script is not created but I'm not sure we need this on Windows)
- docker-gc hourly cron job (equivalent windows scheduled task is not created)
- docker-low-disk-gc hourly cron job (equivalent windows scheduled task is not created)
- git-lfs is not explicitly installed but the output of choco install git makes me wonder if it actually installs it
- goss is not installed because it is only supported on linux
To use the Windows AMI we download Buildkite's latest cloudformation yaml to aws-windows-stack.yml and replace the UserData section with the following content.
Then we just use terraform's aws_cloudformation_stack resource, point its template_body at aws-windows-stack.yml and set other parameters appropriately. For example, the following shows the settings we use for our single-use windows queue. It has only one agent per instance and the agent only runs one job and then the instance terminates itself. We use this queue for jobs that must run on the host (not in docker). The ephemeral nature of the instances ensures each job starts with a clean environment.
We currently only use the following buildkite_boot_windows.ps1 bootstrap script to increase docker's storage-opts size setting to enable a larger container filesystem.
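As a rough illustration only (this is not the buildkite_boot_windows.ps1 from the PR), a bootstrap step that raises the `size` storage option could be sketched as below. The function name `Set-DockerStorageSize`, the `50GB` value, and the default daemon.json path are assumptions for the example; the path is the usual location of Docker's daemon configuration on Windows, so check it against your image.

```powershell
# Sketch only: merge a "storage-opts" size entry into Docker's daemon.json.
# Function name, default path and default size are illustrative assumptions.
function Set-DockerStorageSize {
    param(
        [string]$ConfigPath = 'C:\ProgramData\docker\config\daemon.json',
        [string]$Size = '50GB'
    )

    # Start from the existing config, if any, so other settings are preserved.
    $config = @{}
    if (Test-Path $ConfigPath) {
        $raw = Get-Content -Raw -Path $ConfigPath
        if ($raw) {
            $existing = $raw | ConvertFrom-Json
            foreach ($p in $existing.PSObject.Properties) {
                $config[$p.Name] = $p.Value
            }
        }
    }

    # Replace any previous storage options with the requested size.
    $config['storage-opts'] = @("size=$Size")

    # Make sure the target directory exists before writing.
    $dir = Split-Path -Parent $ConfigPath
    if ($dir -and -not (Test-Path $dir)) {
        New-Item -ItemType Directory -Path $dir -Force | Out-Null
    }

    # ASCII avoids the byte-order mark that Windows PowerShell 5.1 writes
    # with -Encoding UTF8.
    $config | ConvertTo-Json -Depth 10 | Set-Content -Path $ConfigPath -Encoding Ascii
}
```

The docker service would presumably need a restart afterwards for the new setting to take effect.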