Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for coder_script #9584

Merged
merged 68 commits into from
Sep 25, 2023
Merged

feat: add support for coder_script #9584

merged 68 commits into from
Sep 25, 2023

Conversation

kylecarbs
Copy link
Member

@kylecarbs kylecarbs commented Sep 7, 2023

This allows users to define separated log streams for different scripts, and customize the names + icons.

The diff here is massive, but 90% of this is boilerplate for just piping through such a significant amount of new data to the workspace agent. The only real refactor is how the workspace agent handles scripts.

This breaks any customers that are manually piping logs with our external API as it is right now, so I'll be fixing that prior to a merge. I think I'll use a hard-coded UUID for an external source, and call it "External" for now... then eventually our external tools can catch up.

@kylecarbs kylecarbs self-assigned this Sep 7, 2023
@kylecarbs
Copy link
Member Author

@mafredri I'm going to just make the FE work as it did before, and then I believe this could be ready to merge. It's obviously massive, but I think there's a very low chance of regression here. I manually tested with an old version of the provider as well, and everything just works.

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan of the changes in this PR, nice work!

Identified a few issues, building and tests seem broken, and I do think we'll introduce two breaking changes:

  1. The startup blocking behavior for old clients will break
  2. Old agents won't be able to send logs or may have issues communicating with coderd (didn't verify, just a hunch)

agent/agentscripts/agentscripts.go Outdated Show resolved Hide resolved
agent/agentscripts/agentscripts.go Show resolved Hide resolved
agent/agentscripts/agentscripts.go Outdated Show resolved Hide resolved
agent/agentscripts/agentscripts.go Outdated Show resolved Hide resolved
if script.Timeout > 0 {
var cancel context.CancelFunc
// Add a buffer to forcefully kill with the context.
ctx, cancel = context.WithTimeout(ctx, script.Timeout+(3*time.Second))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may require a rethink. We're passing the agent context in New and using it here (r.ctx). This means that when the agent is interrupted, all scripts will be terminated, and shutdown scripts won't be able to run.

Currently in (*agent).Close we use a background context as base, not the agent context. So I don't think it's a good idea tying the scripts context and agent context together. Instead we should rely on New and Close to perform the appropriate startup/teardown. In (agent).Close, the Execute (stop) should run for as long as it needs to, or until the script timeout is hit (which will interrupt + wait X seconds + kill).

PS. I think this could be bumped to like 5 or 10 seconds at least.

coderd/provisionerdserver/provisionerdserver.go Outdated Show resolved Hide resolved
coderd/workspaceagents.go Show resolved Hide resolved
codersdk/workspaceagents.go Outdated Show resolved Hide resolved
provisioner/terraform/resources.go Outdated Show resolved Hide resolved
wait = false
default:
return xerrors.Errorf("unknown startup script behavior %q", workspaceAgent.StartupScriptBehavior)
for _, script := range workspaceAgent.Scripts {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this change, I believe old clients connecting to coderd will be broken by this since the API no longer includes the startup script behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mafredri do you think I should just include the field and leave it as a static value to not break old clients?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kylecarbs I'm assuming static would be non-blocking? For anyone using blocking I suppose it would still be a breaking change. But I'm not sure it's worth including so much legacy to keep piping it to the DB, so I think it's acceptable. But if we leave it like that, I think we should mention it in the release notes.

@bpmct
Copy link
Member

bpmct commented Sep 23, 2023

This could be a separate PR, but thoughts on hiding these by default from the resources list on the dashboard?

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments on the code, but other than that I also noticed one other issue:

I have two startup scripts, one blocks login and both have timed out. The workspace has entered start_timeout lifecycle but the CLI still thinks the scripts are running.

❯ CODER_SSH_WAIT=no ssh coder.test
👋 Your workspace is outdated! Update it here: http://127.0.0.1:3000/@admin/test

==> ⧗ Running workspace agent startup scripts (non-blocking)
2023-09-25 18:19:23.047Z Startup Script 2: Hello, world 2!
2023-09-25 18:19:23.047Z Startup Script: Hello, world!
Notice: The startup scripts are still running and your workspace may be incomplete.

And this shows that the lifecycle has indeed changed (also visible in logs, but didn't attach those).

❯ ./scripts/coder-dev.sh list -o json | grep lifecycle
              "lifecycle_state": "start_timeout",

Both the CLI from this PR and current main behave the same way. I took a quick look but didn't see why this would be.

Finally, one more wish:

  • It would be nice to differentiate scripts in the UI when no icon has been configured, currently a hover is required but that's not practical for 1000s of rows. Perhaps we could have like an array of 10 colors that we give the scripts based on their order/name. I say 10 colors since it seems risky to pick it randomly (if it's red it could feel like an error)

agent/agentscripts/agentscripts.go Show resolved Hide resolved
agent/agentscripts/agentscripts.go Outdated Show resolved Hide resolved

func cmdCancel(cmd *exec.Cmd) func() error {
return func() error {
return syscall.Kill(-cmd.Process.Pid, syscall.SIGINT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When e.g. bash is executing a script, SIGINT gets captured and passed to the current command being executed, this results in that command exiting and then the script executes the next command. Essentially circumventing what we're trying to do here.

Changing this to SIGHUP will work better for what we're trying to do and will ensure everything is canceled. This is what a terminal typically sends to a program when the attached terminal goes away, so seems alright.

The downside is that perhaps it's more common to have handling for SIGINT, in which case this could cause a dirty exit for some script authors. But we can just document this behavior instead so it's a non-issue and can still be handled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I appreciate the thorough thought here! 😍

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving, this really turned out nicely btw, awesome job!

@kylecarbs
Copy link
Member Author

@mafredri I did the deterministic colors - that was a great suggestion!

image

@kylecarbs kylecarbs merged commit 1262eef into main Sep 25, 2023
23 of 24 checks passed
@kylecarbs kylecarbs deleted the execscripts branch September 25, 2023 21:47
@github-actions github-actions bot locked and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants