Add startup logs to build log UI #2957

Closed · Tracked by #4978
ammario opened this issue Jul 13, 2022 · 19 comments

@ammario (Member) commented Jul 13, 2022

I have a complex startup script:

resource "coder_agent" "dev" {
  arch           = "amd64"
  os             = "linux"
  startup_script = <<EOF
    #!/bin/sh
    # install and start code-server
    curl -fsSL https://code-server.dev/install.sh | sh
    code-server --auth none --port 13337
    sudo service docker start
    if [ -f ~/personalize ]; then ~/personalize 2>&1 | tee  ~/.personalize.log; fi
    EOF
}

And it's not clear to me where I could see its log output. It would also be nice to not have to redirect command execution to logs manually.

The natural place for me to find this is the build log on the dashboard, but I also expect some file in my workspace to contain the log.

While we're at it, we should include the agent logs (/var/log/coder-agent.log) in the dashboard build log.

@ammario ammario added the feature Something we don't have yet label Jul 13, 2022
@ammario ammario changed the title Add startup script log to build log Add startup logs to build log Jul 31, 2022
@ammario (Member, Author) commented Jul 31, 2022

So, the log is in /var/log/coder-startup-script.log. I had to read the documentation to find that, so the issue still stands as a major usability improvement.

@ketang's comment was marked as off-topic.

@ammario ammario changed the title Add startup logs to build log Add startup logs to build log UI Jul 31, 2022
@ammario ammario added this to the EE milestone Aug 22, 2022
@sreya sreya self-assigned this Sep 8, 2022
@ammario ammario assigned ammario and unassigned sreya and ammario Sep 26, 2022
@ammario (Member, Author) commented Oct 17, 2022

Let's make sure we use a PrefixSuffixWriter (First N bytes + Last N bytes) to bound how much storage these logs can take. We also may need the same for the provisioner logs, but I haven't investigated the code.
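
A minimal Go sketch of what such a writer could look like (hypothetical code for illustration; `prefixSuffixWriter` is not an existing type in the repo):

```go
package agentlog

import "io"

// prefixSuffixWriter keeps the first n bytes verbatim and the last n
// bytes in a ring buffer, discarding the middle. Total retention is
// bounded at 2n no matter how much the startup script prints.
type prefixSuffixWriter struct {
	n      int
	prefix []byte // first n bytes written
	suffix []byte // ring buffer holding the most recent n bytes
	pos    int    // next overwrite position once the ring is full
}

var _ io.Writer = (*prefixSuffixWriter)(nil)

func (w *prefixSuffixWriter) Write(p []byte) (int, error) {
	total := len(p)
	// Fill the prefix first.
	if len(w.prefix) < w.n {
		take := w.n - len(w.prefix)
		if take > len(p) {
			take = len(p)
		}
		w.prefix = append(w.prefix, p[:take]...)
		p = p[take:]
	}
	// Everything else cycles through the suffix ring buffer.
	for _, b := range p {
		if len(w.suffix) < w.n {
			w.suffix = append(w.suffix, b)
		} else {
			w.suffix[w.pos] = b
			w.pos = (w.pos + 1) % w.n
		}
	}
	return total, nil
}

// Bytes returns prefix + suffix, unrolling the ring buffer so the
// suffix comes out oldest-first.
func (w *prefixSuffixWriter) Bytes() []byte {
	out := append([]byte(nil), w.prefix...)
	out = append(out, w.suffix[w.pos:]...)
	return append(out, w.suffix[:w.pos]...)
}
```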

@ammario (Member, Author) commented Oct 18, 2022

An implementation challenge is that startup_script can block for an indeterminate amount of time. Some of our examples spin off code-server asynchronously, but others block.

@sreya (Collaborator) commented Oct 19, 2022

> Let's make sure we use a PrefixSuffixWriter (First N bytes + Last N bytes) to bound how much storage these logs can take. We also may need the same for the provisioner logs, but I haven't investigated the code.

The only issue is that if we're expecting some sort of prompt to be written to the log (something I know @kylecarbs mentioned when we discussed this on Monday), it might get truncated by the PrefixSuffixWriter.

We could potentially only keep the startup logs of the most recent build and maybe also use a PrefixSuffixWriter with a 1MB cap to prevent any maliciousness. That would more than likely solve 99% of cases and prevent us from ballooning over time.

The startup logs of previous builds are not likely to be useful.

@ammario (Member, Author) commented Oct 20, 2022

Some kind of limiter is absolutely necessary to prevent DoS, e.g. cat /dev/urandom.

> We could potentially only keep the startup logs of the most recent build and maybe also use a PrefixSuffixWriter with a 1MB cap to prevent any maliciousness. That would more than likely solve 99% of cases and prevent us from ballooning over time.

This sounds good to me.

@github-actions (bot) commented Jan 4, 2023

This issue is becoming stale. In order to keep the tracker readable and actionable, I'm going to close this issue in 7 days if there isn't more activity.

@github-actions github-actions bot added the stale This issue is like stale bread. label Jan 4, 2023
@BrunoQuaresma BrunoQuaresma self-assigned this Jan 5, 2023
@BrunoQuaresma (Collaborator)

Since startup scripts can run on multiple agents, I think we can add some sort of tabs for the logs.

[Screenshot: Screen Shot 2023-01-05 at 16 08 49]

@coadler coadler removed the stale This issue is like stale bread. label Jan 5, 2023
@kylecarbs (Member)

What would the workspace status be if startup scripts are still running?

@ammario (Member, Author) commented Jan 6, 2023

I think the MVP here should just be putting the logs in the UI, and then in a follow-up feature we can address waiting for startup scripts to finish. I believe some templates have startup scripts that block, e.g. they run code-server. Thus, a special workspace status for startup scripts could be a breaking/confusing change for existing deployments.

@BrunoQuaresma (Collaborator)

This is a good question. I don't know; I think it could be either running or pending. However, I don't think this influences showing the startup logs on the build page too much.

@mafredri (Member)

The discussion in #4677 is relevant to workspace status during startup (/shutdown). As for blocking in the startup script, I think that should be avoided/discouraged; we could introduce something new, like coder_agent_service, to start programs. That would allow more complex functionality, like restarting services after X failed health checks and whatnot. (Perhaps a topic for a separate issue, though.)

I agree with @ammario that we should limit this scope to simply storing/streaming the log. I suggest we apply compression (e.g. zstd via klauspost/compress) to the logs, but do we apply it pre- or post-limiting the output to 1MB? On the one hand, compressing before limiting helps ensure the output usually isn't truncated (in failure scenarios there could be a lot of output to sift through). On the other hand, compression plus a for loop echoing the same thing could produce an enormous uncompressed log.
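
As a sketch of the compression step under that assumption (klauspost/compress zstd; `compressLog` is a hypothetical helper, and it deliberately punts on the pre- vs post-limit question):

```go
package logstore

import (
	"bytes"

	"github.com/klauspost/compress/zstd"
)

// compressLog compresses a fully captured log with zstd before it is
// persisted. Limiting the raw log to 1MB *before* this call bounds
// what any reader can ever receive; limiting the *compressed* output
// instead fits far more log under the cap, but a loop echoing the
// same line could then expand to an enormous uncompressed log.
func compressLog(log []byte) ([]byte, error) {
	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf)
	if err != nil {
		return nil, err
	}
	if _, err := enc.Write(log); err != nil {
		_ = enc.Close()
		return nil, err
	}
	if err := enc.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```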

@mafredri mafredri self-assigned this Jan 10, 2023
@ammario (Member, Author) commented Jan 10, 2023

If we compress the log on capture, wouldn't that make it difficult / impossible to stream the log piecewise in the future?

@mafredri (Member)

@ammario I understand your concern. I think it's doable, but we can punt it for now. It'll add some complexity and raises the question of whether we should just use Postgres compression instead (at the storage layer). My main thought in suggesting it was to allow larger logs to be stored, but reduced bandwidth could be a nice bonus.

On the topic, it's possible to use zstd as a transparent compression layer via SaveTheRbtz/zstd-seekable-format-go (I've only tested that it works; no other experience with seekable). Assuming we can trust that it's one frame per line (the observed behavior), this would map pretty well to inserts in the database.
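
To make the frame-per-line mapping concrete without the seekable wrapper itself, here's a rough sketch using plain klauspost zstd, where `insertLogRow` is a hypothetical stand-in for the database insert:

```go
package logstore

import "github.com/klauspost/compress/zstd"

// insertLogRow is a hypothetical stand-in for inserting one
// compressed log line into the database.
func insertLogRow(buildID string, frame []byte) { /* ... */ }

// storeLines encodes each log line as an independent zstd frame, so
// every database row is decodable on its own and arbitrary ranges of
// rows can be served without touching the rest of the log.
func storeLines(buildID string, lines []string) error {
	enc, err := zstd.NewWriter(nil) // nil writer: we only use EncodeAll
	if err != nil {
		return err
	}
	defer enc.Close()
	for _, line := range lines {
		frame := enc.EncodeAll([]byte(line), nil) // one frame per line
		insertLogRow(buildID, frame)
	}
	return nil
}

// readLine decompresses a single stored frame independently.
func readLine(frame []byte) ([]byte, error) {
	dec, err := zstd.NewReader(nil) // nil reader: we only use DecodeAll
	if err != nil {
		return nil, err
	}
	defer dec.Close()
	return dec.DecodeAll(frame, nil)
}
```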

One simpler solution we could employ is to compact logs once they're done streaming and store them as a single bytea blob. Unfortunately, lib/pq would require us to fetch the entire blob, so we can't do streaming decompression for memory savings.

We could also just do compaction for the "historical" jobs, which might allow us to keep them around. I personally think they'd be useful for debugging by both admins and users. (I.e. "what's different today? This worked yesterday!")

@ammario (Member, Author) commented Jan 11, 2023

Right now, logs older than 30 days are deleted automatically, so I don't think we have to worry about storage. An upper-bounded workspace would generate at most ~30MB of logs per month (1MB per build, roughly one build a day over the 30-day retention window).

And good points on the lack of seekability in the pq driver. That is unfortunate.

@BrunoQuaresma (Collaborator)

Removing myself from this until the backend work is done. It makes it easier for me to see which issues I should focus on.

@mafredri (Member) commented Feb 1, 2023

@ammario I've got a few questions for implementing this functionality.

(One thing worth noting is that "startup scripts that end" has been implemented and our examples adjusted; see #5749.)

  1. Do we want a push or pull architecture, or both?
  2. Regarding the proposed PrefixSuffixWriter, do we send the tail once the startup script has ended, or do we try to stream the tail continuously?

To expand on 1.

Push architecture pros/cons:

  • + Agent logs will always be available, even if the workspace is shut down
  • - N number of agents will simultaneously be sending logs to the server
  • - Server needs to monitor load and tell agents to back off if it gets too rough
  • - Causes load even when logs are not needed
  • - If we want to stream the "tail" part of a "limited" log, we have to either send & delete or wait until e.g. startup script has completed

Pull architecture pros/cons:

  • + No additional load except when someone is viewing the logs (e.g. workspace build logs, trying to SSH in before workspace is ready, etc.)
  • + Since we're always streaming logs we have more freedom with how we handle limits and streaming the tail part of the log
  • + We can do seeking to reveal any part of the log, if needed
  • - Logs are ephemeral and not available after workspace stop
  • + Can be extended to scrape logs by server at certain points

If we go with push, do we go with (d)RPC to reduce load?

If we go with pull, would this be a suitable addition to the "statistics server" on the agent, @kylecarbs? We'd probably rename it, though. The reason I ask is that I worry about allocating too many listen ports; we've not reserved that many.

And re: 2., assuming we go with push, I imagine an implementation where we allow 1MB (minus 64KB) to be streamed and stored straight off, and the remaining "tail" is kept in a circular buffer of, say, 64KB and committed once the startup script exits (see the sketch below). Does this match what you had in mind, @ammario?

The downside of this is that the user will not see what's currently happening in the log (an edge case, since they'd have exceeded the ~1MB limit).

The alternative of streaming the tail part doesn't really prevent DoS.
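
A rough sketch of that push-side behavior, reusing the ring-buffer idea from the PrefixSuffixWriter sketch earlier in the thread (`sink` is a hypothetical stand-in for whatever ships log bytes to coderd):

```go
package agentlog

import "io"

// streamTailWriter streams the first `budget` bytes directly to sink
// so viewers see them live, then diverts the overflow into a small
// ring buffer. Close commits whatever tail survived once the startup
// script exits, matching the "1MB minus 64KB streamed, 64KB tail
// committed at exit" idea above.
type streamTailWriter struct {
	sink   io.Writer
	budget int    // bytes still allowed to stream directly, e.g. 1MB-64KB
	size   int    // tail capacity, e.g. 64KB
	tail   []byte // ring buffer for the final bytes
	pos    int
}

func newStreamTailWriter(sink io.Writer, budget, tailSize int) *streamTailWriter {
	return &streamTailWriter{sink: sink, budget: budget, size: tailSize}
}

func (w *streamTailWriter) Write(p []byte) (int, error) {
	n := len(p)
	if w.budget > 0 { // stream while the budget lasts
		direct := w.budget
		if direct > len(p) {
			direct = len(p)
		}
		if _, err := w.sink.Write(p[:direct]); err != nil {
			return 0, err
		}
		w.budget -= direct
		p = p[direct:]
	}
	for _, b := range p { // overflow cycles through the tail ring
		if len(w.tail) < w.size {
			w.tail = append(w.tail, b)
		} else {
			w.tail[w.pos] = b
			w.pos = (w.pos + 1) % w.size
		}
	}
	return n, nil
}

// Close flushes the buffered tail, oldest bytes first.
func (w *streamTailWriter) Close() error {
	if _, err := w.sink.Write(w.tail[w.pos:]); err != nil {
		return err
	}
	_, err := w.sink.Write(w.tail[:w.pos])
	return err
}
```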


As for /tmp/coder-agent.log, we might want to break that out into a separate issue. If we go with push, we need to decide which parts of the logs are sent/kept. For instance, we could consider "all logs from start until workspace lifecycle = ready" to be included in the build logs. Likewise, once we add shutdown, all logs from lifecycle = shutting_down -> stopped could also be added to the build log.

None of that would allow inspecting the current log, though; that'd still fall under the pull category. As for why we need to break it up into parts: my workspace log after running for over a day is 3.8MB.

@ammario (Member, Author) commented Feb 1, 2023

I think the pull architecture is the way to go:

  • We can avoid PrefixSuffixWriter, always giving the user the full log
  • We can seek, efficiently serving massive logs to the frontend
  • The code is significantly simpler

I also think we should only have a single listen server on the agent, as is typical of most daemons.
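
A sketch of what the agent side of pull could look like, leaning on net/http's built-in Range handling for the seeking part (the route and listen address are hypothetical; the log path is the one mentioned earlier in the thread):

```go
package main

import (
	"net/http"
	"os"
	"time"
)

func main() {
	// Hypothetical endpoint on the agent's single listen server.
	http.HandleFunc("/startup-log", func(w http.ResponseWriter, r *http.Request) {
		f, err := os.Open("/var/log/coder-startup-script.log")
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		defer f.Close()
		// ServeContent honors Range requests, so the frontend can
		// seek into an arbitrarily large log without the agent ever
		// sending (or buffering) the whole file.
		http.ServeContent(w, r, "coder-startup-script.log", time.Time{}, f)
	})
	_ = http.ListenAndServe("127.0.0.1:13338", nil) // hypothetical port
}
```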

@code-asher code-asher self-assigned this Feb 27, 2023
@kylecarbs kylecarbs assigned kylecarbs and unassigned mafredri and code-asher Mar 14, 2023
@bpmct (Member) commented Mar 28, 2023

Closing as completed by #6558.

@bpmct bpmct closed this as completed Mar 28, 2023