Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add connection_timeout and troubleshooting_url to agent #4937

Merged
merged 11 commits into from
Nov 9, 2022

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Nov 7, 2022

This commit adds the connection timeout and troubleshooting url fields
to coder agents.

If an initial connection cannot be established within connection timeout
seconds, then the agent status will be marked as "timeout".

The troubleshooting URL will be present, if configured in the Terraform
template, it can be presented to the user when the agent state is either
"timeout" or "disconnected".

Fixes #4678

@mafredri mafredri self-assigned this Nov 7, 2022
Copy link
Collaborator

@BrunoQuaresma BrunoQuaresma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE looks good

@mafredri mafredri force-pushed the mafredri/agent-connection-timeout-troubleshooting branch from a1e1238 to f02dfca Compare November 7, 2022 18:52
@mafredri mafredri marked this pull request as ready for review November 8, 2022 14:13
@mafredri mafredri requested a review from a team as a code owner November 8, 2022 14:13
@mafredri mafredri requested review from code-asher, kylecarbs and a team and removed request for a team November 8, 2022 14:13
coderd/database/databasefake/databasefake.go Outdated Show resolved Hide resolved
@@ -31,8 +31,13 @@ import (
"github.com/coder/coder/testutil"
)

func setupWorkspaceForAgent(t *testing.T) (*codersdk.Client, codersdk.Workspace, string) {
func setupWorkspaceForAgent(t *testing.T, mutate func([]*proto.Agent) []*proto.Agent) (*codersdk.Client, codersdk.Workspace, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mutate ...func? You would get rid of the nil in setupWorkspaceForAgent(t, nil).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean suggestion, but I think we can keep this explicit for now, makes it slightly more convenient to add new arguments, if needed and I don't see a use-case for multiple mutation functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the nil is considered to be an anti-pattern as it has an ambiguous meaning. This is the main reason why I raised this comment, so feel free to leave it as is.

Auth: &proto.Agent_Token{
Token: authToken,
},
ConnectionTimeout: 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: question about the naming conversion, it seems that ConnectionTimeout is reserved only for seconds, should we name this ConnectionTimeoutSec? I hope that we won't need a 500ms timeout in the future :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad suggestion at all. I'm a bit vary of doing it though since this value is propagated through terraform-provider-coder, I was following the previous convention for how an int/second is represented in the code (e.g. codersdk.Healthcheck.Interval).

We do have other places in the code that utilizes *Millis/_ms, however. So there is precedent for both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is precedent for both

Good for me, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth opening an issue to standardize these fields @mafredri... it makes me significantly more nervous about working with code with unspecified durations. The change of course doesn't need to happen in this PR though!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create the issue for standardization tomorrow, I did however push the change for this PR in a182cdf already, turned out much better. Thanks for pushing for this change both of you. 👍🏻

Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -53,7 +54,9 @@ type WorkspaceAgent struct {
Version string `json:"version"`
Apps []WorkspaceApp `json:"apps"`
// DERPLatency is mapped by region name (e.g. "New York City", "Seattle").
DERPLatency map[string]DERPRegion `json:"latency,omitempty"`
DERPLatency map[string]DERPRegion `json:"latency,omitempty"`
ConnectionTimeoutSeconds int32 `json:"connection_timeout_seconds"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrunoQuaresma Is this a preferable format, or does a duration string work as well for the frontend (e.g. 2m0s).

NOTE: This is not a field that the frontend needs to care about unless it's useful, the status changes automatically.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An integer works better than a duration string.

This commit adds the connection timeout and troubleshooting url fields
to coder agents.

If an initial connection cannot be established within connection timeout
seconds, then the agent status will be marked as `"timeout"`.

The troubleshooting URL will be present, if configured in the Terraform
template, it can be presented to the user when the agent state is either
`"timeout"` or `"disconnected"`.

Fixes #4678
@mafredri mafredri force-pushed the mafredri/agent-connection-timeout-troubleshooting branch from f2620e5 to 7a7315a Compare November 9, 2022 13:20
@mafredri mafredri merged commit 90c34b7 into main Nov 9, 2022
@mafredri mafredri deleted the mafredri/agent-connection-timeout-troubleshooting branch November 9, 2022 15:27
@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a timeout + debug steps to the Coder agent (agent stuck in Connecting... forever isn't accurate)
4 participants