Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set $HOME for coder agent in gcp-linux template #2147

Merged
merged 1 commit into from Jun 8, 2022

Conversation

spikecurtis
Copy link
Contributor

This is the first PR of #1386 but I still want to investigate the other cloud providers to see whether they hit a similar issue with $HOME before closing it.

This modifies the gcp-linux template to setup a home directory before running the agent script.

Signed-off-by: Spike Curtis <spike@coder.com>
Copy link
Member

@kylecarbs kylecarbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that $HOME isn't set already? I'm pretty sure it is by default on GCP, even in the startup script.

@spikecurtis
Copy link
Contributor Author

Have you verified that $HOME isn't set already? I'm pretty sure it is by default on GCP, even in the startup script.

Yeah, have been playing around with this template. Existing code leaves the coder agent without a $HOME, and this fixes.

@spikecurtis spikecurtis changed the title feat: set /Users/spike for coder agent in gcp-linux template feat: set $HOME for coder agent in gcp-linux template Jun 7, 2022
@kylecarbs
Copy link
Member

I'm concerned customers will encounter this and encounter bugs without any clear path to resolve. Do you think having the default Linux behavior of /root and /home/<username> would be dangerous? We'd only do this if $HOME is not already specified.

@spikecurtis
Copy link
Contributor Author

I'm concerned customers will encounter this and encounter bugs without any clear path to resolve. Do you think having the default Linux behavior of /root and /home/ would be dangerous? We'd only do this if $HOME is not already specified.

I'm guessing you are suggesting that we add this to the init_script. I would consider setting $HOME in the init_script to be a layering violation, and am wary because layering violations lead to unpredictable, hard to debug & maintain code.

I get the pragmatic argument, that these are very well known conventions around home directory, and am not aware of any Linux distribution that violates them. However, on any distribution people are free to put their homedir kind of anywhere they want. Ask yourself this question: if the right move is to fall back to convention if $HOME is unset, then why isn't JetBrains doing it?

In this case, the missing $HOME is a symptom of a larger problem, which is that coder agent is being started in a non-standard Linux way, rather than being the problem itself, and fixing it up in init_script is a bandaid. Who knows what other programs will fail because of other problems. Like, the AWS example #2150 we were copying the root environment vars to the ubuntu user, which could affect $PATH or other things that are set in the image.

I'm not opposed to the agent executing some "preflight checks" to catch missing $HOME for template authors. In the short term we could probably just fail and assume someone will check the logs, but longer term it might be nice to have non-fatal check failures get reported to coderd after the agent connects.

@kylecarbs
Copy link
Member

Ask yourself this question: if the right move is to fall back to convention if $HOME is unset, then why isn't JetBrains doing it?

JetBrains remote development solution is very new, while VS Code Remote handles this case just fine. I'd assume this is a bug on their end.

I'm guessing you are suggesting that we add this to the init_script. I would consider setting $HOME in the init_script to be a layering violation, and am wary because layering violations lead to unpredictable, hard to debug & maintain code.

We'd add this to the agent. If os.Getenv("HOME") is unset, we apply reasonable defaults to prevent users from encountering issues.

@spikecurtis
Copy link
Contributor Author

Also note that what I'm doing in this PR is not a great long-term position because we are still executing the agent in a "non-standard" way by running it as root from the startup script. The right thing to do is to have the startup script (or terraform itself) install a systemd unit to run the coder agent as a normal user. The reason I'm not doing it here is that it's more involved and I want to unblock people from #1386

@spikecurtis
Copy link
Contributor Author

JetBrains remote development solution is very new, while VS Code Remote handles this case just fine. I'd assume this is a bug on their end.

Can you give a little more detail here? In particular, I don't think "VS Code Remote runs just fine without $HOME being set" is equivalent to "VS Code Remote defaults to convention for the homedir." As in, maybe VS Code doesn't care about the homedir?

@kylecarbs
Copy link
Member

VS Code Remote defaults to convention for the homedir.

I'm not sure why we don't as well, because it seems like the user only excludes $HOME by accident.

@spikecurtis
Copy link
Contributor Author

Ah, when I try to connect with VS Code Remote - SSH on a workspace where $HOME isn't set, I get

[10:19:30.952] > Creating the server install dir failed...

But it works with my fixed code that sets the home dir.

@kylecarbs
Copy link
Member

Ahh, that's a bad assumption on my part then. I suppose I'd say the onus is more on us in this instance because we're assisting with provisioning the environment and control the shells in userspace.

Anything we have to apply to every(?) template seems we should have a general solution for, otherwise, customers will encounter the same issue.

@spikecurtis
Copy link
Contributor Author

Anything we have to apply to every(?) template seems we should have a general solution for, otherwise, customers will encounter the same issue.

I do agree with this sentiment, but that general solution should be based around starting the agent "the right way", i.e. using systemd. It might be convenient for the coder Terraform provider to also be able to spit out a systemd unit like we currently have in the Digital Ocean template.

@kylecarbs
Copy link
Member

Containers don't have systemd and neither does Windows or Mac, so I'm hesitant to say that's the right way. With that, the agent should run as the target user to maximize cross-platform compatibility, we don't require root privileges for anything.

@spikecurtis
Copy link
Contributor Author

Well, the right way for linux.

@spikecurtis
Copy link
Contributor Author

Interestingly, I tried VS Code Remote to the old gcp-linux template that doesn't set $HOME, and it worked fine. So, VS Code Remote does appear to work around the missing $HOME.

Meaning: some other issue fixed by #2150 was breaking VS Code Remote on AWS. This is the kind of thing I'm talking about when I say that missing $HOME is a symptom of the larger problem, and why I want to tackle that problem rather than focusing narrowly on fixing $HOME with an agent code fix.

@spikecurtis
Copy link
Contributor Author

@kylecarbs , if you're dead set on adding a $HOME fixup to coder agent, I'm not going to stop you, but does that need to hold up this PR?

@kylecarbs
Copy link
Member

I think this isn't a great solution for our users. It's better than nothing so I won't hold it up, but I'd be much more in favor of defaulting a $HOME directory, and I don't feel like I understand your perspective of why we shouldn't.

Comment on lines +76 to +89
metadata_startup_script = <<EOMETA
#!/usr/bin/env sh
set -eux pipefail

mkdir /root || true
cat <<'EOCODER' > /root/coder_agent.sh
${coder_agent.dev.init_script}
EOCODER
chmod +x /root/coder_agent.sh

export HOME=/root
/root/coder_agent.sh

EOMETA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you could just export HOME=/root && ${coder_agent.dev.init_script}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I want to move this template in the direction of writing scripts to disk in anticipation of further refinement where we also write a systemd unit.

@kylecarbs
Copy link
Member

To echo my perspective, every container-based deployment will have to manually set home, or they'll encounter these problems.

@spikecurtis spikecurtis merged commit a86c957 into main Jun 8, 2022
@spikecurtis spikecurtis deleted the spike/1386_gcp_set_home branch June 8, 2022 18:22
kylecarbs pushed a commit that referenced this pull request Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants