Skip to content

fix(aws-vault-proxy): add a cli flag to check the running service#307

Merged
mbevc1 merged 1 commit intoByteNess:mainfrom
gdamjan:vault-proxy-check-running
Mar 19, 2026
Merged

fix(aws-vault-proxy): add a cli flag to check the running service#307
mbevc1 merged 1 commit intoByteNess:mainfrom
gdamjan:vault-proxy-check-running

Conversation

@gdamjan
Copy link
Copy Markdown
Contributor

@gdamjan gdamjan commented Feb 19, 2026

now that we build the aws-vault-proxy docker image from scratch, the image doesn't contain pgrep, or curl or any other executables.

implement a healthcheck command in the same executable, that sends a http request to itself on localhost, and assume healthy for any http response. if the proxy process is stuck, then it'll become unhealthy.

can be tested by running:

docker compose kill --signal STOP aws-vault-proxy

which will make the container "stuck". docker will soon enough (after the default timeouts and retries) mark the service/container unhealthy.

Unstuck it with:

docker compose kill --signal CONT aws-vault-proxy

@gdamjan gdamjan requested a review from mbevc1 as a code owner February 19, 2026 19:10
@github-actions github-actions bot added the fix label Feb 19, 2026
@gdamjan
Copy link
Copy Markdown
Contributor Author

gdamjan commented Feb 19, 2026

I can't see where the "Validate PR" fails now?

@mbevc1 mbevc1 changed the title fix: aws-vault-proxy: add a cli flag to check the running service fix(aws-vault-proxy): add a cli flag to check the running service Feb 19, 2026
@mbevc1
Copy link
Copy Markdown
Contributor

mbevc1 commented Feb 19, 2026

I think double colon might be confusing the check, because strictly features go in brackets 🤔

@gdamjan
Copy link
Copy Markdown
Contributor Author

gdamjan commented Feb 19, 2026

I think double colon might be confusing the check, because strictly features go in brackets 🤔

ahh, thanks

Comment on lines +33 to +41
req, err := http.NewRequest("HEAD", "http://127.0.0.1:80/", nil)
if err != nil {
os.Exit(1)
}
_, err = http.DefaultClient.Do(req)
if err != nil {
os.Exit(1)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I suggest an improvemnt here and add timeout without building a body:

Suggested change
req, err := http.NewRequest("HEAD", "http://127.0.0.1:80/", nil)
if err != nil {
os.Exit(1)
}
_, err = http.DefaultClient.Do(req)
if err != nil {
os.Exit(1)
}
}
client := &http.Client{
Timeout: 2 * time.Second,
}
resp, err := client.Get("http://127.0.0.1:8080/health")
if err != nil {
os.Exit(1)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
os.Exit(1)
}
os.Exit(0)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'll need to be port 80 though, otherwise off-course

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course - my typo here!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some concerns,

  • the timeout is actually handled by the docker runtime itself, so if the test command doesn't exit in time, it'll consider it unhealthy and stop it.
    • setting up 2 seconds might not be what docker users would want?
  • signaling unhealthiness on resp.StatusCode != http.StatusOK is not great, since the proxy can be alive and well but the GET /health would return 502 or 404 depending what's outside the proxy. We don't need to ultimately restart the proxy in that case IMHO

Copy link
Copy Markdown
Contributor

@mbevc1 mbevc1 Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I was actually thinking about that and I think the question is - is the proxy useful if noting is responding? Do we need a health check if this check is irrelevant or we don't want to restart.

On the 2s, that was just an example and happy to change whatever feel a better fit here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gdamjan . Looks look, the only thing I wonder if we should still add timeout on the connection. I don't have strong opinion on what it should be, but I'd be interested in your thoughts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gdamjan no pressure, but wondering if you could push that timeout? 😄

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a 15s timeout. less than the default browser/curl, more than just 2 :D

healtcheck users can still make it lower in docker/docker-compose

@gdamjan gdamjan force-pushed the vault-proxy-check-running branch from 9d976ef to cba4ca5 Compare March 9, 2026 11:52
@gdamjan gdamjan force-pushed the vault-proxy-check-running branch from cba4ca5 to c6881ec Compare March 19, 2026 12:11
now that we build the aws-vault-proxy docker image from **scratch**,
the image doesn't contain pgrep, or curl or any other executables.

implement a healthcheck command in the same executable, that sends a http
request to itself on localhost, and assume healthy for any http response.
if the proxy process is stuck, then it'll become unhealthy.

can be tested by running:
```
docker compose kill --signal STOP aws-vault-proxy
```
which will make the container "stuck". docker will soon enough (after
the default timeouts and retries) mark the service/container unhealthy.

Unstuck it with:
```
docker compose kill --signal CONT aws-vault-proxy
```
@gdamjan gdamjan force-pushed the vault-proxy-check-running branch from c6881ec to c6511b4 Compare March 19, 2026 12:11
@mbevc1
Copy link
Copy Markdown
Contributor

mbevc1 commented Mar 19, 2026

Thanks for your contribution! 🎉

@mbevc1 mbevc1 merged commit 352f419 into ByteNess:main Mar 19, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants