Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add retry and error handling for failing arvbox services #57

Closed

Conversation

jrandall
Copy link
Contributor

No description provided.

@tetron
Copy link
Member

tetron commented Nov 16, 2016

Couple notes:

  • This appears to be 22 copies of the same file, instead of a symlink to a common implementation
  • Arvbox is a little unorthodox in that it uses process monitoring as a way of managing service dependencies. The service scripts intentionally crash and retry if the prerequisites have not been fulfilled. So stopping the service after a few tries might mean it gives up too soon.
  • If a service isn't starting, the "give up" behavior should be to shut down the whole container (e.g. send TERM to PID 1)

@jrandall jrandall force-pushed the 10545-arvbox-services-retry-then-error branch from 5425333 to 817f2ee Compare November 16, 2016 22:47
@jrandall jrandall force-pushed the 10545-arvbox-services-retry-then-error branch from 817f2ee to dd6f1c8 Compare November 17, 2016 13:15
@jrandall
Copy link
Contributor Author

  • I've changed to a symlink to a shared file in a .common directory (initially I hadn't done that because I didn't know how to let runit know a subdirectory isn't an actual service, as the service directory is a VOLUME).
  • I guess by proposing this change I am saying that I have repeatedly been bitten by arvbox's behavior when it comes to retrying services. I understand sometimes they crash "on purpose" - perhaps a retry count higher than 3 would be better, but looping infinitely when things are in fact failing permanently is annoying.
  • if I give up by killing the container (which was actually my first approach), the user can no longer view the logs using arvbox log ...

@jrandall
Copy link
Contributor Author

The current revision also adds --retry 3 to the bundle install ... calls, as I've occasionally seen temporary network failures cause the two bundle install attempts run one after the other to both fail for the same underlying reason.

@tfmorris
Copy link

@tetron Are @jrandall changes acceptable? Is anything else required?

Actually we probably need DCO-Signoff added, but anything else?

@tetron
Copy link
Member

tetron commented Dec 20, 2017

@tfmorris I'd forgotten about it, I'll take another look at it.

@tfmorris
Copy link

tfmorris commented Jun 4, 2018

@jrandall Sorry for the long delay on this. If you're still interested, please rebase against the current tree and add the new required DCO signoff headers so that we can restart the review process.

@tetron tetron self-assigned this Sep 5, 2018
@tetron
Copy link
Member

tetron commented Aug 30, 2023

Closing old PRs

@tetron tetron closed this Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants