Symptoms
Homestead failed. It turns out that this was because of #48. Homestead reported that it was terminating but failed to do so, and continued listening on its port. Homer appeared to be in a similar state.
Logs are below.
```
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:181: Configuring store connection
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:182: Hostname: localhost
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:183: Port: 9160
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:211: Configuring store worker pool
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:212: Threads: 10
09-11-2016 11:20:23.576 UTC Status cassandra_store.cpp:213: Max Queue: 0
09-11-2016 11:20:23.577 UTC Error main.cpp:745: Failed to initialize the Cassandra cache with error code 3.
09-11-2016 11:20:23.577 UTC Status main.cpp:746: Homestead is shutting down
```
Impact
Deployment fails completely.
This is because the orchestration (Docker Compose / Kubernetes) cannot terminate and restart the failed container or report that the container has failed.
Correct behaviour is for the container to exit when it has failed and is shutting down the service. A wrong but possibly acceptable behaviour (i.e. a code workaround) would be for it to stop listening on well-known ports so that the orchestration can detect the failure.
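As an illustration of that workaround, the orchestration can poll the well-known port itself rather than relying on the container exiting. A minimal Docker Compose sketch follows; the service name, image, and port 8888 are assumptions for illustration (check the deployment's actual HTTP port), and it assumes `nc` is available in the image:

```yaml
# Hypothetical docker-compose.yml fragment: service name, image, and port
# are illustrative assumptions, not taken from this repository.
services:
  homestead:
    image: example/homestead:latest     # placeholder image
    healthcheck:
      # Mark the container unhealthy if nothing is listening on the HTTP port.
      test: ["CMD-SHELL", "nc -z localhost 8888 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
```

With something like this in place, `docker ps` would at least report the service as unhealthy even though the process is still nominally running.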
Release and environment
Current master release.
Steps to reproduce
Happens at the same time as #48. Misconfiguring the Cassandra address should reproduce it once #48 is fixed.
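A hedged sketch of that reproduction: point Homestead at a Cassandra address that nothing answers on. The `CASSANDRA_HOSTNAME` variable below is purely hypothetical; substitute whatever mechanism the deployment actually uses to inject the Cassandra address.

```yaml
# Hypothetical docker-compose.override.yml: the environment variable name is
# invented for illustration; this deployment may configure Cassandra elsewhere.
services:
  homestead:
    environment:
      CASSANDRA_HOSTNAME: "192.0.2.1"   # TEST-NET-1 address, so nothing will answer
```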
Homestead is run under supervisord. When Homestead emits the above log, we immediately exit with a process exit code of 2, so I doubt the process was stuck.
Homestead is configured under supervisord to restart (up to 5000 times!) if the process fails within a second, and to restart indefinitely if the process dies after a second.
Homestead doesn't open ports (HTTP Signalling, HTTP Management or the Diameter stack) until after it's confirmed it can connect to Cassandra, so I don't know what port you are talking about, except possibly 22.
Given that this problem was clearly terminal, it seems unlikely that we'd have been in any better position if the entire container had stopped.
Can you comment any further on what you saw here and what behaviour you would expect?
@richardwhiuk I would expect that if my deployment is DOA, and an individual component in it was broken, then I'd know. If the container died, then my orchestration could handle the issue by recreating it, and I'd be getting alarms. You can argue that supervisord is restarting the process and that's fine so far as it goes, but it seems that supervisord is giving up, perhaps after 5000 retries in a few seconds.
I think the right answer is for the orchestration to add a test on the various ports - if the container does not start listening on the HTTP ports within (say) 30 seconds, kill it and recreate it, which can be done by the Kubernetes infrastructure in this case. I'll implement that in the Kubernetes branch shortly, at which point I can close this issue down.
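For reference, a minimal sketch of such a probe, assuming hypothetical pod/container names and 8888 as the HTTP port; the 30-second initial delay matches the grace period suggested above:

```yaml
# Sketch of the proposed Kubernetes check: names, image, and port are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: homestead
spec:
  containers:
  - name: homestead
    image: example/homestead:latest   # placeholder image
    livenessProbe:
      tcpSocket:
        port: 8888                    # assumed HTTP port
      # Probing starts 30s after container start; repeated failures cause
      # Kubernetes to kill and recreate the container.
      initialDelaySeconds: 30
      periodSeconds: 10
```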