AWS Hubs cloud went offline after 4 months of working perfectly ... and it refuses to go online again #4500
Comments
When you checked the instance, did you see whether you ran out of space on the EBS volume?
Thanks, Brandon. Current status, as per "df": 3 GB free out of 8 GB (output header: Filesystem 1K-blocks Used Available Use% Mounted on). The message "Hubs Cloud is currently offline. Check back shortly." is still there.
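For anyone repeating the free-space check without reading "df" output by hand, here is a minimal Python sketch. It is not part of Hubs Cloud; the function name `disk_report` and the choice of "/" as the path are assumptions for illustration, and the percentage may differ slightly from the Use% column of "df" because of reserved blocks.

```python
import shutil


def disk_report(path="/"):
    """Return total, used, and free space (GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "used_percent": 100.0 * usage.used / usage.total,
    }


if __name__ == "__main__":
    report = disk_report("/")
    print(
        f"{report['free_gib']:.1f} GiB free of {report['total_gib']:.1f} GiB "
        f"({report['used_percent']:.0f}% used)"
    )
```

A nearly full root volume would show up here as a small `free_gib` value; in this thread the volume still had space, so disk usage was ruled out.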
Syslog shows an endless loop of Certbot errors: Aug 11 17:26:21 flamboyant-giant bash[2100]: certbot.default@sopart-01(O): Wed Aug 11 17:26:21 UTC 2021 Renewing LetsEncrypt certificates if neccessary. Maybe a problem with certificates? I don't know anything about how Hubs handles certificates.
To my knowledge, the Hubs instance shouldn't be handling the SSL cert; that would be at the CDN level for an AWS setup. I just looked for certbot on my Hubs instance and it's not installed.
I am getting direction to terminate the instance again. It may take two or three attempts for the stack to self-heal.
I'll do that and will keep you posted. Thanks, Brandon.
I've restarted the whole stack 7 times: 4 in previous days and 3 more today. Same result. If I send the syslog, will it help to analyze the problem? A portion of syslog: Aug 11 23:12:50 romantic-rogue bash[2193]: bio-sup(MR): Updating from mozillareality/ita to mozillareality/ita/0.0.1/20200526203229
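To see which service is looping before sending a whole syslog, the lines can be grouped by the tag that follows "bash[PID]:". The sketch below is a hedged example, not a Hubs tool: the regular expression is an assumption inferred only from the two syslog lines quoted in this thread, and `count_by_source` is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumed layout, based on the lines quoted in this thread:
#   "<Mon> <day> <time> <host> bash[<pid>]: <source>: <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+\S+?\[\d+\]:\s+"
    r"(?P<source>[^\s:]+):\s+(?P<message>.*)$"
)


def count_by_source(lines):
    """Count syslog lines per source tag; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("source")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_sources.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for source, total in count_by_source(handle).most_common(10):
            print(f"{total:8d}  {source}")
```

A source that dominates the counts, such as the certbot tag seen above, is a reasonable place to start reading the surrounding log lines.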
Wait, are you restarting the instances or terminating them? Terminating is similar to deleting the EC2 instance (see https://www.youtube.com/watch?v=Zwjc1VMKOv0). I know the terminology says restart, but you would need to terminate them. The error could be related to {{cfg.general.plugin}} being empty, so the certbot renew's "chef-habitat-run-hook" templated command is not correctly constructed.
Thanks, Brandon. I've actually been putting the stack offline in CloudFormation; as I understand it, this action terminates the instance. I love Robin, a very nice video. I'll try terminating the instance a few times. I'll be back.
I terminated the EC2 instance 5 times (waiting 15 minutes) and still "Hubs Cloud is currently offline. Check back shortly."
Let me report back to my team in the morning to see if we can come together on a possible fix for this.
TQVM Brandon. Good night.
I have a couple of things I want you to do to help with troubleshooting. If you are in the Discord community, please DM me, because this will output non-public information about your stack.
Thanks, Brandon.
Issue required the stack to be deleted and refreshed. |
Description
We run a permanent museum that went live on April 15, 2021. It worked perfectly until a few days ago when the message "Hubs Cloud is currently offline. Check back shortly." showed up.
I tried everything in bugs 3071, 3429, and 4097 with no luck.
The message shows up when trying to use admin, Spoke, or the site.
Once I found the offline message, I did the following:
1.- Checked that the server was up and running (t3.medium).
2.- Connected to the server through SSH.
3.- Ran "top" and couldn't identify a Hubs-related process.
4.- With "ps", I found a few processes belonging to the user "hab"; I'm not sure whether they are related to Hubs.
5.- As per one of the recommendations, I terminated the server; as expected, another one started automatically. After 30 minutes the offline message was still there.
6.- Then I updated the stack to put it offline, and the server was shut down properly. I waited a few minutes and put it back online. A new server started nicely, but after 30 minutes the offline message was still there. Note: I have AutoPauseDb set to "Yes - Pause database when not in use", but it has been running like that all along and, as I understand it, it may delay the process by a few seconds, not minutes.
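Rather than reloading the page by hand for 30 minutes after each terminate or offline/online cycle, the wait can be scripted. This is a minimal sketch under stated assumptions: the site URL is supplied by the user, the only signal checked is the offline text quoted in this issue, and the helper names (`is_offline_page`, `fetch_body`, `wait_until_online`) are hypothetical. A page without the offline text is not proof that the stack is healthy.

```python
import sys
import time
import urllib.error
import urllib.request

# Text quoted in this issue; assumed to appear verbatim in the offline page body.
OFFLINE_MARKER = "Hubs Cloud is currently offline"


def is_offline_page(html):
    """True if the page body contains the offline message."""
    return OFFLINE_MARKER in html


def fetch_body(url, timeout=10.0):
    """Fetch a page body as text, including the body of HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses can still carry an HTML body worth inspecting.
        return err.read().decode("utf-8", errors="replace")


def wait_until_online(url, attempts=30, delay_seconds=60.0):
    """Poll `url`; return True once the offline message is gone, else False."""
    for _ in range(attempts):
        try:
            if not is_offline_page(fetch_body(url)):
                return True
        except (urllib.error.URLError, OSError):
            pass  # an unreachable site counts as "not online yet"
        time.sleep(delay_seconds)
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wait_online.py https://your-hubs-site.example
    print("online" if wait_until_online(sys.argv[1]) else "still offline")
```

With the defaults this polls once a minute for about 30 minutes, matching the waiting time described in steps 5 and 6.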
Questions:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
My Museum should show up.
Screenshots
Hardware