
This issue was moved to a discussion.

Hubs Cloud Rollback for Stack Update Offline to Online + Template update fails and rollback fails #6004

Closed
eyesnareinc opened this issue Mar 21, 2023 · 10 comments
Labels
bug needs triage For bugs that have not yet been assigned a fix priority

Comments

@eyesnareinc

Updating a Hubs Cloud stack from offline to online fails and rolls back. Updating the Hubs Cloud template also fails, and the rollback of that update fails as well.

To Reproduce
Steps to reproduce the behavior:

  1. Go to CloudFormation
  2. Click on Stack name
  3. Scroll down to "online/offline option" and update
  4. AppCloudfrontDistribution shows Status: UPDATE_COMPLETE; AppASG shows Status: UPDATE_IN_PROGRESS, Status reason: New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT30M
  5. RESULT: Update failed at AppDb. Status reason: Received 0 SUCCESS signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement. The stack rolled back to its previous status.

THEN I TRIED
6. Updating the AWS Template to Hubs Cloud Personal 1.1.5 following the steps here (https://hubs.mozilla.com/docs/hubs-cloud-aws-updating-the-stack.html)
7. RESULT: Update failed at AppDb. Resource handler returned message: "Cannot find upgrade target from 10.serverless_21 with requested version 11.13 for serverless engine mode." The stack rollback also failed.

Expected behavior
Stack should update to online mode and work normally.

Hardware
AWS

Additional context
I've been hosting monthly events since Dec 2020 and this has never occurred.

@eyesnareinc eyesnareinc added bug needs triage For bugs that have not yet been assigned a fix priority labels Mar 21, 2023
@eyesnareinc
Author

[Screenshot: hubscloudrollback1]

@eyesnareinc
Author

[Screenshot: updateFailed1]

@standtech

If you are getting this DB error, then you are definitely using an IAM admin account, not root. Don't forget to clean up Route 53; the records just keep piling up. Also make sure your stack name is short: there seem to be character limits on the resource names, which include the stack name in their naming conventions. I am also having a similar issue.

@ikenichiro

I was able to fix this issue with the following process. However, I'm not 100% confident that it is the right process, since I'm facing a different issue and am not able to run the stack (#6005).
Logically it should not be a big issue, but use it at your own risk.

  1. Open CloudFormation, find "AppDb" under Resources, follow it to RDS, and click Modify.
  2. Check the drop-down list for an entry such as "compatible with PostgreSQL 11.16". If the version number is slightly different, note that number.
  3. Go to (https://hubs.mozilla.com/docs/hubs-cloud-aws-updating-the-stack.html) and download the 1.1.5 template to your local drive.
  4. Replace the version number 11.13 with the version number shown in the drop-down list (11.16 in my case). To be more specific, the string you need to modify is the one changed in this commit (Hubs-Foundation/hubs-ops@28b7276): that commit changed it from 12 to 11.13, and you should change it from 11.13 to 11.16 or whichever version number you noted earlier.
  5. Upload the modified template to an S3 bucket that is publicly accessible.
  6. Copy the file URL (starting with https) and check that it is accessible from a private browsing window or another browser.
  7. Go back to CloudFormation, click Update for the stack, and update the template using the template URL from your own S3 bucket.
  8. The AppDb issue should be resolved.
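
The version swap in step 4 can also be scripted instead of edited by hand. The following is only a minimal sketch, not part of the official update procedure: the function name `patch_engine_version` and the sample `EngineVersion` line are hypothetical, and it assumes the template is plain text in which the old version appears as a standalone token. It reports how many replacements were made so you can confirm that only the intended string changed.

```python
import re


def patch_engine_version(template_text: str, old: str, new: str) -> tuple[str, int]:
    """Replace standalone occurrences of `old` with `new`.

    Returns the patched text and the number of replacements made.
    """
    # Only match `old` when it is not embedded in a longer number,
    # so "11.13" does not also hit "111.13" or "11.130".
    pattern = re.compile(r"(?<![\d.])" + re.escape(old) + r"(?!\.?\d)")
    return pattern.subn(new, template_text)


if __name__ == "__main__":
    # Hypothetical one-line excerpt; the real template's key names may differ.
    sample = 'EngineVersion: "11.13"\n'
    patched, count = patch_engine_version(sample, "11.13", "11.16")
    print(f"replacements made: {count}")
    print(patched, end="")
```

If the reported count is anything other than what you expect from the commit linked in step 4, compare the patched file against the original before uploading it to S3.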

@standtech

The DB version warning is just drift, not an issue that triggers the rollback. I guess the issue is only with AppASG.

@eyesnareinc
Author

After the rollback failed, I created a new stack and tried to use the backup and restore steps. The create failed on AppDb with Status reason: Resource handler returned message: "The engine version you requested for your restored DB cluster (11.16) is not compatible with the engine version of the DB cluster snapshot (10.21)." I wound up noting my vault ID and deleting the stack. I'm going to rebuild and make sure I keep a backup. (I did this for the first year, but everything was working so well that I stopped.) :)

@eyesnareinc
Author

> If you are getting this DB error, then you are definitely using an IAM admin account, not root. Don't forget to clean up Route 53; the records just keep piling up. Also make sure your stack name is short: there seem to be character limits on the resource names, which include the stack name in their naming conventions. I am also having a similar issue.

The Route 53 pile-up of domain names hasn't happened for me in a long time. Regarding IAM versus root, I'm not sure why that would be an issue as long as you've given your IAM account enough privileges. I have always been told not to work in root mode. I'll note the character limit for resource names. Thank you for your advice! :)

@eyesnareinc
Author

eyesnareinc commented Mar 23, 2023

OK, it happened again with a NEW stack. I just tried to update a new stack (moving it from offline to online). The stack update seemed to stall in UPDATE_IN_PROGRESS (New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT30M), and eventually the update failed at AppASG.
[Screenshot: HubsCloudStall_FAIL]

@ikenichiro

ikenichiro commented Mar 24, 2023

> The DB version warning is just drift, not an issue that triggers the rollback. I guess the issue is only with AppASG.

Sorry, I was only providing a solution for the rollback failure.
In my opinion, however, you would probably need to fix the AppDb problem for the stack to roll back successfully, or it will probably only launch after the AppASG/StreamASG issues are resolved.
(It's probably a different issue, hence I opened a new issue, #6005.)
This is from my experience of having the same issue and solving it, so that I could safely make backups in offline mode or continue to try updating the stack.

@standtech

I faced the DB issue, the role name character limit, the region abbreviation issue, and so on. The common one I couldn't get around was the AppASG issue, because you at least need access to the EC2 instances to check what's going on there, which you can't get until the admin console is working, and that never happens if the fresh install itself fails. The other symptoms have workarounds. So I guess all of us are facing the same issues, because we are all using the same templates/AMIs. Backup and restore would also fail in the current scenario, as it seems the AMIs are affected (they must be pulling some script on load), not the DB or the data on EFS.

@Hubs-Foundation Hubs-Foundation locked and limited conversation to collaborators May 4, 2023
@matthewbcool matthewbcool converted this issue into discussion #6063 May 4, 2023
