Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS private instances intermittently fail to terminate agents when idle #397

Closed
kevindixon opened this issue Feb 9, 2015 · 4 comments

Comments

@kevindixon
Copy link

commented Feb 9, 2015

Intermittently, EC2 test agents are failing to be be stopped - its not unusual on my private instance to have 4 or 5 instances that have not been stopped over the period of a week. These instances have been idle for long period of time (8 hours+).
When this happens, the instances are not visible in /getTesters.php and the error log includes this kind of line:

20:00:02 - EC2:Launching EC2 instance. Region: eu-west-1, AMI: ami-d0c76fa7, error: 
The instance ID 'i-XXXXXXXX' does not exist

So far, I have observed this in agents running on us-east-1 and eu-west-1.
Perhaps the re-try mechanism for terminating agents isn't quite working as expected?

Full history of this issue is here: http://www.webpagetest.org/forums/showthread.php?tid=13465.

@pmeenan

This comment has been minimized.

Copy link
Contributor

commented Feb 9, 2015

I think I know what may be going on. The running instance detection code relies on the tags that identify the instances as WPT instances and which locations they are tied to. If the instances start but fail tagging for some reason it would be possible for them to become orphaned (though they really should still be connecting in unless they are completely dead).

I'll change the logic to also include any known AMI ID's and backfill the location information automatically which should catch all running instances even if they fail to get tagged.

Should have something ready later this afternoon.

@pmeenan

This comment has been minimized.

Copy link
Contributor

commented Feb 9, 2015

Fingers crossed this change will fix the issue: 111ea78

If you're running the server AMI it should pick up the change within the next hour.

@kevindixon

This comment has been minimized.

Copy link
Author

commented Feb 10, 2015

IIRC the orphaned agents did indeed have a blank tagging field in the EC2 console, so I'm hopeful this will fix it!

@kevindixon

This comment has been minimized.

Copy link
Author

commented Mar 2, 2015

Closing this now - I've been monitoring EC2 agent instances since this fix was made, and even with 100s of tests overnight for the past month or so, I haven't seen any orphaned instances (where before there were at least a couple a week).

@kevindixon kevindixon closed this Mar 2, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.