Deployment fails with Concurrent::RejectedExecutionError #168
Comments
@Raniz85 we have a fix for this bug in the next release, but there are workarounds in the meantime. Can you tell me a little more about when it's happening, and maybe paste some agent logs from just before it happens? |
Here are the logs from the instance: cd-agent-crash.log. Note that we restart the agent 2 minutes after each deployment as a workaround to #32. |
@Raniz85 so here is my hypothesis: when the agent restarts and starts exiting, it should not accept any new poll requests, but it does, because of a bug in the agent's thread synchronization. The fix has been added in ef65652 and will ship in the next release. For now, a temporary workaround is to wait for the agent to finish restarting before you start a deployment, for example by adding a wait. I know it's not ideal, but it can work as a stopgap. |
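A rough illustration of the hypothesis above, not the agent's actual code (the agent is written in Ruby). The sketch uses Python's standard `concurrent.futures` pool as an analogy: once a pool has begun shutting down, a new submission is refused with an exception, which is the role `Concurrent::RejectedExecutionError` appears to play in the agent. `poll_once` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_once(executor, job):
    """Try to hand a job to the pool and report whether it was accepted."""
    try:
        executor.submit(job)
        return "accepted"
    except RuntimeError:
        # Python's executor raises RuntimeError for submissions made after
        # shutdown; this stands in for a rejected-execution error.
        return "rejected"


executor = ThreadPoolExecutor(max_workers=1)
before = poll_once(executor, lambda: None)  # pool is still running
executor.shutdown(wait=True)                # analogous to the agent exiting
after = poll_once(executor, lambda: None)   # a late poll arrives anyway
print(before, after)
```

If a poller keeps submitting work while shutdown is in progress, it hits the rejected branch. Under this reading, waiting for the restart to finish before deploying avoids the window in which that can happen.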
I'm not sure I understand the proposed workaround. Should we ensure that we're never deploying when an agent is restarting? That's not an easy workaround in an environment with more than one or two servers. When can we expect the next release? |
@Raniz85 I totally understand that; it's happening due to a race condition. Another workaround could be to add a wait after the restart, such as a sleep of a few seconds, and then start deploying. We are still working out some issues for the next release, but will post once we start releasing the next version. |
I still don't understand what you want me to do. We have the agent running on about 100 instances and deployments are automated via Jenkins. Do you suggest that we add a hook in Jenkins that ensures that no agent has restarted on any of the 100 instances within the last 10 seconds (or so) before starting the deployment? |
@Raniz85 which region are the hosts in? |
eu-west-1 mostly, though some are in cn-north-1 and us-east-2. |
So in eu-west-1 we have released the new version of the agent (1518), which should fix this issue. Can you try that? |
Hi @rohkat-aws, we have the same problem with the CodeDeploy agent. We use the eu-west-1 region and we tried to update the agent (agent_version: OFFICIAL_1.0-1.1518_rpm) to the new version, and we got this message:
Can you confirm that the new version is released? Regards, |
Hi @rohkat-aws when you say |
We're seeing this issue as well. Has the fix been released to the us-east-1 codedeploy install s3 bucket? I do not understand the proposed workaround from @rohkat-aws either. |
@SupportNubersia @woodhull 1.0-1.1518_rpm is the new version. Are you still seeing issues? |
We've so far only seen this issue on fresh instance boot, so we're rebaking our AMIs in the hope that we get a fresher version of the codedeploy agent. We'll let you know once that is complete, we've rolled the AMI out, and we can test. |
Hi @rohkat-aws The version agent_version: OFFICIAL_1.0-1.1518_rpm that exists in my AMIs was created more than a month ago. Can you confirm that the new release of code-agent is 1.0-1.1518? |
To install the codedeploy-agent we used the official process. |
@SupportNubersia @woodhull yes the OFFICIAL_1.0-1.1518_rpm should fix this. |
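For anyone checking baked AMIs, here is a minimal sketch of comparing an agent version string against build 1518. It assumes the build number is the last numeric field of strings shaped like `OFFICIAL_1.0-1.1518_rpm`, which is the only format shown in this thread. `agent_build` and `meets_min_build` are hypothetical helper names, not part of the agent.

```python
import re

# Build number named in this thread as the one that should contain the fix.
MIN_BUILD = 1518


def agent_build(version):
    """Extract the trailing build number from a string such as
    'OFFICIAL_1.0-1.1518_rpm'. Returns None if the shape is unexpected."""
    match = re.search(r"_(\d+)\.(\d+)-(\d+)\.(\d+)_", version)
    return int(match.group(4)) if match else None


def meets_min_build(version, min_build=MIN_BUILD):
    """True if the version string parses and its build is at least min_build."""
    build = agent_build(version)
    return build is not None and build >= min_build


print(meets_min_build("OFFICIAL_1.0-1.1518_rpm"))
```

A string that does not match the assumed shape is treated as not meeting the minimum, so an unparseable version fails the check instead of passing silently.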
Hi @rohkat-aws Please we need to solve it ASAP. |
@SupportNubersia Are you seeing the issue even with the latest version of the agent? Is it possible the AMI is pre-baked with an old version of the agent? And when you say it happens during scale-up, is it because the agent is being restarted in the launch config after the install? |
@rohkat-aws the version in our AMI, which was prepared a month ago, is OFFICIAL_1.0-1.1518_rpm, and we get the same error. |
The 1518 version was released a week ago, not a month ago, @SupportNubersia. Having said that, can you please look at /tmp/codedeploy-agent.update.log and confirm whether the agent is being updated? |
Hi @rohkat-aws, we can see that the problem is that when an instance launches for the first time, it runs an update that kills our deployment. We're now updating the AMI with the latest codedeploy-agent version and will try again. How can we disable this automatic update on first run? |
@SupportNubersia Did that work? |
We think that this is now fixed for us after baking a new AMI with the latest codedeploy agent version preinstalled. |
@rohkat-aws it's working well. But on the next CodeDeploy update, the problem will occur again... |
No, it should not, @SupportNubersia; this version fixes that, and this is the commit which does that. |
@Raniz85 if you can also confirm, I can close the issue. |
Ok @rohkat-aws perfect! Thanks for your help |
We're preparing to upgrade our AMIs, but haven't gotten there yet. I probably won't have time tomorrow and then I'm on vacation next week, so I'll get back to you after that |
Updating our AMIs with the newest CodeDeploy agent fixed the issue for us. |
@Raniz85 I think we can reopen this if you still have issues. Closing this for now. |
I'm testing the new agents now and it seems to be working, but my issue is mostly that something changed internally on the AWS side that caused this. I bake AMIs for some applications specifically to not get new versions of the agents frequently. I have apps that are using agent versions from 4 months ago that suddenly started having problems. |
@jgerry We delete the crontab for the autoupdate feature when baking the AMI. A couple of our systems weren't doing this and started seeing this error when the agent was updating itself to the latest version mid-deploy. I.e., the old version was failing to shut down cleanly when the autoupdate feature triggered a shutdown mid-deploy. |
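A minimal sketch of the bake-time step described above, assuming the auto-update job lives in a single cron file. The thread does not name the file, so the default path below is an assumption to verify on your own image, and `remove_autoupdate_cron` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed location of the agent's auto-update cron entry. This path is not
# stated in the thread; check it on your image before relying on it.
ASSUMED_CRON_FILE = Path("/etc/cron.d/codedeploy-agent-update")


def remove_autoupdate_cron(cron_file=ASSUMED_CRON_FILE):
    """Delete the cron file if it exists. Returns True if a file was removed."""
    path = Path(cron_file)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Run as root during the AMI bake, this would leave the installed agent at whatever version was baked in. Returning False when the file is absent lets the bake script log that nothing was removed, which may mean the path differs on that image.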
@rohkat-aws unfortunately, we too are running into this issue about half of the time with a simple scale-up from one to two servers. We have the latest OFFICIAL_1.0-1.1518_rpm on us-east-1. On autoscale, we push out two repos concurrently that are relatively small. They do, however, call out to composer to pull in external packages. Our solution likely for now is going to be to just bake everything into AMIs, which is cumbersome :/. Is there any ETA on a fix? |
@falcor781 is an update happening, or was the agent already on 1518? Can you please also check your update logs? |
The agent that was used was already 1518 starting from the AMI. I'm not sure which 'update logs' you are referring to as it is starting from an AMI snapshot; nothing should be getting updated. |
@falcor781 just to confirm, you are getting Concurrent::RejectedExecutionError? |
We've seen similar failures on multiple servers; they seem to happen randomly.
Here's the log message:
The redacted base64 string decodes into: