Build appears to succeed but is reported as a failure #4297

Closed
clippermadness opened this issue Apr 27, 2018 · 17 comments

Comments

@clippermadness

https://app.shippable.com/bitbucket/thetalake/portal/runs/1099/1/console
https://app.shippable.com/download/jobConsoles?jobId=5ae27395a74b0e0800aa7a07

This project was using a specific build image from Shippable and was succeeding:
pre_ci_boot:
  image_name: drydock/u16ruball
  image_tag: v6.1.4

I removed that section so that my builds would run faster on the default image on my node, which is v6.3.4. Now the build fails.

But every step of the build that I can see succeeded. What's happening?

@manishas
Contributor

@clippermadness we're investigating this.

@clippermadness
Author

@manishas Any update on this?

@trriplejay

@clippermadness No update yet. It seems that something in our script handler is causing it to exit without marking the job successful, even though all steps have succeeded. How often does this happen? Is it pretty regular?

The next step might be for us to analyze the node right after this occurs, so could you post here the next time you see it happen?

@clippermadness
Author

@trriplejay This happens every time in this project if I remove the pre_ci_boot section of shippable.yml. It is not intermittent.

@clippermadness
Author

@manishas @trriplejay ping :)

@trriplejay

We haven't been able to find the root cause yet, but we did release v6.4.4. You could try changing your runtime version to this and see if it avoids the error.

The other workaround for now could be to just change your runtime version back to 6.1.4, since you know that version works. Then you at least wouldn't have to wait to pull the build image.

I notice that you're using ruby version 2.4.1. This version used to be available directly in our older images, but our 6.1.4 through 6.4.4 images have 2.4.3 instead, so if you were to specify this in your yml, you could avoid the ruby download/install that is happening now in your build. Perhaps this change would also avoid the strange failures. If you specifically need 2.4.1, then you could consider changing your runtime version to 5.8.2, which has ruby 2.4.1 pre-installed.

Let me know if any of these suggestions work for you. We haven't been able to reproduce the error ourselves yet, so it's hard to say when we'll have more information. Thanks for your patience!

@clippermadness
Author

Ok cool - I'll give those ideas a shot and see if they work.

@clippermadness
Author

OK, I followed these steps and it's still failing. The only difference that I can find between the logs of a build that succeeds and one that fails is as follows:

Successful with pre_ci_boot:
https://app.shippable.com/bitbucket/thetalake/portal/runs/1162/1/console

Booting up CEXEC
Running CEXEC script
sudo docker rm -fv $CONTAINER_NAME c.exec.portal.1160.1

Failure without:
https://app.shippable.com/bitbucket/thetalake/portal/runs/1160/1/console

Booting up CEXEC
Running CEXEC script
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/5c0461c2-d85c-486e-987a-3f9e129b2bd4.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/5c0461c2-d85c-486e-987a-3f9e129b2bd4.sh'
Exception Invalid or no script tags received
sudo docker rm -fv $CONTAINER_NAME c.exec.portal.1162.1

@trriplejay

Have you tried setting your runtime version back to 6.1.4? Since that image version seems to work, that might be the best way to avoid pulling.

That error you mention is definitely related. Normally our script handler sets a flag once all commands have completed successfully to indicate the overall success of the job. That's the "script tag" that the error is referring to. For some reason, the tag isn't being set in this case, even though everything is working exactly as normal. I'm still unable to reproduce, but am continuing to investigate.
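
The mechanism described above can be sketched roughly as follows. This is only an illustration under assumptions, not Shippable's script handler: the marker string `__JOB_SUCCESS__` and the function names `run_steps` and `job_status` are hypothetical. It shows how a runner that decides the job status from a success marker will report a failure when the marker is missing, even if every step passed.

```bash
# Hypothetical sketch: the marker text and function names are made up
# for illustration and do not come from Shippable's code.
SUCCESS_TAG="__JOB_SUCCESS__"

# Runs each command in order and prints the marker only if all of them succeed.
run_steps() {
  local step
  for step in "$@"; do
    sh -c "$step" || return 1
  done
  echo "$SUCCESS_TAG"
}

# Decides the job status from the captured output, not from the steps themselves.
job_status() {
  case "$1" in
    *"$SUCCESS_TAG"*) echo "success" ;;
    *) echo "failed: invalid or no script tags received" ;;
  esac
}

# Normal case: steps pass and the marker is emitted.
tagged_output="$(run_steps 'true' 'echo building')"

# Same steps pass, but the marker is never emitted.
untagged_output="$(sh -c 'true; echo building')"

status_tagged="$(job_status "$tagged_output")"
status_untagged="$(job_status "$untagged_output")"

echo "$status_tagged"
echo "$status_untagged"
```

In the second case every step exits with status 0, yet the job is reported as failed, which matches the symptom in this issue.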

@clippermadness
Author

A couple more notes on this.

Switching back to 6.1.4 definitely works.

I also tried changing the underlying node in our subscription. We have been using a 14.04 node, but I changed that to 16.04. That didn't work with either 6.4.4 or 6.3.4: same error.

So at this point, this project builds using the 16.04 6.4.4 node with a pre_ci_boot section in shippable.yml that pulls the older 6.1.4 image.

If I get rid of the pre_ci_boot section and attempt to build on the 6.4.4 image, the build always fails as described above.

@Bit-Doctor

While upgrading to runtime 6.5.4 we noticed the same issue.
https://app.shippable.com/github/thestorefront/tsf-api/runs/5331/1/console
The Console tab doesn't show any problem, but the downloaded logs print:

Booting up CEXEC
Running CEXEC script
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/f056c1bb-3dc4-4a47-8e97-63bf3415f385.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/f056c1bb-3dc4-4a47-8e97-63bf3415f385.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/c2349b56-a431-4fe6-9cbf-53f367adc770.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/c2349b56-a431-4fe6-9cbf-53f367adc770.sh'
Exception Invalid or no script tags received

Also, after bumping to this version I had to add apt-get install libcurl4-openssl-dev in order to get libcurl.
Another thing I noticed is that when rebuilding failed runs, we get an empty git_sync step and no build_ci.
https://app.shippable.com/github/thestorefront/tsf-api/runs/5335/1/console

~/src/github.com/thestorefront/tsf-api ~
fatal: Not a git repository (or any of the parent directories): .git

We have the cache enabled, and after resetting it every step runs properly.

@aurelien-reeves

We have an issue with the cache too.
But in our case nothing appears to succeed. build_ci is not even executed; the process fails at the "git_sync" step with the message "this is not a git repo".

@manishas
Contributor

We’re working on fixing this. @rageshkrishna @ric03uec can look into the issue reported for git sync and build_ci not being executed...

@manishas
Contributor

manishas commented Aug 2, 2018

Ping @ric03uec

@ric03uec

ric03uec commented Aug 2, 2018

@clippermadness @Bit-Doctor this has been fixed and will be available in the next release sometime early next week.
This error is caused by an underlying bug in rvm (rvm/rvm#4416) that was closed recently. The bug was resetting the bash TRAPs in a few of the Shippable scripts, and those TRAPs are essential for the scripts to execute successfully. Without the TRAP functions, the cleanup functions were not getting called, which resulted in failed builds without any actual errors.

We still haven't been able to test the rvm fix successfully (rvm/rvm#4416 (comment)), so we've added some custom logic to work around this issue, which should fix the builds that are failing for you.
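
For readers unfamiliar with this failure mode, here is a minimal bash sketch of it. It is an illustration under assumptions, not the actual Shippable scripts or the rvm code: `run_job` is a hypothetical name, the EXIT trap stands in for the cleanup function that records success, and `trap - EXIT` stands in for whatever reset the traps.

```bash
# Hypothetical sketch: run_job is an illustrative name, not part of
# Shippable or rvm.

# Runs a "job" in a subshell. The EXIT trap plays the role of the
# cleanup function that reports the overall result.
run_job() {
  local clobber="$1"
  (
    trap 'echo "status=success"' EXIT
    if [ "$clobber" = "yes" ]; then
      # Stand-in for a tool that resets the caller's traps.
      trap - EXIT
    fi
    echo "all steps passed"
  )
}

with_trap="$(run_job no)"
without_trap="$(run_job yes)"

printf 'trap intact:\n%s\n\n' "$with_trap"
printf 'trap reset:\n%s\n' "$without_trap"
```

With the trap intact, the output ends with the status line. With the trap reset, the steps still pass but the status line is never printed, so a caller waiting for it has nothing to read.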

I'll keep this issue open till we do the release and you can verify everything is good at your end.

@manishas added approved and removed analyze labels Aug 2, 2018

@clippermadness
Author

Fix verified using Shippable base image 6.7.4, ruby 2.4.1 and rails 5.2.0.
Build times now 6m faster without having to pull the old image.
Thanks!

@ric03uec

ric03uec commented Aug 8, 2018

this is now fixed, closing
