App uploads fail intermittently when GCS is the blobstore #82
Comments
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/155925525 The labels on this GitHub issue will be updated when the story is started.
Possibly relevant error log output:
We are hitting this as well. Pushing an app that is 2.9M, 271 files failed 22 out of 74 attempts. Unfortunately, I don't think CAPI currently provides a way to set this timeout for GCS. The dependency chain of gems is fog -> fog-google -> google-api-ruby-client -> httpclient. The default The interesting part is when we manually removed the Timeout check by changing Ruby files on the CC VM, everything passed. We're running more tests now to determine how changing the timeout value affects our success rate. The mystery is still why it takes > 60 seconds to upload a 3MB app...
We just re-ran our tests with the default timeout of 60 seconds, which failed 7/10 times. Hacking the gem to bump the timeout to 5 minutes failed 3/10 times. Removing the timeout entirely passed all 10 times. The sample size is maybe too small to tell for sure, but there does seem to be a trend.
To clarify, we are only seeing this issue when configuring GCS using a service account key, which uses the code path that @ljfranklin mentioned above. When we use an interop key, which uses the fog-aws gem, it seems to work as expected.
I'm hitting the same issue, using the use-gcs-blobstore.yml & use-gcs-blobstore-service-account.yml ops files. I am going to try it in the morning with the interop key and I'll report back.
I can confirm that removing the
I am hitting the same issues with Azure as well. Is there any fix or possible workaround?
After we hit the issue in Azure we tried this with AWS as well and also hit this issue.
The issue is fixed for Azure. Microsoft has already provided a fix.
We have recently:
Are y'all still seeing issues? Or can we close this? Thanks,
We are still seeing this issue on 2.4.6 of PAS for PCF. Has a fix for this been put in at all?
We have recently bumped the
Issue
The Zipkin CATs suite regularly fails when run against an environment using a GCS blobstore. This appears to be because these tests are pushing a realistically sized Java Spring application, rather than a tiny test app.
We've now turned off the Zipkin tests in the environment experiencing the issue, but an example of the output is available here: https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-deployment/jobs/fresh-cats/builds/970
Steps to Reproduce
use-gcs-blobstore.yml
or use-gcs-blobstore-service-account.yml
CONFIG=~/workspace/my-env/integration_config.json ./bin/test -focus "Zipkin" -untilItFails
Optional step 4:
parallel cf push bubble-dog-{} ::: A B C D E F G H I J K L M N O P
in an app directory containing a bit less than 1G of dog gifs
Note: This was tested on an environment with a bunch more customizations, which we assume aren't relevant but haven't confirmed.
Expected result
Test continues to pass indefinitely, or until the load balancer throws a 502
Current result
Within ~5 runs, we see the test fail with the following error:
If you're running the optional step #4 you are likely to see this failure immediately.
Possible Fix
We might just need to increase the blobstore timeout when using GCS.
Google recommends retrying with truncated exponential backoff in response to GCS errors
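As a rough illustration of what truncated exponential backoff looks like, here is a minimal Python sketch. It is not CAPI, fog-google, or google-api-ruby-client code. The names `backoff_delays` and `retry_with_backoff`, the 1 second base, the 32 second cap, the retry count, and the choice of which exceptions count as retriable are all assumptions made for the example. The wait before retry n is `min(base * 2**n + jitter, cap)`, so the delay doubles each time until it is truncated at the cap.

```python
import random
import time


def backoff_delays(max_retries, base=1.0, cap=32.0, jitter=random.random):
    """Yield one wait time per retry: min(base * 2**n + jitter, cap)."""
    for attempt in range(max_retries):
        # The cap is what makes the backoff "truncated".
        yield min(base * (2 ** attempt) + jitter(), cap)


def retry_with_backoff(operation, max_retries=5, base=1.0, cap=32.0,
                       retriable=(TimeoutError, ConnectionError),
                       sleep=time.sleep, jitter=random.random):
    """Call operation(); on a retriable error, wait and call it again.

    `operation` is a hypothetical zero-argument callable standing in for a
    blobstore upload. Non-retriable errors propagate immediately.
    """
    delays = backoff_delays(max_retries, base, cap, jitter)
    while True:
        try:
            return operation()
        except retriable:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted, surface the last error
            sleep(delay)
```

`sleep` and `jitter` are parameters only so the behaviour can be exercised without real waiting. A caller would wrap the upload itself, for example `retry_with_backoff(lambda: upload(path))`, where `upload` is a placeholder for whatever performs the transfer.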