BOSH Stemcells are very slow to download #94

Closed
xoebus opened this issue May 14, 2019 · 10 comments

@xoebus

xoebus commented May 14, 2019

When downloading stemcells from https://bosh.io/stemcells I'm getting download speeds one to two orders of magnitude slower than what my connection is capable of. I'm on wired office internet and I can download files from Google Cloud Storage at around 70 megabytes per second (~500 megabits per second).

I've graphed the download speed of a typical stemcell download session below.

[Graph: stemcell_download_speed]

The specific stemcell I downloaded was: https://s3.amazonaws.com/bosh-core-stemcells/315.13/bosh-stemcell-315.13-google-kvm-ubuntu-xenial-go_agent.tgz

As you can see, this is always an order of magnitude slower than my connection and sometimes drops to two orders of magnitude slower. This is really frustrating because these files are so large.

Do you have any idea what could be causing this?

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/166018258

The labels on this github issue will be updated when the story is started.

@xoebus
Author

xoebus commented Jun 10, 2019

I've got some actual numbers from GCP. I uploaded a stemcell to GCS and then measured downloading it from there and then from the original endpoint.

GCP

$ time curl https://storage.googleapis.com/xoebus-test/bosh-stemcell-315.34-warden-boshlite-ubuntu-xenial-go_agent.tgz -o gcp.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  492M  100  492M    0     0  56.8M      0  0:00:08  0:00:08 --:--:-- 68.9M

real	0m8.681s
user	0m0.966s
sys	0m1.397s

AWS

$ time curl https://s3.amazonaws.com/bosh-core-stemcells/315.34/bosh-stemcell-315.34-warden-boshlite-ubuntu-xenial-go_agent.tgz -o aws.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  492M  100  492M    0     0  2820k      0  0:02:58  0:02:58 --:--:-- 2992k

real	2m58.867s
user	0m2.028s
sys	0m4.893s

8 seconds vs. nearly 180 seconds makes it seem like there's something fishy with the bucket setup.
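
For anyone who wants to reproduce the comparison, here is a minimal sketch of the measurement and the arithmetic behind it. The function names `download_seconds` and `slowdown` are hypothetical helpers, not part of any BOSH tooling; the timings passed in at the end are the `real` values from the two runs above.

```shell
# Download a URL, discard the body, and print the total transfer time in
# seconds using curl's %{time_total} write-out variable.
# Needs network access, so it is only defined here, not called.
download_seconds() {
  curl --silent --show-error --fail --location --output /dev/null \
       --write-out '%{time_total}\n' "$1"
}

# Print how many times slower the second timing is than the first.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
slowdown() {
  LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# 8.681 s from GCS vs. 2m58.867s (178.867 s) from S3.
slowdown 8.681 178.867
```

With those two timings the S3 download comes out roughly 20 times slower, which matches the ratio of the average speeds curl reported (56.8M vs. 2820k).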

@xoebus
Author

xoebus commented Jul 18, 2019

It just took me nearly 2 hours to download a BOSH-lite stemcell. 😭

Can I help you with any work to improve this? I expect we're wasting a tonne of time across the org waiting for stemcell downloads (on workstations and in pipelines).

@dpb587-pivotal
Contributor

At one point some downloads were being fronted by CloudFront CDN; not sure if/when/why that may have changed. I think there was some code to recognize and convert supported buckets to the CDN. Let me know if I can help.

@christarazi
Contributor

christarazi commented Jul 24, 2019

We enabled transfer acceleration on S3 a while ago, but it seems that you have to do some more legwork to actually use it. We are looking into that today.

Update:

It looks like we will have to change all the URLs to the accelerated endpoint.

@xoebus
Author

xoebus commented Jul 24, 2019

I also did a little experiment to mirror all of the stemcells into a GCS bucket using their storage transfer service. It was able to mirror everything (500 GB) in around two minutes. This makes it seem like it might be an issue with our office (the peering between AWS and our ISP?) rather than the bucket itself?

@christarazi
Contributor

christarazi commented Jul 24, 2019

Interesting. That might be another issue.

We were able to confirm that turning on acceleration brings the downloads closer to GCP levels. Here are the links:

Fast

time curl https://bosh-core-stemcells.s3-accelerate.amazonaws.com/315.34/bosh-stemcell-315.34-warden-boshlite-ubuntu-xenial-go_agent.tgz -o aws.tar.gz

Slow

time curl https://s3.amazonaws.com/bosh-core-stemcells/315.34/bosh-stemcell-315.34-warden-boshlite-ubuntu-xenial-go_agent.tgz -o aws.tar.gz

So for now, you can use the "fast" link as a template for future downloads as we figure out how to change all the URLs.
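
Using the "fast" link as a template amounts to a simple URL rewrite, sketched below. This assumes the path-style form `https://s3.amazonaws.com/<bucket>/<key>` seen in this thread; the function name `accelerate_url` is hypothetical and is not part of bosh.io or the BOSH CLI.

```shell
# Rewrite a path-style S3 URL to the bucket's transfer-acceleration endpoint:
#   https://s3.amazonaws.com/<bucket>/<key>
#     -> https://<bucket>.s3-accelerate.amazonaws.com/<key>
# URLs that do not match that shape are printed unchanged with exit status 1.
accelerate_url() {
  url=$1
  prefix='https://s3.amazonaws.com/'
  case "$url" in
    "$prefix"*/*) ;;                          # has both a bucket and a key
    *) printf '%s\n' "$url"; return 1 ;;
  esac
  rest=${url#"$prefix"}                       # <bucket>/<key>
  bucket=${rest%%/*}                          # text before the first slash
  key=${rest#*/}                              # text after the first slash
  printf 'https://%s.s3-accelerate.amazonaws.com/%s\n' "$bucket" "$key"
}

# Example with the "slow" link from this comment:
accelerate_url 'https://s3.amazonaws.com/bosh-core-stemcells/315.34/bosh-stemcell-315.34-warden-boshlite-ubuntu-xenial-go_agent.tgz'
```

The result can be passed straight to curl, for example `curl -o stemcell.tgz "$(accelerate_url "$slow_url")"`, where `slow_url` holds a link copied from bosh.io.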

@xoebus
Author

xoebus commented Jul 26, 2019

Just tried it out and it works great - thank you!

Want me to leave this open until the https://bosh.io change is made or should I close it?

@christarazi
Contributor

We can leave it open.

@cjnosal
Contributor

cjnosal commented Nov 14, 2019

Closing - stemcells were moved to accelerated buckets in https://www.pivotaltracker.com/story/show/168994545
