
Parallel Input Streaming #3992

Closed
jama22 opened this issue Jun 7, 2019 · 10 comments

@jama22 jama22 commented Jun 7, 2019

No description provided.

@jama22 jama22 created this issue from a note in Runtime (In Flight) Jun 7, 2019
@kcmannem kcmannem commented Jun 10, 2019

@cirocosta pointed out, based on the difficulties the rabbitmq team ran into, that streaming artifacts to a scheduled task can take a very long time. A large number of inputs can lead to slow startup times, making it look like Concourse scheduling is slow or lagging. We've addressed this in two ways: gzip is CPU-heavy and slower at compressing/decompressing resources for transfer, so we switched to zstd in #3880, and in this issue we will request that all resources be streamed in at once. This should mean the startup time is now bounded by the time it takes to stream the largest resource.
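A minimal sketch of the fan-out in Go, assuming a hypothetical `Input` interface rather than the real Concourse artifact types (names here are illustrative, not the actual API): instead of streaming inputs one after another, start every stream in its own goroutine and wait for the slowest one.

```go
package streaming

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// Input is a stand-in for a task input that knows how to stream its
// artifact into the destination container (hypothetical interface).
type Input interface {
	Name() string
	Stream(ctx context.Context) error
}

// streamAll starts every input's stream concurrently and returns when
// the slowest one finishes, or as soon as any of them fails.
func streamAll(ctx context.Context, inputs []Input) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, in := range inputs {
		in := in // capture loop variable for the goroutine
		g.Go(func() error {
			if err := in.Stream(ctx); err != nil {
				return fmt.Errorf("streaming %s: %w", in.Name(), err)
			}
			return nil
		})
	}
	return g.Wait()
}
```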

@kcmannem kcmannem self-assigned this Jun 10, 2019
@ddadlani ddadlani commented Jun 10, 2019

We could rate limit the number of parallel streams and expose that number as a command-line flag for configuration. We may need to test on prod/wings and have a benchmark to see the actual benefit that this provides.

We shouldn't merge this before #3819 which will reduce the cost of streaming, but we should benchmark both of these changes separately.
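A hedged sketch of what the rate limit could look like, reusing the hypothetical `Input` and `streamAll` from the sketch above; the flag name is purely illustrative, not an actual Concourse option. A weighted semaphore caps how many streams run at once.

```go
package streaming

import (
	"context"

	"golang.org/x/sync/errgroup"
	"golang.org/x/sync/semaphore"
)

// streamAllLimited streams inputs in parallel, but never more than
// maxInFlight at a time. maxInFlight would come from a flag such as
// --max-parallel-input-streams (illustrative name only).
func streamAllLimited(ctx context.Context, inputs []Input, maxInFlight int64) error {
	if maxInFlight <= 0 {
		return streamAll(ctx, inputs) // fall back to unbounded parallelism
	}

	sem := semaphore.NewWeighted(maxInFlight)
	g, ctx := errgroup.WithContext(ctx)
	for _, in := range inputs {
		in := in // capture loop variable for the goroutine
		g.Go(func() error {
			// Block until one of the maxInFlight slots frees up.
			if err := sem.Acquire(ctx, 1); err != nil {
				return err
			}
			defer sem.Release(1)
			return in.Stream(ctx)
		})
	}
	return g.Wait()
}
```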

@kcmannem kcmannem commented Jun 18, 2019

Ran drills tests where we stream 6 heavy Docker images to a task. Here are the findings:

| In Serial | In Parallel |
|-----------|-------------|
| 7m 30s    | 3m 45s      |
| 7m 17s    | 1m 43s      |
| 4m 43s    | 2m 34s      |
| 7m 13s    | 3m 34s      |

Note:

- I think we got really good volume locality on the second run, so ignore that row

Here are the network/CPU graphs of serial streams vs parallel streams; each hump corresponds to a build run in the order of the table above.
[Screen Shot 2019-06-18 at 12 42 01 PM]
[Screen Shot 2019-06-18 at 12 43 41 PM]

[Screen Shot 2019-06-18 at 12 40 14 PM]
[Screen Shot 2019-06-18 at 12 43 48 PM]

cc @ddadlani

@cirocosta cirocosta commented Jun 18, 2019

Thanks for sharing! Do you have web CPU usage as well? I'm curious to see whether the way we're doing the streaming through the web nodes ends up being CPU-intensive 🤔 Thanks!

@kcmannem kcmannem commented Jun 18, 2019

@cirocosta good thought, it does in fact consume CPU:
[Screen Shot 2019-06-18 at 2 10 43 PM]

@kcmannem kcmannem commented Jun 19, 2019

We repeated the test and saw that the total data transmitted by the two deployments is the same. The conclusion is that even though the network peaks and sustains at different levels, we are transmitting the same amount of data.

[Screen Shot 2019-06-19 at 11 24 30 AM]
[Screen Shot 2019-06-19 at 11 24 23 AM]

@kcmannem kcmannem commented Jun 19, 2019

We took a look at why the CPU spikes during streaming. It has to do with the TCP buffers.
[Screen Shot 2019-06-19 at 2 41 56 PM]

cc @cirocosta
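The root cause isn't spelled out here beyond the screenshot above, but one common way to trim per-stream CPU in a Go proxy is to copy with a larger, pooled buffer so each stream makes fewer read/write syscalls. A hedged sketch, with the buffer size and function name purely illustrative and not taken from the Concourse code:

```go
package streaming

import (
	"io"
	"sync"
)

// copyBufPool reuses 1 MiB buffers across streams. The size is an
// illustrative guess, not a value taken from the Concourse code.
var copyBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 1<<20) },
}

// proxyStream copies one artifact stream through the web node using a
// pooled buffer, reducing the number of syscalls per stream.
func proxyStream(dst io.Writer, src io.Reader) (int64, error) {
	buf := copyBufPool.Get().([]byte)
	defer copyBufPool.Put(buf)
	return io.CopyBuffer(dst, src, buf)
}
```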

@kcmannem kcmannem commented Jun 20, 2019

@vito we ran strabo with the changes and the results are pretty good. We redeployed hush-house to test this. It has a 1-core CPU, and when the streaming started we maxed out the CPU, but it didn't cause any errors and only lasted a short time. It's also important to note this is a very small web node. We hit a peak bandwidth of 2 Gb/s.

[Screen Shot 2019-06-19 at 5 53 36 PM]

The initialization time goes from 1h 20m to 4m:

[Screen Shot 2019-06-20 at 9 41 19 AM]
[Screen Shot 2019-06-20 at 9 40 24 AM]

@kcmannem kcmannem commented Jun 20, 2019

Updated the Helm-deployed Concourse to 3 cores to see how far we can push bandwidth. We get up to 3 Gbps and only reach 60% CPU utilization. We didn't observe any other metrics being affected, other than the Go runtime's GC taking a 60ms pause afterwards, which is also really negligible.

[Screen Shot 2019-06-20 at 10 49 24 AM]

@cirocosta cirocosta commented Jun 20, 2019

Thanks for sharing, @kcmannem !

With those results, I'd say we should also update https://concourse-ci.org/concourse-web.html#web-properties to note that volume streaming should be factored into the decision to scale web nodes vertically.

Thanks!

Projects: Runtime (Accepted)
4 participants