Better Scaling for Parallel Workers #273

gatoWololo · 2020-06-17T21:52:37Z

Hello,
my name is Omar and this summer I am doing an internship at VMware with @ryzhyk and @mbudiu-vmw.

My project is pretty general right now: get better parallel worker scaling for differential datalog programs (I don't have concrete numbers yet on how ddlog programs scale).

I have spent the last couple of weeks poking around the differential-dataflow and timely-dataflow repositories, seeing how things fit together, and reading the relevant blog posts: https://github.com/frankmcsherry/blog. There is a lot of information, so I can't say I have absorbed it all.

I was hoping to get some advice or thoughts on this project. It seems that batch size and timestamp granularity have a subtle interplay with latency and throughput. As I see it, there are two general ways to approach the project:

Top-down, starting at ddlog: Understand which workflows we want to scale better and profile them. Then tune the number of workers, timestamp granularity, operators, etc., to best exploit differential-dataflow's parallelism. Perhaps there will be common cases among the ddlog programs for which I can optimize the differential or timely implementation.
Working at lower layers: Work at the timely or differential level and try to achieve better parallelism by profiling timely or differential programs that scale poorly. I don't understand a lot of the timely internals, so it is currently unclear to me how feasible this approach is. And even if it is feasible, the workflows we're interested in may not see much speedup from these changes.

Any thoughts on possible bottlenecks, or on places where we could hope to optimize the execution for better parallel scaling?

Thank you for any feedback or thoughts.

gatoWololo mentioned this issue Jun 19, 2020

CRDT Slower as workers are added. frankmcsherry/dynamic-datalog#5

Closed
