Better Scaling for Parallel Workers #273

Open
gatoWololo opened this issue Jun 17, 2020 · 0 comments

Hello,
my name is Omar and this summer I am doing an internship at VMware with @ryzhyk and @mbudiu-vmw.

My project is pretty general right now: get better parallel worker scaling for differential datalog programs (I don't have concrete numbers yet on how ddlog programs scale).

I have spent the last couple of weeks poking around the differential-dataflow and timely-dataflow repositories to see how things fit together, and reading the relevant blog posts: https://github.com/frankmcsherry/blog. There is a lot of information, so I can't say I have absorbed it all.

I was hoping to get some advice or thoughts on this project. It seems that batch size and timestamp granularity have a subtle interplay with latency and throughput. As I see it, there are two general ways to approach the project:

  1. Top-down starting at ddlog: Understand which kinds of workflows we want to scale better and profile them. Then tune the number of workers, timestamp granularity, operators, etc., to best exploit differential-dataflow's parallelism. Perhaps there will be common cases among the ddlog programs that I can optimize the differential or timely implementation for.

  2. Working at lower layers: Work at the timely or differential level and try to achieve better parallelism by profiling timely or differential programs that exhibit poor scaling. I don't understand a lot of the timely internals, so it is currently unclear to me how feasible this approach is. And even if it is feasible, the workflows we're interested in may not see much speedup from these changes. Any thoughts on possible bottlenecks or places where we could hope to optimize the execution for better parallel scaling?
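
Either approach starts from measured run times at different worker counts. As a rough sketch of how such measurements could be summarized, the snippet below computes speedup, parallel efficiency, and the Karp-Flatt serial fraction (a standard estimate of how much of a run behaves as if it were sequential). The function names are hypothetical, and the timings in `main` are made-up placeholders, not real ddlog numbers.

```rust
/// Speedup of a run relative to the single-worker run.
fn speedup(t1_secs: f64, tp_secs: f64) -> f64 {
    t1_secs / tp_secs
}

/// Parallel efficiency: speedup divided by worker count (1.0 is ideal).
fn efficiency(t1_secs: f64, tp_secs: f64, workers: u32) -> f64 {
    speedup(t1_secs, tp_secs) / workers as f64
}

/// Karp-Flatt metric: the experimentally determined serial fraction.
/// Undefined for a single worker, so `None` is returned in that case.
fn serial_fraction(t1_secs: f64, tp_secs: f64, workers: u32) -> Option<f64> {
    if workers < 2 {
        return None;
    }
    let p = workers as f64;
    let s = speedup(t1_secs, tp_secs);
    Some((1.0 / s - 1.0 / p) / (1.0 - 1.0 / p))
}

fn main() {
    // Placeholder (workers, wall-clock seconds) pairs; NOT real measurements.
    let runs = [(1u32, 8.0), (2, 5.0), (4, 3.5), (8, 2.75)];
    let t1 = runs[0].1;

    for &(workers, secs) in &runs {
        let frac = match serial_fraction(t1, secs, workers) {
            Some(f) => format!("{:.2}", f),
            None => "n/a".to_string(),
        };
        println!(
            "workers={} speedup={:.2} efficiency={:.2} serial_fraction={}",
            workers,
            speedup(t1, secs),
            efficiency(t1, secs, workers),
            frac
        );
    }
}
```

A serial fraction that stays flat as workers are added suggests an inherently sequential portion of the computation, while one that grows with the worker count points toward parallel overhead such as communication or coordination between workers.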

Thank you for any feedback or thoughts.
