Hello,
my name is Omar and this summer I am doing an internship at VMware with @ryzhyk and @mbudiu-vmw.
My project is pretty general right now: get better parallel worker scaling for differential datalog programs (I don't have concrete numbers yet on how ddlog programs scale).
I have spent the last couple of weeks poking around the differential-dataflow and timely-dataflow repositories to see how things fit together, and reading relevant blog posts: https://github.com/frankmcsherry/blog. There is a lot of information, so I can't say I have absorbed it all.
I was hoping to get some advice or thoughts on this project. It seems that batch size and timestamp granularity have a subtle interplay with latency and throughput. As I see it, there are two general ways to approach the project:
Top-down, starting at ddlog: Understand what kinds of workflows we want to scale better and profile them. Then tune the number of workers, timestamp granularity, operators, etc., to best exploit differential-dataflow's parallelism. Perhaps there will be common cases among ddlog programs that I can optimize the differential or timely implementation for.
Working at lower layers: Work at the timely or differential level and try to achieve better parallelism by profiling timely/differential programs that exhibit poor scaling. I don't understand much of the timely internals yet, so it is currently unclear to me how feasible this approach is. And even if it is feasible, the workflows we're interested in may not see much speedup from these changes. Any thoughts on possible bottlenecks, or places where we could hope to optimize the execution for better parallel scaling?
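To make the batch size / timestamp granularity trade-off above concrete, here is a minimal stdlib-only Rust sketch of the idea as I understand it: assigning updates to coarser "epochs" collapses many logical times into one, which gives bigger batches (better throughput) at the cost of results only becoming final at epoch boundaries (worse latency). This is not the differential-dataflow API; the `coarsen` helper and the granularity values are hypothetical, purely for illustration.

```rust
/// Round a logical timestamp down to its epoch boundary.
/// (Hypothetical helper, not part of timely/differential.)
fn coarsen(time: u64, granularity: u64) -> u64 {
    (time / granularity) * granularity
}

fn main() {
    // Ten updates arriving at consecutive logical times 0..9.
    let times: Vec<u64> = (0..10).collect();

    for g in [1u64, 5] {
        // Count how many distinct epochs the same updates fall into:
        // fewer distinct times means larger batches per time.
        let mut epochs: Vec<u64> = times.iter().map(|&t| coarsen(t, g)).collect();
        epochs.dedup(); // input is sorted, so dedup leaves distinct epochs
        println!("granularity {}: {} distinct epochs", g, epochs.len());
    }
}
```

With granularity 1 each update gets its own time (10 epochs, lowest latency); with granularity 5 the same updates collapse into 2 epochs, so each batch is five times larger but downstream consumers see results less often.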
Thank you for any feedback or thoughts.