We see a lot of top_k operators in computations. Things like cc and bfs use them for graph computation, and several of the TPCH queries have similar "top blah" queries (e.g. Query 15).
It is not hard to implement top_k with group, but there are much more efficient implementations you could imagine. Or, there are optimizations to group that make sense for top_k and perhaps not as much sense for more general group computation. For example:
-
For each key, group reads out all of the value histories from the input and output traces. We could do this lazily, as each evaluation of the "user logic" only needs to pull record until they observe k-ish values.
-
By watching which values are observed, we can restrict the future times that need to be warned about. If we only observe the histories of the first three values before finding enough values, we needn't warn about times not found in those first three values. Times in all the other values (e.g. often later iterations, in computations like cc,bfs), can potentially be ignored.
-
When attending to new changes, we have the potential to swap in the output trace for values before the smallest value in the new changes. This needs to be double checked for correctness, but it seems that there is the ability to perform this substitutions safely in several cases.
-
In the same setting as above, we can advance the substituted values from the output trace up to the smallest value in the new changes, as the specific values of these values are not as important as the fact that they are less or equal to the new changes and not changing. This has the potential to reduce the number of moments of variation, as any changes among the smallest values collapse down to just the moment where the size of the set changes.
-
Having swapped in the output trace, advanced values, and advanced times, we can evaluate an interval by starting from the times in the input trace and evaluating the top_k logic at each of these, warning about the times that might be interesting in subsequent values, but only those values we need to explore to fill out the output. This restricts our attention to the moments where the count of elements in output changes, and only to the changes for values that would emerge. All of the updates to values greater than these, those that never appear in the output, are ignored.
All of this makes for a pretty sweet implementation, that should make top_k pretty effective. Right now TPCH's Query 15 is a real pain point for e.g. DBToaster, and this sort of implementation seems like it would clear up all sorts of issues, plus make bfs lots faster.
We see a lot of
top_koperators in computations. Things likeccandbfsuse them for graph computation, and several of the TPCH queries have similar "top blah" queries (e.g. Query 15).It is not hard to implement
top_kwithgroup, but there are much more efficient implementations you could imagine. Or, there are optimizations togroupthat make sense fortop_kand perhaps not as much sense for more generalgroupcomputation. For example:For each key,
groupreads out all of the value histories from the input and output traces. We could do this lazily, as each evaluation of the "user logic" only needs to pull record until they observe k-ish values.By watching which values are observed, we can restrict the future times that need to be warned about. If we only observe the histories of the first three values before finding enough values, we needn't warn about times not found in those first three values. Times in all the other values (e.g. often later iterations, in computations like cc,bfs), can potentially be ignored.
When attending to new changes, we have the potential to swap in the output trace for values before the smallest value in the new changes. This needs to be double checked for correctness, but it seems that there is the ability to perform this substitutions safely in several cases.
In the same setting as above, we can advance the substituted values from the output trace up to the smallest value in the new changes, as the specific values of these values are not as important as the fact that they are less or equal to the new changes and not changing. This has the potential to reduce the number of moments of variation, as any changes among the smallest values collapse down to just the moment where the size of the set changes.
Having swapped in the output trace, advanced values, and advanced times, we can evaluate an interval by starting from the times in the input trace and evaluating the
top_klogic at each of these, warning about the times that might be interesting in subsequent values, but only those values we need to explore to fill out the output. This restricts our attention to the moments where the count of elements in output changes, and only to the changes for values that would emerge. All of the updates to values greater than these, those that never appear in the output, are ignored.All of this makes for a pretty sweet implementation, that should make
top_kpretty effective. Right now TPCH's Query 15 is a real pain point for e.g. DBToaster, and this sort of implementation seems like it would clear up all sorts of issues, plus make bfs lots faster.