Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for time-bucketing in fixed size tasks #291

Closed
divergentdave opened this issue Jul 11, 2023 · 2 comments
Closed

Support for time-bucketing in fixed size tasks #291

divergentdave opened this issue Jul 11, 2023 · 2 comments
Assignees

Comments

@divergentdave
Copy link
Contributor

divviup/janus#1574 proposes adding a new task configuration parameter that is implementation-specific, and only applicable when a task's query type is fixed_size. This means we will be changing the aggregator API by adding a second field to the QueryType::FixedSize enum variant. The field will be of type Option<Duration>. (When absent, fixed size tasks will function as normal, and when present, fixed size tasks will group reports by time, with the given granularity, and assign them to batches arbitrarily within those groups) It is only of relevance to the leader as well, since the leader is responsible for assembling fixed size batches. If the leader and helper somehow disagreed about this parameter, the leader's view of it would win, and there wouldn't be any other ill effects.

This will affect divviup-api directly because of the API change (though I think I can make this backwards-compatible with #[serde(default)]) but it will also break an assumption of the database schema. Currently, the max_batch_size column stores both the task's query type (encoded through nullability) and the maximum batch size, if not null. After this change, there will be a second query type-specific parameter. We could add a new nullable integer column for the time bucketing duration, and let max_batch_size's nullability still indicate the task's query type, but it might be more robust to store the query type separately.

Looking further out, we will also need to be ready for other DAP implementations to not support this parameter, since it's implementation-specific. It's possible such an aggregator may support the aggregator API, but either return an error if the time bucketing parameter is present, or ignore the parameter entirely and successfully create the task.

@branlwyd
Copy link
Member

Regarding other aggregator implementations: I think this should effectively be a parameter that is only available to the user if Divvi Up is the Leader. We shouldn't expect other aggregators to support our implementation's features over-and-above what is required for DAP.

@jbr
Copy link
Contributor

jbr commented Jul 12, 2023

This seems like the sort of thing that could be addressed with an initial capabilities check on Aggregator creation, so aggregators could report a list of supported query types and vdafs (as well as confirming that the bearer token works and fetching the dap url and possibly even the allowed server role/s). As long as we're going to build out "some aggregators support different options" we might as well build it in a way that is extensible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants