Dataflow: add batch mode #108

Open
mgolosova opened this issue Feb 5, 2018 · 0 comments


Batch mode allows processing more than one message at a time.
It is essential for Sink Connector stages (like Stage 060 or Stage 069), as most storages support bulk data loading -- which makes things much faster than uploading one record at a time.
It might also be useful for processing stages (see PR #84).

Since we currently use the common library only for Processing stages, not connectors, we can add batch mode for them first (and later, possibly, reuse it for Sink Connectors).

What is needed:

  • add batch processing machinery to the pyDKB library.
    Parameters:
    • batch processing on/off;
    • max batch size;
    • max batch time (how long to wait before starting to process fewer than max_batch_size messages);
  • make it reusable (keep Sink Connectors in mind);
  • make it optional (so that if batch mode is not supported (as in already developed stages), it wouldn't break anything; at most, it would produce a warning message on startup);
  • (possibly) update existing stages (if it is still necessary in spite of the previous item);
  • implement it for Stage 95 for testing;
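The accumulate-then-flush logic described above could be sketched roughly as follows. This is only an illustration of the two flush triggers (max batch size and max batch time), not pyDKB code; the class and parameter names (`BatchBuffer`, `max_size`, `max_time`) are hypothetical:

```python
import time


class BatchBuffer:
    """Accumulate messages; flush when either the size or the time limit is hit.

    Illustrative sketch only -- pyDKB's actual interface may differ.
    """

    def __init__(self, process, max_size=100, max_time=5.0):
        self.process = process    # callable taking a list of messages
        self.max_size = max_size  # flush after this many messages
        self.max_time = max_time  # flush after this many seconds
        self.messages = []
        self.first_ts = None      # arrival time of the oldest buffered message

    def add(self, msg):
        """Buffer one message; flush automatically when a limit is reached."""
        if self.first_ts is None:
            self.first_ts = time.monotonic()
        self.messages.append(msg)
        if self.full():
            self.flush()

    def full(self):
        """True when the size limit or the waiting-time limit is exceeded."""
        return (len(self.messages) >= self.max_size
                or time.monotonic() - self.first_ts >= self.max_time)

    def flush(self):
        """Hand the accumulated batch to the processing callback and reset."""
        if self.messages:
            self.process(self.messages)
            self.messages = []
            self.first_ts = None
```

With `max_size=1` this degrades to one-message-at-a-time processing, which suggests one way to keep batch mode optional for stages that do not support it.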