Conversation
| * @param sinkContext | ||
| * @throws Exception IO type exceptions when opening a connector | ||
| */ | ||
| void open(final Map<String, Object> config, SinkContext sinkContext) throws Exception; |
Should this be a WindowedSinkContext?
Not sure if we need a WindowedSinkContext wrapper here. I can't think of any new methods to put in there other than what's already present in SinkContext.
|
@srkukarni can we have an integration test for this? |
| protected Long ram; | ||
| @Parameter(names = "--disk", description = "The disk (in bytes) that need to be allocated per sink instance (applicable only to Docker runtime)") | ||
| protected Long disk; | ||
| @Parameter(names = "--window-length-count", description = "The number of messages per window") |
We have too many CLI args. We need to clean them up at some point: expose only the basic CLI args, and for the advanced ones just allow users to specify them in a function config YAML file.
| } | ||
| windowConfig.setWindowLengthDurationMs(windowLengthDurationMs); | ||
| } | ||
| if (null != slidingIntervalCount) { |
I don't think we should have sliding windows for a batched sink. What would be the use case for that?
Trying to make one up: think about an InfluxDB-type sink where you want to write the average of some value over the last few seconds on a sliding basis?
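To make that use case concrete, here is a minimal sketch of a sliding average. It assumes a count-based window, in the spirit of the window-length-count and sliding-interval-count settings in this PR; a time-based window would follow the same shape. The class name `SlidingAverageSketch` and its `add` method are hypothetical, not part of the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalDouble;

// Hypothetical helper: averages the last `windowLength` values and
// emits a result once every `slidingInterval` values.
final class SlidingAverageSketch {
    private final int windowLength;
    private final int slidingInterval;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;
    private int sinceLastEmit;

    SlidingAverageSketch(int windowLength, int slidingInterval) {
        if (windowLength <= 0 || slidingInterval <= 0) {
            throw new IllegalArgumentException("windowLength and slidingInterval must be positive");
        }
        this.windowLength = windowLength;
        this.slidingInterval = slidingInterval;
    }

    // Adds one value; returns an average only when the window slides.
    OptionalDouble add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowLength) {
            sum -= window.removeFirst(); // evict the oldest value
        }
        sinceLastEmit++;
        if (sinceLastEmit < slidingInterval) {
            return OptionalDouble.empty();
        }
        sinceLastEmit = 0;
        return OptionalDouble.of(sum / window.size());
    }
}
```

With `slidingInterval` smaller than `windowLength`, consecutive averages overlap, which is what separates this from plain batching, where each record lands in exactly one batch.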
We are introducing a batched sink, not a windowed sink, so I don't think the batched sink should have the same semantics and configs as a windowed function. That would be very confusing to users. To start with, we should just have batchSize or batchTime configs that are distinct from the windowing configs.
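As a rough illustration of what batchSize and batchTime configs could mean, the sketch below flushes when either limit is reached. Everything here is an assumption made for illustration: `BatchConfig`, `FlushTrigger`, the field names, and the millisecond unit are hypothetical and do not come from the PR or from Pulsar.

```java
import java.util.function.LongSupplier;

final class BatchPolicySketch {
    // Hypothetical config holder named after the batchSize/batchTime suggestion.
    static final class BatchConfig {
        final int batchSize;     // flush once this many records are buffered
        final long batchTimeMs;  // or once the oldest buffered record is this old

        BatchConfig(int batchSize, long batchTimeMs) {
            if (batchSize <= 0 || batchTimeMs <= 0) {
                throw new IllegalArgumentException("batchSize and batchTimeMs must be positive");
            }
            this.batchSize = batchSize;
            this.batchTimeMs = batchTimeMs;
        }
    }

    // Tracks the current batch and decides when it should be flushed.
    static final class FlushTrigger {
        private final BatchConfig config;
        private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis
        private int buffered;
        private long firstRecordAtMs;

        FlushTrigger(BatchConfig config, LongSupplier clock) {
            this.config = config;
            this.clock = clock;
        }

        // Call once per buffered record; returns true when the batch is due.
        boolean onRecord() {
            if (buffered == 0) {
                firstRecordAtMs = clock.getAsLong();
            }
            buffered++;
            return shouldFlush();
        }

        // Can also be polled from a timer so a quiet topic still flushes.
        boolean shouldFlush() {
            if (buffered == 0) {
                return false;
            }
            return buffered >= config.batchSize
                    || clock.getAsLong() - firstRecordAtMs >= config.batchTimeMs;
        }

        // Call after the batch has been written out.
        void reset() {
            buffered = 0;
        }
    }
}
```

Unlike a sliding window, this policy never places a record in more than one batch, which keeps the semantics separate from the windowing configs.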
I have two high-level comments regarding this feature.
- We are introducing a brand new interface for writing a batch of records. It will confuse people about the interface.
- We are implementing a "batching" logic in the runtime instead of outside of the runtime.
This approach seems too heavy to me. Instead, can't we just implement the logic outside the runtime and have an abstract implementation, BatchSink, that implements the windowing and batching logic? I have seen many sink implementations implement similar batching logic; we could abstract that logic into one implementation and reuse it across different sinks.
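abstract class BatchSink<T> implements Sink<T> {
    @Override
    public void write(Record<T> record) throws Exception {
        // implement the batching logic: buffer the record, then call
        // write(Collection) once the batch is complete
    }
    abstract void write(Collection<Record<T>> records) throws Exception;
}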
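A slightly fuller sketch of this idea, batching by count only. `Record` and `Sink` below are minimal stand-ins for the pulsar-io types, declared so the example is self-contained; the outer `BatchSinkSketch` class, the `batchSize` constructor argument, `flush()`, and `CollectingSink` are hypothetical names for illustration, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class BatchSinkSketch {
    // Stand-ins for the pulsar-io interfaces (assumption: reduced to what is needed here).
    interface Record<T> { T getValue(); }
    interface Sink<T> { void write(Record<T> record) throws Exception; }

    // Buffers single records and hands them to subclasses in batches.
    abstract static class BatchSink<T> implements Sink<T> {
        private final int batchSize; // assumed config: records per batch
        private final List<Record<T>> buffer = new ArrayList<>();

        protected BatchSink(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
        }

        @Override
        public synchronized void write(Record<T> record) throws Exception {
            buffer.add(record);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Writes out whatever is buffered, e.g. on close or from a timer.
        public synchronized void flush() throws Exception {
            if (buffer.isEmpty()) {
                return;
            }
            List<Record<T>> batch = new ArrayList<>(buffer);
            buffer.clear();
            write(batch);
        }

        // Subclasses implement only the batched write.
        protected abstract void write(Collection<Record<T>> records) throws Exception;
    }

    // Example subclass that only remembers the size of each batch it received.
    static final class CollectingSink extends BatchSink<String> {
        final List<Integer> batchSizes = new ArrayList<>();

        CollectingSink(int batchSize) {
            super(batchSize);
        }

        @Override
        protected void write(Collection<Record<String>> records) {
            batchSizes.add(records.size());
        }
    }
}
```

Because the runtime still sees an ordinary `Sink`, this shape needs no new interface in the function backend. A real implementation would also have to decide when records are acknowledged and how a failed batch write is reported, which this sketch leaves out.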
|
I agree with @sijie's comment. @srkukarni can we implement the BatchedSink as an external library instead of having to support yet another interface in the function backend code? |
|
@sijie @jerrypeng that makes sense. I will try to reformulate this PR. Meanwhile, I have removed the 2.3 tag from it so that the release can proceed without it. |
|
@srkukarni should we abandon this and start a new repo for functions extensions? |
|
@srkukarni: Thanks for your contribution. For this PR, do we need to update docs? |
|
Closed as stale. The codebase has evolved quite a lot since then. |
Motivation
Sinks often want to write a collection of messages at once instead of one message at a time. An example is writing to a Snowflake database, where it is far more efficient to write a batch of records at once than one record at a time. For these types of sinks, the current one-record-per-invocation Sink interface is too low-level.
This PR introduces the BatchedSink interface. This interface exposes a write method that receives a collection of records (as opposed to the single record in the Sink interface). Users can configure how this collection is formed using the WindowConfig parameter inside the SinkConfig.
Modifications
We introduce the BatchedSink interface in the pulsar-io/core package.
We also expose a WindowConfig parameter inside the SinkConfig so users can configure the batch sizes/intervals.
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes
Documentation