tommyblue left a comment
LGTM. I added a suggestion about InitialPosition, as we found a good strategy to deal with it.
using [vmware-go-kcl](https://github.com/vmware/vmware-go-kcl), an
implementation of the KCL (Kinesis Client Library).
KCL provides a way to to process a single Kinesis stream from multiple Baker
instances, each instances reading a specific set of shards.
- instances, each instances reading a specific set of shards.
+ instances, each instances reading from a max number of shards.
I rephrased but didn't use "max" since the default is 32767; in other words, there is effectively no limit.
You can choose the initial position to read from when the application starts
(that is when the dynamodb table doesn't exist) by setting InitialPosition
either to LATEST or TRIM_HORIZON.
I'd add a suggestion for the value to use (rephrase as you see fit):
Note that those values are used only when a checkpoint doesn't exist in DynamoDB for the shard. If the checkpoint exists, its value is always used first.
How to choose the correct InitialPosition value
When choosing the InitialPosition value, keep in mind that TRIM_HORIZON is useful when a resharding happens: some records may already have been published to the new shards before Baker starts reading them, and with LATEST those records are ignored. On the other hand, with TRIM_HORIZON, the first time Baker starts reading an existing stream it must read all the records held in the shards (the data retention period depends on the stream configuration and ranges from 24 hours to 7 days). If reading all the existing records is a problem, a possible strategy is to start Baker the first time with LATEST (to create the first checkpoint) and then switch to TRIM_HORIZON.
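To make the trade-off concrete, here is a minimal sketch of the selection rule described above: a stored checkpoint always wins, and InitialPosition is only a fallback for shards without one. It is a toy model, not Baker's or vmware-go-kcl's actual code; the `shard` type, the `startIndex` function, the constant names, and the use of slice indexes in place of Kinesis sequence numbers are all assumptions made for illustration.

```go
package main

import "fmt"

// InitialPosition mirrors the two values discussed above. The type and
// constant names are hypothetical, not Baker's or vmware-go-kcl's API.
type InitialPosition string

const (
	Latest      InitialPosition = "LATEST"
	TrimHorizon InitialPosition = "TRIM_HORIZON"
)

// shard is a toy model of a Kinesis shard. A record's index in the slice
// stands in for its sequence number.
type shard struct {
	id      string
	records []string
}

// startIndex returns the index of the first record a reader would consume.
// A stored checkpoint always takes precedence; pos is only a fallback.
func startIndex(s shard, checkpoints map[string]int, pos InitialPosition) int {
	if cp, ok := checkpoints[s.id]; ok {
		return cp + 1 // resume right after the last checkpointed record
	}
	if pos == TrimHorizon {
		return 0 // oldest record still retained in the shard
	}
	return len(s.records) // LATEST: only records published from now on
}

func main() {
	// Only shard-1 has a checkpoint (last processed record: index 41).
	checkpoints := map[string]int{"shard-1": 41}

	// shard-2 was created by a resharding and already holds two records
	// by the time the reader discovers it.
	newShard := shard{id: "shard-2", records: []string{"a", "b"}}
	oldShard := shard{id: "shard-1", records: make([]string, 100)}

	fmt.Println(startIndex(newShard, checkpoints, Latest))      // 2: "a" and "b" are skipped
	fmt.Println(startIndex(newShard, checkpoints, TrimHorizon)) // 0: "a" and "b" are read
	fmt.Println(startIndex(oldShard, checkpoints, Latest))      // 42: checkpoint wins
}
```

In this model, switching from LATEST to TRIM_HORIZON after the first run changes nothing for shards that already have a checkpoint, which is why the two-step strategy avoids re-reading the whole stream while still covering shards created by a later resharding.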
Yes, good addition. I rephrased it a bit to avoid giving too much information here and to remain high-level.
Please have a look.
Co-authored-by: Tommaso Visconti <tommaso.visconti@adroll.com>
❓ What
Document technical decisions, parameters and high-level implementation of the KCL Baker input.
🔨 How to test
✅ Checklists
This section contains checklists for common use cases. Please delete the checklists that don't apply to your current use case (or add another checklist if your use case isn't covered yet).
Has make gofmt-write been run on the code?
Has make govet been run on the code? Has the code been fixed according to the output?