Skip to content

README: document KCL technical decisions/params#59

Merged
arl merged 7 commits intomainfrom
doc/kcl-input
Oct 13, 2020
Merged

README: document KCL technical decisions/params#59
arl merged 7 commits intomainfrom
doc/kcl-input

Conversation

@arl
Copy link
Copy Markdown
Collaborator

@arl arl commented Oct 13, 2020

❓ What

Document technical decisions, parameters and high-level implementation of the KCL Baker input.

🔨 How to test

  1. List all steps necessary;
  2. To test this pull request.

✅ Checklists

This section contains a list of checklists for common uses, please delete the checklists that are useless for your current use case (or add another checklist if your use case isn't covered yet).

  • Is there unit/integration test coverage for all new and/or changed functionality added in this PR?
  • Have the changes in this PR been functionally tested?
  • Has make gofmt-write been run on the code?
  • Has make govet been run on the code? Has the code been fixed accordingly to the output?
  • Have the changes been added to the CHANGELOG.md file?
  • Have the steps in CONTRIBUTING.md been followed to update a Go module?

@arl arl requested a review from tommyblue October 13, 2020 13:40
Copy link
Copy Markdown
Contributor

@tommyblue tommyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I added a suggestion about InitialPosition as we found a good strategy to deal with it

Comment thread README.md Outdated
using [vmware-go-kcl](https://github.com/vmware/vmware-go-kcl), an
implementation of the KCL (Kinesis Client Library).
KCL provides a way to to process a single Kinesis stream from multiple Baker
instances, each instances reading a specific set of shards.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
instances, each instances reading a specific set of shards.
instances, each instances reading from a max number of shards.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased but didn't use max since the default is 32767, in other words, there are not limits.

Comment thread README.md Outdated
Comment thread README.md Outdated

You can choose the initial position to read from when the application starts
(that is when the dynamodb table doesn't exist) by setting InitialPosition
either to LATEST or TRIM_HORIZON.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a suggestion for the value to use (rephrase as you like the most):

Note that those values are used only when a checkpoint doesn't exist in DynamoDB for the shard. If the checkpoint exists, its value is always used first.

How to choose the correct InitialPosition value

When chosing the InitialPosition value, have in mind that TRIM_HORIZON can be useful when a resharding happens. In that case, in fact, before baker starts reading the new shards, some records could have already been published there. In this case using LATEST leads to have those records ignored. On the other hand, using TRIM_HORIZON means that, the first time that baker starts reading an existing stream, all the records hold in the shards must be read (data retention period depends on the stream configuration and is variable between 24 hours to 7 days). If reading all the existing records is a problem, then a possible strategy is to start baker the first time with LATEST (to create the first checkpoint) and then switch to TRIM_HORIZON

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good addition, I rephrased a bit to not give too much information here and remain high-level.
Please have a look

Comment thread README.md Outdated
Comment thread README.md Outdated
Copy link
Copy Markdown
Contributor

@tommyblue tommyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, fixed 2 typos

Comment thread README.md Outdated
Comment thread README.md Outdated
arl and others added 4 commits October 13, 2020 17:34
Co-authored-by: Tommaso Visconti <tommaso.visconti@adroll.com>
Co-authored-by: Tommaso Visconti <tommaso.visconti@adroll.com>
@arl arl merged commit 652efd8 into main Oct 13, 2020
@arl arl deleted the doc/kcl-input branch October 13, 2020 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants