Skip to content

SAMZA-2479: Add configurable default for min compaction lag ms#1305

Merged
mynameborat merged 6 commits into
apache:masterfrom
bkonold:compact_lag_config
Mar 17, 2020
Merged

SAMZA-2479: Add configurable default for min compaction lag ms#1305
mynameborat merged 6 commits into
apache:masterfrom
bkonold:compact_lag_config

Conversation

@bkonold
Copy link
Copy Markdown
Contributor

@bkonold bkonold commented Mar 6, 2020

Feature: Allow changelog min.compaction.lag.ms to be assigned a configurable default value that will apply to all stores if present, and unless overriden by a store-specific config.

The need for a different value comes from an evaluation done internally (within LI) that we can have a more relaxed setting than 4hr across all our jobs. The code default was chosen when the feature was implemented and is now part of a released build of samza; this 4hr value has been communicated to OSS customers and since changing this value has an operational impact on customers' kafka deployments (storage space) and each customer may have different thresholds for what is acceptable in terms of cost, making this default configurable empowers them to easily configure their samza/kafka clusters.

Changes: Introducing stores.default.changelog.min.compaction.lag.ms store config as a way to indicate a default value that will apply to all changelogs unless overriden for a store specifically.

Tests: Added a unit test which verifies the config works as expected and honors the correct precedence.

API Changes: None
Upgrade instructions: None
Usage instructions: Value of the configuration must be specified in milliseconds. Use stores.default.changelog.min.compaction.lag.ms to configure a default, and stores.<store-name>.changelog.min.compaction.lag.ms to set a store specific value. The precedence is as follows: 1) store specific configuration; if not present then 2) configured default; if not present then 3) default hard-coded in StorageConfig.java

@bkonold
Copy link
Copy Markdown
Contributor Author

bkonold commented Mar 6, 2020

@prateekm @bharathkk can you take a look?

Copy link
Copy Markdown
Contributor

@mynameborat mynameborat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions

  1. What do you think about having this configuration under JobConfig hierarchy instead of StoreConfig? I see why it is the way it is in your current PR and the reason for it to belong under StoreConfig. However, stores.default.... dilutes the (implicit) stronger contract of what follows stores is the store name and makes default as a reserved store name.
  2. What are the challenges in coming up with code default for it? Can you add details in PR description about the limitations of code default and the need for configurability?

@bkonold
Copy link
Copy Markdown
Contributor Author

bkonold commented Mar 10, 2020

Few questions

  1. What do you think about having this configuration under JobConfig hierarchy instead of StoreConfig? I see why it is the way it is in your current PR and the reason for it to belong under StoreConfig. However, stores.default.... dilutes the (implicit) stronger contract of what follows stores is the store name and makes default as a reserved store name.
  2. What are the challenges in coming up with code default for it? Can you add details in PR description about the limitations of code default and the need for configurability?

@mynameborat

  1. This sounds reasonable. So you are suggesting we change this config to job.default.changelog...? I am good with that; I was only following what seemed to be an already established precedent in our configs w.r.t. "default" store configs: https://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html#advanced-storage-configurations. Take a look at replication factor.

  2. Will respond here and add to the PR. The need for a different value comes from an evaluation done internally (within LI) that we can have a more relaxed setting than 4hr across all our jobs. The code default was chosen when the feature was implemented and is now part of a released build of samza; this 4hr value has been communicated to OSS customers and since changing this value has an operational impact on customers' kafka deployments (storage space) and each customer may have different thresholds for what is acceptable in terms of cost, making this default configurable empowers them to easily configure their samza/kafka clusters.

Copy link
Copy Markdown
Contributor

@mynameborat mynameborat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor documentation questions.
Looks good to me

Comment thread docs/learn/documentation/versioned/jobs/configuration-table.html
@mynameborat mynameborat merged commit 5e52b28 into apache:master Mar 17, 2020
@bkonold bkonold deleted the compact_lag_config branch June 10, 2020 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants