perf: enabled streams standby replicas by default #3641
@derekjn @mjsax, we can continue the discussion here on choosing from the approaches below. a) Land this change and enable it by default. Existing deployments will see additional resource consumption, but we can message this along with the other changes we are making. None of this matters for the single-node quickstart/Docker playground use cases.
BREAKING CHANGE: existing multi-instance KSQL deployments could see increased disk usage due to the additional standby state replication, with the benefit of improved state availability for pull queries. By default, we tolerate one instance failure, and there is no effect on single-instance deployments.
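A rough model of the trade-off described in the BREAKING CHANGE note, offered as a sketch only. It assumes that a standby is never placed on the same instance as its active task (the point raised later in this thread), so the number of local state copies is capped by the instance count. The function name and the returned fields are illustrative, not part of KSQL or Kafka Streams.

```python
def standby_tradeoff(num_standby_replicas: int, num_instances: int) -> dict:
    """Rough model of what num.standby.replicas buys in a KSQL cluster.

    Assumption: an active task and its standbys always live on distinct
    instances, so copies beyond the instance count cannot be placed.
    """
    if num_standby_replicas < 0 or num_instances < 1:
        raise ValueError("need num_standby_replicas >= 0 and num_instances >= 1")
    copies = min(1 + num_standby_replicas, num_instances)  # active + placed standbys
    return {
        "state_copies": copies,
        # Instances that can fail while a local copy of the state still exists
        # elsewhere (i.e. without a full restore from the changelog topic).
        "tolerated_failures": copies - 1,
        # Cluster-wide local state disk, relative to running with no standbys.
        "disk_multiplier": copies,
    }


if __name__ == "__main__":
    for instances in (1, 2, 3):
        print(instances, standby_tradeoff(1, instances))
```

With one standby, a single-instance deployment gets no extra copy (no effect), while two or more instances carry two copies of each store: roughly double the local state disk in exchange for tolerating one instance failure.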
LGTM.
LGTM - please get a +1 from one of the cloud folk (@vcrfxia @stevenpyzhang @spena)
Capturing the conversation with @derekjn here. IMO, we should ideally have enabled standby replication in KStreams by default, since the KStreams design does not rely on checkpointing to recover state upon instance failures. Given the changes we are making to KSQL anyway, it seems prudent to do this now, so that even pull/streaming queries will see meaningful speedups in recovery times.
LGTM. cc @rodesai in case we want to override this default in cloud later. Using the default is definitely fine for now.
This one makes me wonder. We have a long-standing debate about whether the default OOTB settings should be suitable for dev, test, prod, or some other flavor of prod :) For example, for some large-scale app use cases it makes very little sense to double the resource usage, whereas for others (typically with smaller data volumes) it's a great default. My suggestion would be to stick with what we have (where the default is no standby replicas) and revisit #817. Basically, we should provide an entire properties file that is tuned for "resilient, production deployments". Whether this is the default file or a secondary one clearly marked as "use this one for prod!!!" is an open question, but we should be consistent. The worst of all worlds would be to have some max-resilience settings in a dev-oriented properties file and others in a separate properties file or only mentioned in the docs.
(In reply to Victoria Xia's approval of this pull request on Tue, Oct 22, 2019 at 7:15 PM.)
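The layering idea above, where one file stays dev-oriented and a clearly marked production file adds the resilience settings on top, can be sketched as follows. This is a minimal illustration, not how KSQL actually loads its configuration: the file contents, the `ksql.streams.` pass-through prefix on the standby key, and the helper names are all assumptions. The fallback to zero reflects the Kafka Streams default for `num.standby.replicas` cited later in this thread.

```python
# Sketch: a dev-oriented properties file plus a production overlay (#817).
# File contents and helper names are hypothetical.

KAFKA_STREAMS_DEFAULT_STANDBYS = 0  # Kafka Streams default for num.standby.replicas
STANDBY_KEY = "ksql.streams.num.standby.replicas"  # assumed KSQL pass-through prefix


def parse_properties(text: str) -> dict:
    """Minimal .properties reader: key=value lines, '#' or '!' comments.
    Simplification: ignores escapes, ':' separators and line continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def layer(*files: str) -> dict:
    """Merge files in order; later files override earlier ones key by key."""
    merged = {}
    for text in files:
        merged.update(parse_properties(text))
    return merged


def effective_standbys(props: dict) -> int:
    # An omitted setting falls back to the Kafka Streams default of zero.
    return int(props.get(STANDBY_KEY, KAFKA_STREAMS_DEFAULT_STANDBYS))


DEV = """
# hypothetical dev-oriented defaults: no standby setting at all
listeners=http://0.0.0.0:8088
"""

PROD_OVERLAY = """
# hypothetical "use this one for prod!!!" overlay
ksql.streams.num.standby.replicas=1
"""

if __name__ == "__main__":
    print(effective_standbys(layer(DEV)))                # 0
    print(effective_standbys(layer(DEV, PROD_OVERLAY)))  # 1
```

Keeping every max-resilience setting in the overlay, rather than splitting them between the dev file and the docs, is the kind of consistency the comment asks for.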
@blueedgenick, this makes a lot of sense to me. I think it's very reasonable to expect users to explicitly create a production configuration file (perhaps using our suggested configuration options as guidance), especially considering that KSQL runs as an actual server. This is fairly standard practice IMO, and I don't believe that users expect the OOTB configuration to be suitable for production.
Thanks for all the feedback. The answer really depends on how you look at KSQL: as a streaming ETL tool, or as an end-to-end database with pull queries as the mainstay.
The prod config files are a great idea; I had them on my previous project(s) too. I did not realize KSQL does not have one yet. I can pick up #817 if there are no takers yet. On this PR, if the argument is that this will be disruptive to existing KSQL deployments and that's not okay, I'm happy to close this and do (some form of) #817 instead, as it relates to pull queries.
@apurvam, there are trade-offs here. WDYT?
I’d be in favor of having a set of recommended production configs in a clearly named properties file. This file can include this PR plus everything from https://docs.confluent.io/current/ksql/docs/installation/server-config/config-reference.html#ksql-production-settings. The default can remain optimized for development purposes. That said, if you will never have a standby replica on the same instance as its active copy (so num.standby <= num instances), then this default is fairly harmless. Philosophically, num.standby = 3 belongs in a production file, and num.standby = 1 can be explicitly put in the development file.
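As a sketch of the standby entry such a production file might carry: the `ksql.streams.` prefix is assumed here to be how KSQL passes Kafka Streams settings through, and the value 1 mirrors this PR's proposed default of tolerating one instance failure rather than a tuned recommendation. A dev file could simply omit the line and inherit the Kafka Streams default.

```properties
# Hypothetical excerpt from a production-oriented KSQL server properties file.
# One standby copy of each state store, hosted on a different instance than the active.
ksql.streams.num.standby.replicas=1
```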
I still think this decision should be based on whether it's okay to increase the footprint for existing deployments. Nonetheless, I concede we don't have this data, and I think most of the confusion here is around which use case to optimize for. So I'll close this and make a PR for #817 instead. Thanks, everyone!
@apurvam, for dev the config should be …
Thanks @mjsax, yeah, I came back to make that correction before seeing your message. I always get thrown off by the semantics of …
I think for prod deployments, it should be OK to default to increasing the footprint for better availability. It is a config, and we have an opinion for the default. If people want to trade off availability for cost, they have the lever to pull.
Zero is the default and thus valid: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L583-L587 In fact, for a dev properties file, the parameter could be omitted entirely. |
Testing done
Unit test added. Tested locally by bringing up a KSQL server.