Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull Query: Disable Pull Queries when auth validation is needed on /query endpoint #3863

Closed
vinothchandar opened this issue Nov 15, 2019 · 5 comments · Fixed by #3879
Closed

Comments

@vinothchandar
Copy link
Contributor

Now that the pull queries are moved to /query endpoint, we took a look at what validation happens on this code path.

We are invoking KsqlAuthorizationValidator#checkAuthorization() for every request (push or pull) when either ksql.access.validator.enable=on or ksql.access.validator.enable=auto and the kafka cluster has a non empty value for authorizer.class.name.

Even as #3696 avoids eager instantiation of the adminclient (+ other clients) for every request, it simply memoizes it per ServiceContext created for every request. When real auth validation is involved, the call path ultimately leads to KsqlAuthorizationValidatorImpl#checkAccess , which describes a topic by talking to kafka

 private void checkAccess(
      final ServiceContext serviceContext,
      final String topicName,
      final AclOperation operation
  ) {
    final Set<AclOperation> authorizedOperations = serviceContext.getTopicClient()
        .describeTopic(topicName).authorizedOperations();
    ... 

We need to fail pull queries or skip validation, to avoid creating clients per request again.

@vinothchandar
Copy link
Contributor Author

cc @spena @vpapavas

@big-andy-coates
Copy link
Contributor

big-andy-coates commented Nov 15, 2019

We need to fail pull queries or skip validation, to avoid creating clients per request again.

If I read this right, because we have a known performance hit from checking the user has the rights to access the data the pull query is using you're saying we should either:

a) Fail the pull query, or
b) skip validation.

This seems a very strange stance to take.

Why would we fail the pull query? What if the user is happy to take the perf hit and just wants the pull query to work?

If we skip validation we're introducing a security hole, (it would reintroduce #3772). It's a security hole with the current pull query impl, as you're allowing the user access to data KSQL, that they don't have permission to access from Kafka, (I think it helps to think of the materialized state in KSQL as a cache of the data in Kafka. KSQL could build the state from the state in Kafka if it wanted: using the pre-materialized state is, in essence, just an optimisation). Plus, it's certainly a security hole as we move forward with pull queries that will actually access Kafka topics, i.e. pull queries against none-materialized state.

Surely the fix here is to introduce some level of caching for the auth responses so that we don't need to check auth on every request.

@vinothchandar
Copy link
Contributor Author

vinothchandar commented Nov 15, 2019

What if the user is happy to take the perf hit and just wants the pull query to work?

If we create an admin client every request and making a network call, you are also creating a socket and with even reasonable volume of requests/sec, it will connection storm kafka and cause stability pains. (https://kafka.apache.org/documentation/ and search for "connection storms") . Ultimately, the user will end up turning off pull queries anyway, so why give them headaches. This is why we must do this before releasing.
On perf front, this is significant and drops the qps by 60% and < 1000 qps/box or so and I have explicit guidance to make the perf reasonable at launch.

Surely the fix here is to introduce some level of caching for the auth responses so that we don't need to check auth on every request.

Great. Looks like we hit the last option from your earlier list of things to try. #3663 (comment) .

@apurvam
Copy link
Contributor

apurvam commented Nov 15, 2019

As @vinothchandar said, Kafka is known to be very sensitive to connection storms. Internal experience running kafka in the cloud shows this to be empirically true.

Creating a new connection on each pull query when we can reasonably expect a high volume is going to cause instability on Kafka. Further the admin calls to describe the ACLs only go to the controller, which isn’t sharded. This creates another hot spot which won’t scale for a high volume of queries. Further the availability of the controller is vital to the availability of the cluster. So instability there is even worse.

I think we should fix this properly (cached authorization results shared across requests), but in the meantime make sure that we don’t de stabilize existing Kafka installations: the latter would be worse for adoption IMO.

@big-andy-coates
Copy link
Contributor

big-andy-coates commented Nov 15, 2019

Personally, I still think it's a shame to disable something that may be of use to people for their low volume / testing without issue.

We could call this out in the release notes, or have it disabled in the default config with a warning in the comment etc. This would be better than disabling out right IMHO.

Why not introduce a second config to control if pull queries are enabled if custom auth handler is installed?

vinothchandar added a commit to vinothchandar/ksql that referenced this issue Nov 16, 2019
fixes confluentinc#3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
vinothchandar added a commit to vinothchandar/ksql that referenced this issue Nov 16, 2019
fixes confluentinc#3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
vinothchandar added a commit to vinothchandar/ksql that referenced this issue Nov 18, 2019
fixes confluentinc#3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
vinothchandar added a commit that referenced this issue Nov 18, 2019
…3879)

fixes #3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
 - Applied on both `query` and websocket endpoints
vinothchandar added a commit to vinothchandar/ksql that referenced this issue Nov 26, 2019
…c#3879)

fixes confluentinc#3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
 - Applied on both `query` and websocket endpoints
vinothchandar added a commit that referenced this issue Nov 27, 2019
…sters (#3980)

* refactor: lazy initialization of clients (admin,sr,ksql,connect) (#3696)
- Made client creation lazy by memoizing them.

* feat: add config to disable pull queries when validating (#3879)

fixes #3863

 - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
 - By default, Pull queries error out, if auth validation is needed
 - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
 - Fixed some tests, added test cases
 - Applied on both `query` and websocket endpoints
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants