Add support for authorizing query context params #12396

jihoonson · 2022-04-04T22:18:11Z

Description

The query context lets users give hints to the Druid query engine, either to enforce a certain behavior or at least to make the engine prefer a certain plan during query planning. Today, there are three types of query context params:

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
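
The precedence among the three types can be illustrated with a minimal sketch. It assumes the merge is a plain map overlay in which system params win over user params, which in turn win over default params. The class name ContextMergeSketch and the key someSystemParam are hypothetical and are not part of Druid.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the precedence rule; not Druid's actual QueryContext class. */
final class ContextMergeSketch
{
  /** Overlays the maps so that system params beat user params, which beat default params. */
  static Map<String, Object> merge(
      Map<String, Object> defaultParams,
      Map<String, Object> userParams,
      Map<String, Object> systemParams
  )
  {
    final Map<String, Object> merged = new HashMap<>(defaultParams);
    merged.putAll(userParams);   // user values replace defaults
    merged.putAll(systemParams); // system values replace everything else
    return merged;
  }

  public static void main(String[] args)
  {
    final Map<String, Object> defaults = new HashMap<>();
    defaults.put("timeout", 300000);
    defaults.put("priority", 0);

    final Map<String, Object> user = new HashMap<>();
    user.put("timeout", 60000);

    final Map<String, Object> system = new HashMap<>();
    system.put("someSystemParam", true); // hypothetical key, for illustration only

    // timeout comes from the user, priority from the defaults.
    System.out.println(merge(defaults, user, system));
  }
}
```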

Today, users may set any context param. This can cause 1) a bad UX if the context param is not mature yet, or 2) in the worst case, query failures or system faults if a sensitive param such as maxSubqueryRows is abused.

This PR adds the ability to restrict context params per user role. A query will fail if it sets a context param that the user is not allowed to use. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has the name of the context param (such as maxSubqueryRows) and the type QUERY_CONTEXT. To allow a certain context param for a user, the user must be granted WRITE permission on the context param resource. Here is an example of such a permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}

Each role can have multiple permissions for context params. Each permission should cover a different context param.

When a query is issued with a context param X, the query will fail if the user who issued it does not have WRITE permission on X. In this case:

HTTP endpoints will return a 403 response code.
JDBC will throw ForbiddenException.
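
The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class ContextParamAuthSketch and its methods are hypothetical, the permission lookup is reduced to a predicate over param names, and returning 200 for the allowed case is an assumption (only the 403 comes from the description).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical sketch of the check described above; not Druid's actual authorizer code. */
final class ContextParamAuthSketch
{
  /**
   * Returns the user-supplied context keys for which the caller lacks WRITE permission
   * on the matching QUERY_CONTEXT resource. An empty result means the query may proceed.
   */
  static Set<String> deniedKeys(Set<String> userContextKeys, Predicate<String> hasWritePermission)
  {
    final Set<String> denied = new TreeSet<>();
    for (String key : userContextKeys) {
      if (!hasWritePermission.test(key)) {
        denied.add(key);
      }
    }
    return denied;
  }

  /** 403 mirrors the HTTP behavior described above; 200 for the allowed case is an assumption. */
  static int httpStatus(Set<String> denied)
  {
    return denied.isEmpty() ? 200 : 403;
  }

  public static void main(String[] args)
  {
    // Assume the role was granted WRITE on the QUERY_CONTEXT resource "maxSubqueryRows" only.
    final Set<String> granted = Set.of("maxSubqueryRows");
    final Set<String> denied = deniedKeys(Set.of("maxSubqueryRows", "priority"), granted::contains);
    System.out.println(denied + " -> " + httpStatus(denied)); // prints "[priority] -> 403"
  }
}
```

Only params set by the user are passed to the check in this sketch, which matches the stated purpose of tracking user params separately from default and system params.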

Note: there is a context param called brokerService that is used only by the router. This param pins your query to a specific broker. Because authorization is done in the broker, not in the router, a query that sets brokerService without the proper permission will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param has taken effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn't have permission for brokerService.

Context param authorization can be enabled using druid.auth.authorizeQueryContextParams. It is disabled by default to avoid any hassle when someone upgrades their cluster without reading the release notes.

Key changed/added classes in this PR

QueryContext tracks user params and separates them from others.
QueryHolder has a state indicating whether the context in the native query is valid.
QueryLifecycle retrieves context params from a valid source.

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.

clintropolis

overall lgtm, this seems pretty useful to be able to control access to this stuff 🤘

clintropolis · 2022-04-05T19:17:52Z

server/src/main/java/org/apache/druid/server/QueryHolder.java

+ * using {@link #withContext(QueryContext)}. When callers use query context, they should check first
+ * if the query holder has a valid query context using {@link #isValidContext()}.
+ */
+public class QueryHolder<T>


this feels somewhat sad given the existence of QueryPlus, the other wrapper, though i guess it is used for processing queries instead of the lead up to it... still, seems like we have too many wrappers...

I added QueryHolder at first because it was quite confusing which was the valid context between Query.getContext() and QueryContext. QueryHolder was a stateful object indicating whether Query.getContext is valid. QueryLifecycle could find a valid query context using QueryHolder.isValidContext(). A better approach is to replace the context map in Query with QueryContext, as that lets us consolidate the query context stores. I didn't do it at first because it seemed quite invasive. However, I was curious how invasive it would be, so I went ahead and tried it. It actually doesn't seem that invasive; this PR is even a little bit smaller than it was before.

The Query interface now has getQueryContext(), which returns QueryContext. This new method is preferred over Query.getContext(), which internally simply calls QueryContext.getMergedParams(). QueryContext in Query is "valid" only in the broker, in the sense that defaultParams, userParams, and systemParams will not be kept separately after serialization. All parameters will be stored in userParams after it is deserialized. This should not cause any issue today.

clintropolis · 2022-04-05T20:41:34Z

server/src/main/java/org/apache/druid/server/QueryLifecycle.java

+   * - Updated with context after authorization in {@link #doAuthorize}.
+   */
+  @MonotonicNonNull
+  private QueryHolder<?> baseQuery;


this seems error prone to have two contexts like this with one hidden in the wrapper, I think it might be easier to follow what is going on here to keep this as a Query and have a baseContext separate from the context.

clintropolis · 2022-04-05T20:53:37Z

server/src/main/java/org/apache/druid/server/QueryResource.java

-          queryId
-      );
-
+      final NonnullPair<QueryHolder<?>, QueryContext> pair = readQuery(req, in, ioReaderWriter);


again this feels confusing to have two query contexts, one of which is wrapped, i'd personally probably rather just see them all separately to make it easier to follow what is what

clintropolis

very nice 👍

clintropolis · 2022-04-19T01:22:17Z

sql/src/main/java/org/apache/druid/sql/avatica/DruidConnection.java

@@ -43,14 +42,15 @@
 */
 public class DruidConnection
 {
-  private static final Logger LOG = new Logger(DruidConnection.class);
-  private static final Set<String> SENSITIVE_CONTEXT_FIELDS = Sets.newHashSet(
+  public static final Set<String> SENSITIVE_CONTEXT_FIELDS = ImmutableSet.of(


this isn't used here anymore, and can be moved to DruidMeta

clintropolis · 2022-04-19T01:23:19Z

server/src/main/java/org/apache/druid/server/QueryLifecycle.java

@@ -343,11 +359,56 @@ public void emitLogsAndMetrics(
    }
  }

-  public Query getQuery()
+  /**
+   * Returns the Query wrapped inside QueryHolder.


nit: stale javadoc

clintropolis · 2022-04-19T01:23:42Z

server/src/main/java/org/apache/druid/server/QueryLifecycle.java

-  private Query baseQuery;
+
+  /**
+   * A holder for the user query to run.


nit: stale javadoc

clintropolis · 2022-04-19T01:23:57Z

processing/src/main/java/org/apache/druid/query/QueryContext.java

+    return Objects.hash(getMergedParams());
+  }
+
+  // TODO: toString?


unresolved todo?

clintropolis · 2022-04-19T01:24:18Z

processing/src/main/java/org/apache/druid/query/QueryContext.java

+ * You can use {@code getX} methods or {@link #getMergedParams()} to compute the context params
+ * merging 3 types of params above.
+ *
+ * Currently, this class is mainly used for query context parameter authorization in query entires,


Suggested change

* Currently, this class is mainly used for query context parameter authorization in query entires,

* Currently, this class is mainly used for query context parameter authorization,

clintropolis · 2022-04-19T01:24:38Z

processing/src/main/java/org/apache/druid/query/QueryContext.java

+ *   Auto-generated queryId or sqlQueryId are also set as default parameters. These default parameters can
+ *   be overridden by user or system parameters.
+ * - User parameters. These are the params set by the user. User params override default parameters but
+ *   are overridden by system paramters.


Suggested change

* are overridden by system paramters.

* are overridden by system parameters.

kfaraz

Overall looks good. Added some minor comments.

We could also update the utility methods inside QueryContexts that accept a map to now accept a QueryContext itself but this can be done later.

kfaraz · 2022-04-19T04:44:45Z

...ation-tests/src/test/java/org/apache/druid/tests/security/AbstractAuthConfigurationTest.java

+    return makeSQLQueryRequest(httpClient, query, ImmutableMap.of(), expectedStatus);
+  }
+
+  protected StatusResponseHolder makeSQLQueryRequest(


Suggestion: Maybe rename to something like makeSqlRequestAndVerifyStatus.

This is actually not a new method, but I added a new parameter for this method. I would like to not rename it as this PR is already quite big.

kfaraz · 2022-04-19T04:48:03Z

processing/src/main/java/org/apache/druid/query/Query.java

@@ -95,8 +94,11 @@

  DateTimeZone getTimezone();

+  @Deprecated


Nit: Please add a comment/javadoc about the deprecation and the alternative.

kfaraz · 2022-04-19T04:56:06Z

processing/src/main/java/org/apache/druid/query/QueryContext.java

+      return false;
+    }
+    QueryContext context = (QueryContext) o;
+    return getMergedParams().equals(context.getMergedParams());


Nit:
As we are comparing only the final merged params, two QueryContext objects that are currently equal might not be so after performing the same operation (say removeUserParam) on the two of them.

For example:
Context1: userParam={p1=10}, systemParam={}
Context2: userParam={}, systemParam={p1=10}
These two are currently equal.
But after performing the same operation, say removeUserParam(p1), the two contexts will not remain equal anymore.

I guess this should be okay?

The scenario you described sounds OK to me. defaultParams, userParams, or systemParams are to track how those params are set, and should not affect the equality test of the queryContext object. Any particular concern about the behavior?

No concern as such. Just wanted to be sure.
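
The scenario in this thread can be reproduced with a minimal sketch. It assumes a simplified context with only user and system params, and keeps the equals and hashCode bodies from the diff above. The class QueryContextEqualitySketch is hypothetical and is not the QueryContext class in this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified context: equality looks only at the merged params. */
final class QueryContextEqualitySketch
{
  private final Map<String, Object> userParams = new HashMap<>();
  private final Map<String, Object> systemParams = new HashMap<>();

  void addUserParam(String key, Object value)
  {
    userParams.put(key, value);
  }

  void addSystemParam(String key, Object value)
  {
    systemParams.put(key, value);
  }

  void removeUserParam(String key)
  {
    userParams.remove(key);
  }

  /** System params override user params in the merged view. */
  Map<String, Object> getMergedParams()
  {
    final Map<String, Object> merged = new HashMap<>(userParams);
    merged.putAll(systemParams);
    return merged;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    QueryContextEqualitySketch context = (QueryContextEqualitySketch) o;
    return getMergedParams().equals(context.getMergedParams());
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(getMergedParams());
  }

  public static void main(String[] args)
  {
    final QueryContextEqualitySketch context1 = new QueryContextEqualitySketch();
    context1.addUserParam("p1", 10);
    final QueryContextEqualitySketch context2 = new QueryContextEqualitySketch();
    context2.addSystemParam("p1", 10);

    System.out.println(context1.equals(context2)); // prints "true"

    // The same operation on both objects leaves p1 only in context2.
    context1.removeUserParam("p1");
    context2.removeUserParam("p1");
    System.out.println(context1.equals(context2)); // prints "false"
  }
}
```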

kfaraz · 2022-04-19T07:48:45Z

...ation-tests/src/test/java/org/apache/druid/tests/security/AbstractAuthConfigurationTest.java

  protected abstract void setupDatasourceAndSysTableUser() throws Exception;
  protected abstract void setupDatasourceAndSysAndStateUser() throws Exception;
  protected abstract void setupSysTableAndStateOnlyUser() throws Exception;
  protected abstract void setupTestSpecificHttpClients() throws Exception;
  protected abstract String getAuthenticatorName();
  protected abstract String getAuthorizerName();
  protected abstract String getExpectedAvaticaAuthError();
-  protected abstract Properties getAvaticaConnectionProperties();


Nit:
Maybe we should just add a new testAvaticaQuery(properties, url) and leave the existing testAvaticaQuery as is?
Otherwise, we are forced to pass the properties to testAvaticaQuery in every downstream test.

It was my bad that I removed this method before. I missed that implementations of this class can have a different implementation for this method. I added them back.

for testAvaticaQuery, I think it's OK to change the method signature as the new signature makes it clearer what user's credentials are used for the test.

kfaraz · 2022-04-19T07:51:36Z

server/src/main/java/org/apache/druid/server/QueryLifecycle.java

-    } else {
-      mergedUserAndConfigContext = defaultQueryConfig.getContext();
-    }
+    baseQuery.getQueryContext().addDefaultParam(BaseQuery.QUERY_ID, UUID.randomUUID().toString());


Much cleaner now!

kfaraz · 2022-04-19T08:09:58Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java

+            new QueryContext(
+                ImmutableMap.of(
+                    PlannerConfig.CTX_KEY_USE_GROUPING_SET_FOR_EXACT_DISTINCT,
+                    "true"
+                )
+            )


Nit: Maybe add a private util method for this? This code is present in 4 places.

I'm not sure if that is worth it. I think we rather need to refactor and split CalciteQueryTest since this class is too big now.

+1 on splitting up the CalciteQueryTest

kfaraz · 2022-04-19T08:11:05Z

sql/src/test/java/org/apache/druid/sql/avatica/DruidStatementTest.java

+    final DruidStatement statement = new DruidStatement(
+        "",
+        0,
+        new QueryContext(),
+        sqlLifecycleFactory.factorize(),
+        () -> {}
+    ).prepare(sql, -1, AllowAllAuthenticator.ALLOW_ALL_RESULT);


Nit: Maybe add a createStatement(sql) private util method for this to avoid code repetition.

kfaraz · 2022-04-19T08:13:39Z

sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java

@@ -74,7 +73,7 @@ private DruidJoinRule(final PlannerContext plannerContext)
            operand(DruidRel.class, any())
        )
    );
-    this.enableLeftScanDirect = QueryContexts.getEnableJoinLeftScanDirect(plannerContext.getQueryContext());


isEnableJoinLeftScanDirect seems too specific a use case to be a part of the base QueryContext class itself. Why have we moved it from QueryContexts?

IMO, now QueryContexts should remain just as a holder of predefined context parameters. It doesn't give any more benefit over QueryContext. getEnableJoinLeftScanDirect is just a util method for our convenience and they all should be moved to QueryContext.

👍🏻 Moving all of these methods to QueryContext would be much better than it is now.

kfaraz · 2022-04-19T08:18:09Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java

@@ -158,43 +156,37 @@ public boolean isUseNativeQueryExplain()
    return useNativeQueryExplain;
  }

-  public PlannerConfig withOverrides(final Map<String, Object> context)
+  public PlannerConfig withOverrides(final QueryContext queryAndContext)


Suggestion: Rename to queryContext

kfaraz · 2022-04-19T08:23:32Z

sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java

@@ -135,32 +139,29 @@ public SqlLifecycle(
   *
   * If successful (it will be), it will transition the lifecycle to {@link State#INITIALIZED}.
   */
-  public String initialize(String sql, Map<String, Object> queryContext)
+  public String initialize(String sql, QueryContext queryAndContext)


Nit:

Suggested change

public String initialize(String sql, QueryContext queryAndContext)

public String initialize(String sql, QueryContext queryContext)

clintropolis

🤘

clintropolis · 2022-05-24T18:40:06Z

tagging Incompatible because this adds a new method without a default implementation to the Query interface (which is an @ExtensionPoint). Still trying to decide what exactly to do about it.

jihoonson · 2022-05-24T23:45:26Z

@clintropolis maybe we can add a default implementation for getQueryContext() that returns null. Callers should check the return value and use getContext() instead if null is returned. When context param authorization is enabled but getQueryContext() returns null, since we cannot distinguish user context params and others, the authorizer authorizes all context params returned by getContext(). What do you think?

clintropolis · 2022-05-25T00:05:22Z

I did something somewhat similar in #12564, except instead of null I returned a specific object and a default implementation method to check for it. I considered null, but ultimately decided to do this instead so that I didn't have to mark getQueryContext nullable, because it didn't seem like legitimate implementations should ever return null. I'm not very attached to the way I did it though, so if you think null is better, it's more or less the same thing I think. I've decided to change it to null as suggested here.
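
The null-returning default discussed here could look roughly like the sketch below. It is an assumption-laden illustration, not the code merged in #12564: SketchQuery and SketchContext are hypothetical stand-ins for Query and QueryContext, and keysToAuthorize is a made-up helper showing the fallback to getContext().

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a null-returning default method and the caller-side fallback. */
final class LegacyContextFallbackSketch
{
  /** Hypothetical stand-in for QueryContext: only the user-supplied params matter here. */
  static final class SketchContext
  {
    final Map<String, Object> userParams;

    SketchContext(Map<String, Object> userParams)
    {
      this.userParams = userParams;
    }
  }

  /** Hypothetical stand-in for the Query extension point. */
  interface SketchQuery
  {
    Map<String, Object> getContext();

    /** Default for implementations that predate the new method. */
    default SketchContext getQueryContext()
    {
      return null;
    }
  }

  /**
   * Picks the keys to authorize. Without a QueryContext, user params cannot be told apart
   * from default and system params, so every key in getContext() has to be authorized.
   */
  static Set<String> keysToAuthorize(SketchQuery query)
  {
    final SketchContext context = query.getQueryContext();
    if (context == null) {
      return query.getContext().keySet();
    }
    return context.userParams.keySet();
  }

  public static void main(String[] args)
  {
    // A legacy implementation only provides getContext().
    final SketchQuery legacy = () -> Map.of("queryId", "abc", "priority", 1);
    System.out.println(keysToAuthorize(legacy).size()); // prints "2"
  }
}
```

One consequence visible in the sketch is that a legacy implementation has system-generated keys such as queryId authorized as well, which is the trade-off raised in this discussion.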

Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.

Add support for authorizing query context params

5972c8f

jihoonson added Release Notes Design Review labels Apr 4, 2022

jihoonson added 3 commits April 5, 2022 09:18

fix npe in test

2fa50fa

more test

b074831

unused import

6c4c96b

clintropolis reviewed Apr 5, 2022

jihoonson added 6 commits April 13, 2022 10:07

fix integration test

6a2eea6

context in QueryPlus

d470e17

context in query

c0a4962

fix ci

d7a0c7d

redundant import

af2dfe2

fix ldap test

600c5b2

jihoonson closed this Apr 15, 2022

jihoonson reopened this Apr 15, 2022

jihoonson added 2 commits April 15, 2022 15:56

Merge branch 'master' of github.com:apache/druid into context-param-a…

2c3314d

…uthz

fix unit test

b349b65

clintropolis reviewed Apr 19, 2022

address comments

4640b16

kfaraz added Area - Querying Security labels Apr 19, 2022

kfaraz approved these changes Apr 19, 2022

jihoonson added 3 commits April 19, 2022 20:27

fix auth test for extensions

5c3fcb7

address more comments

2ef7ea5

suppress unused warning

d292fd5

clintropolis approved these changes Apr 20, 2022

abhishekagarwal87 merged commit 73ce5df into apache:master Apr 21, 2022

abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022

clintropolis added the Incompatible label May 24, 2022

clintropolis mentioned this pull request May 24, 2022

make query context changes backwards compatible #12564

Merged

This was referenced May 25, 2022

Backport - make query context changes backwards compatible (#12564) #12565

Merged

[Draft] 0.23.0 Release notes #12510

Closed

gianm mentioned this pull request Sep 9, 2022

Fix QueryContext race condition #13049

Merged

2 tasks
