Enabling datasource level authorization in Druid #2424

Merged
merged 1 commit into from
Apr 28, 2016
pjain1
Member

@pjain1 pjain1 commented Feb 9, 2016

Fixes #2355. This PR is meant to put necessary abstractions inside Druid for enabling authorization as discussed in #2355

  • Introduce the AuthorizationInfo interface; specific implementations will be provided by extensions.
  • If druid.auth.enabled is set to true, the isAuthorized method of AuthorizationInfo is called to perform authorization checks.
  • The AuthorizationInfo object is created in the servlet filters of the specific extension and passed as a request attribute named AuthConfig.DRUID_AUTH_TOKEN.
  • Within the scope of this PR, all resources that need to be secured are divided into 3 types: DATASOURCE, CONFIG and STATE. For any type of resource, the possible actions are READ and WRITE.
  • Specific ResourceFilters perform auth checks for all endpoints that correspond to a specific resource type. This avoids duplicating logic and removes the need to inject HttpServletRequest into each endpoint. For example:
    • DatasourceResourceFilter is used for endpoints where the datasource information is present after the "datasources" segment in the request path, such as /druid/coordinator/v1/datasources/, /druid/coordinator/v1/metadata/datasources/, /druid/v2/datasources/
    • RulesResourceFilter is used where the datasource information is present after the "rules" segment in the request path, such as /druid/coordinator/v1/rules/
    • TaskResourceFilter is used for endpoints where the datasource information is present after the "task" segment in the request path, such as druid/indexer/v1/task
    • ConfigResourceFilter is used for endpoints like /druid/coordinator/v1/config, /druid/indexer/v1/worker, /druid/worker/v1 etc.
    • StateResourceFilter is used for endpoints like /druid/broker/v1/loadstatus, /druid/coordinator/v1/leader, /druid/coordinator/v1/loadqueue, /druid/coordinator/v1/rules etc.
  • For endpoints that return a list of resources, such as /druid/coordinator/v1/datasources, /druid/indexer/v1/completeTasks etc., the list is filtered to return only the resources to which the requesting user has access. In these cases, an HttpServletRequest instance needs to be injected into the endpoint method.
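
The path-based lookups in the filter bullets above can be sketched as follows. This is a minimal illustration, assuming the datasource name is simply the path segment that follows a marker such as "datasources" or "rules"; PathSegments and segmentAfter are hypothetical names and not classes from this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class PathSegments {
  /**
   * Returns the path segment that directly follows the first occurrence of
   * {@code marker}, or an empty Optional if the marker is absent or is the
   * last segment of the path.
   */
  static Optional<String> segmentAfter(String requestPath, String marker) {
    // String.split drops trailing empty strings, so "/a/b/" yields ["", "a", "b"].
    List<String> segments = Arrays.asList(requestPath.split("/"));
    int idx = segments.indexOf(marker);
    if (idx < 0 || idx + 1 >= segments.size()) {
      return Optional.empty();
    }
    String next = segments.get(idx + 1);
    return next.isEmpty() ? Optional.empty() : Optional.of(next);
  }
}
```

With an example datasource called wikipedia, segmentAfter("/druid/coordinator/v1/datasources/wikipedia", "datasources") yields the name, while a path that ends at the marker yields an empty Optional, which a filter could treat as "no single datasource to check".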

Note -
The JAX-RS specification provides an interface called SecurityContext. However, we did not use it and instead provided our own interface, AuthorizationInfo, mainly because it is more flexible. For example, SecurityContext has a method isUserInRole(String role) that would be used for auth checks. Using it would require modeling inside Druid, by convention or some other means, which roles can access which resources, and that is not very flexible because Druid has dynamic resources such as datasources.
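
To make the contrast with isUserInRole(String role) concrete, here is a minimal sketch of the AuthorizationInfo contract and of the list filtering described in the last bullet. The enum and method names follow this PR's description and review discussion (Access is the name agreed in review for AllowAccess). The shape of the Resource class and the DatasourceListFilter helper are assumptions made for illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

enum ResourceType { DATASOURCE, CONFIG, STATE }

enum Action { READ, WRITE }

enum Access { ALLOW, DENY }

// Assumed shape: a resource is identified by a name and a type.
final class Resource {
  private final String name;
  private final ResourceType type;

  Resource(String name, ResourceType type) {
    this.name = name;
    this.type = type;
  }

  String getName() { return name; }

  ResourceType getType() { return type; }
}

// Implemented by extensions; core code only asks the question.
interface AuthorizationInfo {
  Access isAuthorized(Resource resource, Action action);
}

// Hypothetical helper showing how a list endpoint could filter its result.
final class DatasourceListFilter {
  static List<String> readable(List<String> datasources, AuthorizationInfo authInfo) {
    List<String> visible = new ArrayList<>();
    for (String datasource : datasources) {
      Resource resource = new Resource(datasource, ResourceType.DATASOURCE);
      if (authInfo.isAuthorized(resource, Action.READ) == Access.ALLOW) {
        visible.add(datasource);
      }
    }
    return visible;
  }
}
```

Because the check takes the resource itself rather than a role name, a datasource created after startup needs no new role mapping inside Druid; the extension decides per call.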

@drcrallen
Contributor

There is an interesting mix of standardized and non standardized auth methods here. On one hand the authorization tries to provide a standard resource related framework, but on the other hand it completely relies on the endpoint to do all the auth requesting and logic.

I suspect this is because the resource of interest is contained within the body rather than as part of the http request metadata.

IMHO a "cleaner" solution would be to architect the requests such that any endpoint that touches a sensitive resource must have the appropriate annotations on what resources it touches. Then the auth layer transparently handles the auth based on request metadata, and the duty within the endpoint itself is simply to wire up the authorized resource to the resource actually being used.

Another way to say it is that it would be awesome if by the time the method is called, the auth system has already performed its function, and it is simply the job of the endpoint method to ensure compliance with the auth system's expectations on resource usage.
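
One way to picture the annotation-driven idea is the plain-Java sketch below. It is not Jersey or Druid code: RequiresAccess, Authorizer, Endpoints and dispatch are all hypothetical names, and reflection stands in for whatever the web framework would do. The point it illustrates is only that the check runs before the endpoint method is invoked.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class AnnotatedAuthSketch {
  enum Action { READ, WRITE }

  // Hypothetical annotation declaring what an endpoint does to its resource.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface RequiresAccess {
    Action value();
  }

  // Hypothetical authorizer; a real one would come from an extension.
  interface Authorizer {
    boolean allows(String user, String datasource, Action action);
  }

  // Endpoint methods contain no auth logic of their own.
  static final class Endpoints {
    @RequiresAccess(Action.READ)
    public String describe(String datasource) {
      return "schema of " + datasource;
    }

    @RequiresAccess(Action.WRITE)
    public String drop(String datasource) {
      return "dropped " + datasource;
    }
  }

  // The "auth layer": reads the annotation and decides before invoking.
  static String dispatch(Authorizer auth, String user, String methodName, String datasource) {
    try {
      Method method = Endpoints.class.getMethod(methodName, String.class);
      RequiresAccess required = method.getAnnotation(RequiresAccess.class);
      // Unannotated methods are rejected rather than silently allowed.
      if (required == null || !auth.allows(user, datasource, required.value())) {
        return "403 Forbidden";
      }
      return (String) method.invoke(new Endpoints(), datasource);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```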

@pjain1
Member Author

pjain1 commented Feb 9, 2016

Yes, the auth checks are performed inside endpoints because many of the Druid endpoints do not follow REST conventions and the resource information is inside the body.
So the cleaner approach that you are suggesting is to change the endpoints to follow REST conventions? Is that what you meant when you said the requests must have appropriate annotations?

@himanshug
Contributor

@drcrallen your suggestion is valid; however, it is something that can be done independently of this PR and would probably require Druid client API changes.

@@ -0,0 +1,7 @@
package io.druid.server.security;

public enum AllowAccess
Contributor

can we just call it Access ?

Member Author

ok

@himanshug himanshug added this to the 0.9.1 milestone Feb 9, 2016
@@ -0,0 +1,6 @@
package io.druid.server.security;

public enum ResourceType
Contributor

why does this need to be enum?

Contributor

@drcrallen I think the set of valid resource types would be fixed by druid-core, because druid-core calls in to the action and provides those as arguments, so it makes sense to use an enum to represent resource types.
In any case, since we are calling the API experimental and potentially changeable in the near future, I wouldn't worry too much about it either way.

@drcrallen
Contributor

@himanshug / @pjain1 would you guys feel comfortable calling auth an experimental feature that may change significantly in the near-ish future?

If that's the case, then a stop-gap that solves your main pain points should be OK as long as it's not intrusive in other scenarios (which I don't think this PR is).

@drcrallen
Contributor

General PR comments:

  • Headers missing on some of the files
  • Why are enums needed instead of just strings?

@himanshug
Contributor

@drcrallen I'm fine calling it experimental.
sounds like high level approach is ok.
@pjain1 can you make the necessary additions/updates and remove the "Discuss" label when done?

* @param action action to be performed on the resource
* @return {@link AllowAccess#ALLOW} if authorized otherwise {@link AllowAccess#DENY}
* */
AllowAccess isAuthorized(Resource resource, Action action);
Contributor

why not just return boolean?

@drcrallen
Contributor

@pjain1 please comment in the code where you're adding the auth checks that auth is experimental and reference this PR (so future developers know why it is there).

@pjain1
Member Author

pjain1 commented Feb 9, 2016

@drcrallen sure, I will add the necessary comments and the headers.
About enum vs String: I chose enums because they give more control over what values can be passed in and show the developer all the available options. I also see the point that, since AuthorizationInfo will be implemented by extensions anyway, they can pass in whatever they want. Personally I would prefer enum, but I am OK with changing it to String if you have a strong opinion against using enum. BTW, what is your reason for not having enums?

@drcrallen
Contributor

@pjain1 good points. My reasons for not favoring enums are:

  • Enums are not very extensible, and IMHO make more sense when either A) there is an explicit requirement for ordering or B) There is an explicit need to limit the options available
  • I'm not sure how well enums play with classloaders. I'm guessing they don't any more so than other classes. Getting an error message like "DATASOURCE cannot be assigned to type ResourceType" (or similar) is pretty irritating. This can occur if the two enum classes were loaded through different classLoaders.

So basically, if you're trying to force any extensions to ONLY use the values presented, and not intending the security set to be extendable, then there can be a good argument for enums.
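
A small illustration of point B, using hypothetical names (EnumVsString, isKnownType): an enum rejects values outside the fixed set at the moment they are parsed, whereas a plain String would be handed to the extension unchecked. The three constants are the resource types listed in this PR's description.

```java
final class EnumVsString {
  // Resource types named in this PR's description.
  enum ResourceType { DATASOURCE, CONFIG, STATE }

  /** Returns true only if the name maps to one of the fixed enum constants. */
  static boolean isKnownType(String name) {
    try {
      ResourceType.valueOf(name);
      return true;
    } catch (IllegalArgumentException e) {
      // valueOf throws for any name that is not a declared constant.
      return false;
    }
  }
}
```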

@drcrallen
Contributor

@himanshug I think from #2424 (comment) you're supporting B) from my list above, where you are purposefully limiting the options available. Is that correct?

@himanshug
Contributor

@drcrallen yes

@fjy
Contributor

fjy commented Feb 27, 2016

@pjain1 merge conflicts

@pjain1
Member Author

pjain1 commented Feb 29, 2016

@fjy I am still working on it, I will resolve the conflicts when it is reviewable

@@ -278,6 +279,7 @@ public static Injector makeInjectorWithModules(final Injector baseInjector, Iter
{
final ModuleList defaultModules = new ModuleList(baseInjector);
defaultModules.addModules(
new DruidAuthModule(),
Contributor

after log4j shutter downer module

@pjain1
Member Author

pjain1 commented Mar 7, 2016

@drcrallen @fjy @himanshug the PR is reviewable. I updated the top-level comment to reflect the state of the latest changes. @drcrallen I am now using annotation-based resource filtering to perform access checks.

@pjain1 pjain1 changed the title from "[Discuss] [WIP] Enabling datasource level authorization in Druid" to "Enabling datasource level authorization in Druid" Mar 7, 2016
@pjain1 pjain1 removed the Discuss label Mar 7, 2016
@@ -372,7 +454,36 @@ public Response getRunningTasks()
@Override
public Collection<? extends TaskRunnerWorkItem> apply(TaskRunner taskRunner)
{
return taskRunner.getRunningTasks();
if (authConfig.isEnabled()) {
Contributor

This and the change in getPendingTasks() are the same; can you refactor them into a private helper method?

Member Author

done

@fjy
Contributor

fjy commented Mar 14, 2016

@pjain1 can we finish this up? there's merge conflicts

@pjain1 pjain1 closed this Mar 15, 2016
@pjain1 pjain1 reopened this Mar 15, 2016
private void getDataSourcesHelper(final Query query, List<String> datasources) {
if (query.getDataSource() instanceof TableDataSource) {
// there will only be one datasource for TableDataSource
datasources.addAll(query.getDataSource().getNames());
Contributor

Which types break using only this call?

Contributor

Looking at the getNames call on the DataSource impls here, it looks like you should be able to just use getNames

Contributor

good catch, dataSource.getNames() already returns appropriate list.
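
A sketch of what this simplification amounts to. The types below are stand-ins that model only getNames() and are not Druid's real classes; DatasourceNames and toAuthorize are hypothetical names. The assumption, taken from the comments above, is that every DataSource implementation already returns the right list from getNames(), so no instanceof branch is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for Druid's DataSource; only getNames() is modeled here.
interface DataSource {
  List<String> getNames();
}

// Stand-in: a table datasource has exactly one name.
final class TableDataSource implements DataSource {
  private final String name;

  TableDataSource(String name) {
    this.name = name;
  }

  @Override
  public List<String> getNames() {
    return Collections.singletonList(name);
  }
}

// Stand-in: a union reports the names of all of its tables.
final class UnionDataSource implements DataSource {
  private final List<TableDataSource> tables;

  UnionDataSource(List<TableDataSource> tables) {
    this.tables = tables;
  }

  @Override
  public List<String> getNames() {
    List<String> names = new ArrayList<>();
    for (TableDataSource table : tables) {
      names.addAll(table.getNames());
    }
    return names;
  }
}

final class DatasourceNames {
  // No type check: each implementation decides what its names are.
  static List<String> toAuthorize(DataSource dataSource) {
    return new ArrayList<>(dataSource.getNames());
  }
}
```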

Member Author

check now

@pjain1
Member Author

pjain1 commented Apr 21, 2016

@drcrallen I think I addressed your comments, can you have a look again ?

@@ -0,0 +1,115 @@
package io.druid.indexing.overlord.http.security;
Contributor

header

@drcrallen
Contributor

@pjain1 Please check out https://github.com/pjain1/druid/pull/1 and see if that works for you

@pjain1
Member Author

pjain1 commented Apr 27, 2016

@drcrallen fixed UTs. have a look now

@Override
public void configure(Binder binder)
{
JsonConfigProvider.bind(binder, "druid.auth", AuthConfig.class);
Contributor

@drcrallen drcrallen Apr 28, 2016

(optional) would this be more appropriate as druid.request.auth or something else a little more descriptive?

Member Author

Not sure if request should be included in the property name, as the scope is much more than just requests to Druid. It could be called druid.security or something like that, or it can be kept as it is.

Contributor

keep as is then.

@drcrallen
Contributor

few minor comments but looking very good overall.

authorizationInfo = EasyMock.createStrictMock(AuthorizationInfo.class);

// Memory barrier
synchronized (this) {
Member Author

@drcrallen btw can you please explain what this is for?

Contributor

@drcrallen drcrallen Apr 28, 2016

I thought I was having trouble with threading, but it might not be right now.

Sometimes, if a mocked object is accessed by multiple threads, the threads see an inconsistent rule set when the expectations are set in a different thread from the one in which the object is used. Memory barriers make the memory consistent at least up to the point where you pass the barrier.

You can tell you are hitting this kind of mocking race condition if you get an error like "expected 1 actual 1"
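
A minimal sketch of the pattern, not a reproduction of the race: the writer and the reader both synchronize on the same lock, so the Java memory model guarantees the reader sees what the writer stored. BarrierSketch and its members are hypothetical names. In this small demo Thread.start() and join() already give the same visibility guarantee on their own; the synchronized blocks matter when the other thread is already running, as with a server thread in a test.

```java
final class BarrierSketch {
  private final Object lock = new Object();
  private int expectedCalls; // plain field: neither volatile nor final

  // Writer side: the store happens inside the monitor.
  void configure(int calls) {
    synchronized (lock) {
      expectedCalls = calls;
    }
  }

  // Reader side: entering the same monitor makes the earlier store visible.
  int read() {
    synchronized (lock) {
      return expectedCalls;
    }
  }

  // Sets a value on the calling thread and reads it back from another thread.
  static int configureThenReadFromOtherThread(int calls) {
    BarrierSketch sketch = new BarrierSketch();
    sketch.configure(calls);
    int[] seen = new int[1];
    Thread reader = new Thread(() -> seen[0] = sketch.read());
    reader.start();
    try {
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen[0];
  }
}
```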

Member Author

ok

@drcrallen
Contributor

@pjain1 can you comment in the master comment why you opted for this route instead of going through
https://jersey.java.net/documentation/latest/security.html ?

@pjain1
Member Author

pjain1 commented Apr 28, 2016

@drcrallen updated the master comment with the info you asked for. Please, see if it makes sense.

@drcrallen
Contributor

@pjain1 yes thanks!

@drcrallen
Contributor

Please ping me when #2424 (comment) is resolved and I think this should be ready to go

@pjain1
Member Author

pjain1 commented Apr 28, 2016

@drcrallen done

@drcrallen
Contributor

Cool 👍

@pjain1
Member Author

pjain1 commented Apr 28, 2016

@drcrallen squashed the commits

return action;
}

public abstract boolean isApplicable(String requestPath);
Contributor

Does anyone know what this method is for? This method is used in only unit tests.
