Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20239][Core] Improve HistoryServer's ACL mechanism #17582

Closed
wants to merge 3 commits into from

Conversation

jerryshao
Copy link
Contributor

@jerryshao jerryshao commented Apr 10, 2017

What changes were proposed in this pull request?

Current SHS (Spark History Server) two different ACLs:

  • ACL of base URL, it is controlled by "spark.acls.enabled" or "spark.ui.acls.enabled", and with this enabled, only user configured with "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started SHS could list all the applications, otherwise none of them can be listed. This will also affect REST APIs which listing the summary of all apps and one app.
  • Per application ACL. This is controlled by "spark.history.ui.acls.enabled". With this enabled only history admin user and user/group who ran this app can access the details of this app.

With this two ACLs, we may encounter several unexpected behaviors:

  1. if base URL's ACL (spark.acls.enable) is enabled but user A has no view permission. User "A" cannot see the app list but could still access details of it's own app.
  2. if ACLs of base URL (spark.acls.enable) is disabled, then user "A" could download any application's event log, even it is not run by user "A".
  3. The changes of Live UI's ACL will affect History UI's ACL which share the same conf file.

The unexpected behaviors is mainly because we have two different ACLs, ideally we should have only one to manage all.

So to improve SHS's ACL mechanism, here in this PR proposed to:

  1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for history server.
  2. Check permission for event-log download REST API.

With this PR:

  1. Admin user could see/download the list of all applications, as well as application details.
  2. Normal user could see the list of all applications, but can only download and check the details of applications accessible to him.

How was this patch tested?

New UTs are added, also verified in real cluster.

CC @tgravescs @vanzin please help to review, this PR changes the semantics you did previously. Thanks a lot.

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75641 has finished for PR 17582 at commit bc1e53a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75643 has finished for PR 17582 at commit e56c388.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

Sorry but I'm confused by the explanation in the description. I didn't completely follow what problems you are seeing that aren't intended and I don't understand how you are proposing to fix. Can you please describe the design you are proposing in more detail?

On the description can you please clarify each of your bullets? For instance:

  1. if base URL's ACL (spark.acls.enable) is enabled but user A has no view permission. User "A" cannot see the app list but could still access details of it's own app.

Are you saying user A is not in the list of acls or is? if they have no view permission then they shouldn't be able to see the app. I don't understnad what you mean by "could still access details of it's own app"? Is this user A's application (meaning they started it) and hence he would automatically be in the acl list?

Clarifying the other bullets would be helpful as well.

@vanzin
Copy link
Contributor

vanzin commented Apr 10, 2017

user configured with "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started SHS could list all the applications, otherwise none of them can be listed

So to me this is the only bug; which means that maybe ACLs on the listing itself shouldn't ever be applied, and this PR should be a lot simpler, right?

Most of it seem to be dealing with filtering the list of apps so that only applications the user can see are shown. I wonder if that's necessary, since the only thing that's showing is the existence of the application, not any data about it that could be considered sensitive.

There's also a minor thing that the listing being different for different users might cause confusion; but if there's a good reason for filtering, then that concern can be overridden. I'm just not sure there is a good reason for it.

@jerryshao
Copy link
Contributor Author

@tgravescs sorry for the confuse.

if base URL's ACL (spark.acls.enable) is enabled but user A has no view permission. User "A" cannot see the app list but could still access details of it's own app.

Here actually has two list of acls, one is controlled by spark.acls.enabled, if user "A" is not added to this acl list, then user "A" cannot see the app list (/<history-server-url>/api/v1/applications). But if this app is run by user "A", then user "A" could still see the details of app, like (/<history-server-url>/api/v1/applications/<app-id>/jobs), this acl is controlled by "spark.history.ui.acls.enabled", and user "A" is automatically in the acl list (because of run by him).

if ACLs of base URL (spark.acls.enable) is disabled. Then user "A" could see the summary of all the apps, even some apps didn't run by user "A", but can only access its own app's details.

If "spark.acls.enabled" is disabled, then SecurityFilter is not worked, so user "A" could access /<history-server-url>/api/v1/applications, which means user "A" could see all the applications even not run by him.

This /<history-server-url>/api/v1/applications doesn't touch spark.history.ui.acls.enabled.

if ACLs of base URL (spark.acls.enable) is disabled, then user "A" could download any application's event log, even it is not run by user "A".

This is the same issue as above. /<history-server-url>/api/v1/applications/<app-id>/logs is only controlled by "spark.acls.enable", not "spark.history.ui.acls.enable". So anyone could download any even logs if "spark.acls.enable" is disabled.

So basically what I fixed is that:

  1. disable the work of spark.acls.enable, which means SecurityFilter is not checked.
  2. Using spark.history.ui.acls.enable to filter applications, application summary and application log based on users who run the app.

So the result of my PR is:

  1. history admin user could see/download/access any apps.
  2. normal user could see/download/access apps run by him.

@vanzin your suggestion is to only disable ACLs on the listing, that definitely simplifies the fix, but IMO that "all or nothing" solution is not so ideal:

  1. any user could list all the apps, though cannot access the details if it is not run by him. For the sensitivity, is it better to even not show the apps not run by him?
  2. currently if ACLs on listing is disabled, anyone could download event log, which on the other hand expose the security hole to other users.

So IMO filtering based on users is better than "all or nothing" solution. Also it doesn't increase the code complex much.

@jerryshao
Copy link
Contributor Author

jerryshao commented Apr 14, 2017

@tgravescs @vanzin do you have any comment on this JIRA?

A compromise is that any user could see all the app list but detailed information is still controlled by per app ACLs. But at least we should fix event log download issue, currently anyone could download the event log if "spark.acls.enable" is disabled, even it is not permitted by HDFS. This is definitely a security hole should be fixed.

What do you think?

@tgravescs
Copy link
Contributor

Sorry again the wording above and all the different configs are a bit confusing to me as to what the real issues are here.

Here actually has two list of acls, one is controlled by spark.acls.enabled, if user "A" is not added
to this acl list, then user "A" cannot see the app list (//api/v1/applications). But if this app is run by user "A", then user "A" could still see the details of app, like (//api/v1/applications//jobs), this acl is controlled by "spark.history.ui.acls.enabled", and user "A" is automatically in the acl list (because of run by him).

You are mixing things here. You say that if user "A" is not added to acl list he cannot see the app list. This is broken then and I assume only applies to rest api not UI? But I'm not sure what that has to do with your second sentence, if user "A" ran the app then of course he can see the details of the app, that is intended. I'm not sure what that has to do with the first issue? If you don't have spark.history.ui.acls.enabled then it is up to what the user set. Generally in any secure environment you should set spark.history.ui.acls.enabled=true and it should enforce acls no matter what user set. It might help for you to describe these in terms of configs. Which exact configs are set on the history server and which exact configs are set on the application side and which exact apis are being used (Rest vs Web UI).

so all the urls you list are the REST API, is this only an issue with rest api or the actual web UI as well? It sounds like things are definitely broke there but I'm not sure it requires changing the configs just fixing the things that are broken.

Its supposed to be that if spark.history.ui.acls.enable is enabled it doesn't matter what the setting of spark.acls.enable is, acls should always be enforced on the history server. see the description: https://spark.apache.org/docs/latest/monitoring.html

Certain UI's don't have information that should be sensitive. I thought the list of applications was one of those things, all users should be able to see the entire list of applications. Nothing sensitive there, but once you look at the application details that should be acl'd. If someone added something sensitive then it should be protected or it should be moved from that page.

My opinions on your response to @vanzin

  1. No, there shouldn't be sensitive information there and many times a user is looking for a job run by say a headless user or other user. I guess you could filter only the jobs that user has acls to but that makes it more complicated. Do you have a concrete reason it should be protected? Note that this follow how other Hadoop UI's work.

  2. That is just broken, event log should be protected.

@ajbozarth
Copy link
Member

Been following this but haven't had time to do a proper review, but @tgravescs since you brought up the UI vs API thing, as of 2.0 the UI gets it's list from the API so that's where the security has to be handled.

@jerryshao
Copy link
Contributor Author

@tgravescs , with the changes of history UI, REST API and web UI are now mixed. The base URL to list all the apps is through REST API.

The key problem here is that in History Server we could have configured two ACLs (spark.acls.enable here and spark.history.ui.acls.enable here), this two ACLs checks different URLs, for example:

/api/v1/applications
/api/v1/applications/<app-id>
/api/v1/applications/<app-id>/logs

are controlled by spark.acls.enable. And web UI will use this API to list and show apps on the UI. If this is enabled, then only SHS owner, admin/view user groups could see the apps list on the SHS's base web UI.

And all the other URLs (web UI and REST API) for application details are controlled by spark.history.ui.acls.enable.

If we configured differently for this two ACLs, then we will get some unexpected behaviors.

So what I here fixed is to unify the ACL and offer the right behavior.

No, there shouldn't be sensitive information there and many times a user is looking for a job run by say a headless user or other user. I guess you could filter only the jobs that user has acls to but that makes it more complicated. Do you have a concrete reason it should be protected? Note that this follow how other Hadoop UI's work.

We worked with customers and they wish to filter and list apps based on the login users. But I don't have strong opinion it. I could change to what you suggested.

@tgravescs
Copy link
Contributor

so we should definitely fix the /api/v1/applications//logs to go through the acls. It looks like it should be protected in ApiRootResource.java. You have the app id so it needs to do something like the withSparkUI to get the acls included in that application.

Like I mentioned the listing (/api/v1/applications) and /api/v1/applications/ (which is same info I believe as listing) were intentionally left open. I don't really see a reason to change that but if other people have a use case for it then perhaps we should make which pages are protected by acls configurable.

on the history server I would expect spark.acls.enable=false and spark.history.ui.acls.enable=true, I can see where that could be confusing, perhaps we should document this better. spark.acls.enable on the history UI really is protecting the root UI, not the app level ui's. We could explicitly turn this off.

@jerryshao
Copy link
Contributor Author

Thanks @tgravescs for your reply.

on the history server I would expect spark.acls.enable=false and spark.history.ui.acls.enable=true, I can see where that could be confusing, perhaps we should document this better. spark.acls.enable on the history UI really is protecting the root UI, not the app level ui's. We could explicitly turn this off.

This could be happened when history server and spark application shares same configuration file. That's why in our internal test the behavior is not expected.

Change-Id: I378c5c4082a7a2567eb108847e9eb637f62fe748
@SparkQA
Copy link

SparkQA commented Apr 19, 2017

Test build #75934 has finished for PR 17582 at commit a8fb1e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -301,6 +301,14 @@ object HistoryServer extends Logging {
logDebug(s"Clearing ${SecurityManager.SPARK_AUTH_CONF}")
config.set(SecurityManager.SPARK_AUTH_CONF, "false")
}

if (config.getBoolean("spark.acls.enable", config.getBoolean("spark.ui.acls.enable", false))) {
logInfo(s"Either spark.acls.enable or spark.ui.acles.enable is configured, clearing it and " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo ("acles")


if (config.getBoolean("spark.acls.enable", config.getBoolean("spark.ui.acls.enable", false))) {
logInfo(s"Either spark.acls.enable or spark.ui.acles.enable is configured, clearing it and " +
s"only honor spark.history.ui.acl.enable")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only using ..."

@@ -184,14 +184,27 @@ private[v1] class ApiRootResource extends ApiRequestContext {
@Path("applications/{appId}/logs")
def getEventLogs(
@PathParam("appId") appId: String): EventLogDownloadResource = {
new EventLogDownloadResource(uiRoot, appId, None)
try {
// withSparkUI will throw NotFoundException if attemptId is existed for this application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is existed/exists

@@ -602,6 +606,36 @@ class HistoryServerSuite extends SparkFunSuite with BeforeAndAfter with Matchers
}
}

test("acls with downloading files") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test just be a new URL added to testUrls in "ui and api authorization checks"? What is it doing that is different than what that test does and why?

try {
// withSparkUI will throw NotFoundException if attemptId is existed for this application.
// So we need to try again with attempt id "1".
withSparkUI(appId, None) { _ =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... optimally you wouldn't do this, because loading the UI can be expensive. That would require recording extra information when listing applications and some massaging of internal APIs, so this is probably OK for now, but would be nice to track it separately as an enhancement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. Either we could get application acls in listing applications like what did before.

I assume download is not used very frequently, so workaround probably should be OK.

Change-Id: I037137cd3125547fc408dfc69c73ca9b5d3b1ce5
@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #75967 has started for PR 17582 at commit 68c9d83.

Change-Id: Ia45d5aedab8c2a7bd65ccd842d29c5159de53604
@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #75970 has started for PR 17582 at commit 4b3781f.

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #75977 has finished for PR 17582 at commit 4b3781f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Apr 20, 2017

retest this please

@vanzin
Copy link
Contributor

vanzin commented Apr 20, 2017

@jerryshao is the PR description still accurate? It seems you're not really implementing 2 anymore.

@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #75998 has finished for PR 17582 at commit 4b3781f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Just update the description, please review again @vanzin , thanks!

@tgravescs
Copy link
Contributor

changes lgtm. Did you file a jira to track changing to not use withSparkUI? If user is downloading because the file is huge and takes a long time to render or causes history server to have issue this would hurt that use case. We could wait and see if someone has that use case too.

@jerryshao
Copy link
Contributor Author

Thanks @tgravescs for your comments. Do you think it is a good idea to read out ACLs when mergeApplicationListing in here and keep in applications, so that we don't need to load SparkUI to check ACLs when downloading event logs?

@tgravescs
Copy link
Contributor

As @vanzin said I think this is fine for now to get this fixed quickly, but filing a follow up jira makes sense. Actually this might be good to get into the 2.1.1 release if they are going to spin another rc.

@jerryshao
Copy link
Contributor Author

OK, thanks @tgravescs .

@vanzin
Copy link
Contributor

vanzin commented Apr 25, 2017

LGTM. Merging to master / 2.2, will try 2.1 and 2.0 too.

asfgit pushed a commit that referenced this pull request Apr 25, 2017
## What changes were proposed in this pull request?

Current SHS (Spark History Server) two different ACLs:

* ACL of base URL, it is controlled by "spark.acls.enabled" or "spark.ui.acls.enabled", and with this enabled, only user configured with "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started SHS could list all the applications, otherwise none of them can be listed. This will also affect REST APIs which listing the summary of all apps and one app.
* Per application ACL. This is controlled by "spark.history.ui.acls.enabled". With this enabled only history admin user and user/group who ran this app can access the details of this app.

With this two ACLs, we may encounter several unexpected behaviors:

1. if base URL's ACL (`spark.acls.enable`) is enabled but user A has no view permission. User "A" cannot see the app list but could still access details of it's own app.
2. if ACLs of base URL (`spark.acls.enable`) is disabled, then user "A" could download any application's event log, even it is not run by user "A".
3. The changes of Live UI's ACL will affect History UI's ACL which share the same conf file.

The unexpected behaviors is mainly because we have two different ACLs, ideally we should have only one to manage all.

So to improve SHS's ACL mechanism, here in this PR proposed to:

1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for history server.
2. Check permission for event-log download REST API.

With this PR:

1. Admin user could see/download the list of all applications, as well as application details.
2. Normal user could see the list of all applications, but can only download and check the details of applications accessible to him.

## How was this patch tested?

New UTs are added, also verified in real cluster.

CC tgravescs vanzin please help to review, this PR changes the semantics you did previously. Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #17582 from jerryshao/SPARK-20239.

(cherry picked from commit 5280d93)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin
Copy link
Contributor

vanzin commented Apr 25, 2017

No luck with 2.1, please file a separate PR.

@asfgit asfgit closed this in 5280d93 Apr 25, 2017
@jerryshao
Copy link
Contributor Author

What about branch 2.0, do we also need to backport to it @vanzin ?

@vanzin
Copy link
Contributor

vanzin commented Apr 25, 2017

It would be good, but maybe the 2.1 backport will merge cleanly to 2.0.

@jerryshao
Copy link
Contributor Author

OK, let me try it, thanks.

jerryshao added a commit to jerryshao/apache-spark that referenced this pull request Apr 25, 2017
Current SHS (Spark History Server) two different ACLs:

* ACL of base URL, it is controlled by "spark.acls.enabled" or "spark.ui.acls.enabled", and with this enabled, only user configured with "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started SHS could list all the applications, otherwise none of them can be listed. This will also affect REST APIs which listing the summary of all apps and one app.
* Per application ACL. This is controlled by "spark.history.ui.acls.enabled". With this enabled only history admin user and user/group who ran this app can access the details of this app.

With this two ACLs, we may encounter several unexpected behaviors:

1. if base URL's ACL (`spark.acls.enable`) is enabled but user A has no view permission. User "A" cannot see the app list but could still access details of it's own app.
2. if ACLs of base URL (`spark.acls.enable`) is disabled, then user "A" could download any application's event log, even it is not run by user "A".
3. The changes of Live UI's ACL will affect History UI's ACL which share the same conf file.

The unexpected behaviors is mainly because we have two different ACLs, ideally we should have only one to manage all.

So to improve SHS's ACL mechanism, here in this PR proposed to:

1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for history server.
2. Check permission for event-log download REST API.

With this PR:

1. Admin user could see/download the list of all applications, as well as application details.
2. Normal user could see the list of all applications, but can only download and check the details of applications accessible to him.

New UTs are added, also verified in real cluster.

CC tgravescs vanzin please help to review, this PR changes the semantics you did previously. Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#17582 from jerryshao/SPARK-20239.

Change-Id: I65d5d0c5e5a76f08abbe2b7dd43a2e08d295f6b6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants