Conversation

@lustefaniak

By default permissions are set to 770, which in some cases is more problematic than relying on umask.

If the Spark master and the driver application run in a containerized deployment under different users that don't share a user group, the process will fail.
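
To make the difference concrete, here is a minimal Python sketch (not the Spark code itself) that contrasts a file whose mode is decided by the process umask with one forced to 770. The helper names and the `0o022` umask are assumptions chosen for illustration.

```python
import os
import stat
import tempfile


def create_with_umask(path, umask=0o022):
    """Create a file and let the process umask decide its final mode."""
    old = os.umask(umask)  # temporarily apply the illustrative umask
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        os.umask(old)  # always restore the previous umask
    return stat.S_IMODE(os.stat(path).st_mode)


def create_with_fixed_mode(path, mode=0o770):
    """Create a file, then force an explicit mode regardless of umask."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.close(fd)
    os.chmod(path, mode)  # chmod is not filtered by the umask
    return stat.S_IMODE(os.stat(path).st_mode)


def others_can_read(mode):
    """True if users outside the owning user and group may read the file."""
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        by_umask = create_with_umask(os.path.join(d, "umask.log"))
        fixed = create_with_fixed_mode(os.path.join(d, "fixed.log"))
        print(oct(by_umask), others_can_read(by_umask))
        print(oct(fixed), others_can_read(fixed))
```

With a fixed 770 mode, a reader running as a different user outside the owning group gets no access at all, which is the failure described above.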

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Mar 12, 2015

CC @andrewor14 and @vanzin. Although it's annoying to add another property (which I guess should be documented), I understand the question here. You'd have to make the files public in this situation; is that desirable?

@vanzin
Contributor

vanzin commented Mar 12, 2015

Yeah, the main thing here is that you do not want user B to be able to modify user A's files. I understand your patch doesn't change the default case, but I wonder why you don't configure your daemons appropriately instead.

With HDFS in mind, you'd have:

  • user "spark" belonging to group "spark"
  • /event_log_dir permissions 1777 owned by "spark:spark"
  • Master and HistoryServer running as user "spark"
  • Users don't need to belong to group "spark" at all.

Or maybe you're not using HDFS to store the logs?

With this configuration files / directories created under the event log dir will belong to "user:spark" and everything should work as planned. Note this is a little different than what a POSIX fs would do - a POSIX fs would require "5777" permissions for this but HDFS doesn't support that.
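
As a rough illustration of the directory setup described above, the following Python sketch creates a world-writable directory with the sticky bit (mode 1777) on a local POSIX filesystem. It does not talk to HDFS, and the function name and the directory path are hypothetical; changing the owner to "spark:spark" would require privileges and is left out.

```python
import os
import stat
import tempfile


def make_shared_log_dir(path):
    """Create a directory any user can write into, with the sticky bit set.

    The sticky bit (the leading 1 in 1777) stops users from deleting or
    renaming entries they do not own, even though the directory itself
    is writable by everyone.
    """
    os.makedirs(path, exist_ok=True)
    # makedirs is filtered by the umask, so set the mode explicitly.
    os.chmod(path, 0o1777)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        mode = make_shared_log_dir(os.path.join(base, "event_log_dir"))
        print(oct(mode))
```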

@lustefaniak
Author

Right, we are not using HDFS; for file storage we use GlusterFS. One of our client's requirements was to run every process type under its own user. This isn't something we would choose ourselves, as it makes everything more complicated.

@adaszko: can you please comment on that?

@vanzin
Contributor

vanzin commented Mar 12, 2015

If GlusterFS supports POSIX semantics, you should be able to set things up as I suggested and use 3777 permissions for the event log dir (sorry, I mistakenly said 5777 before). With those, you can run applications as any user; all of them should be able to write to the log directory, and the Spark processes should be able to read the logs.
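
For readers unfamiliar with the leading octal digit, this small Python sketch decodes the special bits in the modes mentioned in this thread. The function name is made up for illustration; the constants come from the standard `stat` module.

```python
import stat


def describe_special_bits(mode):
    """Report which special permission bits are set in an octal mode."""
    return {
        "setuid": bool(mode & stat.S_ISUID),  # leading digit 4
        "setgid": bool(mode & stat.S_ISGID),  # leading digit 2
        "sticky": bool(mode & stat.S_ISVTX),  # leading digit 1
    }


if __name__ == "__main__":
    for mode in (0o1777, 0o3777, 0o5777):
        print(oct(mode), describe_special_bits(mode))
```

On Linux and most POSIX filesystems, setgid on a directory makes new entries inherit the directory's group, which is why 3777 (setgid plus sticky) gives the "user:spark" ownership described earlier, whereas 5777 would set setuid instead.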

@lustefaniak
Author

I guess we will try with that and report back if it worked well. Many thanks!

@srowen
Member

srowen commented Mar 19, 2015

Just checking to see whether that worked and if so whether we should close this?

@andrewor14
Contributor

@lustefaniak were you able to see whether that workaround was sufficient? Is this still an issue, or can we close this PR?

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Apr 27, 2015

Do you mind closing this PR?

@huangjs

huangjs commented Jun 1, 2015

I'd like to reopen this, because in YARN mode the Spark app runs under the user's account.
