Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Fix sandbox permission errors with Mesos 1.6.0 #28

Merged
merged 4 commits into from Sep 5, 2018

Conversation

StephanErb
Copy link
Member

This change makes Aurora compatible with Mesos 1.6.0 which has changed the permission of sandbox folders (see https://issues.apache.org/jira/browse/MESOS-8332).

Please have a look at the individual commits for further details and descriptions.

@DavidMcLaughlin
Copy link
Contributor

Does this degrade gracefully to older versions of Mesos?

This simplifies debugging of Thermos runner issues significantly, as
there is just one relevant log file to look at.

For further details, please see the prior discussion at
https://reviews.apache.org/r/58609/
For example, rather than just a "Permission denied" one gets a full
stacktrace of which access or operation caused the issue.
Rather than taking "/path/to/mesos_task_dir/sandbox" as a constructor
argument, the sandbox objects are now passed just "/path/to/mesos_task_dir"
as this simplifies a few operations that need the folder created by
Mesos.

The "sandbox" subfolder is still created with the same name, but this is
now only handled internally. There is no user-visible behaviour change.
Mesos 1.6.0 creates sandboxes with permissions 750 rather than 755. This
breaks the assumption of Thermos that non-privileged processes can still
read the sandbox content.

This change is necessary even when container images are used via the
Mesos containerizer, as certain processes such as the stdout log
rotation continue to run outside of the image.
@StephanErb
Copy link
Member Author

Given our default Aurora and Thermos setup where the executor is launched as root:

  • In Mesos < 1.6:

    • /var/lib/mesos/slaves/x/frameworks/y/executors/z/runs/latest/ is owned by root
    • permissions are 755
  • In Mesos >= 1.6:

    • /var/lib/mesos/slaves/x/frameworks/y/executors/z/runs/latest/ is owned by root
    • permissions are now 750

With this change, we now let Thermos change the owner of /var/lib/mesos/slaves/x/frameworks/y/executors/z/runs/latest/ to role but let the permissions unchanged. This matches the intend of Mesos that the sandbox should be read/writeable by the enduser processes and by nobody else.

I have verified that this works in 1.5 and 1.6 by running the end-to-end tests with both versions. I therefore think that this is degrades gracefully as expected.

Copy link
Contributor

@ridv ridv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix!

@ridv
Copy link
Contributor

ridv commented Sep 5, 2018

Giving notice to other PMC members in the interest of getting 0.21.0 out the door: Given that this PR has been open for 27 days and is needed for our release, I'm calling for lazy consensus. If no one reviews or opposes this patch by Friday, September 7th 2018 at 12 PM San Francisco time, I'll be merging it and cutting 0.21.0-RC1.

Copy link
Contributor

@shanmugh shanmugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for testing backward compatibility.

@ridv ridv merged commit 3277679 into apache:master Sep 5, 2018
@ridv ridv added this to the 0.22.0 milestone Oct 22, 2019
@ridv ridv added the 0.22.0 Belongs to the 0.22.0 release label Oct 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
0.22.0 Belongs to the 0.22.0 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants