Describe the bug
I keep running into issues trying to run bring-your-own-container (BYOC) images: they either work in local mode, or they work in "cloud mode", but not both. The issue is almost certainly a Docker permissions problem, but I cannot see a way of resolving it.
To reproduce
I have a Docker image built from this Dockerfile:
FROM continuumio/miniconda3
WORKDIR /opt/ml/code/
COPY src/ /opt/ml/code/
ENTRYPOINT ["bash", "/opt/ml/code/processor.sh"]
The processor.sh script creates a conda env, installs packages into it, then runs conda pack with the output set to /opt/ml/processing/output/pyenv.tar.gz.
In local mode this step fails on this line:
sagemaker_session.upload_data(source, bucket, path)
because the container user (root) writes the file with permissions that are too restrictive. Note there is a "fix" on line 99 for a similar issue, but it doesn't help.
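To make the failure mode concrete, here is a minimal sketch (not my actual pipeline) of what happens: a file written with a root-style restrictive umask inside the container comes out mode 600, so a host user with a different uid cannot read it for upload; relaxing the mode before the container exits is one possible workaround. The directory here is a stand-in for /opt/ml/processing/output.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: out_dir stands in for the mounted
# /opt/ml/processing/output volume.
set -euo pipefail

out_dir="$(mktemp -d)"

# Simulate the root container user writing the artifact with a
# restrictive umask: the resulting file is mode 600.
( umask 077; echo "payload" > "$out_dir/pyenv.tar.gz" )
stat -c '%a' "$out_dir/pyenv.tar.gz"   # prints 600: group/other cannot read

# Possible workaround: relax permissions before the container exits,
# so the host-side upload_data() step can read the artifact.
chmod 0644 "$out_dir/pyenv.tar.gz"
stat -c '%a' "$out_dir/pyenv.tar.gz"   # prints 644
```

This only helps the local-mode read path; it does not solve the non-root write problem described below.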
I tried to work around this by adding a non-root user (uid/gid 1000) to the container. This "fixed" the local issue, but now the container fails when I run it normally on SageMaker. It appears to be the opposite problem: the container user doesn't have write access to the volume /opt/ml/processing/output/, so the script fails.
Expected behavior
I need to be able to run pipelines in both local and SageMaker mode without permission errors. It seems that in local mode, scripts need to run with the same uid as the host user, but in SageMaker mode they need to run as root. Is there some way they can work in both?
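For reference, this is the shape of entrypoint I have been experimenting with: a sketch that tries to tolerate both cases by checking writability of the output directory up front (so the non-root SageMaker failure surfaces early with a clear message) and relaxing permissions on the artifact at the end (so a host user with a different uid can read it in local mode). OUTPUT_DIR defaults to a temp directory here purely so the sketch runs standalone; in the real container it would be /opt/ml/processing/output, and the conda pack step is elided.

```shell
#!/usr/bin/env bash
# Sketch of an entrypoint intended to run as either root (SageMaker)
# or a non-root user (local mode). The real processing work is elided.
set -euo pipefail

# Stand-in default so the sketch is self-contained; the real path
# would be /opt/ml/processing/output.
OUTPUT_DIR="${OUTPUT_DIR:-$(mktemp -d)}"

# Fail early and loudly if the mounted volume is not writable by the
# current user, instead of failing deep inside conda pack.
mkdir -p "$OUTPUT_DIR" 2>/dev/null || true
if [ ! -w "$OUTPUT_DIR" ]; then
    echo "error: $OUTPUT_DIR is not writable by uid $(id -u)" >&2
    exit 1
fi

# ... build the conda env and run conda pack here; placeholder below ...
touch "$OUTPUT_DIR/pyenv.tar.gz"

# Make the artifact world-readable before exiting, so a host user with
# a different uid (local mode) can still read and upload it.
chmod a+r "$OUTPUT_DIR/pyenv.tar.gz"
```

This still does not help when the non-root container user cannot write to the volume at all, which is the SageMaker-side half of the problem.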