Building Bazel inside Ubuntu docker on M1 Mac #13925

Closed
vidheyoza opened this issue Aug 30, 2021 · 21 comments
Labels
P4: This is either out of scope or we don't have bandwidth to review a PR. (No assignee)
team-OSS: Issues for the Bazel OSS team: installation, release process, Bazel packaging, website
type: support / not a bug (process)

Comments

@vidheyoza

vidheyoza commented Aug 30, 2021

Description of the problem / feature request:

Problem building with Bazel inside an Ubuntu Docker container built for an M1 Mac.

Feature requests: what underlying problem are you trying to solve with this feature?

Docker is designed to be as OS- and architecture-agnostic as possible when building a container, so that development environments stay consistent across a team, but Docker can only take care of so many aspects of that. I want to build a specific version of MediaPipe that my team forked, which requires Bazel 3.7.2 and TensorFlow 1.14.0 to be installed and running correctly in the container.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  • Build a Docker container from Ubuntu 18.04 (tried amd64, x64, and arm64/v8)
  • Download and install Bazel 3.7.2 using the installer.sh script (other team members confirmed this step works on x64; it fails for arm64/v8 and shows no errors for amd64)

What operating system are you running Bazel on?

An Ubuntu Docker image running on macOS on an M1 processor.

What's the output of bazel info release?

No output. Bazel builds, but using it to run bazel build gives the following error:

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Opening zip "/proc/self/exe": lseek(): Bad file descriptor
[FATAL 21:37:43.679 src/main/cpp/archive_utils.cc:51] Failed to open '/proc/self/exe' as a zip file: (error: 9): Bad file descriptor

Have you found anything relevant by searching the web?


I have looked at the repos of Docker for Mac, TensorFlow, Bazel and MediaPipe. There are mentions of this problem here and there, but the solutions are very specific to the issues raised by those users. There is no general solution for making Bazel work with Docker images built on M1.

@keith
Member

keith commented Aug 30, 2021

Related #11379

@vidheyoza
Author

vidheyoza commented Aug 30, 2021

@keith thanks for mentioning that.

I saw that and other issues (#7135, #11628) on this repo, and this one on TensorFlow and this one on MediaPipe.

However, I couldn't figure it out from them on my own, so any help would be appreciated.

@aiuto
Contributor

aiuto commented Sep 5, 2021

Bazel 3.x might not build for M1 at all. Moving to a newer Bazel version and a newer TensorFlow might be a more time-effective path for you than trying to make this work.

@sudomann

sudomann commented Sep 6, 2021

@aiuto have you managed to get newer versions building on M1 Macs?

@keith
Member

keith commented Sep 6, 2021

4.2.x is the first version to support M1 Macs.

@vidheyoza
Author

@keith I switched to 4.2.1 in my Docker container, but I'm still getting the same error while building MediaPipe.

@keith
Member

keith commented Sep 8, 2021

Yes, I see the same.

@philwo
Member

philwo commented Sep 21, 2021

What does uname -a print on your Mac and what does it print inside the Docker container that you're running Bazel in?

I would recommend not using the installer inside the Docker image; just download the correct Bazel binary that you need and put it into /usr/local/bin/bazel. The installer does the same thing, just with more self-extracting magic around it that makes it harder to understand what's happening.

For x86_64 Linux: Bazel 4.2.1 or Bazel 3.7.2
For arm64 Linux: Bazel 4.2.1 or Bazel 3.7.2

I think this might be an architecture mismatch: accidentally trying to run an x86_64 version of Bazel inside an arm64 container, or the other way round.

Note that Docker under macOS uses Linux under the hood, so Bazel's M1 (macOS) compatibility should not matter here; what matters in this case is Linux arm64 compatibility.
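To make the "pick the right binary for the container" step concrete, here is a minimal C sketch. It assumes a Linux container and Bazel's release file naming of the form bazel-<version>-linux-<arch>, as in the bazel-3.7.2-linux-arm64 file mentioned later in this thread. The helper names bazel_arch_suffix, bazel_asset_name and current_machine are hypothetical. current_machine reads the same machine field that uname -a prints, from inside the container.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Map a Linux machine name, as reported by uname(2) (what `uname -m`
 * prints; an arm64 Linux container reports "aarch64"), to the
 * architecture suffix used in Bazel's Linux release file names.
 * Returns NULL for anything this sketch does not handle. */
const char *bazel_arch_suffix(const char *machine) {
    assert(machine != NULL);
    if (strcmp(machine, "x86_64") == 0)
        return "x86_64";
    if (strcmp(machine, "aarch64") == 0)
        return "arm64";
    return NULL;
}

/* Format a release file name such as "bazel-4.2.1-linux-arm64" into buf.
 * Returns 0 on success, -1 if the architecture is unknown or buf is too small. */
int bazel_asset_name(const char *version, const char *machine,
                     char *buf, size_t len) {
    const char *arch = bazel_arch_suffix(machine);
    if (arch == NULL)
        return -1;
    int n = snprintf(buf, len, "bazel-%s-linux-%s", version, arch);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Machine name of the kernel this process sees, i.e. the architecture
 * the container itself reports. Returns 0 on success, -1 on error. */
int current_machine(char *buf, size_t len) {
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "%s", u.machine);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, a small program could pass the result of current_machine() to bazel_asset_name("4.2.1", ...) to learn which file to download into /usr/local/bin/bazel. Failing on unknown architectures is deliberate. It is safer than silently falling back to the x86_64 build.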

@philwo added the P4 and team-OSS labels on Sep 21, 2021
@vidheyoza
Author

vidheyoza commented Sep 22, 2021

@philwo thanks for the help! I am currently using an arm64/v8 Ubuntu image. I think what you mentioned might be the issue, but after I wget the 3.7.2 arm64 release into the /usr/local/bin/bazel directory and try using bazel, it throws the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "bazel": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container: context canceled

Earlier I saw the container trying to find bazel in /usr/local/lib/bazel/bin, so I tried downloading into that directory as well, but that doesn't seem to work either. Since the image is being run as root, I don't think there are any permission issues either. What do you think could be the issue?

@philwo
Member

philwo commented Sep 22, 2021

@vidheyoza Could you try running chmod +x /usr/local/bin/bazel after downloading it there with wget? I think that might fix the error message you've seen.

@vidheyoza
Author

vidheyoza commented Sep 22, 2021

Tried it, but I'm getting the same error.

EDIT: I also tried chmod +x /usr/local/bin/bazel/bazel-3.7.2-linux-arm64, but got the same error.

@vidheyoza
Author

Update: Apparently I was using chmod -x instead of chmod +x. I'm now dealing with errors installing TensorFlow, but I think I can close this issue.

To recap for anyone in the future: if the Bazel installer doesn't work in your container, try downloading the binary directly into the container and making it executable (chmod +x) if needed. For some (but not all) combinations of OS and architecture, this may work much better than the installer script.
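The "executable file not found in $PATH" error quoted earlier comes from resolving a bare command name against $PATH. A simplified model of that lookup in C follows; find_in_path is a hypothetical helper, not Docker's actual implementation. It tries each PATH directory in order and accepts the first DIR/NAME that is a regular file with an execute bit set. Under this model, a non-executable file (for example, after chmod -x) is skipped, and so is a directory that happens to be named bazel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Search the colon-separated path_env for an executable regular file
 * called name. On success, writes DIR/NAME into out and returns 0;
 * returns -1 if no entry matches. */
int find_in_path(const char *name, const char *path_env, char *out, size_t len) {
    assert(name != NULL && path_env != NULL && out != NULL);
    const char *dir = path_env;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t dlen = end != NULL ? (size_t)(end - dir) : strlen(dir);
        /* An empty PATH entry conventionally means the current directory. */
        int n = (dlen == 0)
                    ? snprintf(out, len, "./%s", name)
                    : snprintf(out, len, "%.*s/%s", (int)dlen, dir, name);
        struct stat st;
        if (n > 0 && (size_t)n < len &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) && /* skip directories */
            access(out, X_OK) == 0)                       /* needs an execute bit */
            return 0;
        if (end == NULL)
            return -1; /* ran out of PATH entries */
        dir = end + 1;
    }
}
```

For instance, find_in_path("bazel", getenv("PATH"), buf, sizeof buf) only succeeds once an executable file named exactly bazel sits directly in one of the PATH directories, such as /usr/local/bin.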

@philwo
Member

philwo commented Sep 24, 2021

Excellent, thanks for posting the solution here @vidheyoza! 😊

@vidheyoza
Author

@philwo circling back to this issue, I need Bazel 0.26.2 or lower to install TF 1.14, but it looks like there is no arm64 build for it like the one you mentioned above. Do you think there's an alternative for this?

@RahulRachuri

Hey, I'm also on an M1 Mac (macOS 12.0.1), trying to build with Bazel inside Ubuntu 20.04 (x86_64) using Docker. I could get it working with an Ubuntu arm64 image, but I need x86_64 for something I'm doing.

I tried installing version 4.2.1 via the installer script, and by downloading the binary directly. Either way, I get the following error message:

Opening zip "/proc/self/exe": lseek(): Bad file descriptor
FATAL: Failed to open '/proc/self/exe' as a zip file: (error: 9): Bad file descriptor

Could you please help me with this?

@philwo
Member

philwo commented Nov 15, 2021

Hi @RahulRachuri, can you run uname -a inside your Docker container and post the output, to confirm that Docker on your M1 Mac is actually running the container as x86_64?

@RahulRachuri

RahulRachuri commented Nov 15, 2021

Sure, this is the output:
Linux 7411fb3dcdb4 5.10.47-linuxkit #1 SMP PREEMPT Sat Jul 3 21:50:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@philwo
Member

philwo commented Nov 15, 2021

Mhm.. yeah, looks like that should work. :/

I'm sorry, I really don't know what might cause /proc/self/exe to not work in this specific environment. It might be due to the way Docker on Apple Silicon implements support for x86_64 containers (maybe using binfmt_misc?), which could make /proc/self/exe not readable, or at least not seekable, for emulated binaries.

Unfortunately I can no longer use Docker on my Mac since their license change, so I'm not able to reproduce or debug this. Maybe you could raise this as a bug with Docker?

@RahulRachuri

I see, that makes sense.

I will try raising a bug with Docker, thanks for your help! :)

@emidln

emidln commented Dec 8, 2021

I fixed this in #14391

The gist is that when Docker runs a non-native binary, it uses binfmt_misc, Linux's facility for registering handlers for non-native ELF binaries, to call qemu-user-static, which actually executes the binary. When this happens, /proc/self/exe from the kernel's point of view is qemu-user-static (which also lives outside the filesystem the container can see), so lseek'ing to find the size of this file fails. The fix is to use a kernel/ELF API to get the name of the executing program.
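A minimal C sketch, assuming Linux, of the kind of check that breaks here. It opens /proc/self/exe and lseek()s to its end to get the file size; a zip reader typically needs this because the zip central directory sits at the end of the file. self_exe_size is a hypothetical helper, not Bazel's archive_utils.cc code. Natively it returns the size of the running program. Per the explanation above, under binfmt_misc/qemu-user emulation this is the step that fails, which matches the "lseek(): Bad file descriptor" messages in the logs earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open proc_path (normally "/proc/self/exe") and seek to its end to learn
 * the executable's size. Returns the size in bytes, or -1 on failure
 * (the errno text is printed to stderr). */
off_t self_exe_size(const char *proc_path) {
    assert(proc_path != NULL);
    int fd = open(proc_path, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", proc_path, strerror(errno));
        return -1;
    }
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0)
        fprintf(stderr, "lseek(): %s\n", strerror(errno));
    close(fd);
    return size;
}
```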

My patch originally tried to just use AT_EXECFD from the auxv (the ELF auxiliary vector) at a lower level, but I suspect that AT_EXECFD also has problems under this mode of execution. Luckily, AT_EXECFN gives us what we're looking for inside Docker on alternate platforms as well as when running natively. It should also be more reliable than using argv[0].
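The AT_EXECFN idea can be illustrated in C, assuming Linux with glibc 2.16 or later (where getauxval is available). This is not the code from #14391. exec_filename and exe_size_via_execfn are hypothetical names. The claim that AT_EXECFN also behaves correctly under qemu-user emulation comes from the comment above and is not verified here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/* Pathname the kernel recorded in the auxiliary vector as the file
 * passed to execve() for this program, or NULL if unavailable. */
const char *exec_filename(void) {
    unsigned long p = getauxval(AT_EXECFN);
    return p != 0 ? (const char *)(uintptr_t)p : NULL;
}

/* Size of the running executable, located via AT_EXECFN instead of
 * /proc/self/exe. Returns -1 on error.
 * Caveat: AT_EXECFN is the path exactly as given to execve(), so it may be
 * relative; this sketch assumes the working directory has not changed. */
off_t exe_size_via_execfn(void) {
    const char *path = exec_filename();
    if (path == NULL)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```

Because the path is the one the caller handed to execve(), it points into the container's own filesystem rather than at the emulator binary.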

@zmk-punchbowl

zmk-punchbowl commented Mar 24, 2022

Is your fix usable yet, @emidln? I commented there that I'm running into this issue when trying to cross-compile TensorFlow using Bazel with an M1 host and an x86_64 container.


8 participants