Request: Update QEMU to support emulating AVX instructions on ARM64 hosts #6620

fumoboy007 · 2022-12-06T04:23:05Z

I have tried with the latest version of Docker Desktop
I have tried disabling enabled experimental features

Expected behavior

AMD64 images that use AVX instructions are able to run on ARM64 hosts.

Actual behavior

Information

AVX support was recently added to QEMU. I believe Docker needs to update its QEMU version to pull in this functionality?

fumoboy007 · 2022-12-06T04:25:10Z

Oops, the AVX support in QEMU 7.2 hasn’t been released yet. 😅

fumoboy007 · 2022-12-15T08:56:56Z

QEMU 7.2 has been released.

Enzo90910 · 2023-01-10T17:56:50Z

Very important issue for my team.

fumoboy007 · 2023-01-21T06:09:07Z

The latest Docker for Mac release is apparently still using QEMU 6.2.0:

# /containers/services/binfmt/rootfs/usr/bin/qemu-x86_64 --version
qemu-x86_64 version 6.2.0 (v6.2.0)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Looks like QEMU was upgraded to 7.0.0 in Docker Desktop 4.13.0 but was downgraded to 6.2.0 in Docker Desktop 4.13.1 due to some other issue.

Is anyone working on trying again to upgrade QEMU? 🥺
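For anyone checking their own install, here is a minimal Python sketch (not part of Docker Desktop) that parses the `--version` output shown above and compares it against 7.2, the release this thread names as the first with AVX support. The function names are hypothetical, and the regex assumes the `qemu-x86_64 version X.Y.Z` format seen in the output above.

```python
import re

# First QEMU release that this thread reports as supporting AVX emulation.
MIN_AVX_VERSION = (7, 2, 0)


def parse_qemu_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `qemu-x86_64 --version` output."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no QEMU version found in output")
    return tuple(int(part) for part in match.groups())


def has_avx_emulation(output: str) -> bool:
    """True if the reported version is at least MIN_AVX_VERSION."""
    return parse_qemu_version(output) >= MIN_AVX_VERSION


# The first line of the output quoted in this comment.
sample = "qemu-x86_64 version 6.2.0 (v6.2.0)"
print(parse_qemu_version(sample), has_avx_emulation(sample))
```

A version at or above 7.2 only suggests that AVX emulation is available; it does not prove that a given image will run.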

fedejinich · 2023-02-15T00:37:54Z

any news on this? I'm struggling with the avx2 instruction set

fumoboy007 · 2023-02-15T04:45:55Z

> any news on this? I'm struggling with the avx2 instruction set

^ @stephen-turner who previously commented on #5148.

Enzo90910 · 2023-02-15T11:24:08Z

Negative news: I tested the recent Rosetta 2 support in Docker Desktop but Rosetta 2 does not seem to support AVX either.

GoingOnSun · 2023-04-26T12:21:41Z

Tried on Debian 11: the rocket.chat container exits with code 132.
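Assuming the usual shell convention that an exit status above 128 means "terminated by signal (status - 128)", status 132 corresponds to signal 4, SIGILL (illegal instruction). That is consistent with a binary hitting an instruction the emulator does not support. A small Python sketch of the arithmetic, with a hypothetical helper name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map an exit status to a signal name when it follows the 128 + N convention."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit status {code}"


print(describe_exit_code(132))  # prints "SIGILL"
```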

qiangli · 2023-04-27T00:35:47Z

Until this is fixed and Docker is upgraded to use QEMU 7.2+ (latest is 8.0.0), one could try running QEMU via Colima directly. After stopping Docker and starting Colima, you can still build and run as usual.
This works for my projects.

brew install qemu
brew install colima
colima start --arch x86_64 --cpu 8 --memory 24 --disk 128 --cpu-type Broadwell-v4

For other CPU models: https://qemu.readthedocs.io/en/latest/system/qemu-cpu-models.html

I would not hold my breath for Rosetta support; it's not going to happen. https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment

glynjackson · 2023-05-03T20:52:13Z

Any update on this issue? With Apple Silicon now being a staple in a lot of engineering departments, we're facing the same issues here.

ggilley · 2023-07-06T16:44:40Z

Any update here? Pain ongoing...

yutotakano · 2023-08-26T15:05:07Z

Apple Silicon combined with QEMU above 7.0 causes a regression where the syscall prctl(PR_SET_CHILD_SUBREAPER, 1) returns Invalid Argument. Some applications rely on this call succeeding (e.g. astro-cli, spark-on-k8s-operator, cinit).

Specifically, whereas QEMU 6.2 and earlier passed the syscall through without modification, QEMU 7.0 and above disable it, with a comment saying "TODO to implement a safe pass-through for it". https://gitlab.com/qemu-project/qemu/-/commit/220717a6f46a99031a5b1af964bbf4dec1310440

And it's still not implemented to this day, which means nothing above QEMU 6.2 will work for those applications. Until that is fixed, I think a QEMU update will cause unexpected regressions for Docker users.
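To probe whether a given environment rejects this call, a minimal Python sketch follows. It is Linux-only and calls libc's prctl through ctypes. The constant 36 is PR_SET_CHILD_SUBREAPER from <linux/prctl.h>. The helper names are hypothetical, and the script is meant to be run inside the emulated container.

```python
import ctypes
import errno
import os
import sys

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def interpret(ret: int, err: int) -> str:
    """Turn a prctl() return value and errno into a short verdict."""
    if ret == 0:
        return "ok"
    if err == errno.EINVAL:
        return "rejected: EINVAL"
    return f"failed: {os.strerror(err)}"


def try_set_child_subreaper() -> str:
    """Call prctl(PR_SET_CHILD_SUBREAPER, 1) through libc (Linux only)."""
    if not sys.platform.startswith("linux"):
        return "skipped: not Linux"
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl is variadic; pass the remaining arguments as unsigned long.
    ret = libc.prctl(
        PR_SET_CHILD_SUBREAPER,
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    return interpret(ret, ctypes.get_errno())


if __name__ == "__main__":
    print(try_set_child_subreaper())
```

A result of "rejected: EINVAL" would match the behaviour described above. Note that a successful call does mark the calling process as a child subreaper.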

fumoboy007 · 2023-08-26T17:39:02Z

Great find, @yutotakano!

> Apple Silicon combined with QEMU above 7.0 causes a regression

Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?

> And it's still not implemented to this day

Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!

yutotakano · 2023-08-26T17:42:07Z

> Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?

Hmm. I'm certainly on Apple Silicon, so I decided to keep my assumptions small. Perhaps it's on all devices as long as you use QEMU to emulate Linux. But would Docker use QEMU if it's running an x86 container on Intel x86?

Shigerello · 2023-09-13T03:27:03Z

Related but somewhat off-topic: because MongoDB 5.0 and later rely on the AVX instruction set, mongosh (among other tools) crashes in a QEMU-emulated x86 container running on Docker Desktop for Mac with Apple silicon.
This incompatibility hampers setting up containerized development and testing environments.

Side note:
If the said container is rebuilt for the ARM architecture, mongosh works just fine.

https://www.mongodb.com/docs/v7.0/administration/production-notes/#x86_64

MongoDB 5.0 requires use of the AVX instruction set, available on
select Intel and AMD processors.

nielspardon · 2023-10-11T08:18:51Z

> Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!

I took the liberty of creating an issue in the QEMU tracker since I did not find an existing one: https://gitlab.com/qemu-project/qemu/-/issues/1929

dgageot · 2023-11-25T07:33:57Z

Hello everyone, we're updating QEMU in the upcoming version of Docker Desktop.
Have you tested a version that would suit your needs?

fumoboy007 · 2023-11-25T08:14:30Z

@dgageot Good news! I have not tested but in theory, QEMU 7.2 or above should resolve this issue.

dgageot · 2023-11-25T09:32:43Z

> @dgageot Good news! I have not tested but in theory, QEMU 7.2 or above should resolve this issue.

Thank you @fumoboy007.
Probably Docker Desktop 4.26.0 will contain a more recent QEMU but not 7.2 yet.
But I'll still do my best to fit it in and if it doesn't work, I'll target 4.27.0.

dgageot · 2023-11-30T10:24:51Z

@fumoboy007 Sorry, that'll have to wait for 4.27.0.

dgageot · 2023-12-01T13:12:46Z

@fumoboy007 do you have an example of a docker command that fails with the latest version of Docker Desktop?

fumoboy007 · 2023-12-04T05:55:07Z

@dgageot One Docker image that is affected by this issue is tensorflow/serving. tensorflow/serving#1948 (comment) has reproduction steps.

juanmirocks · 2024-01-04T10:24:07Z

@dgageot it would be awesome seeing it coming for 4.27.0 or early 2024 :-) I'm also blocked by tensorflow/serving#1948

Do you have any updates?

dgageot · 2024-01-04T10:32:52Z

> @dgageot it would be awesome seeing it coming for 4.27.0 or early 2024 :-) I'm also blocked by tensorflow/serving#1948
>
> Do you have any updates?

We currently have QEMU 8.0.4 on our main branch, with a patch for the prctl(PR_SET_CHILD_SUBREAPER, 1) issue. So the good news is that we are not stuck with a very old version of QEMU anymore, and 4.27.0 will at least contain this version of QEMU.

Also, this morning, I've started testing 8.1.4 with the plan to soon test 8.2.0.

I mainly focused on using the most recent versions of QEMU. I haven't tested the AVX support yet. Do you have a simple docker run command I can try to validate that it does what you want?

dgageot · 2024-01-04T12:18:36Z

Here are the commands that fail with Docker Desktop 4.26.1 but succeed on our main branch, with QEMU 8.0.4:

cd /tmp
git clone https://github.com/tensorflow/serving
docker run -t --rm -it --init -p 8501:8501 --platform linux/amd64 -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 -v "./serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving:2.14.1

Although, from the output of the program, I'm not sure it does actually use AVX instructions:

2024-01-04 12:11:33.524895: I tensorflow_serving/model_servers/server.cc:74] Building single TensorFlow model file config:  model_name: half_plus_two model_base_path: /models/half_plus_two
2024-01-04 12:11:33.542335: I tensorflow_serving/model_servers/server_core.cc:467] Adding/updating models.
2024-01-04 12:11:33.544671: I tensorflow_serving/model_servers/server_core.cc:596]  (Re-)adding model: half_plus_two
2024-01-04 12:11:33.926545: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2024-01-04 12:11:33.926776: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.927587: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.929071: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /models/half_plus_two/00000123
2024-01-04 12:11:33.936173: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-01-04 12:11:33.936641: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2024-01-04 12:11:33.939478: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 12:11:34.040351: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2024-01-04 12:11:34.056903: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-01-04 12:11:34.067407: W external/org_tensorflow/tensorflow/tsl/platform/profile_utils/cpu_utils.cc:118] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2024-01-04 12:11:34.279405: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2024-01-04 12:11:34.299615: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 370421 microseconds.
2024-01-04 12:11:34.301360: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:80] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2024-01-04 12:11:34.547284: I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: half_plus_two version: 123}
2024-01-04 12:11:34.554644: I tensorflow_serving/model_servers/server_core.cc:488] Finished adding/updating models
2024-01-04 12:11:34.556354: I tensorflow_serving/model_servers/server.cc:118] Using InsecureServerCredentials
2024-01-04 12:11:34.556922: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2024-01-04 12:11:34.573646: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
2024-01-04 12:11:34.585490: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...

Tip:

-e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 is a useful (experimental) trick to force the usage of QEMU just for one docker run command, even though Docker Desktop is configured to use Rosetta, for faster overall emulation.

juanmirocks · 2024-01-04T18:53:29Z

@dgageot that's so great to hear!

Exactly, running your command on Docker Desktop v4.26.1 for Mac, on Apple silicon, without -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1, abruptly ends with the error:

/usr/bin/tf_serving_entrypoint.sh: line 3:    12 Illegal instruction     tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Similarly, running the command with -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1, ends with the error:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:560] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
qemu: uncaught target signal 6 (Aborted) - core dumped
/usr/bin/tf_serving_entrypoint.sh: line 3:    12 Aborted                 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

matemijolovic · 2024-01-26T14:45:05Z

Hello, I've tried the recently released Docker for Mac 4.27.0 and AVX seems to work now on ARM64 🎊

dgageot · 2024-01-26T14:50:54Z

Woot! Thanks @matemijolovic, that's really good news! Is it fully working? How's the perf?

matemijolovic · 2024-01-26T15:04:53Z

Didn't have time to benchmark the performance, but it seems okay at first glance (I'd say roughly 2x slower than running on a comparable Linux x64 machine). For our use case this is perfectly acceptable, as we don't run any production inference on ARM machines. [EDIT: to clarify, regarding performance, I'm not sure that AVX is actually being used to its full potential, but for us the important thing is that the containers don't crash]
It would probably also help to compile linux/arm64 TF Serving images. Currently there are only linux/amd64 ones, so it's unfair to do benchmarks :)

The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but I can't say for sure if it's related to this particular upgrade. [EDIT: as dgageot suggested, the docker run --init flag helps with this]

dgageot · 2024-01-26T15:24:22Z

> Didn't have time to benchmark the performance, but seems okay at a first glance (I'd say roughly ~2x slower than running on comparable Linux x64 machine). For our usecase this is perfectly acceptable, as we don't run any production inference on ARMs.

Good to hear!

> The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but can't say for sure if it's related to the particular upgrade.

Have you tried running the container with docker run --init?

dgageot · 2024-01-26T15:36:36Z

(I'm closing this issue. Feel free to ping me if you think it needs to be re-opened.)

matemijolovic · 2024-01-29T08:07:59Z

> Have you tried running the container with docker run --init?

Can confirm this helps, thank you!

dgageot · 2024-02-15T08:50:44Z

Hi everyone! There's a good chance that we will roll back the QEMU upgrade in Docker Desktop 4.28.0. It has too many regressions for the majority of users. A temporary solution is for you to stick with 4.27.X.

juanmirocks · 2024-02-15T10:26:15Z

That's unfortunate but thank you so much @dgageot for the heads up!

bonzini · 2024-02-15T10:37:58Z

Were the regressions reported on GitLab? Also, what patch release are you on?

fumoboy007 mentioned this issue Dec 6, 2022

tensorflow-serving docker container doesn't work on Macs with Apple M1 chips. tensorflow/serving#1948

Open

hasezoey mentioned this issue Jan 16, 2023

MongoMemoryServer.create hanging in Docker nodkz/mongodb-memory-server#710

Closed

gjmulder mentioned this issue Mar 14, 2023

Build on Debian Docker ggerganov/llama.cpp#108

Closed

bsousaa added the status/2-in-progress label Jul 7, 2023

BjarneHerland mentioned this issue Sep 25, 2023

tflite-model-maker cannot be installed correctly for different reasons on different configurations tensorflow/tensorflow#61719

Open

nielspardon mentioned this issue Oct 10, 2023

PR_SET_CHILD_SUBREAPER is unavailable on this platform kubeflow/spark-operator#1735

Open

bsousaa added the area/emulation label Jan 4, 2024

dgageot closed this as completed Jan 26, 2024

borodiliz mentioned this issue Feb 1, 2024

[bitnami/mongodb] Mongodb doesn't run on M1 Mac bitnami/containers#40947

Open

NMikle mentioned this issue Feb 25, 2024

[bitnami/mongodb] Mongodb Chart arm64 Support bitnami/charts#3635

Open
