Request: Update QEMU to support emulating AVX instructions on ARM64 hosts #6620

Closed
2 tasks done
fumoboy007 opened this issue Dec 6, 2022 · 35 comments

Comments

@fumoboy007

fumoboy007 commented Dec 6, 2022

  • I have tried with the latest version of Docker Desktop
  • I have tried disabling enabled experimental features

Expected behavior

AMD64 images that use AVX instructions are able to run on ARM64 hosts.

Actual behavior

#5148

Information

AVX support was recently added to QEMU. I believe Docker needs to update its QEMU version to pull in this functionality?

@fumoboy007
Author

Oops, QEMU 7.2, which adds this functionality, hasn’t been released yet. 😅

@fumoboy007
Author

QEMU 7.2 has been released.

@Enzo90910

Very important issue for my team.

@fumoboy007
Author

The latest Docker for Mac release is apparently still using QEMU 6.2.0:

# /containers/services/binfmt/rootfs/usr/bin/qemu-x86_64 --version
qemu-x86_64 version 6.2.0 (v6.2.0)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Looks like QEMU was upgraded to 7.0.0 in Docker Desktop 4.13.0 but was downgraded to 6.2.0 in Docker Desktop 4.13.1 due to some other issue.

Is anyone working on trying again to upgrade QEMU? 🥺
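
For anyone checking their own install, the version line printed above can be compared against 7.2, the release this thread identifies as adding AVX emulation. A minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does); the function name `qemu_version_at_least` is hypothetical and not part of QEMU or Docker:

```shell
# Hypothetical helper: succeed if the version in a `qemu-* --version` line
# is at least the given minimum. Returns 2 if no version can be parsed.
qemu_version_at_least() {
  local line="$1" minimum="$2" found
  # Pull "6.2.0" out of "qemu-x86_64 version 6.2.0 (v6.2.0)".
  found=$(printf '%s\n' "$line" \
    | sed -n 's/^qemu-[^ ]* version \([0-9][0-9.]*\).*/\1/p' | head -n1)
  [ -n "$found" ] || return 2
  # With version sort, the minimum must come first (or be equal).
  [ "$(printf '%s\n%s\n' "$minimum" "$found" | sort -V | head -n1)" = "$minimum" ]
}

# Sample input copied from the output above.
if qemu_version_at_least "qemu-x86_64 version 6.2.0 (v6.2.0)" 7.2; then
  echo "QEMU is new enough for AVX emulation"
else
  echo "QEMU is too old for AVX emulation"
fi
```

With the 6.2.0 line above, the second branch is taken.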

@fedejinich

fedejinich commented Feb 15, 2023

any news on this? I'm struggling with the avx2 instruction set

@fumoboy007
Author

any news on this? I'm struggling with the avx2 instruction set

^ @stephen-turner who previously commented on #5148.

@Enzo90910

Negative news: I tested the recent Rosetta 2 support in Docker Desktop but Rosetta 2 does not seem to support AVX either.

@GoingOnSun

Tried on Debian 11: the rocket.chat container exits with code 132.
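
Exit status 132 most likely follows the usual convention of 128 plus a signal number. Signal 4 is SIGILL on Linux, which would match the "Illegal instruction" failures reported elsewhere in this thread. A small sketch for decoding such a status; `signal_for_exit_status` is a hypothetical helper name:

```shell
# Hypothetical helper: print the signal name behind an exit status above 128,
# or "none" if the status does not look like a signal exit.
signal_for_exit_status() {
  local status="$1"
  if [ "$status" -gt 128 ]; then
    # kill -l translates a signal number into its name (without "SIG").
    kill -l "$((status - 128))"
  else
    echo "none"
  fi
}

signal_for_exit_status 132   # prints ILL on Linux
```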

@qiangli

qiangli commented Apr 27, 2023

Until this is fixed and Docker is upgraded to QEMU 7.2+ (the latest is 8.0.0), you could try running QEMU through colima directly. After stopping Docker and starting colima, you can still build and run as usual.
This works for my projects.

brew install qemu
brew install colima
colima start --arch x86_64 --cpu 8 --memory 24 --disk 128 --cpu-type Broadwell-v4

For other CPU models: https://qemu.readthedocs.io/en/latest/system/qemu-cpu-models.html
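
Because colima boots a full virtual machine with the chosen CPU model, the flags line of /proc/cpuinfo inside that VM (for example in a shell opened with `colima ssh`) should reflect the emulated CPU. A minimal sketch for checking it; `has_cpu_flag` is a hypothetical helper, and the sample line is made up so the snippet runs anywhere:

```shell
# Hypothetical helper: succeed if a CPU flag appears on the first "flags"
# line of a cpuinfo-style file (defaults to /proc/cpuinfo).
has_cpu_flag() {
  local flag="$1" file="${2:-/proc/cpuinfo}"
  # Split the line into one token per line, then match the flag exactly
  # so that "avx" does not match "avx2".
  grep -m1 '^flags' "$file" | tr -s ' \t' '\n' | grep -x -- "$flag" >/dev/null
}

# Sample data (an assumption, not real output); inside the VM, call
# `has_cpu_flag avx2` with no second argument to read the real file.
sample=$(mktemp)
printf 'flags\t\t: fpu sse2 avx avx2 fma\n' > "$sample"
if has_cpu_flag avx2 "$sample"; then
  echo "avx2 advertised"
else
  echo "avx2 missing"
fi
rm -f "$sample"
```

This check is meant for the colima VM. Under Docker Desktop's user-mode QEMU, /proc/cpuinfo may not describe the emulated CPU, so a missing flag there is not conclusive.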

I would not hold my breath for Rosetta support; it's not going to happen. https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment

@glynjackson

Any update on this issue? With Apple Silicon now being a staple in a lot of engineering departments, we're facing the same issues here.

@ggilley

ggilley commented Jul 6, 2023

Any update here? Pain ongoing...

@yutotakano

Apple Silicon combined with QEMU above 7.0 causes a regression where the syscall prctl(PR_SET_CHILD_SUBREAPER, 1) will return Invalid Argument, which some applications rely on not to happen (e.g. astro-cli, spark-on-k8s-operator, cinit).

Specifically, whereas QEMU 6.2 and earlier passed the syscall through unmodified, QEMU 7.0 and later disable it, with a comment saying "TODO to implement a safe pass-through for it". https://gitlab.com/qemu-project/qemu/-/commit/220717a6f46a99031a5b1af964bbf4dec1310440

And it's still not implemented to this day, which means nothing above QEMU 6.2 will work for those applications. Until that is fixed, I think a QEMU update will cause unexpected regressions to Docker users.

@fumoboy007
Author

Great find, @yutotakano!

Apple Silicon combined with QEMU above 7.0 causes a regression

Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?

And it's still not implemented to this day

Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!

@yutotakano

yutotakano commented Aug 26, 2023

Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?

Hmm. I'm on Apple Silicon myself, so I decided to keep my assumptions small. Perhaps it affects all devices as long as you use QEMU to emulate Linux. But would Docker use QEMU if it's running an x86 container on Intel x86?

@Shigerello

Shigerello commented Sep 13, 2023

Related but somewhat off-topic: because MongoDB 5.0 and later rely on the AVX instruction set, mongosh (among other tools) crashes in a QEMU-emulated x86 container running on Docker Desktop for Mac with Apple silicon.
This incompatibility hurts the setup of containerized development and testing environments.

Side note:
If the container is rebuilt for the ARM architecture, mongosh works just fine.

https://www.mongodb.com/docs/v7.0/administration/production-notes/#x86_64

MongoDB 5.0 requires use of the AVX instruction set, available on select Intel and AMD processors.

@nielspardon

Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!

I took the liberty of creating an issue in the QEMU tracker since I did not find an existing one: https://gitlab.com/qemu-project/qemu/-/issues/1929

@dgageot
Member

dgageot commented Nov 25, 2023

Hello everyone, we're updating QEMU in the upcoming version of Docker Desktop.
Have you tested a version that would suit your needs?

@fumoboy007
Author

@dgageot Good news! I have not tested but in theory, QEMU 7.2 or above should resolve this issue.

@dgageot
Member

dgageot commented Nov 25, 2023

@dgageot Good news! I have not tested but in theory, QEMU 7.2 or above should resolve this issue.

Thank you @fumoboy007.
Docker Desktop 4.26.0 will probably contain a more recent QEMU, but not 7.2 yet.
I'll still do my best to fit it in, and if that doesn't work, I'll target 4.27.0.

@dgageot
Member

dgageot commented Nov 30, 2023

@fumoboy007 Sorry, that'll have to wait for 4.27.0.

@dgageot
Member

dgageot commented Dec 1, 2023

@fumoboy007 do you have an example of a docker command that fails with the latest version of Docker Desktop?

@fumoboy007
Author

@dgageot One Docker image that is affected by this issue is tensorflow/serving. tensorflow/serving#1948 (comment) has reproduction steps.

@juanmirocks

@dgageot it would be awesome seeing it coming for 4.27.0 or early 2024 :-) I'm also blocked by tensorflow/serving#1948

Do you have any updates?

@dgageot
Member

dgageot commented Jan 4, 2024

@dgageot it would be awesome seeing it coming for 4.27.0 or early 2024 :-) I'm also blocked by tensorflow/serving#1948

Do you have any updates?

We currently have QEMU 8.0.4 on our main branch, with a patch for the prctl(PR_SET_CHILD_SUBREAPER, 1) issue. So the good news is that we are no longer stuck with a very old version of QEMU, and 4.27.0 will contain at least this version.

Also, this morning, I've started testing 8.1.4 with the plan to soon test 8.2.0.

I mainly focused on using the most recent versions of QEMU. I haven't tested AVX support yet. Do you have a simple docker run command I can try to validate that it does what you want?

@dgageot
Member

dgageot commented Jan 4, 2024

Here are the commands that fail with Docker Desktop 4.26.1 but succeed on our main branch, with QEMU 8.0.4:

cd /tmp
git clone https://github.com/tensorflow/serving
docker run -t --rm -it --init -p 8501:8501 --platform linux/amd64 -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 -v "./serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving:2.14.1

However, from the output of the program, I'm not sure it actually uses AVX instructions:

2024-01-04 12:11:33.524895: I tensorflow_serving/model_servers/server.cc:74] Building single TensorFlow model file config:  model_name: half_plus_two model_base_path: /models/half_plus_two
2024-01-04 12:11:33.542335: I tensorflow_serving/model_servers/server_core.cc:467] Adding/updating models.
2024-01-04 12:11:33.544671: I tensorflow_serving/model_servers/server_core.cc:596]  (Re-)adding model: half_plus_two
2024-01-04 12:11:33.926545: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2024-01-04 12:11:33.926776: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.927587: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.929071: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /models/half_plus_two/00000123
2024-01-04 12:11:33.936173: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-01-04 12:11:33.936641: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2024-01-04 12:11:33.939478: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 12:11:34.040351: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2024-01-04 12:11:34.056903: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-01-04 12:11:34.067407: W external/org_tensorflow/tensorflow/tsl/platform/profile_utils/cpu_utils.cc:118] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2024-01-04 12:11:34.279405: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2024-01-04 12:11:34.299615: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 370421 microseconds.
2024-01-04 12:11:34.301360: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:80] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2024-01-04 12:11:34.547284: I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: half_plus_two version: 123}
2024-01-04 12:11:34.554644: I tensorflow_serving/model_servers/server_core.cc:488] Finished adding/updating models
2024-01-04 12:11:34.556354: I tensorflow_serving/model_servers/server.cc:118] Using InsecureServerCredentials
2024-01-04 12:11:34.556922: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2024-01-04 12:11:34.573646: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
2024-01-04 12:11:34.585490: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...

Tip:

-e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 is a useful (experimental) trick to force the use of QEMU for a single docker run command, even when Docker Desktop is configured to use Rosetta for faster overall emulation.

@juanmirocks

@dgageot that's so great to hear!

Exactly. Running your command on Docker Desktop v4.26.1 for Mac on Apple silicon, without -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1, ends abruptly with the error:

/usr/bin/tf_serving_entrypoint.sh: line 3:    12 Illegal instruction     tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Similarly, running the command with -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 ends with the error:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:560] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
qemu: uncaught target signal 6 (Aborted) - core dumped
/usr/bin/tf_serving_entrypoint.sh: line 3:    12 Aborted                 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

@matemijolovic

Hello, I've tried the recently released Docker for Mac 4.27.0 and AVX seems to work now on ARM64 🎊

@dgageot
Member

dgageot commented Jan 26, 2024

Woot! Thanks @matemijolovic, that's really good news! Is it fully working? How's the perf?

@matemijolovic

matemijolovic commented Jan 26, 2024

Didn't have time to benchmark the performance, but it seems okay at first glance (I'd say roughly 2x slower than running on a comparable Linux x64 machine). For our use case this is perfectly acceptable, as we don't run any production inference on ARM. [EDIT: to clarify, regarding performance, I'm not sure that AVX is actually being used to its full potential, but for us the important thing is that the containers don't crash]
It would probably also help to compile linux/arm64 TF Serving images; currently there are only linux/amd64 ones, so it's unfair to do benchmarks :)

The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but can't say for sure if it's related to the particular upgrade. [EDIT: as dgageot suggested, the docker run --init flag helps with this]

@dgageot
Member

dgageot commented Jan 26, 2024

Didn't have time to benchmark the performance, but it seems okay at first glance (I'd say roughly 2x slower than running on a comparable Linux x64 machine). For our use case this is perfectly acceptable, as we don't run any production inference on ARM.

Good to hear!

The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but can't say for sure if it's related to the particular upgrade.

Have you tried running the container with docker run --init?

@dgageot
Member

dgageot commented Jan 26, 2024

(I'm closing this issue. Feel free to ping me if you think it needs to be re-opened.)

@dgageot closed this as completed Jan 26, 2024

@matemijolovic

Have you tried running the container with docker run --init?

Can confirm this helps, thank you!

@dgageot
Member

dgageot commented Feb 15, 2024

Hi everyone! There's a good chance that we'll roll back the QEMU upgrade in Docker Desktop 4.28.0. It causes too many regressions for the majority of users. A temporary solution is to stick with 4.27.X.

@juanmirocks

That's unfortunate but thank you so much @dgageot for the heads up!

@bonzini

bonzini commented Feb 15, 2024

Were the regressions reported on GitLab? Also, what patch release are you on?
