docker buildx run yum install very slow, while docker build is fine #379

Open
light4 opened this issue Sep 3, 2020 · 17 comments

light4 commented Sep 3, 2020

docker buildx runs yum install very slowly, while docker build is fine.

When building with the commands below, docker build takes about 3 minutes to finish, while docker buildx build takes about 3 hours.

sudo docker build -f centos7.Dockerfile .
sudo docker buildx build -f centos7.Dockerfile --target=artifact --output type=local,dest=$(pwd)/rpms/ .

I think docker buildx should use the docker build cache, but it does not. This is unexpected.

centos7.Dockerfile

FROM centos:centos7 as build
LABEL maintainer="opsdev@qunar.com"

RUN yum install -y yum-utils rpm-build redhat-rpm-config make gcc git vi tar unzip rpmlint wget curl \
    && yum clean all

# Install golang
RUN PKG_VERSION="1.15.1" PKG_NAME="go$PKG_VERSION.linux-amd64.tar.gz" \
    && wget https://dl.google.com/go/$PKG_NAME \
    && tar -zxvf $PKG_NAME -C /usr/local \
    && rm -rf $PKG_NAME

ENV GOROOT=/usr/local/go
ENV GOPATH=/home/q/go
ENV PATH=$PATH:/usr/local/go/bin:/home/q/go/bin
ENV GOPROXY=https://goproxy.io

# RUN useradd q -u 5002 -g users -p q
# USER q
ENV HOME /home/q
WORKDIR /home/q
RUN mkdir -p /home/q/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
RUN echo '%_topdir %{getenv:HOME}/rpmbuild' > /home/q/.rpmmacros

COPY spec/q-agentd.spec q-agentd.spec
COPY scripts/q-agentd.service /home/q/rpmbuild/SOURCES/q-agentd.service

RUN yum-builddep -y q-agentd.spec \
    && rpmbuild -bb q-agentd.spec

FROM scratch as artifact
COPY --from=build /home/q/rpmbuild/RPMS/x86_64/*.rpm /

FROM build as release

dubrsl commented Apr 16, 2021

The same problem

darkdragon-001 commented Apr 16, 2021

I also have the same problem. I have been using BuildKit by default for some time via the DOCKER_BUILDKIT=1 environment variable, and CentOS 7 is the only distro where I have run into this. I found that yum uses a full CPU core/thread the whole time (12.5% on my Intel CPU with 4 cores / 8 threads).

Docker 20.10.6 on Fedora 33 on NVMe SSD.

sxiii commented Jun 2, 2021

I have the same problem. The process keeps going, but mostly on a single core, and takes veeeeeeery long. Any leads on that?
Docker 20.10.5 (API version: 1.41, Go version: go1.13.15) on Manjaro Linux 21.0.5 kernel 5.11.19-1-MANJARO.

silh commented Aug 24, 2021

Similar problem with go get.

x2c3z4 commented Nov 15, 2021

Same issue, any update?

clime commented Dec 1, 2021

The same problem here.

Diluka commented Dec 13, 2021

Not only yum install, but also npm install or anything else that does a large amount of file I/O.

For example, buildx takes 10 minutes to build 2 platforms, while docker build takes 2 minutes for each.

clime commented Dec 13, 2021

Yes, I would like to use BuildKit, but this is a blocker for me.

slimm609 commented Feb 27, 2022

I recently ran into this issue as well.

Check the systemd service for docker.

It was set to infinity, which sets the ulimit to 1073741816, and for some reason this seems to be causing the problem.

LimitNOFILE=infinity

When I changed the nofile limit to 1024000, the problem was resolved and buildx works like a normal docker build:

LimitNOFILE=1024000

You can check the ulimit with a build:

FROM alpine:latest
RUN echo "ulimit is $(ulimit -n)"
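The same check can be run from any POSIX shell, either on the host or inside a RUN step. This is a minimal sketch: the function name check_nofile and the default threshold of 1048576 are my own assumptions, not anything provided by Docker or buildx. It prints the soft and hard open-file limits and returns non-zero when the soft limit is above the threshold (or unlimited).

```shell
#!/bin/sh
# check_nofile is a hypothetical helper (not part of Docker/buildx).
# Prints the current soft/hard open-file limits and returns 1 when the
# soft limit is above a threshold, e.g. the 1073741816 seen with
# LimitNOFILE=infinity.
check_nofile() {
    threshold=${1:-1048576}   # assumed cut-off; pass your own as $1
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    echo "soft=$soft hard=$hard"
    # Treat "unlimited" the same as a too-large numeric value.
    if [ "$soft" = "unlimited" ] || [ "$soft" -gt "$threshold" ]; then
        echo "warning: soft nofile limit $soft is above $threshold" >&2
        return 1
    fi
    return 0
}
```

Calling check_nofile 1024000 in a build that hits this issue should print the 1073741816 soft limit and the warning.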

mihalicyn commented Jul 27, 2022

Thanks @slimm609!

I've worked around it by adding ulimit -n 1024000 just before yum install ... in the Dockerfile. At least in my case, that is more convenient than changing the docker daemon's service configuration.

Example:

RUN ulimit -n 1024000 && yum -y install flex bison make gcc findutils openssl-devel bc diffutils elfutils-devel perl vim openssl dwarves

Hope this helps someone.
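A variant of the same workaround that only ever lowers the limit, so it is harmless on hosts where the limit is already sane, could look like the sketch below. with_nofile_cap is a hypothetical name, and using -S (soft limit only) is my choice; plain ulimit -n in bash sets both the soft and the hard limit.

```shell
#!/bin/sh
# with_nofile_cap is a hypothetical helper: run a command with the soft
# open-file limit capped at $1. The limit is only ever lowered, and the
# function body is a subshell ( ... ), so the caller's limit is untouched.
with_nofile_cap() (
    cap=$1
    shift
    current=$(ulimit -Sn)
    if [ "$current" = "unlimited" ] || [ "$current" -gt "$cap" ]; then
        ulimit -Sn "$cap" || exit 1
    fi
    exec "$@"
)
```

Usage would be, for example, with_nofile_cap 1024000 yum -y install make gcc. Inside a Dockerfile the helper would first have to be copied or defined in the same RUN, so the inline ulimit -n form above is simpler there; the wrapper is more useful in CI scripts.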

bors bot added a commit to cross-rs/cross that referenced this issue Oct 8, 2022
1065: Workaround for slow CentOS builds depending on the ulimit. r=Emilgardis a=Alexhuszagh

Fixes CentOS builds being ~10x slower with certain configurations, where `LimitNOFILE=infinity`. This sets the ulimit for open file descriptors manually, which fixes the issue. See the upstream issue docker/buildx#379. 

Co-authored-by: Alex Huszagh <ahuszagh@gmail.com>

@breezewish

It is also possible to just set ulimit via the command line, for example:

docker build --ulimit nofile=1024000:1024000 .

This way the Dockerfile doesn't need to be changed.

frison added a commit to frison/_slash that referenced this issue Nov 23, 2022
Fingers crossed for: docker/buildx#379
The docs are really fuzzy on whether the ulimit option works with BuildKit, and
buildx uses BuildKit.

@polarathene

It is also possible to just set ulimit via the command line

This does not work with docker buildx bake unfortunately.

Presently you need to pass --ulimit on the CLI for docker run or docker build if you want to ensure the soft limit is what it should be (1024 most of the time). docker buildx bake presumably needs a different workaround, such as the ulimit call inside a Dockerfile RUN mentioned earlier 😕

Once the Go 1.19 regression is fixed (and backported), setting LimitNOFILE in the .service files for dockerd / containerd becomes the better approach, although ideally those files would be updated to use a sane soft + hard limit such as LimitNOFILE=1024:524288 (the default since systemd v240, and plenty for these daemons). Until that regression is fixed, you can only lower the hard limit via LimitNOFILE; the soft limit is ignored.

@hummeltech

@polarathene, would it be possible for you to provide a link to the issue (I.E. Go 1.19 regression) you mentioned so that the status can easily be tracked? I was looking around but was unable to find it. Thanks!

@polarathene

would it be possible for you to provide a link to the issue (I.E. Go 1.19 regression) you mentioned so that the status can easily be tracked?

Here you go (it links to the relevant tracking issues), but it has been resolved since the Go 1.19.9 and Go 1.20.4 releases. Those Go versions are used by the Docker Engine 23.0.6 and 24.0.0 releases, and by the Containerd 1.6.21 and 1.7.1 releases.

Neither the Docker Engine (moby) nor the Containerd project has accepted PRs for setting LimitNOFILE=1024:524288 instead of LimitNOFILE=infinity yet. You'd still need to change that manually (a drop-in override avoids losing the change between updates).

I've not confirmed whether that resolves the docker buildx bake issue; I assume it does, since moby bundles buildx now?
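For the drop-in override mentioned above, a minimal sketch that prints the override text so it can be reviewed before installing it. The helper name nofile_dropin is hypothetical and 1024:524288 is just the value suggested in this thread; the [Service] section and the LimitNOFILE=soft:hard syntax are standard systemd unit settings.

```shell
#!/bin/sh
# nofile_dropin is a hypothetical helper: print a systemd drop-in that
# overrides the packaged LimitNOFILE=infinity with an explicit soft:hard
# pair (defaults: the 1024:524288 suggested in this thread).
nofile_dropin() {
    soft=${1:-1024}
    hard=${2:-524288}
    # The soft limit must not exceed the hard limit.
    if [ "$soft" -gt "$hard" ]; then
        echo "error: soft limit $soft exceeds hard limit $hard" >&2
        return 1
    fi
    cat <<EOF
[Service]
LimitNOFILE=$soft:$hard
EOF
}
```

The output could then be piped to, for example, sudo tee /etc/systemd/system/docker.service.d/nofile.conf (the directory may need creating first, and the file name is arbitrary), with the same done for containerd.service.d, followed by sudo systemctl daemon-reload and a restart of both services. Per the comments above, the soft limit part only takes effect with the Docker Engine and Containerd releases built with the fixed Go versions.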

LeeXN commented Jul 20, 2023

This problem may be caused by an rpm bug (https://bugzilla.redhat.com/show_bug.cgi?id=1537564). I found that in buildx, the containerd ulimit is very large (1073741816).
Running docker build with --ulimit can fix this, or you can upgrade your rpm package (e.g. to 4.18).
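To see why the size of the limit matters so much, assume (my reading of the linked Red Hat bug, not something verified here) that the affected rpm versions walk every possible descriptor number from 3 up to the soft limit L before each scriptlet they run. The work per scriptlet is then proportional to L - 3:

```latex
\begin{aligned}
N(L) &= L - 3 \\
N(1073741816) &= 1073741813 \approx 1.07 \times 10^{9} \\
N(1024000) &= 1023997 \approx 1.02 \times 10^{6} \\
N(1024) &= 1021 \\
\frac{N(1073741816)}{N(1024000)} &\approx 1049
\end{aligned}
```

Under that assumption, a build container inheriting LimitNOFILE=infinity does roughly a thousand times more of this scanning per scriptlet than one capped at 1024000, and about a million times more than one at 1024, which would be consistent with the single-core, CPU-bound yum runs reported above.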

@polarathene

@LeeXN I briefly explained this in my last comment above.

The problem is that Docker and Containerd both package a systemd service with LimitNOFILE=infinity (which sets the ulimit), and since v240, systemd on most distros (not Debian) resolves infinity to the much larger value you found. There are PRs for both; they just need approval from the maintainers and a new release. I've pinged them again today on Slack to try to move that along.

Last I checked, the docker buildx bake command doesn't support --ulimit, so for now you must modify LimitNOFILE in each service config.
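To confirm what dockerd and containerd are actually running with, before and after changing LimitNOFILE, Linux exposes each process's limits in /proc/<pid>/limits. A minimal sketch; nofile_of_pid is a hypothetical name, and it assumes the usual layout of the "Max open files" row (soft limit in the 4th field, hard limit in the 5th).

```shell
#!/bin/sh
# nofile_of_pid is a hypothetical helper (Linux only): print the soft and
# hard "Max open files" limits of a running process, read from
# /proc/<pid>/limits.
nofile_of_pid() {
    pid=$1
    if [ ! -r "/proc/$pid/limits" ]; then
        echo "error: cannot read /proc/$pid/limits" >&2
        return 1
    fi
    # Row looks like: "Max open files   1024   524288   files"
    awk '/^Max open files/ { print $4, $5 }' "/proc/$pid/limits"
}
```

For example, nofile_of_pid "$(pidof -s dockerd)" and nofile_of_pid "$(pidof -s containerd)"; with the packaged LimitNOFILE=infinity on a systemd v240+ host, both values would be expected to show the 1073741816 reported above.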

milianw added a commit to KDAB/hotspot that referenced this issue Oct 7, 2023
Otherwise at least `yum` can take ages to complete, see e.g.:
docker/buildx#379
milianw added a commit to KDAB/hotspot that referenced this issue Oct 12, 2023
Otherwise at least `yum` can take ages to complete, see e.g.:
docker/buildx#379
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Sep 25, 2024
Migrate these builds to linux 2023. We want to build and test the Docker images in CD.

Looks like we are hitting this issue: docker/buildx#379 when trying to build Docker on Amazon Linux 2023.

The Conda Docker build is timing out, while Manywheel runs but fails because BuildKit is turned off: https://github.com/pytorch/pytorch/actions/runs/11036043157/job/30653543264?pr=136544

The proposed solution is to fix it in user_data. Please see: pytorch/test-infra#5712

I see docker builds executing successfully here: https://github.com/pytorch/pytorch/actions/runs/11040149229/job/30667448668?pr=136544

Work around the timeout problem (reported in https://bugzilla.redhat.com/show_bug.cgi?id=1537564) by configuring the number of open files per container to 1048576.
Pull Request resolved: #136544
Approved by: https://github.com/ZainRizvi

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>