alpine:3.13 cannot be used by gitlab-runner (alpine:3.12 can) #134

Closed
dHannasch opened this issue Jan 15, 2021 · 22 comments

@dHannasch

dHannasch commented Jan 15, 2021

Just to keep you in the loop, some gitlab-runners are unable to load alpine:edge but can load alpine:latest (and any previous alpine version you care to name) just fine.
(The same gitlab-runners were able to run alpine:edge previously, as well, though unfortunately I don't know exactly how long ago. Definitely after the release of 3.12, and it was using https for apk which I believe was a new thing for 3.13, but it looks like it might have grabbed alpine:edge six months ago, so not that long after the release of 3.12.)

Trying to run a pair of jobs:

alpineedge:
  stage: test
  tags:
  - docker
  image: alpine:edge
  script:
  - echo "Success!"

alpinelatest:
  stage: test
  tags:
  - docker
  image: alpine:latest
  script:
  - echo "Success!"

Loading alpine:latest succeeds as expected, but loading alpine:edge fails with

Pulling docker image alpine:edge ...
Using docker image sha256:430cc6504dbd5a0acf9058733dc015452aa0af1b826c3c408c539f4f302591b7 for alpine:edge ...
Executing "step_script" stage of the job script
# sh: write error: Invalid argument

Curiously, some gitlab-runners appear to be able to load alpine:edge even now...exact details still to be determined.

As you can see, the error message isn't very helpful, so this is still an ongoing investigation, but it certainly appears that some change in alpine:edge is making life difficult for Docker-based CI runners. Mentioning it here just in case someone immediately has a brainwave like "ah, right, we just fiddled with the default shell" or something.

@NFarrington

Following the release of Alpine Linux 3.13.0, alpine:latest (which is now 3.13.0) is also causing the same error in GitLab for us.

@TBK

TBK commented Jan 15, 2021

For cross-reference - https://gitlab.alpinelinux.org/alpine/aports/-/issues/12311

@dHannasch
Author

> Following the release of Alpine Linux 3.13.0, alpine:latest (which is now 3.13.0) is also causing the same error in GitLab for us.

You're right. New jobs to keep track of this are

alpine312:
  stage: test
  tags:
  - docker
  image: alpine:3.12
  script:
  - echo "Success!"

alpine313:
  stage: test
  tags:
  - docker
  image: alpine:3.13
  script:
  - echo "Success!"

alpine:3.13 causes the crash:

Pulling docker image alpine:3.13 ...
Using docker image sha256:7731472c3f2a25edbb9c085c78f42ec71259f2b83485aa60648276d408865839 for alpine:3.13 ...
Executing "step_script" stage of the job script
sh: write error: Invalid argument
Running after_script
sh: write error: Invalid argument

alpine:3.12 succeeds as normal.

@dHannasch changed the title from "alpine:edge cannot be used by gitlab-runner (alpine:latest can)" to "alpine:3.13 cannot be used by gitlab-runner (alpine:3.12 can)" on Jan 16, 2021
@TBK

TBK commented Jan 18, 2021

Please see https://gitlab.alpinelinux.org/alpine/aports/-/issues/12311#note_136903

@vvchik

vvchik commented Jan 18, 2021

TL;DR: we need to update Docker.

summary from the link above:
https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements

musl 1.2 uses new time64-compatible system calls. Due to runc issue 2151, these system calls incorrectly return EPERM instead of ENOSYS when invoked under a Docker or libseccomp version predating their release. Therefore, Alpine Linux 3.13.0 requires the host Docker to be version 19.03.9 (which contains backported moby commit 89fabf0) or greater and the host libseccomp to be version 2.4.2 (which contains backported libseccomp commit bf747eb) or greater, compiled against Linux UAPI headers 5.4 (which contain time64 syscall definitions) or greater. Docker for Windows issue 8326 tracks the process of updating libseccomp in Docker for Windows.

@Hello71

Hello71 commented Jan 20, 2021

hi, I wrote that. this issue is almost certainly not related to time64. updating docker may help but we don't know yet.

@Hello71

Hello71 commented Jan 20, 2021

it might be helpful if those experiencing this issue (specifically, sh: write error: Invalid argument and not anything with Operation not permitted, which is most likely time64 related) could post their docker version and docker info.

@mkody

mkody commented Jan 20, 2021

I have sh: write error: Invalid argument.
I did try again after checking that I'm running the latest docker version from the official Docker repository for Ubuntu.

docker version
Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:29 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:23 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 3
  Running: 1
  Paused: 0
  Stopped: 2
 Images: 3
 Server Version: 20.10.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-109-generic
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 992.3MiB
 Name: gitlabci-do1
 ID: XGMY:ASV2:QFIG:JJGO:NYA3:S6V3:TZ7Z:VGHE:Q6AV:QLIO:TFTH:VLHE
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

@Hello71

Hello71 commented Jan 20, 2021

I'm still not sure what causes this issue, but I think it's most likely related to either an old kernel, old libseccomp, or both.

@vvchik

vvchik commented Jan 20, 2021

Libseccomp 2.4.2 or higher and kernel 5.4 or higher

@Hello71

Hello71 commented Jan 20, 2021

> Libseccomp 2.4.2 or higher and kernel 5.4 or higher

that's a misreading of what i wrote in the alpine 3.13 release notes. it's kernel uapi headers 5.4 or higher. but as I said in #134 (comment), this issue is almost certainly not time64 related.

@vvchik

vvchik commented Jan 20, 2021

I’ve updated the kernel, Docker and libseccomp on some pretty old Ubuntu 16 runners and this issue is gone. So maybe it is something else, but an update helps in my particular case. Also, another box with Ubuntu 20 and the latest Docker works out of the box.

@Hello71

Hello71 commented Jan 21, 2021

with @Ikke's help, I have diagnosed this issue. it is caused by busybox improperly calling free before checking errno. this issue should be fixed by patches in busybox and musl. in the meantime, it can be worked around by upgrading to at least Linux 4.5.

algitbot pushed a commit to alpinelinux/aports that referenced this issue Jan 21, 2021
@ncopa
Contributor

ncopa commented Jan 21, 2021

can you test if alpine:edge is fixed? if it is, I'll backport it to 3.13-stable

@Ikke

Ikke commented Jan 21, 2021

Details here: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12311#note_137667

It also includes instructions on how to upgrade busybox in GitLab before the actual script is run.

algitbot pushed a commit to alpinelinux/aports that referenced this issue Jan 22, 2021
@Ikke

Ikke commented Jan 28, 2021

This should be fixed now, so I guess it can be closed?

@dHannasch
Author

> This should be fixed now, so I guess it can be closed?

Thank you very much for the busybox fix!

Maybe this is a stupid question, but if apk add -u busybox fixes everything, wouldn't it be a good idea to include the upgraded version in alpine:edge? Is there a reason not to do that that I'm missing?
As far as I know, the official place to fetch alpine:edge from is Docker Hub, right?
https://hub.docker.com/_/alpine?tab=tags&page=1&ordering=last_updated
alpine:edge hasn't been rebuilt since your busybox fix, so fully resolving the issue might be as simple as clicking a button to rebuild it. (On the other hand, it might be more complicated than that. I struggle to follow what the Alpine Docker image build process is doing. But I think it would automatically pick up the latest version of busybox, right?)

It seems like the busybox package itself is fixed, but there is a last little bit needed to fix the alpine:edge image itself (rebuilding the image with the upgraded busybox). Though I'm not 100% sure whether this is the appropriate place for an issue with the Docker-image-itself-as-built versus an issue with the underlying package(s). But it seems like there ought to be an open issue somewhere until the upgraded busybox is included in the official alpine:edge image. (Either that or your entrypoint workaround is documented as the way to use alpine:edge, but it seems silly to enshrine that when upgrading busybox in the official image seems easier.)

@dHannasch
Author

dHannasch commented Jan 28, 2021

> can you test if alpine:edge is fixed?

I'm speculating wildly here, but based on this remark I'm guessing you intended to rebuild alpine:edge and it didn't go through?
https://hub.docker.com/_/alpine?tab=tags&page=1&ordering=last_updated
It looks like alpine:edge hasn't been rebuilt since the busybox package was fixed, so naturally if you just try to use it like

test:
  tags:
  - docker
  image: 
    name: alpine:edge
  script:
  - echo Success

Nothing has changed. (Or is there a place other than Docker Hub to pull alpine:edge?)

@Ikke

Ikke commented Jan 28, 2021

It's correct that neither alpine:edge nor alpine:3.13 contains the fixed busybox yet, so it should be upgraded manually for now.

@ncopa will tag new releases (or snapshots in case of edge) soon, so then the images will be rebuilt to include the fixed busybox.

See the linked GitLab issue for instructions on how to make sure busybox is upgraded before the job script is run, as a temporary workaround.

@ncopa
Contributor

ncopa commented Feb 2, 2021

This should be fixed with busybox-1.32.1-r1 alpinelinux/aports@c12121a and is included in the alpine:3.13.1 docker image. docker-library/official-images#9523

@ncopa ncopa closed this as completed Feb 2, 2021
@dHannasch
Author

FYI for anyone still running into this, alpine:3.13 and alpine:latest now work, though you'll still run into this for alpine:edge for the moment.

# fails
alpineedge:
  stage: test
  tags:
  - docker
  image: alpine:edge
  script:
  - echo "Success!"

# passes
alpinelatest:
  stage: test
  tags:
  - docker
  image: alpine:latest
  script:
  - echo "Success!"

# passes
alpine313:
  stage: test
  tags:
  - docker
  image: alpine:3.13
  script:
  - echo "Success!"

Presumably it'll be fixed for alpine:edge too the next time alpine:edge gets rebuilt, but for now you can either use alpine:latest or use the apk trick to upgrade busybox as you download the image.

@dHannasch
Author

dHannasch commented Feb 10, 2021

For the record, with the rebuild of alpine:edge, this is now fixed on alpine:edge as well.

jollaitbot pushed a commit to sailfishos-mirror/busybox that referenced this issue Feb 23, 2021
musl libc's mallocng free() may modify errno if kernel does not support
MADV_FREE which causes echo to echo with error when it shouldn't.

Future versions of POSIX[1] will require that free() leaves errno
unmodified but til then, do not rely free() implementation.

Should fix downstream issues:
alpinelinux/docker-alpine#134
https://gitlab.alpinelinux.org/alpine/aports/-/issues/12311

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>