Failing to unpack GHC (sometimes) #4888

Closed
magthe opened this issue Jun 19, 2019 · 13 comments

Comments

@magthe

magthe commented Jun 19, 2019

General info

For the last couple of days I've been running into an issue where untarring GHC fails:

Preparing to download ghc-8.6.5 ...
ghc-8.6.5: download has begun
ghc-8.6.5:   17.49 MiB / 175.83 MiB (  9.95%) downloaded...
ghc-8.6.5:   47.24 MiB / 175.83 MiB ( 26.87%) downloaded...
ghc-8.6.5:   76.67 MiB / 175.83 MiB ( 43.61%) downloaded...
ghc-8.6.5:  107.02 MiB / 175.83 MiB ( 60.86%) downloaded...
ghc-8.6.5:  136.92 MiB / 175.83 MiB ( 77.87%) downloaded...
ghc-8.6.5:  167.14 MiB / 175.83 MiB ( 95.06%) downloaded...
ghc-8.6.5:  175.83 MiB / 175.83 MiB (100.00%) downloaded...
Downloaded ghc-8.6.5.
Unpacking GHC into /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/ ...
Received ExitFailure (-15) when running
Raw command: /bin/tar Jxf /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
Run from: /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/


Error: Error encountered while unpacking GHC with
         tar Jxf /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
         run in /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/

       The following directories may now contain files, but won't be used by stack:
         - /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/
         - /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5/

       For more information consider rerunning with --verbose flag

Steps to reproduce

I don't have exact steps, but the code and CI builds are all open and available.

The code is available at: https://github.com/magthe/ci-test-hs/ (the branch Add Azure Pipelines)

Examples of CI builds at:

Building the image locally (docker build -t foo:0 .) failed at first; after I followed the suggestion and added --verbose, it succeeded. However, the CI builds keep failing sporadically.

Expected

I'm used to stack setup working like a charm.

Actual

Well, see above.

Stack version

The version I'm using on VMs is the pre-built 2.1.1 downloaded from GitHub, e.g. https://github.com/magthe/ci-test-hs/blob/153ca80eaca23eae6444abdbf32e0e3b91240d76/.travis.yml#L15

The version used in containers, including when building images, is the one found in fpco/stack-build:lts-13 (I believe that has been fpco/stack-build:lts-13.25 and thus stack 2.1.1).

Method of installation

See above.

@dmp1ce

dmp1ce commented Jun 20, 2019

I'm also seeing this issue with Snapcraft which uses Multipass to build snap packages. It happens every time on a fresh build. https://forum.snapcraft.io/t/haskell-stack-snaps-help/11909

@snoyberg
Contributor

I'm really confused by this one, and would love to hear some thoughts from others. I can't see any reason why a SIGTERM would be sent to Stack, what process would be sending it, or what changes in the Stack.Setup codepath could generate this difference.

I am able to reproduce.
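
For readers decoding the log: on POSIX systems, Haskell's process library reports a child that was killed by signal N as ExitFailure (-N), so ExitFailure (-15) means the tar process was terminated by signal 15, SIGTERM. A shell typically reports the same event as 128 + N. The following is only a minimal sketch of that convention, using sleep as a stand-in for tar; it does not reproduce the bug itself.

```shell
#!/bin/sh
# Start a long-running child (a stand-in for tar) in the background.
sleep 30 &
child=$!

# Terminate it the same way the tar process in the log was terminated.
kill -TERM "$child"

# Collect the child's status; written so it also works under `set -e`.
status=0
wait "$child" || status=$?

# bash and dash encode "killed by signal N" as 128 + N.
signal=$((status - 128))
echo "wait status $status corresponds to signal $signal"
# In bash and dash this prints: wait status 143 corresponds to signal 15
```

The 143 seen by a shell and the -15 seen by Stack describe the same thing: the child did not exit on its own but was killed by SIGTERM.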

snoyberg added this to the P0: Blocking release milestone on Jun 23, 2019
@snoyberg
Contributor

@jamesdbrock provided a Dockerfile for reproducing this in #4889, but it's not a reliable repro. I'm not familiar with Snapcraft, @dmp1ce. Do you think you could put together a reliable Docker-based repro for easier testing?

@dmp1ce

dmp1ce commented Jun 23, 2019

It isn't easy to get snapd installed in a Docker container. snapd is needed to install snapcraft, which in turn uses multipass to create a virtual machine for building snap images. Probably the easiest thing to do is to run snapcraft on a spare computer running Ubuntu. Multipass might work in LXD, but I'm not sure, because multipass requires a KVM device.

On Ubuntu the steps would be:

  • Ensure snapd is running (sudo apt install snapd)
  • Install snapcraft (sudo snap install snapcraft --classic)
  • Get basic snap of a stack project (git clone https://github.com/dmp1ce/snapcraft-stack-example.git)
  • Try to build project with snapcraft (snapcraft)
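
The steps above can be sketched as a single script. This is only a sketch: the function name build_stack_snap is hypothetical, and the cd into the cloned directory is an assumption about where snapcraft is meant to be run. The commands themselves are the ones from the list.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps listed above. The body runs in a
# subshell, so `set -e` and `cd` do not leak into the calling shell.
build_stack_snap() (
    # Stop at the first failing step.
    set -e
    # Ensure snapd is installed.
    sudo apt install snapd
    # Install snapcraft.
    sudo snap install snapcraft --classic
    # Get the basic snap of a stack project.
    git clone https://github.com/dmp1ce/snapcraft-stack-example.git
    # Assumption: snapcraft is run from the root of the cloned repository.
    cd snapcraft-stack-example
    # Try to build the project; this is the step where the failure shows up.
    snapcraft
)
```

Nothing runs until the function is called, for example with `build_stack_snap` on a spare Ubuntu machine.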

@jamesdbrock

I tried to look into stack setup with strace and I got this glimpse.

strace -e %signal stack setup --verbose
2019-06-24 00:20:51.863074: [debug] Unpacking /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
2019-06-24 00:20:51.864394: [debug] Run process within /root/.stack/programs/x86_64-linux/ghc-8.6.5.temp/: /bin/tar Jxf /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
tgkill(14, 16, SIGPIPE)                 = 0
kill(21, SIGTERM)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=21, si_uid=0, si_status=SIGTERM, si_utime=21, si_stime=179} ---
2019-06-24 00:21:05.061188: [error] Received ExitFailure (-15) when running
Raw command: /bin/tar Jxf /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
Run from: /root/.stack/programs/x86_64-linux/ghc-8.6.5.temp/

I'd like to capture a log of

strace -f -e %signal,%process stack setup

but I cannot reproduce the error again.
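
Because the failure is sporadic, it may help to write the trace to a file so that it survives whenever the error does show up. A minimal sketch, assuming strace is installed; the helper name trace_signals and the log file name are hypothetical, while -f (follow child processes), -o (write the trace to a file) and the %signal / %process syscall classes are standard strace options.

```shell
#!/bin/sh
# Hypothetical helper: trace signal- and process-related syscalls of a
# command and all of its children, writing the trace to a log file.
trace_signals() {
    log_file=$1
    shift
    # -f follows child processes (such as the tar that Stack spawns);
    # -o sends the trace to a file instead of mixing it with stderr.
    strace -f -e trace=%signal,%process -o "$log_file" "$@"
}

# Example invocation (not run here):
#   trace_signals stack-setup.strace stack setup --verbose
#   grep -n 'SIGTERM' stack-setup.strace
```

With -f, each line in the log is prefixed by the PID that made the call, which makes it possible to see which process issued the kill and which one received the signal.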

@snoyberg
Contributor

That strace output is just what we needed! @psibi and @nh2 provided the missing piece of insight: the SIGTERM is coming from Stack itself. This reminded me of a test suite bug I fixed recently:

snoyberg/conduit@20fd6e2

Which ultimately led to this PR: #4902

I'd appreciate it if those affected by this bug could test this out and confirm that it fixes the problem for them.

@magthe
Author

magthe commented Jun 30, 2019

I'm not that familiar with the whole stack/stackage setup, so I'll have to ask. Are there builds with this change included available somewhere, e.g. using some specific tag on DockerHub or in an artefact store on the CI system you use?

@snoyberg
Contributor

We have nothing automated (though I wish we did). I generated a Linux executable and uploaded it to S3, and started using it for typed-process. If you'd like to use it too, it's available at

https://s3.amazonaws.com/www.snoyman.com/stack-1ed71cae36a64365ead72da1427e1685ccec8246.bz2

Relevant commit: fpco/typed-process@af31b7b#diff-354f30a63fb0907d4ad57269548329e3

@magthe
Author

magthe commented Jun 30, 2019

Yes, that'll make it a little easier to try it out on CI services, since that's where I've observed the issue most frequently.

@SkyWriter
Member

Running stack build with LTS 12.26 in Docker fpco/stack-build-small (3523caf4fba2) always fails with the mysterious -15 error message, even though execution of the corresponding tar command succeeds when done manually.

The Stack build provided by @snoyberg heals my woes, and my app builds without issues.

@magthe
Author

magthe commented Jul 1, 2019

I'm completely failing to reproduce without the fix for the last few days... don't ask me what's different on the various CI services I'm experimenting with.

I can't even reproduce using @SkyWriter's recipe above 😕

@neongreen
Collaborator

> I generated a Linux executable and uploaded it to S3, and started using it for typed-process.

A new official release would be great – due to this issue, our CI builds fail more often than not nowadays.

@snoyberg is there anything I can do (regarding maintenance tasks) to help make the new release happen faster? My email is in my GitHub profile.

@borsboom
Contributor

@neongreen The only holdup right now is that 4938-non-ascii-module-names is failing for macOS on CI (see #4939 (comment)). If there's anything you can do to help with that, it would push things along.
