Network as a dependency fails to build with GHC 8.4.1 #3944

Closed
ndmitchell opened this issue Mar 25, 2018 · 22 comments

Comments

@ndmitchell
Contributor

Using an AppVeyor machine with the script from https://github.com/ndmitchell/bug-network-ghc84/blob/master/appveyor.yml, I get failures (see https://ci.appveyor.com/project/ndmitchell/bug-network-ghc84) when I try to install network using the latest GHC 8.4 nightly.

This issue has been discussed at haskell/network#313 but I believe this is a stack bug because:

  • stack unpack network && cd network-* && stack init && stack build works.
  • stack build network fails.

I believe these things are meant to be equivalent (full steps and command lines are given at https://github.com/ndmitchell/bug-network-ghc84/blob/master/appveyor.yml - the final line of the test_script section fails).

The actual error seems to be:

    config.status: creating network.buildinfo
    sed: -e expression #1, char 257: invalid reference \1 on `s' command's RHS

This happens when running configure, and it results in a blank network.buildinfo. That file should have the CALLCONV macro set so that the build succeeds.

@snoyberg
Contributor

It's unclear from this issue if you're using the same resolver in the two different builds, which could fully explain the difference.

@ndmitchell
Contributor Author

I am using --resolver=nightly appended to all actions. The concrete and complete Appveyor script is linked above, but the actions are:

- echo "" | stack --no-terminal setup --resolver=nightly > nul
- echo "" | stack --no-terminal unpack network --resolver=nightly
- cd network-2.6.3.5
- echo "" | ..\stack --no-terminal init --resolver=nightly
- echo "" | ..\stack --no-terminal build --resolver=nightly
- cd ..
- echo "" | stack --no-terminal build network --resolver=nightly

The final line fails, all other lines work. You can see a trace at https://ci.appveyor.com/project/ndmitchell/bug-network-ghc84/build/1.0.17.

@simonmichael
Contributor

simonmichael commented Apr 27, 2018

I'm also seeing this, with 64-bit GHC 8.4.2 and nightly-2018-04-25 on appveyor (https://ci.appveyor.com/project/simonmichael/hledger/build/master-408).

@lierdakil

Also seeing this on AppVeyor with nightly-2018-05-01 https://ci.appveyor.com/project/lierdakil/pandoc-crossref/build/1.0.286

@lierdakil

Still happening on nightly-2018-05-23 https://ci.appveyor.com/project/lierdakil/pandoc-crossref/build/1.0.288
... the lack of attention to this issue is vaguely concerning.

@snoyberg
Contributor

> ... the lack of attention to this issue is vaguely concerning.

As a side comment: this kind of attitude is not what I would expect when working on an open source project. Anyone is of course free to contribute and investigate the issue, including of course you yourself. Comments like this have a great way of preventing contributors from wanting to get involved.

That said, my initial read of this issue is that it's non-critical, as the original description shows that there's at least one way that the builds succeed, so people can continue using the tool. If I'm missing some reason why this is more critical than I thought, let me know. But I, like most everyone else involved in open source maintenance, have to prioritize my time investments.

My strong recommendation: if this concerns you, ask concretely for advice on how to proceed with debugging this issue, I'd be more than happy to share any information I can.

@lierdakil

> Comments like this have a great way of preventing contributors from wanting to get involved.

Sorry if I offended you; this was not my intention at all. As an open source maintainer myself, I find such comments relatively benign, at least considering how these things could go wrong. At least that prompted a response, eh? Again, sorry.

> If I'm missing some reason why this is more critical than I thought, let me know.

Well, as far as I can tell, any package with a transitive dependency on network will fail to build on Windows as it is now. It's possible to build network itself directly, but it fails to build as a dependency. It might be possible to bodge somehow, but that would be rather far from Stack's mission statement, I think. Considering that a lot of projects depend on network in some way, I would argue this issue is at least "high priority". I might be missing something of course, since admittedly I didn't spend much time debugging it.

> Anyone is of course free to contribute and investigate the issue, including of course you yourself.

I'm short on time, lack easy access to Windows, and am not at all familiar with Stack's code base, but sure, I could give it a shot. I don't really know where to even start though, so any pointers are appreciated. I think @ndmitchell mentioned something about a different sed being used depending on whether network is built directly or as a dependency, so looking at where that's done might be the first thing to try. If you could kindly point me to the code responsible, that would help.

@snoyberg
Contributor

I've compared the logs (from https://ci.appveyor.com/project/ndmitchell/bug-network-ghc84/build/1.0.17) of the successful and non-successful builds, and this is what I see for the seemingly relevant subset of lines:

    1c1
    < Linking C:\project\network-2.6.3.5\.stack-work\dist\22b940d3\setup\setup.exe ...
    ---
    > Linking C:\Users\appveyor\AppData\Local\Temp\1\stack2616\network-2.6.3.5\.stack-work\dist\22b940d3\setup\setup.exe ...
    100c100
    < config.status: creating include/HsNetworkConfig.h
    ---
    > sed: -e expression #1, char 257: invalid reference \1 on `s' command's RHS

In other words, beyond the different directories (which is to be expected), I don't see anything else going on. My recommended next step would be to investigate what differences in environment may exist. One flag which may be useful is --cabal-verbose, which will tell Cabal to be more verbose in its own output. Actually, I just noticed that --verbose hasn't been passed in Neil's repro either; that would probably shed some light. If that doesn't tell us anything, I'd probably instrument Stack itself to dump out environment variable information next.

@snoyberg
Contributor

I opened up this PR (ndmitchell/bug-network-ghc84#1) to get an AppVeyor build running (https://ci.appveyor.com/project/ndmitchell/bug-network-ghc84/build/1.0.18).

@ndmitchell
Contributor Author

FWIW, I consider this pretty important (currently anything downstream of network that I maintain is no longer tested on Windows), and if I was doing commercial development on Windows I'd be absolutely freaking out (I stopped doing that 5 months ago, so we narrowly avoided a freak out).

I have debugged as much as I could, and got stuck. I managed to figure out that stack uses a different sed in the two locations. I think that's the root cause, but because I'm so many levels of abstraction away from the problem, it's hard to be sure. The place where I discovered the information about sed was in haskell/network#313 (comment).

I did add --verbose to Stack, and tried adding it to stack calling Cabal, but one didn't provide any differing information, and one didn't work properly (I think I raised a few Stack bugs about it).

@snoyberg
Contributor

@ndmitchell From that report, I don't see evidence that two different versions of sed are being run by Stack. I see that there are two different versions of sed available. It's possible that different arguments are being provided to the same sed exe. (I'm not saying it's one or the other, I'm just trying to identify what the actual issue may be.)

Also, AFAIK, autoreconf should not be called at all when building network from a tarball. Stack will call that program on demand, but only if the configure script isn't available. This applies to the git clone case, but not the stack unpack case.

@snoyberg
Contributor

Any chance you can confirm if running the build command a second time succeeds? I was able to reproduce this on my Windows machine once, but not more.

Also, can you confirm that the command stack --resolver nightly-2018-05-23 build network reproduces the issue?

@snoyberg
Contributor

Here's the last bit of info I have. I've updated my fork of the bug-network-ghc84 repo to create a replacement sed.exe that prints its arguments when the underlying sed.exe fails. You can see the relevant bits at:

My guesses at what's going on so far, which may be wrong:

  • The failure occurs when the network package is unpacked by Stack itself
  • From everything I can tell, there is no difference in environment variables, or which sed executable is called
  • If I was to take a complete guess: this is some kind of issue with sed tripping up on the generated temporary directory names
  • Unfortunately, without a reliable local repro, I won't be able to make more progress on this without assistance from others

I'm also not entirely certain that my intercept program is working as expected, as I'm getting new build failures. But it's at least demonstrating that, presumably, all the different build types are calling the same sed.exe.
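
As a rough illustration of the intercept approach described above (not the actual program from the fork, which isn't shown in this thread), a wrapper can run the real tool and print its exact arguments only when it fails. The sketch below is in Python for brevity; `run_and_report` is a hypothetical name, and the renamed real sed mentioned in the comment is an assumption.

```python
import subprocess
import sys

def run_and_report(argv):
    """Run argv, passing stdin/stdout/stderr straight through.

    If the command exits non-zero, echo the exact argument list to stderr
    so the failing invocation shows up in the build log.
    """
    result = subprocess.run(argv)
    if result.returncode != 0:
        print(f"command failed ({result.returncode}): {argv!r}", file=sys.stderr)
    return result.returncode

# Hypothetical use as a stand-in for sed, assuming the real binary was
# renamed to sed-real.exe somewhere on PATH:
#     sys.exit(run_and_report(["sed-real.exe"] + sys.argv[1:]))
```

Placed ahead of the real sed on PATH, a wrapper like this leaves successful calls untouched and logs only the invocation that fails.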

@ndmitchell
Contributor Author

@snoyberg I had very little success making it happen locally, but my development environment is the Haskell equivalent of the XKCD cartoon https://xkcd.com/1987/. At one point I thought I had it narrowed down to 32 bit vs 64 bit, but then I blinked and it changed.

My guesses were mostly based on using https://www.appveyor.com/docs/how-to/rdp-to-build-worker/ to get RDP to the AppVeyor instance, which is a poor man's approximation of local reproduction.

@snoyberg
Contributor

I tried to automate this, but failed so far. Instead, I was able to manually log into an AppVeyor machine and upgrade GHC 8.2.2's global Cabal to 2.2.0.1. With that installed, I got the same build failure as with GHC 8.4. Point being: the breakage is due to differences in Cabal, not GHC. Doesn't help fix this immediately, but does slightly narrow down where the problem comes from.

@snoyberg
Contributor

snoyberg commented Jun 18, 2018

And then I was able to reproduce the issue again using stack unpack, as long as it was run from the same temporary directory path used by stack build network. Problem is not with different environments, it's a filepath issue.

EDIT: Instead of a bunch of individual comments, I'll keep updating here.

Running runghc Setup.hs configure directly from the same directory gave the same sed error message.

Broken directory paths:

  • C:\Users\appveyor\AppData\Local\Temp\1\stack2232\network-2.7.0.0
  • C:\Users\appveyor\AppData\Local\Temp\1\network-2.7.0.0

However, it worked with C:\Users\appveyor\AppData\Local\Temp\network-2.7.0.0

Theory: it's either an arbitrary length issue, or it doesn't like a path segment consisting of the number 1. I tested building in c:\1\network-2.7.0.0, and it fails.

I tried building in c:\2\network-2.7.0.0, and the error message changes!

    sed: -e expression #1, char 213: invalid reference \2 on `s' command's RHS

Great news, things are starting very slowly to make sense. I can also reproduce this on my local machine, so AppVeyor is no longer needed. Next step is to figure out what changed in Cabal to suddenly trigger this.
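
The error message changing from \1 to \2 is consistent with the directory name ending up, unescaped, on the replacement side of a sed `s` command, where a backslash followed by a digit is read as a backreference. Assuming that is the mechanism, here is a minimal Python analogue: `re.sub` interprets its replacement string in a similar way. The `@srcdir@` placeholder and the function names are illustrative assumptions, not taken from the network package or Cabal.

```python
import re

def substitute(text, placeholder, replacement):
    """Regex substitution, loosely analogous to sed's s/placeholder/replacement/."""
    return re.sub(re.escape(placeholder), replacement, text)

def try_path(path):
    """Substitute a build directory into a template line; report errors as text."""
    try:
        return substitute("srcdir = @srcdir@", "@srcdir@", path)
    except re.error as exc:
        return f"error: {exc}"

def escape_backslashes(path):
    """Double each backslash so the replacement is taken literally."""
    return path.replace("\\", "\\\\")

# The pattern has no capture groups, so \1 and \2 are invalid references.
print(try_path(r"c:\1\network-2.7.0.0"))
print(try_path(r"c:\2\network-2.7.0.0"))
# With the backslashes escaped, the literal path is substituted.
print(try_path(escape_backslashes(r"c:\1\network-2.7.0.0")))
```

The first two calls fail with an invalid group reference error naming group 1 and group 2 respectively, mirroring how the sed message followed the directory name.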

@snoyberg
Contributor

I've been able to minimize this to a repro against just Cabal, which I've filed here: haskell/cabal#5386.

@snoyberg
Contributor

It seems unlikely that Cabal will make a point release for this problem, and that GHC 8.4.4 will be released with that change. Given that, it's a good idea to figure out a workaround in Stack. Here's the only one I can think of, and it's crazy:

In the temporary directory logic, add a check if the system's temporary directory has one of these \1-style paths in it and, if so, make up some completely different temporary directory. Bonus points: only do this for build-type: Configure. I'm open to other ideas on how to work around this one.
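
To make the proposed check concrete, here is a small sketch of the detection step, written in Python rather than in Stack's own Haskell. It is not Stack's implementation. The function names and the fallback directory are hypothetical, and treating every backslash-plus-digit as risky (including \0) is an assumption on the conservative side.

```python
import re

# A backslash immediately followed by a digit, e.g. the "\1" in "...\Temp\1".
_BACKREF_LIKE = re.compile(r"\\[0-9]")

def looks_like_backreference(path):
    """True if the path contains a segment that sed could read as \\N."""
    return _BACKREF_LIKE.search(path) is not None

def pick_temp_dir(system_tmp, fallback):
    """Use the system temp dir unless it contains a \\N-style segment."""
    return fallback if looks_like_backreference(system_tmp) else system_tmp

# Paths from this thread; "C:\stack-tmp" is a made-up fallback location.
print(pick_temp_dir(r"C:\Users\appveyor\AppData\Local\Temp\1", r"C:\stack-tmp"))
print(pick_temp_dir(r"C:\Users\appveyor\AppData\Local\Temp", r"C:\stack-tmp"))
```

A real implementation would also need to confirm that the fallback exists and is writable, and could limit the check to build-type: Configure packages as suggested above.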

snoyberg added a commit that referenced this issue Jun 18, 2018
@snoyberg
Contributor

@simonmichael
Contributor

Way to go @snoyberg.

@snoyberg
Contributor

Since the AppVeyor config change of setting TMP seems to fix this, I'm going to close the issue. I'm still slightly confused as to why cabal-install and Stack use different temporary directories, but I think I'm ready to leave well enough alone.

@ygale

ygale commented Jul 10, 2018

For completeness, here is a link to the blog post where @snoyberg related the full details of the issue and how he figured it all out.

sol added a commit to sol/hpack that referenced this issue Jul 11, 2018
quasicomputational pushed a commit to quasicomputational/hpack that referenced this issue Dec 13, 2018
quasicomputational pushed a commit to quasicomputational/hpack that referenced this issue Apr 30, 2019
sol pushed a commit to sol/hpack that referenced this issue Jun 1, 2019
* AppVeyor workaround for TMP issue

commercialhaskell/stack#3944

* Bump resolver to nightly-2018-12-12.

This has the primary benefit of moving to GHC 8.6.3 and should fix
AppVeyor.

* Add clock 0.8 as an extra-dep.

* Adapt expected output to aeson 1.4.3.0.
steve-chavez added a commit to steve-chavez/postgrest that referenced this issue Jun 25, 2019