
Make locking opt-in rather than default #950

Closed
chrisdone opened this Issue Sep 8, 2015 · 6 comments

@chrisdone
Member

chrisdone commented Sep 8, 2015

I propose we make the locking support optional rather than on by default. It has blocked many workflows so far, has proved to be a maintenance burden, and sometimes still does. It doesn't serve the regular desktop user very well IMHO, but locking is a valid use case that fits CI/automation needs. I propose we have a flag or environment variable like

STACK_LOCK=true

Or something like that to enable it.
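As a rough illustration, an opt-in check for the proposed variable could be as simple as the following sketch. The variable name STACK_LOCK comes from the proposal above; the function name, accepted values, and the off-by-default choice are illustrative assumptions, not stack's actual implementation.

```haskell
import Data.Char (toLower)
import System.Environment (lookupEnv)

-- Hypothetical reader for the proposed STACK_LOCK variable:
-- locking stays off unless the user explicitly sets STACK_LOCK=true.
lockingEnabled :: IO Bool
lockingEnabled = do
  mval <- lookupEnv "STACK_LOCK"
  pure (fmap (map toLower) mval == Just "true")

main :: IO ()
main = do
  enabled <- lockingEnabled
  putStrLn (if enabled then "locking: on" else "locking: off")
```

An equivalent command-line flag could feed the same boolean, with the environment variable acting as the fallback when the flag is absent.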

Pinging @snoyberg @rrnewton.

@chrisdone chrisdone added this to the 0.3.0.0 milestone Sep 8, 2015

@rrnewton


Contributor

rrnewton commented Sep 8, 2015

I don't have a strong opinion about defaults as long as the option is there. We haven't yet tried the first step of keeping it on by default but with an option to turn it off when there's a problem. I think we definitely at least need the option to turn it off as an escape hatch.

Leaving it on by default does have the benefit of forcing us to fix problems. I'm back in the country after ICFP now and can help look into some of those problems.

Obviously, many tools persist with "concurrent behavior = undefined" for long periods of time (cabal did), but I think it would be nice to at least issue a warning or something, rather than letting the ~/.stack/ directory be corrupted silently.

@rvion


Contributor

rvion commented Sep 8, 2015

I also have the impression that it is better to leave the lock on by default, so that we have to fix bugs rather than ignore them.

To improve user experience when stack fails to acquire the lock for too long, stack could print a message like

stack is waiting for other stack instances to release the lock.
You can restart your command with STACK_LOCK=false to bypass lock protection
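The retry-and-warn behavior suggested above could be sketched with nothing beyond the libraries shipped with GHC. In this sketch, atomic directory creation stands in for stack's actual lock mechanism; the path, retry count, delay, and function names are all illustrative assumptions.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import System.Directory (createDirectory, removeDirectory)

-- Try to take the lock, printing the suggested warning and
-- retrying a bounded number of times before giving up.
-- createDirectory fails atomically if the directory exists,
-- which makes it usable as a crude cross-process lock.
acquireLock :: FilePath -> Int -> IO Bool
acquireLock path retries = go retries
  where
    go 0 = pure False
    go n = do
      r <- try (createDirectory path) :: IO (Either IOException ())
      case r of
        Right () -> pure True
        Left _ -> do
          putStrLn "stack is waiting for other stack instances to release the lock."
          putStrLn "You can restart your command with STACK_LOCK=false to bypass lock protection"
          threadDelay 100000  -- wait 100 ms before retrying
          go (n - 1)

main :: IO ()
main = do
  ok <- acquireLock "/tmp/stack-demo.lock" 3
  if ok
    then do
      putStrLn "lock acquired"
      removeDirectory "/tmp/stack-demo.lock"
    else putStrLn "gave up waiting for lock"
```

A real implementation would use a proper advisory file lock rather than a directory, but the user-facing behavior (wait, warn, and name the escape hatch) would be the same.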
@radix


Contributor

radix commented Sep 8, 2015

What bad things will happen if the lock is disabled? I assume it was implemented in the first place to avoid bad things like database corruption or something. It seems like it would be bad to disable it by default if horrible things can happen.

@snoyberg


Contributor

snoyberg commented Sep 8, 2015

I don't believe there's any case of horrible things happening. AFAICT, the worst case scenario is that the GHC package database gets overwritten, a package that was registered by process A is implicitly unregistered by process B, and then process A's build will fail because dependencies are missing. The solutions to that could be:

  • Accept it as a possibility, and the user just needs to rerun the build
  • Make the lock much more fine-grained, and just apply to the package cache itself

Even the second option still leaves some possibilities for a failed build, but I don't think there's anything that can go wrong that would prevent a second run of the same command from succeeding.

@rrnewton


Contributor

rrnewton commented Oct 6, 2015

Those failure modes do sound pretty benign. I thought the issue was more along the lines of breaking existing packages like in the bad-old-days.

I suppose the simplest scenario is:

job 1:
   install A_1
   install B_1
job 2:
   install A_2
   install C_2

Even if A_1 and A_2 are the same version (as they should be within a snapshot, though not necessarily in concurrent builds of the same local project), they still get nondeterministic package IDs somewhere in there, right? That could make A_1 invalid for C_2.

I think @snoyberg's point was that one of the two above jobs will fail and will just need to be rerun. That's a benign failure. (But strictly speaking, aren't there also more complicated setups where both fail?) Also, is there a way for individual package registrations to be non-atomic? Do they touch more than one file? Finally, with dynamic libraries, could these races and bad interleavings break already-built binaries?

@rrnewton


Contributor

rrnewton commented Oct 6, 2015

I started playing with letting concurrent stack processes run together. I ran into the following error while doing a stack ghci and stack build at the same time on an ubuntu 14.04 server. It may or may not be related to concurrency.

$ stack build
stack-0.1.5.0: build
Preprocessing library stack-0.1.5.0...
[39 of 70] Compiling Stack.PackageIndex ( src/Stack/PackageIndex.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.4.0/build/Stack/PackageIndex.o )

ghc: panic! (the 'impossible' happened)
  (GHC version 7.10.2 for x86_64-unknown-linux):
    Loading temp shared object failed: /tmp/ghc120832_0/libghc_47.so: undefined symbol: stackzu4xMM19FZZwCV6PT9u8bj3To_StackziTypesziCompiler_zdwparseCompilerVersion_info

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug


--  While building package stack-0.1.5.0 using:
      /u/rrnewton/.stack/setup-exe-cache/setup-Simple-Cabal-1.22.4.0-x86_64-linux-ghc-7.10.2 --builddir=.stack-work/dist/x86_64-linux/Cabal-1.22.4.0/ build lib:stack exe:stack --ghc-options  -ddump-hi -ddump-to-file
    Process exited with code: ExitFailure 1