
SqliteBusy Error appears once in a while #4471

Closed
psibi opened this issue Dec 28, 2018 · 16 comments

@psibi
Member

psibi commented Dec 28, 2018

The SQLite Busy error still appears.

Related issue: #4247

Steps to reproduce

Unfortunately, there are no steps to reproduce it reliably. The Azure pipeline has been able to replicate it periodically.

I was also able to reproduce it twice by running stack build simultaneously on multiple Haskell projects (on a Linux machine).

Expected

Not to see errors like this:

SQLite3 returned ErrorBusy while attempting to perform prepare "PRAGMA journal_mode=WAL;": database is locked

Actual

Hitting the ErrorBusy error.

Stack version

$ stack --version
Version 1.10.0, Git revision f99a25395af25c964f1a517f49f4c31cab2b82a9 (6747 commits) x86_64 hpack-0.31.1

Method of installation

From source

@mihaimaruseac
Contributor

Do we have multiple threads accessing SQLite at the same time?

@psibi
Member Author

psibi commented Dec 31, 2018

Yes, I do think so. There are also these functions inside the Pantry module:

traverseConcurrently_
traverseConcurrentlyWith_

@mihaimaruseac
Contributor

Then that's the issue. SQLite doesn't support multithreaded access to the database.

@qrilka
Contributor

qrilka commented Jan 3, 2019

@mihaimaruseac why do you think it does not? See e.g. https://www.sqlite.org/threadsafe.html
From a quick glance it looks like persistent-sqlite doesn't set threading explicitly and thus serialized mode should be used.
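
As a hedged diagnostic sketch (not something Stack does; only the persistent-sqlite API names below are real), one way to see how the SQLite library linked into persistent-sqlite was compiled, including its thread-safety setting, is to read its compile options. Whether THREADSAFE shows up depends on how the library was built:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Database.Persist.Sql (Single (..), rawSql)
import Database.Persist.Sqlite (runSqlite)

main :: IO ()
main = do
  -- List the compile-time options of the linked SQLite library;
  -- a THREADSAFE=... entry is typically among them.
  opts <- runSqlite ":memory:" $ rawSql "PRAGMA compile_options" []
  mapM_ (putStrLn . T.unpack . unSingle) (opts :: [Single T.Text])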

@mihaimaruseac
Contributor

I guess that's a new development in SQLite. In the past it wasn't possible to properly access the database from multiple threads.

@nh2
Collaborator

nh2 commented Feb 4, 2019

Do we have multiple threads accessing SQLite at the same time?

Another important question to ask:

Do we have multiple processes accessing SQLite at the same time?

While multi-threaded access isn't a problem for SQLite, as linked above, only one process can write to a DB at any given time.

From https://www.sqlite.org/faq.html#q5:

Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.
...
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
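
In the persistent-sqlite code path that Stack uses, the busy timeout is requested through an extra pragma on the connection info, which is the same mechanism the reproducer further down uses. A minimal standalone sketch, assuming persistent-sqlite's mkSqliteConnectionInfo / extraPragmas / withSqliteConnInfo API and not the actual Pantry code:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Logger (runNoLoggingT)
import Database.Persist.Sql (runSqlConn)
import Database.Persist.Sqlite (extraPragmas, mkSqliteConnectionInfo,
                                withSqliteConnInfo)
import Lens.Micro (set)

main :: IO ()
main = runNoLoggingT $ do
  -- Ask SQLite to keep retrying for up to 2000 ms when another
  -- connection holds the lock, instead of failing with SQLITE_BUSY.
  let connInfo = set extraPragmas ["PRAGMA busy_timeout=2000;"]
                     (mkSqliteConnectionInfo "test.sqlite3")
  withSqliteConnInfo connInfo $ \backend ->
    runSqlConn (pure ()) backend -- real queries would go here

Whether the timeout handler is actually invoked in every contention case is a separate question, discussed below.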

@nh2
Collaborator

nh2 commented Feb 4, 2019

CC @snoyberg via #4247 (comment)

@snoyberg
Contributor

snoyberg commented Feb 4, 2019

Right, I get all that. Adding a busy timeout value is intended to give another process a chance (2 seconds) to finish up whatever transaction it's in the middle of. If that's insufficient, it means that Pantry transactions are lasting longer than I would have thought in some cases.

@psibi
Member Author

psibi commented Feb 4, 2019

@snoyberg I don't think there is a guarantee of 2 seconds. From this source:

The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY to the application instead of invoking the busy handler.

Reference: http://sqlite.1065341.n5.nabble.com/busy-timeout-not-working-always-tp83035p83037.html

@psibi
Member Author

psibi commented Feb 4, 2019

Also, I have this sample code through which I'm able to reproduce the above SqliteBusy error (note that you may have to run it multiple times to reproduce this):

#!/usr/bin/env stack
-- stack script --resolver lts-12.14
{-# LANGUAGE GADTs, GeneralizedNewtypeDeriving, QuasiQuotes,
  TemplateHaskell #-}
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses,
  OverloadedStrings, ScopedTypeVariables #-}

import Data.Pool (Pool, destroyAllResources)
import Data.Text
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Path
import RIO
import RIO.Orphans ()
import UnliftIO.Async

share
  [mkPersist sqlSettings, mkMigrate "migrateAll"]
  [persistLowerCase|
Person
  name String
  age Int Maybe
  deriving Show
|]

newtype Storage =
  Storage (Pool SqlBackend)

data Config = Config
  { pcStorage :: Storage
  , paSimpleApp :: SimpleApp
  }

simpleAppL :: Lens' Config SimpleApp
simpleAppL = lens paSimpleApp (\x y -> x {paSimpleApp = y})

instance HasLogFunc Config where
  logFuncL = simpleAppL . logFuncL

class HasStorage env
  -- | Lens to get or set the 'PantryConfig'
  --
  -- @since 0.1.0.0
  where
  pantryConfigL :: Lens' env Storage

instance HasStorage Config where
  pantryConfigL = lens pcStorage (\x y -> x {pcStorage = y})

initStorage ::
     HasLogFunc env
  => Path Abs File -- ^ storage file
  -> (Storage -> RIO env a)
  -> RIO env a
initStorage fp inner = do
  bracket (createSqlitePoolFromInfo sqinfo 1) (liftIO . destroyAllResources) $ \pool -> do
    migrates <- runSqlPool (runMigrationSilent migrateAll) pool
    forM_ migrates $ \mig -> RIO.logDebug $ "Migration output: " <> display mig
    inner (Storage pool)
  where
    sqinfo =
      set extraPragmas ["PRAGMA busy_timeout=2000;"] $
      set fkEnabled True $ mkSqliteConnectionInfo (fromString $ toFilePath fp)

withPantryConfig ::
     HasLogFunc env
  => Path Abs Dir
  -> (Storage -> RIO env a)
  -- ^ What to do with the config
  -> RIO env a
withPantryConfig root inner = do
  testRelFile <- parseRelFile "test.sqlite3"
  env <- ask
  runRIO (mempty :: RIO.LogFunc) $
    initStorage (root </> testRelFile) $ \storage ->
      runRIO env $ do inner storage

runApp :: (MonadIO m) => RIO Config a -> m a
runApp f =
  runSimpleApp $ do
    sa <- ask
    testDir <- parseAbsDir "/home/sibi/github/scripts"
    withPantryConfig
      testDir
      (\x -> runRIO Config {pcStorage = x, paSimpleApp = sa} f)

withStorage ::
     (HasStorage env, HasLogFunc env)
  => ReaderT SqlBackend (RIO env) a
  -> RIO env a
withStorage action = do
  Storage pool <- view pantryConfigL
  logInfo $ "Going to run the pool action"
  runSqlPool action pool

action :: ReaderT SqlBackend (RIO Config) [Entity Person]
action = do
  insert_ $ Person "Sibi" (Just 24)
  insert_ $ Person "Sibi" (Just 24)
  insert_ $ Person "Sibi" (Just 24)
  insert_ $ Person "Sibi" (Just 24)
  insert_ $ Person "Sibi" (Just 24)
  insert_ $ Person "Sibi" (Just 24)
  (pers :: [Entity Person]) <- selectList [] []
  return pers

action1 :: RIO Config ()
action1 =
  withStorage $ do
    xs <- action
    liftIO $ print xs

main :: IO ()
main = do
  forConcurrently_ (Prelude.replicate 10 0) (\_ -> runApp action1)

You would have to tweak the hard-coded path, but you get the idea. Note that the above code certainly doesn't take 2 seconds to complete; it seems it is up to SQLite's discretion when to return SQLITE_BUSY.

@ndmitchell
Contributor

I just reproduced this running on Windows 10 while building Stack itself from Git using the command:

C:\Neil\stack>stack build --executable-profiling
... snip ...
tar-conduit-0.3.0: build
conduit-extra-1.3.0: configure
conduit-extra-1.3.0: build
x509-1.7.3: copy/register
x509-store-1.6.6: configure
x509-store-1.6.6: build
x509-store-1.6.6: copy/register
tar-conduit-0.3.0: copy/register
conduit-extra-1.3.0: copy/register
aeson-1.2.4.0: copy/register
Cabal-2.4.0.1: copy/register
Progress 134/172
SQLite3 returned ErrorBusy while attempting to perform prepare "PRAGMA journal_mode=WAL;": database is locked

SQLite3 returned ErrorBusy while attempting to perform prepare "PRAGMA journal_mode=WAL;": database is locked

@snoyberg
Contributor

snoyberg commented Apr 2, 2019

What's the current status here? I think I remember some additional commits landing to improve this situation, and I personally haven't seen this bug on my machine or CI in a while. Anyone have more recent data?

@psibi
Member Author

psibi commented Apr 2, 2019

@snoyberg This is from yesterday's azure log: https://dev.azure.com/commercialhaskell/stack/_build/results?buildId=944

@snoyberg
Contributor

snoyberg commented Apr 2, 2019

Thanks Sibi, looking at this now.

snoyberg added a commit that referenced this issue Apr 2, 2019
This makes two basic changes to the code:

* Avoids a connection pool, which forces the SQLite connection to be
closed at the end of each call to withStorage. This seems unfortunate;
we would like to be able to maintain connections. However, SQLite seems
to view such connections as holding a write lock even when not in the
middle of a database transaction.

* Uses an explicit file lock on the two longest-running write
operations: migration and Hackage population. Ideally the locking
mechanism in SQLite would be sufficient for this, but (1) Hackage
population can take a long time, and (2) evidence has shown that it
doesn't behave as we expect it to.

This may be insufficient for preventing the bug described in #4471, in
which case we may need to apply withWriteLock to more calls. However,
I'd like to start off small to avoid the additional overhead.
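
For illustration, here is a minimal sketch of the explicit file lock described in the commit message above, using the filelock package; withWriteLock and the lock-file name are hypothetical, not the actual Pantry implementation:

import System.FileLock (SharedExclusive (Exclusive), withFileLock)

-- Hold an OS-level exclusive lock for the duration of a long-running
-- write (e.g. migration or Hackage population), so that concurrent
-- stack processes wait for each other instead of hitting SQLITE_BUSY.
withWriteLock :: FilePath -> IO a -> IO a
withWriteLock lockFile inner =
  withFileLock lockFile Exclusive (\_lock -> inner)

main :: IO ()
main = withWriteLock "pantry.sqlite3.lock" $
  putStrLn "long-running database write goes here"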
@snoyberg
Contributor

snoyberg commented Apr 2, 2019

I've opened a PR with some changes:

#4688

CC @psibi

snoyberg added a commit that referenced this issue Apr 8, 2019
…lock

Take a write lock before using Pantry database #4471
@snoyberg
Contributor

snoyberg commented Apr 8, 2019

#4688 has landed. The integration tests now pass completely, but that's probably more a result of rewriting the test runner than of any changes I made to the SQLite handling. I'm going to leave this issue open for now to make sure we revisit it before a release. If anyone who has been affected by this could try upgrading to the latest master and report whether they still see the problem, that would be greatly appreciated.
