
Pool timed out while waiting for an open connection, ZFS #952

Description

@happenslol

Edit by @ellie:

This has become the canonical issue for Atuin/ZFS problems.

If you're using ZFS with Atuin, you have likely noticed an error such as the following:

Error: pool timed out while waiting for an open connection

Location:
    /home/runner/work/atuin/atuin/crates/atuin-client/src/record/sqlite_store.rs:48:20

This is due to an issue between ZFS and SQLite's synchronous writes. See: openzfs/zfs#14290
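To see why this bites on every command, here is a minimal sketch of the kind of small synchronous SQLite write `atuin history start` performs on the hot path (the table name and values are made up for illustration; this uses the `sqlite3` CLI). The commit forces an fsync, and that fsync is what can stall on ZFS:

```shell
db="$(mktemp -d)/history.db"
sqlite3 "$db" "PRAGMA journal_mode=WAL;" > /dev/null           # WAL, as Atuin uses
sqlite3 "$db" "CREATE TABLE history (id TEXT PRIMARY KEY, cmd TEXT);"
sqlite3 "$db" "INSERT INTO history VALUES ('dummy-id', 'cd ~');"  # commit -> fsync, the ZFS stall point
count="$(sqlite3 "$db" "SELECT count(*) FROM history;")"
echo "$count"
```

WAL helps reader/writer concurrency, but it does not remove the fsync on commit, which is why the delay shows up even with WAL enabled.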

There are two workarounds:

  1. Use the Atuin daemon

The daemon has not yet shipped in a stable release, but it is largely problem-free. It takes all SQLite writes off the hot path, which avoids the issue entirely.

Follow the steps here: #952 (comment)
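As a rough sketch (assuming a recent Atuin build with daemon support; the key names below come from the daemon work and may differ in your version, so double-check against the linked comment), the daemon is enabled in `~/.config/atuin/config.toml` and run as a separate process:

```toml
# ~/.config/atuin/config.toml -- sketch, verify key names for your version
[daemon]
enabled = true
```

Then keep `atuin daemon` running in the background, e.g. via a systemd user service.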

  2. Create an ext4 zvol for Atuin

Follow these steps: #952 (comment)
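The idea is to carve out a small block device from the pool and format it with ext4 so Atuin's SQLite files bypass ZFS entirely. A sketch, assuming a pool named `rpool` and the default Atuin data directory (adjust names, size, and mount point for your system, and see the linked comment for the full steps including persistent mounting):

```shell
sudo zfs create -s -V 1G rpool/atuin                    # sparse 1 GiB zvol
sudo mkfs.ext4 /dev/zvol/rpool/atuin                    # format it as ext4
sudo mount /dev/zvol/rpool/atuin ~/.local/share/atuin   # Atuin's data dir
sudo chown "$USER" ~/.local/share/atuin
```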


I've just begun using atuin, and I absolutely love it so far. However, there's been a recurring issue for me, which I've found hard to diagnose:

My prompt regularly blocks for between 500 ms and 5 s whenever I run a command. I've narrowed this down to the _atuin_preexec function by manually importing the shell hook generated by atuin init zsh and annotating it with logging and time calls. Here's a sample time output from a run where it hung:

Running pre-exec for cd ~

0.00user 0.00system 0:04.93elapsed 0%CPU (0avgtext+0avgdata 8192maxresident)k
52036inputs+1064outputs (15major+512minor)pagefaults 0swaps

Pre-exec done for cd ~

Here's how I modified the hook to get the result:

_atuin_preexec() {
    echo "Running pre-exec for $1" >> /tmp/atuin.log
    local id
    id=$(/usr/bin/time -a -o /tmp/atuin.log atuin history start -- "$1")
    export ATUIN_HISTORY_ID="$id"
    echo "\nPre-exec done for $1" >> /tmp/atuin.log
}

I've tried to replicate the behavior from the CLI, outside of the hook, using hyperfine, and was successful:

» hyperfine -r 1000 "atuin search --limit 5"
Benchmark 1: atuin search --limit 5
  Time (mean ± σ):      18.3 ms ± 114.8 ms    [User: 4.9 ms, System: 8.2 ms]
  Range (min … max):    12.5 ms … 2587.9 ms    1000 runs

This does not happen in every benchmark, even with 1000 runs. My initial thought was contention on the database file, but I saw that you're already using WAL, so concurrent reads and writes should not be a problem. I can also trigger the delay by repeatedly opening the search widget, which shouldn't be writing to the database at all, which confuses me even more.

Do you have any idea on how I could gather further data on this?
