Archivebox hangs when initializing collection on network drive that doesn't support FSYNC #742

Closed
nguyenhaiac opened this issue May 9, 2021 · 3 comments
Labels
status: done, touches: data/schema/architecture, type: support

Comments

@nguyenhaiac

My setup:

  • Odroid HC4 running DietPi as a NAS
  • 2 HDDs in a mergerfs pool
  • The pool is shared to the network via NFS

Issue:

  • I mounted the NFS share on my main PC and tried to run ArchiveBox there. The init process hangs while creating the SQL database and running the initial migration. There is no error and no termination; it just hangs.
  • I then installed ArchiveBox on the Odroid using docker-compose and it works fine, but I want to run it on my main PC since the Odroid is already stretched too thin.
@pirate
Member

pirate commented May 10, 2021

Your network share has to support FSYNC. If it does not, you'll have to put the index.sqlite3 file on a local drive and put only the archive/ sub-folder on the network drive.

See: https://github.com/ArchiveBox/ArchiveBox#storage-requirements

Most network drives support FSYNC if you configure them to. Check the NFS/SAMBA docs to see how to set up FSYNC-compatible shares for your OS/NFS version.
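To check a mounted share before pointing ArchiveBox at it, a small probe along these lines may help. This is only a sketch: `fsync_completes` and its `timeout` parameter are hypothetical names, not part of ArchiveBox. Because the symptom reported in this thread is a silent hang rather than an error, the probe runs the write and `os.fsync` in a background thread and gives up after the timeout.

```python
import os
import tempfile
import threading


def fsync_completes(directory: str, timeout: float = 10.0) -> bool:
    """Return True if a small write + fsync in `directory` finishes within `timeout` seconds.

    Hypothetical helper for illustration; not an ArchiveBox API.
    """
    result = {"ok": False}

    def probe() -> None:
        try:
            fd, path = tempfile.mkstemp(dir=directory, prefix=".fsync_probe_")
        except OSError:
            return  # directory missing or not writable
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # may block indefinitely or raise on an unsupported share
            result["ok"] = True
        except OSError:
            pass
        finally:
            os.close(fd)
            os.unlink(path)

    # A daemon thread lets the caller move on even if fsync never returns.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return result["ok"] and not worker.is_alive()
```

A `False` result only says that the probe failed or timed out on that path; a `True` result is a good sign but not a guarantee that the share is safe for concurrent writers.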

This is for data integrity reasons. Too many users in the past accidentally corrupted their archives by running concurrent archivebox threads on network filesystems that ended up clobbering each other's indexes, so now we require the filesystem where the index is stored to support atomic writes. This is for my own support sanity, and to prevent users from accidentally corrupting their indexes. If you want you can hack around it (see archivebox/system.py:atomic_write), but I cannot officially support those use cases / handhold people toward setting it up (because it's often dangerous).
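For readers wondering what "atomic writes" means in practice, here is a minimal sketch of the usual write-then-rename pattern. It is not the code in archivebox/system.py; `atomic_write_sketch` is a hypothetical name, and the snippet only illustrates why the filesystem has to honor `fsync`.

```python
import os
import tempfile


def atomic_write_sketch(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a partial one.

    Illustrative only; not ArchiveBox's actual implementation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # the step that needs FSYNC support from the share
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the share never completes the `os.fsync` call, a function like this blocks at that line, which would be consistent with an init that hangs without printing an error.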

Note: archive/ contains all the archived assets (and is responsible for the bulk of the disk usage), and can still be on a non-FSYNC-compatible network share. For example, this should work:

./                               # ArchiveBox data folder
    index.sqlite3                # must be on local SSD/HDD (make sure to back it up still)
    ArchiveBox.conf              # must be on local SSD/HDD
    sources/                     # ok to put on network mount
        ...
    archive/                     # ok to put on network mount
        ...
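One way to get the layout above is to keep the data folder on a local disk and symlink the large sub-folders to the network mount. The sketch below assumes symlinks are acceptable for your setup (bind mounts or Docker volumes are alternatives); `link_to_network` and both paths in the usage note are hypothetical.

```python
from pathlib import Path


def link_to_network(data_dir: Path, network_dir: Path,
                    names: tuple = ("archive", "sources")) -> None:
    """Create data_dir/<name> as symlinks into network_dir for each name.

    Hypothetical helper; index.sqlite3 and ArchiveBox.conf stay in data_dir.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = network_dir / name
        target.mkdir(parents=True, exist_ok=True)
        link = data_dir / name
        if link.is_symlink() or link.exists():
            continue  # leave anything that is already there untouched
        link.symlink_to(target, target_is_directory=True)
```

For example, `link_to_network(Path("/home/me/archivebox"), Path("/mnt/nas/archivebox"))` would leave the local folder ready for `archivebox init`, with index.sqlite3 and ArchiveBox.conf created locally.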

See here for more info:

@pirate changed the title from "Archivebox hang on initial run on network drive" to "Archivebox hangs when initializing collection on network drive that doesn't support FSYNC" on May 13, 2021
@pirate added the labels type: support, status: needs followup, and touches: data/schema/architecture on May 13, 2021
@nguyenhaiac
Author

It's not a bug, and there is a workaround.

@pirate added the label status: done and removed the label status: needs followup on May 19, 2021
@pirate
Member

pirate commented Apr 12, 2022

Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting

Contributions/suggestions welcome there.
