Bug: django.db.utils.DatabaseError: database disk image is malformed #955

Closed
terxw opened this issue Mar 24, 2022 · 2 comments

@terxw

terxw commented Mar 24, 2022

Describe the bug

After running ArchiveBox from a Python installation with the same config, I cannot run ArchiveBox in Docker because of a database error.
PRAGMA integrity_check on index.sqlite3 reports no errors.

Steps to reproduce

cd /storage/data/docs/archivebox
docker-compose up -d
docker-compose run archivebox init

Screenshots or log output

archivebox is up-to-date                                                                                                                                                                      
archivebox_sonic_1 is up-to-date                                                                                                                                                              
[i] [2022-03-24 20:37:27] ArchiveBox v0.6.2: archivebox init                                                                                                                                  
    > /data                                                                                                                                                                                   
                                                                                                                                                                                              
[^] Verifying and updating existing ArchiveBox collection to v0.6.2...                                                                                                                        
----------------------------------------------------------------------                                                                                                                        
                                                                                                                                                                                              
[*] Verifying archive folder structure...                                                                                                                                                     
    + ./archive, ./sources, ./logs...                                                                                                                                                         
    + ./ArchiveBox.conf...                                                                                                                                                                    
                                                                                                                                                                                              
[*] Verifying main SQL index and running any migrations needed...                                                                                                                             
Traceback (most recent call last):                                                                                                                                                            
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute                                                                                             
    return self.cursor.execute(sql)                                                                                                                                                           
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 411, in execute                                                                                      
    return Database.Cursor.execute(self, query)                                                                                                                                               
sqlite3.DatabaseError: database disk image is malformed                                                                                                                                       
                                                                                                                                                                                              
The above exception was the direct cause of the following exception:                                                                                                                          
                                                                                                                                                                                              
Traceback (most recent call last):                                                                                                                                                            
  File "/usr/local/bin/archivebox", line 33, in <module>                                                                                                                                      
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())                                                                                                               
  File "/app/archivebox/cli/__init__.py", line 140, in main                                                                                                                                   
    run_subcommand(  
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand                                                                                                                          
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore                                                                                                                 
  File "/app/archivebox/cli/archivebox_init.py", line 43, in main                                                                                                                             
    init(                                                                                                                                                                                     
  File "/app/archivebox/util.py", line 114, in typechecked_function                                                                                                                           
    return func(*args, **kwargs)                                                                                                                                                              
  File "/app/archivebox/main.py", line 328, in init                                                                                                                                           
    for migration_line in apply_migrations(out_dir):                                                                                                                                          
  File "/app/archivebox/util.py", line 114, in typechecked_function                                                                                                                           
    return func(*args, **kwargs)                                                                                                                                                              
  File "/app/archivebox/index/sql.py", line 137, in apply_migrations                                                                                                                          
    call_command("makemigrations", interactive=False, stdout=null)                                                                                                                            
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 168, in call_command                                                                                 
    return command.execute(*args, **defaults)                                                                                                                                                 
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 371, in execute                                                                                          
    output = self.handle(*args, **options)                                                                                                                                                    
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 85, in wrapped                                                                                           
    res = handle_func(*args, **kwargs)                                                                                                                                                        
  File "/usr/local/lib/python3.9/site-packages/django/core/management/commands/makemigrations.py", line 101, in handle                                                                        
    loader.check_consistent_history(connection)                                                                                                                                               
  File "/usr/local/lib/python3.9/site-packages/django/db/migrations/loader.py", line 290, in check_consistent_history                                                                         
    applied = recorder.applied_migrations()                                                                                                                                                   
  File "/usr/local/lib/python3.9/site-packages/django/db/migrations/recorder.py", line 77, in applied_migrations                                                                              
    if self.has_table():                                                                                                                                                                      
  File "/usr/local/lib/python3.9/site-packages/django/db/migrations/recorder.py", line 56, in has_table                                                                                       
    tables = self.connection.introspection.table_names(cursor)                                                                                                                                
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/introspection.py", line 48, in table_names                                                                             
    return get_names(cursor)                                                                                                                                                                  
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/introspection.py", line 43, in get_names                                                                               
    return sorted(ti.name for ti in self.get_table_list(cursor)                                                                                                                               
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/introspection.py", line 74, in get_table_list                                                                       
    cursor.execute("""                                                                                                                                                                        
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute                                                                                              
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)                                                                                                       
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers                                                                               
    return executor(sql, params, many, context)                                                                                                                                               
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute                                                                                             
    return self.cursor.execute(sql, params)                                                                                                                                                   
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__                                                                                                      
    raise dj_exc_value.with_traceback(traceback) from exc_value                                                                                                                               
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute                                                                                             
    return self.cursor.execute(sql)                                                                                                                                                           
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 411, in execute                                                                                      
    return Database.Cursor.execute(self, query)                                                                                                                                               
django.db.utils.DatabaseError: database disk image is malformed 

ArchiveBox version

archivebox is up-to-date
archivebox_sonic_1 is up-to-date
ArchiveBox v0.6.2
Cpython Linux Linux-5.13.0-25-generic-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=sonic

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                    
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9                                                     
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                                
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                                
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node                                                                
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                 
 -  YOUTUBEDL_BINARY      -               disabled  /usr/local/bin/youtube-dl                                                    
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium                                                            
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                                                                               
                                                                                                                                                                                              
[i] Source-code locations:                                                                                                                                                                    
 √  PACKAGE_DIR           22 files        valid     /app/archivebox                                                                                                                           
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                                                                                 
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                                                                                            
                                                                                                                                                                                              
[i] Secrets locations:                                                                                                                                                                        
 √  CHROME_USER_DATA_DIR  49 files        valid     ./chrome_user_dir                                                                                                                         
 √  COOKIES_FILE          342.3 KB        valid     ./cookies.txt                                                                                                                             
                                                                                                                                                                                              
[i] Data locations:                                                                                                                                                                           
 √  OUTPUT_DIR            20 files        valid     /data                                                                                                                                     
 √  SOURCES_DIR           601 files       valid     ./sources                                                                                                                                 
 √  LOGS_DIR              1 files         valid     ./logs                                                                                                                                    
 √  ARCHIVE_DIR           9601 files      valid     ./archive                                                                                                                                 
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                                                                                         
 √  SQL_INDEX             511.2 MB        valid     ./index.sqlite3 

archivebox venv config

[DEFAULT]
IS_TTY=False
USE_COLOR=False
SHOW_PROGRESS=False
IN_DOCKER=False
ONLY_NEW=True
TIMEOUT=60
MEDIA_TIMEOUT=3600
OUTPUT_PERMISSIONS=755
RESTRICT_FILE_NAMES=windows
URL_BLACKLIST=\.(css|js|otf|ttf|woff|woff2|gstatic\.com|googleapis\.com/css)(\?.*)?$
BIND_ADDR=127.0.0.1:8000
ALLOWED_HOSTS=*
DEBUG=False
SNAPSHOTS_PER_PAGE=40
CUSTOM_TEMPLATES_DIR=None
TIME_ZONE=UTC
SAVE_TITLE=True
SAVE_FAVICON=True
SAVE_WGET=True
SAVE_WGET_REQUISITES=True
SAVE_SINGLEFILE=True
SAVE_READABILITY=True
SAVE_MERCURY=True
SAVE_PDF=True
SAVE_SCREENSHOT=True
SAVE_DOM=True
SAVE_HEADERS=True
SAVE_WARC=True
SAVE_GIT=True
SAVE_MEDIA=False
SAVE_ARCHIVE_DOT_ORG=True
RESOLUTION=1440,2000
CHECK_SSL_VALIDITY=True
MEDIA_MAX_SIZE=750m
CURL_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.68.0 (x86_64-pc-linux-gnu)
WGET_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.20.3
CHROME_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/)
CHROME_HEADLESS=True
CHROME_SANDBOX=True
USE_INDEXING_BACKEND=True
USE_SEARCHING_BACKEND=True
SEARCH_BACKEND_ENGINE=ripgrep
SEARCH_BACKEND_HOST_NAME=localhost
SEARCH_BACKEND_PORT=1491
SEARCH_BACKEND_PASSWORD=xxxxxxxxxx
SONIC_COLLECTION=archivebox
SONIC_BUCKET=snapshots
SEARCH_BACKEND_TIMEOUT=90
FETCH_TITLE=True
#FETCH_FAVICON=True
FETCH_WGET=True
FETCH_WARC=True
FETCH_PDF=True
FETCH_SCREENSHOT=True
FETCH_DOM=True
FETCH_GIT=True
FETCH_MEDIA=false
SUBMIT_ARCHIVE_DOT_ORG=True
USE_SINGLEFILE=True
USE_CURL=True
USE_WGET=True
USE_READABILITY=True
USE_MERCURY=True
USE_GIT=True
USE_CHROME=True
USE_NODE=True
USE_YOUTUBEDL=True
USE_RIPGREP=True
CURL_BINARY=/usr/bin/curl
GIT_BINARY=/usr/bin/git
WGET_BINARY=/usr/bin/wget
YOUTUBEDL_BINARY=/usr/local/bin/youtube-dl
SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SINGLEFILE_BINARY=/home/kangus/node_modules/single-file/cli/single-file
READABILITY_BINARY=/home/kangus/node_modules/readability-extractor/readability-extractor
MERCURY_BINARY=/home/kangus//node_modules/@postlight/mercury-parser/cli.js
NODE_BINARY=node
RIPGREP_BINARY=rg
CHROME_BINARY=/usr/bin/google-chrome-stable
COOKIES_FILE=/mnt/nfs/OMV_MERGEFS/data/docs/archivebox/cookies.txt
CHROME_USER_DATA_DIR=/mnt/nfs/OMV_MERGEFS/data/docs/archivebox/google-chrome-stable
POCKET_CONSUMER_KEY=None
USER=kangus
PACKAGE_DIR=/home/kangus/.local/lib/python3.8/site-packages/archivebox
TEMPLATES_DIR=/home/kangus/.local/lib/python3.8/site-packages/archivebox/templates
ARCHIVE_DIR=/mnt/nfs/OMV_MERGEFS/data/docs/archivebox/archive
SOURCES_DIR=/mnt/nfs/OMV_MERGEFS/data/docs/archivebox/sources
LOGS_DIR=/mnt/nfs/OMV_MERGEFS/data/docs/archivebox/logs
URL_BLACKLIST_PTN=re.compile('\\.(css|js|otf|ttf|woff|woff2|gstatic\\.com|googleapis\\.com/css)(\\?.*)?$', re.IGNORECASE|re.MULTILINE)
ARCHIVEBOX_BINARY=/home/kangus/.local/bin/archivebox
WGET_AUTO_COMPRESSION=True


archivebox docker config

# Usage:
#     docker-compose up -d
#     docker-compose run archivebox init
#     echo "https://example.com" | docker-compose run archivebox archivebox add
#     docker-compose run archivebox add --depth=1 https://example.com/some/feed.rss
#     docker-compose run archivebox config --set PUBLIC_INDEX=True
# Documentation:
#     https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#docker-compose
version: "3.7"
services:
    archivebox:
        container_name: archivebox
        # build: .
        image: archivebox/archivebox:latest
        command: server 0.0.0.0:8000
        stdin_open: true
        tty: true
        ports:
            - 8000:8000
        environment:
            - PGID=${PGID}
            - PUID=${PUID}
            - USE_COLOR=True
            - SHOW_PROGRESS=False
            - ONLY_NEW=True
            - DEBUG=True
            - TIMEOUT=180
            - DOCKER_CLIENT_TIMEOUT=120
            - COMPOSE_HTTP_TIMEOUT=120
            - MEDIA_TIMEOUT=3600
            - FETCH_TITLE=True
            - FETCH_WGET=True
            - FETCH_WARC=True
            - FETCH_PDF=True
            - FETCH_SCREENSHOT=True
            - FETCH_DOM=True
            - FETCH_GIT=True
            - FETCH_MEDIA=false
            - SUBMIT_ARCHIVE_DOT_ORG=True
            - USE_SINGLEFILE=True
            - CHECK_SSL_VALIDITY=False
            - FETCH_WGET_REQUISITES=True
            - RESOLUTION=1920,1080
            - SAVE_READABILITY=True
            - WGET_ARGS="--no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --inet4-only"
            - WGET_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
            - CHROME_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
            - CHROME_HEADLESS=True
            - CHROME_USER_DATA_DIR=/data/chrome_user_dir
            - SECRET_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx"
            #- SINGLEFILE_BINARY="/home/kangus/node_modules/single-file/cli/single-file"
            - MAX_URL_ATTEMPTS=5
            - SAVE_TITLE=True
            - SAVE_PDF=True
            - SAVE_WARC=True
            - SAVE_WGET=True
            - SAVE_SINGLEFILE=True
            - SEARCH_BACKEND_ENGINE=sonic
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD=XXXXXXXXXXXX
            - COOKIES_FILE=/data/cookies.txt
        volumes:
            - /etc/localtime:/etc/localtime:ro
            - /storage/data/docs/archivebox:/data
            - /storage/data/docs/archivebox/sonic.cfg:/etc/sonic.cfg:ro
            - /storage/data/docs/archivebox/data/sonic:/var/lib/sonic/store
    sonic:
       image: valeriansaliou/sonic:v1.3.0
       expose:
           - 1491
       environment:
#           - PGID=1000
#           - PUID=1000
           - PUID=${PUID}
           - PGID=${PGID}
           - SEARCH_BACKEND_PASSWORD=XXXXXXXXXXXX
       volumes:
           - /storage/data/docs/archivebox/sonic.cfg:/etc/sonic.cfg:ro
           - /storage/data/docs/archivebox/data/sonic:/var/lib/sonic/store
# docker network create -d bridge my-network
networks:
  my-network:
    external: true

@pirate
Member

pirate commented Mar 24, 2022

Thanks for posting all the relevant info! Finally someone who knows how to open a good issue lol.

Are you able to open the SQLite3 file manually with $ sqlite3 index.sqlite3 or archivebox manage dbshell?
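
If the sqlite3 shell is not handy, a minimal Python sketch like the one below opens the file and lists its tables, which is roughly the step that fails in the traceback above (Django's table_names introspection). The function name list_tables and the path in the usage note are illustrative assumptions, not part of ArchiveBox.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables and views, sorted by name."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist.
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        con.close()
```

Calling list_tables("index.sqlite3") from the data directory should either return the table names or raise sqlite3.DatabaseError with the same "malformed" message.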

Unfortunately I've never seen this error before, and I don't know exactly how to fix it other than dumping and trying to rebuild the sqlite3 database.
You can try these instructions to do that with the index.sqlite3 file:
https://stackoverflow.com/questions/5274202/sqlite3-database-or-disk-is-full-the-database-disk-image-is-malformed
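
The dump-and-rebuild idea can also be sketched in Python with the standard sqlite3 module. This is only a sketch under assumptions: the function name rebuild_database and the output filename are hypothetical, and Connection.iterdump() has no error recovery, so it may raise sqlite3.DatabaseError on a badly damaged file. In that case the sqlite3 shell's .dump or .recover commands are the better tools. Work on a copy of index.sqlite3, never the original.

```python
import sqlite3
from pathlib import Path


def rebuild_database(src: str, dst: str) -> str:
    """Replay everything readable from src into a fresh database at dst.

    Returns the first row of PRAGMA integrity_check on the rebuilt file
    ("ok" when no problems are found).
    """
    if Path(dst).exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")

    old = sqlite3.connect(src)
    new = sqlite3.connect(dst)
    try:
        # iterdump() yields the database as SQL statements, similar to the
        # sqlite3 shell's .dump command.
        new.executescript("\n".join(old.iterdump()))
        new.commit()
        return new.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        old.close()
        new.close()
```

For example, rebuild_database("index.sqlite3.bak", "index.rebuilt.sqlite3") writes a new file next to the backup. Swap it in for index.sqlite3 only after checking that the row counts look right.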

@terxw
Author

terxw commented Mar 24, 2022

Thank you!
Worked like a charm!
I had run pragma integrity_check; in DB Browser for SQLite, which was OK, and also with sqlite3 on my PC pointed at index.sqlite3 on the network share, and that integrity check was OK too.
But when I ran sqlite3 index.sqlite3 locally on the machine where ArchiveBox is running, the integrity check was not OK...
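
For anyone wanting to repeat that comparison, here is a minimal sketch of the check as a script that can be run on each machine against its own view of the file. The helper name integrity_report is hypothetical. It opens the database read-only so the check itself cannot modify anything.

```python
import sqlite3
from pathlib import Path
from typing import List


def integrity_report(db_path: str) -> List[str]:
    """Return the rows of PRAGMA integrity_check; ["ok"] means no problems found."""
    # A file: URI with mode=ro opens the database read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in con.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Severe damage can make the PRAGMA itself fail; report that too.
        return [f"error: {exc}"]
    finally:
        con.close()
```

If the result differs between the network-share path and the local path, trust the machine that actually hosts the file.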

@terxw terxw closed this as completed Mar 24, 2022