[Bug]: sqlite3.OperationalError: database or disk is full #1693

Open
ghost opened this issue Feb 2, 2024 · 16 comments
Labels: bug, deployment

Comments

@ghost

ghost commented Feb 2, 2024

What happened?

What Happened:

  • Encountered an error with a SQLite database in a Docker container environment.
  • The error message was sqlite3.OperationalError: database or disk is full.
  • This issue occurred despite the host machine having sufficient disk space.
  • The SQLite database file size was found to be approximately 4.1 GB.
  • The Docker container settings and host machine settings were checked for potential causes of the error.

Expected Behavior:

  • The SQLite database should operate without encountering a 'disk is full' error, especially considering that the host machine had adequate disk space.
  • Given the size of the SQLite file (4.1 GB) and the typical capabilities of SQLite and the Docker environment, normal database operations such as data insertion, updating, and querying were expected to occur without errors related to disk space.
  • The expectation was that the Docker container's configuration and the host system's file system would support the operation of a database of this size without triggering disk space-related errors.

Versions

ChromaDB V 0.4.9
Python 3.10

Relevant log output

sqlite3.OperationalError: database or disk is full
INFO:     [02-02-2024 04:10:33] 3.131.62.47:40862 - "POST /api/v1/collections/559a54f0-9471-48af-98af-4d19c5fbd2db/add HTTP/1.1" 500
INFO:     [02-02-2024 04:10:33] 3.131.62.47:40862 - "POST /api/v1/collections/559a54f0-9471-48af-98af-4d19c5fbd2db/query HTTP/1.1" 200
ERROR:    [02-02-2024 04:10:34] database or disk is full

ghost added the bug label Feb 2, 2024
@tazarov
Contributor

tazarov commented Feb 5, 2024

@sachinchawla, you are using a relatively old version of Chroma, in which Chroma data was stored inside the container unless you used a custom docker compose file or a docker command with mounts. If you are running on Linux, this might not be a problem, but on Windows and Mac, where Docker runs in a VM, the VM's own disk allocation can fill up even when the host has plenty of free space.

@RichardScottOZ

Traceback (most recent call last):
  File "/home/richard/book-mentat/src/chroma_info_custom.py", line 43, in <module>
    batch = collection.get()
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 211, in get
    get_results = self._client._get(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/rate_limiting/__init__.py", line 45, in wrapper
    return f(self, *args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/segment.py", line 517, in _get
    records = metadata_segment.get_metadata(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 216, in get_metadata
    return list(self._records(cur, q))
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 225, in _records
    cur.execute(sql, params)
sqlite3.OperationalError: database or disk is full

The database is 37GB and sits on a drive with 2TB free, so there is plenty of space available. Is there some sort of temp space issue?

chroma                    0.2.0                    pypi_0    pypi
chroma-hnswlib            0.7.3                    pypi_0    pypi
chromadb                  0.5.0                    pypi_0    pypi
python                    3.10.14         hd12c33a_0_cpython    conda-forge

@RichardScottOZ

This is on trying to query - database is still allowing data to go in.

@tazarov
Contributor

tazarov commented May 8, 2024

@RichardScottOZ, if you are running in a container, can you run:

docker exec -it <container_name_or_id>  df -h /chroma/chroma

Let's see what your container reports as spare disk size.

@RichardScottOZ

Hi, thanks. Not running in a container, just installed it on an Ubuntu server.

A note - I thought it could have been the size of the get, so I tried this:

Traceback (most recent call last):
  File "/home/richard/book-mentat/src/chroma_info_custom_loop.py", line 46, in <module>
    ids_only_result = collection.get(include=[])
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 211, in get
    get_results = self._client._get(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/rate_limiting/__init__.py", line 45, in wrapper
    return f(self, *args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/segment.py", line 517, in _get
    records = metadata_segment.get_metadata(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 216, in get_metadata
    return list(self._records(cur, q))
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 225, in _records
    cur.execute(sql, params)
sqlite3.OperationalError: database or disk is full

Is there some sort of integer limit or anything this might hit? It is late, I have not looked at the repo code as yet to try and work it out, will do tomorrow.

I can query a model using an index fine - so it seems like it is a collection information issue, not a db issue.

@tazarov
Contributor

tazarov commented May 8, 2024

Hey @RichardScottOZ, thanks for confirming. Let's do the following:

See how much space you have in persist dir:

df -h /path/to/chroma_persist

Let's check how much space you have in your /tmp although I'm skeptical sqlite3 uses it:

df -h /tmp

Check the max_page_count of the SQLite database:

sqlite3 /path/to/chroma_persist/chroma.sqlite3 "PRAGMA max_page_count;"

@RichardScottOZ
Copy link

The disk Chroma is on has 2.5 TB free; /tmp has 8 GB.

@RichardScottOZ

On the page count: is that run through sqlite3 in Python?

@tazarov
Contributor

tazarov commented May 9, 2024

@RichardScottOZ, if you are on Linux you can install the sqlite3 command-line tool. For Debian-based distros, run sudo apt update && sudo apt install sqlite3 and the sqlite3 executable will be in your path. Once it is installed, you can copy and paste the above example (adjust the path).

@RichardScottOZ

yeah, had never needed it - will take a look

@RichardScottOZ

RichardScottOZ commented May 9, 2024

$ sqlite3 /mnt/usb_mount/chroma/Calibre\ Books/chroma.sqlite3 "PRAGMA max_page_count;"
1073741823

quite a big number

@tazarov
Contributor

tazarov commented May 9, 2024

@RichardScottOZ, you are right. 1073741823 pages * 4096 bytes per page ~ 4.4TB max size of the sqlite3 file. So the size of your sqlite3 file (37GB) is not a problem and we can rule it out.
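
The same arithmetic can be checked from Python's standard sqlite3 module without installing the CLI. This is a minimal sketch: it reads the page size from the file rather than assuming 4096 bytes, and the function name max_db_size_bytes is hypothetical, not part of Chroma.

```python
import sqlite3


def max_db_size_bytes(db_path):
    """Return (page_size, max_page_count, max_bytes) for a SQLite database."""
    # Note: connecting to a path that does not exist creates an empty file,
    # so double-check the path before running this against real data.
    conn = sqlite3.connect(db_path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
    finally:
        conn.close()
    return page_size, max_pages, page_size * max_pages


if __name__ == "__main__":
    # ":memory:" is used only so the sketch runs anywhere;
    # substitute the path to chroma.sqlite3.
    page_size, max_pages, max_bytes = max_db_size_bytes(":memory:")
    print(f"{page_size} bytes/page * {max_pages} pages = {max_bytes / 1e12:.1f} TB")
```

The reported max_page_count depends on the SQLite build, so the number may differ from the one shown in this thread.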

Let's examine the nature of your workload now. You said that ingestion is fine, but the query causes an issue. Can you elaborate on your query? Can you share a snippet + how many results do you expect it to return?

@RichardScottOZ

RichardScottOZ commented May 9, 2024

When it stopped working, it likely had about 7000 books. I was trying to get the names of all of them, to list in alphabetical order and see where ingestion was up to.

this is a bit convoluted, but was working previously:

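batch = collection.get()

# batch is a dict of result fields, so this prints the number of keys, not records
print(len(batch))

# prints the key names (such as ids, metadatas, documents)
for b in batch:
    print(b)

# collect each distinct file_name from the metadata
count = 0
file_dict = {}
for x in range(len(batch["documents"])):
    doc = batch["metadatas"][x]
    print(doc['file_name'])
    count += 1
    file_dict[doc['file_name']] = 1

print(count)

print(file_dict)
print(len(file_dict))

# list the distinct file names in alphabetical order
sorted_dict = dict(sorted(file_dict.items()))
for key in sorted_dict:
    print(key)

print(len(sorted_dict))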

@tazarov
Contributor

tazarov commented May 9, 2024

@RichardScottOZ, ok I think I understand now what might be the culprit here. SQLite uses temp storage for large result sets. In your case it ends up in /tmp (see https://www.sqlite.org/tempfiles.html). On a 37GB DB, there is a good chance that your collection.get() returns a huge number of results, thus overflowing /tmp storage capacity (hence the error). It is possible to specify the temp path via PRAGMA, but that is a code change in Chroma that we need to consider further.
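
To see where SQLite might place those temporary files, the sketch below lists candidate directories and their free space. The search order used here (SQLITE_TMPDIR, TMPDIR, /var/tmp, /usr/tmp, /tmp) follows the SQLite temp files page linked above for Unix builds; treat it as an assumption about the SQLite build in use. Both function names are hypothetical.

```python
import os
import shutil


def candidate_temp_dirs():
    """Return existing directories SQLite may try for temp files, in order."""
    # Order as described in the SQLite temp files documentation for Unix;
    # this is an assumption about your particular SQLite build.
    candidates = [
        os.environ.get("SQLITE_TMPDIR"),
        os.environ.get("TMPDIR"),
        "/var/tmp",
        "/usr/tmp",
        "/tmp",
    ]
    return [d for d in candidates if d and os.path.isdir(d)]


def report_free_space():
    """Return (directory, free gigabytes) for each candidate directory."""
    return [(d, shutil.disk_usage(d).free / 1e9) for d in candidate_temp_dirs()]


if __name__ == "__main__":
    for path, free_gb in report_free_space():
        print(f"{path}: {free_gb:.1f} GB free")
```

The first writable entry in the list is the likely destination. Setting SQLITE_TMPDIR to a directory on a larger volume before starting the process is one possible workaround, though this thread does not confirm that it resolves the error for Chroma.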

In the meantime, can I ask you to try and paginate your collection.get() (see this code snippet for inspiration - https://cookbook.chromadb.dev/core/collections/#cloning-a-collection). Let me know the results.
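
A paginated version of the file-name listing could look like the sketch below. It relies on the limit, offset, and include parameters of collection.get() and on collection.count(); the helper names and the default page size of 1000 are illustrative assumptions, not values recommended anywhere in this thread.

```python
def iter_metadata_pages(collection, page_size=1000):
    """Yield lists of metadata dicts, one page at a time."""
    total = collection.count()
    for offset in range(0, total, page_size):
        # Request only metadata so each page stays as small as possible.
        page = collection.get(limit=page_size, offset=offset, include=["metadatas"])
        yield page["metadatas"]


def sorted_file_names(collection, page_size=1000):
    """Return the distinct file_name metadata values in alphabetical order."""
    names = set()
    for metadatas in iter_metadata_pages(collection, page_size):
        for meta in metadatas:
            # Skip records with no metadata or no file_name key.
            if meta and "file_name" in meta:
                names.add(meta["file_name"])
    return sorted(names)
```

Here collection is any Chroma collection object, for example one obtained from a persistent client. Each call to get() then materializes at most page_size rows, which keeps the size of any single result set bounded.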

@RichardScottOZ

So it is temp space, as considered above. Will try the pagination tomorrow, thanks.

@RichardScottOZ

splitting into sizeable chunks worked for the above use anyway, thanks
