[Bug]: sqlite3.OperationalError: database or disk is full #1693
@sachinchawla, you are using a relatively old version of Chroma, in which data was stored inside the container unless you had a custom Docker Compose file or docker command with volume mounts. If you are running on Linux, this might not be a problem, but it can be on Windows and Mac, where Docker runs in a VM.
Traceback (most recent call last):
  File "/home/richard/book-mentat/src/chroma_info_custom.py", line 43, in <module>
    batch = collection.get()
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 211, in get
    get_results = self._client._get(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/rate_limiting/__init__.py", line 45, in wrapper
    return f(self, *args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/segment.py", line 517, in _get
    records = metadata_segment.get_metadata(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 216, in get_metadata
    return list(self._records(cur, q))
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 225, in _records
    cur.execute(sql, params)
sqlite3.OperationalError: database or disk is full
The database is 37 GB and is on a drive with 2 TB free, so there is plenty of space available. Is there some sort of temp space issue?
chroma 0.2.0 pypi_0 pypi
chroma-hnswlib 0.7.3 pypi_0 pypi
chromadb 0.5.0 pypi_0 pypi
python 3.10.14 hd12c33a_0_cpython conda-forge
This happens when trying to query; the database still accepts new data.
@RichardScottOZ, if you are running in a container, can you run:
docker exec -it <container_name_or_id> df -h /chroma/chroma
Let's see what your container reports as free disk space.
Hi, thanks. I'm not running in a container; I just installed it on an Ubuntu server. One note: I thought it could have been the size of the get, so I tried fetching only the IDs with collection.get(include=[]), which failed the same way:
Traceback (most recent call last):
  File "/home/richard/book-mentat/src/chroma_info_custom_loop.py", line 46, in <module>
    ids_only_result = collection.get(include=[])
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 211, in get
    get_results = self._client._get(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/rate_limiting/__init__.py", line 45, in wrapper
    return f(self, *args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/api/segment.py", line 517, in _get
    records = metadata_segment.get_metadata(
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 216, in get_metadata
    return list(self._records(cur, q))
  File "/home/richard/miniconda3/envs/mentat/lib/python3.10/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 225, in _records
    cur.execute(sql, params)
sqlite3.OperationalError: database or disk is full
Is there some sort of integer limit or anything this might hit? It is late and I have not looked at the repo code yet to try to work it out; I will do that tomorrow. I can query a model using an index fine, so it seems to be an issue with retrieving collection information, not with the database itself.
Hey @RichardScottOZ, thanks for confirming. Let's do the following:
1. See how much space you have in the persist dir: df -h /path/to/chroma_persist
2. Check how much space you have in /tmp: df -h /tmp
3. Check SQLite's maximum page count: sqlite3 /path/to/chroma_persist/chroma.sqlite3 "PRAGMA max_page_count;"
The disk Chroma is on has 2.5 TB free; /tmp has 8 GB.
For the page count, do you mean sqlite3 from Python?
@RichardScottOZ, if you are on Linux you can install the sqlite3 command-line tool from your distro's packages, e.g. on Debian-based distros.
Yeah, I had never needed it; will take a look.
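If installing the CLI is not convenient, the same three checks can be run from Python using only the standard library (`shutil.disk_usage` and the built-in `sqlite3` module). This is a minimal sketch, not part of Chroma: the helper name `check_storage` is hypothetical, and it assumes the database file is called `chroma.sqlite3` inside the persist directory, as in the paths used in this thread.

```python
import shutil
import sqlite3
from pathlib import Path

GB = 1024 ** 3


def check_storage(persist_dir: str, tmp_dir: str = "/tmp") -> dict:
    """Hypothetical helper mirroring the df / PRAGMA checks above.

    Assumes the Chroma database file is <persist_dir>/chroma.sqlite3.
    """
    db_path = Path(persist_dir).resolve() / "chroma.sqlite3"
    # as_uri() percent-encodes spaces (e.g. "Calibre Books"); mode=ro opens the
    # file read-only and raises instead of silently creating a new database.
    conn = sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)
    try:
        max_page_count = conn.execute("PRAGMA max_page_count;").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size;").fetchone()[0]
    finally:
        conn.close()
    return {
        # Free space on the filesystems holding the database and the temp dir.
        "persist_free_gb": shutil.disk_usage(persist_dir).free / GB,
        "tmp_free_gb": shutil.disk_usage(tmp_dir).free / GB,
        "max_page_count": max_page_count,
        "page_size": page_size,
    }
```

Usage: `check_storage("/mnt/usb_mount/chroma/Calibre Books")` returns a dict with the free space (in GiB) and the two PRAGMA values.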
$ sqlite3 /mnt/usb_mount/chroma/Calibre\ Books/chroma.sqlite3 "PRAGMA max_page_count;"
1073741823
Quite a big number.
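For scale: SQLite's maximum database size is `max_page_count` multiplied by the page size. Assuming this database uses SQLite's default 4096-byte page size (not confirmed in the thread; `PRAGMA page_size;` would show it), the reported limit works out to about 4 TiB, more than a hundred times the 37 GB database. Even at SQLite's minimum 512-byte page size the cap would be about 512 GiB, so the page limit is not what is being hit here.

```latex
\text{max size} = \texttt{max\_page\_count} \times \texttt{page\_size}
  = (2^{30} - 1) \times 2^{12}\,\text{B}
  = 2^{42} - 2^{12}\,\text{B}
  = 4\,398\,046\,507\,008\,\text{B}
  \approx 4\,\text{TiB}
```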
@RichardScottOZ, you are right. Let's examine the nature of your workload now. You said that ingestion is fine, but the query causes an issue. Can you elaborate on your query? Can you share a snippet and how many results you expect it to return?
When it stopped working I probably had around 7,000 books. I was trying to get the names of all of them, in alphabetical order, to see where they were up to. This is a bit convoluted, but it was working previously:
batch = collection.get()
print(len(batch["ids"]))  # number of records; len(batch) only counts the result's keys
for b in batch:
    print(b)  # the result's keys, e.g. "ids", "documents", "metadatas"
count = 0
file_dict = {}
for x in range(len(batch["documents"])):
    doc = batch["metadatas"][x]
    print(doc['file_name'])
    count += 1
    file_dict[doc['file_name']] = 1
print(count)
print(file_dict)
print(len(file_dict))
sorted_dict = dict(sorted(file_dict.items()))
for key in sorted_dict:
    print(key)
print(len(sorted_dict))
@RichardScottOZ, OK, I think I now understand what might be the culprit here. SQLite uses temporary storage for large result sets. In your case it ends up in … In the meantime, can I ask you to try to paginate your …
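On Unix, unless configured otherwise in code, SQLite chooses the directory for those temporary files from the `SQLITE_TMPDIR` environment variable, then `TMPDIR`, then falls back to `/var/tmp`, `/usr/tmp` and `/tmp`. Besides paginating, one possible workaround (not something proposed in this thread) is to point `SQLITE_TMPDIR` at a directory on the large drive. A minimal sketch, assuming Chroma uses a standard Unix SQLite build that honours this variable; the helper name and the directory path are hypothetical:

```python
import os


def point_sqlite_tmp_at(directory: str) -> str:
    """Hypothetical helper: ask SQLite to put its temp files in `directory`.

    SQLite may read SQLITE_TMPDIR only once per process, so call this before
    anything (sqlite3, chromadb) opens a database in this process.
    """
    os.makedirs(directory, exist_ok=True)
    # os.environ updates the real process environment, which SQLite reads via getenv().
    os.environ["SQLITE_TMPDIR"] = directory
    return directory
```

Usage: call `point_sqlite_tmp_at("/mnt/usb_mount/sqlite_tmp")` at the very top of the script, before importing chromadb. Exporting `SQLITE_TMPDIR` in the shell before starting Python has the same effect without touching the code.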
So it is temp space, as considered above. Will try the above tomorrow, thanks.
Splitting into sizeable chunks worked for the above use case anyway, thanks.
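A minimal sketch of that chunked approach, rewriting the file-name listing above to page through the collection instead of calling `collection.get()` once. It assumes `Collection.get` accepts `limit`, `offset` and `include` (as in the chromadb 0.5.x client used here) and that the collection is not being written to while paging, since offset-based pages can shift under concurrent inserts. `iter_metadatas` and `sorted_file_names` are hypothetical names. Requesting only `"metadatas"` also avoids pulling every document body into memory.

```python
from typing import Any, Iterator


def iter_metadatas(collection: Any, page_size: int = 1000) -> Iterator[dict]:
    """Yield metadata dicts from a Chroma collection, one page at a time."""
    offset = 0
    while True:
        # Fetch only metadatas: documents and embeddings are not needed here.
        page = collection.get(limit=page_size, offset=offset, include=["metadatas"])
        ids = page["ids"]
        if not ids:
            break
        for metadata in page["metadatas"]:
            # Records may have no metadata; treat that as an empty dict.
            yield metadata or {}
        if len(ids) < page_size:
            break  # a short page means we reached the end
        offset += page_size


def sorted_file_names(collection: Any, page_size: int = 1000) -> list:
    """Distinct 'file_name' values across the collection, sorted alphabetically."""
    names = {m["file_name"] for m in iter_metadatas(collection, page_size) if "file_name" in m}
    return sorted(names)
```

Usage: `for name in sorted_file_names(collection, page_size=5000): print(name)`. Using a set replaces the `file_dict` bookkeeping, since each book appears once per chunk stored in the collection.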
Versions
ChromaDB V 0.4.9
Python 3.10