Describe the bug
I'm using pgvector as the data store because Chroma doesn't work either (same problem as in #986), so I'm ingesting PDFs via `memgpt load directory ...`, chunking them, getting embeddings from Azure OpenAI, and storing them in pgvector.
It works fine with small PDFs, but larger PDFs fail every time with this error, where the code breaks down in pg8000:
File "/home/arne/src/MemGPT/memgpt/cli/cli_load.py", line 222, in load_directory
store_docs(str(name), docs, user_id)
File "/home/arne/src/MemGPT/memgpt/cli/cli_load.py", line 139, in store_docs
insert_passages_into_source(passages, name, user_id, config)
File "/home/arne/src/MemGPT/memgpt/cli/cli_load.py", line 58, in insert_passages_into_source
storage.insert_many(passages)
File "/home/arne/src/MemGPT/memgpt/agent_store/db.py", line 478, in insert_many
conn.execute(upsert_stmt)
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
return meth(
^^^^^
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 517, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1848, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1988, in _exec_single_context
self._handle_dbapi_exception(
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2347, in _handle_dbapi_exception
raise exc_info[1].with_traceback(exc_info[2])
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/pg8000/legacy.py", line 254, in execute
self._context = self._c.execute_unnamed(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/pg8000/core.py", line 701, in execute_unnamed
self.send_BIND(NULL_BYTE, params)
File "/home/arne/.cache/pypoetry/virtualenvs/pymemgpt-K6xMi7ln-py3.11/lib/python3.11/site-packages/pg8000/core.py", line 765, in send_BIND
NULL_BYTE + statement_name_bin + h_pack(0) + h_pack(len(params))
^^^^^^^^^^^^^^^^^^^
struct.error: 'h' format requires -32768 <= number <= 32767
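For context, the 32,767 cap isn't arbitrary: the PostgreSQL extended-query protocol's Bind message encodes the number of parameter values as a signed 16-bit integer, which pg8000 packs with Python's `struct` module. The overflow can be reproduced in isolation (180,000 is just the estimate from the analysis below):

```python
import struct

# pg8000's send_BIND packs the bind-parameter count with the signed
# 16-bit network-order format "!h"; anything above 32,767 overflows:
struct.pack("!h", 180_000)
# -> struct.error: 'h' format requires -32768 <= number <= 32767
```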
Digging deeper, the problem seems to be that the `insert_many` method in `./memgpt/agent_store/db.py` generates SQL like this:
INSERT INTO <table> (<columns>, ...) VALUES (%s, %s, %s, ...)
But there is a limit on the number of bind parameters (%s) per statement. When executed through pg8000 with 9 columns (id, user_id, text, doc_id, agent_id, data_source, embedding, embedding_dim, embedding_model, metadata_) and a chunk size of maybe 20,000, that comes to 180,000 parameters, which is more than 32,767, so pg8000 raises the error.
So when generating the statements, we need to bring down the number of %s per statement.
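One way to do that is to batch the rows so each INSERT carries at most 32,767 parameters, sizing the batch from the column count. A minimal sketch, not MemGPT's actual code (`conn`, `table`, and `rows` are illustrative; `rows` is a list of column-name-to-value dicts):

```python
from sqlalchemy import insert

PG8000_MAX_PARAMS = 32_767  # Bind encodes the parameter count as int16

def insert_in_chunks(conn, table, rows):
    """Bulk-insert rows while keeping every statement under pg8000's
    bind-parameter limit."""
    cols = len(rows[0])                       # parameters per row
    step = max(1, PG8000_MAX_PARAMS // cols)  # rows per statement
    for i in range(0, len(rows), step):
        # One multi-row VALUES clause per batch.
        conn.execute(insert(table).values(rows[i : i + step]))
```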
Please describe your setup
- How did you install memgpt? `git clone`, `poetry install`
- What's your OS? Linux/WSL2
- How are you running memgpt? Terminal/ZSH
@ArneJanning thanks for reporting this - could you please try the fix in #994 to see if it resolves your issue? You can also wait for the nightly package tomorrow which should include it.
If you get a chance, could you also please let me know how large the PDF file was, and if it was a folder of files or a single file? Then I can try to reproduce the error as well.
@sarahwooders Thank you very much for your quick fix! I made my own little fix and put it in #1004, which calculates and uses the optimal chunk size for pg8000 instead of hard-coding 1,000, which gives us more performance.
I was loading scientific PDFs with about 1,000 pages each, in a folder of files; it works without problems now.
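For a rough sense of the headroom (my own arithmetic, not the exact formula in #1004): with the 9 columns from the report, the largest batch pg8000 can bind is

```python
32_767 // 9  # -> 3640 rows per INSERT, vs. the hard-coded 1,000
```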