
Error during indexing large data #35

Closed
vkhizanov opened this issue Aug 29, 2022 · 6 comments

@vkhizanov

I'm running into an unknown error while creating an index on a large table. Creating the index on a smaller chunk works fine, but on a larger one it fails. Could you please suggest where to dig to solve this issue?

osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 10;
CREATE INDEX
osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 1000;
CREATE INDEX
osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 10000000;
CREATE INDEX
osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 1000000000;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

What information should I provide to debug the problem?

@ankane
Member

ankane commented Aug 29, 2022

Hey @vkhizanov, thanks for reporting. A few questions to help debug:

  1. What version of pgvector are you on (SELECT * FROM pg_extension WHERE extname = 'vector';)?
  2. What installation method did you use (source, Docker, Homebrew, or PGXN)?
  3. What OS are you on (Ubuntu 20.04, macOS 12, etc)?
  4. What information does the server log provide about the crash?
  5. Are you seeing high memory usage with the process that's creating the index before the crash? (you can get the process id with SELECT pid, query FROM pg_stat_activity; and use a tool like ps to get the memory)
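For reference, a minimal sketch of how checks 1 and 5 could be run from psql and a shell (the <pid> below is a placeholder for whatever pg_stat_activity returns):

SELECT * FROM pg_extension WHERE extname = 'vector';
SELECT pid, query FROM pg_stat_activity WHERE query ILIKE 'CREATE INDEX%';
-- from a shell, check resident memory for the returned PID:
--   ps -o pid,rss,command -p <pid>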

@vkhizanov
Author

vkhizanov commented Aug 29, 2022

  1. pgvector version:

   oid   | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
 --------+---------+----------+--------------+----------------+------------+-----------+--------------
  161913 | vector  |       10 |         2200 | t              | 0.2.7      |           |

  2. Installed from source
  3. Ubuntu 20.04.1 LTS
  4. Server log: postgresql-2022-08-29_175607.csv
  5. The process memory consumption increases to ~88% of RAM (out of 4 GB). It then stays at that level and is fully freed after the crash.

@ankane
Member

ankane commented Aug 29, 2022

Thanks @vkhizanov. From the log, it looks like the server is running out of memory.

server process (PID 86997) was terminated by signal 9: Killed
Failed process was running: CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops);

What's SHOW maintenance_work_mem; return?

Edit: SHOW shared_buffers; would be helpful as well.

Edit 2: Past issue for reference - #7.

@vkhizanov
Author

vkhizanov commented Aug 29, 2022

osm=# SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 10GB
(1 row)

osm=# SHOW shared_buffers;
 shared_buffers
----------------
 1GB
(1 row)

Should I reduce those values?

@ankane
Member

ankane commented Aug 29, 2022

I think that's the cause. shared_buffers looks fine (25% of server memory), but I'd decrease maintenance_work_mem to 1GB or less to ensure the server doesn't run out of memory (tuning suggestions).
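A minimal sketch of applying that suggestion (the 1GB value is just the figure mentioned above; adjust it for your server's available memory):

-- lower it for the current session only, then rebuild the index
SET maintenance_work_mem = '1GB';
CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops);

-- or persist it server-wide and reload the configuration
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();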

@vkhizanov
Author

@ankane Thanks a lot!
