[Feature Request] PgVectorScale vs PGVector for performance #95

@qdrddr

Purpose of the feature.
PgVectorScale complements pgvector, the open-source vector data extension for PostgreSQL. Please consider adding it to the documentdb package.
https://medium.com/@simeon.emanuilov/fee1f9349efc

Describe the solution you'd like
Add the PgVectorScale extension to this PG package to be built-in.

Describe alternatives you've considered
Use another PostgreSQL distribution that ships with PgVectorScale built in, install the extension manually, or fall back to the slower PGVector.

Additional context
To test that it works:

  1. Create a table with an embedding column:
CREATE TABLE IF NOT EXISTS document_embedding (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    metadata JSONB,
    contents TEXT,
    embedding VECTOR(1536)
);
  2. Populate the table with your data, including the vector embeddings. You can use the same clients and methods as pgvector for this step.
  3. Create a StreamingDiskANN index on the embedding column:
CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding);
  4. Query the table to find the closest embeddings using the index:
SELECT *
FROM document_embedding
ORDER BY embedding <=> $1
LIMIT 10;
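
The final query takes the search vector as parameter `$1`, and the populate step needs the same value format for inserts. As a rough sketch of what a client might pass, the helper below renders a Python sequence in pgvector's text input format (`[v1,v2,...]`). The function name `to_pgvector_literal` and the `dim` parameter are hypothetical, introduced here for illustration only. How the resulting string is bound (placeholder style, whether an explicit `::vector` cast is needed) depends on the driver and is not covered here.

```python
import math
from typing import Sequence

# Matches VECTOR(1536) in the table definition from the issue.
EXPECTED_DIM = 1536


def to_pgvector_literal(embedding: Sequence[float], dim: int = EXPECTED_DIM) -> str:
    """Render an embedding in pgvector's text input format, e.g. '[0.1,0.2]'.

    Hypothetical helper: it only builds the string; it does not talk to a database.
    """
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    values = [float(x) for x in embedding]
    # pgvector rejects NaN and infinite components, so fail early on the client.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(v) for v in values) + "]"


if __name__ == "__main__":
    # Tiny 3-dimensional example so the output is readable.
    print(to_pgvector_literal([1, 2.5, -3], dim=3))
```

Validating the dimension on the client side is a design choice rather than a requirement: the server would reject a mismatched vector anyway, but failing before the round trip tends to give a clearer error.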
