Install Vector extension to Postgres with Docker
To add the pgvector extension to your PostgreSQL container, you need a PostgreSQL image that includes it.
The official PostgreSQL image doesn't ship pgvector, so we build a custom image with the extension compiled in.
Create a Dockerfile with the following content:
FROM postgres:16
RUN apt-get update && apt-get install -y \
git \
build-essential \
postgresql-server-dev-16
RUN git clone https://github.com/pgvector/pgvector.git && \
cd pgvector && \
make && \
make install
CMD ["postgres"]
Build the custom image named "postgres-with-vector":
docker build -t postgres-with-vector .
Run a container named "postgres-rag" with this custom "postgres-with-vector" image, create the database "rag_example", and open the port 5432 for a Postgrex connection:
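The run command itself is not shown here. A minimal sketch, assuming the standard environment variables of the official postgres image (POSTGRES_PASSWORD, POSTGRES_DB), which the custom image inherits; the function name start_postgres_rag and the fallback password are illustrative assumptions, not part of the original post:

```shell
# Sketch only: start_postgres_rag and the fallback password are assumptions.
start_postgres_rag() {
  # POSTGRES_DB asks the image's entrypoint to create "rag_example" on first start.
  # -p publishes the container's port 5432 on the host so Postgrex can connect.
  docker run -d \
    --name postgres-rag \
    -e POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-postgres}" \
    -e POSTGRES_DB=rag_example \
    -p 5432:5432 \
    postgres-with-vector
}
```

Calling start_postgres_rag starts the container in the background; docker logs postgres-rag shows when the server is ready to accept connections.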
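In another terminal, connect to the running "postgres-rag" container and run psql against the "rag_example" database:
docker exec -it postgres-rag psql -U postgres -d rag_example
With Elixir, we use the Ecto.Repo behaviour, which gives a more convenient DSL than raw SQL via Postgrex. Note that we could also call Postgrex.query! directly instead of going through the RAG.Repo module, with something like:
# :types takes the types module itself, not a string
{:ok, pg} = Postgrex.start_link(database: "rag_example", types: RAG.PostgrexTypes)
Postgrex.query!(pg, "create extension if not exists vector;", [])
Postgrex.query!(pg, "drop table if exists documents;", [])
Postgrex.query!(pg, "create table documents ....", [])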
We use our "Repo" module:
RAG.Repo.query!("create extension if not exists vector;")
RAG.Repo.query!("drop table if exists documents;")
We check in the terminal that the HNSW index method is available:
rag_example=# select * from pg_am where amname='hnsw';
 16450 | hnsw | hnswhandler | i
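HNSW indexes were added in pgvector 0.5.0, so if the query above returns no row it can help to look at the installed extension version. A small sketch, assuming the container and database names used in this post; pgvector_version is a hypothetical helper name:

```shell
# Sketch only: pgvector_version is a hypothetical helper name.
pgvector_version() {
  # -t -A make psql print just the value: no header, no column alignment.
  docker exec postgres-rag psql -U postgres -d rag_example -t -A \
    -c "select extversion from pg_extension where extname = 'vector';"
}
```

An empty result means the "create extension" statement has not been run in this database yet.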
We create a table with two columns besides the primary key, "content" and "embedding", of types "text" and "vector(384)" respectively. The latter is because we will be using an embedding model with 384 dimensions (see below).
We create an HNSW index on the "embedding" column using the cosine distance.
From the pgvector documentation: an HNSW index creates a multilayer graph. It has better query performance than IVFFlat (in terms of speed-recall tradeoff), but has slower build times and uses more memory. Also, an index can be created without any data in the table, since there is no training step as with IVFFlat.
RAG.Repo.query!("""
CREATE TABLE IF NOT EXISTS documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(384)
)
""")
RAG.Repo.query!("create index if not exists documents_embedding_idx on documents using hnsw (embedding vector_cosine_ops);")
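The build-time side of the speed-recall tradeoff is controlled by two HNSW index options documented by pgvector: m (16 by default) and ef_construction (64 by default), where a higher ef_construction gives better recall at the cost of build time. A sketch that prints the SQL to rebuild the index with explicit options; hnsw_rebuild_sql is a hypothetical helper, and the index name is passed in rather than assumed:

```shell
# Sketch only: hnsw_rebuild_sql is a hypothetical helper.
# Usage: hnsw_rebuild_sql <index_name> [m] [ef_construction]
hnsw_rebuild_sql() {
  index_name="$1"
  m="${2:-16}"                 # pgvector default
  ef_construction="${3:-64}"   # pgvector default
  cat <<SQL
-- Index options are fixed at build time, so the index is dropped and recreated.
DROP INDEX IF EXISTS ${index_name};
CREATE INDEX ${index_name} ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = ${m}, ef_construction = ${ef_construction});
SQL
}
```

The output can be piped into the container, for example hnsw_rebuild_sql <index_name> 16 128 | docker exec -i postgres-rag psql -U postgres -d rag_example. At query time, the session setting hnsw.ef_search (40 by default) plays the matching role: higher values favour recall over speed.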
Check in the terminal (that runs psql in the container) the details of the created table "documents" and the indexes we created:
rag_example=# \d documents
 id        | integer     |  | not null | nextval('documents_id_seq'::regclass)
 content   | text        |  |          |
 embedding | vector(384) |  |          |
rag_example=# select * from pg_indexes where tablename='documents';
 public | documents | documents_pkey          |  | CREATE UNIQUE INDEX documents_pkey ON public.documents USING btree (id)
 public | documents | documents_embedding_idx |  | CREATE INDEX documents_embedding_idx ON public.documents USING hnsw (embedding vector_cosine_ops)
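Once rows with embeddings have been inserted, this index is meant to serve nearest-neighbour queries ordered by cosine distance. A minimal sketch, assuming the container and database names used in this post; nearest_documents is a hypothetical helper that takes the id of an existing row and lists the rows closest to it:

```shell
# Sketch only: nearest_documents is a hypothetical helper; $1 is an existing row id.
nearest_documents() {
  # The SQL goes through stdin (not -c) so that psql interpolates :'doc_id'.
  docker exec -i postgres-rag psql -U postgres -d rag_example \
    -v doc_id="$1" <<'SQL'
-- <=> is pgvector's cosine distance operator, matching vector_cosine_ops.
SELECT id, content
FROM documents
WHERE id <> :'doc_id'
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = :'doc_id')
LIMIT 5;
SQL
}
```

For example, nearest_documents 1 lists up to five documents closest to the row with id 1. In the application, the reference vector would normally be the embedding of the user's query rather than an existing row.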