## Find the Docker Images
The `pgvector` Docker images are listed on [Docker Hub](https://hub.docker.com/r/pgvector/pgvector/tags). Pick a tag there and pull that specific version of the image.

More details about `pgvector` can be found in its [GitHub repo](https://github.com/pgvector/pgvector).

## Set Up Docker

We can create a volume to store the data on the local machine with:

`docker volume create pgvector-data`

Then start the docker container with:

```bash
docker \
    run \
        --name pgvector-container \
        -d \
        -v pgvector-data:/var/lib/postgresql/data \
        -p 5432:5432 \
        -e POSTGRES_PASSWORD=test \
        pgvector/pgvector:0.8.0-pg17
```

I am using the `pgvector/pgvector:0.8.0-pg17` image, but you can choose any version you want.

Note that you can also mount a local path into the container instead of creating a volume.


## Connect to the DB

Check if the port `5432` is open and listening:

`lsof -i :5432`
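
The same check can be done from Python, which is handy before running the examples further down. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper name, and it only confirms that something accepts TCP connections on the port, not that it is PostgreSQL.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5432 is the host port published by `-p 5432:5432` in the docker command.
    print(is_port_open("localhost", 5432))
```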

Connect to the DB with [pgAdmin](https://www.pgadmin.org/).

Right-click `Servers` and choose `Register`, then enter the connection settings:
![epimind](images/epimind.jpg)
![pgadminconn](images/pgadminconn.jpg)

**The password is set as `test` in the above command.**

![pgAdmin](images/pgadmin.jpg)

## Try with Python
Although we can connect to the DB with plain [`psycopg2`](https://pypi.org/project/psycopg2) (`pip install psycopg2`), [`pgvector-python`](https://github.com/pgvector/pgvector-python/tree/master) adds vector support and comes with vector DB examples.

We can install `pgvector-python` and `psycopg2` with:

`pip install pgvector`

`pip install psycopg2`


#### Try with psycopg2

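```python
import psycopg2

# The password must match POSTGRES_PASSWORD from the `docker run` command.
# The `vector_database` database is assumed to exist already.
conn = psycopg2.connect(
    database="vector_database",
    user="postgres",
    password="test",
    host="localhost",
    port=5432,
)
cursor = conn.cursor()

# `<->` orders rows by L2 distance to the query vector.
# Assumes an `items` table with a vector column named `embedding`.
cursor.execute("SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;")
result = cursor.fetchall()
print(result)

conn.close()
```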
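
The query above hard-codes the query vector `'[3,1,2]'` in the SQL string. One way to pass it as a parameter instead is sketched below. It assumes the same `items` table with an `embedding` vector column, which this walkthrough does not create. `to_vector_literal` and `nearest_items` are hypothetical helper names, not part of `psycopg2` or `pgvector`.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[3.0,1.0,2.0]'."""
    # float() rejects non-numeric input before it reaches the database.
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest_items(conn, query_vector, limit=5):
    """Return up to `limit` rows of `items` closest to query_vector (L2 distance)."""
    with conn.cursor() as cursor:
        # The string parameter is cast to `vector` on the server side.
        cursor.execute(
            "SELECT * FROM items ORDER BY embedding <-> %s::vector LIMIT %s",
            (to_vector_literal(query_vector), limit),
        )
        return cursor.fetchall()
```

With an open `psycopg2` connection, `nearest_items(conn, [3, 1, 2])` would run the same search as the hard-coded query.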

#### Sentence embeddings with SentenceTransformers

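Adapted from this [example](https://github.com/pgvector/pgvector-python/blob/master/examples/sentence_transformers/example.py). It uses `psycopg` (version 3) rather than `psycopg2`, so install both packages first:

`pip install -U sentence-transformers "psycopg[binary]"`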

```python
import psycopg
from pgvector.psycopg import register_vector

# The password must match POSTGRES_PASSWORD from the `docker run` command.
conn = psycopg.connect(
    dbname="vector_database",
    user="postgres",
    password="test",
    host="localhost",
    port=5432,
    autocommit=True,
)

# Enable the extension, then register the vector type on this connection.
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
register_vector(conn)

# all-MiniLM-L6-v2 produces 384-dimensional embeddings, hence vector(384).
conn.execute('DROP TABLE IF EXISTS documents')
conn.execute('CREATE TABLE documents (id bigserial PRIMARY KEY, content text, embedding vector(384))')
```

```text
<psycopg.Cursor [COMMAND_OK] [IDLE] (host=localhost user=postgres database=vector_database) at 0x1069fbc50>
```

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
```


```python
corpus = [
    'Aromatic amines are quantified by an isotope-dilution gas chromatographic',
    'tandem mass spectrometric method (ID GC-MS/MS). Urine samples are collected',
    'and stored at approximately -70±10°C. 13C and 2H internal standards are added',
    'and the samples are hydrolyzed, cleaned up, and extracted on support liquid extraction (SLE) cartridges.',
    'The analytes are then derivatized to form pentafluoropropionamides, and analyzed by GC/MS/MS, using multiple reaction monitoring (MRM).',
    'The analyte concentrations are derived from the ratio of the integrated peaks of native to labeled ions by comparison to a standard curve.'
]
embeddings = model.encode(corpus)
```

```python
for content, embedding in zip(corpus, embeddings):
    conn.execute('INSERT INTO documents (content, embedding) VALUES (%s, %s)', (content, embedding))

# Find the 5 documents closest to document 1, excluding document 1 itself.
document_id = 1
neighbors = conn.execute(
    'SELECT content FROM documents WHERE id != %(id)s '
    'ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = %(id)s) LIMIT 5',
    {'id': document_id},
).fetchall()
for neighbor in neighbors:
    print(neighbor[0])
```

```text
The analytes are then derivatized to form pentafluoropropionamides, and analyzed by GC/MS/MS, using multiple reaction monitoring (MRM).
The analyte concentrations are derived from the ratio of the integrated peaks of native to labeled ions by comparison to a standard curve.
tandem mass spectrometric method (ID GC-MS/MS). Urine samples are collected
and the samples are hydrolyzed, cleaned up, and extracted on support liquid extraction (SLE) cartridges.
and stored at approximately -70±10°C. 13C and 2H internal standards are added
```
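
The neighbor query orders rows with `<=>`, pgvector's cosine distance operator, whereas the psycopg2 query earlier used `<->`, which is L2 (Euclidean) distance. The sketch below shows in plain Python what each operator computes for a pair of vectors. The function names are illustrative only and are not part of pgvector.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind pgvector's `<=>` operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Cosine distance ignores vector length: two vectors pointing in the same direction are at distance 0 even when their L2 distance is large, which is one reason it is a common choice for sentence embeddings.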


```python
import pandas as pd

res = conn.execute("SELECT * FROM documents")
df = pd.DataFrame(res.fetchall(), columns=['ID', 'Text', 'Vector'])
df.head()
```

| | ID | Text | Vector |
|---|---|---|---|
| 0 | 1 | Aromatic amines are quantified by an isotope-d... | [0.045202147, 0.0030057812, -0.02215013, -0.05... |
| 1 | 2 | tandem mass spectrometric method (ID GC-MS/MS)... | [-0.011900019, 0.010700991, -0.03438481, 0.003... |
| 2 | 3 | and stored at approximately -70±10°C. 13C and ... | [-0.019950623, 0.00401085, -0.120606475, 0.049... |
| 3 | 4 | and the samples are hydrolyzed, cleaned up, an... | [-0.06613456, -0.017036632, -0.04517956, -0.04... |
| 4 | 5 | The analytes are then derivatized to form pent... | [0.02515953, -0.15884632, -0.08782755, 0.02154... |


```python
conn.close()
```