## Find the Docker Images
The `pgvector` Docker image is published on [Docker Hub](https://hub.docker.com/r/pgvector/pgvector/tags). Choose a tag there and pull that specific version of the image.

More details about `pgvector` can be found in its [GitHub repository](https://github.com/pgvector/pgvector).

## Set Up Docker

We can create a Docker volume to persist the data on the local machine with:

`docker volume create pgvector-data`

Then start the Docker container with:

```bash
docker run \
    --name epimind \
    -d \
    -v <YOUR LOCAL CODE PATH>:/root/ \
    -p 5432:5432 \
    -p 8888:8888 \
    -e POSTGRES_PASSWORD=password \
    ainilaha/epimind:latest
```

The container already includes a database named `epimind_db`.
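PostgreSQL may need a few seconds after `docker run` before it accepts connections. The following is a minimal sketch of a readiness check, assuming the port mapping `-p 5432:5432` shown above. `wait_for_port` is a hypothetical helper and not part of Docker, psycopg, or pgvector. It only checks that the TCP port is open, which does not guarantee that the database has finished initialising.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # An open port shows the container is listening,
            # not that PostgreSQL is ready to run queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` could be called once before opening the database connection.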


## Try with Python

We can try [`pgvector-python`](https://github.com/pgvector/pgvector-python/tree/master), which provides examples of using pgvector as a vector database.



#### Sentence embeddings with SentenceTransformers

Adapted from the `pgvector-python` [SentenceTransformers example](https://github.com/pgvector/pgvector-python/blob/master/examples/sentence_transformers/example.py).

```python
import psycopg
from pgvector.psycopg import register_vector

# Connect to the database inside the running container.
conn = psycopg.connect(
    dbname="epimind_db",
    user="postgres",
    password="password",
    host="localhost",
    port=5432,
    autocommit=True,
)
```
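The connection above hard-codes the credentials passed to `docker run`. As a hedged alternative, the sketch below reads them from the standard libpq environment variables (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`) and falls back to the values used in this guide. `connection_settings` is a hypothetical helper introduced only for illustration.

```python
import os


def connection_settings(env=None) -> dict:
    """Build keyword arguments for psycopg.connect from environment variables."""
    env = os.environ if env is None else env
    return {
        # Defaults mirror the values used elsewhere in this guide.
        "dbname": env.get("PGDATABASE", "epimind_db"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", "password"),
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "autocommit": True,
    }
```

The result could then be passed on as `psycopg.connect(**connection_settings())`.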



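```python
# Enable the pgvector extension and register the vector type with psycopg.
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
register_vector(conn)

# Recreate the table; 384 is the embedding size of all-MiniLM-L6-v2.
conn.execute('DROP TABLE IF EXISTS documents')
conn.execute('CREATE TABLE documents (id bigserial PRIMARY KEY, content text, embedding vector(384))')
```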

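The `embedding vector(384)` column only accepts vectors with exactly 384 dimensions, so the table definition and the embedding model have to agree. `register_vector` already takes care of converting NumPy arrays, so the following sketch is only meant to make that contract explicit. `to_vector_literal` and `EMBEDDING_DIM` are hypothetical names; the function checks the length and produces pgvector's bracketed text form.

```python
from typing import Sequence

EMBEDDING_DIM = 384  # assumed to match vector(384) in the CREATE TABLE statement


def to_vector_literal(values: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Check the length, then return pgvector's text form, e.g. '[1.0,2.5,-3.0]'."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Toy 3-D input; real embeddings from the model would use the default dim.
print(to_vector_literal([1, 2.5, -3], dim=3))  # [1.0,2.5,-3.0]
```

Failing early in Python gives a clearer message than the error PostgreSQL raises when a vector of the wrong size reaches the `INSERT`.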

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
```



```python
corpus = [
    'Aromatic amines are quantified by an isotope-dilution gas chromatographic',
    'tandem mass spectrometric method (ID GC-MS/MS). Urine samples are collected',
    'and stored at approximately -70±10°C. 13C and 2H internal standards are added',
    'and the samples are hydrolyzed, cleaned up, and extracted on support liquid extraction (SLE) cartridges.',
    'The analytes are then derivatized to form pentafluoropropionamides, and analyzed by GC/MS/MS, using multiple reaction monitoring (MRM).',
    'The analyte concentrations are derived from the ratio of the integrated peaks of native to labeled ions by comparison to a standard curve.',
]
# One embedding per corpus entry.
embeddings = model.encode(corpus)
```

```python
# Store each text together with its embedding.
for content, embedding in zip(corpus, embeddings):
    conn.execute('INSERT INTO documents (content, embedding) VALUES (%s, %s)', (content, embedding))

# Find the five documents closest to document 1; <=> is pgvector's cosine distance operator.
document_id = 1
neighbors = conn.execute(
    'SELECT content FROM documents WHERE id != %(id)s '
    'ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = %(id)s) LIMIT 5',
    {'id': document_id},
).fetchall()
for neighbor in neighbors:
    print(neighbor[0])
```

```text
The analytes are then derivatized to form pentafluoropropionamides, and analyzed by GC/MS/MS, using multiple reaction monitoring (MRM).
The analyte concentrations are derived from the ratio of the integrated peaks of native to labeled ions by comparison to a standard curve.
tandem mass spectrometric method (ID GC-MS/MS). Urine samples are collected
and the samples are hydrolyzed, cleaned up, and extracted on support liquid extraction (SLE) cartridges.
and stored at approximately -70±10°C. 13C and 2H internal standards are added
```
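The neighbor query orders rows with `<=>`, pgvector's cosine distance operator, which is 1 minus the cosine similarity of two vectors. To illustrate the ranking, here is a pure-Python sketch of the same idea on toy 2-D vectors. `cosine_distance` and `nearest_neighbors` are hypothetical helpers, not pgvector functions, and the sketch assumes no vector is all zeros.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity the <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(rows, query_id, limit=5):
    """Rank rows (id -> vector) by cosine distance to the query row, excluding it."""
    query = rows[query_id]
    scored = [(cosine_distance(vec, query), rid) for rid, vec in rows.items() if rid != query_id]
    return [rid for _, rid in sorted(scored)[:limit]]


# Toy 2-D vectors standing in for the 384-D sentence embeddings.
rows = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
print(nearest_neighbors(rows, query_id=1))  # [2, 3, 4]
```

Row 2 points in almost the same direction as row 1 and ranks first, while row 4 points the opposite way, has the maximum distance of 2, and ranks last.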


```python
import pandas as pd

# Load the whole table into a DataFrame for inspection.
res = conn.execute("SELECT * FROM documents")
df = pd.DataFrame(res.fetchall(), columns=['ID', 'Text', 'Vector'])
df.head()
```

|   | ID | Text | Vector |
|---|----|------|--------|
| 0 | 1 | Aromatic amines are quantified by an isotope-d... | [0.045202095, 0.0030057423, -0.022150079, -0.0... |
| 1 | 2 | tandem mass spectrometric method (ID GC-MS/MS)... | [-0.011899977, 0.010701008, -0.034384813, 0.00... |
| 2 | 3 | and stored at approximately -70±10°C. 13C and ... | [-0.019950543, 0.0040108133, -0.12060646, 0.04... |
| 3 | 4 | and the samples are hydrolyzed, cleaned up, an... | [-0.06613452, -0.017036635, -0.04517958, -0.04... |
| 4 | 5 | The analytes are then derivatized to form pent... | [0.025159566, -0.1588463, -0.08782749, 0.02154... |


```python
conn.close()
```