Download the pre-computed embeddings file embeddings.csv.gz from the following location on Google Drive:
https://drive.google.com/file/d/1XQa74OCJlun2e3CVbyhOuaMu7Mq6Ww3l/view?usp=sharing
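Before importing a large file it can be worth confirming that the download is complete and really is a gzip archive. For example, a browser or command-line tool may have saved an HTML page instead. The sketch below uses a hypothetical helper, check_embeddings, built only on standard gzip and wc. It prints the number of decompressed lines, which you can compare with SAMPLES. Whether the file has a header line is not stated here, so treat the count as approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): sanity-check the
# downloaded archive before importing it.
# Usage: check_embeddings [path]   (defaults to embeddings.csv.gz)
check_embeddings() {
  local file="${1:-embeddings.csv.gz}"
  # Fail early if the file is missing or empty (e.g. an interrupted download).
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # gzip -t tests archive integrity without writing decompressed output.
  if ! gzip -t "$file" 2>/dev/null; then
    echo "not a valid gzip file: $file" >&2
    return 1
  fi
  # Print the number of decompressed lines (tr strips the padding BSD wc adds).
  gzip -dc "$file" | wc -l | tr -d ' '
}
```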
Create the file setenv.sh as shown below. You're free to choose whatever values you like. Note the following:
SAMPLES is the number of (title, embedding) pairs to import into the database.
TRANSACTIONS is the number of benchmark operations to perform, i.e. the number of queries to make.
APIKEY is the API key for AnnDB, which is also being evaluated here.
DATASET is the name of a dataset in AnnDB. See the AnnDB documentation for more information.
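# PostgreSQL connection settings; PGPORT is also the host port the Docker container publishes
export PGHOST=localhost
export PGPORT=6432
export PGDATABASE=pgbench
export PGUSER=pgbench
export PGPASSWORD=pgbench
# Benchmark size: rows to import and queries to run
export SAMPLES=10000000
export TRANSACTIONS=10
# AnnDB credentials and dataset name
export APIKEY=<redacted>
export DATASET='wikipedia-titles'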
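Because every later step sources setenv.sh, a typo or a missing variable there only shows up once a long run fails. The following is a minimal sketch of a hypothetical check_env function that reports any of the variables above that are unset or empty, and insists that SAMPLES and TRANSACTIONS are positive integers. It assumes bash, since it relies on ${!name} indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): verify that the
# variables setenv.sh is expected to export are present. Requires bash.
check_env() {
  local missing=0 name value
  for name in PGHOST PGPORT PGDATABASE PGUSER PGPASSWORD \
              SAMPLES TRANSACTIONS APIKEY DATASET; do
    # ${!name:-} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  # SAMPLES and TRANSACTIONS are counts, so require positive integers.
  for name in SAMPLES TRANSACTIONS; do
    value="${!name:-}"
    case "$value" in
      ''|*[!0-9]*) echo "not a positive integer: $name" >&2; missing=1 ;;
      *) if [ "$value" -le 0 ]; then
           echo "not a positive integer: $name" >&2; missing=1
         fi ;;
    esac
  done
  return "$missing"
}
# Usage: source setenv.sh && check_env && echo "environment OK"
```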
In one terminal window, run the following command to start a PostgreSQL instance in a Docker container, using the environment variables set in setenv.sh.
source setenv.sh && docker run -e POSTGRES_DB=$PGDATABASE -e POSTGRES_USER=$PGUSER -e POSTGRES_PASSWORD=$PGPASSWORD -p $PGPORT:5432 postgres:13.2
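On its first start the postgres image initializes a new database, so the server may not accept connections for a few seconds, and a benchmark started immediately could fail to connect. One way to handle this is to poll until the server is ready before running the benchmark. The sketch below defines a hypothetical retry helper, wait_until. The example in its comments assumes the PostgreSQL client tools, which provide pg_isready, are installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): retry a command once
# per second until it succeeds or the attempt limit is reached.
# Usage: wait_until <max_attempts> <command> [args...]
wait_until() {
  local max="$1" attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $max attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
# Example (assumes pg_isready is installed on the host):
#   source setenv.sh && wait_until 30 pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE"
```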
In another terminal window, run the following command to run the benchmark with the same environment variables from setenv.sh. The time prefix reports how long the whole run takes.
source setenv.sh && time make clean report.txt
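If you repeat the benchmark with different SAMPLES or TRANSACTIONS values, the output of time is easy to lose. As an illustration, the hypothetical log_run helper below runs a command and appends one tab-separated line per run (UTC timestamp, SAMPLES, TRANSACTIONS, elapsed seconds, exit status) to a log file. The file name runs.tsv in the example is likewise an arbitrary choice. It measures whole seconds with date +%s, so it is coarser than time and meant only for comparing runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run a command and
# record its parameters and wall-clock duration in a log file.
# Usage: log_run <logfile> <command> [args...]
log_run() {
  local log="$1" start end status
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  # Columns: UTC timestamp, SAMPLES, TRANSACTIONS, seconds, exit status.
  printf '%s\t%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "${SAMPLES:-}" "${TRANSACTIONS:-}" "$((end - start))" "$status" >> "$log"
  # Propagate the wrapped command's exit status to the caller.
  return "$status"
}
# Example: source setenv.sh && log_run runs.tsv make clean report.txt
```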