dventimiglia/PGNNS

pgnns

Quick Start

Download Embeddings

Download the pre-computed embeddings file embeddings.csv.gz from the following location on Google Drive.

https://drive.google.com/file/d/1XQa74OCJlun2e3CVbyhOuaMu7Mq6Ww3l/view?usp=sharing
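Large downloads from Google Drive are sometimes truncated, so it may be worth checking the archive before importing it. The sketch below is not part of the repository: verify_archive is a hypothetical helper that tests the gzip integrity of a file and prints the number of lines it contains. It assumes nothing about the column layout of embeddings.csv.gz.

```shell
# verify_archive FILE: hypothetical helper, not part of the repository.
# Checks that FILE exists and is an intact gzip archive, then prints
# the number of lines in the decompressed data.
verify_archive() {
    file=$1
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    # gzip -t tests integrity without writing the decompressed data
    gzip -t "$file" || { echo "corrupt: $file" >&2; return 1; }
    # gzip -dc streams the decompressed data to stdout for counting
    gzip -dc "$file" | wc -l
}
```

Running verify_archive embeddings.csv.gz in the download directory prints a line count, which can be compared against the value chosen for SAMPLES in the next step.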

Create Environment Script

Create a file named setenv.sh with the contents shown below. The values are yours to choose. The variables specific to this benchmark are:

  • SAMPLES is the number of (title, embedding) pairs to import into the database.
  • TRANSACTIONS is the number of benchmark operations (queries) to perform.
  • APIKEY is the API key for AnnDB, which is also being evaluated here.
  • DATASET is the name of a dataset in AnnDB. See the AnnDB documentation for more information.

export PGHOST=localhost
export PGPORT=6432
export PGDATABASE=pgbench
export PGUSER=pgbench
export PGPASSWORD=pgbench
export SAMPLES=10000000
export TRANSACTIONS=10
export APIKEY=<redacted>
export DATASET='wikipedia-titles'
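A missing or misspelled variable in setenv.sh tends to surface only later, as a confusing connection or import error. As a minimal sketch, the hypothetical check_env function below (not part of the repository) reports every variable from the list above that is unset or empty. The variable names are taken from setenv.sh; everything else is an assumption.

```shell
# check_env: hypothetical helper, not part of the repository.
# Reports each required variable that is unset or empty and
# returns non-zero if any are missing.
check_env() {
    missing=0
    for name in PGHOST PGPORT PGDATABASE PGUSER PGPASSWORD \
                SAMPLES TRANSACTIONS APIKEY DATASET; do
        # Indirect lookup through eval works in any POSIX shell
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After defining the function, source setenv.sh && check_env exits silently when all nine variables are set.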

Start PostgreSQL Database

In one terminal window, run the following command to start PostgreSQL in a Docker container, using the environment variables defined in setenv.sh. The container's port 5432 is published on the host as $PGPORT.

source setenv.sh && docker run -e POSTGRES_DB=$PGDATABASE -e POSTGRES_USER=$PGUSER -e POSTGRES_PASSWORD=$PGPASSWORD -p $PGPORT:5432 postgres:13.2
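The container needs a few seconds before it accepts connections, so starting the benchmark immediately can fail. One way to handle this, sketched under assumptions, is a small retry loop. wait_for is a hypothetical helper, not part of the repository, and the usage example assumes pg_isready from the PostgreSQL client tools is installed on the host.

```shell
# wait_for ATTEMPTS COMMAND...: hypothetical helper, not part of the repository.
# Runs COMMAND up to ATTEMPTS times, one second apart, and returns 0
# as soon as it succeeds or 1 if every attempt fails.
wait_for() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example (assumes pg_isready is available on the host):
#   source setenv.sh && wait_for 30 pg_isready -h "$PGHOST" -p "$PGPORT"
```

Keeping the probe command as an argument means the same loop works with another readiness check if pg_isready is not installed.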

Run Benchmark

In another terminal window, run the benchmark with the following command, which uses the same environment variables from setenv.sh.

source setenv.sh && time make clean report.txt

Setup

Install Conda

Create Conda Environment

Install Conda Packages

Set up Kaggle CLI

Create Environment Script

Start PostgreSQL Database

Run Benchmark

About

PostgreSQL Nearest Neighbor Search
