feat: support pgvector extension #445

Merged
merged 1 commit into from
Nov 8, 2023
1 change: 1 addition & 0 deletions external/Makefile
@@ -10,6 +10,7 @@ SUBDIRS += polar_monitor_preload

# NB: those will be ignored in minimal mode.
ifeq ($(enable_polar_minimal),no)
SUBDIRS += pgvector
SUBDIRS += polar_worker
SUBDIRS += polar_tde_utils
SUBDIRS += polar_parameter_check
8 changes: 8 additions & 0 deletions external/pgvector/.dockerignore
@@ -0,0 +1,8 @@
/.git/
/dist/
/results/
/tmp_check/
/sql/vector--?.?.?.sql
regression.*
*.o
*.so
6 changes: 6 additions & 0 deletions external/pgvector/.editorconfig
@@ -0,0 +1,6 @@
root = true

[*.{c,h,pl,pm,sql}]
indent_style = tab
indent_size = tab
tab_width = 4
102 changes: 102 additions & 0 deletions external/pgvector/.github/workflows/build.yml
@@ -0,0 +1,102 @@
name: build
on: [push, pull_request]
jobs:
  ubuntu:
    runs-on: ${{ matrix.os }}
    if: ${{ !startsWith(github.ref_name, 'mac') && !startsWith(github.ref_name, 'windows') }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - postgres: 17
            os: ubuntu-22.04
          - postgres: 16
            os: ubuntu-22.04
          - postgres: 15
            os: ubuntu-22.04
          - postgres: 14
            os: ubuntu-22.04
          - postgres: 13
            os: ubuntu-20.04
          - postgres: 12
            os: ubuntu-20.04
          - postgres: 11
            os: ubuntu-20.04
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: ${{ matrix.postgres }}
          dev-files: true
      - run: make
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter -Wno-sign-compare
      - run: |
          export PG_CONFIG=`which pg_config`
          sudo --preserve-env=PG_CONFIG make install
      - run: make installcheck
      - if: ${{ failure() }}
        run: cat regression.diffs
      - run: |
          sudo apt-get update
          sudo apt-get install libipc-run-perl
      - run: make prove_installcheck
  mac:
    runs-on: macos-latest
    if: ${{ !startsWith(github.ref_name, 'windows') }}
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: 14
      - run: make
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter
      - run: make install
      - run: make installcheck
      - if: ${{ failure() }}
        run: cat regression.diffs
      - run: |
          brew install cpanm
          cpanm --notest IPC::Run
          wget -q https://github.com/postgres/postgres/archive/refs/tags/REL_14_5.tar.gz
          tar xf REL_14_5.tar.gz
      - run: make prove_installcheck PROVE_FLAGS="-I ./postgres-REL_14_5/src/test/perl" PERL5LIB="/Users/runner/perl5/lib/perl5"
      - run: make clean && /usr/local/opt/llvm@15/bin/scan-build --status-bugs make
  windows:
    runs-on: windows-latest
    if: ${{ !startsWith(github.ref_name, 'mac') }}
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: 14
      - run: |
          call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat" && ^
          nmake /NOLOGO /F Makefile.win && ^
          nmake /NOLOGO /F Makefile.win install && ^
          nmake /NOLOGO /F Makefile.win installcheck && ^
          nmake /NOLOGO /F Makefile.win clean && ^
          nmake /NOLOGO /F Makefile.win uninstall
        shell: cmd
  i386:
    if: ${{ !startsWith(github.ref_name, 'mac') && !startsWith(github.ref_name, 'windows') }}
    runs-on: ubuntu-latest
    container:
      image: debian:11
      options: --platform linux/386
    steps:
      - run: apt-get update && apt-get install -y build-essential git libipc-run-perl postgresql-13 postgresql-server-dev-13 sudo
      - run: service postgresql start
      - run: |
          git clone https://github.com/${{ github.repository }}.git pgvector
          cd pgvector
          git fetch origin ${{ github.ref }}
          git reset --hard FETCH_HEAD
          make
          make install
          chown -R postgres .
          sudo -u postgres make installcheck
          sudo -u postgres make prove_installcheck
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter -Wno-sign-compare
13 changes: 13 additions & 0 deletions external/pgvector/.gitignore
@@ -0,0 +1,13 @@
/dist/
/log/
/results/
/tmp_check/
/sql/vector--?.?.?.sql
*.o
*.so
*.bc
*.dll
*.dylib
*.obj
*.lib
*.exp
148 changes: 148 additions & 0 deletions external/pgvector/CHANGELOG.md
@@ -0,0 +1,148 @@
## 0.5.1 (2023-10-10)

- Improved performance of HNSW index builds
- Added check for MVCC-compliant snapshot for index scans

## 0.5.0 (2023-08-28)

- Added HNSW index type
- Added support for parallel index builds for IVFFlat
- Added `l1_distance` function
- Added element-wise multiplication for vectors
- Added `sum` aggregate
- Improved performance of distance functions
- Fixed out of range results for cosine distance
- Fixed results for NULL and NaN distances for IVFFlat

## 0.4.4 (2023-06-12)

- Improved error message for malformed vector literal
- Fixed segmentation fault with text input
- Fixed consecutive delimiters with text input

## 0.4.3 (2023-06-10)

- Improved cost estimation
- Improved support for spaces with text input
- Fixed infinite and NaN values with binary input
- Fixed infinite values with vector addition and subtraction
- Fixed infinite values with list centers
- Fixed compilation error when `float8` is pass by reference
- Fixed compilation error on PowerPC
- Fixed segmentation fault with index creation on i386

## 0.4.2 (2023-05-13)

- Added notice when index created with little data
- Fixed dimensions check for some direct function calls
- Fixed installation error with Postgres 12.0-12.2

## 0.4.1 (2023-03-21)

- Improved performance of cosine distance
- Fixed index scan count

## 0.4.0 (2023-01-11)

If upgrading with Postgres < 13, see [this note](https://github.com/pgvector/pgvector#040).

- Changed text representation for vector elements to match `real`
- Changed storage for vector from `plain` to `extended`
- Increased max dimensions for vector from 1024 to 16000
- Increased max dimensions for index from 1024 to 2000
- Improved accuracy of text parsing for certain inputs
- Added `avg` aggregate for vector
- Added experimental support for Windows
- Dropped support for Postgres 10

## 0.3.2 (2022-11-22)

- Fixed `invalid memory alloc request size` error

## 0.3.1 (2022-11-02)

If upgrading from 0.2.7 or 0.3.0, [recreate](https://github.com/pgvector/pgvector#031) all `ivfflat` indexes after upgrading to ensure all data is indexed.

- Fixed issue with inserts silently corrupting `ivfflat` indexes (introduced in 0.2.7)
- Fixed segmentation fault with index creation when lists > 6500

## 0.3.0 (2022-10-15)

- Added support for Postgres 15
- Dropped support for Postgres 9.6

## 0.2.7 (2022-07-31)

- Fixed `unexpected data beyond EOF` error

## 0.2.6 (2022-05-22)

- Improved performance of index creation for Postgres < 12

## 0.2.5 (2022-02-11)

- Reduced memory usage during index creation
- Fixed index creation exceeding `maintenance_work_mem`
- Fixed error with index creation when lists > 1600

## 0.2.4 (2022-02-06)

- Added support for parallel vacuum
- Fixed issue with index not reusing space

## 0.2.3 (2022-01-30)

- Added indexing progress for Postgres 12+
- Improved interrupt handling during index creation

## 0.2.2 (2022-01-15)

- Fixed compilation error on Mac ARM

## 0.2.1 (2022-01-02)

- Fixed `operator is not unique` error

## 0.2.0 (2021-10-03)

- Added support for Postgres 14

## 0.1.8 (2021-09-07)

- Added cast for `vector` to `real[]`

## 0.1.7 (2021-06-13)

- Added cast for `numeric[]` to `vector`

## 0.1.6 (2021-06-09)

- Fixed segmentation fault with `COUNT`

## 0.1.5 (2021-05-25)

- Reduced memory usage during index creation

## 0.1.4 (2021-05-09)

- Fixed kmeans for inner product
- Fixed multiple definition error with GCC 10

## 0.1.3 (2021-05-06)

- Added Dockerfile
- Fixed version

## 0.1.2 (2021-04-26)

- Vectorized distance calculations
- Improved cost estimation

## 0.1.1 (2021-04-25)

- Added binary representation for `COPY`
- Marked functions as `PARALLEL SAFE`

## 0.1.0 (2021-04-20)

- First release
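Several CHANGELOG entries concern the vector text format: 0.4.3 improved handling of spaces, 0.4.4 fixed consecutive delimiters and improved the error for malformed literals, and 0.4.0 raised the maximum vector dimensions to 16000. The C sketch below shows one way to validate such a literal under those rules. It is illustrative only and is not pgvector's parser, which reports errors through PostgreSQL's error machinery rather than a return code; `parse_vector_literal` and `SKETCH_MAX_DIM` are hypothetical names.

```c
#include <ctype.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical cap, mirroring the 16000-dimension limit noted for 0.4.0. */
#define SKETCH_MAX_DIM 16000

/*
 * Parse a text literal such as "[1, 2.5, -3]" into out[], which must hold at
 * least max_dim floats. Returns the number of elements parsed, or -1 if the
 * literal is malformed: missing brackets, an empty list, an empty element
 * (consecutive delimiters), a value strtof reports as out of range (ERANGE),
 * an infinite or NaN value, more than max_dim elements, or trailing garbage.
 */
int parse_vector_literal(const char *s, float *out, int max_dim)
{
    int dim = 0;

    /* Whitespace is allowed around the brackets and around each element. */
    while (isspace((unsigned char) *s))
        s++;
    if (*s != '[')
        return -1;
    s++;

    for (;;)
    {
        char *end;
        float val;

        /* strtof skips leading whitespace itself; end == s means no number. */
        errno = 0;
        val = strtof(s, &end);
        if (end == s || errno == ERANGE || !isfinite(val))
            return -1;
        if (dim >= max_dim)
            return -1;
        out[dim++] = val;

        s = end;
        while (isspace((unsigned char) *s))
            s++;
        if (*s == ']')
            break;
        if (*s != ',')
            return -1;          /* e.g. "[1 2]" or an unterminated "[1,2" */
        s++;                    /* consume ',' and parse the next element */
    }

    s++;                        /* consume ']' */
    while (isspace((unsigned char) *s))
        s++;
    return *s == '\0' ? dim : -1;
}
```

A caller would pass a buffer of `SKETCH_MAX_DIM` floats (or a smaller limit) as `out`/`max_dim`; for example, `" [1, 2.5 ,-3] "` parses to three elements, while `"[1,,2]"` and `"[inf]"` are rejected.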
20 changes: 20 additions & 0 deletions external/pgvector/Dockerfile
@@ -0,0 +1,20 @@
ARG PG_MAJOR=15
FROM postgres:$PG_MAJOR
ARG PG_MAJOR

COPY . /tmp/pgvector

RUN apt-get update && \
    apt-mark hold locales && \
    apt-get install -y --no-install-recommends build-essential postgresql-server-dev-$PG_MAJOR && \
    cd /tmp/pgvector && \
    make clean && \
    make OPTFLAGS="" && \
    make install && \
    mkdir /usr/share/doc/pgvector && \
    cp LICENSE README.md /usr/share/doc/pgvector && \
    rm -r /tmp/pgvector && \
    apt-get remove -y build-essential postgresql-server-dev-$PG_MAJOR && \
    apt-get autoremove -y && \
    apt-mark unhold locales && \
    rm -rf /var/lib/apt/lists/*
20 changes: 20 additions & 0 deletions external/pgvector/LICENSE
@@ -0,0 +1,20 @@
Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
49 changes: 49 additions & 0 deletions external/pgvector/META.json
@@ -0,0 +1,49 @@
{
  "name": "vector",
  "abstract": "Open-source vector similarity search for Postgres",
  "description": "Supports L2 distance, inner product, and cosine distance",
  "version": "0.5.1",
  "maintainer": [
    "Andrew Kane <andrew@ankane.org>"
  ],
  "license": {
    "PostgreSQL": "http://www.postgresql.org/about/licence"
  },
  "prereqs": {
    "runtime": {
      "requires": {
        "PostgreSQL": "11.0.0"
      }
    }
  },
  "provides": {
    "vector": {
      "file": "sql/vector.sql",
      "docfile": "README.md",
      "version": "0.5.1",
      "abstract": "Open-source vector similarity search for Postgres"
    }
  },
  "resources": {
    "homepage": "https://github.com/pgvector/pgvector",
    "bugtracker": {
      "web": "https://github.com/pgvector/pgvector/issues"
    },
    "repository": {
      "url": "https://github.com/pgvector/pgvector.git",
      "web": "https://github.com/pgvector/pgvector",
      "type": "git"
    }
  },
  "generated_by": "Andrew Kane",
  "meta-spec": {
    "version": "1.0.0",
    "url": "http://pgxn.org/meta/spec.txt"
  },
  "tags": [
    "vectors",
    "datatype",
    "nearest neighbor search",
    "approximate nearest neighbors"
  ]
}