Skip to content

Commit

Permalink
Merge pull request #2370 from drizzle-team/beta
Browse files Browse the repository at this point in the history
Beta
  • Loading branch information
AndriiSherman committed May 31, 2024
2 parents a78eefe + 61bc749 commit e922211
Show file tree
Hide file tree
Showing 39 changed files with 2,763 additions and 68 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/release-feature-branch.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,32 @@ jobs:
contents: read
id-token: write
services:
postgres-postgis:
image: postgis/postgis:16-3.4
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: drizzle
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 54322:5432
postgres-vector:
image: pgvector/pgvector:pg16
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: drizzle
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 54321:5432
postgres:
image: postgres:14
env:
Expand Down Expand Up @@ -111,6 +137,8 @@ jobs:
if: steps.checks.outputs.has_new_release == 'true'
env:
PG_CONNECTION_STRING: postgres://postgres:postgres@localhost:5432/drizzle
PG_VECTOR_CONNECTION_STRING: postgres://postgres:postgres@localhost:54321/drizzle
PG_POSTGIS_CONNECTION_STRING: postgres://postgres:postgres@localhost:54322/drizzle
MYSQL_CONNECTION_STRING: mysql://root:root@localhost:3306/drizzle
PLANETSCALE_CONNECTION_STRING: ${{ secrets.PLANETSCALE_CONNECTION_STRING }}
NEON_CONNECTION_STRING: ${{ secrets.NEON_CONNECTION_STRING }}
Expand Down
28 changes: 28 additions & 0 deletions .github/workflows/release-latest.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,32 @@ jobs:
- eslint-plugin-drizzle
runs-on: ubuntu-20.04
services:
postgres-postgis:
image: postgis/postgis:16-3.4
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: drizzle
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 54322:5432
postgres-vector:
image: pgvector/pgvector:pg16
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: drizzle
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 54321:5432
postgres:
image: postgres:14
env:
Expand Down Expand Up @@ -114,6 +140,8 @@ jobs:
if: steps.checks.outputs.has_new_release == 'true'
env:
PG_CONNECTION_STRING: postgres://postgres:postgres@localhost:5432/drizzle
PG_VECTOR_CONNECTION_STRING: postgres://postgres:postgres@localhost:54321/drizzle
PG_POSTGIS_CONNECTION_STRING: postgres://postgres:postgres@localhost:54322/drizzle
MYSQL_CONNECTION_STRING: mysql://root:root@localhost:3306/drizzle
PLANETSCALE_CONNECTION_STRING: ${{ secrets.PLANETSCALE_CONNECTION_STRING }}
NEON_CONNECTION_STRING: ${{ secrets.NEON_CONNECTION_STRING }}
Expand Down
138 changes: 138 additions & 0 deletions changelogs/drizzle-orm/0.31.0-beta.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
## Breaking changes

### PostgreSQL indexes API was changed

The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit

Previous API

- No way to define SQL expressions inside `.on`.
- `.using` and `.on` in our case are the same thing, so the API is incorrect here.
- `.asc()`, `.desc()`, `.nullsFirst()`, and `.nullsLast()` should be specified for each column or expression on indexes, but not on an index itself.

```ts
// Index declaration reference
index('name')
.on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
.concurrently()
.using(sql``) // sql expression
.asc() or .desc()
.nullsFirst() or .nullsLast()
.where(sql``) // sql expression
```

Current API

```ts
// First example, with `.on()`
index('name')
.on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
.concurrently()
.where(sql``)
.with({ fillfactor: '70' })

// Second Example, with `.using()`
index('name')
.using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
.where(sql``) // sql expression
.with({ fillfactor: '70' })
```

## New Features

### 馃帀 "pg_vector" extension support

> There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the `pg_vector` extension installed.
You can now specify indexes for `pg_vector` and utilize `pg_vector` functions for querying, ordering, etc.

Let's take a few examples of `pg_vector` indexes from the `pg_vector` docs and translate them to Drizzle

#### L2 distance, Inner product and Cosine distance

```ts
// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

const table = pgTable('items', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))
```

#### L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version

```ts
// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

const table = pgTable('table', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))
```

For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.

You can also use the following helpers:

```ts
import { l2Distance, l1Distance, innerProduct,
cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'

l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'

innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'

hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'
```

If `pg_vector` has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done

```ts
export function l2Distance(
column: SQLWrapper | AnyColumn,
value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
return sql`${column} <-> ${value}`;
}
return sql`${column} <-> ${JSON.stringify(value)}`;
}
```

Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR

#### Examples

Let's take a few examples of `pg_vector` queries from the `pg_vector` docs and translate them to Drizzle

```ts
import { l2Distance } from 'drizzle-orm';

// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))

// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })

// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)

// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)

// and more!
```
Loading

0 comments on commit e922211

Please sign in to comment.