A Drizzle-style, fully-typed schema and query layer for Turbopuffer.
Define a namespace from column builders and get, for free:
- an inferred document type from your schema (no hand-written interfaces),
- capability-checked ranking so
annonly accepts vector fields andbm25only accepts BM25 fields, - typed filters with field names and value types checked against the document,
- result rows narrowed to exactly the attributes you request, with
$distpresent only when the query actually scores.
# Deno
deno add jsr:@alphaxiv/fugu
# pnpm (first-class JSR support)
pnpm add jsr:@alphaxiv/fugu
# npm / yarn / bun (via the JSR bridge)
npx jsr add @alphaxiv/fuguimport { bool, datetime, fugu, int, string, uuid, vector } from "@alphaxiv/fugu";
const db = fugu(); // reads env vars by default
// Define a typed namespace.
const papers = db.namespace({
name: "papers",
version: 1,
columns: {
id: uuid({ version: 7 }),
embedding: vector(3072),
body: string({ fullTextSearch: true }).array(),
title: string(),
votes: int(),
published_at: datetime(),
blocked: bool(),
},
});
// The document type is inferred from the columns.
type Paper = typeof papers.$inferDocument;
// { id: string; embedding: number[]; body: string[]; title: string;
// votes: number; published_at: string; blocked: boolean }
// Write.
await papers.upsert([
{
id: "0190a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
embedding: [/* 3072 floats */],
body: ["page 1 text", "page 2 text"],
title: "Attention Is All You Need",
votes: 9001,
published_at: "2024-01-18T00:00:00Z",
blocked: false,
},
]);
// Query: vector search, filtered, returning only the fields you need.
const { ann } = papers.rank;
const { and, eq, gte } = papers.filter;
const results = await papers.query({
rankBy: ann("embedding", queryEmbedding),
topK: 20,
filters: and(eq("blocked", false), gte("votes", 10)),
includeAttributes: ["title", "votes"],
});
results[0].title; // string (requested)
results[0].votes; // number (requested)
results[0].$dist; // number (ANN scores, so $dist is present and required)
results[0].body; // compile error: not requestedfugu(...) takes a Turbopuffer client. Provide it however suits you:
// From explicit options.
const db = fugu({ client: { apiKey, region } });
// From a client you construct (for custom timeouts, retries, base URL, etc.).
import { Turbopuffer } from "@turbopuffer/turbopuffer";
const db = fugu({ client: new Turbopuffer({ apiKey, region, timeout: 30_000 }) });
// Or omit `client` to read TURBOPUFFER_API_KEY and TURBOPUFFER_REGION from the environment.
const db = fugu();The environment prefix is optional and separate from credentials; see
Environments and versioning.
Columns mirror Turbopuffer's real attribute types exactly. Each builder takes optional config; the only method that
transforms the value type is .array().
| Builder | Turbopuffer type | Value type | Notes |
|---|---|---|---|
string<T>(opts?) |
string |
T extends string |
{ fullTextSearch } enables BM25 |
int(opts?) |
int |
number |
signed 64-bit |
uint(opts?) |
uint |
number |
unsigned 64-bit |
float(opts?) |
float |
number |
64-bit |
bool(opts?) |
bool |
boolean |
|
uuid<T>(opts?) |
uuid |
T extends string |
{ version: 7 } asserts UUIDv7 on write |
datetime(opts?) |
datetime |
string |
RFC 3339 / ISO 8601 |
vector(n, opts?) |
[n]f32 |
number[] |
dense vector, ANN-indexed |
column.array() |
[]… |
T[] |
wraps any scalar into its array form |
Every scalar builder accepts { filterable } (defaults to true; set false to skip the filter index on attributes
you never filter or sort by).
The string and uuid builders take a type parameter, so branded id types flow straight into the inferred document:
import type { PaperId } from "./ids.ts";
const papers = db.namespace({
name: "papers",
version: 1,
columns: {
id: uuid<PaperId>({ version: 7 }), // document `id` is PaperId, not just string
// …
},
});A namespace must include an id column; it doubles as the document key, so re-upserting the same id overwrites the
row. Turbopuffer ids are uint64, UUID, or string, so id can be a uuid(), a string(), or a uint() (int() works
too) column.
// BM25 over an array of page strings. Full-text attributes are non-filterable by default.
body: string({ fullTextSearch: true }).array(),
// Dense vector. Defaults to f32 with an ANN index.
embedding: vector(3072),
// Half precision, or stored-without-an-ANN-index:
embedding16: vector(3072, { encoding: "f16" }),
stored: vector(256, { ann: false }),The distance metric is set once per namespace, not per vector, because Turbopuffer applies a single metric to every
vector column. It defaults to cosine_distance:
db.namespace({
name: "papers",
version: 1,
distanceMetric: "euclidean_squared", // optional; applies to all vectors, defaults to cosine_distance
columns: { embedding: vector(3072) /* … */ },
});await papers.upsert(document); // a single document
await papers.upsert(documents); // or an array, batched into one write- Documents are typed as the inferred document, so the shape, value types, and branded ids are all checked.
- Re-upserting an existing id overwrites it (use it to reprocess a row or recompute an embedding).
- The compiled schema is sent on every write, and any
uuid({ version: 7 })columns validate their values before the request. - An empty array is a no-op.
Patch partially updates documents: pass the id plus only the fields you want to change. Other fields are left
untouched.
await papers.patch({ id, votes: 42 }); // one record
await papers.patch([{ id: a, votes: 1 }, { id: b }]); // or an array
await papers.patchWhere(eq("blocked", true), { votes: 0 }); // every matching documentDelete by id or by filter:
await papers.delete(id); // one id
await papers.delete([a, b, c]); // or many
await papers.deleteWhere(lt("votes", 0)); // every matching documentA patch's id must reference an existing document, only the document's real fields are accepted (typed and partial),
and version-checked columns still validate. Unlike upsert, patches and deletes send no schema. Empty patch([]) /
delete([]) are no-ops.
const rows = await papers.query({
rankBy, // required: how to rank (see below)
topK, // optional: max rows
filters, // optional: a filter expression
includeAttributes, // optional: which attributes to return
});Turbopuffer returns only the id by default and lets you opt into attributes. fugu mirrors that in the type, so a row
contains exactly what you asked for:
includeAttributes |
Row shape |
|---|---|
omitted / false |
{ id, $dist? } only |
["title", "votes"] |
{ id, title, votes } (those fields required) plus $dist if scored |
true |
every attribute |
Touching a field you did not request is a compile error, which also catches forgetting to include an attribute you then use.
$dist is present and required for scoring ranks (ann, knn, bm25, and the score combinators) and absent when you
order by an attribute (asc / desc), matching Turbopuffer's behavior. No optional-chaining needed on the common path.
const scored = await papers.query({ rankBy: ann("embedding", q) });
scored[0].$dist; // number
const ordered = await papers.query({ rankBy: papers.rank.desc("votes") });
ordered[0].$dist; // compile error: attribute ordering produces no scorenamespace.filter is an operator set bound to the document. Destructure what you need:
const { and, or, not, eq, ne, lt, lte, gt, gte, inArray, notInArray } = papers.filter;
const filters = and(
eq("blocked", false),
gte("published_at", "2024-01-01T00:00:00Z"),
or(gt("votes", 100), inArray("title", ["A", "B"])),
);| Operator | Meaning |
|---|---|
eq(field, value) / ne(field, value) |
equals / not equals |
lt / lte / gt / gte |
comparisons |
inArray(field, values) / notInArray(field, values) |
membership |
and(...filters) / or(...filters) / not(filter) |
composition |
Field names and value types are checked against the document, so eq("blocked", 5) or a misspelled field will not
compile.
Operators are bound to the namespace (via namespace.filter) rather than standalone imports, because the document type
only appears in keyof T positions of a filter, which TypeScript cannot infer back from a filter value. Binding fixes
the document so the field checks actually fire.
namespace.rank is an operator set whose field arguments are constrained by index capability, not just value type:
const { ann, knn, bm25, asc, desc, sum, max, product } = papers.rank;
ann("embedding", queryVector); // ok: embedding is a vector field
bm25("body", "transformers"); // ok: body is a BM25 field
bm25("title", "transformers"); // compile error: title has no BM25 index
ann("body", queryVector); // compile error: body is not a vector field| Operator | Produces |
|---|---|
ann(field, vector) / knn(field, vector) |
approximate / exact vector search (vector fields only) |
bm25(field, query, params?) |
BM25 full-text ranking (BM25 fields only) |
asc(field) / desc(field) |
order by an attribute, no score |
sum(...terms) / max(...terms) / product(weight, term) |
combine BM25 sub-scores |
Turbopuffer does hybrid search by running vector and BM25 queries and fusing the rankings on the client. Because results are typed, the fusion is a few lines:
const { ann, bm25 } = papers.rank;
const { eq } = papers.filter;
const [vec, text] = await Promise.all([
papers.query({
rankBy: ann("embedding", queryVector),
topK: 100,
filters: eq("blocked", false),
includeAttributes: ["title"],
}),
papers.query({
rankBy: bm25("body", queryText),
topK: 100,
filters: eq("blocked", false),
includeAttributes: ["title"],
}),
]);
const K = 60; // reciprocal-rank-fusion constant
const scores = new Map<string, number>();
for (const list of [vec, text]) {
list.forEach((row, rank) => scores.set(row.id, (scores.get(row.id) ?? 0) + 1 / (K + rank)));
}
const ranked = [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);The full namespace key is {environment}-{name}-v{version}. Both the environment and version segments are dropped when
omitted, so the key can be anything from papers to prod-papers-v2. Two knobs:
environmentisolates deployments. Pass"prod"/"staging"/"dev"tofugu(...)so they never share an index. Because Turbopuffer namespaces are addressable by prefix, this also lets you list or tear down everything in one environment at once. When omitted, fugu falls back to theFUGU_ENVenvironment variable, so you can set the environment per deployment without threading it through code (an explicitenvironmentalways wins).versionis your migration escape hatch, and it is optional. Turbopuffer attribute types are immutable once a namespace holds data, so a breaking schema change (a changed type or removed attribute) means bumpingversionto roll onto a fresh, empty namespace. Adding indexing to an existing attribute does not need a bump. Omitversionentirely for namespaces you never expect to migrate.
const db = fugu({ client, environment: "staging" });
db.namespace({ name: "papers", version: 2, columns }).key; // "staging-papers-v2"
db.namespace({ name: "papers", columns }).key; // "staging-papers"