Merged
124 changes: 124 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,124 @@
# Claude Code Context for Duckgres

This file provides context for Claude Code sessions working on this codebase.

## Project Overview

Duckgres is a PostgreSQL wire protocol server backed by DuckDB. It allows any PostgreSQL client (psql, pgAdmin, lib/pq, psycopg2, JDBC, etc.) to connect and execute queries against DuckDB databases.

## Architecture

```
PostgreSQL Client → TLS → Duckgres Server → DuckDB (per-user database)
```

### Key Components

- **main.go**: Entry point, configuration loading (CLI flags, env vars, YAML)
- **server/server.go**: Server struct, connection handling, graceful shutdown
- **server/conn.go**: Client connection handling, query execution, COPY protocol
- **server/protocol.go**: PostgreSQL wire protocol message encoding/decoding
- **server/catalog.go**: pg_catalog compatibility (views, functions, query rewriting)
- **server/types.go**: Type OID mapping between DuckDB and PostgreSQL
- **server/ratelimit.go**: Rate limiting for brute-force protection
- **server/tls.go**: Auto-generation of self-signed TLS certificates

## PostgreSQL Wire Protocol

The server implements the PostgreSQL v3 protocol:

### Message Types (server/protocol.go)
- **Frontend (client→server)**: Query, Parse, Bind, Describe, Execute, Sync, Close, CopyData, CopyDone
- **Backend (server→client)**: AuthOK, RowDescription, DataRow, CommandComplete, ReadyForQuery, CopyInResponse, CopyOutResponse

### Query Flow
1. Client sends Query message ('Q')
2. Server parses SQL, rewrites pg_catalog references
3. Server executes via DuckDB's database/sql driver
4. Server sends RowDescription + DataRow messages
5. Server sends CommandComplete + ReadyForQuery

### Extended Query Protocol
Supports prepared statements (Parse/Bind/Execute) for parameterized queries and binary result formats.

## pg_catalog Compatibility (server/catalog.go)

psql and other clients expect PostgreSQL system catalogs. We provide compatibility by:

1. **Creating views** in main schema that mirror pg_catalog tables:
   - `pg_database`, `pg_class_full`, `pg_collation`, `pg_policy`, `pg_roles`
   - `pg_statistic_ext`, `pg_publication`, `pg_publication_rel`, `pg_inherits`, etc.

2. **Creating macros** for PostgreSQL functions:
   - `pg_get_userbyid`, `pg_table_is_visible`, `format_type`, `pg_get_expr`
   - `obj_description`, `col_description`, `pg_get_indexdef`, etc.

3. **Query rewriting** to replace PostgreSQL-specific syntax:
   - `pg_catalog.pg_class` → `pg_class_full`
   - `OPERATOR(pg_catalog.~)` → `~`
   - `::pg_catalog.regtype` → `::VARCHAR`
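
The three rewrites listed above could be expressed as an ordered list of regular-expression rules. This is a sketch under assumptions: the `rewriteRule` type, the `rewriteRules` slice, and `rewriteQuery` are hypothetical names, and the patterns in the real `rewritePgCatalogQuery()` may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteRule pairs a pattern with its replacement text (hypothetical type).
type rewriteRule struct {
	pattern     *regexp.Regexp
	replacement string
}

// Rules are applied in order; (?i) makes each match case-insensitive.
var rewriteRules = []rewriteRule{
	{regexp.MustCompile(`(?i)\bpg_catalog\.pg_class\b`), "pg_class_full"},
	{regexp.MustCompile(`(?i)OPERATOR\s*\(\s*pg_catalog\.~\s*\)`), "~"},
	{regexp.MustCompile(`(?i)::pg_catalog\.regtype\b`), "::VARCHAR"},
}

// rewriteQuery applies every rule to the SQL text and returns the result.
func rewriteQuery(sql string) string {
	for _, r := range rewriteRules {
		sql = r.pattern.ReplaceAllString(sql, r.replacement)
	}
	return sql
}

func main() {
	in := "SELECT c.relname FROM pg_catalog.pg_class c " +
		"WHERE c.relname OPERATOR(pg_catalog.~) '^(t)$'"
	fmt.Println(rewriteQuery(in))
}
```

Regex rewriting does not understand SQL quoting, so a pattern that appears inside a string literal is rewritten too. That trade-off is usually acceptable for the catalog queries psql issues, but it is worth keeping in mind when adding new rules.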

## COPY Protocol (server/conn.go)

Supports bulk data transfer:
- **COPY TO STDOUT**: Streams query results to client
- **COPY FROM STDIN**: Receives data from client, inserts row by row
- Supports CSV format with HEADER option

## Configuration

Configuration is resolved from four sources (highest to lowest priority):
1. CLI flags (`--port`, `--config`, etc.)
2. Environment variables (`DUCKGRES_PORT`, etc.)
3. YAML config file
4. Built-in defaults
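
The precedence order above can be sketched for a single setting. This is an illustration only: `resolvePort` is a hypothetical helper, not the loader in `main.go`, and the `defaultPort` value of 5432 is a placeholder assumption rather than a confirmed Duckgres default.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultPort is a placeholder; the real built-in default may differ.
const defaultPort = 5432

// resolvePort picks the port by precedence: CLI flag, then environment
// variable, then YAML file, then the built-in default. A nil pointer or
// an empty string means "not set at this level".
func resolvePort(flagPort *int, envValue string, filePort *int) (int, error) {
	if flagPort != nil {
		return *flagPort, nil
	}
	if envValue != "" {
		p, err := strconv.Atoi(envValue)
		if err != nil {
			return 0, fmt.Errorf("invalid DUCKGRES_PORT %q: %w", envValue, err)
		}
		return p, nil
	}
	if filePort != nil {
		return *filePort, nil
	}
	return defaultPort, nil
}

func main() {
	cli := 35437
	yamlPort := 7000
	port, err := resolvePort(&cli, "6000", &yamlPort)
	if err != nil {
		panic(err)
	}
	fmt.Println(port) // 35437: the CLI flag wins
}
```

Using pointers for the flag and file values distinguishes "unset" from an explicit zero, which matters once a setting has a meaningful zero value.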

## Testing

```bash
# Build
go build -o duckgres .

# Run on non-standard port
./duckgres --port 35437

# Connect with psql
PGPASSWORD=postgres psql "host=127.0.0.1 port=35437 user=postgres sslmode=require"

# Test commands (run at the psql prompt, not in the shell)
\dt # List tables
\d tablename # Describe table
\l # List databases
```

## Common Development Tasks

### Adding a new pg_catalog view
1. Add view creation SQL in `initPgCatalog()` in `catalog.go`
2. Add regex pattern to rewrite `pg_catalog.viewname` to `viewname`
3. Add the replacement in `rewritePgCatalogQuery()`

### Adding a new PostgreSQL function
1. Add `CREATE MACRO` in the `functions` slice in `initPgCatalog()`
2. Add function name to `pgCatalogFunctions` slice for query rewriting

### Adding protocol support
1. Add message type constant in `protocol.go`
2. Add write function (e.g., `writeCopyData()`)
3. Handle in message loop in `conn.go`

## Dependencies

- `github.com/duckdb/duckdb-go/v2` - DuckDB Go driver
- `gopkg.in/yaml.v3` - YAML config parsing

## Known Limitations

- Single process (all users share one process)
- No replication
- Some pg_catalog tables are stubs (return empty)
- Type OID mapping is incomplete (some types show as "unknown")
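
To illustrate the last limitation, a type mapping with a fallback could look like the sketch below. The table and the `pgOID` helper are hypothetical and do not reflect the contents of `server/types.go`; the OID values are the standard PostgreSQL ones, and 705 is the OID of the `unknown` pseudo-type.

```go
package main

import (
	"fmt"
	"strings"
)

// oidUnknown is PostgreSQL's OID for the "unknown" pseudo-type.
const oidUnknown uint32 = 705

// duckToPgOID maps a few DuckDB type names to PostgreSQL type OIDs.
// The selection of types here is illustrative, not exhaustive.
var duckToPgOID = map[string]uint32{
	"BOOLEAN":   16,   // bool
	"BIGINT":    20,   // int8
	"SMALLINT":  21,   // int2
	"INTEGER":   23,   // int4
	"FLOAT":     700,  // float4
	"DOUBLE":    701,  // float8
	"VARCHAR":   1043, // varchar
	"DATE":      1082, // date
	"TIMESTAMP": 1114, // timestamp
}

// pgOID returns the mapped OID, or oidUnknown when no mapping exists.
func pgOID(duckType string) uint32 {
	key := strings.ToUpper(strings.TrimSpace(duckType))
	if oid, ok := duckToPgOID[key]; ok {
		return oid
	}
	return oidUnknown
}

func main() {
	for _, t := range []string{"integer", "VARCHAR", "HUGEINT"} {
		fmt.Println(t, pgOID(t))
	}
}
```

Any DuckDB type missing from the table falls through to 705, which is why clients display such columns as "unknown".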

## TODO Reference

See `TODO.md` for the full feature roadmap and known issues.
45 changes: 44 additions & 1 deletion README.md
@@ -7,11 +7,13 @@ A PostgreSQL wire protocol compatible server backed by DuckDB. Connect with any
- **PostgreSQL Wire Protocol**: Full compatibility with PostgreSQL clients
- **TLS Encryption**: Required TLS connections with auto-generated self-signed certificates
- **Per-User Databases**: Each authenticated user gets their own isolated DuckDB database file
-- **Password Authentication**: MD5 password authentication
+- **Password Authentication**: Cleartext password authentication over TLS
- **Extended Query Protocol**: Support for prepared statements, binary format, and parameterized queries
- **COPY Protocol**: Bulk data import/export with `COPY FROM STDIN` and `COPY TO STDOUT`
- **DuckDB Extensions**: Configurable extension loading (ducklake enabled by default)
- **DuckLake Integration**: Auto-attach DuckLake catalogs for lakehouse workflows
- **Rate Limiting**: Built-in protection against brute-force attacks
- **Graceful Shutdown**: Waits for in-flight queries before exiting
- **Flexible Configuration**: YAML config files, environment variables, and CLI flags

## Quick Start
@@ -144,6 +146,46 @@ ATTACH 'ducklake:postgres:host=localhost dbname=ducklake' (DATA_PATH 's3://my-bu

See [DuckLake documentation](https://ducklake.select/docs/stable/duckdb/usage/connecting) for more details.

## COPY Protocol

Duckgres supports PostgreSQL's COPY protocol for efficient bulk data import and export:

```sql
-- Export data to stdout (tab-separated)
COPY tablename TO STDOUT;

-- Export as CSV with headers
COPY tablename TO STDOUT WITH CSV HEADER;

-- Export query results
COPY (SELECT * FROM tablename WHERE id > 100) TO STDOUT WITH CSV;

-- Import data from stdin
COPY tablename FROM STDIN;

-- Import CSV with headers
COPY tablename FROM STDIN WITH CSV HEADER;
```

This works with psql's `\copy` command and programmatic COPY operations from PostgreSQL drivers.

## Graceful Shutdown

Duckgres handles shutdown signals (SIGINT, SIGTERM) gracefully:

- Stops accepting new connections immediately
- Waits for in-flight queries to complete (default 30s timeout)
- Logs active connection count during shutdown
- Closes all database connections cleanly

The shutdown timeout can be configured:

```go
cfg := server.Config{
	ShutdownTimeout: 60 * time.Second,
}
```

## Rate Limiting

Built-in rate limiting protects against brute-force authentication attacks:
@@ -219,6 +261,7 @@ GROUP BY name;
- `DROP TABLE/INDEX/VIEW`
- `ALTER TABLE`
- `BEGIN/COMMIT/ROLLBACK` (DuckDB transaction support)
- `COPY` - Bulk data loading and export (see the COPY Protocol section above)

### PostgreSQL Compatibility
- Extended query protocol (prepared statements)
10 changes: 5 additions & 5 deletions TODO.md
@@ -16,26 +16,26 @@

### Protocol Compatibility
- [ ] **Binary Format Support**: Encode results in binary format for better performance with some clients
-- [ ] **COPY Protocol**: Support `COPY FROM`/`COPY TO` for bulk data loading
+- [x] **COPY Protocol**: Support `COPY FROM`/`COPY TO` for bulk data loading
- [ ] **Cancel Request Handling**: Properly cancel long-running queries

### Compatibility
- [x] **System Catalog Emulation**: Basic `pg_catalog` compatibility for psql
- [x] `\dt` (list tables) - working
- [x] `\l` (list databases) - working
-- [ ] `\d <table>` (describe table) - needs more pg_class columns
+- [x] `\d <table>` (describe table) - working
- [ ] **Information Schema**: Emulate PostgreSQL's `information_schema`
- [ ] **Session Variables**: Support `SET` commands (timezone, search_path, etc.)
- [ ] **Type OID Mapping**: Proper PostgreSQL OID mapping for all DuckDB types

### Features
-- [ ] **Extensions**: Load DuckDB extensions on startup
+- [x] **Extensions**: Load DuckDB extensions on startup

### Operations
- [ ] **Hot Reload**: Reload config without restart
- [ ] **Admin Commands**: `\duckgres status`, `\duckgres users`, etc.
- [ ] **Docker Image**: Official container image
-- [ ] **Graceful Shutdown**: Finish in-flight queries before shutdown
+- [x] **Graceful Shutdown**: Finish in-flight queries before shutdown

## Medium Priority

@@ -73,7 +73,7 @@
## Known Issues

- [ ] Some PostgreSQL drivers may fail with unsupported OIDs
-- [ ] `\d` commands in psql don't work (need system catalog)
+- [x] `\d` commands in psql don't work (need system catalog) - fixed
- [ ] Transaction isolation may differ from PostgreSQL behavior
- [ ] Large result sets may cause memory issues (no streaming)
