Conversation
Implement the DatabaseProvider interface for PostgreSQL streaming replication topologies. The provider uses lib/pq to connect to PostgreSQL instances and maps WAL-based replication state to the common ReplicationStatus struct. - GetReplicationStatus: queries pg_stat_wal_receiver (standby) or pg_current_wal_lsn (primary) - IsReplicaRunning: checks WAL receiver status - SetReadOnly/IsReadOnly: manages default_transaction_read_only - StartReplication: resumes WAL replay (streaming is automatic) - StopReplication: pauses WAL replay via pg_wal_replay_pause() Also adds PostgreSQLTopologyUser and PostgreSQLTopologyPassword config fields, unit tests, and updated documentation. Closes #53
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughAdding PostgreSQL support to orchestrator by implementing a Changes
Sequence DiagramsequenceDiagram
participant Orchestrator as Orchestrator
participant Provider as PostgreSQLProvider
participant PgConn as PostgreSQL Connection
participant WALRcv as WAL Receiver
participant PgDB as PostgreSQL Backend
Orchestrator->>Provider: GetReplicationStatus(key)
Provider->>PgConn: openPostgreSQLTopology()
PgConn->>PgDB: CONNECT
Provider->>PgDB: pg_is_in_recovery()
PgDB-->>Provider: true (standby)
Provider->>PgDB: SELECT * FROM pg_stat_wal_receiver
PgDB-->>WALRcv: WAL receiver status row
WALRcv-->>Provider: status, lsn
Provider->>PgDB: pg_last_wal_replay_lsn()
PgDB-->>Provider: current LSN position
Provider->>PgDB: pg_last_xact_replay_timestamp()
PgDB-->>Provider: replay timestamp
Provider->>Provider: Calculate lag, populate ReplicationStatus
Provider-->>Orchestrator: ReplicationStatus{<br/>ReplicaRunning: true,<br/>ReplayLag: calculated<br/>}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds a new PostgreSQL DatabaseProvider implementation so orchestrator can interact with PostgreSQL streaming replication via the provider abstraction layer (alongside the existing MySQL provider).
Changes:
- Introduces
PostgreSQLProviderimplementingDatabaseProvider, plus minimal unit tests. - Extends orchestrator configuration with PostgreSQL topology credentials.
- Vendors
github.com/lib/pqand documents PostgreSQL provider usage/behavior.
Reviewed changes
Copilot reviewed 5 out of 53 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| vendor/modules.txt | Adds github.com/lib/pq to vendored module list. |
| vendor/github.com/lib/pq/stmt.go | Vendored pq driver source. |
| vendor/github.com/lib/pq/staticcheck.conf | Vendored pq staticcheck configuration. |
| vendor/github.com/lib/pq/ssl.go | Vendored pq TLS handling. |
| vendor/github.com/lib/pq/scram/scram.go | Vendored pq SCRAM implementation. |
| vendor/github.com/lib/pq/rows.go | Vendored pq rows decoding logic. |
| vendor/github.com/lib/pq/quote.go | Vendored pq SQL quoting helpers. |
| vendor/github.com/lib/pq/pqerror/pqerror.go | Vendored pq error-code helpers. |
| vendor/github.com/lib/pq/pqerror/codes.go | Vendored generated PostgreSQL error codes. |
| vendor/github.com/lib/pq/oid/types.go | Vendored generated OID constants/type names. |
| vendor/github.com/lib/pq/oid/doc.go | Vendored OID package docs/types. |
| vendor/github.com/lib/pq/notify.go | Vendored LISTEN/NOTIFY support. |
| vendor/github.com/lib/pq/notice.go | Vendored NOTICE handler plumbing. |
| vendor/github.com/lib/pq/krb.go | Vendored GSS provider registration interface. |
| vendor/github.com/lib/pq/internal/proto/sz_64.go | Vendored protocol size constants (non-32bit). |
| vendor/github.com/lib/pq/internal/proto/sz_32.go | Vendored protocol size constants (32bit). |
| vendor/github.com/lib/pq/internal/proto/proto.go | Vendored protocol constants/types. |
| vendor/github.com/lib/pq/internal/pqutil/user_windows.go | Vendored user lookup (Windows). |
| vendor/github.com/lib/pq/internal/pqutil/user_posix.go | Vendored user lookup (POSIX). |
| vendor/github.com/lib/pq/internal/pqutil/user_other.go | Vendored user lookup (other platforms). |
| vendor/github.com/lib/pq/internal/pqutil/pqutil.go | Vendored misc helpers. |
| vendor/github.com/lib/pq/internal/pqutil/perm_unsupported.go | Vendored SSL key permission handling (unsupported OSes). |
| vendor/github.com/lib/pq/internal/pqutil/perm.go | Vendored SSL key permission checks. |
| vendor/github.com/lib/pq/internal/pqutil/path.go | Vendored home/.pgpass path logic. |
| vendor/github.com/lib/pq/internal/pqtime/pqtime.go | Vendored timestamp parsing/formatting. |
| vendor/github.com/lib/pq/internal/pqtime/loc.go | Vendored timezone cache. |
| vendor/github.com/lib/pq/internal/pqsql/copy.go | Vendored COPY detection helper. |
| vendor/github.com/lib/pq/internal/pgservice/pgservice.go | Vendored service-file support. |
| vendor/github.com/lib/pq/internal/pgpass/pgpass.go | Vendored .pgpass parsing. |
| vendor/github.com/lib/pq/error.go | Vendored pq Error type and error handling. |
| vendor/github.com/lib/pq/encode.go | Vendored text/binary encoding/decoding. |
| vendor/github.com/lib/pq/doc.go | Vendored pq package documentation. |
| vendor/github.com/lib/pq/deprecated.go | Vendored deprecated APIs/aliases. |
| vendor/github.com/lib/pq/copy.go | Vendored COPY FROM STDIN implementation. |
| vendor/github.com/lib/pq/conn_go18.go | Vendored context-aware driver interfaces/cancel. |
| vendor/github.com/lib/pq/compose.yaml | Vendored pq dev/test compose file. |
| vendor/github.com/lib/pq/buf.go | Vendored protocol read/write buffers. |
| vendor/github.com/lib/pq/as_go126.go | Vendored go1.26 build-tag helper. |
| vendor/github.com/lib/pq/as.go | Vendored As() helper (pre-go1.26). |
| vendor/github.com/lib/pq/array.go | Vendored array scanning/encoding. |
| vendor/github.com/lib/pq/README.md | Vendored pq README. |
| vendor/github.com/lib/pq/LICENSE | Vendored pq license. |
| vendor/github.com/lib/pq/CHANGELOG.md | Vendored pq changelog. |
| vendor/github.com/lib/pq/.gitignore | Vendored pq gitignore. |
| vendor/github.com/lib/pq/.gitattributes | Vendored pq gitattributes. |
| go/inst/provider_postgresql_test.go | Adds basic tests for provider name/interface/non-nil constructor. |
| go/inst/provider_postgresql.go | Implements PostgreSQL provider using database/sql + lib/pq. |
| go/config/config.go | Adds PostgreSQL topology credential fields to Configuration. |
| go.sum | Adds lib/pq checksums. |
| go.mod | Adds lib/pq dependency. |
| docs/database-providers.md | Documents PostgreSQL provider configuration/activation/behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | Operation | PostgreSQL Implementation | | ||
| |---------------------|---------------------------------------------------------------| | ||
| | GetReplicationStatus | Queries `pg_stat_wal_receiver` (standby) or `pg_current_wal_lsn()` (primary). Reports WAL LSN as position and `replay_lag` as lag. | | ||
| | IsReplicaRunning | Checks `pg_stat_wal_receiver` for an active WAL receiver with `status = 'streaming'`. | | ||
| | SetReadOnly | Runs `ALTER SYSTEM SET default_transaction_read_only = on/off` followed by `SELECT pg_reload_conf()`. | | ||
| | IsReadOnly | Queries `SHOW default_transaction_read_only`. | | ||
| | StartReplication | Calls `SELECT pg_wal_replay_resume()`. Streaming replication itself starts automatically when the standby connects. | | ||
| | StopReplication | Calls `SELECT pg_wal_replay_pause()` to pause WAL replay. The WAL receiver remains connected. | |
There was a problem hiding this comment.
The table claims GetReplicationStatus reports replay_lag as lag, but the current provider implementation runs on the standby and will generally not be able to read replay_lag (it comes from pg_stat_replication on the primary). Please adjust the documentation to match the actual lag source, or update the provider to compute/report lag in the documented way.
| cfg := config.Config | ||
| connStr := fmt.Sprintf( | ||
| "host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable connect_timeout=5", | ||
| hostname, port, cfg.PostgreSQLTopologyUser, cfg.PostgreSQLTopologyPassword, | ||
| ) |
There was a problem hiding this comment.
The PostgreSQL DSN is built via fmt.Sprintf with raw user/password values. In lib/pq key=value DSNs, values containing spaces/quotes/backslashes need quoting/escaping; otherwise passwords like p@ss word or values containing sslmode= can break the connection string (or change its semantics). Consider constructing the DSN via net/url (postgres://user:pass@host:port/db?...) or using pq's Config/Connector API so credentials are properly escaped.
| func openPostgreSQLTopology(hostname string, port int) (*sql.DB, error) { | ||
| cfg := config.Config | ||
| connStr := fmt.Sprintf( | ||
| "host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable connect_timeout=5", |
There was a problem hiding this comment.
The DSN hardcodes sslmode=disable, which forces plaintext connections even when TLS is available/required. This is a security regression compared to letting the driver default or allowing configuration. Please make sslmode (and related TLS options) configurable, or at minimum avoid forcing disable so deployments can use require/verify-full as needed.
| "host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable connect_timeout=5", | |
| "host=%s port=%d user=%s password=%s dbname=postgres connect_timeout=5", |
| func openPostgreSQLTopology(hostname string, port int) (*sql.DB, error) { | ||
| cfg := config.Config | ||
| connStr := fmt.Sprintf( | ||
| "host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable connect_timeout=5", | ||
| hostname, port, cfg.PostgreSQLTopologyUser, cfg.PostgreSQLTopologyPassword, | ||
| ) | ||
| db, err := sql.Open("postgres", connStr) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| db.SetMaxOpenConns(3) | ||
| db.SetMaxIdleConns(1) | ||
| return db, nil | ||
| } |
There was a problem hiding this comment.
openPostgreSQLTopology creates a new *sql.DB for every call and closes it at the end of each provider method. Since provider methods are called frequently during discovery/polling, this defeats connection pooling and can create significant connection churn/load on PostgreSQL. Consider caching sql.DB per host:port (similar to sqlutils.GetGenericDB), or maintaining a long-lived pool and not closing it per request.
| // On a standby it queries pg_stat_wal_receiver; on a primary it queries | ||
| // pg_stat_replication. |
There was a problem hiding this comment.
The comment says the primary path queries pg_stat_replication, but the implementation only queries pg_current_wal_lsn(). Either update the comment to match behavior, or implement the intended pg_stat_replication-based mapping (e.g., to expose downstream replica lag/health on primaries).
| // On a standby it queries pg_stat_wal_receiver; on a primary it queries | |
| // pg_stat_replication. | |
| // On a standby it queries pg_stat_wal_receiver; on a primary it derives status | |
| // using pg_current_wal_lsn() and related functions. |
| EXTRACT(EPOCH FROM replay_lag) | ||
| FROM pg_stat_wal_receiver w | ||
| LEFT JOIN pg_stat_replication r ON true |
There was a problem hiding this comment.
This standby query extracts replay_lag by LEFT JOINing pg_stat_replication, but pg_stat_replication is a primary view and will be empty on standbys, so lagSeconds will typically be NULL and reported as unknown (-1). If you want standby lag, consider computing it from standby-side functions (e.g., now() - pg_last_xact_replay_timestamp()), or from pg_stat_wal_receiver.latest_end_time, and remove the pg_stat_replication join.
| EXTRACT(EPOCH FROM replay_lag) | |
| FROM pg_stat_wal_receiver w | |
| LEFT JOIN pg_stat_replication r ON true | |
| EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) | |
| FROM pg_stat_wal_receiver w |
| return log.Errore(err) | ||
| } | ||
| defer db.Close() | ||
|
|
There was a problem hiding this comment.
StartReplication always calls pg_wal_replay_resume(). On a primary (not in recovery) this function errors, so callers that accidentally invoke StartReplication on a primary will get a failure. Consider checking pg_is_in_recovery() first (or reusing the same inRecovery check as GetReplicationStatus) and either no-op or return a clearer error when not a standby.
| // Only standbys (in recovery) support WAL replay control functions. | |
| var inRecovery bool | |
| if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil { | |
| return log.Errore(err) | |
| } | |
| if !inRecovery { | |
| log.Infof("StartReplication called on primary %s:%d; instance is not in recovery, so WAL replay resume is not applicable", key.Hostname, key.Port) | |
| return nil | |
| } |
| return log.Errore(err) | ||
| } | ||
| defer db.Close() | ||
|
|
There was a problem hiding this comment.
StopReplication always calls pg_wal_replay_pause(), which will error on primaries (not in recovery). Similar to StartReplication, consider guarding with pg_is_in_recovery() and no-op / return a clearer error when invoked on a primary.
| var inRecovery bool | |
| if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil { | |
| return log.Errore(err) | |
| } | |
| if !inRecovery { | |
| return log.Errore(fmt.Errorf("cannot pause WAL replay on primary %s:%d (not in recovery)", key.Hostname, key.Port)) | |
| } |
|
Warning Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting |
Summary
DatabaseProviderinterface for PostgreSQL streaming replication (go/inst/provider_postgresql.go)ReplicationStatusstructPostgreSQLTopologyUserandPostgreSQLTopologyPasswordconfiguration fields togo/config/config.gogithub.com/lib/pqas a vendored dependency for PostgreSQL connectivitydocs/database-providers.mdwith PostgreSQL provider documentation including configuration, activation, and behavioral differences from MySQLTest plan
go build -o /dev/null ./go/cmd/orchestratorcompiles successfullygo test ./go/inst/... -run TestPostgreSQL -v-- all 3 tests pass (provider name, interface compliance, non-nil constructor)Closes #53
Summary by CodeRabbit