Summary
pg_stat_io (inherited from the PostgreSQL 16 merge) gives per-backend-type,
per-IO-object, per-context I/O statistics, and Cloudberry already exposes
cluster-wide rollups via gp_stat_io and gp_stat_io_summary. However, the
underlying counters are only incremented on the shared-buffer-manager path
(bufmgr.c, localbuf.c, md.c). Append-Optimized (AO) and
Append-Optimized Column-Oriented (AOCO) table I/O is not counted at all.
Because AO/AOCO storage bypasses the shared buffer pool and uses its own
BufferedRead / BufferedAppend layer, reads, writes, and extends against
these tables are invisible to pg_stat_io. For a workload built primarily on
AO/AOCO tables — which is common in Cloudberry analytics deployments — the view
significantly under-reports actual physical I/O.
Current behavior
- grep -rn pgstat_count_io_op src/backend/access/appendonly src/backend/access/aocs
returns no matches — the AO/AOCO read/write paths contain no
instrumentation.
- I/O counters are populated only from
bufmgr.c, localbuf.c, and md.c,
i.e. the heap / index / temp-relation paths inherited from PostgreSQL.
- As a result,
pg_stat_io (and gp_stat_io / gp_stat_io_summary) reflect
heap/index/catalog I/O but omit the storage format that defines Cloudberry.
Expected behavior
I/O performed by AO/AOCO tables should be reflected in pg_stat_io, so that
operators can observe physical I/O for the storage types they actually use.
Proposed approach (for discussion)
Instrument the AO/AOCO buffered I/O layer with pgstat_count_io_op[_time]()
calls, analogous to the existing md.c instrumentation:
- Reads: in the AO/AOCO BufferedRead path (the bufmgr equivalent for AO),
count IOOP_READ.
- Writes / extends: in the BufferedAppend / segment-file extend path,
count IOOP_WRITE and IOOP_EXTEND.
- fsync: where AO segment files are flushed (the register_dirty_segment /
mdimmedsync equivalents), count IOOP_FSYNC.
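The instrumentation points above can be sketched in a self-contained way. The C fragment below is an illustration only: the enums are trimmed stand-ins for the IO enums in pgstat.h, count_io_op() is a stand-in for pgstat_count_io_op(), and AoSegFile, ao_read(), ao_append() and ao_flush() are hypothetical names that model a segment file in memory rather than the real BufferedRead / BufferedAppend code. It counts each append as a single IOOP_WRITE and leaves the IOOP_WRITE vs. IOOP_EXTEND split open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical stand-ins for the IO enums in pgstat.h. */
typedef enum { IOOBJECT_RELATION, IOOBJECT_TEMP_RELATION, IOOBJECT_NUM_TYPES } IOObject;
typedef enum { IOCONTEXT_NORMAL, IOCONTEXT_NUM_TYPES } IOContext;
typedef enum { IOOP_EXTEND, IOOP_FSYNC, IOOP_READ, IOOP_WRITE, IOOP_NUM_TYPES } IOOp;

/* Fixed-size counter array, indexed by object, context and operation. */
uint64_t io_counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];

/* Stand-in for pgstat_count_io_op(): bump exactly one counter. */
void count_io_op(IOObject obj, IOContext ctx, IOOp op)
{
    io_counts[obj][ctx][op]++;
}

/* Hypothetical in-memory model of an AO segment file. */
typedef struct {
    char   data[256];
    size_t len;
} AoSegFile;

/* Read up to n bytes at offset; count IOOP_READ only if bytes were read. */
size_t ao_read(const AoSegFile *f, size_t offset, void *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    if (offset >= f->len)
        return 0;
    size_t avail = f->len - offset;
    if (n > avail)
        n = avail;
    memcpy(buf, f->data + offset, n);
    if (n > 0)
        count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_READ);
    return n;
}

/* Append n bytes at end of file; count one IOOP_WRITE on success. */
size_t ao_append(AoSegFile *f, const void *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    if (n == 0 || n > sizeof(f->data) - f->len)
        return 0;
    memcpy(f->data + f->len, buf, n);
    f->len += n;
    count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_WRITE);
    return n;
}

/* Flush point: real code would fsync the segment file before counting. */
void ao_flush(AoSegFile *f)
{
    assert(f != NULL);
    count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_FSYNC);
}
```

The sketch uses IOOBJECT_RELATION and IOCONTEXT_NORMAL only to have something to index by; which object and context AO I/O should be filed under is one of the open questions, not something the sketch settles.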
Open design questions:
- IOOBJECT classification. AO/AOCO data does not live in shared buffers,
so IOOBJECT_RELATION (which today implies buffer-pool involvement) may be
misleading. Options: reuse IOOBJECT_RELATION, or add a new IO object
(e.g. IOOBJECT_AO_RELATION) to keep AO I/O distinguishable. Adding an enum
value changes the fixed-size shared stats struct and the
pg_stat_io / pg_stat_get_io() output shape, so it needs care.
- IOCONTEXT mapping. AO has no shared-buffer eviction/reuse semantics, so
hits / evictions / reuses are not meaningful; only
reads / writes / extends / fsyncs would be populated. Decide how the
non-applicable columns are reported (zero vs. NULL).
- op_bytes. AO varblocks are variable-sized rather than BLCKSZ-aligned,
so the per-op byte accounting differs from the heap path.
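The op_bytes concern can be made concrete with a small sketch. It assumes, purely for illustration, that each operation counter is paired with a byte accumulator; SketchIOStats, sketch_count_io(), sketch_fixed_bytes() and SKETCH_BLCKSZ are hypothetical names, not existing pgstat symbols. Multiplying the op count by a fixed block size is valid when every op moves one block, but it misstates the volume once ops are variable-sized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192  /* stand-in for BLCKSZ; value assumed for the example */

typedef enum { SK_IOOP_READ, SK_IOOP_WRITE, SK_IOOP_NUM_TYPES } SketchIOOp;

/* Hypothetical counters: op count plus accumulated bytes per operation. */
typedef struct {
    uint64_t counts[SK_IOOP_NUM_TYPES];
    uint64_t bytes[SK_IOOP_NUM_TYPES];
} SketchIOStats;

/* Record one operation of nbytes; suits variable-sized AO varblocks. */
void sketch_count_io(SketchIOStats *s, SketchIOOp op, uint64_t nbytes)
{
    assert(s != NULL && (int) op < SK_IOOP_NUM_TYPES);
    s->counts[op]++;
    s->bytes[op] += nbytes;
}

/* Heap-style estimate: assumes every op moved exactly one fixed-size block. */
uint64_t sketch_fixed_bytes(const SketchIOStats *s, SketchIOOp op)
{
    assert(s != NULL && (int) op < SK_IOOP_NUM_TYPES);
    return s->counts[op] * SKETCH_BLCKSZ;
}
```

For two reads of 1200 and 30000 bytes, the accumulator reports 31200 bytes, while the fixed-size estimate reports 2 x 8192 = 16384 bytes.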
Impact / motivation
- Observability parity: AO/AOCO is the primary storage format for many
Cloudberry analytics workloads, yet is the one format pg_stat_io can't see.
- Capacity planning & troubleshooting: physical read/write volume for AO tables
is currently only obtainable indirectly.
Notes
- Affects: main.
- Related existing surfaces:
pg_stat_io view (system_views.sql),
gp_stat_io / gp_stat_io_summary (system_views_gp*.sql),
pgstat_io.c, pgstat.h IO enums.