pg_stat_io: track I/O for Append-Optimized (AO/AOCO) tables #1834

@my-ship-it

Summary

pg_stat_io (inherited from the PostgreSQL 16 merge) gives per-backend-type,
per-IO-object, per-context I/O statistics, and Cloudberry already exposes
cluster-wide rollups via gp_stat_io and gp_stat_io_summary. However, the
underlying counters are incremented only on the buffer-manager and
storage-manager paths (bufmgr.c, localbuf.c, md.c). I/O against
Append-Optimized (AO) and Append-Optimized Column-Oriented (AOCO) tables is
not counted at all.

Because AO/AOCO storage bypasses the shared buffer pool and uses its own
BufferedRead / BufferedAppend layer, reads, writes, and extends against
these tables are invisible to pg_stat_io. For a workload built primarily on
AO/AOCO tables — which is common in Cloudberry analytics deployments — the view
significantly under-reports actual physical I/O.

Current behavior

  • grep -rn pgstat_count_io_op src/backend/access/appendonly src/backend/access/aocs
    returns no matches — the AO/AOCO read/write paths contain no
    instrumentation.
  • I/O counters are populated only from bufmgr.c, localbuf.c, and md.c,
    i.e. the heap / index / temp-relation paths inherited from PostgreSQL.
  • As a result, pg_stat_io (and gp_stat_io / gp_stat_io_summary) reflect
    heap/index/catalog I/O but omit the storage format that defines Cloudberry.

Expected behavior

I/O performed by AO/AOCO tables should be reflected in pg_stat_io, so that
operators can observe physical I/O for the storage types they actually use.

Proposed approach (for discussion)

Instrument the AO/AOCO buffered I/O layer with pgstat_count_io_op[_time]()
calls, analogous to the existing md.c instrumentation:

  • Reads — in the AO/AOCO BufferedRead path (bufmgr equivalent for AO),
    count IOOP_READ.
  • Writes / extends — in the BufferedAppend / segment-file extend path,
    count IOOP_WRITE and IOOP_EXTEND.
  • fsync — where AO segment files are flushed (the AO counterparts of
    register_dirty_segment / mdimmedsync), count IOOP_FSYNC.
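
The three bullets above can be illustrated with a small standalone model. This is only a sketch for discussion, not a patch: the enums and pgstat_count_io_op() below are local stand-ins that mirror the names and three-argument shape used in pgstat.h, and the ao_counted_* wrappers are hypothetical names that do not exist in the tree. It also assumes IOOBJECT_RELATION / IOCONTEXT_NORMAL and counts an append as an extend, both of which are open questions.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() and fsync() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-ins for a subset of the pgstat.h enums; a real patch would include pgstat.h. */
typedef enum { IOOBJECT_RELATION, IOOBJECT_TEMP_RELATION, IOOBJECT_NUM_TYPES } IOObject;
typedef enum { IOCONTEXT_BULKREAD, IOCONTEXT_BULKWRITE, IOCONTEXT_NORMAL,
               IOCONTEXT_VACUUM, IOCONTEXT_NUM_TYPES } IOContext;
typedef enum { IOOP_EVICT, IOOP_EXTEND, IOOP_FSYNC, IOOP_HIT, IOOP_READ,
               IOOP_REUSE, IOOP_WRITE, IOOP_NUM_TYPES } IOOp;

/* Local model of the pending counters; the real ones live in pgstat_io.c. */
static uint64_t io_counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];

/* Stand-in with the same argument order as the real function. */
static void pgstat_count_io_op(IOObject obj, IOContext ctx, IOOp op)
{
    assert(obj < IOOBJECT_NUM_TYPES && ctx < IOCONTEXT_NUM_TYPES && op < IOOP_NUM_TYPES);
    io_counts[obj][ctx][op]++;
}

/* Hypothetical wrapper: count one read per successful physical read. */
static ssize_t ao_counted_pread(int fd, void *buf, size_t len, off_t off)
{
    ssize_t n = pread(fd, buf, len, off);
    if (n > 0)
        pgstat_count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_READ);
    return n;
}

/* Hypothetical wrapper: an append grows the segment file, so it is counted
 * as an extend here; counting it as a write instead is equally plausible. */
static ssize_t ao_counted_append(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n > 0)
        pgstat_count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_EXTEND);
    return n;
}

/* Hypothetical wrapper: count one fsync per successful flush. */
static int ao_counted_fsync(int fd)
{
    int rc = fsync(fd);
    if (rc == 0)
        pgstat_count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_FSYNC);
    return rc;
}

/* Print the three counters this sketch touches. */
static void ao_io_report(FILE *out)
{
    fprintf(out, "reads=%llu extends=%llu fsyncs=%llu\n",
            (unsigned long long) io_counts[IOOBJECT_RELATION][IOCONTEXT_NORMAL][IOOP_READ],
            (unsigned long long) io_counts[IOOBJECT_RELATION][IOCONTEXT_NORMAL][IOOP_EXTEND],
            (unsigned long long) io_counts[IOOBJECT_RELATION][IOCONTEXT_NORMAL][IOOP_FSYNC]);
}
```

In a real patch the counting calls would sit next to the existing file reads and writes inside the BufferedRead / BufferedAppend code rather than in new wrappers; the wrappers here only make the counting points explicit.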

Open design questions:

  1. IOOBJECT classification. AO/AOCO data does not live in shared buffers,
    so IOOBJECT_RELATION (which today implies buffer-pool involvement) may be
    misleading. Options: reuse IOOBJECT_RELATION, or add a new IO object
    (e.g. IOOBJECT_AO_RELATION) to keep AO I/O distinguishable. Adding an enum
    value changes the fixed-size shared stats struct and the
    pg_stat_io / pg_stat_get_io() output shape, so it needs care.
  2. IOCONTEXT mapping. AO has no shared-buffer eviction/reuse semantics, so
    hits / evictions / reuses are not meaningful; only
    reads / writes / extends / fsyncs would be populated. Decide how the
    non-applicable columns are reported (zero vs. NULL).
  3. op_bytes. AO varblocks are variable-sized rather than BLCKSZ-aligned,
    so the per-op byte accounting differs from the heap path.
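
For question 3, one possible direction (an assumption for discussion, not something the current code does) is to keep a byte total next to the operation count for AO I/O, so that variable-sized reads and appends are not forced into a fixed op_bytes. The AoIoTally struct and its helpers below are hypothetical, and the BLCKSZ value is only a placeholder for whatever the build is configured with.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder block size for illustration; the real value comes from the build. */
#define BLCKSZ 8192

/* Hypothetical tally: operations and bytes are tracked separately. */
typedef struct AoIoTally
{
    uint64_t ops;   /* number of physical I/O calls */
    uint64_t bytes; /* total bytes moved by those calls */
} AoIoTally;

/* Record one I/O call of nbytes bytes. */
static void ao_tally_add(AoIoTally *t, uint32_t nbytes)
{
    assert(t != 0);
    t->ops++;
    t->bytes += nbytes;
}

/* Express the byte total in BLCKSZ units, rounding up, so it can be set
 * against heap rows where ops * op_bytes yields the byte volume. */
static uint64_t ao_tally_block_equivalents(const AoIoTally *t)
{
    return (t->bytes + BLCKSZ - 1) / BLCKSZ;
}
```

With this shape, three calls of 100, 8192, and 20000 bytes record 3 operations and 28292 bytes, which rounds up to 4 block equivalents. Whether the view should expose the raw byte total, the block equivalent, or both is part of the open question.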

Impact / motivation

  • Observability parity: AO/AOCO is the primary storage format for many
    Cloudberry analytics workloads, yet is the one format pg_stat_io can't see.
  • Capacity planning & troubleshooting: physical read/write volume for AO tables
    is currently only obtainable indirectly.

Notes

  • Affects: main.
  • Related existing surfaces: pg_stat_io view (system_views.sql),
    gp_stat_io / gp_stat_io_summary (system_views_gp*.sql),
    pgstat_io.c, pgstat.h IO enums.
