[Bug] PAX DELETE may crash in MergeRawInfo when merging SUM stats for minmax_columns #1767

@ryapandt

Apache Cloudberry version

2.1

What happened

A PAX table may crash a segment during frequent DELETE / INSERT operations when minmax_columns contains columns that also have SUM statistics support, such as int, bigint, numeric, or similar numeric types.

The crash happens in the PAX DELETE visibility-map statistics refresh path.

Example log:


PANIC XX000 Unexpected internal error: Segment process received signal SIGSEGV

libpostgres.so datumCopy
pax.so cbdb::datumCopy
pax.so pax::MicroPartitionStats::MergeRawInfo
pax.so pax::MicroPartitionStatsUpdater::Update
pax.so pax::TableDeleter::UpdateStatsInAuxTable
pax.so pax::TableDeleter::DeleteWithVisibilityMap

What you think should happen instead

The DELETE and INSERT statements should complete successfully.

PAX micro-partition statistics should be refreshed without crashing the segment.

How to reproduce

DROP TABLE IF EXISTS pax_minmax_sum_dml_crash;

CREATE TABLE pax_minmax_sum_dml_crash (
    dist_key int,
    id bigint,
    amount bigint,
    k int,
    payload text
)
USING pax
WITH (minmax_columns = 'amount')
DISTRIBUTED BY (dist_key);

-- Put all rows on one segment to make the repro more likely to hit
-- multiple PAX internal groups in the same segment.
INSERT INTO pax_minmax_sum_dml_crash
SELECT
    1 AS dist_key,
    i::bigint AS id,
    i::bigint AS amount,
    (i % 1000)::int AS k,
    md5(i::text) AS payload
FROM generate_series(1, 800000) AS s(i);

-- Repeated small DELETE + INSERT operations.
-- On affected builds, one of the DELETE statements may crash a segment
-- during PAX micro-partition statistics refresh.
DO $$
DECLARE
    v_iter int;
    v_lo bigint;
BEGIN
    FOR v_iter IN 1..200 LOOP
        v_lo := ((v_iter - 1) * 1000 + 1)::bigint;

        DELETE FROM pax_minmax_sum_dml_crash
        WHERE id BETWEEN v_lo AND v_lo + 99;

        INSERT INTO pax_minmax_sum_dml_crash
        SELECT
            1 AS dist_key,
            (1000000000 + v_iter * 10000 + i)::bigint AS id,
            (1000000000 + v_iter * 10000 + i)::bigint AS amount,
            (i % 1000)::int AS k,
            md5((1000000000 + v_iter * 10000 + i)::text) AS payload
        FROM generate_series(1, 100) AS s(i);

        RAISE NOTICE 'finished iteration %', v_iter;
    END LOOP;
END $$;

SELECT count(*) FROM pax_minmax_sum_dml_crash;

Operating System

Rocky Linux 9.6

Anything else

Suspected root cause (from Codex):

PAX minmax_columns appear to maintain not only min/max stats but also SUM stats for supported column types.

During the DELETE visibility-map refresh, MicroPartitionStatsUpdater::Update() may merge existing group-level raw stats through:

    MicroPartitionStats::MergeRawInfo()

In MergeRawInfo(), serialized SUM stats appear to be deserialized using the physical column type metadata:

    FromValue(..., typlen, typbyval, column_index)

However, the serialized SUM value should be interpreted using the aggregate return type metadata. For example:

    sum(bigint) returns numeric

So a SUM result for a bigint column may be serialized as numeric but later decoded as bigint. This can produce an invalid Datum and eventually crash in datumCopy().
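
To illustrate the suspected mismatch, here is a minimal, self-contained sketch. It is not PAX code: TypeMeta, Decoded, SerializeVarlen, and Decode are hypothetical names, and the "serialization" is just ASCII digits rather than PostgreSQL's real numeric format. It only shows that the same bytes decode to different values depending on which (typlen, typbyval) pair the reader assumes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for per-type metadata (typlen / typbyval).
struct TypeMeta {
    int typlen;     // fixed byte width, or -1 for variable length
    bool typbyval;  // true if the value is stored inline
};

// bigint: 8 bytes, passed by value. numeric: variable length, by reference.
constexpr TypeMeta kBigintLike{8, true};
constexpr TypeMeta kNumericLike{-1, false};

// Hypothetical decoded value: an inline integer or a byte string.
struct Decoded {
    bool by_value = false;
    std::int64_t inline_value = 0;
    std::string bytes;
};

// Toy serializer: stores a variable-length value as its raw bytes.
// (Assumption: this is NOT the real on-disk numeric layout.)
std::vector<unsigned char> SerializeVarlen(const std::string& digits) {
    return std::vector<unsigned char>(digits.begin(), digits.end());
}

// Decode the buffer according to whatever metadata the caller supplies.
Decoded Decode(const std::vector<unsigned char>& buf, TypeMeta meta) {
    Decoded out;
    out.by_value = meta.typbyval;
    if (meta.typbyval) {
        // By-value path: reinterpret the leading bytes as an integer.
        std::size_t want =
            meta.typlen > 0 ? static_cast<std::size_t>(meta.typlen) : 0;
        std::size_t n = std::min({want, buf.size(), sizeof(out.inline_value)});
        if (n > 0) {
            std::memcpy(&out.inline_value, buf.data(), n);
        }
    } else {
        // By-reference path: keep the bytes as a variable-length value.
        out.bytes.assign(buf.begin(), buf.end());
    }
    return out;
}
```

Decoding SerializeVarlen("4950") with kNumericLike recovers the original bytes, while decoding it with kBigintLike reinterprets those bytes as an unrelated integer. In a real engine the misread value would then be handled under the wrong type assumptions, which is consistent with the crash described above, though the actual PAX code path still needs to be confirmed.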

There are also suspicious datumCopy() calls where typByVal and typLen appear to be passed in the wrong order, for example:

    cbdb::datumCopy(newval, sum_stat->rettyplen, sum_stat->rettypbyval);

while the wrapper signature is:

    datumCopy(Datum value, bool typByVal, int typLen)

This should likely be:

    cbdb::datumCopy(newval, sum_stat->rettypbyval, sum_stat->rettyplen);

Similar calls may also exist in MicroPartitionStats::MergeTo().
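
The following sketch shows why a swapped call like this would compile without errors. It is a stand-alone illustration, not the PAX source: CopyArgs, RecordCopyArgs, SumStatMeta, SwappedCall, and CorrectCall are hypothetical names, and only the (bool typByVal, int typLen) parameter order is taken from the signature quoted above. C++ implicitly converts int to bool and bool to int, so for a numeric-like result (typlen -1, typbyval false) the swapped call silently becomes typByVal = true, typLen = 0.

```cpp
#include <cassert>

// Hypothetical record of what a datumCopy-style wrapper received.
struct CopyArgs {
    bool typ_by_val;
    int typ_len;
};

// Stand-in with the same parameter order as the wrapper signature
// quoted in the issue; it only records its arguments.
CopyArgs RecordCopyArgs(bool typByVal, int typLen) {
    return CopyArgs{typByVal, typLen};
}

// Hypothetical return-type metadata for a SUM statistic.
struct SumStatMeta {
    int rettyplen;     // -1 means variable length
    bool rettypbyval;  // false means passed by reference
};

// numeric: variable length, passed by reference.
constexpr SumStatMeta kNumericSum{-1, false};

// Arguments in the order the issue reports as suspicious.
// int -> bool and bool -> int convert implicitly, so this compiles.
CopyArgs SwappedCall(const SumStatMeta& m) {
    return RecordCopyArgs(m.rettyplen, m.rettypbyval);
}

// Arguments in the order the wrapper signature expects.
CopyArgs CorrectCall(const SumStatMeta& m) {
    return RecordCopyArgs(m.rettypbyval, m.rettyplen);
}
```

With kNumericSum, SwappedCall reports a by-value type of length 0, whereas CorrectCall reports a by-reference, variable-length type. A copy routine given the swapped arguments would treat a pointer Datum as an inline value and skip the deep copy, so the result could later point at freed memory. Whether this contributes to the reported SIGSEGV is an assumption that would need to be checked against the real code.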

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
