Fixed RELCACHE_FORCE_RELEASE related use-after-free bugs #8477

Merged
onurctirtir merged 2 commits into citusdata:main from berndreiss:main
Feb 16, 2026

@berndreiss
Contributor

BUG REPORT FOR PROPOSED CHANGES

Summary

Three use-after-free bugs exist in Citus where Relation struct members are accessed after
relation_close() or table_close() has been called. Under normal operation these may go
unnoticed because the relcache entry often remains cached in memory, but they are real memory
safety violations that cause segfaults and incorrect behavior when PostgreSQL is built
with RELCACHE_FORCE_RELEASE (which forces immediate eviction of relcache entries on close).

Versions

  • Citus: main branch at commit e1875b93 ("Add changelog entry for Citus v14.0.0")
  • PostgreSQL: REL_18_STABLE (18.2)

Note

The following bug sections assume that the extension is installed. Each reproduction
is run against three builds: A. vanilla PostgreSQL and Citus compiled from scratch
without any special compile flags, B. Postgres compiled with -DRELCACHE_FORCE_RELEASE
and Citus compiled without the bug fixes, and C. Postgres compiled with
-DRELCACHE_FORCE_RELEASE and Citus compiled with the bug fixes.

Additionally, the flags -O0 -g and the configure options --enable-debug and
--enable-cassert have been used for Postgres.

Bug 1: index.c:200 - relation->rd_id used after table_close()

The relation is closed at line 176, but relation->rd_id is read at line 200 when
createIndexStatement->idxname == NULL and the index has expressions. At this point
the Relation struct may have been freed.
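
The ordering problem can be illustrated with a small standalone C model. This is a sketch under stated assumptions, not the Citus code or the actual patch: MockRelation, mock_table_open() and mock_table_close() are hypothetical stand-ins, and the close merely overwrites caller-owned storage with 0x7F bytes to imitate a forced relcache release on an assert-enabled build. Copying rd_id into a local before the close is one plausible way to avoid the stale read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relcache entry; not the real PostgreSQL struct. */
typedef struct MockRelation
{
    uint32_t rd_id;   /* models the relation OID */
    int      refcnt;  /* models the relcache reference count */
} MockRelation;

/* Open: fill the caller-provided slot and take a reference. */
void
mock_table_open(MockRelation *rel, uint32_t oid)
{
    assert(rel != NULL);
    rel->rd_id = oid;
    rel->refcnt = 1;
}

/*
 * Close: drop the reference and overwrite the entry with 0x7F bytes.
 * The storage is owned by the caller, so reading it afterwards is
 * well-defined in this model (unlike a real use-after-free).
 */
void
mock_table_close(MockRelation *rel)
{
    assert(rel != NULL && rel->refcnt == 1);
    memset(rel, 0x7F, sizeof(*rel));
}

/* Buggy ordering: the OID is read after the close. */
uint32_t
oid_read_after_close(uint32_t oid)
{
    MockRelation rel;

    mock_table_open(&rel, oid);
    mock_table_close(&rel);
    return rel.rd_id;               /* sees the wipe pattern, not the OID */
}

/* Fixed ordering: copy the OID into a local before the close. */
uint32_t
oid_copied_before_close(uint32_t oid)
{
    MockRelation rel;

    mock_table_open(&rel, oid);
    uint32_t relationId = rel.rd_id;    /* read while still open */
    mock_table_close(&rel);
    return relationId;
}
```

In this model, oid_read_after_close(16384) returns 0x7F7F7F7F, while oid_copied_before_close(16384) returns 16384.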

Reproduction

CREATE TABLE test_index_uaf (id int, value numeric);
SELECT create_distributed_table('test_index_uaf', 'id');

-- Expression index with no explicit name triggers the bug
CREATE INDEX ON test_index_uaf ((value + 1));

Observed behavior

A. Vanilla Postgres:

No error:

CREATE INDEX

B. With RELCACHE_FORCE_RELEASE:

A garbage OID read from freed memory is used for the relation lookup:

ERROR:  could not open relation with OID 2139062143

2139062143 = 0x7F7F7F7F, the byte pattern that cassert builds write over freed memory, which confirms that the read is from freed memory.
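
The decimal value can be checked with a few lines of C. This is a minimal sketch, assuming only that the wipe fills every freed byte with 0x7F and that an OID is a 4-byte unsigned integer; wiped_oid_value() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the value of a 4-byte integer whose bytes are all 0x7F. */
uint32_t
wiped_oid_value(void)
{
    uint32_t oid = 0;

    memset(&oid, 0x7F, sizeof(oid));

    /* 0x7F7F7F7F is 0x7F repeated in each byte: 127 * 0x01010101 */
    assert(oid == 127u * 0x01010101u);
    return oid;
}
```

Because all four bytes are identical, the result does not depend on the machine's byte order.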

C. With RELCACHE_FORCE_RELEASE and the bug fix:

No error:

CREATE INDEX

Bug 2: alter_table.c:1366 - relation->rd_rel->relam used after relation_close()

The relation is closed at line 1357, but relation->rd_rel->relam is dereferenced at
lines 1366 and 1370. This involves a double dereference (relation->rd_rel, then ->relam),
which makes it more dangerous: if either relation or rd_rel points to freed memory, the
dereference can segfault.
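
Why this case tends to crash rather than return garbage can be sketched with another standalone C model. The names MockFormClass, MockRelationWithForm, mock_relation_close() and the two helpers are hypothetical, and the model wipes caller-owned storage with 0x7F instead of really freeing it, so nothing here dereferences an invalid pointer. It only shows that after the wipe the stored rd_rel pointer is itself the 0x7F pattern (which is unlikely to be a usable address), and that reading relam before the close avoids both dereferences. It is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; field names mirror the report, layouts do not. */
typedef struct MockFormClass
{
    uint32_t relam;              /* models the access method OID */
} MockFormClass;

typedef struct MockRelationWithForm
{
    MockFormClass *rd_rel;       /* separately stored catalog tuple */
} MockRelationWithForm;

/* Models a forced release with clobbering: both objects are wiped. */
void
mock_relation_close(MockRelationWithForm *rel, MockFormClass *form)
{
    assert(rel != NULL && form != NULL);
    memset(form, 0x7F, sizeof(*form));
    memset(rel, 0x7F, sizeof(*rel));
}

/* True if every byte of the stored rd_rel pointer is 0x7F. */
bool
rd_rel_pointer_is_wiped(const MockRelationWithForm *rel)
{
    unsigned char bytes[sizeof(rel->rd_rel)];

    /* Inspect the pointer's bytes without using it as a pointer. */
    memcpy(bytes, &rel->rd_rel, sizeof(bytes));
    for (size_t i = 0; i < sizeof(bytes); i++)
    {
        if (bytes[i] != 0x7F)
            return false;
    }
    return true;
}

/* Fixed ordering: read relam through both pointers while still open. */
uint32_t
relam_copied_before_close(uint32_t accessMethodId, bool *pointerWiped)
{
    MockFormClass form = { .relam = accessMethodId };
    MockRelationWithForm rel = { .rd_rel = &form };

    uint32_t relam = rel.rd_rel->relam;   /* both dereferences valid here */
    mock_relation_close(&rel, &form);

    /* After the close, rel.rd_rel must not be dereferenced any more. */
    *pointerWiped = rd_rel_pointer_is_wiped(&rel);
    return relam;
}
```

Calling relam_copied_before_close(2, &wiped) returns 2 and sets wiped to true in this model; the value 2 is an arbitrary example input.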

Reproduction

CREATE TABLE test_alter_uaf (id int, value text);
SELECT create_distributed_table('test_alter_uaf', 'id');

-- alter_distributed_table calls CreateTableConversion which has the bug
SELECT alter_distributed_table('test_alter_uaf', shard_count := 4, cascade_to_colocated := false);

Observed behavior

A. Vanilla Postgres:

No error:

NOTICE:  creating a new table for public.test_alter_uaf
NOTICE:  moving the data of public.test_alter_uaf
NOTICE:  dropping the old public.test_alter_uaf
NOTICE:  renaming the new table to public.test_alter_uaf
 alter_distributed_table
-------------------------

(1 row)

B. With RELCACHE_FORCE_RELEASE:

The server process crashed:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
        The connection to the server was lost. Attempting reset: Failed.
        The connection to the server was lost. Attempting reset: Failed.
!?>

The server process output confirms the segmentation fault:

2026-02-14 17:13:31.514 UTC [169] LOG:  client backend (PID 182) was terminated by signal 11: Segmentation fault
2026-02-14 17:13:31.514 UTC [169] DETAIL:  Failed process was running: SELECT alter_distributed_table('test_alter_uaf', shard_count := 4, cascade_to_colocated := false);
2026-02-14 17:13:31.514 UTC [169] LOG:  terminating any other active server processes
2026-02-14 17:13:31.515 UTC [185] FATAL:  the database system is in recovery mode
2026-02-14 17:13:31.515 UTC [169] LOG:  all server processes terminated; reinitializing

C. With RELCACHE_FORCE_RELEASE and the bug fix:

No error:

NOTICE:  creating a new table for public.test_alter_uaf
NOTICE:  moving the data of public.test_alter_uaf
NOTICE:  dropping the old public.test_alter_uaf
NOTICE:  renaming the new table to public.test_alter_uaf
 alter_distributed_table
-------------------------

(1 row)

Bug 3: multi_partitioning_utils.c:342 - RelationGetRelationName(relation) used after relation_close()

The relation is closed at line 338, but RelationGetRelationName(relation) is called at
line 342 inside the ereport. The macro expands to NameStr((relation)->rd_rel->relname),
which involves a double dereference through rd_rel, the same dangerous pattern as Bug 2.
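
Because the name is stored inside the relcache entry, a pointer to it is only good while the relation is open; the characters themselves have to be copied first. A minimal standalone sketch of that copy-first ordering follows, assuming a hypothetical MockNamedRelation with a 64-byte name buffer (modelled on PostgreSQL's default NAMEDATALEN) and a close that wipes it with 0x7F. It uses snprintf() in place of ereport() and is not the actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MOCK_NAMEDATALEN 64     /* models the default NAMEDATALEN */

/* Hypothetical stand-in for a relcache entry that embeds its name. */
typedef struct MockNamedRelation
{
    char relname[MOCK_NAMEDATALEN];
} MockNamedRelation;

/* Models a forced release with clobbering: the name bytes are wiped. */
void
mock_named_relation_close(MockNamedRelation *rel)
{
    assert(rel != NULL);
    memset(rel, 0x7F, sizeof(*rel));
}

/*
 * Build the "neither partitioned nor a partition" message. The name is
 * copied out of the entry before the close, so the message never reads
 * the wiped buffer. Returns what snprintf() returns.
 */
int
format_not_partitioned_error(const char *tableName, char *out, size_t outSize)
{
    MockNamedRelation rel;
    char relationName[MOCK_NAMEDATALEN];

    assert(tableName != NULL && out != NULL);

    /* "open": store the name, truncated to the fixed buffer size */
    snprintf(rel.relname, sizeof(rel.relname), "%s", tableName);

    /* copy the name while the entry is still valid ... */
    snprintf(relationName, sizeof(relationName), "%s", rel.relname);

    /* ... then close; rel.relname is now 0x7F bytes with no terminator */
    mock_named_relation_close(&rel);

    return snprintf(out, outSize,
                    "Fixing shard index names is only applicable to partitioned "
                    "tables or partitions, and \"%s\" is neither",
                    relationName);
}
```

With tableName set to "test_partutil_uaf" and a sufficiently large output buffer, the function produces the same message text as the ERROR line in the observed behavior for this bug.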

Reproduction

CREATE TABLE test_partutil_uaf (id int, value text);
SELECT create_distributed_table('test_partutil_uaf', 'id');

-- Calling fix_partition_shard_index_names on a non-partitioned distributed
-- table takes the else branch at line 336 which closes and then reads
SELECT fix_partition_shard_index_names('test_partutil_uaf'::regclass);

Observed behavior

A. Vanilla Postgres:

An error is thrown:

ERROR:  Fixing shard index names is only applicable to partitioned tables or partitions, and "test_partutil_uaf" is neither

B. With RELCACHE_FORCE_RELEASE:

The server process crashed:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
        The connection to the server was lost. Attempting reset: Failed.
        The connection to the server was lost. Attempting reset: Failed.
!?>

The server process output confirms the segmentation fault:

2026-02-14 17:19:18.574 UTC [169] LOG:  client backend (PID 203) was terminated by signal 11: Segmentation fault
2026-02-14 17:19:18.574 UTC [169] DETAIL:  Failed process was running: SELECT fix_partition_shard_index_names('test_partutil_uaf'::regclass);
2026-02-14 17:19:18.574 UTC [169] LOG:  terminating any other active server processes
2026-02-14 17:19:18.574 UTC [209] LOG:  could not send data to client: Connection reset by peer
2026-02-14 17:19:18.575 UTC [210] FATAL:  the database system is in recovery mode
2026-02-14 17:19:18.575 UTC [169] LOG:  all server processes terminated; reinitializing

C. With RELCACHE_FORCE_RELEASE and the bug fix:

An error is thrown:

ERROR:  Fixing shard index names is only applicable to partitioned tables or partitions, and "test_partutil_uaf" is neither

Impact

  • Bug 1 causes incorrect behavior (wrong OID used for index creation). In the best case
    it errors out; in the worst case it could silently create an index on the wrong relation.
  • Bug 2 crashes the PostgreSQL backend with a segfault, terminating the client connection
    and requiring server recovery.
  • Bug 3 crashes the PostgreSQL backend with a segfault when
    fix_partition_shard_index_names() is called on a non-partitioned distributed table,
    terminating the client connection.

I hope this makes my changes transparent. If you need any further information, just let me know.

Kind regards
Bernd

Member

@onurctirtir left a comment

Thanks for the PR, all code changes make sense to me!

@onurctirtir enabled auto-merge (squash) February 16, 2026 11:11
@onurctirtir merged commit 364beed into citusdata:main Feb 16, 2026
227 of 228 checks passed
@berndreiss
Contributor Author

Glad I could be of help :)
