Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DELETE .. ORDER BY .. LIMIT on partitioned tables without unique constraint uses non-unique ctid for self joins #16569

Closed
dwilches opened this issue Apr 19, 2024 · 8 comments

Comments

@dwilches
Copy link

Expected behavior

The following statement should attempt to remove 5 rows from the table:

deleteFrom(MY_TABLE)
    .where(MY_TABLE.RESULT_TS.lt(deleteBeforeThis))
    .orderBy(MY_TABLE.RESULT_TS)
    .limit(5)

Actual behavior

Instead, if MY_TABLE is a partitioned table, that statement is attempting to delete 5 rows of each partition, as the SQL generated is:

delete from MY_TABLE
where MY_TABLE.ctid in (
  select MY_TABLE.ctid
  from MY_TABLE
  where MY_TABLE.RESULT_TS < timestamp with time zone '2023-03-16 12:00:00+00:00'
  order by MY_TABLE.RESULT_TS
  fetch next 5 rows only
);

The issue is, that subselect returns a list of ctid and teh delete attmepts to delete by ctid, which in the case of a partitioned table is not unique:

SELECT ctid FROM MY_TABLE LIMIT 1;
 ctid  
-------
 (0,1)

But:

SELECT count(*) FROM MY_TABLE WHERE ctid='(0,1)';
 count 
-------
     2

Steps to reproduce the problem

Create a partitioned table with 2 partitions, and 1 record each one, then do SELECT ctid FROM MY_TABLE to see the ctids are not unique.

jOOQ Version

jOOQ 3.15.8

Database product and version

PostgreSQL 14.11 (Ubuntu 14.11-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

Java Version

openjdk 11

JDBC / R2DBC driver name and version (include name if unofficial driver)

No response

@dwilches
Copy link
Author

dwilches commented Apr 19, 2024

A possible fix that works for us: because we know which fields are the PK of our table (and which one is the partitioning key), if DELETE ... ORDER BY ... LIMIT gave us a way to specify which field to use instead of ctid, we could use that to pick the right columns that make our rows unique.

@lukaseder
Copy link
Member

Thanks for your report. I'll look into this right away.

@lukaseder
Copy link
Member

@lukaseder
Copy link
Member

In a test schema like this:

create table t_partitioned (
  id int not null,
  d date not null,
  
  constraint pk_t_partitioned primary key (id, d)
) partition by range (d)
;

create table t_partitioned_200001 partition of t_partitioned
  for values from ('2000-01-01') to ('2000-02-01')
;

create table t_partitioned_200002 partition of t_partitioned
  for values from ('2000-02-01') to ('2000-03-01')
;

This test here works just fine:

    public void testPartitionedDeleteLimit() {
        assumeNotNull(TPartitioned());

        try {
            insertPartitioned();

            assertEquals(updateCount(2),
            create().deleteFrom(TPartitioned())
                    .orderBy(TPartitioned_ID())
                    .limit(2)
                    .execute());

            assertEquals(8, fetchCount(TPartitioned()));
        }
        finally {
            reset(TPartitioned());
        }
    }

    private void insertPartitioned() {
        create().insertInto(TPartitioned())
                .columns(TPartitioned_ID(), TPartitioned_D())
                .values(1, Date.valueOf("2000-01-01"))
                .values(2, Date.valueOf("2000-01-02"))
                .values(3, Date.valueOf("2000-01-03"))
                .values(4, Date.valueOf("2000-01-04"))
                .values(5, Date.valueOf("2000-01-05"))
                .values(11, Date.valueOf("2000-02-01"))
                .values(12, Date.valueOf("2000-02-02"))
                .values(13, Date.valueOf("2000-02-03"))
                .values(14, Date.valueOf("2000-02-04"))
                .values(15, Date.valueOf("2000-02-05"))
                .execute();
    }

The difference is probably that my test table has an explicit primary key. Yours probably doesn't? You can add a synthetic key to your generated code like this:
https://www.jooq.org/doc/latest/manual/code-generation/codegen-advanced/codegen-config-database/codegen-database-synthetic-objects/codegen-database-synthetic-pks/

I'll look into the case where there's no primary key as well...

@lukaseder
Copy link
Member

Here's a suggested fix for the case where jOOQ doesn't have a primary key:

We should compare (tableoid, ctid) pairs in PostgreSQL, instead.

@lukaseder
Copy link
Member

lukaseder commented Apr 22, 2024

Perhaps, the Table.rowid() expression should produce this (tableoid, ctid) pair as well, though that would be a backwards incompatible change:

For this fix, I won't change the Table.rowid() behaviour yet (at least not for backports).

@lukaseder lukaseder changed the title DELETE .. ORDER BY .. LIMIT issue with partitioned tables DELETE .. ORDER BY .. LIMIT on partitioned tables without unique constraint uses non-unique ctid for self joins Apr 22, 2024
3.20 Other improvements automation moved this from To do to Done Apr 22, 2024
lukaseder added a commit that referenced this issue Apr 22, 2024
without unique constraint uses non-unique ctid for self joins
lukaseder added a commit that referenced this issue Apr 22, 2024
without unique constraint uses non-unique ctid for self joins
lukaseder added a commit that referenced this issue Apr 22, 2024
without unique constraint uses non-unique ctid for self joins
lukaseder added a commit that referenced this issue Apr 22, 2024
without unique constraint uses non-unique ctid for self joins
@dwilches
Copy link
Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants