Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] gp_aoseg_name may return wrong value. #215

Closed
1 of 2 tasks
shmiwy opened this issue Sep 21, 2023 · 5 comments · Fixed by #248
Closed
1 of 2 tasks

[Bug] gp_aoseg_name may return wrong value. #215

shmiwy opened this issue Sep 21, 2023 · 5 comments · Fixed by #248
Assignees
Labels
type: Bug Something isn't working

Comments

@shmiwy
Copy link

shmiwy commented Sep 21, 2023

Cloudberry Database version

No response

What happened

SET gp_create_table_random_default_distribution=off;
DROP TABLE IF EXISTS foo;

CREATE TABLE foo (a INT, b INT, c CHAR(128)) WITH (appendonly=true);
CREATE INDEX foo_index ON foo(b);
INSERT INTO foo SELECT i as a, 1 as b, 'hello world' as c FROM generate_series(1, 100) AS i;

SET optimizer=off;

VACUUM foo;
DELETE FROM foo WHERE a < 4;
SELECT COUNT(*) FROM foo;

SELECT count(*) FROM pg_class WHERE relname='foo';

SELECT segno, tupcount, state FROM gp_aoseg_name('foo');
VACUUM full foo;
SELECT segno, tupcount, state FROM gp_aoseg_name('foo');
DELETE FROM foo WHERE a < 12;
SELECT segno, tupcount, state FROM gp_aoseg_name('foo'); 

drop function gp_aoseg_name;

sql with gp_aoseg_name appears to return some useless rows (tupcount = 0)

gpadmin=# SELECT COUNT(*) FROM foo;
 count
-------
    97
(1 row)

gpadmin=#
gpadmin=# SELECT count(*) FROM pg_class WHERE relname='foo';
 count
-------
     1
(1 row)

gpadmin=#
gpadmin=# SELECT segno, tupcount, state FROM gp_aoseg_name('foo');
 segno | tupcount | state
-------+----------+-------
     1 |      100 |     1
     2 |        0 |     1
     3 |        0 |     1
     4 |        0 |     1
(4 rows)

gpadmin=# VACUUM full foo;
VACUUM
gpadmin=# SELECT segno, tupcount, state FROM gp_aoseg_name('foo');
 segno | tupcount | state
-------+----------+-------
     1 |       25 |     1
     2 |       72 |     1
     3 |        0 |     1
     4 |        0 |     1
(4 rows)

gpadmin=# DELETE FROM foo WHERE a < 12;
DELETE 8
gpadmin=# SELECT segno, tupcount, state FROM gp_aoseg_name('foo');
 segno | tupcount | state
-------+----------+-------
     1 |       25 |     1
     2 |       72 |     1
     3 |        0 |     1
     4 |        0 |     1
(4 rows)

What you think should happen instead

There should be no rows with tupcount=0

How to reproduce

rerun the sql.

Operating System

centos7

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

@shmiwy shmiwy added the type: Bug Something isn't working label Sep 21, 2023
@Ray-Eldath Ray-Eldath self-assigned this Sep 21, 2023
@Ray-Eldath
Copy link
Contributor

Ray-Eldath commented Sep 22, 2023

This is also caused by gp_appendonly_insert_files. when setting to 1 there's no lines with tupcount=0, and when setting to 8 there's seven (8-1) empty lines. should this be considered "in-fact-a-feature-not-bug" as well? @avamingli

@avamingli
Copy link
Collaborator

avamingli commented Sep 22, 2023

This is also caused by gp_appendonly_insert_files. when setting to 1 there's no lines with tupcount=0, and when setting to 8 there's seven (8-1) empty lines.

Agree, it's not a bug, as we will open gp_appendonly_insert_files segfiles anyway during insertion.

We could do better if there are no enough input data, the default gp_appendonly_insert_files_tuples_range is 10W,
if there are less than gp_appendonly_insert_files plus gp_appendonly_insert_files_tuples_range rows to be inserted, then we could open less segfiles than gp_appendonly_insert_files.
If it is optimized, the example of this issue could be avoided.

@avamingli
Copy link
Collaborator

gp_appendonly_insert_files plus gp_appendonly_insert_files_tuples_range

gp_appendonly_insert_files multiply gp_appendonly_insert_files_tuples_range

@avamingli
Copy link
Collaborator

This is also caused by gp_appendonly_insert_files. when setting to 1 there's no lines with tupcount=0, and when setting to 8 there's seven (8-1) empty lines.

Agree, it's not a bug, as we will open gp_appendonly_insert_files segfiles anyway during insertion.

We could do better if there are no enough input data, the default gp_appendonly_insert_files_tuples_range is 10W, if there are less than gp_appendonly_insert_files plus gp_appendonly_insert_files_tuples_range rows to be inserted, then we could open less segfiles than gp_appendonly_insert_files. If it is optimized, the example of this issue could be avoided.

I will take this one soon.

@congxuebin
Copy link
Collaborator

        CREATE FUNCTION gp_aoseg_name("tname" varchar)
        RETURNS TABLE("segno" integer, "tupcount" numeric, "state" smallint, "modcount" numeric) AS \$\$
            DECLARE
            aoseg_table_name varchar;
            subquery varchar;
            begin
            select concat('pg_aoseg.pg_aoseg_', array_to_string(array(select oid from pg_class where relname=tname), '')) into aoseg_table_name;
            
            subquery := 'select segno, sum(tupcount) as tupcount, state, sum(modcount) as modcount from gp_dist_random(''' || aoseg_table_name || ''') group by segno, state order by segno ';
            RETURN QUERY execute subquery;
            END\$\$
        LANGUAGE plpgsql VOLATILE
        COST 100;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants