[Bug] count(*) query crashed and db in recovery mode #212

liyxbeijing · 2023-09-21T01:52:57Z

Cloudberry Database version

warehouse=# select version();
version

PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-b
it compiled on Aug 3 2023 10:15:47
(1 row)

What happened

count(*) query crashed:

# \d+ zzz_admin.aaa
                                         Table "zzz_admin.aaa"
    Column     |               Type                | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
---------------+-----------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 table_catalog | information_schema.sql_identifier |           |          |         | plain    |             |              | 
 table_schema  | information_schema.sql_identifier |           |          |         | plain    |             |              | 
 table_name    | information_schema.sql_identifier |           |          |         | plain    |             |              | 
 dt            | text                              |           |          |         | extended |             |              | 
Distributed by: (table_catalog)
Access method: heap

=# select count(*) from zzz_admin.aaa;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q

logs:

2023-09-21 08:40:37.498183 CST,"gpadmin","",p433642,th-891975552,"[local]",,2023-09-21 08:38:56 CST,0,con21392,cmd29,seg-1,,,,sx1,"LOG","00000","statement: select count(*) from zzz_admin.aaa;",,,,,,,0,,"postgres.c",1708,
2023-09-21 08:40:37.551299 CST,,,p433642,th123456,,,2023-09-21 08:38:56 CST,0,con21392,cmd29,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1    0x7fb5c8cc6630 libpthread.so.0 <symbol not found> + 0xc8cc6630
2    0xd5cc6c postgres DirectFunctionCall1Coll (fmgr.c:806)
"
2023-09-21 08:40:37.651110 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 433642) was terminated by signal 11: Segmentation fault","Failed process was running: select count(*) from zzz_admin.aaa;",,,,,,0,,"postmaster.c",4229,
2023-09-21 08:40:37.651133 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","terminating any other active server processes",,,,,,,0,,"postmaster.c",3959,
2023-09-21 08:40:37.652496 CST,,,p75086,th-891975552,,,,0,,,seg-1,,,,,"WARNING","01000","ic-proxy-server: received signal 3",,,,,,,0,,"ic_proxy_main.c",474,
2023-09-21 08:40:37.652519 CST,"gpadmin","",p434360,th-891975552,"[local]",,2023-09-21 08:40:37 CST,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at 1C/27F1B788",,,,,,0,,"postmaster.c",2746,
2023-09-21 08:40:37.652569 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","background worker ""ic proxy process"" (PID 75086) exited with exit code 1",,,,,,,0,,"postmaster.c",4208,
2023-09-21 08:40:37.656272 CST,"gpadmin",,p434361,th-891975552,"","18994",2023-09-21 08:40:37 CST,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at 1C/27F1B788",,,,,,0,,"postmaster.c",2746,
2023-09-21 08:40:37.665056 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","all server processes terminated; reinitializing",,,,,,,0,,"postmaster.c",4515,
2023-09-21 08:40:37.831100 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was interrupted; last known up at 2023-09-21 08:38:41 CST",,,,,,,0,,"xlog.c",6816,
2023-09-21 08:40:37.831129 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","Synchronization of the wal directory starts.",,,,,,,0,,"fd.c",3446,
2023-09-21 08:40:37.831191 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","synchronization of the wal directory finishes.",,,,,,,0,,"fd.c",3448,
2023-09-21 08:40:37.831701 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",7385,
2023-09-21 08:40:37.831701 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",7385,
2023-09-21 08:40:37.974973 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","redo starts at 1C/4035D810",,,,,,,0,,"xlog.c",7674,
2023-09-21 08:40:37.980902 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","invalid record length at 1C/40918430: wanted 24, got 0",,,,,,,0,,"xlog.c",4482,
2023-09-21 08:40:37.980911 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","redo done at 1C/40918260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s",,,,,,,0,,"xlog.c",7962,
2023-09-21 08:40:38.265962 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 1C/40918430",,,,,,,0,,"xlog.c",8051,
2023-09-21 08:40:38.281916 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 1719112 and next transaction id is 1719113",,,,,,,0,,"xlog.c",8449,
2023-09-21 08:40:38.281968 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",8476,
2023-09-21 08:40:38.285848 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit compiled on Aug  3 2023 10:15:10",,,,,,,0,,"postmaster.c",3538,
2023-09-21 08:40:38.285861 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit compiled on Aug  3 2023 10:15:10",,,,,,0,,"postmaster.c",3540,
2023-09-21 08:40:38.291444 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","initialized 1 resource queues",,,,,,,0,,"resscheduler.c",267,
2023-09-21 08:40:38.362261 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","Crash recovery broadcast of the distributed transaction 'Commit Prepared' broadcast succeeded for gid = 2015839.",,,,,,,0,,"cdbdtxrecovery.c",84,
2023-09-21 08:40:38.364541 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","DTM Started",,,,,,,0,,"cdbdtxrecovery.c",155,
2023-09-21 08:40:38.368227 CST,,,p434374,th-891975552,,,,0,con4,,seg-1,,,,,"LOG","00000","pg_cron scheduler started",,,,,,,0,,"pg_cron.c",376,
2023-09-21 08:40:42.656687 CST,"gpadmin",,p434423,th-891975552,"","19002",2023-09-21 08:40:42 CST,0,,,seg-1,,,,sx1,"LOG","00000","rejecting TCP connection to master using internalconnection protocol",,,,,,,0,,"auth.c",570,
2023-09-21 08:40:42.697845 CST,"gpadmin",,p434423,th-891975552,"","19002",2023-09-21 08:40:42 CST,0,,,seg-1,,,,,"LOG","00000","standby ""gp_walreceiver"" is now a synchronous standby with priority 1",,,,,,"START_REPLICATION 1C/40000000 TIMELINE 1",0,,"syncrep.c",671,

What you think should happen instead

No response

How to reproduce

rerun the query.

Operating System

centos 7.9

Anything else

No response

Are you willing to submit PR?

Yes, I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct.

The text was updated successfully, but these errors were encountered:

congxuebin · 2023-11-27T02:57:43Z

Just a reminder, this issue was marked as priority high by Max.

liyxbeijing added the type: Bug Something isn't working label Sep 21, 2023

my-ship-it added the priority: High After critical issues are fixed, these should be dealt with before any further issues. label Nov 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] count(*) query crashed and db in recovery mode #212

[Bug] count(*) query crashed and db in recovery mode #212

liyxbeijing commented Sep 21, 2023

congxuebin commented Nov 27, 2023

[Bug] count(*) query crashed and db in recovery mode #212

[Bug] count(*) query crashed and db in recovery mode #212

Comments

liyxbeijing commented Sep 21, 2023

Cloudberry Database version

What happened

What you think should happen instead

How to reproduce

Operating System

Anything else

Are you willing to submit PR?

Code of Conduct

congxuebin commented Nov 27, 2023