
Crash in auto_explain with recursive plans #2920

Closed
Tracked by #6448
metdos opened this issue Sep 2, 2019 · 5 comments · Fixed by #6406

Comments

@metdos
Contributor

metdos commented Sep 2, 2019

Could be related to #2009.

marcocitus added the bug label Oct 16, 2019
@pykello
Contributor

pykello commented May 11, 2020

I tried running multi_check with auto_explain loaded. It seems to crash on tests that use recursive planning, with a backtrace like the following:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fad43cdc859 in __GI_abort () at abort.c:79
#2  0x00005653e3f4363a in ExceptionalCondition (
    conditionName=conditionName@entry=0x5653e415c780 "!(ActiveSnapshot != ((void *)0))", 
    errorType=errorType@entry=0x5653e3f9901d "FailedAssertion", fileName=fileName@entry=0x5653e415c547 "snapmgr.c", 
    lineNumber=lineNumber@entry=843) at assert.c:54
#3  0x00005653e3f85715 in GetActiveSnapshot () at snapmgr.c:843
#4  0x00005653e3f861ca in GetActiveSnapshot () at snapmgr.c:845
#5  0x00005653e3c2eaa8 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x7fad3590b9a0, into=into@entry=0x0, 
    es=es@entry=0x5653e5b59198, queryString=queryString@entry=0x0, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
    planduration=0x7ffcd4ec9ef0) at explain.c:497
#6  0x00007fad40db6d3d in ExplainSubPlans (distributedPlan=0x7fad359139b0, distributedPlan=0x7fad359139b0, es=0x5653e5b59198)
    at planner/multi_explain.c:217
#7  CitusExplainScan (node=<optimized out>, ancestors=<optimized out>, es=0x5653e5b59198) at planner/multi_explain.c:122
#8  0x00005653e3c2c977 in ExplainNode (planstate=<optimized out>, ancestors=ancestors@entry=0x0, 
    relationship=relationship@entry=0x0, plan_name=plan_name@entry=0x0, es=es@entry=0x5653e5b59198) at explain.c:1786
#9  0x00005653e3c2e736 in ExplainPrintPlan (es=es@entry=0x5653e5b59198, queryDesc=queryDesc@entry=0x5653e5c1fa68)
    at explain.c:705
#10 0x00007fad4488256f in explain_ExecutorEnd (queryDesc=0x5653e5c1fa68) at auto_explain.c:388
#11 0x00005653e3c49ece in PortalCleanup (portal=<optimized out>) at portalcmds.c:301
#12 0x00005653e3f74c85 in PortalDrop (portal=0x5653e5b2e408, isTopCommit=<optimized out>) at portalmem.c:499
#13 0x00005653e3e1a49e in exec_simple_query (
    query_string=0x5653e5a57ac8 "with x as (select a, random() from t) select random(), x.* from x;") at postgres.c:1225
#14 0x00005653e3e1bd23 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5653e5af45d0, dbname=<optimized out>, 
    username=<optimized out>) at postgres.c:4247
#15 0x00005653e3d9269a in BackendRun (port=0x5653e5af20b0, port=0x5653e5af20b0) at postmaster.c:4437
#16 BackendStartup (port=0x5653e5af20b0) at postmaster.c:4128
#17 ServerLoop () at postmaster.c:1704
#18 0x00005653e3d93512 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#19 0x00005653e3aba651 in main (argc=3, argv=0x5653e5a51510) at main.c:228

@scottybrisbane

We'd love to see Citus support auto_explain.

@hslightdb
Contributor

hslightdb commented Feb 8, 2022

Fixed. This is a bug in auto_explain that can occur under any CustomScan node; Citus is only a special case. After PortalRun, PostgreSQL pops all the snapshots it pushed, so auto_explain needs to manage its own snapshot.
We released an auto_explain fork that fixes the issue:
https://github.com/hslightdb/auto_explain

@scottybrisbane

fixed. it is auto_explain's bug, but will occured under customscan node, citus is only a special case. after portalrun, postgresql will pop all snapshot it pushed. so auto_explain need to manage snapshot itself. we released an auto_explain fork by fixing the issue. https://github.com/hslightdb/auto_explain

Has this fix been raised or proposed upstream at all? We'd love to see this fixed upstream.

@onderkalaci
Member

onderkalaci commented Sep 26, 2022

An easy way to reproduce this on Citus:

LOAD 'auto_explain';
CREATE TABLE test(a int);
SELECT create_distributed_table('test', 'a');
INSERT INTO test SELECT i FROM generate_series(0, 1000000) i;

SET auto_explain.log_min_duration TO 0;
WITH cte_1 AS (SELECT * FROM test LIMIT 1) SELECT count(*) FROM cte_1;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 48.723 ms

emelsimsek added a commit that referenced this issue Oct 7, 2022
DESCRIPTION: Fixes a bug that causes a crash when using auto_explain with recursive queries.
emelsimsek added a commit that referenced this issue Oct 11, 2022
…tension.

The crash happens with recursively planned queries. For such queries, subplans are explained via the ExplainOnePlan function of PostgreSQL. This function reconstructs the query description from the plan, so it expects the ActiveSnapshot for the query to be available. This fix makes sure that the snapshot is on the stack before calling ExplainOnePlan.

Fixes #2920.
onderkalaci changed the title from "Crash in auto_explain" to "Crash in auto_explain with recursive plans" Oct 18, 2022
emelsimsek added a commit that referenced this issue Oct 19, 2022
emelsimsek added a commit that referenced this issue Oct 19, 2022
…tension.
emelsimsek added a commit that referenced this issue Oct 19, 2022
…ursive queries (#6406)

This crash happens with recursively planned queries. For such queries,
subplans are explained via the ExplainOnePlan function of PostgreSQL.
This function reconstructs the query description from the plan, so
it expects the ActiveSnapshot for the query to be available. This fix makes
sure that the snapshot is on the stack before calling ExplainOnePlan.

Fixes #2920.
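The snapshot guard described in the commit message can be illustrated with a small, self-contained C model. This is a sketch under stated assumptions, not the Citus patch: the snapshot stack and every toy_* function below are hypothetical stand-ins for PostgreSQL's snapmgr.c (PushActiveSnapshot, PopActiveSnapshot, GetActiveSnapshot) and for ExplainOnePlan. It shows the pattern of pushing a snapshot only when none is active and popping only what was pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of PostgreSQL's active-snapshot stack. All names here are
 * hypothetical stand-ins; they are NOT the real snapmgr.c API.
 */
#define MAX_SNAPSHOTS 16

typedef struct ToySnapshot { int xmin; } ToySnapshot;

static ToySnapshot *active_stack[MAX_SNAPSHOTS];
static int active_depth = 0;
static ToySnapshot transaction_snapshot = { 42 };

static bool toy_active_snapshot_set(void) { return active_depth > 0; }

static void toy_push_active_snapshot(ToySnapshot *snap)
{
    assert(active_depth < MAX_SNAPSHOTS);
    active_stack[active_depth++] = snap;
}

static void toy_pop_active_snapshot(void)
{
    assert(active_depth > 0);
    active_depth--;
}

/* Mirrors the failing Assert(ActiveSnapshot != NULL) in GetActiveSnapshot(). */
static ToySnapshot *toy_get_active_snapshot(void)
{
    assert(active_depth > 0);
    return active_stack[active_depth - 1];
}

/* Hypothetical stand-in for ExplainOnePlan(): needs an active snapshot. */
static int toy_explain_one_plan(void)
{
    return toy_get_active_snapshot()->xmin;
}

/*
 * Guarded subplan explain: push a snapshot only if none is active (as when
 * called from an ExecutorEnd hook after portal cleanup has popped the
 * query's snapshots), and pop only the snapshot this function pushed.
 */
static int toy_explain_subplan_guarded(void)
{
    bool pushed = false;

    if (!toy_active_snapshot_set())
    {
        toy_push_active_snapshot(&transaction_snapshot);
        pushed = true;
    }

    int result = toy_explain_one_plan();

    if (pushed)
        toy_pop_active_snapshot();

    return result;
}
```

With an empty stack (the auto_explain case), toy_explain_subplan_guarded() succeeds and leaves the stack empty, whereas calling toy_explain_one_plan() directly in that state trips the same kind of assertion seen in GetActiveSnapshot() in the backtrace above. When a snapshot is already active, the guard uses it and leaves the stack untouched.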