Null pointer in equalTupleDescs crashes the server #3825
Comments
There is another crash on the same coordinator with the exact call stack, with another |
We've seen this issue before, and had the internal conversations (e-mail titled
So, we think that this might be a PostgreSQL issue. Though, we should investigate more and probably submit a patch. |
We have 5 new crash dumps with the same call stack, from May 12-13, on a coordinator running PG 11 and Citus 9.0 |
Another crash with the same stack, on the coordinator node of a formation running Citus 9.2 and PG 12 |
Is it happening on the same cluster or on different clusters @kileri ? |
The formation I looked into was called <Customer 1> Production Cluster. I am not sure whether the previous 5 occurrences @mtuncer came across were on the same formation, though. |
Another crash has happened on <Customer 1> Production Cluster two days ago. |
@onlined It'd be useful to check the schema of <Customer 1> and see if there is anything unusual? Like a [custom] data type. Or, queries returning unusual tuples? Or anything that seems unusual? |
I investigated this further and found that there is a single task in the task list passed to SELECT a, b, c, d, e, f, g, h, i, j, k
FROM public.tbl_123456 tbl
WHERE ((b OPERATOR(pg_catalog.=) 123123) AND
((g)::text OPERATOR(pg_catalog.=) 'text1'::text) AND
((d)::text OPERATOR(pg_catalog.=) 'text2'::text) AND
((e)::text OPERATOR(pg_catalog.=) 'text3'::text))
|
I hit another core dump on the same formation, and most likely the same table. Our dump_backup folder now has 8 core dumps. The query was a single-table router select query, where the coordinator has almost nothing to do for execution. |
We had 3 occurrences on the same server: Oct. 16, 22, and Nov. 4 |
Another occurrence which presumably does not involve Citus: |
It appeared on two coordinators at Citus Cloud on Dec. 18 and 28. |
Logged another 4 hits from Dec 30 to Jan 9 |
2 more on Jan 26 |
my current understanding of why this might happen is this:
I have manually added an error to
I believe the above has the same root cause as this issue. One thing that came to my mind was to remove this line that sets it to NULL: https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1984 |
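To make the failure mode discussed here concrete, below is a minimal toy sketch of how a cache entry whose descriptor pointer has been set to NULL can break a comparison that dereferences it. This is not PostgreSQL code: `ToyDesc`, `ToyCacheEntry`, `toy_equal_descs`, and `toy_entry_matches` are hypothetical names invented for illustration. It shows a defensive NULL check on the lookup side, which is a different approach from removing the line linked above and is not meant to represent the patch sent upstream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple descriptor; not PostgreSQL's TupleDesc. */
typedef struct ToyDesc {
    int natts;  /* number of attributes */
    int typid;  /* toy type identifier */
} ToyDesc;

/* Hypothetical cache entry; desc == NULL models an entry that was reset. */
typedef struct ToyCacheEntry {
    ToyDesc *desc;
} ToyCacheEntry;

/*
 * Unguarded comparison: it dereferences both pointers, so the caller must
 * never pass NULL. The assert only documents that precondition; in a build
 * with assertions disabled a NULL argument would be dereferenced and crash.
 */
bool toy_equal_descs(const ToyDesc *a, const ToyDesc *b)
{
    assert(a != NULL && b != NULL);
    return a->natts == b->natts && a->typid == b->typid;
}

/*
 * Guarded lookup-side comparison: an entry without a descriptor is treated
 * as "does not match" instead of being handed to toy_equal_descs.
 */
bool toy_entry_matches(const ToyCacheEntry *entry, const ToyDesc *key)
{
    if (entry == NULL || entry->desc == NULL || key == NULL)
        return false;
    return toy_equal_descs(entry->desc, key);
}
```

In this toy model, calling `toy_entry_matches` on an entry whose `desc` is NULL returns false rather than crashing. Whether a guard like this or avoiding the NULL entry in the first place is the right fix depends on the real cache's invariants, which this sketch does not model.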
I have sent a patch to postgres for this issue and it is still active: https://www.postgresql.org/message-id/flat/3229167.1617210650%40sss.pgh.pa.us#9ac0555c613861cc8b4d2934185018d9 |
This is merged into Postgres and backported as far back as PG11. |
This trace was taken from a recent select query crash dump
Citus version is 9.2-2 on PG 12.2
Query is
select *
with 2 filters on 2 non-distribution columns