MDEV-38474: ASAN heap-use-after-free in st_join_table::cleanup#4726
MDEV-38474: ASAN heap-use-after-free in st_join_table::cleanup#4726DaveGosselin-MariaDB merged 1 commit intoMariaDB:11.4from
Conversation
| delete pushdown_unit; | ||
| pushdown_unit= nullptr; | ||
|
|
||
| cleanup_stranded_units(); |
There was a problem hiding this comment.
This changes the cleanup order of units, so the comment for void st_select_lex_unit::remember_my_cleanup() introduced by commit 34a8209 needs to be updated.
There was a problem hiding this comment.
Add a new comment here explaining why this call should appear at the end of st_select_lex_unit::cleanup.
There was a problem hiding this comment.
Thanks. I was hoping for a bit more detail in the analysis to make its way into this comment. Perhaps that would better appear in the git commit message. In any case, yes, you've included a test case and have added the comment, but it's not clear why we need to delay the stranded unit cleanup. Can you explain plainly why this fixes the problem and with some more details? You say that the parent-first "is required for the safe destruction of merged tables" (and I'm not doubting) but can you explain why? For some inspiration, see the commit message of the commit that you referenced in the description, 34a8209 as well as the various code-level comments in that commit.
There was a problem hiding this comment.
@DaveGosselin-MariaDB I will try with an example. Let's say, we have a query:
SELECT * FROM t1
WHERE a IN (
SELECT b FROM t2
WHERE a IN (
SELECT c FROM t3 WHERE FALSE HAVING c < 0
)
);
Where, Unit1 is the main query, Unit2 is the middle query, and Unit3 is the innermost query.
The list will be Unit1->Unit2->Unit3
Unit1->cleanup() begins, calls cleanup_stranded_units(), which again calls Unit2->cleanup(), calls cleanup_stranded_units(), which again calls, Unit3->cleanup(). Here, any heap allocated objects related to join are cleared. When the control reaches back to Unit2->cleanup(), it now tries its own local cleanup. Since, Unit3 was merged into Unit2, and the resources have already been freed, we have dangling pointers in Unit2 that still refers to Unit3's structures. This leads to accessing freed memory.
Moving cleanup_stranded_units() to the end of cleanup call fixes this, because Unit2 (the parent) completes its own local cleanup first (and thereby, clearing its references to any structures). Only then, Unit3's cleanup is called. Each unit cleanups up its own first, then its child.
DaveGosselin-MariaDB
left a comment
There was a problem hiding this comment.
Hi @abhishek593 , thanks for the analysis follow-up. Please update the git commit message to include a summary of your analysis as another paragraph. Then I'll be able to approve and merge it.
|
@DaveGosselin-MariaDB I have updated the commit message. |
cleanup_stranded_units() was added at the start of st_select_lex_unit::cleanup() by 34a8209. This causes a use-after-free when nested subqueries are merged into their parent unit. With nested subqueries like: SELECT * FROM t1 WHERE a IN (SELECT b FROM t2 WHERE a IN (SELECT c FROM t3 WHERE FALSE HAVING c < 0)); the stranded_clean_list chains the units as: Unit1 -> Unit2 -> Unit3. Because cleanup_stranded_units() was called first, Unit1->cleanup() would recursively trigger Unit2->cleanup(), which in turn would trigger Unit3->cleanup(). Unit3's cleanup frees its heap-allocated join structures. But since Unit3 was merged into Unit2, Unit2 still holds references to Unit3's structures (e.g., st_join_table). When control returns to Unit2 for its own local cleanup, it accesses already-freed memory. Fix: move cleanup_stranded_units() to the end of cleanup(). This way, each unit completes its own local cleanup first—clearing its references to any child structures—before triggering cleanup of its stranded (child) units. This enforces a parent-first cleanup order.
28f12e5 to
05594fc
Compare
Fix a regression introduced by 34a8209 which added cleanup_stranded_units() to the start of st_select_lex_unit::cleanup().
Changes: