Add memory leak tests, and fix memory leaks related to repeated table creation/destruction #5537

Mytherin · 2022-11-29T14:28:26Z

Memory Leak Tests

This PR adds a new type of test - the memory leak test. These tests do not look for traditional memory leaks (i.e. forgetting to call free) - but rather detect increases of memory usage by the system when running specific queries or operations in a loop. The former (forgetting to call free) almost never happens in DuckDB as we use RAII and smart pointers to avoid having to manually clean up resources. The latter, however, can happen in case objects are not cleaned up early enough (even if they do get cleaned up eventually when the database shuts down).

These tests are written using the C++ API, and look like this:

TEST_CASE("Rollback create table", "[memoryleak]") {
	if (!TestMemoryLeaks()) {
		return;
	}
	DuckDB db;
	Connection con(db);
	while (true) {
		REQUIRE_NO_FAIL(con.Query("BEGIN"));
		REQUIRE_NO_FAIL(con.Query("CREATE TABLE t2(i INT);"));
		REQUIRE_NO_FAIL(con.Query("ROLLBACK"));
	}
}

Note two things:

We have a specific flag to activate these tests that must be passed to the unittester (--test-memory-leaks) - this is checked by the TestMemoryLeaks() call at the top of the test. If the flag is not passed the tests are skipped.
The test executes queries in a while(true) loop, so it never terminates on its own. This is why the flag is necessary: without it, a regular run of the standard unittest program would hang on these tests.

Instead, these tests are run through the test_memory_leaks.py script, e.g.:

python3 test/memoryleak/test_memory_leaks.py --test="Rollback create table"

This script runs the test until either (a) the program's memory usage stabilizes, i.e. stops growing for more than 10 consecutive seconds, or (b) the timeout is reached (in which case the test is marked as a failure). By default the timeout is 60 seconds.
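The stopping rule can be sketched in C++ as follows. This only illustrates the logic described above and is not the script's code: MemorySample, LeakTestResult and Evaluate are hypothetical names, and "stable" is assumed here to mean that no sample set a new peak for more than 10 seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sample: seconds since the test started and memory usage in bytes.
struct MemorySample {
	double time_sec;
	uint64_t memory_bytes;
};

enum class LeakTestResult { RUNNING, PASSED, FAILED };

// Assumption: memory counts as "stable" once no new peak has been observed
// for more than stable_sec seconds. The real script may use a different rule.
LeakTestResult Evaluate(const std::vector<MemorySample> &samples, double stable_sec = 10.0,
                        double timeout_sec = 60.0) {
	if (samples.empty()) {
		return LeakTestResult::RUNNING;
	}
	uint64_t peak = samples[0].memory_bytes;
	double peak_time = samples[0].time_sec;
	double last_time = samples[0].time_sec;
	for (const auto &sample : samples) {
		assert(sample.time_sec >= last_time); // samples must be in chronological order
		last_time = sample.time_sec;
		if (sample.memory_bytes > peak) {
			// memory is still growing: restart the stability window
			peak = sample.memory_bytes;
			peak_time = sample.time_sec;
		}
		if (sample.time_sec - peak_time > stable_sec) {
			return LeakTestResult::PASSED; // (a) usage stopped growing
		}
		if (sample.time_sec >= timeout_sec) {
			return LeakTestResult::FAILED; // (b) timeout reached while still growing
		}
	}
	return LeakTestResult::RUNNING;
}
```

With these assumptions, a process whose usage is flat for 11 seconds passes, while one that sets a new peak at every sample fails once the 60 second mark is reached.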

This PR also includes fixes for two memory leaks found through these tests.

Rolling Back Tables

This PR fixes an issue where tables that were created and then rolled back were not properly erased from the catalog - leading to an increase in memory usage. The reason this happened was that the catalog is set up with a layer of indirection where we have entries sitting in a separate vector from the name map. This layer of indirection exists to support ACID renaming of entries in the catalog set.

This PR makes this less error prone by creating a dedicated EntryIndex class that uses RAII to correctly destroy the entries as soon as no more mapping points to them, rather than leaving them in the catalog until the database is shut down.
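As a rough illustration of the idea, the sketch below shows an RAII handle that erases an entry as soon as the last mapping referring to it goes away. It is not the DuckDB implementation: EntryStore, Slot, CreateEntry and the members of this EntryIndex are all hypothetical and heavily simplified (no transactions, no locking).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical simplified storage: entries are kept separately from the name map,
// each with a count of how many mappings currently refer to it.
struct EntryStore {
	struct Slot {
		std::string payload;
		size_t reference_count = 0;
	};
	std::unordered_map<size_t, Slot> slots;
	size_t next_index = 0;
};

// RAII handle: holding an EntryIndex keeps the entry alive,
// destroying the last handle erases the entry from the store.
class EntryIndex {
public:
	EntryIndex() : store(nullptr), index(0) {
	}
	EntryIndex(EntryStore &store_p, size_t index_p) : store(&store_p), index(index_p) {
		auto it = store->slots.find(index);
		assert(it != store->slots.end()); // the entry must exist
		it->second.reference_count++;
	}
	EntryIndex(const EntryIndex &) = delete;
	EntryIndex &operator=(const EntryIndex &) = delete;
	EntryIndex(EntryIndex &&other) noexcept : store(other.store), index(other.index) {
		other.store = nullptr; // the moved-from handle no longer owns a reference
	}
	EntryIndex &operator=(EntryIndex &&other) noexcept {
		if (this != &other) {
			Release();
			store = other.store;
			index = other.index;
			other.store = nullptr;
		}
		return *this;
	}
	~EntryIndex() {
		Release();
	}
	// Explicitly create a second handle to the same entry.
	EntryIndex Copy() const {
		assert(store);
		return EntryIndex(*store, index);
	}

private:
	void Release() {
		if (!store) {
			return;
		}
		auto it = store->slots.find(index);
		if (it != store->slots.end() && --it->second.reference_count == 0) {
			store->slots.erase(it); // last reference gone: free the entry now
		}
		store = nullptr;
	}
	EntryStore *store;
	size_t index;
};

// Hypothetical helper: add an entry and return the first handle to it.
EntryIndex CreateEntry(EntryStore &store, std::string payload) {
	size_t index = store.next_index++;
	store.slots[index].payload = std::move(payload);
	return EntryIndex(store, index);
}
```

In this sketch, erasing a name from a map of type std::unordered_map<std::string, EntryIndex> (as a rollback would) destroys the handle, which in turn removes the entry from the store without any manual cleanup step.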

Eviction Queue Flooding

This PR also fixes #5501 - where creating and dropping temporary tables repeatedly causes memory usage to increase. The problem here is that the eviction queue of the buffer manager was flooded with blocks from temporary tables that were no longer present in the database (as the temporary table had been erased). This PR fixes that by also calling PurgeQueue when blocks are destroyed - rather than only when they are inserted into the queue.
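The effect of purging on destruction can be illustrated with a small sketch. This is not the buffer manager's code: Block, EvictionQueue, Insert and NotifyDestroyed are hypothetical names, and the queue is assumed to hold weak references to blocks so that dead entries can be recognized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical stand-in for a buffer-managed block.
struct Block {
	size_t id;
};

class EvictionQueue {
public:
	// Inserting a block purges dead entries (the behaviour that already existed).
	void Insert(const std::shared_ptr<Block> &block) {
		assert(block); // only live blocks can be queued
		queue.push_back(block);
		Purge();
	}
	// Assumed hook called when a block is destroyed (the behaviour the fix adds).
	void NotifyDestroyed() {
		Purge();
	}
	size_t Size() const {
		return queue.size();
	}

private:
	// Drop every queue entry whose block no longer exists.
	void Purge() {
		queue.erase(std::remove_if(queue.begin(), queue.end(),
		                           [](const std::weak_ptr<Block> &entry) { return entry.expired(); }),
		            queue.end());
	}
	std::deque<std::weak_ptr<Block>> queue;
};
```

If purging only happens on insert, an entry for a destroyed block stays in the queue until the next insert. Calling the purge from the destruction path removes it right away.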

Mause

Mostly nitpicks

Mause · 2022-11-30T01:27:07Z