Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-4789: Fix an incorrect memory assumption for hash join statistics #676

Merged
merged 8 commits into from
Jan 26, 2024

Conversation

bandetto
Copy link
Member

@bandetto bandetto commented Jan 18, 2024

Fix an incorrect memory assumption for hash join statistics

When a hashjoin's hashtable runs out of allowed memory, it divides the hash
table into batches, and stores each batch to disk. When using EXPLAIN ANALYZE,
an additional description is created for the number of batches and the memory
they used. (With normal EXPLAIN, the request is not executed, and accordingly,
batches are not created). To store information about the disk space occupied by
a batch, the HashJoinTableStats structure is used, which has a batchstats
member of type HashJoinBatchStats, which is an array of structures for each
batch.

Usually, the number of batches is calculated at the beginning of hashjoin, but
sometimes the calculation algorithm may miss. Then, the
ExecHashIncreaseNumBatches() is used, which increases the current number of
batches, multiplying the previous one by two, and allocating additional memory
to the structures that need it.

Each structure of type HashJoinBatchStats takes 80 bytes, opposed to 8 bytes
for a pointer. Since memory for batchstats member is allocated for the
structures themselves, and not pointers to them, the size of realloc() call
for batchstats in ExecHashIncreaseNumBatches() can become more than
MaxAllocSize and cause an error.

In the beginning of ExecHashIncreaseNumBatches(), there is a check that
assumes that each item in any of the reallocated arrays would not be more than 8
bytes (sizeof(void *)). oldnbatch is equal to nbatch / 2. When we take the
size of the HashJoinBatchStats into account, we may notice that this check is
not correct specifically for this allocation, and is off by 10 times of the
value of the faulty repalloc() call.

The issue is that batchstats grows faster than other arrays containing
pointers. So this check does not prevent the faulty statistics batches
reallocation. When we are nearing MaxAllocSize with
nbatch * sizeof(stats->batchstats[0]), and try to reallocate the batchstats
member, we will encounter an error, because allocation size requested will be
more than MaxAllocSize.

Use repalloc_huge() instead of repalloc() for statistics, to allow
requesting memory up to:
MaxAllocSize / ((sizeof(void *) * 2) = 67 108 863;
67 108 863 * 80 = 5 368 709 040, which is slightly less than 5GB.

On a 32-bit machines, nbatch * sizeof(stats->batchstats[0]) could exceed
MaxAllocHugeSize, which is 2 ^ 32 = 4GB. Add a separate check and stop if
nbatch > MaxAllocHugeSize / sizeof(HashJoinBatchStats) to avoid integer
overflow.

On 64-bit machines, maximum allowed amount of batches is unchanged.

@bandetto bandetto marked this pull request as ready for review January 18, 2024 10:36
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62045

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/977661

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/977662

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62101

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/980215

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/980216

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/980209

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/980210

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62201

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/984620

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/984614

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/984621

@RekGRpth
Copy link
Member

provide last commit message

On a 32-bit machines, `nbatch * sizeof(stats->batchstats[0])` could exceed
MaxAllocHugeSize, which is SIZE_MAX / 2. Stop if `nbatch > MaxAllocHugeSize /
sizeof(HashJoinBatchStats)` to avoid integer overflow.
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62245

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/986017

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/986018

@RekGRpth
Copy link
Member

provide last commit message

I meant the commit message that will go into the resulting patch.

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62415

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/992699

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/992700

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62523

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/997908

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/997909

@BenderArenadata
Copy link

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/997906

@andr-sokolov andr-sokolov merged commit 625bcf5 into adb-6.x-dev Jan 26, 2024
5 checks passed
@andr-sokolov andr-sokolov deleted the ADBDEV-4789 branch January 26, 2024 05:17
Stolb27 pushed a commit that referenced this pull request Feb 19, 2024
When a hashjoin's hashtable runs out of allowed memory, it divides the hash
table into batches, and stores each batch to disk. When using EXPLAIN ANALYZE,
an additional description is created for the number of batches and the memory
they used. (With normal EXPLAIN, the request is not executed, and accordingly,
batches are not created). To store information about the disk space occupied by
a batch, the HashJoinTableStats structure is used, which has a batchstats
member of type HashJoinBatchStats, which is an array of structures for each
batch.

Usually, the number of batches is calculated at the beginning of hashjoin, but
sometimes the calculation algorithm may miss. Then, the
ExecHashIncreaseNumBatches() is used, which increases the current number of
batches, multiplying the previous one by two, and allocating additional memory
to the structures that need it.

Each structure of type HashJoinBatchStats takes 80 bytes, opposed to 8 bytes
for a pointer. Since memory for batchstats member is allocated for the
structures themselves, and not pointers to them, the size of realloc() call
for batchstats in ExecHashIncreaseNumBatches() can become more than
MaxAllocSize and cause an error.

In the beginning of ExecHashIncreaseNumBatches(), there is a check that
assumes that each item in any of the reallocated arrays would not be more than 8
bytes (sizeof(void *)). oldnbatch is equal to nbatch / 2. When we take the
size of the HashJoinBatchStats into account, we may notice that this check is
not correct specifically for this allocation, and is off by 10 times of the
value of the faulty repalloc() call.

The issue is that batchstats grows faster than other arrays containing
pointers. So this check does not prevent the faulty statistics batches
reallocation. When we are nearing MaxAllocSize with
nbatch * sizeof(stats->batchstats[0]), and try to reallocate the batchstats
member, we will encounter an error, because allocation size requested will be
more than MaxAllocSize.

Use repalloc_huge() instead of repalloc() for statistics, to allow
requesting memory up to:
MaxAllocSize / ((sizeof(void *) * 2) = 67 108 863;
67 108 863 * 80 = 5 368 709 040, which is slightly less than 5GB.

On a 32-bit machines, nbatch * sizeof(stats->batchstats[0]) could exceed
MaxAllocHugeSize, which is 2 ^ 32 = 4GB. Add a separate check and stop if
nbatch > MaxAllocHugeSize / sizeof(HashJoinBatchStats) to avoid integer
overflow.

On 64-bit machines, maximum allowed amount of batches is unchanged.

(cherry picked from commit 625bcf5)
@Stolb27 Stolb27 mentioned this pull request Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants