
Consider using a disk-based hash table for hash join to avoid OOM #11607

Open · 9 of 14 tasks
SunRunAway opened this issue Aug 5, 2019 · 1 comment

Labels: epic/memory-management, help wanted, sig/execution, type/enhancement

SunRunAway (Contributor) commented on Aug 5, 2019

Feature Request

Is your feature request related to a problem? Please describe:

Consider using a disk-based hash table for hash join to avoid OOM.

HashJoinExecutor uses a hash table that maps join keys to inner table rows.

TiDB's hash join is implemented with two structures: innerResult and mvmap.MVMap. innerResult stores all the rows of the inner table, and mvmap.MVMap stores the mapping from each join key to pointers into innerResult. Together, these two structures give us a map from join keys to inner table rows.
When the inner table is very large, innerResult takes up a lot of memory; when the join keys are large, mvmap.MVMap also takes up a lot of memory. Either situation can cause an OOM.
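
For reference, here is a simplified sketch of this layout (the type and method names are illustrative, not TiDB's actual code):

```go
package join

// innerResult holds every row of the inner table in memory.
type innerResult struct {
	rows [][]byte // each entry is one encoded row
}

// mvMap plays the role of mvmap.MVMap: one encoded join key can map to
// several rows, since join keys are not necessarily unique.
type mvMap struct {
	m map[string][]int // encoded join key -> row indexes in innerResult
}

func newMVMap() *mvMap { return &mvMap{m: make(map[string][]int)} }

// put must materialize the encoded key for every row, which is where the
// per-key memory allocation and splicing cost comes from.
func (mv *mvMap) put(encodedKey []byte, rowIdx int) {
	k := string(encodedKey)
	mv.m[k] = append(mv.m[k], rowIdx)
}

func (mv *mvMap) get(encodedKey []byte) []int {
	return mv.m[string(encodedKey)]
}
```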

Describe the feature you'd like:

  1. We already have a config mem-quota-query, which sets the memory quota for a query in bytes.
  2. Introduce a new config oom-use-tmp-storage, defaulting to true. When it is true, some executors (in this issue, hash join) may use temporary disk storage once mem-quota-query is exceeded (see the sketch after this list).
  3. Show the disk usage of an executor in explain analyze.
  4. Show the disk usage of a query in SELECT * FROM information_schema.processlist;.
  5. Take disk usage into account in the cost model.
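
A rough sketch of how the two configs could interact inside an executor (the memTracker type and its fields are hypothetical names for illustration, not TiDB's actual API):

```go
package quota

import "errors"

// memTracker wires mem-quota-query and oom-use-tmp-storage together:
// when the quota is exceeded and temporary storage is enabled, the
// executor spills to disk instead of the query being cancelled.
type memTracker struct {
	quotaBytes    int64        // mem-quota-query, in bytes
	consumed      int64        // bytes tracked so far
	useTmpStorage bool         // oom-use-tmp-storage
	spill         func() error // executor-provided spill-to-disk action
}

func (t *memTracker) consume(bytes int64) error {
	t.consumed += bytes
	if t.consumed <= t.quotaBytes {
		return nil
	}
	if t.useTmpStorage && t.spill != nil {
		return t.spill() // switch the executor to temporary disk storage
	}
	return errors.New("memory quota for the query exceeded")
}
```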

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

tasks:

  1. The improvement of mvmap.MVMap
  2. Disk-based innerResult
  3. Cost model, explain analyze, and disk usage control

Some tiny issues

SunRunAway (Contributor) commented on Aug 19, 2019

The implementation can refer to cdb; there is an illustration of its internals at http://www.unixuser.org/~euske/doc/cdbinternals/index.html

Consider keeping cdb's MainTable and SubTables in memory, which corresponds to MVMap; the records in cdb then correspond to innerResult (whether innerResult lives on disk or in memory).
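
For reference, a simplified sketch of cdb's lookup shape as described on that page (names and details are illustrative, not a full cdb implementation):

```go
package cdbsketch

// The MainTable picks one of 256 SubTables by h % 256; each SubTable is a
// fixed-size array of (hash, record pointer) slots probed with wrap-around.
type slot struct {
	hash   uint32
	recPos uint32 // 0 marks an empty slot; otherwise an offset into the records
}

type subTable struct {
	slots []slot
}

// lookup returns the record offsets whose stored hash equals h; callers still
// compare the real keys afterwards, since different keys can share a hash.
func (st *subTable) lookup(h uint32) []uint32 {
	n := len(st.slots)
	if n == 0 {
		return nil
	}
	var out []uint32
	start := int(h>>8) % n // cdb starts probing at (h / 256) mod n
	for i := 0; i < n; i++ {
		s := st.slots[(start+i)%n]
		if s.recPos == 0 {
			break // an empty slot terminates the probe chain
		}
		if s.hash == h {
			out = append(out, s.recPos)
		}
	}
	return out
}
```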

We divide this issue into two steps:

1) The improvement of mvmap.MVMap

Consider using the following code pattern:

h := hash(joinKeys)          // hash the encoded join keys once
hashTable.Put(h, rowPointer) // store only the hash and a row pointer

hashTable itself is a map; a simple description of the structure is map[uint64][]rowPointer, or a self-implemented fixed-size hash map like the SubTable described in cdb.

Benefits compared to the original implementation:

  • Originally, MVMap was keyed on the joinKeys themselves, which required allocating and splicing a key buffer for each join key. Now we compute the hash of the join keys first and insert through an interface like Put above, removing the memory-splicing step.

Thus the memory footprint of the hash table becomes a fixed value proportional to the number of rows; after the implementation, we can estimate its memory usage from the row count alone.
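
A minimal, self-contained sketch of this hash-first pattern (the rowPtr layout and the FNV hash are assumptions for illustration; hash collisions are resolved by comparing the real join keys at probe time):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rowPtr locates a row inside innerResult: which chunk and which offset.
type rowPtr struct {
	chunkIdx uint32
	rowIdx   uint32
}

// joinHashTable maps a 64-bit hash of the join keys to all matching row
// pointers; its footprint depends only on the number of rows, not key sizes.
type joinHashTable struct {
	buckets map[uint64][]rowPtr
}

func newJoinHashTable() *joinHashTable {
	return &joinHashTable{buckets: make(map[uint64][]rowPtr)}
}

// hashKeys hashes the encoded join keys once; no per-key buffer is retained.
func hashKeys(encodedKeys []byte) uint64 {
	h := fnv.New64a()
	h.Write(encodedKeys)
	return h.Sum64()
}

func (t *joinHashTable) Put(h uint64, ptr rowPtr) {
	t.buckets[h] = append(t.buckets[h], ptr)
}

func (t *joinHashTable) Get(h uint64) []rowPtr {
	return t.buckets[h]
}

func main() {
	t := newJoinHashTable()
	h := hashKeys([]byte("k1"))
	t.Put(h, rowPtr{chunkIdx: 0, rowIdx: 3})
	fmt.Println(t.Get(h)) // candidate rows; verify real join keys on probe
}
```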

2) Disk-based innerResult

Define a threshold called MemLimit; when the memory usage of the innerResult used by a join executor exceeds MemLimit, it is spilled to disk.
I've written a slide deck demonstrating how spilling to disk is triggered: https://docs.google.com/presentation/d/1Sa9xNbDTPnLwnQHLKfpwksdYXWodisXPEqp-WR5Up0U/edit?usp=sharing

When innerResult is on disk, we can consider storing a second copy of the join keys in front of each row, so that when reading from disk we can read the join keys first and read the row data only if the join keys match (a sketch follows).
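
A minimal sketch of such an on-disk record layout and the key-first read path (the encoding and function names are assumptions for illustration):

```go
package spill

import (
	"bytes"
	"encoding/binary"
	"io"
)

// Each record on disk: [keyLen][key][dataLen][data]. Prefixing the join key
// lets the probe read just the key and skip the data of non-matching rows.

func writeRecord(w io.Writer, key, data []byte) error {
	var lenBuf [4]byte
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(key)))
	if _, err := w.Write(lenBuf[:]); err != nil {
		return err
	}
	if _, err := w.Write(key); err != nil {
		return err
	}
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(data)))
	if _, err := w.Write(lenBuf[:]); err != nil {
		return err
	}
	_, err := w.Write(data)
	return err
}

// readMatching scans all records and returns the data of rows whose join key
// equals probeKey, seeking past the data bytes of every non-matching row.
func readMatching(r io.ReadSeeker, probeKey []byte) ([][]byte, error) {
	var out [][]byte
	var lenBuf [4]byte
	for {
		if _, err := io.ReadFull(r, lenBuf[:]); err == io.EOF {
			return out, nil
		} else if err != nil {
			return nil, err
		}
		key := make([]byte, binary.LittleEndian.Uint32(lenBuf[:]))
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
			return nil, err
		}
		dataLen := int64(binary.LittleEndian.Uint32(lenBuf[:]))
		if bytes.Equal(key, probeKey) {
			data := make([]byte, dataLen)
			if _, err := io.ReadFull(r, data); err != nil {
				return nil, err
			}
			out = append(out, data)
		} else if _, err := r.Seek(dataLen, io.SeekCurrent); err != nil {
			return nil, err
		}
	}
}
```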
