
Consider using a disk-based hash table for hash join to avoid OOM #11607

Open · 9 of 14 tasks
SunRunAway opened this issue Aug 5, 2019 · 1 comment

Labels: epic/memory-management, help wanted, sig/execution, type/enhancement

SunRunAway (Contributor) commented on Aug 5, 2019

Feature Request

Is your feature request related to a problem? Please describe:

Consider using a disk-based hash table for hash join to avoid OOM.

HashJoinExecutor uses a hash table that maps join keys to inner table rows.

TiDB's hash join is implemented with two structures: innerResult and mvmap.MVMap. innerResult stores all the rows of the inner table, and mvmap.MVMap stores the mapping from each join key to pointers into innerResult. Together, these two structures give us a map from join keys to inner table rows.
When the inner table is very large, innerResult takes up a lot of memory; when the join keys are large, mvmap.MVMap also takes up a lot of memory. Either situation can cause an OOM.
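
For reference, here is a simplified sketch of this layout (the type and method names are illustrative, not TiDB's actual code):

```go
package join

// innerResult holds every row of the inner table in memory.
type innerResult struct {
	rows [][]byte // each entry is one encoded row
}

// mvMap plays the role of mvmap.MVMap: one encoded join key can map to
// several rows, since join keys are not necessarily unique.
type mvMap struct {
	m map[string][]int // encoded join key -> row indexes in innerResult
}

func newMVMap() *mvMap { return &mvMap{m: make(map[string][]int)} }

// put must materialize the encoded key for every row, which is where the
// per-key memory allocation and splicing cost comes from.
func (mv *mvMap) put(encodedKey []byte, rowIdx int) {
	k := string(encodedKey)
	mv.m[k] = append(mv.m[k], rowIdx)
}

func (mv *mvMap) get(encodedKey []byte) []int {
	return mv.m[string(encodedKey)]
}
```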

Describe the feature you'd like:

  1. We already have a config mem-quota-query, which sets the memory quota for a query in bytes.
  2. Introduce a new config oom-use-tmp-storage, defaulting to true. When it is true, some executors (in this issue, hash join) may use temporary disk storage once mem-quota-query is exceeded (see the sketch after this list).
  3. Show the disk usage of an executor in explain analyze.
  4. Show the disk usage of a query in SELECT * FROM information_schema.processlist;.
  5. Take disk usage into account in the cost model.
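
A rough sketch of how the two configs could interact inside an executor (the memTracker type and its fields are hypothetical names for illustration, not TiDB's actual API):

```go
package quota

import "errors"

// memTracker wires mem-quota-query and oom-use-tmp-storage together:
// when the quota is exceeded and temporary storage is enabled, the
// executor spills to disk instead of the query being cancelled.
type memTracker struct {
	quotaBytes    int64        // mem-quota-query, in bytes
	consumed      int64        // bytes tracked so far
	useTmpStorage bool         // oom-use-tmp-storage
	spill         func() error // executor-provided spill-to-disk action
}

func (t *memTracker) consume(bytes int64) error {
	t.consumed += bytes
	if t.consumed <= t.quotaBytes {
		return nil
	}
	if t.useTmpStorage && t.spill != nil {
		return t.spill() // switch the executor to temporary disk storage
	}
	return errors.New("memory quota for the query exceeded")
}
```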

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

tasks:

  1. The improvement of mvmap.MVMap
  2. Disk-based innerResult
  3. Cost model, explain analyze, and disk usage control

Some tiny issues

SunRunAway (Contributor) commented on Aug 19, 2019

The implementation can refer to cdb; there is an illustration of its internals at http://www.unixuser.org/~euske/doc/cdbinternals/index.html

Consider keeping cdb's MainTable and SubTables in memory, which corresponds to MVMap; the records in cdb then correspond to innerResult (whether innerResult lives on disk or in memory).
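
For reference, a simplified sketch of cdb's lookup shape as described on that page (names and details are illustrative, not a full cdb implementation):

```go
package cdbsketch

// The MainTable picks one of 256 SubTables by h % 256; each SubTable is a
// fixed-size array of (hash, record pointer) slots probed with wrap-around.
type slot struct {
	hash   uint32
	recPos uint32 // 0 marks an empty slot; otherwise an offset into the records
}

type subTable struct {
	slots []slot
}

// lookup returns the record offsets whose stored hash equals h; callers still
// compare the real keys afterwards, since different keys can share a hash.
func (st *subTable) lookup(h uint32) []uint32 {
	n := len(st.slots)
	if n == 0 {
		return nil
	}
	var out []uint32
	start := int(h>>8) % n // cdb starts probing at (h / 256) mod n
	for i := 0; i < n; i++ {
		s := st.slots[(start+i)%n]
		if s.recPos == 0 {
			break // an empty slot terminates the probe chain
		}
		if s.hash == h {
			out = append(out, s.recPos)
		}
	}
	return out
}
```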

We divide this issue into two steps:

1) The improvement of mvmap.MVMap

Consider using the following code pattern:

h := hash(joinKeys)          // hash the encoded join keys once
hashTable.Put(h, rowPointer) // store only the hash and a row pointer

hashTable itself is a map; a simple description of the structure is map[uint64][]rowPointer, or a self-implemented fixed-size hash map like the SubTable described in cdb.

Benefits compared to the original implementation:

  • Originally, MVMap was keyed on the joinKeys themselves, which required allocating and splicing a key buffer for each join key. Now we compute the hash of the join keys first and insert through an interface like Put above, removing the memory-splicing step.

Thus the memory footprint of the hash table becomes a fixed value proportional to the number of rows; after the implementation, we can estimate its memory usage from the row count alone.
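
A minimal, self-contained sketch of this hash-first pattern (the rowPtr layout and the FNV hash are assumptions for illustration; hash collisions are resolved by comparing the real join keys at probe time):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rowPtr locates a row inside innerResult: which chunk and which offset.
type rowPtr struct {
	chunkIdx uint32
	rowIdx   uint32
}

// joinHashTable maps a 64-bit hash of the join keys to all matching row
// pointers; its footprint depends only on the number of rows, not key sizes.
type joinHashTable struct {
	buckets map[uint64][]rowPtr
}

func newJoinHashTable() *joinHashTable {
	return &joinHashTable{buckets: make(map[uint64][]rowPtr)}
}

// hashKeys hashes the encoded join keys once; no per-key buffer is retained.
func hashKeys(encodedKeys []byte) uint64 {
	h := fnv.New64a()
	h.Write(encodedKeys)
	return h.Sum64()
}

func (t *joinHashTable) Put(h uint64, ptr rowPtr) {
	t.buckets[h] = append(t.buckets[h], ptr)
}

func (t *joinHashTable) Get(h uint64) []rowPtr {
	return t.buckets[h]
}

func main() {
	t := newJoinHashTable()
	h := hashKeys([]byte("k1"))
	t.Put(h, rowPtr{chunkIdx: 0, rowIdx: 3})
	fmt.Println(t.Get(h)) // candidate rows; verify real join keys on probe
}
```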

2) Disk-based innerResult

Define a threshold called MemLimit; when the memory usage of the innerResult used by a join executor exceeds MemLimit, it is spilled to disk.
I've written a slide deck demonstrating how spilling to disk is triggered: https://docs.google.com/presentation/d/1Sa9xNbDTPnLwnQHLKfpwksdYXWodisXPEqp-WR5Up0U/edit?usp=sharing

When innerResult is on disk, we can consider storing a second copy of the join keys in front of each row, so that when reading from disk we can read the join keys first and read the row data only if the join keys match (a sketch follows).
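
A minimal sketch of such an on-disk record layout and the key-first read path (the encoding and function names are assumptions for illustration):

```go
package spill

import (
	"bytes"
	"encoding/binary"
	"io"
)

// Each record on disk: [keyLen][key][dataLen][data]. Prefixing the join key
// lets the probe read just the key and skip the data of non-matching rows.

func writeRecord(w io.Writer, key, data []byte) error {
	var lenBuf [4]byte
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(key)))
	if _, err := w.Write(lenBuf[:]); err != nil {
		return err
	}
	if _, err := w.Write(key); err != nil {
		return err
	}
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(data)))
	if _, err := w.Write(lenBuf[:]); err != nil {
		return err
	}
	_, err := w.Write(data)
	return err
}

// readMatching scans all records and returns the data of rows whose join key
// equals probeKey, seeking past the data bytes of every non-matching row.
func readMatching(r io.ReadSeeker, probeKey []byte) ([][]byte, error) {
	var out [][]byte
	var lenBuf [4]byte
	for {
		if _, err := io.ReadFull(r, lenBuf[:]); err == io.EOF {
			return out, nil
		} else if err != nil {
			return nil, err
		}
		key := make([]byte, binary.LittleEndian.Uint32(lenBuf[:]))
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
			return nil, err
		}
		dataLen := int64(binary.LittleEndian.Uint32(lenBuf[:]))
		if bytes.Equal(key, probeKey) {
			data := make([]byte, dataLen)
			if _, err := io.ReadFull(r, data); err != nil {
				return nil, err
			}
			out = append(out, data)
		} else if _, err := r.Seek(dataLen, io.SeekCurrent); err != nil {
			return nil, err
		}
	}
}
```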
