Cannot trigger hash join spill to disk #20085

Closed
ghost opened this issue Sep 17, 2020 · 4 comments


ghost commented Sep 17, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

I tried an example based on #12067:

DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY auto_increment, pad1 BLOB, pad2 BLOB, pad3 BLOB);
CREATE TABLE t2 (id INT NOT NULL PRIMARY KEY auto_increment, t1_id INT NOT NULL, pad1 BLOB, pad2 BLOB, pad3 BLOB, INDEX(t1_id));
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM dual;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
SELECT SLEEP(1);
ANALYZE TABLE t1, t2;

SET tidb_mem_quota_query=1;
EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;

I have oom-use-tmp-storage enabled:

mysql> select JSON_EXTRACT(@@tidb_config, '$."oom-use-tmp-storage"');
+--------------------------------------------------------+
| JSON_EXTRACT(@@tidb_config, '$."oom-use-tmp-storage"') |
+--------------------------------------------------------+
| true                                                   |
+--------------------------------------------------------+
1 row in set (0.00 sec)

2. What did you expect to see? (Required)

I expected the query to spill to disk. I tried increasing the memory quota for the query slightly, so that everything except the hash table could fit in memory, but no luck.

I also tried decreasing tidb_mem_quota_hashjoin to a very low value while leaving tidb_mem_quota_query at the default, but it still won't spill the hash table to disk:

mysql> SET tidb_mem_quota_hashjoin = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SET tidb_mem_quota_query = DEFAULT;
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+---------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+---------+
| id                          | estRows | actRows | task      | access object | execution info                                                                                                                                                                                  | operator info                                  | memory               | disk    |
+-----------------------------+---------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+---------+
| HashJoin_27                 | 20.00   | 10      | root      |               | time:25.105775ms, loops:2, build_hash_table:{total:485.397µs, fetch:481.137µs, build:4.26µs}, probe:{concurrency:5, total:124.944049ms, max:25.073315ms, probe:337.849µs, fetch:124.6062ms}     | inner join, equal:[eq(test.t1.id, test.t2.id)] | 68.92578125 KB       | 0 Bytes |
| ├─TableReader_29(Build)     | 20.00   | 10      | root      |               | time:418.397µs, loops:2, cop_task: {num: 1, max:479.397µs, proc_keys: 10, rpc_num: 1, rpc_time: 460.817µs, copr_cache_hit_ratio: 0.00}                                                          | data:TableFullScan_28                          | 30.5556640625 KB     | N/A     |
| │ └─TableFullScan_28        | 20.00   | 10      | cop[tikv] | table:t1      | time:0s, loops:1                                                                                                                                                                                | keep order:false                               | N/A                  | N/A     |
| └─TableReader_31(Probe)     | 6000.00 | 3000    | root      |               | time:24.893267ms, loops:4, cop_task: {num: 1, max:24.145811ms, proc_keys: 3000, rpc_num: 1, rpc_time: 24.131031ms, copr_cache_hit_ratio: 0.00}                                                  | data:TableFullScan_30                          | 8.904180526733398 MB | N/A     |
|   └─TableFullScan_30        | 6000.00 | 3000    | cop[tikv] | table:t2      | time:4ms, loops:7                                                                                                                                                                               | keep order:false                               | N/A                  | N/A     |
+-----------------------------+---------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+---------+
5 rows in set (0.03 sec)

3. What did you see instead (Required)

mysql> EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
ERROR 1105 (HY000): Out Of Memory Quota![conn_id=9]

4. What is your TiDB version? (Required)

mysql> SELECT tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-1239-gf0db8c68e-dirty
Edition: Community
Git Commit Hash: f0db8c68e1cd56a39524003e7282de87d248fef1
Git Branch: master
UTC Build Time: 2020-09-16 13:51:00
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)
@ghost ghost added the type/bug This issue is a bug. label Sep 17, 2020
@ghost ghost mentioned this issue Sep 17, 2020
XuHuaiyu (Contributor) commented

1. When tidb_mem_quota_query is set to 1, the spill is triggered, as the log shows:

    [2020/09/18 10:25:52.396 +08:00] [INFO] [row_container.go:311] ["memory exceeds quota, spill to disk now."] [consumed=30981] [quota=1]

But even though the hash table is spilled to disk, the memory quota is still not enough for the other executors to run successfully, so the out-of-memory panic is triggered.

As the following example shows, the spill can be triggered successfully.

tidb> SET tidb_mem_quota_query=9310607;
Query OK, 0 rows affected (0.00 sec)

tidb> EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+---------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+--------------+
| id                          | estRows | actRows | task      | access object | execution info                                                                                                                                                                                     | operator info                                  | memory               | disk         |
+-----------------------------+---------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+--------------+
| HashJoin_27                 | 20.00   | 10      | root      |               | time:27.052715ms, loops:2, build_hash_table:{total:177.271µs, fetch:169.799µs, build:7.472µs}, probe:{concurrency:5, total:134.682836ms, max:27.002625ms, probe:1.109014ms, fetch:133.573822ms}    | inner join, equal:[eq(test.t1.id, test.t2.id)] | 50.92578125 KB       | 30.390625 KB |
| ├─TableReader_29(Build)     | 20.00   | 10      | root      |               | time:67.868µs, loops:2, cop_task: {num: 1, max:151.201µs, proc_keys: 0, rpc_num: 1, rpc_time: 129.544µs, copr_cache_hit_ratio: 0.00}                                                               | data:TableFullScan_28                          | 30.2548828125 KB     | N/A          |
| │ └─TableFullScan_28        | 20.00   | 0       | cop[tikv] | table:t1      | time:0ns, loops:0                                                                                                                                                                                  | keep order:false                               | N/A                  | N/A          |
| └─TableReader_31(Probe)     | 6000.00 | 3000    | root      |               | time:26.874063ms, loops:4, cop_task: {num: 1, max:15.457761ms, proc_keys: 0, rpc_num: 1, rpc_time: 15.434075ms, copr_cache_hit_ratio: 0.00}                                                        | data:TableFullScan_30                          | 8.829554557800293 MB | N/A          |
|   └─TableFullScan_30        | 6000.00 | 0       | cop[tikv] | table:t2      | time:0ns, loops:0                                                                                                                                                                                  | keep order:false                               | N/A                  | N/A          |
+-----------------------------+---------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+----------------------+--------------+
5 rows in set (0.02 sec)


ghost commented Sep 18, 2020

Thanks! I think the problem was that my hash table was too small. I've adjusted the test case, and it's more reliable now:

SET tidb_mem_quota_query=DEFAULT;
DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 (id BIGINT NOT NULL PRIMARY KEY auto_increment, pad1 BLOB, pad2 BLOB, pad3 BLOB);
CREATE TABLE t2 (id BIGINT NOT NULL PRIMARY KEY auto_increment, t1_id BIGINT NOT NULL, pad1 BLOB, pad2 BLOB, pad3 BLOB, INDEX(t1_id));
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM dual;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
SELECT SLEEP(1);
ANALYZE TABLE t1, t2;

SET tidb_mem_quota_query=500 * 1024 * 1024;
EXPLAIN ANALYZE
SELECT /*+ HASH_JOIN(t1, t2) */ *
FROM t1, t2 WHERE t1.id = t2.t1_id;

Closing this issue now.

@ghost ghost closed this as completed Sep 18, 2020

sre-bot commented Sep 18, 2020

Integrity check:
The component, severity, RCA, symptom, affect_version, and fix_version fields are empty.

Please comment /info to get the template

ti-srebot (Contributor) commented

Please edit this comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make sure that the 'component' and 'severity' labels are added

1. Root Cause Analysis (RCA)

2. Symptom

3. Minimal steps to reproduce the bug (optional)

4. Solution (optional)

5. W/A (Workaround)

6. Affected versions

7. Fixed versions

@ghost ghost removed the type/bug This issue is a bug. label Sep 18, 2020
This issue was closed.