Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

MrSsunlight · 2020-08-06T08:27:41Z

system: centos7
version: 20.3.11.97

The default.test_log table is a MergeTree() table,
The default.rule table is the Dictionary() table.

Query sql statement：

WITH 
	toDateTime('2020-08-05 00:00:00') As start_time, 
	toDateTime('2020-08-05 23:59:59') As end_time 
SELECT 
	tnow, 
	if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip 
FROM default.test_log cl 
WHERE 
	cl.tnow >= start_time AND 
	cl.tnow <= end_time AND 
	ip in (
		SELECT 
			ra.addr 
		FROM default.rule ra 
		WHERE 
			(ra.is_deleted == 0 or ra.is_deleted == 1) and (ra.name = 'test')
	)
LIMIT 0, 10

The difference between the two query sql is that one more condition (ra.name ='test') is added；

set send_logs_level='trace'

A: Filter condition does not contain (ra.name='test')：

<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip) IN ((SELECT ra.addr FROM default.rule AS ra WHERE (ra.is_deleted = 0) OR (ra.is_deleted = 1)) AS _subquery9)" moved to PREWHERE
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(tnow, ipType, ipv4, ipv6) ON default.test_log
<Debug> default.test_log (SelectExecutor): Key condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): Selected 4 parts by date, 4 parts by key, 87898 marks to read from 4 ranges
<Trace> default.test_log (SelectExecutor): Reading approx. 1437378474 rows with 12 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Trace> CreatingSetsBlockInputStream: Creating set. 
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Debug> CreatingSetsBlockInputStream: Created. Set with 477382 entries from 477382 rows. In 0.497 sec.
<Information> executeQuery: Read 903366 rows, 20.78 MiB in 1.300 sec., 695092 rows/sec., 15.99 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 146.56 MiB.

B: Filter conditions include (ra.name ='test'):

<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip) IN ((SELECT ra.addr FROM default.rule AS ra WHERE ((ra.is_deleted = 0) OR (ra.is_deleted = 1)) AND (ra.name = 'test')) AS _subquery10)" moved to PREWHERE
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, name, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, name, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(tnow, ipType, ipv4, ipv6) ON default.test_log
<Debug> default.test_log (SelectExecutor): Key condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): Selected 4 parts by date, 4 parts by key, 87898 marks to read from 4 ranges
<Trace> default.test_log (SelectExecutor): Reading approx. 1437378474 rows with 12 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Trace> CreatingSetsBlockInputStream: Creating set. 
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Debug> CreatingSetsBlockInputStream: Created. Set with 1 entries from 1 rows. In 0.644 sec.
<Debug> MergeTreeReadPool: Slow read, event №1: read 1048576 bytes in 1.081 sec., 0.970 MB/s.
<Debug> MergeTreeReadPool: Slow read, event №2: read 1048576 bytes in 1.687 sec., 0.622 MB/s.
<Debug> MergeTreeReadPool: Will lower number of threads to 11
<Information> executeQuery: Read 1412403247 rows, 27.63 GiB in 34.523 sec., 40912481 rows/sec., 819.54 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 126.37 MiB.

The difference between A and B is CreatingSetsBlockInputStream: Created. Set with * entries from * rows ；

The number of results of '(ra.is_deleted == 0 or ra.is_deleted == 1) and (ra.name = 'test')' is much smaller than that of '(ra.is_deleted == 0 or ra.is_deleted == 1)', But the 'Created. Set entries' of A needs more than the query of B.<Debug> CreatingSetsBlockInputStream: Created. Set with 477382 entries from 477382 rows. In 0.497 sec and <Debug> CreatingSetsBlockInputStream: Created. Set with 1 entries from 1 rows. In 0.644 sec；

What is the logic of CreatingSetsBlockInputStream: Created. Set with 1 entrie ? Why I added a conditional restriction but got more data?

The text was updated successfully, but these errors were encountered:

MrSsunlight added the performance label Aug 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

MrSsunlight commented Aug 6, 2020

Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

Comments

MrSsunlight commented Aug 6, 2020