Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the search conditions to achieve the purpose of narrowing the search scope, but the result is that more data is parsed, and the query becomes slower #13401

Open
MrSsunlight opened this issue Aug 6, 2020 · 0 comments

Comments

@MrSsunlight
Copy link

system: centos7
version: 20.3.11.97

The default.test_log table is a MergeTree() table,
The default.rule table is the Dictionary() table.

Query sql statement:

WITH 
	toDateTime('2020-08-05 00:00:00') As start_time, 
	toDateTime('2020-08-05 23:59:59') As end_time 
SELECT 
	tnow, 
	if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip 
FROM default.test_log cl 
WHERE 
	cl.tnow >= start_time AND 
	cl.tnow <= end_time AND 
	ip in (
		SELECT 
			ra.addr 
		FROM default.rule ra 
		WHERE 
			(ra.is_deleted == 0 or ra.is_deleted == 1) and (ra.name = 'test')
	)
LIMIT 0, 10

The difference between the two query sql is that one more condition (ra.name ='test') is added;

set send_logs_level='trace'

A: Filter condition does not contain (ra.name='test'):

<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip) IN ((SELECT ra.addr FROM default.rule AS ra WHERE (ra.is_deleted = 0) OR (ra.is_deleted = 1)) AS _subquery9)" moved to PREWHERE
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(tnow, ipType, ipv4, ipv6) ON default.test_log
<Debug> default.test_log (SelectExecutor): Key condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): Selected 4 parts by date, 4 parts by key, 87898 marks to read from 4 ranges
<Trace> default.test_log (SelectExecutor): Reading approx. 1437378474 rows with 12 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Trace> CreatingSetsBlockInputStream: Creating set. 
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Debug> CreatingSetsBlockInputStream: Created. Set with 477382 entries from 477382 rows. In 0.497 sec.
<Information> executeQuery: Read 903366 rows, 20.78 MiB in 1.300 sec., 695092 rows/sec., 15.99 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 146.56 MiB.

B: Filter conditions include (ra.name ='test'):

<Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(if(ipType = 0, IPv4NumToString(ipv4), IPv6NumToString(ipv6)) AS ip) IN ((SELECT ra.addr FROM default.rule AS ra WHERE ((ra.is_deleted = 0) OR (ra.is_deleted = 1)) AND (ra.name = 'test')) AS _subquery10)" moved to PREWHERE
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, name, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(addr, name, is_deleted) ON default.rule
<Trace> AccessRightsContext (admin): Access granted: SELECT(tnow, ipType, ipv4, ipv6) ON default.test_log
<Debug> default.test_log (SelectExecutor): Key condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [1596556800, +inf)), (column 0 in (-inf, 1596643199]), and, and, unknown, and
<Debug> default.test_log (SelectExecutor): Selected 4 parts by date, 4 parts by key, 87898 marks to read from 4 ranges
<Trace> default.test_log (SelectExecutor): Reading approx. 1437378474 rows with 12 streams
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Trace> CreatingSetsBlockInputStream: Creating set. 
<Trace> InterpreterSelectQuery: FetchColumns -> Complete
<Debug> CreatingSetsBlockInputStream: Created. Set with 1 entries from 1 rows. In 0.644 sec.
<Debug> MergeTreeReadPool: Slow read, event №1: read 1048576 bytes in 1.081 sec., 0.970 MB/s.
<Debug> MergeTreeReadPool: Slow read, event №2: read 1048576 bytes in 1.687 sec., 0.622 MB/s.
<Debug> MergeTreeReadPool: Will lower number of threads to 11
<Information> executeQuery: Read 1412403247 rows, 27.63 GiB in 34.523 sec., 40912481 rows/sec., 819.54 MiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 126.37 MiB.

The difference between A and B is CreatingSetsBlockInputStream: Created. Set with * entries from * rows ;

The number of results of '(ra.is_deleted == 0 or ra.is_deleted == 1) and (ra.name = 'test')' is much smaller than that of '(ra.is_deleted == 0 or ra.is_deleted == 1)', But the 'Created. Set entries' of A needs more than the query of B.<Debug> CreatingSetsBlockInputStream: Created. Set with 477382 entries from 477382 rows. In 0.497 sec and <Debug> CreatingSetsBlockInputStream: Created. Set with 1 entries from 1 rows. In 0.644 sec

What is the logic of CreatingSetsBlockInputStream: Created. Set with 1 entrie ? Why I added a conditional restriction but got more data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant