Getting numGroupsLimitReached for medium cardinality column (10000 distinct values) #6437

@amitchopraait

Description

I have a column with 10000 distinct values.

select count(distinct(device)) from metrics
{
	"resultTable": {
		"dataSchema": {
			"columnNames": [
				"distinctcount(device)"
			],
			"columnDataTypes": [
				"INT"
			]
		},
		"rows": [
			[
				10000
			]
		]
	},
	"exceptions": [],
	"numServersQueried": 1,
	"numServersResponded": 1,
	"numSegmentsQueried": 1,
	"numSegmentsProcessed": 1,
	"numSegmentsMatched": 1,
	"numConsumingSegmentsQueried": 0,
	"numDocsScanned": 3999999,
	"numEntriesScannedInFilter": 0,
	"numEntriesScannedPostFilter": 0,
	"numGroupsLimitReached": false,
	"totalDocs": 3999999,
	"timeUsedMs": 21,
	"segmentStatistics": [],
	"traceInfo": {},
	"minConsumingFreshnessTimeMs": 0
}

When I do a group by on this column, I get numGroupsLimitReached = true in the response, even though the documentation states the default limit is 100k.

select device, count(device) as aggreg from metrics group by device order by aggreg desc limit 10

{
	"resultTable": {
		"dataSchema": {
			"columnNames": [
				"device",
				"aggreg"
			],
			"columnDataTypes": [
				"STRING",
				"LONG"
			]
		},
		"rows": [
			[
				"device-6230",
				475
			],
			[
				"device-3277",
				470
			],
			[
				"device-2311",
				469
			],
			[
				"device-3933",
				469
			],
			[
				"device-4059",
				468
			],
			[
				"device-6002",
				468
			],
			[
				"device-621",
				466
			],
			[
				"device-2903",
				465
			],
			[
				"device-3900",
				463
			],
			[
				"device-9324",
				463
			]
		]
	},
	"exceptions": [],
	"numServersQueried": 1,
	"numServersResponded": 1,
	"numSegmentsQueried": 1,
	"numSegmentsProcessed": 1,
	"numSegmentsMatched": 1,
	"numConsumingSegmentsQueried": 0,
	"numDocsScanned": 3999999,
	"numEntriesScannedInFilter": 0,
	"numEntriesScannedPostFilter": 3999999,
	"numGroupsLimitReached": true,
	"totalDocs": 3999999,
	"timeUsedMs": 87,
	"segmentStatistics": [],
	"traceInfo": {},
	"minConsumingFreshnessTimeMs": 0
}
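For context, numGroupsLimitReached signals that the server stopped creating new groups once a configured cap on the number of group-by keys was hit. A minimal sketch of that trimming behavior (a hypothetical illustration, not Pinot's actual implementation) shows how the flag can end up true even though the final result only needs a handful of rows:

```python
# Hypothetical sketch of a capped group-by aggregator: once the number
# of distinct groups reaches the cap, rows that would create a *new*
# group are dropped and a "limit reached" flag is set.
from collections import Counter

NUM_GROUPS_LIMIT = 5  # Pinot's documented default is 100_000


def group_by_count(keys, limit=NUM_GROUPS_LIMIT):
    groups = Counter()
    limit_reached = False
    for key in keys:
        if key in groups:
            groups[key] += 1          # existing group: always counted
        elif len(groups) < limit:
            groups[key] = 1           # new group under the cap
        else:
            limit_reached = True      # new group past the cap: dropped
    return groups, limit_reached


rows = [f"device-{i % 8}" for i in range(100)]  # 8 distinct keys, cap of 5
counts, reached = group_by_count(rows)
print(len(counts), reached)  # 5 True
```

If the effective cap on a server were lower than the documented 100k default (or unset, as the Slack thread below suggests for SQL mode at the time), a column with only 10,000 distinct values could still trip the flag.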

As per the conversation in Slack:

Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit

Amit Chopra  23 minutes ago
@Jackie - I believe you mean in pinot-server.conf? No, I haven't set it.

Jackie  21 minutes ago
Hmm.. That is unexpected

Jackie  20 minutes ago
Do you run the query in PQL mode or SQL mode?

Amit Chopra  20 minutes ago
SQL mode

Jackie  16 minutes ago
I just checked the code and we don't set it in SQL mode..

Jackie  15 minutes ago
Could you please file a github issue and put the details?

Amit Chopra  14 minutes ago
Sure. Let me file an issue. Just so that I understand, you mean the 100k limit is not set? But what is the default limit in SQL mode today, then?

Jackie  13 minutes ago
We don't put the numGroupsLimitReached in SQL mode. I don't know how it shows up in the response

Amit Chopra  12 minutes ago
got it

Amit Chopra  11 minutes ago
As per the second part of the question, given there was a limit of 10 on the query, shouldn't this be handled by the engine (even if it was a column with more than 100k distinct values)?
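On that second question: because the query orders by the aggregate (ORDER BY aggreg DESC), the top 10 can only be selected after every group's count is known, so the LIMIT cannot by itself reduce the number of groups the server has to track. A small sketch of that ordering of operations:

```python
# Sketch: ORDER BY <aggregate> ... LIMIT N is applied *after* aggregation.
# Every distinct group must be fully counted before the top N is known,
# so a small LIMIT does not shrink the group-by state on the server.
from collections import Counter

rows = ["a", "b", "a", "c", "b", "a"]
counts = Counter(rows)        # full aggregation over all groups first
top = counts.most_common(2)   # the LIMIT is applied last
print(top)  # [('a', 3), ('b', 2)]
```

Engines can still trim per-segment intermediate results heuristically, but an exact top-N on an aggregate fundamentally needs all group counts before the final cut.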
