-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Affected Version
Druid version : 0.18.0
Description
- Use the Kafka-Indexing service to ingest real-time data
- The data Schema has a dimension field of type string;The real value, of course, is a number, and is typed as a string to make it easier to use the Bitmap index
- I had a problem converting this string type dimension field to a number type in an SQL query for aggregate statistics.The statistics occasionally show a value of 0
- There is no problem with corresponding statistics when the data is solidified
- When I tried to make this field into a numerical metric, I had no problem with statistics
- This issue did not occur in the previous version of the upgrade (0.15.0)
Kafka-Indexing json:
"dataSchema": {
"dataSource": "cell_monitor_v3_test",
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
},
"metricsSpec": [
{
"type": "longSum",
"name": "cell_count",
"fieldName": "cell_count",
"expression": null
},
{
"type": "longSum",
"name": "sum_status_continue_time",
"fieldName": "sum_status_continue_time",
"expression": null
}
],
"parser": {
"type": "string",
"parseSpec": {
"format": "json",
"timestampSpec": {
"column": "snapshot_time",
"format": "millis"
},
"dimensionsSpec": {
"dimensions": [
"space_status",
"reserve_status",
"dis_code",
"business_type",
"status_continue_time",
"has_exception"
]
}
}
},
"transformSpec": {
"transforms": [
{ "type": "expression", "name": "sum_status_continue_time", "expression": "nvl(status_continue_time,0)" }
]
}
}
As a string dimension, the query SQL is shown below, with a data statistic of 0 in the ingestion Task interval :
SELECT
__time,
SUM(cell_count),
round((SUM(CAST(status_continue_time AS NUMERIC))/2)/sum(cell_count),2),
SUM(CAST(status_continue_time AS NUMERIC))/2
FROM "cell_monitor_v3_test"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
AND space_status = 2
AND business_type > 0
AND has_exception = 0
GROUP BY __time
As a numerical indicator, the query SQL is shown below, and the data statistics are correct in the ingestion Task interval :
SELECT
__time,
SUM(cell_count),
round((SUM(sum_status_continue_time)/2)/sum(cell_count),2),
SUM(sum_status_continue_time)/2
FROM "cell_monitor_v3_test"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
AND space_status = 2
AND business_type > 0
AND has_exception = 0
GROUP BY __time