String last aggregator is not considering all the segments for calculating the aggregation #9645

Closed
roopini550 opened this issue Apr 8, 2020 · 5 comments

Comments

@roopini550

The stringLast aggregator does not consider all the segments in the given interval when computing its result in a groupBy query.

Affected Version

Druid version - 0.16.0-incubating.

Description

We use the stringLast aggregator in a groupBy query to get the latest value of a dimension in a datasource. However, the last aggregators appear to consider only a few segments when computing the last value of the specified dimension, so the query returns wrong results. We expected all segments in the interval to be considered when computing the last aggregation.

@gianm
Contributor

gianm commented Apr 8, 2020

Hi @roopini550,

Aggregators don't really have control over what segments they read. So I think what you are seeing is probably happening for some other reason. Do you have any more details? Some steps to reproduce this behavior would be great.
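One way to gather such details is to list the segments that Druid sees for the interval, using a native `segmentMetadata` query. The following is a minimal sketch that only builds the query body; the datasource name `my_datasource` and the interval are hypothetical placeholders, and posting the JSON to a Broker is left out because the endpoint depends on the deployment.

```python
import json


def build_segment_metadata_query(data_source, interval):
    """Build a native segmentMetadata query that reports one entry per segment."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": data_source,
        "intervals": [interval],
        # Skip per-column analysis; only the segment list is of interest here.
        "toInclude": {"type": "none"},
        "analysisTypes": ["interval"],
        # Keep one result per segment instead of merging them together.
        "merge": False,
    }


# Hypothetical placeholders: substitute the real datasource and interval.
query = build_segment_metadata_query("my_datasource", "2020-01-01/2020-01-02")
print(json.dumps(query, indent=2))
```

Comparing the segment IDs in the response with the `segmentId` values returned by a select query shows whether any segment in the interval is missing.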

@roopini550
Author

Hi @gianm,

Here is a simple select query with "descending": "true", which returns sample data in descending timestamp order. Here "instrument_breached_time" is our timestamp field.

{
"queryType": "select",
"dataSource": "client_1_SLAMonitoring",
"descending": "true",
"dimensions": [],
"metrics": [],
"granularity": "all",
"intervals": [
"2020-04-06/2020-04-17"
],
"filter": { "type": "selector", "dimension": "instrument_id", "value":"Ass_7448" },
"pagingSpec": {
"pagingIdentifiers": {},
"threshold": 1000
}
}

Here is the result of the above query:

[ {
"timestamp" : "2020-04-10T04:10:00.895Z",
"result" : {
"pagingIdentifiers" : {
"client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z_3" : -2,
"client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z" : -5
},
"dimensions" : [ "agent_name", "agent_id", "........" ],
"metrics" : [ ],
"events" : [ {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z_3",
"offset" : -1,
"event" : {
"timestamp" : "2020-04-10T04:33:14.399Z",
"instrument_id" : "Ass_7448",
"process_uuid" : ".....",
"task_name" : "UT1",
"task_status" : "ASSIGNED",
"task_completed_time" : 1587038797612,
"instrument_breached_time" : 1586493194399
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z_3",
"offset" : -2,
"event" : {
"timestamp" : "2020-04-10T04:33:14.399Z",
"instrument_id" : "Ass_7448",
"task_name" : "................",
"task_status" : "ASSIGNED",
"task_completed_time" : 1587038797502,
"instrument_breached_time" : 1586493194399
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z",
"offset" : -1,
"event" : {
"timestamp" : "2020-04-10T04:33:14.399Z",
"instrument_breached_time" : 1586493194399,
"instrument_id" : "Ass_7448",
"task_completed_time" : 1586491394783,
"task_name" : "................",
"task_status" : "QUEUED"
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z",
"offset" : -2,
"event" : {
"timestamp" : "2020-04-10T04:06:14.399Z",
"instrument_breached_time" : 1586491574399,
"instrument_id" : "Ass_7448",
"task_completed_time" : 1586491394783,
"task_name" : "................",
"task_status" : "QUEUED"
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z",
"offset" : -3,
"event" : {
"timestamp" : "2020-04-10T04:05:20.399Z",
"instrument_breached_time" : 1586491520399,
"instrument_id" : "Ass_7448",
"task_completed_time" : 1586491394783,
"task_name" : "................",
"task_status" : "QUEUED"
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z",
"offset" : -4,
"event" : {
"timestamp" : "2020-04-10T04:04:44.399Z",
"instrument_breached_time" : 1586491484399,
"instrument_id" : "Ass_7448",
"task_completed_time" : 1586491394783,
"task_name" : "................",
"task_status" : "QUEUED"
}
}, {
"segmentId" : "client_1_SLAMonitoring_2020-04-10T04:00:00.000Z_2020-04-10T05:00:00.000Z_2020-04-10T04:02:46.062Z",
"offset" : -5,
"event" : {
"timestamp" : "2020-04-10T04:03:14.969Z",
"instrument_breached_time" : 1586491394969,
"instrument_id" : "Ass_7448",
"task_completed_time" : 1586491394783,
"task_name" : "................",
"task_status" : "QUEUED"
}
} ]
}
} ]

Here is our stringLast groupBy query on instrument_id, which asks for the latest value of the task_status dimension:

{
"queryType": "groupBy",
"dataSource": "client_1_SLAMonitoring",
"dimensions":["instrument_id"],
"threshold": 5,
"metric": "count",
"granularity": "all",
"filter": { "type": "selector", "dimension": "instrument_id", "value":"Ass_7448" },
"aggregations": [
{ "type" : "longLast", "name" : "comptimeLast","fieldName" : "task_completed_time" },
{
"type": "stringLast",
"name": "lastest_task_status",
"fieldName": "task_status"
}],
"postAggregations": [],
"intervals": [
"2020-04-06/2020-04-17"
]
}

Here is the result of the above stringLast query:

[ {
"version" : "v1",
"timestamp" : "2020-04-06T00:00:00.000Z",
"event" : {
"lastest_task_status" : "QUEUED",
"instrument_id" : "Ass_7448",
"comptimeLast" : 1586491394783
}
} ]

As you can see, the latest task status should be "ASSIGNED", but the query returns "QUEUED".

Another observation is that the last aggregators do not seem to consider segments that have partition/version numbers in their segment IDs. You can see this in the example above, where the "ASSIGNED" rows are in the segment whose ID ends in "_3".
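The rows in the select result can be checked for ties on the newest timestamp, which is a possible alternative explanation for the result. The sketch below copies the rows from the select output above and assumes that "last" is decided by the row `timestamp` shown there rather than by `instrument_breached_time`. How Druid orders rows with equal timestamps is not established in this thread, and the segment labels are shorthand introduced here for illustration.

```python
# Rows copied from the select result above:
# (row timestamp, task_status, task_completed_time, segment label)
EVENTS = [
    ("2020-04-10T04:33:14.399Z", "ASSIGNED", 1587038797612, "partition _3"),
    ("2020-04-10T04:33:14.399Z", "ASSIGNED", 1587038797502, "partition _3"),
    ("2020-04-10T04:33:14.399Z", "QUEUED", 1586491394783, "no suffix"),
    ("2020-04-10T04:06:14.399Z", "QUEUED", 1586491394783, "no suffix"),
    ("2020-04-10T04:05:20.399Z", "QUEUED", 1586491394783, "no suffix"),
    ("2020-04-10T04:04:44.399Z", "QUEUED", 1586491394783, "no suffix"),
    ("2020-04-10T04:03:14.969Z", "QUEUED", 1586491394783, "no suffix"),
]


def rows_at_latest_timestamp(rows):
    """Return the newest row timestamp and every row that carries it."""
    # These ISO-8601 strings share one format, so string order is time order.
    latest = max(row[0] for row in rows)
    return latest, [row for row in rows if row[0] == latest]


latest, tied = rows_at_latest_timestamp(EVENTS)
print("latest timestamp:", latest)
for _, status, completed, segment in tied:
    print(status, completed, segment)
```

With these rows, three events share the timestamp 2020-04-10T04:33:14.399Z: two "ASSIGNED" rows in the "_3" segment and one "QUEUED" row with task_completed_time 1586491394783 in the segment without a suffix. The groupBy result above matches that "QUEUED" row.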

@roopini550
Author

Hi,

Could anyone please share more information about this issue? We are blocked on using stringLast in some of our functionality.

Thanks,
Roopini

@github-actions

github-actions bot commented Aug 6, 2023

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 6, 2023
@github-actions

github-actions bot commented Sep 4, 2023

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

@github-actions github-actions bot closed this as not planned Sep 4, 2023