Optimize TI analytic rule deduplication and filters #13904
v-atulyadav merged 4 commits into master
Reorder and simplify the Threat Intelligence analytic query to deduplicate records earlier and apply active/expiration filters afterwards. Added summarize arg_max(TimeGenerated, *) by Id, Url and by Id, ObservableValue to get the latest record before filtering, moved IsActive/ValidUntil/ExpirationDateTime checks to follow deduplication, and removed redundant time-order checks and extra arg_max usages related to EmailUrlInfo. These changes ensure the latest valid indicators are retained and simplify the join logic. Bumped version 1.0.5 -> 1.0.6.
Pull request overview
This PR refactors the Threat Intelligence analytic rule query to deduplicate TI indicator records earlier (via arg_max) and streamline filtering/join logic, with a corresponding template version bump.
Changes:
- Deduplicate URL and domain TI indicators earlier using summarize arg_max(TimeGenerated, *) ....
- Reorder/adjust active/expiration filtering and simplify join-related logic with EmailUrlInfo_.
- Bump analytic rule template version from 1.0.5 to 1.0.6.
| summarize arg_max(TimeGenerated, *) by Id, Url // Get latest record per Id FIRST
| where IsActive == true and ValidUntil > now() // THEN filter for active indicators that haven't expired
Reordering to deduplicate before applying IsActive/ValidUntil can drop valid indicators: if the most recent record for a given Id, Url is inactive/expired but an earlier record is still active/valid, the arg_max keeps the inactive one and the subsequent where removes it entirely. To preserve the latest valid indicator, apply the IsActive/ValidUntil filter before arg_max, or compute arg_max over only the valid subset (e.g., filter first, then summarize).
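The difference between the two orderings can be reproduced on a small in-memory table. The following is a minimal sketch, not the rule's actual query: the rows, the `ind-1` identifier, and the `example.test` URL are made up, and only the column names are taken from the diff above.

```kql
// Hypothetical sample rows: one indicator with an older active record
// and a newer inactive record.
let SampleIndicators = datatable(TimeGenerated: datetime, Id: string, Url: string, IsActive: bool, ValidUntil: datetime)
[
    datetime(2024-01-01), "ind-1", "http://example.test/a", true,  datetime(2099-01-01),
    datetime(2024-01-02), "ind-1", "http://example.test/a", false, datetime(2099-01-01)
];
// Order used in this PR: keep the newest row per key, then filter.
// The newest row is inactive, so nothing survives.
let DedupThenFilter = SampleIndicators
    | summarize arg_max(TimeGenerated, *) by Id, Url
    | where IsActive == true and ValidUntil > now();
// Order proposed in the review: filter, then keep the newest surviving row.
// The 2024-01-01 row survives.
let FilterThenDedup = SampleIndicators
    | where IsActive == true and ValidUntil > now()
    | summarize arg_max(TimeGenerated, *) by Id, Url;
// Compare row counts for the two orderings side by side.
union
    (DedupThenFilter | summarize Rows = count() | extend Order = "dedup then filter"),
    (FilterThenDedup | summarize Rows = count() | extend Order = "filter then dedup")
```

Which ordering is correct depends on what a newer inactive record means in the source data. If it is a revocation, dropping the indicator (the first ordering) may be the intended outcome; if older active records should still count, the second ordering is the safer one.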
| summarize arg_max(TimeGenerated, *) by Id, Url // Get latest record per Id FIRST
| where IsActive == true and ValidUntil > now() // THEN filter for active indicators that haven't expired
| join kind=innerunique (EmailUrlInfo_) on Url // Join with email URL info on URL
After switching to join kind=innerunique without re-selecting the latest EmailUrlInfo_TimeGenerated per indicator/URL, the chosen EmailUrlInfo_ row becomes effectively arbitrary when multiple email-URL records exist for the same Url. This can yield nondeterministic results and may associate older/newer email evidence incorrectly. Consider deduplicating EmailUrlInfo_ (e.g., arg_max(EmailUrlInfo_TimeGenerated, *) by Url) before the join, or restoring a post-join summarize arg_max(EmailUrlInfo_TimeGenerated, *) ... to make the selection deterministic.
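A minimal sketch of the pre-join deduplication mentioned above, using made-up rows. `EmailUrlInfo_TimeGenerated` comes from the comment; `NetworkMessageId`, the `Indicators` table, and all values are assumptions for illustration only. One point worth checking against the Kusto join documentation: `innerunique` deduplicates keys on the left side of the join and returns every matching right-side row, so with `EmailUrlInfo_` on the right, several email rows per Url would all be emitted unless they are collapsed explicitly.

```kql
// Hypothetical email rows: two records for the same Url.
let EmailUrlInfo_ = datatable(EmailUrlInfo_TimeGenerated: datetime, Url: string, NetworkMessageId: string)
[
    datetime(2024-01-01T08:00:00), "http://example.test/a", "msg-old",
    datetime(2024-01-02T09:00:00), "http://example.test/a", "msg-new"
];
// Hypothetical indicator table standing in for the deduplicated TI records.
let Indicators = datatable(TimeGenerated: datetime, Id: string, Url: string)
[
    datetime(2024-01-01), "ind-1", "http://example.test/a"
];
Indicators
| join kind=innerunique (
    EmailUrlInfo_
    | summarize arg_max(EmailUrlInfo_TimeGenerated, *) by Url // keep only the newest email row per Url
) on Url
| project Id, Url, NetworkMessageId, EmailUrlInfo_TimeGenerated
```

With the inner `summarize` in place, the choice of email evidence is explicit (the newest record per Url) rather than left to the join flavor.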
| summarize arg_max(TimeGenerated, *) by Id, ObservableValue // Get latest record per Id FIRST
| where IsActive == true // THEN filter for active indicators
| extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
| extend DomainName = tolower(ObservableValue) // Convert domain names to lowercase
In the domain branch, tolower(ObservableValue) is applied after deduplication/grouping by ObservableValue. If the same domain appears with different casing, it will be treated as distinct groups during summarize, potentially producing duplicates downstream even though DomainName is later normalized. Normalize first (create a lowercased domain column) and then summarize ... by Id, <lowercasedDomain> (or summarize by the normalized key you actually use later).
Suggested change:
- | summarize arg_max(TimeGenerated, *) by Id, ObservableValue // Get latest record per Id FIRST
- | where IsActive == true // THEN filter for active indicators
- | extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
- | extend DomainName = tolower(ObservableValue) // Convert domain names to lowercase
+ | extend DomainName = tolower(ObservableValue) // Normalize domain names to lowercase before summarization
+ | summarize arg_max(TimeGenerated, *) by Id, DomainName // Get latest record per Id and normalized domain
+ | where IsActive == true // THEN filter for active indicators
+ | extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
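The effect of normalizing before grouping can be checked on a small made-up table. This is only a sketch: the rows and the `ind-2` identifier are hypothetical, and the column names follow the snippet above.

```kql
// Hypothetical sample rows: the same domain recorded with two different casings.
let SampleDomains = datatable(TimeGenerated: datetime, Id: string, ObservableValue: string, IsActive: bool)
[
    datetime(2024-01-01), "ind-2", "Example.TEST", true,
    datetime(2024-01-02), "ind-2", "example.test", true
];
SampleDomains
| extend DomainName = tolower(ObservableValue)          // normalize casing first
| summarize arg_max(TimeGenerated, *) by Id, DomainName // both casings fall into one group
| where IsActive == true                                // then filter for active indicators
```

Grouping by the raw `ObservableValue` instead would leave two rows here, because string grouping keys are compared case-sensitively.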
rahul0216 left a comment
Please address the points we discussed.
Reorder and clean up ThreatIntelIndicators handling for URL and domain matches. For URLs, summarize (arg_max) by Id/ObservableValue first, then filter for active/unexpired indicators, and normalize ObservableValue to lowercase before joining with EmailUrlInfo_. For domains, normalize DomainName to lowercase earlier and correct the projected active field name to IsActive. These changes ensure joins use the latest, active indicators and consistent casing for reliable matching.
Release update for Threat Intelligence (NEW): add packaged artifact 3.0.16.zip and bump solution/template versions to 3.0.16. Update analytic rule 6 version to 1.0.6 and refine its KQL query logic to use arg_max(TimeGenerated,*) and improved filtering for URL/domain indicators. Standardize/roll back several resource apiVersion values (various Microsoft.OperationalInsights and Microsoft.SecurityInsights providers) and update resource descriptions and field mappings formatting across templates.