Optimize TI analytic rule deduplication and filters #13904

Merged
v-atulyadav merged 4 commits into master from v-sabiraj-addinganalyticrulesinstructions on Mar 25, 2026

@v-sabiraj
Contributor

Reorder and simplify the Threat Intelligence analytic query to deduplicate records earlier and apply active/expiration filters afterwards. Added summarize arg_max(TimeGenerated, *) by Id, Url and by Id, ObservableValue to get the latest record before filtering, moved IsActive/ValidUntil/ExpirationDateTime checks to follow deduplication, and removed redundant time-order checks and extra arg_max usages related to EmailUrlInfo. These changes ensure the latest valid indicators are retained and simplify the join logic. Bumped version 1.0.5 -> 1.0.6.

@v-sabiraj v-sabiraj requested review from a team as code owners March 25, 2026 08:18
@contentautomationbot

Hello how are you I am GitHub bot
😀😀
I see that you changed templates under the detections/analytic rules folder. Did you remember to update the version of the templates you changed?
If not, and if you want customers to be aware that a new version of this template is available, please update the version property of the template you changed.

@v-sabiraj v-sabiraj requested a review from Copilot March 25, 2026 08:19
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the Threat Intelligence analytic rule query to deduplicate TI indicator records earlier (via arg_max) and streamline filtering/join logic, with a corresponding template version bump.

Changes:

  • Deduplicate URL and domain TI indicators earlier using summarize arg_max(TimeGenerated, *) ....
  • Reorder/adjust active/expiration filtering and simplify join-related logic with EmailUrlInfo_.
  • Bump analytic rule template version from 1.0.5 to 1.0.6.

Comment on lines +43 to +44
| summarize arg_max(TimeGenerated, *) by Id, Url // Get latest record per Id FIRST
| where IsActive == true and ValidUntil > now() // THEN filter for active indicators that haven't expired

Copilot AI Mar 25, 2026

Reordering to deduplicate before applying IsActive/ValidUntil can drop valid indicators: if the most recent record for a given Id, Url is inactive/expired but an earlier record is still active/valid, the arg_max keeps the inactive one and the subsequent where removes it entirely. To preserve the latest valid indicator, apply the IsActive/ValidUntil filter before arg_max, or compute arg_max over only the valid subset (e.g., filter first, then summarize).
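The ordering concern above can be reproduced with a small, self-contained sketch. `SampleIndicators` and its rows are hypothetical stand-ins for the indicator table, not data from the rule; only the `summarize` and `where` lines mirror the query under review.

```kusto
// Hypothetical sample: one indicator whose newest record is inactive
// while an older record for the same Id and Url is still active.
let SampleIndicators = datatable(TimeGenerated: datetime, Id: string, Url: string, IsActive: bool, ValidUntil: datetime)
[
    datetime(2026-03-20), "indicator-1", "http://example.test/a", true,  datetime(2099-01-01),
    datetime(2026-03-24), "indicator-1", "http://example.test/a", false, datetime(2099-01-01)
];
// Ordering used in this PR: keep the latest row per key, then filter.
let DedupThenFilter = SampleIndicators
    | summarize arg_max(TimeGenerated, *) by Id, Url
    | where IsActive == true and ValidUntil > now()
    | extend Ordering = "dedup-then-filter";
// Ordering suggested in the review: filter first, then keep the latest surviving row.
let FilterThenDedup = SampleIndicators
    | where IsActive == true and ValidUntil > now()
    | summarize arg_max(TimeGenerated, *) by Id, Url
    | extend Ordering = "filter-then-dedup";
union DedupThenFilter, FilterThenDedup
| project Ordering, Id, Url, TimeGenerated, IsActive
```

With this sample, only the filter-then-dedup branch returns a row (the older, active record). Which behavior is wanted depends on whether a newer inactive record is meant to supersede an older active one for the same Id; the sketch shows the difference but does not settle that question.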

Copilot uses AI. Check for mistakes.
Comment on lines 43 to 45
| summarize arg_max(TimeGenerated, *) by Id, Url // Get latest record per Id FIRST
| where IsActive == true and ValidUntil > now() // THEN filter for active indicators that haven't expired
| join kind=innerunique (EmailUrlInfo_) on Url // Join with email URL info on URL

Copilot AI Mar 25, 2026

After switching to join kind=innerunique without re-selecting the latest EmailUrlInfo_TimeGenerated per indicator/URL, the chosen EmailUrlInfo_ row becomes effectively arbitrary when multiple email-URL records exist for the same Url. This can yield nondeterministic results and may associate older/newer email evidence incorrectly. Consider deduplicating EmailUrlInfo_ (e.g., arg_max(EmailUrlInfo_TimeGenerated, *) by Url) before the join, or restoring a post-join summarize arg_max(EmailUrlInfo_TimeGenerated, *) ... to make the selection deterministic.
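One way to make the email record selection explicit, as suggested above, is to deduplicate the email side before the join. This is a minimal sketch with hypothetical stand-in tables (`SampleIndicators`, `SampleEmailUrlInfo`) and illustrative values; the `NetworkMessageId` column is included only to tell the two email rows apart.

```kusto
// Hypothetical stand-ins for the rule's indicator set and EmailUrlInfo_ table.
let SampleIndicators = datatable(Id: string, Url: string)
[
    "indicator-1", "http://example.test/a"
];
let SampleEmailUrlInfo = datatable(EmailUrlInfo_TimeGenerated: datetime, Url: string, NetworkMessageId: string)
[
    datetime(2026-03-23), "http://example.test/a", "message-old",
    datetime(2026-03-24), "http://example.test/a", "message-new"
];
SampleIndicators
| join kind=innerunique (
    SampleEmailUrlInfo
    // Keep only the most recent email record per Url so the join sees one candidate row.
    | summarize arg_max(EmailUrlInfo_TimeGenerated, *) by Url
) on Url
| project Id, Url, NetworkMessageId, EmailUrlInfo_TimeGenerated
```

Note that `innerunique` deduplicates join keys on the left side only, so without the `summarize` every matching row on the right side would appear in the output. Deduplicating the email table explicitly leaves one email record per Url regardless of which side it is placed on.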

Comment on lines 53 to 56
| summarize arg_max(TimeGenerated, *) by Id, ObservableValue // Get latest record per Id FIRST
| where IsActive == true // THEN filter for active indicators
| extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
| extend DomainName = tolower(ObservableValue) // Convert domain names to lowercase

Copilot AI Mar 25, 2026

In the domain branch, tolower(ObservableValue) is applied after deduplication/grouping by ObservableValue. If the same domain appears with different casing, it will be treated as distinct groups during summarize, potentially producing duplicates downstream even though DomainName is later normalized. Normalize first (create a lowercased domain column) and then summarize ... by Id, <lowercasedDomain> (or summarize by the normalized key you actually use later).

Suggested change
- | summarize arg_max(TimeGenerated, *) by Id, ObservableValue // Get latest record per Id FIRST
- | where IsActive == true // THEN filter for active indicators
- | extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
- | extend DomainName = tolower(ObservableValue) // Convert domain names to lowercase
+ | extend DomainName = tolower(ObservableValue) // Normalize domain names to lowercase before summarization
+ | summarize arg_max(TimeGenerated, *) by Id, DomainName // Get latest record per Id and normalized domain
+ | where IsActive == true // THEN filter for active indicators
+ | extend TrafficLightProtocolLevel = AdditionalFields.TLPLevel
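The casing issue described in this comment can be checked with a short sketch. The table `SampleIndicators` and its two rows are hypothetical; they represent the same domain recorded with different casing under one Id.

```kusto
// Hypothetical sample: the same domain stored with two different casings.
let SampleIndicators = datatable(TimeGenerated: datetime, Id: string, ObservableValue: string, IsActive: bool)
[
    datetime(2026-03-20), "indicator-1", "Example.TEST", true,
    datetime(2026-03-24), "indicator-1", "example.test", true
];
// Grouping on the raw value: string grouping is case-sensitive, so each casing is its own group.
let ByRawValue = SampleIndicators
    | summarize arg_max(TimeGenerated, *) by Id, ObservableValue
    | summarize Groups = count()
    | extend GroupingKey = "ObservableValue";
// Normalizing first: both rows share one lowercased key and collapse into a single group.
let ByNormalizedValue = SampleIndicators
    | extend DomainName = tolower(ObservableValue)
    | summarize arg_max(TimeGenerated, *) by Id, DomainName
    | summarize Groups = count()
    | extend GroupingKey = "DomainName";
union ByRawValue, ByNormalizedValue
| project GroupingKey, Groups
```

Under these assumptions the raw-value grouping should report two groups and the normalized grouping one, which is the duplication the comment warns about.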

Collaborator

@rahul0216 rahul0216 left a comment

Please address the points we discussed.

@v-atulyadav v-atulyadav added the Solution, Solution specialty review needed, and Analytic Rules labels Mar 25, 2026
Reorder and clean up ThreatIntelIndicators handling for URL and domain matches. For URLs, summarize (arg_max) by Id/ObservableValue first, then filter for active/unexpired indicators, and normalize ObservableValue to lowercase before joining with EmailUrlInfo_. For domains, normalize DomainName to lowercase earlier and correct the projected active field name to IsActive. These changes ensure joins use the latest, active indicators and consistent casing for reliable matching.
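The URL flow described in this commit message might look roughly like the sketch below. The tables and values are hypothetical, and lowercasing the email side's `Url` as well is an assumption of this sketch; the commit message only states that `ObservableValue` is normalized.

```kusto
// Hypothetical stand-ins for the indicator table and EmailUrlInfo_.
let SampleIndicators = datatable(TimeGenerated: datetime, Id: string, ObservableValue: string, IsActive: bool, ValidUntil: datetime)
[
    datetime(2026-03-24), "indicator-1", "HTTP://Example.test/A", true, datetime(2099-01-01)
];
let SampleEmailUrlInfo = datatable(Url: string, NetworkMessageId: string)
[
    "http://example.test/a", "message-1"
];
SampleIndicators
| summarize arg_max(TimeGenerated, *) by Id, ObservableValue   // latest record per indicator first
| where IsActive == true and ValidUntil > now()                // then keep active, unexpired indicators
| extend Url = tolower(ObservableValue)                        // normalize casing before the join
| join kind=innerunique (
    SampleEmailUrlInfo
    | extend Url = tolower(Url)                                // assumption: normalize the email side too
) on Url
| project Id, Url, NetworkMessageId
```

Because string equality in a join is case-sensitive, the two sample rows match only after both sides are lowercased.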
rahul0216 previously approved these changes Mar 25, 2026
Collaborator

@rahul0216 rahul0216 left a comment

Looks good.

Release update for Threat Intelligence (NEW): add packaged artifact 3.0.16.zip and bump solution/template versions to 3.0.16. Update analytic rule 6 version to 1.0.6 and refine its KQL query logic to use arg_max(TimeGenerated,*) and improved filtering for URL/domain indicators. Standardize/roll back several resource apiVersion values (various Microsoft.OperationalInsights and Microsoft.SecurityInsights providers) and update resource descriptions and field mappings formatting across templates.
@v-atulyadav v-atulyadav merged commit 3a5fea0 into master Mar 25, 2026
36 checks passed
Labels

Analytic Rules, Solution, Solution specialty review needed

4 participants