Dev.numeric group by #18

tiberiu44 · 2022-06-15T08:49:49Z

This update adds group_by support for multinomial, numeric and combiner fields

Description

Added an optional group_by field that can be specified for MultinomialField, MultinomialFieldCombiner and NumericalField. When it is set, OSAS builds a separate statistical model for each mini-group of data instead of a single model over the whole dataset, so each value is scored relative to its own group.
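To illustrate the numeric case, here is a minimal sketch of per-group statistics. It assumes a simple mean/standard-deviation model, which this PR does not spell out. The helper names (fit_group_stats, z_score) and the sample data are hypothetical and are not part of the OSAS API.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_group_stats(events, field, group_by):
    """Compute (mean, population std dev) of `field` for each `group_by` value."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[group_by]].append(event[field])
    return {group: (mean(values), pstdev(values)) for group, values in grouped.items()}


def z_score(stats, group, value):
    """Distance of `value` from its group's mean, in standard deviations."""
    mu, sigma = stats[group]
    return 0.0 if sigma == 0 else abs(value - mu) / sigma


# Hypothetical CPU readings: "db" normally runs hot, "web" normally idles.
events = (
    [{"host": "db", "cpu": v} for v in (80, 82, 78, 80)]
    + [{"host": "web", "cpu": v} for v in (10, 12, 8, 10)]
)
stats = fit_group_stats(events, field="cpu", group_by="host")

db_score = z_score(stats, "db", 80)    # typical for "db"
web_score = z_score(stats, "web", 80)  # far outside the usual range for "web"
```

The same reading of 80 is unremarkable for one host and a strong outlier for the other, which is the effect grouping is meant to achieve.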

Related Issue

This PR is based on an internal change request

Motivation and Context

Previously, OSAS had trouble modeling and tagging anomalies for under-represented classes. For instance, if you tried to build a model for login anomalies based on username and origin country (MultinomialField), or for average CPU/memory usage per host (NumericalField), it was difficult to cope with users that have few events compared to the other users. Consider a dataset with 99 users that each have 5000 events and one user with only 10 events. Even if all of that user's logins originate from the same country, they will always be tagged as anomalies, because they are under-represented in the overall dataset. With the group_by option, you can group the login country by username, so the statistical models are built per user and each login is scored relative to that user's own events.
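The login example can be sketched with plain relative frequencies. This is an illustration of the idea only, not the OSAS implementation: relative_frequencies is a hypothetical helper, the country codes are made up, and the dataset is scaled down to 50 events per busy user to keep it small.

```python
from collections import Counter


def relative_frequencies(events, field, group_by=None):
    """Return {(group, value): share of `value` among the events of its group}.

    With group_by=None, every event falls into one global group (key None).
    """
    value_counts = Counter()
    group_totals = Counter()
    for event in events:
        group = event[group_by] if group_by is not None else None
        value_counts[(group, event[field])] += 1
        group_totals[group] += 1
    return {key: count / group_totals[key[0]] for key, count in value_counts.items()}


# 99 busy users always logging in from "US", one quiet user always from "RO".
events = [{"username": f"user{i}", "country": "US"} for i in range(99) for _ in range(50)]
events += [{"username": "rare_user", "country": "RO"} for _ in range(10)]

global_freq = relative_frequencies(events, "country")
per_user_freq = relative_frequencies(events, "country", group_by="username")

rare_global = global_freq[(None, "RO")]              # tiny share of the whole dataset
rare_grouped = per_user_freq[("rare_user", "RO")]    # the only country this user ever uses
```

Without grouping, the quiet user's country accounts for a fraction of a percent of all logins and looks rare. Grouped by username, it accounts for all of that user's logins and no longer stands out.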

How Has This Been Tested?

This change has been validated internally by our TH team, using real datasets. We checked that the statistical models are correctly built and that the tags are assigned as expected.

Screenshots (if appropriate):

Types of changes

[-] Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
[-] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

I have signed the Adobe Open Source CLA.
My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
[-] I have added tests to cover my changes.
All new and existing tests passed.

tiberiu44 and others added 4 commits June 9, 2022 18:04

Group by added

812e72b

Handling of rare keys for numeric group_by

3075551

fixed typo bug in numericfield

46cf478

Updated documentation for group_by field on numeric, multinomial and …

72f894c

…combiner label generators

tiberiu44 requested a review from wilsontang06 June 15, 2022 08:49

tiberiu44 assigned wilsontang06 Jun 15, 2022

wilsontang06 merged commit 0d15387 into main Jun 15, 2022

wilsontang06 deleted the dev.numeric_group_by branch June 15, 2022 16:32
