Dev.numeric group by #18

Merged
wilsontang06 merged 4 commits into main from dev.numeric_group_by on Jun 15, 2022

tiberiu44
Collaborator

This update adds group_by support for multinomial, numerical and combiner fields.

Description

Added an optional field (group_by) that can be specified for MultinomialField, MultinomialFieldCombiner and NumericalField. When it is set, OSAS builds its statistical models around mini-groups of data rather than the dataset as a whole, which enables better statistical modeling.
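To illustrate the idea for a numerical field, here is a minimal sketch of per-group statistics. It is not the OSAS implementation: the helper names `fit_grouped` and `zscore`, the host names, and the CPU values are all hypothetical, and a z-score is used only as a stand-in for whatever scoring OSAS applies internally.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_grouped(rows):
    """Fit one (mean, std) pair per group key.

    rows: iterable of (group_key, value) tuples, e.g. (host, cpu_usage).
    Hypothetical helper for illustration only, not part of OSAS.
    """
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key].append(value)
    return {key: (mean(values), pstdev(values)) for key, values in buckets.items()}


def zscore(model, key, value):
    """Score a value relative to its own group's mean and std."""
    mu, sigma = model[key]
    return 0.0 if sigma == 0 else (value - mu) / sigma


# Made-up CPU usage samples for two hosts with very different baselines.
rows = [("host-a", v) for v in (10, 12, 11, 9)] + [("host-b", v) for v in (80, 82, 78, 80)]
model = fit_grouped(rows)

normal_for_b = zscore(model, "host-b", 80)   # matches host-b's own baseline
unusual_for_a = zscore(model, "host-a", 80)  # far above host-a's own baseline
```

In this sketch the same reading of 80 is unremarkable for `host-b` and a large outlier for `host-a`, because each value is compared against its own group rather than a single global distribution.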

Related Issue

This PR is based on an internal change request.

Motivation and Context

Previously, OSAS had trouble modeling and tagging anomalies for under-represented classes. For instance, if you tried to build a model for login anomalies based on username and origin country (MultinomialField), or for average CPU/memory usage per host (NumericalField), it coped poorly with users that have a small number of events compared to the other users. Consider a dataset with 99 users that each have 5000 events and one user with only 10 events. Even if all of that user's logins originate from the same country, they will always be tagged as anomalies, because they are under-represented in the overall dataset. With the group_by option, you can group the login country by username, so the statistical models are relative to each user and model anomalies more accurately.
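The numbers in this example can be worked through with a short sketch. This is not OSAS code: the function names, the user names, and the country value are hypothetical, and relative frequency is used only as a simple proxy for how rare an event looks to a multinomial model.

```python
from collections import Counter


def build_events():
    """99 users with 5000 logins each, plus one user with only 10 logins."""
    events = []
    for i in range(99):
        events.extend((f"user{i}", "US") for _ in range(5000))
    events.extend(("rare_user", "US") for _ in range(10))
    return events


def global_share(events):
    """Share of each (username, country) pair in the whole dataset."""
    counts = Counter(events)
    total = len(events)
    return {pair: n / total for pair, n in counts.items()}


def grouped_share(events):
    """Share of each country within its own username group."""
    pair_counts = Counter(events)
    user_totals = Counter(user for user, _ in events)
    return {(user, country): n / user_totals[user]
            for (user, country), n in pair_counts.items()}


events = build_events()
global_freq = global_share(events)
grouped_freq = grouped_share(events)

rare_global = global_freq[("rare_user", "US")]    # 10 / 495010
rare_grouped = grouped_freq[("rare_user", "US")]  # 10 / 10
```

Measured against the whole dataset, the rare user's logins make up about 0.002% of events and look anomalous. Measured within that user's own group, the same country accounts for all of their logins, which is the per-user view that group_by is meant to provide.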

How Has This Been Tested?

This change has been validated internally by our TH team, using real datasets. We checked that the statistical models are correctly built and that the tags are assigned as expected.

Screenshots (if appropriate):

Types of changes

  • [-] Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • [-] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have signed the Adobe Open Source CLA.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • [-] I have added tests to cover my changes.
  • All new and existing tests passed.

wilsontang06 merged commit 0d15387 into main on Jun 15, 2022
wilsontang06 deleted the dev.numeric_group_by branch on June 15, 2022 at 16:32