AtomFilterMatchAction YAML support #1063

Closed
landauermax opened this issue Mar 14, 2022 · 8 comments · Fixed by #1073

@landauermax
Contributor

There should be a way to use a MatchRule together with the AtomFilterMatchAction so that only logs matching the rule are forwarded to a specific detector. This can be done in Python configs, but not in YAML configs. Also, tests and documentation are missing.

@landauermax landauermax added the enhancement New feature or request label Mar 14, 2022
@landauermax landauermax added this to the 2.5.0 milestone Mar 14, 2022
@landauermax landauermax added the high High priority issue label Mar 17, 2022
@ernstleierzopf
Contributor

Could you please show what this config looks like in Python? It should be possible to use all MatchRules in the YAML configs.
Please look at this excerpt from the demo-config.yml:

        - type: PathExistsMatchRule
          id: path_exists_match_rule1
          path: "/model/LoginDetails/PastTime/Time/Minutes"
        - type: DebugMatchRule
          id: debug_match_rule
          debug_mode: True
        - type: PathExistsMatchRule
          id: path_exists_match_rule2
          path: "/model/LoginDetails"
        - type: ValueMatchRule
          id: value_match_rule
          path: "/model/LoginDetails/Username"
          value: "root"
        - type: NegationMatchRule
          id: negation_match_rule1
          sub_rule: "value_match_rule"
        - type: NegationMatchRule
          id: negation_match_rule2
          sub_rule: "path_exists_match_rule2"
        - type: AndMatchRule
          id: and_match_rule1
          sub_rules:
            - "path_exists_match_rule1"
            - "negation_match_rule1"
            - "debug_match_rule"
        - type: AndMatchRule
          id: and_match_rule2
          sub_rules:
            - "negation_match_rule1"
            - "path_exists_match_rule2"
            - "debug_match_rule"
        - type: OrMatchRule
          id: or_match_rule
          sub_rules:
            - "and_match_rule1"
            - "and_match_rule2"
            - "negation_match_rule2"
        - type: AllowlistViolationDetector
          id: Allowlist
          allowlist_rules:
            - "or_match_rule"

Is this the use case you need?

@landauermax
Contributor Author

I do not think the example you provided covers the required case, as it does not contain any match actions. I was able to achieve the desired result in a Python config as follows:

config_properties = {}
config_properties['LogResourceList'] = ['file:///home/ubuntu/test.log']
config_properties['AminerUser'] = 'aminer'
config_properties['AminerGroup'] = 'aminer'

learn_mode = True

def build_analysis_pipeline(analysis_context):
    from aminer.parsing.SequenceModelElement import SequenceModelElement
    from aminer.parsing.FixedDataModelElement import FixedDataModelElement
    from aminer.parsing.VariableByteDataModelElement import VariableByteDataModelElement

    parsing_model = SequenceModelElement('model', [
        VariableByteDataModelElement("first", b"abcdefghijklmnopqrstuvwxyz"),
        FixedDataModelElement("space", b" "),
        VariableByteDataModelElement("second", b"abcdefghijklmnopqrstuvwxyz")
        ])

    from aminer.analysis import AtomFilters
    atom_filter = AtomFilters.SubhandlerFilter(None)
    anomaly_event_handlers = []
    from aminer.input.SimpleByteStreamLineAtomizerFactory import SimpleByteStreamLineAtomizerFactory
    analysis_context.atomizer_factory = SimpleByteStreamLineAtomizerFactory(
        parsing_model, [atom_filter], anomaly_event_handlers, default_timestamp_paths='/model/accesslog/time')
    from aminer.analysis.UnparsedAtomHandlers import SimpleUnparsedAtomHandler
    atom_filter.add_handler(SimpleUnparsedAtomHandler(anomaly_event_handlers), stop_when_handled_flag=True)
    from aminer.analysis.NewMatchPathDetector import NewMatchPathDetector
    new_match_path_detector = NewMatchPathDetector(analysis_context.aminer_config, anomaly_event_handlers, auto_include_flag=learn_mode)
    analysis_context.register_component(new_match_path_detector, component_name=None)
    atom_filter.add_handler(new_match_path_detector)
    value_detector_subhandler = AtomFilters.SubhandlerFilter(None)
    from aminer.analysis.Rules import AtomFilterMatchAction
    afma = AtomFilterMatchAction([value_detector_subhandler])
    from aminer.analysis.Rules import ValueMatchRule
    vmr = ValueMatchRule("/model/first", b"b", afma)
    from aminer.analysis.NewMatchPathValueDetector import NewMatchPathValueDetector
    new_match_path_value_detector = NewMatchPathValueDetector(analysis_context.aminer_config, ["/model/second"], anomaly_event_handlers, auto_include_flag=learn_mode)
    analysis_context.register_component(new_match_path_value_detector, component_name=None)
    value_detector_subhandler.add_handler(new_match_path_value_detector)
    from aminer.analysis.AllowlistViolationDetector import AllowlistViolationDetector
    avd = AllowlistViolationDetector(analysis_context.aminer_config, [vmr], [])
    atom_filter.add_handler(avd)

    from aminer.events.StreamPrinterEventHandler import StreamPrinterEventHandler
    anomaly_event_handlers.append(StreamPrinterEventHandler(analysis_context))

The log file:

a x
a y
b u
b v
a z

The value match rule ensures that values of the /model/second path are only learned when the /model/first path has the value "b". Log events where /model/first is "a" (or any value other than "b") are not analyzed by the value detector, which is exactly what I want. This is also reflected in the output and the persisted data of the value detector (the path detector does not matter for this use case):

root@ubuntu-VirtualBox:/home/ubuntu# aminer -C -c config.py
2022-03-28 10:16:35 New path(es) detected
NewMatchPathDetector: "NewMatchPathDetector0" (1 lines)
  /model: a x
  /model/first: a
  /model/space:  
  /model/second: x
['/model', '/model/first', '/model/space', '/model/second']
a x

2022-03-28 10:16:35 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector1" (1 lines)
  {'/model/second': 'u'}
b u

2022-03-28 10:16:35 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector1" (1 lines)
  {'/model/second': 'v'}
b v

^Caminer: caught signal, shutting down
aminer: caught signal, shutting down
root@ubuntu-VirtualBox:/home/ubuntu# cat /var/lib/aminer/NewMatchPathValueDetector/Default 
["bytes:v", "bytes:u"]
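
The filtering shown in this output can be mimicked with a small stand-alone Python sketch. It is a toy model only: it does not import aminer, and the names `ToyValueDetector`, `parse` and `run` are made up for illustration. It assumes the match action simply forwards a matching atom to the detector.

```python
# Toy model of the routing in the config above; it does not import aminer.
LOG_LINES = ["a x", "a y", "b u", "b v", "a z"]


def parse(line):
    """Split a line into the two fields the parsing model calls first and second."""
    first, second = line.split(" ")
    return {"/model/first": first, "/model/second": second}


class ToyValueDetector:
    """Stand-in for a value detector: remembers every new value of one path."""

    def __init__(self, path):
        self.path = path
        self.known = []

    def receive_atom(self, atom):
        value = atom[self.path]
        if value not in self.known:
            self.known.append(value)


def run(lines, rule_path, rule_value, detector):
    """Forward an atom to the detector only when the value rule matches."""
    for line in lines:
        atom = parse(line)
        if atom[rule_path] == rule_value:
            detector.receive_atom(atom)
    return detector.known


learned = run(LOG_LINES, "/model/first", "b", ToyValueDetector("/model/second"))
print(learned)  # ['u', 'v']
```

Only the values from the two "b" lines end up in the learned list, which corresponds to the two persisted values above.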

@landauermax
Contributor Author

I should add that I had to use the allowlist detector with an empty list of event handlers, which is maybe not ideal. We could also consider adding a new detector whose only purpose is to trigger rule matches and the connected match actions.
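
One possible shape for such a trigger-only component is sketched below. Everything in it is hypothetical: `RuleTriggerHandler` and `ToyValueRule` are illustrative stand-ins written for this sketch, not existing aminer classes, and atoms are modelled as plain dictionaries.

```python
class ToyValueRule:
    """Stand-in for a value match rule with an optional match action."""

    def __init__(self, path, value, match_action=None):
        self.path = path
        self.value = value
        self.match_action = match_action

    def match(self, atom):
        matched = atom.get(self.path) == self.value
        if matched and self.match_action is not None:
            # The match action is modelled as a plain callable.
            self.match_action(atom)
        return matched


class RuleTriggerHandler:
    """Hypothetical handler that evaluates rules for their side effects only."""

    def __init__(self, rules):
        self.rules = rules

    def receive_atom(self, atom):
        # Stops at the first matching rule and never emits an event itself.
        return any(rule.match(atom) for rule in self.rules)


forwarded = []
handler = RuleTriggerHandler([ToyValueRule("/model/first", "b", forwarded.append)])
for first, second in [("a", "x"), ("b", "u"), ("b", "v")]:
    handler.receive_atom({"/model/first": first, "/model/second": second})
print([atom["/model/second"] for atom in forwarded])  # ['u', 'v']
```

Unlike the allowlist workaround, a component of this kind would need no event handler list at all, because it never reports anything on its own.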

@ernstleierzopf
Contributor

Thank you for preparing such a great code example!
I have added a pull request to fix the issue here: #1073

For it, I have recreated the same config in YAML:

AminerUser: 'aminer'  # optional default: aminer
AminerGroup: 'aminer' # optional default: aminer
LogResourceList:
        - 'file:///home/user/Downloads/log.txt'
LearnMode: True

Parser:
        - id: first
          type: VariableByteDataModelElement
          name: 'first'
          args: 'abcdefghijklmnopqrstuvwxyz'

        - id: space
          type: FixedDataModelElement
          name: 'space'
          args: ' '

        - id: second
          type: VariableByteDataModelElement
          name: 'second'
          args: 'abcdefghijklmnopqrstuvwxyz'

        - id: model
          type: SequenceModelElement
          name: 'model'
          start: True
          args:
            - first
            - space
            - second

Input:
        timestamp_paths: ["/model/DailyCron/DTM"]
        adjust_timestamps: True

Analysis:
        - type: NewMatchPathValueDetector
          id: NewMatchPathValueDetector1
          paths:
            - "/model/second"

        - type: AtomFilterMatchAction
          id: afma
          subhandler_list:
            - NewMatchPathValueDetector1
          stop_when_handled_flag: True
          delete_components: True

        - type: ValueMatchRule
          id: vmr
          path: "/model/first"
          value: "b"
          match_action: afma

        - type: AllowlistViolationDetector
          id: Allowlist
          allowlist_rules:
            - vmr
          suppress: True

This produces exactly the same output as the one you posted (on the new branch).
I have also added a new parameter to the AtomFilterMatchAction: delete_components. When it is set, components added to AtomFilterMatchAction.subhandler_list are removed from the default atom_filter.subhandler_list. In your Python config you used a separate SubhandlerFilter; this is not possible in YAML at the moment, and IMO it is not necessary either.
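
The effect of delete_components can be illustrated with a toy count of how often the detector receives each log line. This is a simplified sketch under stated assumptions, not aminer code: the function `deliveries` and the handler name "detector" are invented here, and the model assumes that every handler in the default list sees every atom while handlers in the match action's list see only atoms for which /model/first is "b".

```python
def deliveries(lines, delete_components):
    """Count how often the toy detector receives each line."""
    default_handlers = ["detector"]  # handlers fed with every atom
    action_handlers = ["detector"]   # handlers fed only on a rule match
    if delete_components:
        # Drop from the default list whatever the match action already owns.
        default_handlers = [h for h in default_handlers if h not in action_handlers]

    counts = {}
    for line in lines:
        first, _second = line.split(" ")
        count = len(default_handlers)
        if first == "b":  # the value rule on /model/first
            count += len(action_handlers)
        counts[line] = count
    return counts


LINES = ["a x", "b u"]
print(deliveries(LINES, delete_components=True))   # {'a x': 0, 'b u': 1}
print(deliveries(LINES, delete_components=False))  # {'a x': 1, 'b u': 2}
```

In this model, leaving the detector in both lists means it also sees non-matching lines and sees matching lines twice.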

@landauermax
Contributor Author

Thanks, looks good! I noticed that the default value of the delete_components parameter is False. Is it correct that with delete_components set to False, the value detector raises an anomaly as usual for every new value in /model/second, and then raises another anomaly for the same value when the value match rule triggers the match action? That would give us each anomaly twice whenever /model/first="b". If so, I would say the default of delete_components should be True: when we use a match rule, we usually want anomalies to be generated only if the match rule triggers, and not otherwise.

Also, please update the documentation in CONFIGURATION.rst to cover the new parameter and functionality, and maybe add the relevant lines of the configuration as an example.

@ernstleierzopf
Contributor

Yes, that's how it works. Okay, I will change the default to True and add the documentation.
I do not have any more time today, but I will look into it tomorrow.

@landauermax
Contributor Author

Ok, thanks!

@ernstleierzopf
Contributor

I have changed the default value and added documentation for all MatchActions.
