Tagging should take into account confidence of input attributes #7

Closed
vaclavbartos opened this issue Jul 15, 2017 · 1 comment

Comments

@vaclavbartos
Collaborator

Currently, tagging rules ("conditions") assume that input attributes are binary (i.e. strictly true or false, with no confidence value). The confidence of the output can only be set by explicitly multiplying input values by numbers.

Support should be added for inputs that carry a confidence value. A term in a condition based on an attribute with confidence c will then also have confidence c (which can be further modified by arithmetic operations, of course). For example:

  • Record contains: "hostname_class": {v: "dynamic", c: 0.8}
  • Tag condition: 0.9*('dynamic' in hostname_class)
  • Result: tag is set with confidence = 0.72

Of course, if the input attribute is simple, i.e. it has no confidence value assigned, confidence = 1 is assumed.
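To make the example above concrete, here is a minimal Python sketch of how a single term could be evaluated. The function name `term_confidence`, the dict layout with `"v"` and `"c"` keys, and the handling of list-like values are assumptions made for illustration only; the actual storage format is the one proposed in the Data model.

```python
def term_confidence(wanted, attribute):
    """Return the confidence of the term `wanted in attribute` (hypothetical helper)."""
    # Assumption: a value with confidence is stored as {"v": value, "c": confidence}.
    if isinstance(attribute, dict) and "c" in attribute:
        value, confidence = attribute["v"], attribute["c"]
    else:
        # Plain attribute: no confidence assigned, so confidence = 1 is assumed.
        value, confidence = attribute, 1.0

    # Assumption: the value is either a single item or a collection of items.
    if isinstance(value, (list, tuple, set, frozenset)):
        matched = wanted in value
    else:
        matched = wanted == value

    # A term that does not match contributes confidence 0.
    return confidence if matched else 0.0


record = {"hostname_class": {"v": "dynamic", "c": 0.8}}

# Tag condition: 0.9*('dynamic' in hostname_class)
tag_confidence = 0.9 * term_confidence("dynamic", record["hostname_class"])
print(round(tag_confidence, 2))  # 0.72
```

The result is rounded only for printing, since floating-point multiplication of 0.9 and 0.8 is not exact.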

Confidence of a combination of individual terms in a condition is computed as follows:

  • Arithmetic operations behave normally.
  • Logical and: A and B = A * B
    • Example:
      • Rule: ('dynamic' in hostname_class) and bl.tor --> tag "dynamic_tor"
      • Inputs: ('dynamic' in hostname_class).c = 0.9, bl.tor = 1.0 (implicitly, since blacklists have no confidence values)
      • Result: dynamic_tor.c = 0.9
  • Logical or: A or B = 1 - ((1 - A) * (1 - B))
    • Example:
      • Rule: 0.9*('dynamic' in hostname_class) or 0.2*('dsl' in hostname_class) --> tag "dynamic"
      • Inputs: ('dynamic' in hostname_class).c = 1.0, ('dsl' in hostname_class).c = 1.0
      • Result: dynamic.c = 0.9*1.0 or 0.2*1.0 = 0.9 or 0.2 = 1 - (0.1 * 0.8) = 0.92
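The combination rules above can be sketched in Python as follows. The function names `conf_and` and `conf_or` are hypothetical; plain functions are used because Python's `and` and `or` keywords cannot be overloaded, so a real rule evaluator would presumably have to translate the condition syntax into calls like these.

```python
def conf_and(a, b):
    """Logical and: A and B = A * B."""
    return a * b


def conf_or(a, b):
    """Logical or: A or B = 1 - ((1 - A) * (1 - B))."""
    return 1 - (1 - a) * (1 - b)


# Rule: ('dynamic' in hostname_class) and bl.tor --> tag "dynamic_tor"
# Inputs: term confidence 0.9, blacklist membership implicitly 1.0
dynamic_tor = conf_and(0.9, 1.0)

# Rule: 0.9*('dynamic' in hostname_class) or 0.2*('dsl' in hostname_class)
# Inputs: both terms have confidence 1.0
dynamic = conf_or(0.9 * 1.0, 0.2 * 1.0)

print(round(dynamic_tor, 2), round(dynamic, 2))  # 0.9 0.92
```

If the confidences are read as probabilities of independent events, these two formulas are the probability that both events occur and the probability that at least one occurs, respectively. Both keep the result within [0, 1] whenever the inputs are.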

See Data model for the proposed specification of how values with confidence should be stored in the database. The tagging scheme should automatically recognize whether a given input value is plain or has a confidence (by its data type and the presence of the ".c" attribute).

Note: The definition of confidence combinations may change; I'll need to prepare some real use cases and find out whether these operations are OK. So, do other issues first. I wrote this so you have an idea of what you will work on in the future.

@vaclavbartos
Collaborator Author

Closing ages-old issues.
