Currently, tagging rules ("conditions") assume that input attributes are binary (i.e. strictly true or false, with no confidence value); the confidence of the output can only be set by explicitly multiplying input values by numbers.
Support should be added for inputs that carry a confidence value. A term in a condition based on an attribute with confidence c will then also have confidence c (which can be further modified by arithmetic operations, of course). For example:
Record contains: "hostname_class": {v: "dynamic", c: 0.8}
Tag condition: 0.9*('dynamic' in hostname_class)
Result: tag is set with confidence = 0.72
Of course, if the input attribute is simple, i.e. it has no confidence value assigned, confidence=1 is assumed.
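A minimal sketch of how a single term could be evaluated, reproducing the 0.72 example above. The helper name `term_confidence` and the assumption that a value with confidence is stored as a dict with `v` and `c` keys (as written in the example record) are illustrative only; the actual storage format is whatever the Data model specifies.

```python
def term_confidence(record, attr, value):
    """Return the confidence of the term `value in attr` (hypothetical helper)."""
    raw = record.get(attr)
    if raw is None:
        return 0.0  # attribute missing: the term is false
    # Assumption: a value with confidence looks like {"v": ..., "c": ...};
    # anything else is a plain value with implicit confidence 1.0.
    if isinstance(raw, dict) and "v" in raw and "c" in raw:
        v, c = raw["v"], raw["c"]
    else:
        v, c = raw, 1.0
    # Treat a single value as a one-element collection for the 'in' test.
    items = v if isinstance(v, (list, tuple, set)) else [v]
    return c if value in items else 0.0


record = {"hostname_class": {"v": "dynamic", "c": 0.8}}
tag_confidence = 0.9 * term_confidence(record, "hostname_class", "dynamic")
print(round(tag_confidence, 2))  # 0.72
```

With a plain record such as `{"hostname_class": "dynamic"}`, the same call returns 1.0 for the term, so existing binary rules would keep their current behaviour.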
Confidence of a combination of individual terms in a condition is computed as follows:
Arithmetic operations behave normally.
Logical and: A and B = A * B
Example:
Rule: ('dynamic' in hostname_class) and bl.tor --> tag "dynamic_tor"
Inputs: ('dynamic' in hostname_class).c = 0.9, bl.tor = 1.0 (implicitly, since blacklists have no confidence values)
Result: dynamic_tor.c = 0.9
Logical or: A or B = 1 - ((1 - A) * (1 - B))
Example:
Rule: 0.9*('dynamic' in hostname_class) or 0.2*('dsl' in hostname_class) --> tag "dynamic"
Inputs: ('dynamic' in hostname_class).c = 1.0, ('dsl' in hostname_class).c = 1.0
Result: dynamic.c = 0.9*1.0 or 0.2*1.0 = 0.9 or 0.2 = 1 - (0.1 * 0.8) = 0.92
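The two logical operators above can be sketched as plain functions on confidences in the range 0 to 1. The names `conf_and` and `conf_or` are made up for this illustration (Python's own `and`/`or` cannot be overloaded, so a real rule evaluator would have to map them to something like this), and the formulas are the ones proposed here, which may still change.

```python
def conf_and(a, b):
    """Logical and: A and B = A * B."""
    return a * b


def conf_or(a, b):
    """Logical or: A or B = 1 - ((1 - A) * (1 - B))."""
    return 1 - (1 - a) * (1 - b)


# Rule: ('dynamic' in hostname_class) and bl.tor --> tag "dynamic_tor"
dynamic_tor = conf_and(0.9, 1.0)

# Rule: 0.9*('dynamic' in hostname_class) or 0.2*('dsl' in hostname_class)
dynamic = conf_or(0.9 * 1.0, 0.2 * 1.0)

print(round(dynamic_tor, 2), round(dynamic, 2))  # 0.9 0.92
```

Both functions return 0 or 1 when given only 0 and 1, so they reduce to ordinary boolean logic for inputs without confidence values.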
See Data model for the proposed specification of how values with confidence should be stored in the database. The tagging scheme should automatically recognize whether a given input value is plain or has a confidence (by its data type and the presence of a ".c" attribute).
Note: The definition of confidence combinations may change; I'll need to prepare some real use cases and find out whether these operations are OK. So, do other issues first. I wrote this so you have an idea of what you will work on in the future.