
Support ordinal types #10

@anjsimmo


The code for determining the similarity of two condition thresholds is shown below:

PERMISSIBLE_DELTA = 0.1
…
def condition_similarity(condition1: Condition, condition2: Condition):
    # Different attributes
    if condition1.attribute != condition2.attribute:
        return 0

    # Different operators
    # TODO: Extend???
    if condition1.operator != condition2.operator:
        return 0

    # Handle <= as a special case as per paper
    if condition1.operator == Operator.LE and condition2.operator == Operator.LE:
        t = abs(PERMISSIBLE_DELTA * condition1.threshold)
        x = abs(condition1.threshold - condition2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1

(The original code also contained a bug in the calculation of the tolerance, t, which was fixed in PR #6.)

This threshold logic is not appropriate for ordinal attributes. For example, the UCI Poker Hand dataset represents the rank of cards as numbers from 1 to 13. With PERMISSIBLE_DELTA = 0.1, a Queen (12) has a tolerance, t, of 12 * 0.1 = 1.2, which means it would be considered similar to a Jack (11) or King (13), but an Ace (1) has a tolerance, t, of 1 * 0.1 = 0.1, so it wouldn't be considered similar to any other card.
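To make the arithmetic concrete, here is a minimal sketch that isolates the `<=` branch of `condition_similarity` into a standalone function. The name `le_threshold_similarity` and the card constants are hypothetical, introduced only for this illustration.

```python
PERMISSIBLE_DELTA = 0.1


def le_threshold_similarity(threshold1: float, threshold2: float) -> float:
    """Standalone copy of the <= branch of condition_similarity (hypothetical helper)."""
    # Tolerance scales with the magnitude of the first threshold
    t = abs(PERMISSIBLE_DELTA * threshold1)
    x = abs(threshold1 - threshold2)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


# Card ranks as encoded in the UCI Poker Hand dataset (1-13)
ACE, TWO, JACK, QUEEN, KING = 1, 2, 11, 12, 13

# Queen: t is about 1.2, so a neighbour at distance 1 gets a small positive score
print(le_threshold_similarity(QUEEN, JACK))
print(le_threshold_similarity(QUEEN, KING))

# Ace: t = 0.1, so a neighbour at distance 1 scores 0
print(le_threshold_similarity(ACE, TWO))
```

The same rank distance of 1 produces different scores depending only on how large the rank number happens to be, which is what makes the relative tolerance a poor fit for ordinal data.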

The similar_tree module needs to be modified to allow a list of attributes to be treated as ordinal, with the tolerance logic adjusted accordingly. For these attributes, the condition similarity should be 1 if the thresholds represent the same partitioning (e.g. <= 2.0 is the same as <= 2.9, as they both split {1, 2} vs {3, 4, ...}), and 0 otherwise.
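One possible way to express the "same partitioning" rule, assuming the ordinal values are integers: two thresholds split the values identically exactly when they have the same floor. The function name `ordinal_threshold_similarity` is hypothetical and is not part of the similar_tree module.

```python
import math


def ordinal_threshold_similarity(threshold1: float, threshold2: float) -> int:
    """Return 1 if both thresholds partition integer values identically, else 0.

    Assumes the attribute takes integer values only, so `<= t` selects
    exactly the integers up to floor(t).
    """
    return 1 if math.floor(threshold1) == math.floor(threshold2) else 0


# <= 2.0 and <= 2.9 both split {1, 2} vs {3, 4, ...}
print(ordinal_threshold_similarity(2.0, 2.9))  # 1

# <= 2.9 and <= 3.0 differ: the value 3 falls on opposite sides
print(ordinal_threshold_similarity(2.9, 3.0))  # 0
```

Under the same integer assumption, the floor comparison would also apply to two `>` conditions, since `> t` selects exactly the integers above floor(t).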

Secondly, the code only handles the case of two <= operators, not two > operators. In the case of two > operators, it falls through to the final return 1 (perfect similarity) even if the thresholds differ.
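A minimal sketch of one way to close this gap for non-ordinal attributes: apply the same tolerance check to any pair of matching operators rather than only to <=. The `Operator` and `Condition` definitions below are simplified stand-ins, and `condition_similarity_all_operators` is a hypothetical name; the real classes in the module may differ.

```python
from dataclasses import dataclass
from enum import Enum

PERMISSIBLE_DELTA = 0.1


class Operator(Enum):
    """Simplified stand-in for the module's Operator type (assumption)."""
    LE = "<="
    GT = ">"


@dataclass
class Condition:
    """Simplified stand-in for the module's Condition type (assumption)."""
    attribute: str
    operator: Operator
    threshold: float


def condition_similarity_all_operators(condition1: Condition, condition2: Condition) -> float:
    # Different attributes or different operators are never similar
    if condition1.attribute != condition2.attribute:
        return 0
    if condition1.operator != condition2.operator:
        return 0

    # Same tolerance logic as the <= branch, now reached for > as well
    t = abs(PERMISSIBLE_DELTA * condition1.threshold)
    x = abs(condition1.threshold - condition2.threshold)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


# Two > conditions with very different thresholds no longer score 1
a = Condition("age", Operator.GT, 10)
b = Condition("age", Operator.GT, 50)
print(condition_similarity_all_operators(a, b))  # 0
```

This keeps the existing asymmetry, where the tolerance is derived from condition1's threshold only; whether that is desirable is a separate question from the operator handling.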
