`AnchorBase` coverage update for proposed anchor after computation. #914

RobertSamoilescu · 2023-04-24T13:55:20Z

Before describing the issue that I found, I will just define the state key names so we know what they mean:

state['t_idx'][anchor] - the list of indices of the sampled data where the anchor applies.
state['t_nsamples'][anchor] - the number of sampled instances where the anchor applies, i.e. len(state['t_idx'][anchor]).
state['t_positive'][anchor] - the number of instances from the sampled data which have the same label as the instance to be explained.
state['t_order'][anchor] - the ordered predicates of the anchor. I believe that it can be used to compute the contributions as discussed above.
state['t_coverage_idx'][anchor] - the list of indices of the coverage dataset where the anchor applies.
state['t_coverage'][anchor] - the actual coverage of the anchor, i.e. the ratio between len(state['t_coverage_idx'][anchor]) and the length of the coverage dataset.
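To make the relationships between these keys concrete, here is a minimal sketch with made-up numbers. The dictionary layout, the toy index sets and the coverage dataset size `n_coverage` are assumptions for illustration only, not the library's actual initialisation code.

```python
from collections import defaultdict

# Hypothetical toy state mirroring the key names above (illustration only).
anchor = (5, 7)
n_coverage = 1000  # assumed size of the fixed coverage dataset

state = {
    't_idx': defaultdict(set),
    't_nsamples': defaultdict(float),
    't_positive': defaultdict(float),
    't_order': defaultdict(list),
    't_coverage_idx': defaultdict(set),
    't_coverage': defaultdict(float),
}

# Precision-related quantities: they grow as more samples are drawn.
state['t_idx'][anchor] = {0, 3, 4, 9}
state['t_nsamples'][anchor] = len(state['t_idx'][anchor])
state['t_positive'][anchor] = 3
state['t_order'][anchor] = [5, 7]

# Coverage-related quantities: computed once on the fixed coverage dataset.
state['t_coverage_idx'][anchor] = set(range(145))
state['t_coverage'][anchor] = len(state['t_coverage_idx'][anchor]) / n_coverage

# Empirical precision estimate from the sampled data.
precision = state['t_positive'][anchor] / state['t_nsamples'][anchor]
```

The point of the split is that only the first group should change while sampling; the second group depends solely on the fixed coverage dataset.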

Note that the coverage dataset is fixed and sampled at the beginning of the anchor_beam algorithm (see link here). It is constructed by sampling the original dataset with replacement. We can observe that the coverage of an anchor (i.e., state['t_coverage'][anchor]) is modified by the kllucb algorithm here.

Tracing it to the source, we can observe that kllucb calls the draw_samples method in multiple places (here and here), which eventually calls update_state here. The update_state function should update only the quantities related to the computation of the precision (that's what the kllucb algorithm is concerned with: informally, finding the arm with the best precision), but there is a line which updates the coverage too, using the coverage computed by default in the sampling function. The line which updates the coverage is here.

In my opinion, there is no reason to update the coverage there since, as mentioned before, the coverage is computed on a fixed dataset when the anchor is constructed (see here and here). Commenting out those lines should fix the error.
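As a rough illustration of the intended behaviour, the sketch below updates only the precision-related statistics and leaves the coverage untouched. The function `update_state_sketch`, its signature and the plain-dict state are hypothetical simplifications, not the library's actual `update_state`.

```python
def update_state_sketch(state, anchor, labels, start_idx):
    """Hypothetical, simplified update that touches precision statistics only."""
    n_samples = len(labels)
    new_idx = range(start_idx, start_idx + n_samples)
    state['t_idx'].setdefault(anchor, set()).update(new_idx)
    state['t_nsamples'][anchor] = state['t_nsamples'].get(anchor, 0) + n_samples
    state['t_positive'][anchor] = state['t_positive'].get(anchor, 0) + sum(labels)
    # Deliberately no write to state['t_coverage'][anchor]: coverage was
    # computed once on the fixed coverage dataset and should stay as it is.
    return start_idx + n_samples


state = {
    't_idx': {},
    't_nsamples': {},
    't_positive': {},
    't_coverage': {(5, 7): 0.145},  # set when the anchor was constructed
}

# Two sampling rounds for the same anchor, as kllucb might request.
next_idx = update_state_sketch(state, (5, 7), labels=[1, 1, 0, 1], start_idx=0)
next_idx = update_state_sketch(state, (5, 7), labels=[1, 0], start_idx=next_idx)
```

After both rounds the sample counts have grown, while `state['t_coverage'][(5, 7)]` still holds the value computed at construction time.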

To explain with an example what was happening:

The algorithm first selects feature 5, but the precision constraint is not satisfied. Thus, best_coverage = -1.
Then the algorithm selects feature 7, giving the anchor (5, 7). In this case the coverage is 0.145.
For (5, 7) the precision constraint is satisfied, but the kllucb algorithm changes the coverage to 0.139 because it is recomputed on the newly sampled data.
At the next step, the proposed anchor is (5, 7, 11), which also has a coverage of 0.145. Because 0.145 > 0.139, the algorithm selects (5, 7, 11) instead of (5, 7). Of course, the precision constraint is satisfied in this case too.
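The effect of the overwritten coverage on the final choice can be reproduced with a toy selection loop. The helper `select_best`, the strict `>` comparison and the coverage value for (5,) are assumptions made for this sketch; only the 0.145 and 0.139 figures come from the example above.

```python
def select_best(candidates):
    """Pick the anchor with the largest coverage among those meeting precision.

    candidates: list of (anchor, coverage, precision_ok), in proposal order.
    """
    best_anchor, best_coverage = None, -1.0
    for anchor, coverage, precision_ok in candidates:
        if precision_ok and coverage > best_coverage:
            best_anchor, best_coverage = anchor, coverage
    return best_anchor, best_coverage


# Coverage of (5, 7) overwritten to 0.139 during sampling.
overwritten = [
    ((5,), 0.300, False),       # coverage value is made up; precision not met
    ((5, 7), 0.139, True),
    ((5, 7, 11), 0.145, True),
]

# Coverage of (5, 7) kept at the value computed on the fixed coverage dataset.
kept = [
    ((5,), 0.300, False),
    ((5, 7), 0.145, True),
    ((5, 7, 11), 0.145, True),
]

best_overwritten = select_best(overwritten)
best_kept = select_best(kept)
```

With the overwritten value the longer anchor (5, 7, 11) wins; with the coverage left alone the shorter anchor (5, 7) is retained, because an equal coverage does not beat the current best.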

jklaise · 2023-04-26T15:25:22Z

Closed via #915.

jklaise · 2023-04-28T13:11:08Z

Closed via #919.

RobertSamoilescu mentioned this issue Apr 24, 2023

Fixed AnchorBased coverage updated for proposed anchors #915

Merged

jklaise closed this as completed Apr 26, 2023

jklaise reopened this Apr 26, 2023

jklaise closed this as completed Apr 28, 2023
