
explore_eval gives systematically different average loss than cb_explore_adf against the same "simulator" data #2621

Closed
maxpagels opened this issue Nov 1, 2020 · 3 comments
Labels
Bug Bug in learning semantics, critical by default

Comments


maxpagels commented Nov 1, 2020

Describe the bug

explore_eval gives systematically different average loss than cb_explore_adf on a bandit dataset synthetically constructed from a supervised dataset.

I've created a dataset as follows:

  • Ten actions per round
  • All actions share 2 features
  • Actions 0-4 always receive a cost of -1, actions 5-9 always receive a cost of 1
  • Each action is played and logged with probability 0.1 (10%)
  • In other words, the CB module should learn that the shared features don't matter here; only the per-action indicator feature does

See attached dataset: ldf-test.txt.gz
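For reference, here is a minimal sketch of how a dataset of this shape could be generated. The file name, feature names, and the assumption that the attached file uses the standard VW ADF contextual-bandit text format (a 0:cost:probability label on the chosen action's line) are mine; the attached ldf-test.txt.gz is the authoritative source.

```python
import random

# Sketch of a generator for a dataset of this shape (assumes the standard
# VW ADF contextual-bandit text format, where the chosen action's line
# carries a "0:cost:probability" label).
random.seed(1)
NUM_ROUNDS = 10_000
NUM_ACTIONS = 10

with open("ldf-test.txt", "w") as f:
    for _ in range(NUM_ROUNDS):
        chosen = random.randrange(NUM_ACTIONS)   # uniform logging policy, p = 0.1
        cost = -1 if chosen < 5 else 1           # actions 0-4 cost -1, actions 5-9 cost +1
        f.write("shared |s f1 f2\n")             # two shared features per round
        for a in range(NUM_ACTIONS):
            label = f"0:{cost}:0.1 " if a == chosen else ""
            f.write(f"{label}|a action_{a}\n")   # one indicator feature per action
        f.write("\n")                            # blank line ends the multiline example
```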

To Reproduce

Steps to reproduce the behavior:

  1. Run vw --cb_explore_adf -d ldf-test.txt.gz --cb_type ips --epsilon 1.0
  2. Observe an average loss of around 0 (this is expected: the system is exploring 100% of the time, and half of the feedback has cost +1 and the other half cost -1)
  3. Run vw --explore_eval -d ldf-test.txt.gz --cb_type ips --epsilon 1.0
  4. Observe loss of -0.098420

Expected behavior

I'd expect the loss when running explore_eval with 100% random exploration (--epsilon 1.0) to be similar, in this particular case, to the loss from --cb_explore_adf with the same exploration. Interestingly, running the commands above with --progress 1 and looking at the printouts, it seems that explore_eval accepts fewer of the examples with +1 cost and more of those with -1 cost; that proportion not being 50/50 would explain the behaviour, though the root cause is a mystery to me.
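For readers unfamiliar with the technique, here is a minimal sketch of the rejection-sampling idea I believe explore_eval is based on. This is a generic illustration under my own assumptions, not VW's actual implementation: a logged event is accepted with probability proportional to the ratio of the evaluated exploration policy's action probability to the logging probability, and the accepted events are then scored as if they had been collected on-policy.

```python
import random

def explore_eval_sketch(logged_events, eval_policy_prob, ratio_bound):
    """Generic rejection-sampling policy evaluator (illustration only).

    logged_events: iterable of (context, action, cost, logging_prob) tuples
    eval_policy_prob(context, action): probability that the exploration
        policy under evaluation would play `action` given `context`
    ratio_bound: an upper bound on eval_prob / logging_prob, so that the
        acceptance probability below never exceeds 1
    """
    total_cost, accepted = 0.0, 0
    for context, action, cost, logging_prob in logged_events:
        accept_prob = eval_policy_prob(context, action) / (ratio_bound * logging_prob)
        if random.random() < accept_prob:
            # Accepted events are distributed as if the evaluated policy had
            # done the exploration itself, so their plain average cost is an
            # on-policy estimate of that policy's loss.
            total_cost += cost
            accepted += 1
    return total_cost / accepted if accepted else float("nan")
```

If that is roughly what happens under the hood, then with a uniform logging policy over ten actions and a uniform evaluated policy (--epsilon 1.0) the ratio is 1 everywhere, every event should be accepted, and the estimate should reduce to the plain average of the logged costs, i.e. close to 0 here.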

Observed Behavior

Losses are different, and in this case I would expect them to be approximately the same.

Changing the costs to e.g. 0 and 1 doesn't change things (the average loss is expected to be 0.5, which it is with --cb_explore_adf but not with --explore_eval). It seems like the average loss in explore_eval is either systematically overestimated, or the loss interpretation is different. If the latter, some explanation of why this happens would help, or a "for dummies" tutorial explaining the rejection sampling approach I think is used under the hood.

Environment

Version: 8.8.1 (git commit: 5ff219e)
OS: macOS Catalina
Reproduced via CLI

maxpagels added the Bug label (Bug in learning semantics, critical by default) on Nov 1, 2020
cheng-tan (Collaborator) commented:

Hi @maxpagels. In this case, the dataset has a uniform probability for each played action (1/#actions) and the (stochastic) policy being evaluated is also the uniform distribution (--epsilon 1), so explore_eval should return the on-policy estimate, which is the average of the costs in the file. With the provided dataset, the average loss from explore_eval should equal 0.00138. We found a bug while investigating this issue and will get it fixed shortly.
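For what it's worth, a quick way to sanity-check that number against the attached file, assuming the same 0:cost:probability label convention sketched earlier in the thread:

```python
import gzip

# Average the logged costs: the chosen action's line carries a label such as
# "0:-1:0.1", i.e. the text before the first "|" that contains a ":".
costs = []
with gzip.open("ldf-test.txt.gz", "rt") as f:
    for line in f:
        label = line.split("|")[0].strip()
        if label and ":" in label:
            costs.append(float(label.split(":")[1]))

print(sum(costs) / len(costs))   # should come out near the 0.00138 quoted above
```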

maxpagels (Author) commented:

@cheng-tan thanks! So it wasn't just me imagining that the loss should be approximately 0 in this case.

jackgerrits (Member) commented:

Looks like this was fixed in #2631, which made it into the 8.9.0 release. Let us know if there are any issues here still.
