Deterministically Randomized Feedback for A/B Testing #98

acbart · 2023-06-09T13:49:51Z

We want different students to be exposed to different kinds of feedback for research purposes. The system should support this easily and cleanly, both in terms of making the feedback and tracking the feedback.

My general thought here is that there is a core setting, something like set_participant_group, that takes either an explicit group or a seed value (e.g., a user's unique ID) used to assign the user to a participant group. I suppose we'll also need to know how many groups there should be (perhaps default to two?).
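To make the seed idea concrete, here is a minimal sketch of deterministic assignment. Everything in it is an assumption for discussion: `assign_group` is a hypothetical helper, and the signature of `set_participant_group` is just one possible shape, not an existing Pedal API. It hashes the seed with `hashlib` because Python's built-in `hash()` is salted per process for strings and would not be stable between runs.

```python
import hashlib

_current_group = None  # hypothetical module-level state


def assign_group(seed, groups=("A", "B")):
    """Map a seed (e.g., a user's unique ID) to a group, stably across runs."""
    digest = hashlib.sha256(str(seed).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it onto the groups.
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]


def set_participant_group(group=None, seed=None, groups=("A", "B")):
    """Set the group explicitly, or derive it from a seed."""
    global _current_group
    if group is not None:
        if group not in groups:
            raise ValueError(f"Unknown group: {group!r}")
        _current_group = group
    elif seed is not None:
        _current_group = assign_group(seed, groups)
    else:
        raise ValueError("Provide either an explicit group or a seed.")
    return _current_group
```

The same seed always lands in the same group, so a student sees consistent feedback across submissions without any stored state.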

What kind of variations should be possible with this system?

1. At the minimum, you should be able to randomly have users get slightly different messages depending on their group.
2. There are probably cases where feedback should be muted or perhaps not even tested depending on their group.
3. There are cases where the actual logic of the FF should probably change internally.

For (3), I think that we will need some kind of convenient function for checking which group this user is in and behaving differently. Maybe something like is_group('A') or something?
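A rough sketch of what that check could look like, assuming the group is stored in some module-level state. The names `_participant_group` and `check_loop_usage` are made up for illustration, and the one-argument `set_participant_group` here is a simplified stand-in, not a proposal for the final signature.

```python
_participant_group = "A"  # hypothetical state; a real system would set this once, early


def set_participant_group(group):
    """Simplified stand-in that only accepts an explicit group."""
    global _participant_group
    _participant_group = group


def is_group(*names):
    """Return True if the current participant is in any of the named groups."""
    return _participant_group in names


def check_loop_usage(uses_for_loop):
    """Example feedback function whose internal logic branches on the group."""
    if uses_for_loop:
        return None  # nothing to report
    if is_group("B"):
        # Group B gets a hint instead of a direct instruction.
        return "Think about which statement repeats an action for each item."
    return "You need to use a for loop."
```

Accepting several names in `is_group` is a design guess: it would let one branch cover multiple groups without chaining `or` checks.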

For (1), I'm currently thinking that Feedback functions should have a helper class method like add_alternate_message(group, message) or add_variant_message_template(group, message_template). This works well for anything subclassing Feedback. For function literal feedbacks, it's a bit trickier. We should probably inventory those and see what their deal is anyway. Do we have them?

For (2), I guess a similar mechanism could work. The Feedback class could have a mute_for_variant or deactivate_for_variant method. Or perhaps this should be a more general mechanism, like this:

syntax_error.for_variant(group='B', message_template='Oops you made a mistake')
type_error.for_variant(group='C', score="-10%")
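One way the general mechanism might work is to store per-group overrides and merge them over the defaults when the feedback is resolved. The sketch below is only an illustration under that assumption: `VariantFeedback`, its fields, and `resolve` are hypothetical and are not taken from the existing Feedback class.

```python
class VariantFeedback:
    """Hypothetical feedback object that supports per-group overrides."""

    def __init__(self, message_template, score=None, muted=False):
        self.defaults = {
            "message_template": message_template,
            "score": score,
            "muted": muted,
        }
        self.variants = {}  # group name -> dict of overridden fields

    def for_variant(self, group, **overrides):
        """Record the fields that differ for one group; returns self for chaining."""
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise TypeError(f"Unknown feedback fields: {sorted(unknown)}")
        self.variants.setdefault(group, {}).update(overrides)
        return self

    def resolve(self, group):
        """Merge the group's overrides over the defaults."""
        return {**self.defaults, **self.variants.get(group, {})}


syntax_error = VariantFeedback("Syntax error on line {line}")
syntax_error.for_variant(group="B", message_template="Oops you made a mistake")
syntax_error.for_variant(group="C", muted=True)
```

With this shape, muting (2) and alternate messages (1) are the same operation on different fields, so separate mute_for_variant or add_alternate_message methods would not be needed.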

Another question: when does the system decide to "pick" the appropriate message? Given the way that some parts delay their decision making to the very end, we probably need some flexibility. It might depend on the kind of feedback and the nature of the variation. But I think we pick either "as early as possible" or "as late as possible" and stick with it. My gut says "as early as possible" will be easier to reason about: once you've set up A/B groups, you want to start assuming that any feedback is what it is. I don't think it's good to try to do reflective feedback, anyway.

Major point of order: we need to lock down the terminology for this. Are they "participant groups"? Are these "variations"? What is the concept we're building on, from the theoretical model?


acbart · 2023-06-09T13:57:13Z

Default group names could just be 'A', 'B', 'C', etc., but with the option to override to more descriptive names (e.g., "regular_feedback", "bad_feedback", "native_language_feedback"). Or instead of names, you can specify the number of groups (defaulting to 2, in the spirit of A/B testing).

With things like gently and other instructor-level messages, I think you should be able to pass in a list of strings, and it will match the groups in order. So 'A' gets the first one, 'B' gets the second, etc.
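The order-matching idea could be sketched like this. `pick_message` is a hypothetical helper, and the choice to raise an error when the list length does not match the number of groups is my assumption rather than something settled above.

```python
def pick_message(messages, group, groups=("A", "B")):
    """Choose the message for a group from a plain string or an ordered list."""
    if isinstance(messages, str):
        # A single string means every group sees the same message.
        return messages
    if len(messages) != len(groups):
        raise ValueError(
            f"Expected {len(groups)} messages (one per group), got {len(messages)}."
        )
    # 'A' gets the first message, 'B' the second, and so on.
    return messages[groups.index(group)]
```

An instructor-level call could then pass `["Check your loop.", "Look at line 3 again."]` wherever it currently passes one string.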

We definitely should have a weighting system. Users might want to say "only 10% of users should be in Group B".
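Weighting can stay deterministic by turning the hashed seed into a point along the cumulative weights. This is a sketch under assumptions: `assign_weighted_group` and the dict-of-weights format are hypothetical, and the weights are assumed to be non-negative with a positive total.

```python
import hashlib


def assign_weighted_group(seed, weights):
    """Deterministically pick a group with probability proportional to its weight.

    `weights` maps group name -> non-negative weight, e.g. {"A": 0.9, "B": 0.1}.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("Weights must have a positive total.")
    digest = hashlib.sha256(str(seed).encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the fraction is always in [0, 1).
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    point = fraction * total
    cumulative = 0
    for group, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return group
    return group  # fallback for floating-point edge cases
```

For example, `assign_weighted_group(user_id, {"A": 0.9, "B": 0.1})` would put roughly 10% of users in Group B, and a given user would always land in the same group.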

We also probably need a way to let users override the random value selection for the group assignment. My thought is that we want it to be easy for an instructor to have a "custom setup file" for their environment that they can use generically for experimental runs.

acbart · 2023-06-09T13:58:07Z

So far, natural conversation flow suggests the following terms:

groups
variants
participants
variations
alternates

acbart added the enhancement (New feature or request) and Pedal Core (Issues pertaining to Reports, Submissions, and other core infrastructure) labels · Jun 9, 2023

acbart mentioned this issue Jul 2, 2023

exp: openai integration THRALLab/pedal-kennel#12

Draft
