Deterministically Randomized Feedback for A/B Testing #98

Open
acbart opened this issue Jun 9, 2023 · 2 comments
Labels
enhancement (New feature or request), Pedal Core (Issues pertaining to Reports, Submissions, and other core infrastructure)

Comments

acbart (Collaborator) commented Jun 9, 2023

We want different students to be exposed to different kinds of feedback for research purposes. The system should support this easily and cleanly, both in terms of authoring the feedback and tracking the feedback.

My general thought here is that there should be a core setting, something like set_participant_group, that takes either an explicit group or a seed value (e.g., a user's unique ID) used to assign the user to a participant group. I suppose we'll also need to know how many groups there should be (perhaps default to two?).
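
A minimal sketch of what seed-based assignment could look like. The name assign_group and its parameters are hypothetical, not an existing Pedal API. It uses SHA-256 because Python's built-in hash() is salted per process for strings, so it would not give the same group across runs.

```python
import hashlib
import string


def assign_group(seed, group_count=2):
    """Map a seed (e.g., a user's unique ID) to a group name like 'A' or 'B'.

    Hypothetical helper for illustration only. The same seed always
    produces the same group, on any machine and in any process.
    """
    if not 1 <= group_count <= len(string.ascii_uppercase):
        raise ValueError("group_count must be between 1 and 26")
    digest = hashlib.sha256(str(seed).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a group index.
    index = int.from_bytes(digest[:8], "big") % group_count
    return string.ascii_uppercase[index]


# The same student lands in the same group on every run.
group = assign_group("student-42")
```

Passing an explicit group instead of a seed would simply bypass this function.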

What kind of variations should be possible with this system?

  1. At a minimum, you should be able to have users get slightly different messages depending on their randomly assigned group.
  2. There are probably cases where feedback should be muted, or perhaps not even tested, depending on the user's group.
  3. There are cases where the actual logic of the feedback function (FF) should probably change internally.

For (3), I think that we will need some kind of convenient function for checking which group this user is in and behaving differently. Maybe something like is_group('A') or something?
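
As a rough sketch of how an is_group check might read inside a feedback function: the Experiment class and explain_missing_loop function below are made up for illustration, and where the group is actually stored (a report, a global setting) is an open question.

```python
class Experiment:
    """Hypothetical holder for the current participant's group."""

    def __init__(self, group):
        self.group = group

    def is_group(self, name):
        """Return True if the current participant is in the named group."""
        return self.group == name


def explain_missing_loop(experiment):
    """Example feedback function whose internal logic branches on the group."""
    if experiment.is_group('B'):
        # Group B gets a more guided hint.
        return "Your code needs a loop. Try starting with a for statement."
    # Every other group gets the regular message.
    return "Your code needs a loop."


message = explain_missing_loop(Experiment('B'))
```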

For (1), I'm currently thinking that Feedback functions should have a helper class method like add_alternate_message(group, message) or add_variant_message_template(group, message_template). This works well for anything subclassing Feedback. For function literal feedbacks, it's a bit trickier. We should probably inventory those and see what their deal is anyway. Do we have them?

For (2), I guess a similar mechanism could work. The Feedback class could have a mute_for_variant or deactivate_for_variant method..? Or perhaps this should be a more general mechanism like this:

syntax_error.for_variant(group='B', message_template='Oops you made a mistake')
type_error.for_variant(group='C', score="-10%")
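
One way the for_variant idea could be wired up, as a sketch only. VariantFeedback is a stand-in base class, not Pedal's real Feedback class, and the resolve method and muted field are assumptions added for illustration. Each subclass keeps its own table of per-group overrides, and resolve merges them over the defaults for a given group.

```python
class VariantFeedback:
    """Stand-in base class (hypothetical, not Pedal's Feedback)."""
    message_template = "Something went wrong."
    score = None
    muted = False
    _variants = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Give each subclass its own override table so variants do not leak.
        cls._variants = {}

    @classmethod
    def for_variant(cls, group, **overrides):
        """Record field overrides that apply only to the given group."""
        cls._variants.setdefault(group, {}).update(overrides)
        return cls

    @classmethod
    def resolve(cls, group):
        """Return the effective fields for a participant in the given group."""
        fields = {
            "message_template": cls.message_template,
            "score": cls.score,
            "muted": cls.muted,
        }
        fields.update(cls._variants.get(group, {}))
        return fields


class syntax_error(VariantFeedback):
    message_template = "Bad syntax on line {lineno}."


class type_error(VariantFeedback):
    message_template = "Type error on line {lineno}."


syntax_error.for_variant(group='B', message_template='Oops you made a mistake')
type_error.for_variant(group='C', score="-10%")
# Muting, as in (2), could be just another override.
type_error.for_variant(group='B', muted=True)
```

With this shape, muting and message changes go through the same mechanism, so separate mute_for_variant style methods might not be needed.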

Another question: when does the system decide to "pick" the appropriate message? Given the way that some parts delay their decision making to the very end, we probably need some flexibility. It might depend on the kind of feedback and the nature of the variation. But I think we should pick either "as early as possible" or "as late as possible" and stick with it. My gut says "earliest" will be easier to reason about: once you've set up A/B groups, you want to start assuming that any feedback is what it is. I don't think it's good to try to do reflective feedback, anyway.

Major point of order: we need to lock down the terminology for this. Are they "participant groups"? Are these "variations"? What is the concept we're building on, from the theoretical model?

acbart added the enhancement and Pedal Core labels Jun 9, 2023

acbart (Collaborator, Author) commented Jun 9, 2023

Default group names could just be 'A', 'B', 'C', etc., but with the option to override to more descriptive names (e.g., "regular_feedback", "bad_feedback", "native_language_feedback"). Or instead of names, you can specify the number of groups (defaulting to 2, in the spirit of A/B testing).

With things like gently and other instructor-level messages, I think you should be able to pass in a list of strings, and it will match the groups in order. So 'A' gets the first one, 'B' gets the second, etc.
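
A sketch of the in-order matching, assuming a hypothetical pick_message helper that a function like gently could call internally. The helper name and the group_names parameter are illustrative only.

```python
def pick_message(messages, group, group_names=("A", "B")):
    """Choose the message for a group from a string or a list of strings.

    Hypothetical helper: a plain string is shared by every group, while a
    list is matched to group_names by position.
    """
    if isinstance(messages, str):
        return messages
    if len(messages) != len(group_names):
        raise ValueError("need exactly one message per group")
    return messages[group_names.index(group)]


# 'A' gets the first string, 'B' gets the second.
text = pick_message(["Add a return statement.", "What should this function give back?"], "B")
```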

We definitely should have a weighting system. Users might want to say "only 10% of users should be in Group B".
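
Weighting can stay deterministic if the seed is hashed to a point in [0, 1) and that point is matched against cumulative weights. The sketch below assumes a hypothetical assign_weighted_group function that takes a mapping of group names to non-negative weights. It is one possible approach, not a settled design.

```python
import hashlib


def assign_weighted_group(seed, weights):
    """Deterministically pick a group, honoring relative weights.

    Hypothetical helper: weights maps group name to a non-negative weight,
    e.g. {"A": 0.9, "B": 0.1} puts roughly 10% of seeds in group B.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(str(seed).encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for group, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return group
    # Floating-point rounding at the very top of the range falls to the last group.
    return group


group = assign_weighted_group("student-42", {"A": 0.9, "B": 0.1})
```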

We also probably need a way to let users override the random value selection for the group assignment? My thought is that we want it to be easy for an instructor to have a "custom setup file" for their environment that they can use generically for experimental runs.

acbart (Collaborator, Author) commented Jun 9, 2023

So far, natural conversation flow suggests the following terms:

  • groups
  • variants
  • participants
  • variations
  • alternates
