Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

option to allow repeats #90

Closed
gureckis opened this issue Apr 23, 2014 · 14 comments
Closed

option to allow repeats #90

gureckis opened this issue Apr 23, 2014 · 14 comments

Comments

@gureckis
Copy link
Member

currently psiturk defaults to database-level blocking of workers such that a worker who appears in the database can't take the task again. in some cases this is not desired behavior. we should create a config option 'allow_repeat' which toggles this. also, might be good to allow option 'always_show_instructions' because sometimes if a worker has already done a task once they don't need to read the instructions part again.

@lindauer
Copy link

I tried to make this change (at least in a hacky hard-coded way) last night and my changes seemed to have no effect. Upon further investigation, it seems like the blocking might be occurring via a copy of experiment.py on the ad server, rather than the local one that I am changing. Is that an accurate assessment? If so, is there a way to bypass the ad server or otherwise test a change like this?

@deargle
Copy link
Collaborator

deargle commented Feb 8, 2017

Reopening because it's still not implemented on the psiTurk ad server.

@deargle deargle reopened this Feb 8, 2017
@gureckis
Copy link
Member Author

gureckis commented Feb 9, 2017

moving discussion of this over here.

i added a new field to the psiturk cloud for 'allow_repeats'. i don't think it broke any thing but let me know (please test your branch @deargle). it is a property of ads both in the sandbox and regular ad server now. however the sandbox ad server basically ignores this field because it always lets you repeat (helpful for debugging). the live ad server should assume if you don't specify that you want to block repeats.

now the question is what the behavior should be concerning allow repeats. the logic in the section of the code can be kind of complex with so many different conditions that can come up. in addition the status codes are impacted by the hitId/assignementId values as well). what do people want?

quick summary: I think there are four conditions

  1. STARTED = subject started the experiment, meaning they got past the instructions part. psiturk includes a way to indicate this because often instructions include the critical manipulation and once you "START" you can't start over.
  2. COMPLETED = subject is finished with whole task, displays a thanks page
  3. status >= SUBMITTED: currently they get an error message for trying to repeat. I am thinking this one should be removed if allow_repeats enabled. but what should happen?
  4. either NOT_ACCEPTED or ALLOCATED: participant has not yet agreed to the consent form. they might not have even accepted the HIT yet. this just displays the ad.

i guess the proposal is that everything stays the same except if allow_repeats is turned on you can see the ad if your status >= SUBMITTED.

but is that right? I'm thinking that maybe the issue lies in the psiturk client check_worker_status function. here it looks in your local db for a particular worker and returns information about this person. although the function itself is passed the workerId and assignmentId from psiturk.org the later is currently ignored.
https://github.com/NYUCCL/psiTurk/blob/master/psiturk/experiment.py#L180

since i've never done this the questions is if you do the task multiple times you get a new assignmentId as well. so really there needs to be something more complex in check_worker_status where if allow_repeats is false, it looks for the workerId ignoring assignmentId and behaves as usual. In contrast if allow_repeats is true then you always lookup workerId and assignmentId and so repeats with different assignmentIds just act as usual?

phew.

@deargle
Copy link
Collaborator

deargle commented Feb 9, 2017

please test your branch @deargle

Just tested, had great success. ad id if that's helpful.

i guess the proposal is that everything stays the same except if allow_repeats is turned on you can see the ad if your status >= SUBMITTED.

The way @jhamrick implemented it is nice. She only changed 19 lines in experiment.py to get it working. It's not a simple check for if status >= SUBMITTED, because people who quit early aren't allowed back in, even if repeats are allowed.

since i've never done this the questions is if you do the task multiple times you get a new assignmentId as well. so really there needs to be something more complex in check_worker_status where if allow_repeats is false, it looks for the workerId ignoring assignmentId and behaves as usual. In contrast if allow_repeats is true then you always lookup workerId and assignmentId and so repeats with different assignmentIds just act as usual?

Her code does just what you're describing. If I understand you.

I'm sure I missed something important in what you said.

@gureckis
Copy link
Member Author

gureckis commented Feb 9, 2017

the sandbox ad you just posted did not set allow_repeats to true, was that what you expected?

@deargle
Copy link
Collaborator

deargle commented Feb 9, 2017

Yes. Here's another that does set it to true.

@gureckis
Copy link
Member Author

gureckis commented Feb 9, 2017

perfect, your tests works well.

yes @jhamrick approach is in the right flavor, especially the part that selectively checks against either workerId or workerId+assignmentId. my only concern about her code is that what happens if you first run the task with allow_repeats false then true? or vice versa? run against the same table you'd have to do a bit more error checking to avoid problems.

@deargle
Copy link
Collaborator

deargle commented Feb 9, 2017

my only concern about her code is that what happens if you first run the task with allow_repeats false then true? or vice versa?

Switching back and forth should be fine in theory, yeah? Whether allow_repeats is True or False, it checks if the number of matches (workerId or workerId+assignmentId) == 0. This should work regardless of the pattern of past settings for allow_repeats for an experiment.

@gureckis
Copy link
Member Author

gureckis commented Feb 9, 2017

it's a little confusing to me. if you allow_repeats (so there are multiple copies of the same worker in the db) and then disallow repeats, what status should be returned for the worker when the ad server queries? basically in the disallow repeats case the check_worker_status is if the worker has done the task before. could there be a situation where the worker accepted but did not complete multiple assignments? in this case they should technically be allowed to run in the new assignment even though the number of matches at the worker level is >0. its more like number of matches where the status is above some threshold maybe... these are perhaps unlikely cases but certainly possible.

gureckis added a commit that referenced this issue Feb 9, 2017
if allow_repeats = False after a period where it is True then use the highest status associated with that worker as the exclusion criteria.
@deargle
Copy link
Collaborator

deargle commented Feb 9, 2017 via email

@deargle
Copy link
Collaborator

deargle commented Feb 11, 2017

Forget everything I said in the previous comment here, there's no checking of the previous submission. Maybe that can be implemented later, but as it stands, it's a fresh start every time. I vote we go ahead with the code as it currently is (including this commit) until moar feature requests are made.

I'll update the docs if you give a 👍

@deargle
Copy link
Collaborator

deargle commented Feb 15, 2017

@gureckis docs are ready. Is the ad server ready to go --

wait, hold that, I guess the takeaway from this whole conversation is that all the work is done in check_worker_status, and that the ad server doesn't need to do anything with the allow_repeats param sent to it. Oh well, we can pass it anyways.

I'll merge.

@gureckis
Copy link
Member Author

I think that is right. The new ad server field isn't terrible to have though in case we want to alter some of the error messages in the future based on the type of experiment it is.

This is one of the features that making an automated test for is a little tricky i think.

@gureckis
Copy link
Member Author

Oh actually that reminds me that in my unfinished branch on this I included something like always_show_instructions which allows you to configure if the instruction phase should run every time or only the first time the task is completed. the idea being on repeats you might want to skip the instruction phase (if you have it). possible a new feature request but it naturally ties to the allow repeats issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants