# Introduction

This notebook will walk you through creating and monitoring HITs, specifically for the importance labeling task.

It provides methods to create HITs, pretty-print HIT and assignment status, expire/edit HITs, create qualifications, and download collected data. 

Before continuing, make sure that you have read the README and set all config fields to their desired values.

## Requirements

This code requires Python 3 and the following packages: 
- boto3 
- beautifulsoup4 (imported as `bs4`)

Before using this notebook, set up an AWS access key for the MTurk API and store it in the shared credentials file (`~/.aws/credentials` by default) under the `default` profile, which the setup cell below opens. See here: https://aws.amazon.com/developers/getting-started/python/
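The shared credentials file read by boto3 is an INI file with one section per profile. As a minimal sketch, assuming the default file layout shown in the comment, the standard library's `configparser` can check that a profile defines both keys before you try to connect. The helper name `has_profile` is hypothetical, not part of boto3.

```python
import configparser
import os

def has_profile(path, profile="default"):
    """Return True if the INI credentials file at `path` defines `profile`
    with both an access key ID and a secret access key."""
    parser = configparser.ConfigParser()
    if not parser.read(os.path.expanduser(path)):
        return False  # file missing or unreadable
    if not parser.has_section(profile):
        return False
    return (parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key"))

# Expected layout of ~/.aws/credentials (placeholder values):
# [default]
# aws_access_key_id = YOUR_ACCESS_KEY_ID
# aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

For example, `has_profile("~/.aws/credentials")` should return `True` once the file is set up.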

# Setup

Read the config file and establish a connection to MTurk.

A connection is made to production or to the sandbox based on values in the config. 
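The choice comes down to selecting one of two endpoint URLs (the ones used in the setup cell below). The sketch adds a guard against one pitfall, suggested by the commented-out `hit_config['production'] = 'false'` line in that cell. A non-empty string such as `'false'` is truthy in Python, so it would silently select production. The function name `resolve_endpoint` is hypothetical.

```python
PROD_URL = "https://mturk-requester.us-east-1.amazonaws.com"
SANDBOX_URL = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

def resolve_endpoint(production):
    """Map the config's `production` value to (endpoint_url, origin).

    Accepts real booleans and the strings 'true'/'false' (any case);
    anything else raises instead of being treated as truthy."""
    if isinstance(production, str):
        normalized = production.strip().lower()
        if normalized not in ("true", "false"):
            raise ValueError("production must be true or false, got %r" % production)
        production = normalized == "true"
    if not isinstance(production, bool):
        raise ValueError("production must be a bool, got %r" % (production,))
    if production:
        return PROD_URL, "production"
    return SANDBOX_URL, "sandbox"
```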

In [116]:
import datetime
import boto3
import json
import copy
import pprint
import os
from bs4 import BeautifulSoup as bs 
from uuid import uuid4

In [117]:
# Constants

# # Path to config file
# CONFIG_PATH = "../../mturk-importance-labeling/config.json"

# # Path to the html task
# TASK_PATH = '../../mturk-importance-labeling'

# # where to save downloaded results 
# SAVE_PATH = "./result.csv" 


CONFIG_PATH=None

# Path to the html task
TASK_PATH = './vidrank'

if TASK_PATH == './verifyvidrank':
    title = "Video Ranking Verification"
    description = 'You will be shown a set of 1 to 4 videos as well as an automatically generated ranking of 5 unknown videos based on their similarity with the first set. You will have to indicate if the ranking is good or bad.'
    task_url = "https://allenjlee.github.io/verifyvidrank/?task="
else:
    title = "Rank the Videos"
    description = "You will be shown a set of 1 to 4 videos, and you will have to rank another set of 5 videos according to how similar they are with the first set."
    task_url = "https://allenjlee.github.io/vidrank/?task="

# where to save downloaded results 
SAVE_PATH = "./result.csv"

In [118]:
# Safety flags that prevent you from accidentally messing up your HITs. 
# Set to False except when you are performing these specific tasks.
ALLOW_HIT_CREATION = True
ALLOW_ASSIGNMENT_ADDITION = False
ALLOW_CREATE_QUAL = True
ALLOW_UPDATE_EXPIRATION = True

In [119]:
# Read config and extract relevant settings

if CONFIG_PATH:
    with open(CONFIG_PATH, 'r') as f:
        config = json.loads(f.read())
else:
    config = {"hitCreation": {
            "title": title,
            "description": description,
            "numTasks": 1,
            "numAssignments": 20,
            "rewardAmount": "0.66",
            "keywords": "videos, ranking, game, watch",
            "duration": 1200,
            "lifetime": 186400,
            "taskUrl": task_url,
            "production": False
        },

        "advanced": {
            "includeDemographicSurvey": False,
            "hideIfNotAccepted": False,
            "externalSubmit": False,
            "externalSubmitUrl": ""
        }}

    
hit_config = config['hitCreation']

external_submit = config['advanced']['externalSubmit']
    
# Sandbox or Production? You only spend money in Production.
# hit_config['production'] = 'false'
USING_PROD = hit_config['production']

if USING_PROD:
    print("USING PROD")
    endpoint_url = 'https://mturk-requester.us-east-1.amazonaws.com'
    origin="production"
else:
    print("USING SANDBOX")
    endpoint_url = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'
    origin="sandbox"

# If using an external link, add a querystring origin=sandbox or origin=production 
# for use in your js logic if you want. Not done for MTurk submits because it breaks the submit link
if external_submit: 
    hit_config['taskUrl'] = "%s?origin=%s" % (hit_config['taskUrl'], origin)

if external_submit:
    print("Configuring task as external link with data submitted to: %s" % config['advanced']['externalSubmitUrl'])
else:
    print("Configuring task as an iframe within Mturk")

session = boto3.session.Session(profile_name='default')
cl = session.client('mturk', region_name='us-east-1', endpoint_url=endpoint_url)

USING SANDBOX
Configuring task as an iframe within Mturk


## Define task URL with folds to use in HIT
The folds are files listing the items to be used in each HIT. This notebook can create multiple HITs, each with a different fold. In the folder "files", multiple txt files (folds) can be found. Each one contains 10 images, and there are no repeats among folds; each fold holds a piece of the full canva_scraping2 dataset.
The cell below lists fold files in `<TASK_PATH>/src/hit_jsons`. It keeps those whose name contains `FOLD_REQ_STR` and whose trailing `_<k>` index is in `fold_numbers`, then strips the `.json` extension to get each fold name.
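The fold selection can also be written as a pure function, so it can be tested without touching the filesystem. The sketch below assumes fold files are named `<prefix>_fold_<k>.json`, as the listing output of the next cell suggests. Unlike the plain `os.listdir` call, it also sorts folds by index, because `os.listdir` returns names in arbitrary order. `select_folds` is a hypothetical name.

```python
def select_folds(filenames, required_substring, fold_numbers):
    """Return fold names (without '.json') that contain `required_substring`
    and whose trailing _<k> index is in `fold_numbers`, sorted by k."""
    wanted = set(fold_numbers)
    selected = []
    for name in filenames:
        if not name.endswith(".json") or required_substring not in name:
            continue
        stem = name[: -len(".json")]
        suffix = stem.rsplit("_", 1)[-1]
        if suffix.isdigit() and int(suffix) in wanted:
            selected.append((int(suffix), stem))
    return [stem for _, stem in sorted(selected)]
```

In the notebook this would be called as `select_folds(os.listdir(os.path.join(TASK_PATH, 'src/hit_jsons')), FOLD_REQ_STR, fold_numbers)`.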

In [124]:
# Whether to launch one HIT per fold file in <TASK_PATH>/src/hit_jsons
LAUNCH_HITS_FOR_ALL_FOLDS = True
FOLD_REQ_STR = 'kinetics_words_r1corrected_refs1'
fold_numbers = list(range(10))

# If you need to specify a single fold manually, set it here (e.g. 'sentinel_large')
fld = None


if LAUNCH_HITS_FOR_ALL_FOLDS:
    fold_dir = os.path.join(TASK_PATH, 'src/hit_jsons')
    # keep files matching FOLD_REQ_STR whose trailing _<k> index is in fold_numbers;
    # f[:-5] strips the '.json' extension
    all_folds = [f[:-5] for f in os.listdir(fold_dir)
                 if FOLD_REQ_STR in f and int(f.split('.')[0].split('_')[-1]) in fold_numbers]
    print('Using folds:', all_folds)
    print('Num folds to use:', len(all_folds))
else:
    # Use the fold from the config if present, else the manual setting above, else ask
    fld = hit_config.get('fold') or fld or input('Define fold name: ')
    hit_config['taskUrl'] = hit_config['taskUrl'].split('?')[0] + "?task=%s" % fld
    
# print("TASK URL:", hit_config['taskUrl'])

Using folds: ['kinetics_words_r1corrected_refs1_fold_2', 'kinetics_words_r1corrected_refs1_fold_4', 'kinetics_words_r1corrected_refs1_fold_8', 'kinetics_words_r1corrected_refs1_fold_1', 'kinetics_words_r1corrected_refs1_fold_7', 'kinetics_words_r1corrected_refs1_fold_5', 'kinetics_words_r1corrected_refs1_fold_0', 'kinetics_words_r1corrected_refs1_fold_6', 'kinetics_words_r1corrected_refs1_fold_9', 'kinetics_words_r1corrected_refs1_fold_3']
Num folds to use: 10


# Make new HIT

In [125]:
# List of qualifications that you will use to filter potential workers. 
# These require that workers come from the US and have an approval rating >= 99%
# Edit this list to specify different qualifications for workers 
QUALS = [
    {
        'QualificationTypeId': '00000000000000000071',
        'Comparator': 'EqualTo',
        'LocaleValues': [{
            'Country': 'US',
        }],
    },
    {
        'QualificationTypeId': '000000000000000000L0',
        'Comparator': 'GreaterThanOrEqualTo',
        'IntegerValues': [
            99
        ],
    },
]
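Each entry in `QUALS` is a `QualificationRequirement` dict. The sketch below is a small builder for the two system qualifications used above: `00000000000000000071` (worker locale) and `000000000000000000L0` (percent of assignments approved). Building the list this way keeps the threshold in one place, so the comment cannot drift from the value. `build_quals` is a hypothetical helper.

```python
LOCALE_QUAL_ID = "00000000000000000071"            # Worker_Locale
PERCENT_APPROVED_QUAL_ID = "000000000000000000L0"  # Worker_PercentAssignmentsApproved

def build_quals(country="US", min_approval_pct=99):
    """Build QualificationRequirements restricting workers by country and
    minimum approval rate (an integer percentage, 0-100)."""
    if not 0 <= min_approval_pct <= 100:
        raise ValueError("min_approval_pct must be between 0 and 100")
    return [
        {
            "QualificationTypeId": LOCALE_QUAL_ID,
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        },
        {
            "QualificationTypeId": PERCENT_APPROVED_QUAL_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval_pct],
        },
    ]
```

With the defaults, `build_quals()` produces the same two entries as the `QUALS` list above.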

In [126]:
# Helpers for creating HITs. 

# generic helper that sets metadata fields based on the config file.
def create_hit(task, questionText, quals=QUALS): 
    response = cl.create_hit(
        MaxAssignments=task['numAssignments'],
        AutoApprovalDelayInSeconds=604800,
        LifetimeInSeconds=task['lifetime'],
        AssignmentDurationInSeconds=task['duration'],
        Reward=task['rewardAmount'],
        Title=task['title'],
        Keywords=task['keywords'],
        Description=task['description'],
        Question=questionText,
        QualificationRequirements=quals,
    )
    print(response)
    print("\n")

# creates a HIT in the form of an External Question inside an iFrame
def create_hit_iframe(task):
    questionText = "<ExternalQuestion xmlns=\"http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
    questionText += "2006-07-14/ExternalQuestion.xsd\">\n<ExternalURL>" + task['taskUrl']
    questionText += "</ExternalURL>\n  <FrameHeight>700</FrameHeight>\n</ExternalQuestion>"
    create_hit(task, questionText)
    
# Helper to create a HIT in the form of a simple UI with a link to an external page and an
# input box for a completion code 
def create_hit_external(task):
    with open('questionform_template.xml', 'r') as myfile:
        template=myfile.read() 
    question_xml = template % (task["title"], task["description"], task['taskUrl'])
    create_hit(task, question_xml)
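`create_hit_iframe` builds the ExternalQuestion XML by string concatenation. That works for the current single-parameter URLs, but the XML becomes invalid if a task URL ever carries more than one query parameter, because `&` must be escaped inside XML. The sketch below builds the same document with `xml.sax.saxutils.escape`. The function name is hypothetical; the namespace and the 700px frame height are copied from the cell above.

```python
from xml.sax.saxutils import escape

EXTERNAL_QUESTION_NS = ("http://mechanicalturk.amazonaws.com/"
                        "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd")

def external_question_xml(task_url, frame_height=700):
    """Return ExternalQuestion XML that shows `task_url` in an iframe,
    escaping characters such as '&' that are special in XML."""
    return (
        '<ExternalQuestion xmlns="%s">\n'
        "  <ExternalURL>%s</ExternalURL>\n"
        "  <FrameHeight>%d</FrameHeight>\n"
        "</ExternalQuestion>" % (EXTERNAL_QUESTION_NS, escape(task_url), frame_height)
    )
```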

In [127]:
# Use this cell to launch your HIT! 
if ALLOW_HIT_CREATION: 
    if not (hit_config.get('variants', False) or hit_config.get('numTasks', False)): 
        raise RuntimeError("You must specify either hitCreation.numTasks or hitCreation.variants in your config.json file")
    
    hit_creation_function = create_hit_external if external_submit else create_hit_iframe
    
    if LAUNCH_HITS_FOR_ALL_FOLDS:
        print("creating", len(all_folds), "tasks with folds %s" % all_folds)
        for fold in all_folds:
            hit_config['taskUrl'] = hit_config['taskUrl'].split('?')[0] + "?task=%s" % fold
            print("Creating HIT with %d assignments and url %s" % (hit_config['numAssignments'], hit_config['taskUrl']))
            hit_creation_function(hit_config)
    
    elif hit_config.get('numTasks', False): 
        print("creating " + str(hit_config['numTasks']) + " tasks")
        for i in range(hit_config['numTasks']):
            hit_creation_function(hit_config)
    else: 
        print("creating " + str(len(hit_config['variants'])) + " variants")
        for var in hit_config['variants']: 
            task = copy.deepcopy(hit_config)
            task.update(var)
            hit_creation_function(task)
    
else: 
    raise RuntimeError("This action is not currently enabled; set `ALLOW_HIT_CREATION` to true to proceed with this action")

creating 10 tasks with folds ['kinetics_words_r1corrected_refs1_fold_2', 'kinetics_words_r1corrected_refs1_fold_4', 'kinetics_words_r1corrected_refs1_fold_8', 'kinetics_words_r1corrected_refs1_fold_1', 'kinetics_words_r1corrected_refs1_fold_7', 'kinetics_words_r1corrected_refs1_fold_5', 'kinetics_words_r1corrected_refs1_fold_0', 'kinetics_words_r1corrected_refs1_fold_6', 'kinetics_words_r1corrected_refs1_fold_9', 'kinetics_words_r1corrected_refs1_fold_3']
Creating HIT with 20 assignments and url https://allenjlee.github.io/vidrank/?task=kinetics_words_r1corrected_refs1_fold_2
{'HIT': {'HITId': '3K3G488TR4NVGRPZT8PTHUCI3B2Q5R', 'HITTypeId': '3H63IRCKX5V8WXQFC76DJDQMA5JE8E', 'HITGroupId': '3Z26YXPIJS3UDPRV5O7V00Z0TO6RGH', 'CreationTime': datetime.datetime(2019, 11, 13, 6, 38, 16, tzinfo=tzlocal()), 'Title': 'Rank the Videos', 'Description': 'You will be shown a set of 1 to 4 videos, and you will have to rank another set of 5 videos according to how similar they are with the first set.', 

{'HIT': {'HITId': '3L21G7IH49B51BF2JV4BRQ23Q4EY1K', 'HITTypeId': '3H63IRCKX5V8WXQFC76DJDQMA5JE8E', 'HITGroupId': '3Z26YXPIJS3UDPRV5O7V00Z0TO6RGH', 'CreationTime': datetime.datetime(2019, 11, 13, 6, 38, 17, tzinfo=tzlocal()), 'Title': 'Rank the Videos', 'Description': 'You will be shown a set of 1 to 4 videos, and you will have to rank another set of 5 videos according to how similar they are with the first set.', 'Question': '<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">\n<ExternalURL>https://allenjlee.github.io/vidrank/?task=kinetics_words_r1corrected_refs1_fold_7</ExternalURL>\n  <FrameHeight>700</FrameHeight>\n</ExternalQuestion>', 'Keywords': 'videos, ranking, game, watch', 'HITStatus': 'Assignable', 'MaxAssignments': 20, 'Reward': '0.66', 'AutoApprovalDelayInSeconds': 604800, 'Expiration': datetime.datetime(2019, 11, 15, 10, 24, 57, tzinfo=tzlocal()), 'AssignmentDurationInSeconds': 1200, 'QualificationRe

# HIT monitoring helpers

Helper functions that will be useful for monitoring the status of your HIT. See next section for how to use them.
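Both `get_all_assignments` and `get_hits` use MTurk's `NextToken` pagination. Each loop requests a page, appends its items, and repeats with the returned token until no token (or an empty page) comes back. The sketch below is a generic version of that loop. `paginate` and the fake `fake_list` function are hypothetical; the fake stands in for a client call such as `cl.list_assignments_for_hit`.

```python
def paginate(list_page, items_key, **kwargs):
    """Collect all items from a NextToken-paginated list call.

    list_page: callable taking keyword args (plus an optional NextToken)
    and returning a dict with `items_key` and optionally 'NextToken'."""
    items = []
    next_token = None
    while True:
        args = dict(kwargs)
        if next_token:
            args["NextToken"] = next_token
        page = list_page(**args)
        batch = page.get(items_key, [])
        items.extend(batch)
        next_token = page.get("NextToken")
        # Stop when the service returns no token or an empty page.
        if not next_token or not batch:
            return items

# Usage with a fake two-page source instead of a real MTurk client:
PAGES = {None: {"Assignments": [1, 2], "NextToken": "t1"},
         "t1": {"Assignments": [3]}}

def fake_list(**args):
    return PAGES[args.get("NextToken")]

all_items = paginate(fake_list, "Assignments", HITId="example", MaxResults=100)
```

With the notebook's client, the equivalent call would be `paginate(cl.list_assignments_for_hit, "Assignments", HITId=hitid, MaxResults=100)`.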

In [44]:
# Contacts MTurk API to get all assignments for a HIT
# Returns them in a list. 
def get_all_assignments(hitid): 
    assignments = []
    next_token = None
    while True: 
        args = {'HITId': hitid, 
                'MaxResults': 100}
        if next_token: 
            args['NextToken'] = next_token
        r = cl.list_assignments_for_hit(**args)
        assignments.extend(r["Assignments"])
        next_token = r.get('NextToken')
        # Stop when there is no further page or the last page was empty;
        # looping without a token would re-fetch the first page forever
        if not next_token or not r["Assignments"]:
            return assignments


# Returns up to `max_results` HITs, fetching at most 100 per API call
def get_hits(max_results=200):
    hits = []
    next_token = None
    while len(hits) < max_results:
        # never request more than the remaining number of HITs
        args = {'MaxResults': min(100, max_results - len(hits))}
        if next_token: 
            args['NextToken'] = next_token
        r = cl.list_hits(**args)
        hits.extend(r["HITs"])
        next_token = r.get('NextToken')
        if not next_token or not r["HITs"]:
            break
    return hits

# Summarizes all hits in `hits` in a human-readable way. 
# Prints out the HIT Title, id, if it is expired, and how many assignments it has
# completed, pending, and left for work. 
def summarize_hits(hits, get_submitted=True): 
    print(len(hits))
    ret = ""
    for hit in hits: 
        expiration = hit['Expiration'].replace(tzinfo=None)
        is_expired = expiration < datetime.datetime.now()
        description = ("Title: {title}\n" 
        "ID: {hid}\n"
        "\tAssignments left: {left}\n"
        "\tAssignments completed: {complete}\n"
        "\tAssignments pending: {pending}\n"
        ).format(
            title=hit['Title'], 
            hid=hit['HITId'], 
            left=hit['NumberOfAssignmentsAvailable'], 
            complete=hit['NumberOfAssignmentsCompleted'], 
            pending=hit['NumberOfAssignmentsPending']
            
        )
        
        if get_submitted:
            submitted=0
            assignments = get_all_assignments(hit['HITId'])
            if assignments:
                for a in assignments: 
                    if a['AssignmentStatus'] == 'Submitted':
                        submitted+=1
                    
            description+='\tAssignments submitted: %d\n' % submitted
        
        description += "\tExpired: {exp}\n\n".format(exp=str(is_expired))
        
        ret += description
    print(ret)
    
# Prints a human-readable summary of all pending/submitted/approved assignments for all hits in `hits`
def summarize_assignments(hits):
    ret = ""
    for hit in hits: 
        hid = hit['HITId']
        title =  hit['Title']
        name = "HIT %s: %s" % (hid, title)
        ret += name + "\n"
        assignments = get_all_assignments(hid)
        if len(assignments) == 0: 
            ret += "\tNo pending/submitted/approved assignments for this HIT\n"
        for a in assignments: 
            desc = "\tAssignment {aid}\n\t\tStatus: {status}\n".format(aid=a['AssignmentId'], status=a['AssignmentStatus'])
            ret += desc
    print(ret)
    
# Refreshes the global `hits` list with up to `max_results` HITs
def refresh_hits(max_results=200): 
    global hits 
    hits = get_hits(max_results=max_results)

# HIT monitoring

In [112]:
# API call to grab HIT data from MTurk 
hits = get_hits(max_results=80)
print(len(hits))

80


In [113]:
# Summarizes all outstanding HITs
# refresh_hits()
summarize_hits(hits)

80
Title: Rank the Videos
ID: 3U18MJKL1W1VCFLMSQMEWWXG3ZENC6
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	Assignments submitted: 20
	Expired: False

Title: Rank the Videos
ID: 373L46LKP9LHCN6P55BKR8T7ZOIKJC
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	Assignments submitted: 20
	Expired: False

Title: Rank the Videos
ID: 31ANT7FQNAHI3YQWAI4TD6UAP6J5H7
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	Assignments submitted: 20
	Expired: False

Title: Rank the Videos
ID: 3VADEH0UHECBMHMV5RP6FQ6TPH1PSY
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	Assignments submitted: 20
	Expired: False

Title: Rank the Videos
ID: 37PGLWGSJVLLHCMJNNQKF7E7M9TKI0
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	Assignments submitted: 20
	Expired: False

Title: Rank the Videos
ID: 33P2GD6NRP7LLHBZH58ZFVDSMWEKH7
	Assignments left: 0
	Assignments completed: 0
	Assignments pending: 0
	As

In [19]:
# Summarizes assignments for all outstanding HITs 
# refresh_hits()
summarize_assignments(hits)

HIT 30ZKOOGW2WFJRUTSVTBQ1WO4R60A1Z: Annotate the most important regions on graphic designs
	Assignment 3WLEIWSYHPR8DMHWA5QXBKHZ49QH29
		Status: Approved
	Assignment 3G0WWMR1UWUHF15SFEBBCMCY6RDNQ2
		Status: Approved
	Assignment 33PPUNGG39FB8RYBVHB5CZTCCSWZR7
		Status: Approved
	Assignment 3IFS6Q0HJJT1EG9EA2NO2EVI2HDISH
		Status: Approved
	Assignment 3NLZY2D53QZRR12731VWZU891XOQLG
		Status: Approved
	Assignment 3GA6AFUKOPYXY4DFE542UUMRAO2H3Y
		Status: Approved
	Assignment 3IGI0VL648UEUPSC01J7NHHYRR6ON8
		Status: Approved
	Assignment 3Q8GYXHFEQC9VBP36WEUY0NVBSOC5A
		Status: Approved
	Assignment 33PPUNGG39FB8RYBVHB5CZTCCSZZRA
		Status: Approved
	Assignment 38F5OAUN5OMFZNE2GH7S7BY7FQQ7HV
		Status: Approved
	Assignment 36U2A8VAG29PGXBB9B0PXGYWB64KYE
		Status: Approved
	Assignment 3VE8AYVF8N7ZL5SFVWBYIQIPAVJ8FL
		Status: Approved
	Assignment 3TDXMTX3CC4WRIBH3PKL19ZQKSMI61
		Status: Approved
	Assignment 34T446B1C1OA04329Q8IH5R4UKH0CH
		Status: Approved
	Assignment 3H8DHMCCWALMIMGKDFDBKVOVTI1DK

# Approve HITs

Approves all outstanding assignments for the HITs displayed above. 
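A plain `approve_assignment` call only makes sense for assignments in the `Submitted` state. `Approved` ones need nothing, and approving a `Rejected` one requires `OverrideRejection=True`. The sketch below separates the selection step from the API call and adds a dry-run mode, so you can see what would be approved before any money moves. `approve_submitted` and `RecordingClient` are hypothetical; the latter stands in for the boto3 client.

```python
def approve_submitted(client, assignments, dry_run=True):
    """Approve every assignment whose status is 'Submitted'.

    Returns the IDs that were approved (or would be, when dry_run is True).
    Approved and Rejected assignments are skipped."""
    ids = [a["AssignmentId"] for a in assignments
           if a.get("AssignmentStatus") == "Submitted"]
    if not dry_run:
        for aid in ids:
            client.approve_assignment(AssignmentId=aid)
    return ids


# Offline usage with a stand-in client that just records calls.
class RecordingClient:
    def __init__(self):
        self.approved = []

    def approve_assignment(self, AssignmentId):
        self.approved.append(AssignmentId)


sample = [
    {"AssignmentId": "A1", "AssignmentStatus": "Submitted"},
    {"AssignmentId": "A2", "AssignmentStatus": "Approved"},
    {"AssignmentId": "A3", "AssignmentStatus": "Rejected"},
]
fake = RecordingClient()
preview = approve_submitted(fake, sample)              # dry run: no calls made
done = approve_submitted(fake, sample, dry_run=False)  # approves A1 only
```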

In [114]:
def approve_all(hits): 
    num_approved = 0
    for hit in hits: 
        # fetch every assignment for this HIT (all pages)
        assignments = get_all_assignments(hit["HITId"])
        for a in assignments: 
            # only 'Submitted' assignments can be approved; skip Approved/Rejected ones
            if a['AssignmentStatus'] == 'Submitted':
                print("Approving assignment")
                num_approved += 1
                cl.approve_assignment(AssignmentId=a['AssignmentId'])
    print("Approved %d assignments" % num_approved)

In [115]:
# refresh_hits()
approve_all(hits)

Approving assignment
Approved 1600 assignments


# Update expiration or num tasks
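`update_expiration` passes an absolute `ExpireAt` time to `update_expiration_for_hit`. MTurk expires a HIT immediately when that time is in the past, and `expire_hit` relies on this with `days_from_now=-10`. The sketch below does the same date arithmetic with timezone-aware UTC datetimes, which removes any ambiguity about the local clock. `expire_at` is a hypothetical helper, and the fixed date exists only to make the example deterministic.

```python
import datetime

def expire_at(days_from_now, now=None):
    """Return a timezone-aware UTC datetime `days_from_now` days after `now`.
    Negative values give a time in the past (i.e. expire immediately)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now + datetime.timedelta(days=days_from_now)

fixed_now = datetime.datetime(2019, 11, 13, 12, 0, tzinfo=datetime.timezone.utc)
past = expire_at(-10, fixed_now)     # ten days before fixed_now
later = expire_at(1.5, fixed_now)    # a day and a half after fixed_now
```

The result could then be passed as `cl.update_expiration_for_hit(HITId=hitid, ExpireAt=expire_at(days_from_now))`.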

In [39]:
# changes the expiration date on a HIT to days_from_now days in the future
def update_expiration(hitid, days_from_now): 
    if ALLOW_UPDATE_EXPIRATION: 
        days = days_from_now*datetime.timedelta(days=1)
        expire_time = datetime.datetime.now() + days

        response = cl.update_expiration_for_hit(HITId=hitid, ExpireAt=expire_time)
        print(response)
        return response
    else: 
        raise RuntimeError("This action is not currently enabled; set `ALLOW_UPDATE_EXPIRATION` to true to proceed with this action")
    
# Expires a HIT immediately by moving its expiration 10 days into the past
def expire_hit(hitid): 
    return update_expiration(hitid, -10)

In [40]:
def add_assignments(hitid, num_assignments): 
    if ALLOW_ASSIGNMENT_ADDITION: 
        response = cl.create_additional_assignments_for_hit(
            HITId=hitid,
            NumberOfAdditionalAssignments=num_assignments
        )
        print(response)
        return response
    else: 
        raise RuntimeError("This action is not currently enabled; set `ALLOW_ASSIGNMENT_ADDITION` to true to proceed with this action")

In [21]:
# Use this cell to expire a HIT 
HIT_id_to_expire = "3R5OYNIC2CIRNIB3MVDXXXWT6GHPTP" 
expire_hit(HIT_id_to_expire)

{'ResponseMetadata': {'RequestId': 'e8c14e88-e069-49c4-a2ae-0a20a687026e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'e8c14e88-e069-49c4-a2ae-0a20a687026e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'date': 'Thu, 21 Feb 2019 20:47:39 GMT'}, 'RetryAttempts': 0}}

In [None]:
# Use this cell to add assignments to a HIT 
HIT_id_to_add_assignments = "FILL THIS IN"
num_assignments_to_add = 0
add_assignments(HIT_id_to_add_assignments, num_assignments_to_add)

# Add custom qualifications 

## Add a qualification to disqualify workers who have done work before

- uses "negative qualification" method from https://github.com/cloudyr/MturkR/wiki/qualifications-as-blocks

#### NOTE: quals are kept separate for the sandbox and prod. Make sure you are creating and assigning your quals in prod. 
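The negative-qualification method works in two steps. First, grant a custom qualification to workers who have already done the task. Then require that this qualification does *not* exist on new HITs. The sketch below builds that requirement entry with the standard `DoesNotExist` comparator. `exclusion_requirement` and the placeholder ID are hypothetical; in practice the ID comes from `create_qual`, and the entry would be appended to `QUALS`.

```python
def exclusion_requirement(qual_id, actions_guarded="DiscoverPreviewAndAccept"):
    """QualificationRequirement that blocks workers who hold `qual_id`."""
    allowed = {"Accept", "PreviewAndAccept", "DiscoverPreviewAndAccept"}
    if actions_guarded not in allowed:
        raise ValueError("unsupported ActionsGuarded value: %r" % actions_guarded)
    return {
        "QualificationTypeId": qual_id,
        "Comparator": "DoesNotExist",
        "ActionsGuarded": actions_guarded,
    }

# Hypothetical ID; use the one returned by create_qual().
blocked = exclusion_requirement("EXAMPLE_QUAL_ID")
```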

### Structure of a new qualification

In [41]:
NEW_QUAL = {
    'Name': 'qualName',
    'Keywords': 'Keywords for qual',
    'Description': 'What is this qual, and why are you assigning it?',
    'QualificationTypeStatus': 'Active',
    'AutoGranted': False
}

### Helpers for creating, viewing, and assigning qualifications

In [19]:
# Registers a custom qualification with MTurk 
def create_qual(new_qual):
    if ALLOW_CREATE_QUAL: 
        response = cl.create_qualification_type(**new_qual)
        print(response)
        Id = response['QualificationType']['QualificationTypeId']
        print("id", Id)
        return Id
    else: 
        raise RuntimeError("This action is not currently enabled; set `ALLOW_CREATE_QUAL` to true to proceed with this action")
        
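# Lists the custom quals you own, optionally filtered by a search query
# (e.g. list_quals('hasCompletedVisualGraphRecallTask'))
def list_quals(query=None): 
    args = {'MustBeRequestable': False, 'MustBeOwnedByCaller': True}
    if query:
        args['Query'] = query
    response = cl.list_qualification_types(**args)
    print(response)
    return response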
    
# Assigns a qualification to a worker 
def assign_qual(qual_id, worker_ids): 
    for worker in worker_ids: 
        response = cl.associate_qualification_with_worker(
                QualificationTypeId=qual_id, 
                WorkerId=worker,
                IntegerValue=1,
                SendNotification=False
        )
        print(response)
        assert response
        
# Gets the ids of all workers who worked on a particular hit 
def get_workers_for_hit(hitid): 
    a = get_all_assignments(hitid)
    workers = [a_['WorkerId'] for a_ in a]
    return workers
    
# Confirms that every worker in worker_ids has qual with qual_id
def confirm_quals(qual_id, worker_ids): 
    for w in worker_ids: 
        response = cl.get_qualification_score(
                QualificationTypeId=qual_id,
                WorkerId=w
        )
        response = response['Qualification']
        assert response['Status'] == 'Granted'
        assert response['IntegerValue'] == 1
        
# Assigns qual with `qual_id` to every worker who has completed an assignment for the hit with `hitid`
def assign_qual_for_hit(hitid, qual_id): 
    workers = get_workers_for_hit(hitid)
    print("got workers")
    assign_qual(qual_id, workers)
    print("assigned qual")
    confirm_quals(qual_id, workers)
    print("confirmed qual")

### Use the following cells to manipulate qualifications

In [None]:
# Use this cell to view the custom qualifications you have created
list_quals()

In [None]:
# Use this cell to create a new qual (fill in the fields of NEW_QUAL above first)
qual_to_create = copy.deepcopy(NEW_QUAL)
create_qual(qual_to_create)

In [None]:
# Use this cell to assign a custom qual to every worker who has done a specific HIT
hit_id = "FILL THIS IN"
qual_id_to_assign = "FILL THIS IN"
assign_qual_for_hit(hit_id, qual_id_to_assign)

# Create Compensation HIT

Mistakes happen, and sometimes they can lead to a worker who put in an honest effort being unable to complete a task and get paid. It's a good idea to compensate these workers when they reach out because it helps maintain relations with workers and is the right thing to do.

However, workers can only be paid upon completing a task. The workaround is to create a custom qualification, assign it to the worker you want to compensate, and create a no-work HIT requiring the custom qualification. This code does that.
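`compensate_workers` expects the reward as a string matching `^\d*\.\d\d$` (e.g. `"1.00"`). The sketch below converts an amount into that format with `decimal`. It rejects values that are not positive or that need more than two decimal places, rather than rounding them silently. `format_reward` is a hypothetical helper.

```python
import re
from decimal import Decimal

REWARD_PATTERN = re.compile(r"^\d*\.\d\d$")

def format_reward(amount):
    """Return `amount` (str, int, float, or Decimal) as a dollars string like '1.20'."""
    value = Decimal(str(amount))
    if value <= 0:
        raise ValueError("reward must be positive")
    cents = value.quantize(Decimal("0.01"))
    if cents != value:
        raise ValueError("reward has more than two decimal places: %s" % amount)
    text = str(cents)
    if not REWARD_PATTERN.match(text):
        raise ValueError("could not format reward: %s" % amount)
    return text
```

For example, `format_reward("1.2")` gives `"1.20"`, which can be passed as `compensation`.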

In [20]:
# worker_ids is str[]
# compensation is str but should match the regex ^\d*\.\d\d$ (e.g. "1.00")
# for_hit_id is str -- optional, but helpful for records
def compensate_workers(worker_ids, compensation, for_hit_id=""):
    with open('compensation.xml', 'r') as myfile:
        question_xml=myfile.read()

    keywords = 'compensation'
    description = 'Compensation for HIT'
    if for_hit_id:
        keywords += ', ' + for_hit_id
        description += ' ' + for_hit_id

    # create qual, assign to workers
    custom_qual = {
        'Name': str(uuid4()), # a qual must have a unique name
        'Keywords': keywords,
        'Description': description,
        'QualificationTypeStatus': 'Active',
        'AutoGranted': False
    }
    qual_id = create_qual(custom_qual)
    assign_qual(qual_id, worker_ids)

    # create HIT requiring qual
    task = {
        'numAssignments': len(worker_ids),
        'lifetime': 3 * 24 * 60 * 60, # 3 days
        'duration': 5 * 60, # 5 min
        'rewardAmount': compensation,
        'title': description,
        'keywords': keywords,
        'description': description,
    }
    quals = [{
        'QualificationTypeId': qual_id,
        'Comparator': 'Exists',
        'ActionsGuarded': 'DiscoverPreviewAndAccept'
    }]
    create_hit(task, question_xml, quals)

In [47]:
worker_ids = ['A11LNK1U3DT08V'] # worker_id strings in a list
compensation = "1.20" # change to the amount of dollars you want to give
for_hit_id = "38DNTK7MFNCPVHTMMXO7M8XEDG910H" # hit_id string (what you are compensating for)
compensate_workers(worker_ids, compensation, for_hit_id)

{'QualificationType': {'QualificationTypeId': '3PIG4EQXCCW840UU1VRHERAU0FC33V', 'CreationTime': datetime.datetime(2019, 4, 4, 21, 7, 10, tzinfo=tzlocal()), 'Name': '9ac6025f-2082-43ff-aef9-b9fde7ba17d9', 'Description': 'Compensation for HIT 38DNTK7MFNCPVHTMMXO7M8XEDG910H', 'Keywords': 'compensation, 38DNTK7MFNCPVHTMMXO7M8XEDG910H', 'QualificationTypeStatus': 'Active', 'IsRequestable': True, 'AutoGranted': False}, 'ResponseMetadata': {'RequestId': '8629fb0b-4640-48dc-b30e-c9671b2b0136', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '8629fb0b-4640-48dc-b30e-c9671b2b0136', 'content-type': 'application/x-amz-json-1.1', 'content-length': '354', 'date': 'Fri, 05 Apr 2019 01:07:10 GMT'}, 'RetryAttempts': 0}}
id 3PIG4EQXCCW840UU1VRHERAU0FC33V
{'ResponseMetadata': {'RequestId': 'b39f291b-bcf4-4dc6-9172-43b3d2f38a3c', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'b39f291b-bcf4-4dc6-9172-43b3d2f38a3c', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2',

# Download data

Helper to download data from MTurk 
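Each assignment's `Answer` field is a QuestionFormAnswers XML document with one `<Answer>` per submitted field. Each `<Answer>` holds a `QuestionIdentifier` and a `FreeText` value. `get_assignment_content` parses this with BeautifulSoup. The sketch below does the same with the standard library's `ElementTree` and decodes JSON values where possible. The `{*}` namespace wildcard requires Python 3.8 or later. `parse_answers` is hypothetical, and the sample XML is illustrative, not real task output.

```python
import json
import xml.etree.ElementTree as ET

def parse_answers(answer_xml):
    """Map QuestionIdentifier -> value, JSON-decoding FreeText when possible."""
    root = ET.fromstring(answer_xml)
    results = {}
    for ans in root.findall("{*}Answer"):
        identifier = ans.findtext("{*}QuestionIdentifier")
        text = ans.findtext("{*}FreeText")
        try:
            results[identifier] = json.loads(text)
        except (TypeError, ValueError):
            results[identifier] = text  # not JSON (or missing): keep the raw value
    return results

SAMPLE = (
    '<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/'
    'AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">'
    '<Answer><QuestionIdentifier>ranking</QuestionIdentifier>'
    '<FreeText>[3, 1, 2]</FreeText></Answer>'
    '<Answer><QuestionIdentifier>comment</QuestionIdentifier>'
    '<FreeText>none</FreeText></Answer>'
    '</QuestionFormAnswers>'
)
parsed = parse_answers(SAMPLE)
```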

In [107]:
def pretty_print(obj):
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(obj)
    pp = None

# Downloads all the assignments completed for `hits` as a list of dictionaries. 
# If a download_path is given, also saves that data as json 
def get_assignment_content(hits, download_path="", should_print=False): 
    all_responses = []
    for hit in hits: 
        hitid = hit['HITId']
        assignments = get_all_assignments(hitid)
        for a in assignments:
            a_xml = a['Answer']
            #print(a_xml)
            soup = bs(a_xml, "html.parser")
            answers = soup.find_all("answer")
            #print(answers)
            results = {'HITId': a['HITId'], 'AssignmentId': a['AssignmentId'], 'WorkerId': a['WorkerId']}
            for ans in answers: 
                identifier = ans.find('questionidentifier').string
                answer = ans.find('freetext').string
                try: 
                    results[identifier] = json.loads(answer)
                except (TypeError, ValueError):
                    # not valid JSON (or no text): keep the raw string
                    results[identifier] = answer
            all_responses.append(results)
    if should_print: 
        pretty_print(all_responses)
    if download_path: 
        with open(download_path, 'w') as outfile: 
            json.dump(all_responses, outfile)
    return all_responses
            

In [111]:
# Use this cell to download data
responses = get_assignment_content(hits, download_path='./mturk_responses/kinetics_40folds_20assigs_balanced_1refcorrected.json', should_print=False)
print('Individual assignments downloaded:',len(responses))

Individual assignments downloaded: 800
