# Table of Contents
* [Submitting HITs](#Submitting-HITs)
	* [Building URLs for images on s3](#Building-URLs-for-images-on-s3)
	* [submitting HITs in groups](#submitting-HITs-in-groups)
* [Reviewing HITs](#Reviewing-HITs)
* [Writing annotation results](#Writing-annotation-results)
* [Ignore](#Ignore)


In [2]:
%%capture
from __future__ import division
import numpy as np
import pandas as pd
import scipy.stats as st
import itertools
import math
from collections import Counter, defaultdict
%load_ext autoreload
%autoreload 2

In [3]:
import pickle
import boto
import json
from copy import deepcopy
import boto.mturk.connection as tc
import boto.mturk.question as tq
from boto.mturk.qualification import PercentAssignmentsApprovedRequirement, Qualifications, Requirement

from keysTkingdom import mturk_ai2
from keysTkingdom import aws_tokes

import pdfextraction.amt_boto_modules as amt_util

# Submitting HITs

## Building URLs for images on s3

In [4]:
book_groups, ranges = amt_util.load_book_info()

In [5]:
daily_sci_urls = amt_util.make_book_group_urls(book_groups, 'daily_sci', ranges)
spectrum_sci_urls = amt_util.make_book_group_urls(book_groups, 'spectrum_sci', ranges)
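
The internals of `amt_util.make_book_group_urls` are not shown in this notebook. As a rough sketch only, building page-image URLs for one book might look like the following, assuming path-style S3 URLs and a contiguous page range. The bucket name, key prefix, and function name are hypothetical placeholders; only the file-name pattern is taken from the results shown later in this notebook.

```python
# Hypothetical sketch: not the actual amt_util implementation.
S3_BASE = 'https://s3.amazonaws.com'   # path-style S3 endpoint
BUCKET = 'example-textbook-bucket'     # placeholder bucket name (assumption)


def build_page_urls(book_name, first_page, last_page, prefix='page-images'):
    """Return one image URL per page number in [first_page, last_page]."""
    urls = []
    for page_number in range(first_page, last_page + 1):
        file_name = '{0}_{1}.jpeg'.format(book_name, page_number)
        urls.append('/'.join([S3_BASE, BUCKET, prefix, file_name]))
    return urls


# Example call with a made-up two-page range.
example_urls = build_page_urls('Daily_Science_Grade_4_Evan_Moor', 159, 160)
```

Keeping the bucket and prefix as parameters or module constants makes it easy to point the same code at a different book group.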

## submitting HITs in groups

In [6]:
sandbox_host = 'mechanicalturk.sandbox.amazonaws.com'
mturk = tc.MTurkConnection(
    aws_access_key_id=aws_tokes.access_key,
    aws_secret_access_key=aws_tokes.access_secret_key,
    host=sandbox_host,
    debug=1  # debug=2 prints out all requests
)
mturk.get_account_balance()  # a reminder that this is the sandbox

[$10,000.00]

In [7]:
static_params = {
    'title': "Annotate Science Textbook",
    'description': "Choose which category text from a grade-school science book best belongs to",
    'keywords': ['image', 'science', 'text', 'labeling'],
    'frame_height': 800,
    'amount': 0.04,
    'duration': 3600 * 24 * 3,  # 3 days, expressed in seconds
    'max_assignments': 2
}

In [40]:
amt_util.create_hits_from_pages(mturk, daily_sci_urls[700:702], static_params)

In [39]:
amt_util.delete_all_hits(mturk)

# Reviewing HITs

There are 1,100 pages from Daily Science.

In [41]:
r_hits = amt_util.get_completed_hits(mturk)

In [42]:
r_hits

[<boto.mturk.connection.HIT at 0x10d1b12d0>,
 <boto.mturk.connection.HIT at 0x1092bd290>]

In [44]:
assignment_results = amt_util.get_assignments(mturk, r_hits)

In [45]:
raw_hit_results = amt_util.process_raw_hits(assignment_results)

In [225]:
# amt_util.accept_hits(mturk, assignment_results)

In [94]:
results_df = amt_util.make_results_df(raw_hit_results)

In [95]:
results_df.head(2)

| | page | category | hit_id | assignment_id | box_id |
|---|---|---|---|---|---|
| 0 | Daily_Science_Grade_4_Evan_Moor_159.jpeg | Discussion | 3SR6AEG6W5UEEFKDA71U2ZF6UG2YH0 | 3A4TN5196KJ4YWH24HO7USI4BZ2HCV | T14 |
| 1 | Daily_Science_Grade_4_Evan_Moor_159.jpeg | unlabeled | 3SR6AEG6W5UEEFKDA71U2ZF6UG2YH0 | 3A4TN5196KJ4YWH24HO7USI4BZ2HCV | T15 |


In [96]:
grouped_by_page = results_df.groupby(['page', 'box_id'])

In [137]:
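# Consensus per text box: take the most common value within each (page, box_id) group.
agg_res = grouped_by_page.agg(pd.DataFrame.mode)
agg_res.drop(['assignment_id', 'page', 'box_id'], axis=1, inplace=True)
# Replace any NaN left by the mode step with the 'Answer' label.
agg_res = agg_res.fillna('Answer')
agg_res = agg_res.reset_index()
# Drop the 'level_2' column that reset_index() creates from the extra index level.
agg_res.drop('level_2', axis=1, inplace=True)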
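
To make the aggregation step more concrete, here is a minimal pure-Python sketch of majority voting over (page, box_id) groups. It is an illustration under assumptions, not a description of what pandas does internally: the function name and the tie-handling rule (ties fall back to a caller-supplied default) are choices made for this example, and the sample rows are made up.

```python
from collections import Counter, defaultdict


def majority_labels(rows, default='NO AGREEMENT'):
    """Map (page, box_id) to its most common category; ties get `default`."""
    votes = defaultdict(list)
    for page, box_id, category in rows:
        votes[(page, box_id)].append(category)

    consensus = {}
    for key, categories in votes.items():
        ranked = Counter(categories).most_common()
        top_label, top_count = ranked[0]
        # A tie exists when the runner-up has as many votes as the leader.
        is_tie = len(ranked) > 1 and ranked[1][1] == top_count
        consensus[key] = default if is_tie else top_label
    return consensus


# Made-up rows: (page, box_id, category), two annotators per box.
sample_rows = [
    ('page_1.jpeg', 'T1', 'Discussion'),
    ('page_1.jpeg', 'T1', 'Discussion'),
    ('page_1.jpeg', 'T2', 'Discussion'),
    ('page_1.jpeg', 'T2', 'unlabeled'),
]
consensus = majority_labels(sample_rows)
```

With only two assignments per HIT, any disagreement is a tie, so an explicit tie rule matters more here than it would with three or more annotators.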

In [152]:
agg_res.head(2)

| | page | box_id | category | hit_id |
|---|---|---|---|---|
| 0 | Daily_Science_Grade_4_Evan_Moor_159.jpeg | T1 | unlabeled | 3SR6AEG6W5UEEFKDA71U2ZF6UG2YH0 |
| 1 | Daily_Science_Grade_4_Evan_Moor_159.jpeg | T10 | Answer | 3SR6AEG6W5UEEFKDA71U2ZF6UG2YH0 |


In [191]:
amt_util.write_results_df(agg_res)

In [194]:
to_review = ['start_seq'] + list(pd.unique(agg_res['page']))

In [195]:
amt_util.review_results(to_review)

# Ignore

Choosing the right price for your HITs is crucial, and it can be tricky to figure out when you’re first starting. This is where those using Mechanical Turk as a digital sweatshop are separated from those using it as a fair and equitable way to employ other people. Many turkers consider it unethical to pay under $0.10 per minute, which works out to a $6.00 hourly wage, somewhat below the US federal minimum wage of $7.25 (many states set a higher minimum). Turkers pay close attention to price when deciding whether a HIT is worth their time. As one turker said in a survey, “…I figure a good task is one I can make 10 to 12 cents a minute on.” If you want your HITs done quickly by high-quality turkers (and trust me, you do!), make sure you pay them fairly. A quick rule of thumb is:

Fair Pay = $0.10 x (Average Number Of Minutes Per Assignment)
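
As a small worked example of this rule of thumb, the sketch below computes a suggested reward from an average completion time, and the reverse: how many minutes a given reward covers at the same rate. The function names are made up for illustration, and the 2.5-minute figure is an invented input, not a measurement from these HITs.

```python
FAIR_RATE_PER_MINUTE = 0.10  # USD per minute, from the rule of thumb


def fair_pay(avg_minutes_per_assignment, rate_per_minute=FAIR_RATE_PER_MINUTE):
    """Return the suggested reward in USD, rounded to whole cents."""
    return round(rate_per_minute * avg_minutes_per_assignment, 2)


def implied_minutes(reward, rate_per_minute=FAIR_RATE_PER_MINUTE):
    """Return how many minutes of work a reward covers at the given rate."""
    return reward / rate_per_minute


suggested_reward = fair_pay(2.5)         # invented average of 2.5 minutes
minutes_covered = implied_minutes(0.04)  # the 'amount' set in static_params
```

Read the other way around, the `'amount': 0.04` in `static_params` covers roughly 0.4 minutes (about 24 seconds) of work per assignment at this rate.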

In [41]:
import requests

review_api_endpoint = 'http://localhost:8080/api/review'
payload = {'pages_to_review': str(annotation_results.keys())}
headers = {'content-type': 'application/json'}
requests.post(review_api_endpoint, data=json.dumps(payload), headers=headers)

<Response [200]>

In [153]:
def most_common_strict(turk_responses_single_page):
    """
    Return the consensus response of the three raw response strings for a given page,
    or 'NO AGREEMENT' when no value qualifies as the mode.
    """
    most_common = turk_responses_single_page[1]['Answer.NumberOfItems'].mode()
    if most_common.empty:
        most_common = pd.Series(['NO AGREEMENT'])
    return most_common

grouped_results_df = batch_results_df.groupby('Input.image_url')
for turk_response in grouped_results_df:
    # Each item is a (group_key, group_dataframe) pair.
    print(turk_response[1]['Answer.NumberOfItems'])

NameError: name 'batch_results_df' is not defined

In [154]:
for hit in r_hits[30:42]:
    assignments = mturk.get_assignments(hit.HITId)
    for assignment in assignments:
        # Day of the month, taken from the date part of SubmitTime.
        print(int(assignment.SubmitTime.split('-')[2].split('T')[0]))
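
Splitting the timestamp by hand works but is brittle. A hedged alternative is to parse it with the standard library, assuming `SubmitTime` is an ISO 8601 UTC string such as `2016-01-15T20:31:11Z`. That format is an assumption, since the notebook does not show a sample value, and the helper name is made up for this example.

```python
from datetime import datetime

SUBMIT_TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'  # assumed timestamp layout


def submit_day(submit_time):
    """Return the day of the month from an ISO 8601 UTC timestamp string."""
    return datetime.strptime(submit_time, SUBMIT_TIME_FORMAT).day


day = submit_day('2016-01-15T20:31:11Z')  # made-up example timestamp
```

Parsing into a `datetime` also fails loudly with a `ValueError` if the timestamp does not match the expected layout, rather than silently returning a wrong number.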

In [155]:
for page_name, results in annotation_results.iteritems():
    unaltered_annotations = amt_util.load_local_annotation(page_name)
    amt_util.process_annotation_results(page_name, results, unaltered_annotations, './ai2-vision-turk-data/textbook-annotation-test/test-annotations/', page_schema)

NameError: name 'annotation_results' is not defined