# Table of Contents
* [Submitting HITs](#Submitting-HITs)
	* [Building URLs for images on s3](#Building-URLs-for-images-on-s3)
	* [submitting HITs in groups](#submitting-HITs-in-groups)
* [Reviewing HITs](#Reviewing-HITs)
* [Writing annotation results](#Writing-annotation-results)
* [Ignore](#Ignore)


In [1]:
%%capture
from __future__ import division
import numpy as np
import pandas as pd
import scipy.stats as st
import itertools
import math
from collections import Counter, defaultdict
%load_ext autoreload
%autoreload 2

In [2]:
import pickle
import boto
import json
from copy import deepcopy
import boto.mturk.connection as tc
import boto.mturk.question as tq
from boto.mturk.qualification import PercentAssignmentsApprovedRequirement, Qualifications, Requirement

from keysTkingdom import mturk_ai2
from keysTkingdom import aws_tokes

import pdfextraction.amt_boto_modules as amt_util

# Submitting HITs

## Building URLs for images on s3

In [5]:
book_groups,ranges = amt_util.load_book_info()

In [6]:
daily_sci_urls = amt_util.make_book_group_urls(book_groups, 'daily_sci', ranges)
spectrum_sci_urls = amt_util.make_book_group_urls(book_groups, 'spectrum_sci', ranges)

## submitting HITs in groups

In [8]:
sandbox_host = 'mechanicalturk.sandbox.amazonaws.com' 
mturk = tc.MTurkConnection(
    aws_access_key_id = aws_tokes.access_key,
    aws_secret_access_key = aws_tokes.access_secret_key,
    host = sandbox_host,
    debug = 1 # debug = 2 prints out all requests.
)
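
The connection above always points at the sandbox. As a minimal sketch of switching between sandbox and production, the helper below returns the hostname to pass as `host`. `mturk_host` and its `use_sandbox` flag are hypothetical names introduced here for illustration and are not part of `amt_util`; the production hostname is the standard endpoint for the legacy MTurk API that boto targets.

```python
SANDBOX_HOST = 'mechanicalturk.sandbox.amazonaws.com'
PRODUCTION_HOST = 'mechanicalturk.amazonaws.com'


def mturk_host(use_sandbox=True):
    """Return the MTurk endpoint hostname.

    Defaults to the sandbox so that test runs never spend real money.
    (Hypothetical helper, not part of amt_util.)
    """
    return SANDBOX_HOST if use_sandbox else PRODUCTION_HOST
```

Passing `host=mturk_host(use_sandbox=False)` to `tc.MTurkConnection` would then target the live marketplace, so the default is deliberately the safe one.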

In [7]:
static_params = {
    'title': "Annotate Science Textbook",
    'description': "Choose which category text from a grade-school science book best belongs to",
    'keywords': ['image', 'science' ],
    'frame_height': 1000,
    'amount': 0.03,
    'duration': 3600 * 3,
    'max_assignments': 3
}

In [14]:
# amt_util.create_single_hit(mturk, daily_sci_urls[87], static_params)

In [11]:
amt_util.create_hits_from_pages(mturk, daily_sci_urls[602:642], static_params)

In [12]:
amt_util.delete_all_hits(mturk)

# Reviewing HITs

There are 1,100 pages from Daily Science.

In [113]:
r_hits = mturk.get_reviewable_hits(page_size=100)

In [110]:
annotation_results = {}
for hit in r_hits:
    assignments = mturk.get_assignments(hit.HITId)
    for assignment in assignments:
#         print(assignment.SubmitTime)
#         print int((assignment.SubmitTime).split('-')[2].split('T')[0])
        for answers in assignment.answers:
            annotation_results[answers[0].fields[0]] = answers[1].fields
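
Because each HIT is submitted with `max_assignments` set to 3, the loop above appears to keep only the last assignment seen for each page key. If all responses are wanted, one option is to collect them per page and take a majority vote. The sketch below works on plain `(page_name, fields)` pairs; the helper names, the `min_votes` threshold, and the example data are assumptions made for illustration, not the notebook's actual method.

```python
from collections import Counter, defaultdict


def group_by_page(responses):
    """Group (page_name, fields) pairs so every assignment for a page is kept."""
    grouped = defaultdict(list)
    for page_name, fields in responses:
        # Tuples are hashable, so identical responses can be counted later.
        grouped[page_name].append(tuple(fields))
    return dict(grouped)


def majority_response(page_responses, min_votes=2):
    """Return the response given by at least `min_votes` workers, else None."""
    if not page_responses:
        return None
    response, votes = Counter(page_responses).most_common(1)[0]
    return response if votes >= min_votes else None


# Made-up example data: three workers labelled the same page.
responses = [
    ('page_1.jpeg', ['T1:Header', 'T2:Text']),
    ('page_1.jpeg', ['T1:Header', 'T2:Text']),
    ('page_1.jpeg', ['T1:Header', 'T2:Question']),
]
grouped = group_by_page(responses)
consensus = majority_response(grouped['page_1.jpeg'])
```

Here two of the three workers agree, so `consensus` is the shared response; with three different responses the function returns `None`, which flags the page for manual review.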

# Writing annotation results

In [98]:
import json
import jsonschema
import requests 
from pdfextraction.annotation_schema import page_schema
from flask import request

In [293]:
review_api_endpoint = 'http://localhost:8080/api/review'
payload = {'pages_to_review': str(annotation_results.keys())}
headers = {'content-type': 'application/json'}
requests.post(review_api_endpoint, data=json.dumps(payload), headers=headers)

<Response [200]>

In [99]:
for page_name, results in annotation_results.iteritems():
    unaltered_annotations = amt_util.load_local_annotation(page_name)
    amt_util.process_annotation_results(page_name, results, unaltered_annotations, './test_results/', page_schema)

KeyError: u'T21'

# Ignore

In [84]:
# batch_results_df = pd.read_csv(data_dir+results_csv)
# print(batch_results_df.shape)
# batch_results_df.head(2)

In [6]:
# grouped_results_df = batch_results_df.groupby('Input.image_url')
# for image_response in grouped_results_df:
#     print(image_response[1]['Answer.NumberOfItems'])

Choosing the right price for your HITs is crucial, and it can be tricky to figure out when you’re first starting. It’s here that those using Mechanical Turk as a digital sweatshop are separated from those using it as a fair and equitable way to employ other people. Many turkers consider it unethical to pay under $0.10 per minute. That works out to a $6.00 hourly wage, which is below the US federal minimum wage of $7.25 (and many states set a higher minimum). Turkers pay close attention to price when deciding whether a HIT is worth their time. As one turker said in a survey, “…I figure a good task is one I can make 10 to 12 cents a minute on.” If you want your HITs done quickly by high-quality turkers (and trust me, you do!), make sure you pay them fairly. A quick rule of thumb:

Fair Pay = $0.10 x (Average Number Of Minutes Per Assignment)
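
The rule of thumb is easy to apply in code. The sketch below computes the fair reward for an assignment and, in the other direction, how many minutes of work a given reward covers. Both function names are hypothetical, and the default rate of $0.10 per minute is simply the figure quoted above.

```python
def fair_pay(avg_minutes_per_assignment, rate_per_minute=0.10):
    """Fair reward in dollars for one assignment, rounded to whole cents."""
    return round(rate_per_minute * avg_minutes_per_assignment, 2)


def implied_minutes(reward, rate_per_minute=0.10):
    """Minutes of work that `reward` dollars covers at the given per-minute rate."""
    return float(reward) / rate_per_minute
```

For example, the `'amount': 0.03` reward in `static_params` covers about 0.3 minutes (18 seconds) per assignment at this rate, so each page would need to be annotated in well under half a minute to meet the guideline.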

In [None]:
def most_common_strict(turk_responses_single_page):
    """
    Return the consensus (modal) response among the raw response strings for one page.
    `turk_responses_single_page` is a (key, group) pair, as yielded when iterating over a groupby.
    """
    most_common = turk_responses_single_page[1]['Answer.NumberOfItems'].mode()
    if most_common.empty:
        most_common = pd.Series(['NO AGREEMENT'])
    return most_common

grouped_results_df = batch_results_df.groupby('Input.image_url')
for turk_response in grouped_results_df:
    print(turk_response[1]['Answer.NumberOfItems'])

In [85]:
for hit in r_hits[30:42]:
    assignments = mturk.get_assignments(hit.HITId)
    for assignment in assignments:
        print int((assignment.SubmitTime).split('-')[2].split('T')[0])

2
2
2
2
6
6
6
6
6
6
6
6
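
The cell above pulls the day of the month out of `SubmitTime` by splitting the string. A slightly more robust sketch parses the timestamp with `datetime` instead. It assumes `SubmitTime` is a UTC string of the form `YYYY-MM-DDTHH:MM:SSZ`; `submit_day` is a hypothetical helper and the timestamps in the usage note are made up.

```python
from datetime import datetime


def submit_day(submit_time):
    """Return the day of the month from a timestamp such as '2016-03-02T18:21:02Z'.

    Assumes the 'YYYY-MM-DDTHH:MM:SSZ' layout; raises ValueError otherwise.
    """
    return datetime.strptime(submit_time, '%Y-%m-%dT%H:%M:%SZ').day
```

Calling `submit_day('2016-03-02T18:21:02Z')` gives the same day number as the string-splitting version, but a malformed timestamp raises a `ValueError` rather than silently producing a wrong value.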
