
incorporate some info from old blog posts
mizzao committed Jun 29, 2016
1 parent 66f5749 commit 558d11f
Showing 8 changed files with 181 additions and 6 deletions.
2 changes: 2 additions & 0 deletions run-local.sh
@@ -0,0 +1,2 @@
#!/bin/bash
sphinx-autobuild --host 0.0.0.0 source _build_html
1 change: 1 addition & 0 deletions source/arch/research-methods.md
@@ -21,3 +21,4 @@ Amazon’s Mechanical Turk.** *Behavior research methods* 44.1 (2012): 1-23.]
[mturk-methods]

[mturk-methods]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=UK-VpDoAAAAJ&citation_for_view=UK-VpDoAAAAJ:u-x6o8ySG0sC

42 changes: 39 additions & 3 deletions source/arch/why-meteor.md
@@ -6,9 +6,36 @@ different languages communicated via AJAX for dynamic pages, good software
abstractions usually went out the window, resulting in things like jQuery
spaghetti and therefore buggy apps.

Meteor is at the forefront of a new generation of web technologies aimed at
tidying up this mess. It has many useful features, but the following are
primarily useful for researchers using web apps to study social behavior.
What is a researcher to do, then, when most social science experiments aren't
actually *social* (since they study individuals), and doing truly social
experiments requires continual real-time interaction between large numbers of
participants? Meteor is at the forefront of a new generation of web technologies
aimed at tidying up the mess described above, and can significantly simplify
building social experiments. As of this writing (June 2016) it is the [most
popular web framework on GitHub][meteor-stars], with an active community and
extensive documentation, and therefore great staying power.

[meteor-stars]: https://github.com/showcases/web-application-frameworks

Meteor's development paradigm differs significantly from that of almost all
other web frameworks, which generally focus on either the front end or the back
end. It is full-stack from the ground up, using one language (Javascript):
server-side code runs on Node.js and client-side code runs in the browser. UI
code is mostly written declaratively rather than imperatively: instead of
specifying _how_ to do something, you just write _what_ you want done. Most
importantly, the core of Meteor is a distributed data framework (which the
declarative style hooks into) that greatly simplifies synchronizing state
between multiple clients and a server. This is another level of abstraction over
AJAX: instead of worrying about passing messages or calling remote procedures,
you just specify the data that needs to be shared, and it is updated in real
time, transparently. This can mean an order-of-magnitude reduction in the amount
of code (and complexity) for a real-time or collaborative app. As one article
put it, [Meteor has made MVC obsolete](http://newcome.wordpress.com/2012/04/14/the-future-of-web-development-isnt-mvc-its-mvm/).
When leveraging front-end frameworks such as Twitter's Bootstrap that integrate
well with Meteor, we can also get started with much less custom CSS.
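
To make this concrete, here is a minimal sketch of the data framework in plain
Meteor (the collection and publication names are hypothetical examples, not
part of any particular library):

```js
// Shared by client and server: one synchronized record set.
ChatMessages = new Mongo.Collection("chatMessages");

if (Meteor.isServer) {
  // The server declares which records each connected client may see.
  Meteor.publish("groupChat", function (groupId) {
    return ChatMessages.find({ groupId: groupId });
  });
}

if (Meteor.isClient) {
  // The client subscribes once; every matching insert, update, or remove
  // on the server is then pushed down automatically -- no AJAX, polling,
  // or message-passing code required.
  Meteor.subscribe("groupChat", "demo-group");
}
```

A template helper that returns `ChatMessages.find()` will then re-render
reactively whenever this data changes, which is where the declarative UI style
hooks into the data layer.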

Meteor has many useful features, but the following are primarily useful for
researchers using web apps to study social behavior.

- **Easy development and fast prototyping.** Meteor is a one-stop shop for
setting up development for a web app, pulling in dependencies automatically.
@@ -36,6 +63,15 @@ deployment means you can get on with your research.
[so]: http://stackoverflow.com/
[forums]: https://forums.meteor.com/

TurkServer is designed as a package (add-on) for Meteor, taking a Meteor app and
plugging it into MTurk. The advantage of this approach is that it leverages the
many examples of web apps built in Meteor and Javascript, even those not
specific to social experiments, without depending on much proprietary
functionality. Moreover, being able to contribute full-stack code makes
components such as a waiting room, chat, and other common paradigms much easier
to share. The future of online experiments is coming, and the next generation of
web technologies is a great force multiplier for getting things done!

Find out more about Meteor:

- [Getting started](http://guide.meteor.com/#quickstart)
40 changes: 38 additions & 2 deletions source/design/faq.md
@@ -1,7 +1,43 @@
# Frequently Asked Questions

> I don't want to or can't write code. Can I hire a developer to build my web
application (experiment) for me?
> *Where can I find more resources for how to do online experiments?*

[Andrew Mao] has given a tutorial at [IC2S2 2016], partially based on the
content of this site and focused on how to run social experiments involving
group interaction.

- [Andrew Mao. **Experiments of collective social behavior in the online lab.**][ic2s2-tutorial]

[IC2S2 2016]: http://www.kellogg.northwestern.edu/news-events/conference/ic2s2/2016.aspx
[ic2s2-tutorial]: https://dl.dropboxusercontent.com/u/13229094/papers/IC2S216_experiments.pdf

At [WINE 2013], [Sid Suri] and [Andrew Mao] gave a tutorial on the opportunities
and practice of online behavioral experiments, targeted at a computer science
audience. Studying online behavior is a compelling opportunity for computer
scientists, and we believe that experimental work nicely complements existing
modeling and data mining approaches: it provides a way to verify hypotheses in a
controlled setting and allows for causal claims about interventions and
observations, which are important for all types of system and mechanism design.
Simply put, the best way to study online behavior experimentally is to do
experiments with online participants.

[WINE 2013]: http://wine13.seas.harvard.edu/
[Sid Suri]: http://www.sidsuri.com/
[Andrew Mao]: http://www.andrewmao.net/

- [Andrew Mao and Siddharth Suri. **How, When, and Why to Do Online Behavioral Experiments.**][wine13-tutorial]

[wine13-tutorial]: https://dl.dropboxusercontent.com/u/13229094/papers/WINE13_experiments.pdf

This tutorial consists of three sections: a review of how the experimental
approach fits in and complements other methods for studying behavior; an
overview of experimental design concepts from the abstract to the practical,
including examples of pitfalls from our own experiments; and a survey of
methods and tools that have been used in different fields to conduct experiments
online.

> *I don't want to or can't write code. Can I hire a developer to build my web
application (experiment) for me?*

In theory, yes. **But there are many issues that can arise when this
process is poorly managed.** Here are some important ones.
Expand Down
Binary file added source/img/launching/parallel-pixelated.png
Binary file added source/img/launching/remaining-panel.png
Binary file added source/img/launching/starting-panel.png
102 changes: 101 additions & 1 deletion source/launching/coordinating-groups.md
@@ -1,3 +1,103 @@
# Recruiting Large Groups

To be completed.
Running social experiments often requires many users to participate at the same
time. So when are the most workers simultaneously available on Mechanical Turk?

The answer to this question can help guide when to post HITs in general.
However, social experiments often use tasks where many workers must be online at
the same time, such as studies of simultaneous interaction between many
participants, and sometimes this number must be quite large. To achieve this, we
typically schedule a time in advance and notify users to show up at that time.

This scheduling has generally been ad hoc, but in a few cases we've collected
some extra data from workers about their availability, normalized by timezone.
The following graph shows how many of a panel of over 1,200 workers reported
being available in each hour:

![starting panel](/img/launching/starting-panel.png)

The buckets are shown from 9AM to 11PM GMT-5, but are computed from the users'
original timezones. A few caveats: the graph covers a few hundred US workers
only, and the method of collection could be biased by time-of-day effects (the
time of day at which we collected the data will affect the reported time
preferences of users). However, the pattern squares with previous anecdotal
observations that mid-afternoon and late evening are the best times to post
group tasks, and that people don't tend to be online as much in the morning or
at dinner time.
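
As a rough sketch of the normalization involved (the record format below is
assumed for illustration, not TurkServer's actual schema), each worker's
self-reported local hours can be shifted into GMT-5 buckets like so:

```js
// Count, for each hour of the day in GMT-5, how many workers report
// being available. Each worker record is assumed to look like:
//   { tzOffset: -8, hours: [14, 15, 21] }  // offset from GMT; local hours
function availabilityBuckets(workers) {
  var counts = new Array(24).fill(0);
  workers.forEach(function (w) {
    w.hours.forEach(function (h) {
      var shifted = (((h - w.tzOffset - 5) % 24) + 24) % 24; // to GMT-5
      counts[shifted] += 1;
    });
  });
  return counts;
}

// counts[14] would be the number of workers free at 2PM GMT-5; the hour
// with the largest count is a natural slot for the next synchronous batch:
// var bestHour = counts.indexOf(Math.max.apply(null, counts));
```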

What happens when one starts using this panel of workers? After running a few
dozen synchronous experiments, all at 2PM EDT, the distribution of remaining
availability now looks like the following, with just over 900 workers left.
(For now, we are using each worker only once.)

![remaining panel](/img/launching/remaining-panel.png)

As you can see, we've drastically reduced the number of workers available at
2PM, while not really affecting the number of workers available at 9PM. In order
to get the maximum number of workers online simultaneously for our next
synchronous batch, we'd do best to shoot for 9PM instead.

Keep in mind that there can be strong time-of-day effects here, as dissimilar
populations are likely to be online at different times. Because of this, it's
best to randomize over all of our possible experimental treatments
simultaneously, so that any such effect hits all of them equally (see the
sketch below). Collecting this time-of-day data is almost essential for
overcoming the significant challenges of scheduling a large number of unique
users to be online at the same time.
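
As a minimal illustration of this kind of randomization (the treatment names
here are made up), each arriving participant can be assigned across every
treatment within the same session:

```js
// Assign every arriving participant uniformly at random to a treatment,
// so any time-of-day effect applies to all treatments equally.
var TREATMENTS = ["smallGroup", "largeGroup", "control"];

function assignTreatment() {
  var i = Math.floor(Math.random() * TREATMENTS.length);
  return TREATMENTS[i];
}
```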

The graphs above were generated by the panel recruiting section of
TurkServer's admin interface.

## Example of simultaneous recruiting

The following screenshot shows a recent study we deployed using this method,
pulling out all the stops for this one, resulting in sessions with 100
participants arriving within 2 minutes of each other. They all participated for
over one hour.

![parallel](/img/launching/parallel-pixelated.png)

In our case, we randomized them into different-sized groups for [this
study][cm]. The simultaneous recruitment was necessary so that all of our
treatments drew from the same population, and the biggest group had 32
participants.

[cm]: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0153048

## The experiment design triangle

Perhaps you've heard of the [project management triangle][pmt]: it's often been
encapsulated as **"fast, good, or cheap: pick two"** because it's been
invariably impossible to satisfy all three in executing projects.

[pmt]: http://en.wikipedia.org/wiki/Project_management_triangle

In designing web-based behavioral experiments, there is generally a similar
triangle of three desirable properties that are very difficult to satisfy
simultaneously:

* **Large sample size**: desirable for increasing statistical power
* **Large number of simultaneous arrivals**: whether for synchronous
  experiments or to minimize time-of-day effects across treatments (as above)
* **Unique participants**: i.e., those who haven't seen the study before

When running large experiments that are synchronous or require extensive
randomization, and that are constrained to unique participants, one naturally
runs into a sample size ceiling. It is feasible to recruit about 1,500 to 2,000
active workers on MTurk in any given week, but as sessions accumulate, fewer
and fewer of those workers are available at the times you schedule your
experiments.

Alternatively, this means that if you can design your experiments to be less
sensitive to repeat participation, it's possible to have many people
participate and also gather a lot of data. This works for some experiments, but
it can be challenging to ensure that the design still answers the right
question, and that participants aren't primed by past sessions or otherwise
sensitive to experience.

Finally, the vast majority of online studies don't require simultaneous
arrivals, because they either involve single users or are not scheduled
consistently at the same time of day. Without this constraint, it's possible to
have many unique participants and a sample size of a couple thousand or more,
but one should take care to control for region and time-of-day effects.
