
incorporate some info from old blog posts
mizzao committed Jun 29, 2016
1 parent 66f5749 commit 558d11f
Showing 8 changed files with 181 additions and 6 deletions.
2 changes: 2 additions & 0 deletions run-local.sh
@@ -0,0 +1,2 @@
#!/bin/bash
sphinx-autobuild --host 0.0.0.0 source _build_html
1 change: 1 addition & 0 deletions source/arch/research-methods.md
@@ -21,3 +21,4 @@ Amazon’s Mechanical Turk.** *Behavior research methods* 44.1 (2012): 1-23.]
[mturk-methods]

[mturk-methods]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=UK-VpDoAAAAJ&citation_for_view=UK-VpDoAAAAJ:u-x6o8ySG0sC

42 changes: 39 additions & 3 deletions source/arch/why-meteor.md
@@ -6,9 +6,36 @@ different languages communicated via AJAX for dynamic pages, good software
abstractions usually went out the window, resulting in things like jQuery
spaghetti and therefore buggy apps.

Meteor is at the forefront of a new generation of web technologies aimed at
tidying up this mess. It has many useful features, but the following are
primarily useful for researchers using web apps to study social behavior.
What is a researcher to do, then, when most social science experiments aren't
actually *social* (since they study individuals), and doing truly social
experiments requires continual real-time interaction between large numbers of
participants? Meteor is at the forefront of a new generation of web technologies
aimed at tidying up the mess described above, and can significantly simplify
building social experiments. As of this writing (June 2016) it is the [most
popular web framework on GitHub][meteor-stars], with an active community and
extensive documentation, and therefore great staying power.

[meteor-stars]: https://github.com/showcases/web-application-frameworks

Meteor's development paradigm differs significantly from that of almost all
other web frameworks, which generally focus on either the front end or the back
end. It is full-stack from the ground up, using one language (Javascript):
server-side code runs on Node.js and client-side code runs in the browser. UI
code is mostly written declaratively rather than imperatively: instead of
specifying _how_ to do something, you just write _what_ you want done. Most
importantly, the core of Meteor is a distributed data framework (which the
declarative style hooks into) that greatly simplifies synchronizing state
between multiple clients and a server. This is another level of abstraction over
AJAX: instead of worrying about passing messages or calling remote procedures,
you just specify the data that needs to be shared, and it is updated in real
time, transparently. This can mean an order-of-magnitude reduction in the amount
of code (and complexity) for a real-time or collaborative app. As one article
put it, [Meteor has made MVC obsolete](http://newcome.wordpress.com/2012/04/14/the-future-of-web-development-isnt-mvc-its-mvm/).
When leveraging front-end frameworks such as Twitter's Bootstrap that integrate
well with Meteor, we can also get started with much less custom CSS.
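
To make this concrete, here is a minimal sketch of the data framework in plain
Meteor (the collection and publication names are hypothetical examples, not
part of any particular library):

```js
// Shared by client and server: one synchronized record set.
ChatMessages = new Mongo.Collection("chatMessages");

if (Meteor.isServer) {
  // The server declares which records each connected client may see.
  Meteor.publish("groupChat", function (groupId) {
    return ChatMessages.find({ groupId: groupId });
  });
}

if (Meteor.isClient) {
  // The client subscribes once; every matching insert, update, or remove
  // on the server is then pushed down automatically -- no AJAX, polling,
  // or message-passing code required.
  Meteor.subscribe("groupChat", "demo-group");
}
```

A template helper that returns `ChatMessages.find()` will then re-render
reactively whenever this data changes, which is where the declarative UI style
hooks into the data layer.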

Meteor has many useful features, but the following are primarily useful for
researchers using web apps to study social behavior.

- **Easy development and fast prototyping.** Meteor is a one-stop shop for
setting up development for a web app, pulling in dependencies automatically.
@@ -36,6 +63,15 @@ deployment means you can get on with your research.
[so]: http://stackoverflow.com/
[forums]: https://forums.meteor.com/

TurkServer is designed as a package (add-on) for Meteor, taking a Meteor app and
plugging it into MTurk. The advantage of this approach is that it leverages the
many examples of web apps built in Meteor and Javascript, even those not
specific to social experiments, without depending on much proprietary
functionality. Moreover, being able to contribute full-stack code makes
components such as a waiting room, chat, and other common paradigms much easier
to share. The future of online experiments is coming, and the next generation of
web technologies is a great force multiplier for getting things done!

Find out more about Meteor:

- [Getting started](http://guide.meteor.com/#quickstart)
40 changes: 38 additions & 2 deletions source/design/faq.md
@@ -1,7 +1,43 @@
# Frequently Asked Questions

> I don't want to or can't write code. Can I hire a developer to build my web
application (experiment) for me?
> *Where can I find more resources for how to do online experiments?*

[Andrew Mao] has given a tutorial at [IC2S2 2016], partially based on the
content of this site and focused on how to run social experiments involving
group interaction.

- [Andrew Mao. **Experiments of collective social behavior in the online lab.**][ic2s2-tutorial]

[IC2S2 2016]: http://www.kellogg.northwestern.edu/news-events/conference/ic2s2/2016.aspx
[ic2s2-tutorial]: https://dl.dropboxusercontent.com/u/13229094/papers/IC2S216_experiments.pdf

At [WINE 2013], [Sid Suri] and [Andrew Mao] gave a tutorial on the opportunities
and practice of online behavioral experiments, targeted at a computer science
audience. Studying online behavior is a compelling opportunity for computer
scientists, and we believe that experimental work nicely complements existing
modeling and data mining approaches: it provides a way to verify hypotheses in a
controlled setting and allows for causal claims about interventions and
observations, which are important for all types of system and mechanism design.
Simply put, the best way to study online behavior experimentally is to do
experiments with online participants.

[WINE 2013]: http://wine13.seas.harvard.edu/
[Sid Suri]: http://www.sidsuri.com/
[Andrew Mao]: http://www.andrewmao.net/

- [Andrew Mao and Siddharth Suri. **How, When, and Why to Do Online Behavioral Experiments.**][wine13-tutorial]

[wine13-tutorial]: https://dl.dropboxusercontent.com/u/13229094/papers/WINE13_experiments.pdf

This tutorial consists of three sections: a review of how the experimental
approach fits in and complements other methods for studying behavior; an
overview of experimental design concepts from the abstract to the practical,
including examples of pitfalls from our own experiments; and a survey of
methods and tools that have been used in different fields to conduct experiments
online.

> *I don't want to or can't write code. Can I hire a developer to build my web
application (experiment) for me?*

In theory, yes. **But there are many issues that can arise when this
process is poorly managed.** Here are some important ones.
Expand Down
Binary file added source/img/launching/parallel-pixelated.png
Binary file added source/img/launching/remaining-panel.png
Binary file added source/img/launching/starting-panel.png
102 changes: 101 additions & 1 deletion source/launching/coordinating-groups.md
@@ -1,3 +1,103 @@
# Recruiting Large Groups

To be completed.
Running social experiments often requires many users to participate at the same
time. So when are the most workers simultaneously available on Mechanical Turk?

The answer to this question can help guide when to post HITs in general.
However, social experiments often use tasks where many workers must be online at
the same time, such as studies of simultaneous interaction between many
participants, and sometimes this number must be quite large. To achieve this, we
typically schedule a time in advance and notify users to show up at that time.

This scheduling has generally been ad hoc, but in a few cases we've collected
some extra data from workers about their availability, normalized by timezone.
The following graph shows how many of a panel of over 1,200 workers reported
being available in each hour:

![starting panel](/img/launching/starting-panel.png)

The buckets are shown from 9AM to 11PM GMT-5, but are computed from the users'
original timezones. A few caveats: the graph covers a few hundred US workers
only, and the method of collection could be biased by time-of-day effects (the
time of day at which we collected the data will affect the reported time
preferences of users). However, the pattern squares with previous anecdotal
observations that mid-afternoon and late evening are the best times to post
group tasks, and that people don't tend to be online as much in the morning or
at dinner time.
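
As a rough sketch of the normalization involved (the record format below is
assumed for illustration, not TurkServer's actual schema), each worker's
self-reported local hours can be shifted into GMT-5 buckets like so:

```js
// Count, for each hour of the day in GMT-5, how many workers report
// being available. Each worker record is assumed to look like:
//   { tzOffset: -8, hours: [14, 15, 21] }  // offset from GMT; local hours
function availabilityBuckets(workers) {
  var counts = new Array(24).fill(0);
  workers.forEach(function (w) {
    w.hours.forEach(function (h) {
      var shifted = (((h - w.tzOffset - 5) % 24) + 24) % 24; // to GMT-5
      counts[shifted] += 1;
    });
  });
  return counts;
}

// counts[14] would be the number of workers free at 2PM GMT-5; the hour
// with the largest count is a natural slot for the next synchronous batch:
// var bestHour = counts.indexOf(Math.max.apply(null, counts));
```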

What happens when one starts using this panel of workers? After running a few
dozen synchronous experiments, all at 2PM EDT, the distribution of remaining
availability now looks like the following, with just over 900 workers left.
(For now, we are using each worker only once.)

![remaining panel](/img/launching/remaining-panel.png)

As you can see, we've drastically reduced the number of workers available at
2PM, while not really affecting the number of workers available at 9PM. In order
to get the maximum number of workers online simultaneously for our next
synchronous batch, we'd do best to shoot for 9PM instead.

Keep in mind that there can be strong time-of-day effects here, as dissimilar
populations are likely to be online at different times. Because of this, it's
best to randomize over all of our possible experimental treatments
simultaneously, so that any such effect hits all of them equally (see the
sketch below). Collecting this time-of-day data is almost essential for
overcoming the significant challenges of scheduling a large number of unique
users to be online at the same time.
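
As a minimal illustration of this kind of randomization (the treatment names
here are made up), each arriving participant can be assigned across every
treatment within the same session:

```js
// Assign every arriving participant uniformly at random to a treatment,
// so any time-of-day effect applies to all treatments equally.
var TREATMENTS = ["smallGroup", "largeGroup", "control"];

function assignTreatment() {
  var i = Math.floor(Math.random() * TREATMENTS.length);
  return TREATMENTS[i];
}
```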

The graphs above were generated by the panel recruiting section of
TurkServer's admin interface.

## Example of simultaneous recruiting

The following screenshot shows a recent study we deployed using this method,
pulling out all the stops for this one, resulting in sessions with 100
participants arriving within 2 minutes of each other. They all participated for
over one hour.

![parallel](/img/launching/parallel-pixelated.png)

In our case, we randomized them into different-sized groups for [this
study][cm]. The simultaneous recruitment was necessary so that all of our
treatments drew from the same population, and the biggest group had 32
participants.

[cm]: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0153048

## The experiment design triangle

Perhaps you've heard of the [project management triangle][pmt]: it's often been
encapsulated as **"fast, good, or cheap: pick two"** because it's been
invariably impossible to satisfy all three in executing projects.

[pmt]: http://en.wikipedia.org/wiki/Project_management_triangle

In designing web-based behavioral experiments, there is generally a similar
triangle of three desirable properties that are very difficult to satisfy
simultaneously:

* **Large sample size**: desirable for increasing statistical power
* **Large number of simultaneous arrivals**: whether for synchronous
  experiments or to minimize time-of-day effects across treatments (as above)
* **Unique participants**: i.e., those who haven't seen the study before

When running large experiments that are synchronous or require extensive
randomization, and that are constrained to unique participants, one naturally
runs into a sample size ceiling. It is feasible to recruit about 1,500 to 2,000
active workers on MTurk in any given week, but as sessions accumulate, fewer
and fewer of those workers are available at the times you schedule your
experiments.

Alternatively, this means that if you can design your experiments to be less
sensitive to repeat participation, it's possible to have many people
participate and also gather a lot of data. This works for some experiments, but
it can be challenging to ensure that the design still answers the right
question, and that participants aren't primed by past sessions or otherwise
sensitive to experience.

Finally, the vast majority of online studies don't require simultaneous
arrivals, because they either involve single users or are not scheduled
consistently at the same time of day. Without this constraint, it's possible to
have many unique participants and a sample size of a couple thousand or more,
but one should take care to control for region and time-of-day effects.
