# Experiment Design Concepts

How do you deduce **causality**? *Run an experiment!*

Covered in this lesson:
1. [What is an experiment](#what_is)<br>

> 1.1 [What types of studies are there](#study_types)<br>

2. [What types of experiments are there](#types)<br>

3. [How are outcomes measured](#measured)<br>

4. [What some common pitfalls of experiment design are](#pitfalls)<br>

By the end of this section, you will know what is required to create experiments that effectively address your goals.

## <a id="what_is">1: What is an experiment</a>

What makes an experiment... *an experiment*?

**Key Features of Experiments:**
1. Comparison between groups

> * The difference between groups that have, and the groups that have not, experienced an event (e.g. new web page)

2. Control other variables

> * The only feature that should differ between the groups is the feature that we specifically manipulate (e.g. whether a customer views a web page displayed one way or another)

Controlling for other variables can be done through random assignment (e.g. roll of the dice decides who sees what web page). This way, the distribution of other attributes, like age and gender, should be similar between groups. The only practical difference between the groups should be the feature we manipulate (e.g. new web page or the old one).
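
As a minimal sketch of the random assignment described above, the snippet below flips a coin for each user and then checks that an attribute we do not manipulate (age) ends up similarly distributed in both groups. The user IDs, ages, and the `assign_groups` helper are hypothetical and exist only for illustration.

```python
import random
from statistics import mean


def assign_groups(user_ids, seed=0):
    """Randomly assign each user to 'control' or 'treatment' (one coin flip per user)."""
    rng = random.Random(seed)
    return {uid: rng.choice(["control", "treatment"]) for uid in user_ids}


# Hypothetical users with an attribute we do NOT manipulate (age).
rng = random.Random(42)
ages = {uid: rng.randint(18, 70) for uid in range(10_000)}

groups = assign_groups(ages.keys(), seed=1)


def group_mean_age(label):
    """Average age of the users assigned to the given group."""
    return mean(age for uid, age in ages.items() if groups[uid] == label)


control_age = group_mean_age("control")
treatment_age = group_mean_age("treatment")

# With random assignment, the two averages should be close to each other.
print(f"mean age, control:   {control_age:.1f}")
print(f"mean age, treatment: {treatment_age:.1f}")
```

The same check can be repeated for any other attribute (gender, region, device type); large gaps between groups would suggest a problem with the assignment procedure.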

<img src="exp_des_0.png">

**Quasi-Experiment:**<br>
Example: A game company releases a "patch" for a new game. It's not possible to compare people with and without the patch, but it is possible to compare players before the patch with players after the patch. However, factors *other than the patch* may also influence the outcome.

<img src="exp_des_1.png">

* If the patch were applied randomly, the assignment itself could influence behavior (e.g. players with the patch would not be able to play with players without it, etc.)
* If a beta is offered, then players self-select into the patch, so the assignment is no longer random and we cannot assume that other attributes will be similarly distributed (i.e. players who opt into the beta differ from the general player population)
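
The self-selection problem can be illustrated with a small simulation. In this sketch the patch has no effect at all, yet players who opt into the beta still look different from those who do not, because (by assumption) more engaged players are more likely to opt in. All numbers and the `opts_into_beta` rule are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical players: weekly hours played, unaffected by the patch.
hours = [rng.gauss(10, 3) for _ in range(20_000)]


def opts_into_beta(player_hours):
    """Assumed opt-in rule: more engaged players are more likely to join the beta."""
    probability = min(max(player_hours / 20, 0.0), 1.0)
    return rng.random() < probability


# Self-selected groups: players decide for themselves.
beta, non_beta = [], []
for h in hours:
    (beta if opts_into_beta(h) else non_beta).append(h)
self_selected_gap = mean(beta) - mean(non_beta)

# Randomly assigned groups: a shuffle decides.
shuffled = hours[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
random_gap = mean(shuffled[:half]) - mean(shuffled[half:])

print(f"gap with self-selection:   {self_selected_gap:.2f} hours")
print(f"gap with random assignment: {random_gap:.2f} hours")
```

Under these assumptions the self-selected comparison shows a clear gap even though the patch does nothing, while the randomly assigned comparison shows a gap near zero.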

### <a id="study_types">1.1 What types of studies are there</a>

**Types of Study**
There are many ways in which data can be collected in order to test or understand the relationship between two variables of interest. These methods can be put into three main bins, based on the amount of control that you hold over the variables in play:
* If you have a lot of control over features, then you have an experiment.
* If you have no control over the features, then you have an observational study.
* If you have some control, then you have a quasi-experiment.

While the experiment is the main focus of this lesson, it's also useful to know about the other types of study so that you can use them in effective ways, especially if an experiment cannot be run.

**Experiments**
In the *social* and *medical sciences*, an experiment is defined by **comparing outcomes between two or more groups**, and **ensuring equivalence between the compared groups except for the manipulation that we want to test**. Our interest in an experiment is to see if a change in one feature has an effect on the value of a second feature, like seeing if changing the layout of a button on a website causes more visitors to click on it. Having multiple groups is necessary in order to compare the outcome when we apply the manipulation to the outcome when we do not (e.g. old vs. new website layout), or to compare different levels of manipulation (e.g. drug dosages). We also need equivalence between groups so that we can be as sure as possible that the differences in the outcomes were only due to the difference in our manipulated feature.

Equivalence between groups is typically achieved through some kind of randomization procedure. A **unit of analysis** is the entity under study, like a page view or a user in a web experiment. If we randomly assign our units of analysis to each group, then on the whole, we should expect the feature distributions between groups to be about the same. This theoretically isolates the changes in the outcome to the changes in our manipulated feature. Of course, we can always dig deeper afterwards to see if certain other features worked in tandem with, or against, our manipulation.

**Observational Studies**
In an experiment, we exert a lot of control on a system in order to isolate the effect of one source on one output. Observational studies, on the other hand, are defined by a lack of control. *Observational studies* are also known as *naturalistic or correlational studies*. In an observational study, no control is exerted on the variables of interest, perhaps due to ethical concerns or a lack of power to enact the manipulation. This often comes up in medical studies. For example, if we want to look at the effects of smoking on health, the potential risks make it unethical to force people into smoking behaviors. Instead, we need to rely on existing data or groups to make our determinations.

We typically cannot infer causality in an observational study due to our lack of control over the variables. Any relationship observed between variables may be due to unobserved features, or the direction of causality might be uncertain. But the fact that an observational study cannot establish causation does not make it useless. An interesting relationship might be the spark needed to perform additional studies or to collect more data. These studies can help strengthen the understanding of the relationship we're interested in by ruling out more and more alternative hypotheses.
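
To see how an unobserved feature can produce a relationship between two variables that do not affect each other, consider the simulated sketch below. A hidden factor drives both `x` and `y`; neither variable causes the other. The data and variable names are hypothetical, and the "adjustment" step is only possible here because the simulation lets us see the hidden factor, which a real observational study usually would not.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(7)
n = 5_000

# A hidden factor influences both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + rng.gauss(0, 1) for h in hidden]
y = [h + rng.gauss(0, 1) for h in hidden]

# The observed variables are correlated even though neither causes the other.
raw_corr = pearson(x, y)

# Removing the hidden factor's contribution makes the relationship vanish.
x_resid = [xi - h for xi, h in zip(x, hidden)]
y_resid = [yi - h for yi, h in zip(y, hidden)]
adjusted_corr = pearson(x_resid, y_resid)

print(f"correlation as observed:          {raw_corr:.2f}")
print(f"correlation without hidden factor: {adjusted_corr:.2f}")
```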

**Quasi-Experiments**
In between the observational study and the experiment is the quasi-experiment. This is where some, but not all, of the control requirements of a true experiment are met. For example, rolling out a new website interface to all users to see how much time they spend on it might be considered a quasi-experiment. While the manipulation is controlled by the experimenter, there aren't multiple groups to compare. The experimenter can still use the behavior of the population pre-change and compare that to behaviors post-change, to make a judgment about the effects of the change. However, there is the possibility that there are other effects outside of the manipulation that caused the observed changes in behavior. For the example earlier in this paragraph, it might be that users would have naturally gravitated to higher usage rates, regardless of the website interface.
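
A small worked example, with made-up numbers, shows how a pre-existing trend can inflate a pre/post comparison. Here usage is assumed to grow by a fixed amount each week regardless of the interface, and the new interface is assumed to add a fixed boost from week 0 onward. The constants and the `usage` function are hypothetical.

```python
from statistics import mean

BASELINE = 20.0        # assumed average minutes per user at week 0
TREND_PER_WEEK = 0.5   # assumed natural growth, unrelated to the interface
TRUE_EFFECT = 1.0      # assumed boost caused by the new interface


def usage(week):
    """Average minutes per user in a given week (new interface launches at week 0)."""
    boost = TRUE_EFFECT if week >= 0 else 0.0
    return BASELINE + TREND_PER_WEEK * week + boost


pre_weeks = range(-4, 0)   # four weeks before the change
post_weeks = range(0, 4)   # four weeks after the change

naive_estimate = mean(usage(w) for w in post_weeks) - mean(usage(w) for w in pre_weeks)
trend_contribution = naive_estimate - TRUE_EFFECT

print(f"naive pre/post estimate: {naive_estimate:.1f}")    # 3.0
print(f"true effect:             {TRUE_EFFECT:.1f}")       # 1.0
print(f"due to trend alone:      {trend_contribution:.1f}")  # 2.0
```

In this setup the pre/post difference is three times the true effect, because two thirds of it comes from growth that would have happened anyway.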

As another example, we might have two different groups upon which to make a comparison of outcomes, but the original groups themselves might not be equivalent. A classic example of this is if a researcher wants to test some new supplemental materials for a high school course. If they select two different schools, one with the new materials and one without, we have a quasi-experiment since the differing qualities of students or teachers at those schools might have an effect on the outcomes. Ideally, we'd like to match the two schools before the test as closely as possible, but we can't call it a true experiment since the assignment of students to schools can't be considered random.

While a quasi-experiment may not support causal inference as strongly as a true experiment, the results can still provide strong evidence for the relationship being investigated. This is especially true if some kind of matching is performed to identify similar units or groups. Another benefit of quasi-experimental designs is that the relaxation of requirements makes the quasi-experiment more flexible and easier to set up.

**Examples of distinguishing between types of studies:**
<img src="exp_des_2.png" width="500">

**ASIDE:**
[This fascinating New York Times](https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html) article details different ways of investigating the claim that Nike's Vaporfly running shoes provide a significant advantage in running speed, despite not being able to run a true, randomized experiment.

## <a id="types">2: What types of experiments are there</a>

## <a id="measured">3: How are outcomes measured</a>

## <a id="pitfalls">4: What some common pitfalls of experiment design are</a>