---
---

# Running Experiments


## Bit by Bit: Social Research in the Digital Age


---
---

![image.png](img/chengjun.png)

https://www.bitbybitbook.com/en/1st-ed/running-experiments/

![image.png](img/bit.png)

<iframe src="//player.bilibili.com/player.html?aid=331250330&bvid=BV1DA411H7ZR&cid=287799025&page=5" 
    width=800 height=500 
    scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
<h3>Zhao Dingxin (赵鼎新): Problem Awareness, the Art and Strategy of Asking Questions</h3>


## 4.1 Introduction

When researchers run experiments, they **systematically intervene in the world** to create data that is ideally suited to answering questions about **cause-and-effect** relationships.


![image.png](img/ex1.png)

Experiments enable researchers to **move beyond the correlations** in naturally occurring data in order to reliably answer certain cause-and-effect questions. 

- In the analog age, experiments were often logistically difficult and expensive. 
- Now, in the digital age, logistical constraints are gradually fading away. 
    - Not only is it easier to do experiments like those done in the past, 
    - it is now possible to run new kinds of experiments.

- **Perturb-and-observe experiments**
    - involve only a single group that has received the intervention
- **Randomized controlled experiments**
    - intervene for some people and not for others
    - the researcher decides which people receive the intervention by randomization
        - this creates fair comparisons between two groups: one that has received the intervention and one that has not. 
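
As a minimal sketch of the randomization step, the snippet below splits a list of participants into two equal-sized groups. The participant IDs, the `randomize` helper, and the seed are hypothetical and exist only for illustration.

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into equal-sized treatment and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs, used only for illustration.
participants = [f"p{i:02d}" for i in range(20)]
treatment, control = randomize(participants, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in which group, the two groups should be comparable on average, which is what makes the later comparison fair.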

### Question

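Was the Leaning Tower of Pisa experiment, in which two iron balls hit the ground at the same time, a "randomized controlled experiment"? Why or why not?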


```python
from random import choice

# Pick one name from the list at random.
pantheon = ['王宣哲','刘清和','曹铸华','钦号','陈思澎','邱桐','谢子彦','陆紫珺','陈芯颖']
choice(pantheon)  # output varies from run to run, e.g. '陆紫珺'
```

## 4.2 What are experiments?
Randomized controlled experiments have four main ingredients: 
- recruitment of participants, 
- randomization of treatment, 
- delivery of treatment, 
- measurement of outcomes.

The digital age does not change the fundamental nature of experimentation, but it does make it easier logistically. 

For example, in the past, it might have been difficult to measure the behavior of millions of people, but that is now routinely happening in many digital systems. Researchers who can figure out how to harness these new opportunities will be able to run experiments that were impossible previously.

**An experiment by Michael Restivo and Arnout van de Rijt (2012)**

They wanted to understand the effect of informal peer rewards on editorial contributions to Wikipedia. 
- they gave **barnstars** to 100 deserving Wikipedians.
- they also picked 100 top contributors as the control group to whom they did not give barnstars.
- assignment to the treatment group and the control group was determined randomly.
- they tracked the recipients’ subsequent contributions. 

The recipients tended to make fewer edits after receiving a barnstar. → Behavioral economics wins!


When Restivo and van de Rijt looked at the behavior of people in the control group, they found that their contributions were decreasing too. 

Further, when they compared people in the treatment group (i.e., those who received barnstars) with people in the control group, they found that people in the treatment group contributed about 60% more. 

In other words, the contributions of both groups were decreasing, but those of the control group were decreasing much faster.
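
The arithmetic behind this point can be sketched with a few numbers. The edit counts below are hypothetical, chosen only to mirror the pattern described above; they are not the figures from Restivo and van de Rijt (2012).

```python
# Hypothetical average edit counts, invented to mirror the pattern described above.
treatment_before, treatment_after = 100.0, 80.0  # received a barnstar
control_before, control_after = 100.0, 50.0      # did not receive one

# Perturb-and-observe view: compare recipients with their own past.
naive_change = (treatment_after - treatment_before) / treatment_before

# Randomized controlled view: compare recipients with the control group.
relative_to_control = (treatment_after - control_after) / control_after

print(f"before/after change for recipients: {naive_change:+.0%}")  # -20%
print(f"recipients relative to control:     {relative_to_control:+.0%}")  # +60%
```

Looking only at the recipients suggests the barnstar hurt contributions; the control group shows that contributions would have fallen even further without it.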

The logistics of digital experiments can be completely different from those of analog experiments. In Restivo and van de Rijt’s experiment, 
- it was easy to give the barnstar to anyone, 
- it was easy to track the outcome—number of edits—over an extended period of time 
    - because edit history is automatically recorded by Wikipedia.

**Why don't they scale up their experiment to millions of people?**

## 4.3 Two dimensions of experiments: lab-field and analog-digital
Lab experiments offer control, field experiments offer realism, and digital field experiments combine control and realism at scale.

![image.png](img/ex2.png)
Schematic of design space for experiments. 

Field experiments combine the strong design of randomized controlled experiments with more representative groups of participants performing more common tasks in more natural settings.

Lab and field experiments are complementary, with different strengths and weaknesses. 



For example, Correll, Benard, and Paik (2007) used both a lab experiment and a field experiment in an attempt to find the sources of the “motherhood penalty.” 

- Mothers earn less money than childless women, even when comparing women with similar skills working in similar jobs. 
- Interestingly, the opposite seems to be true for fathers: they tend to earn more than comparable childless men.
- One explanation is that employers are biased against mothers.

First, in the lab experiment 
- they told college undergraduates that a company was conducting an employment search for a person to lead its new East Coast marketing department. 
- The students were told that the company wanted their help in the hiring process
- they were asked to review resumes of several potential candidates and to rate the candidates on a number of dimensions, such as their intelligence, warmth, and commitment to work. 
- Further, the students were asked if they would recommend hiring the applicant and what they would recommend as a starting salary. 

The lab experiment allowed Correll and colleagues to measure a causal effect and provide a possible explanation for that effect.
- Correll and colleagues found that the students were less likely to recommend hiring the mothers and that they offered them a lower starting salary. 
- Further, through a statistical analysis of both the ratings and the hiring-related decisions, Correll and colleagues found that mothers’ disadvantages were largely explained by the fact that they were rated lower in terms of competence and commitment. 



Correll and colleagues also conducted a complementary field experiment. 
- They responded to hundreds of advertised job openings with fake cover letters and resumes. Some resumes signaled motherhood and some did not. 
- Correll and colleagues found that mothers were less likely to get called back for interviews than equally qualified childless women. 

In other words, real employers making consequential decisions in a natural setting behaved much like the undergraduates. Did they make similar decisions for the same reason? Unfortunately, we don’t know. The researchers were not able to ask the employers to rate the candidates or explain their decisions.

Researchers who prefer lab experiments argue that 

- Lab experiments offer researchers near-total control of the environment in which participants are making decisions. 
- Lab experiments can collect additional data that can help explain why participants are making their decisions.

Researchers who prefer field experiments argue that 
- participants in lab experiments could act very differently because they know that they are being studied.
- the lab experiment will overestimate the effect of motherhood on real hiring decisions. 
- lab experiments’ reliance on WEIRD participants: mainly students from Western, Educated, Industrialized, Rich, and Democratic countries

As digital devices become increasingly integrated into people’s lives and sensors become integrated into the built environment, these opportunities to run partially digital experiments in the physical world will increase dramatically. 

> Digital experiments are not just online experiments.

The digital age also creates the possibility of running lab-like experiments online. For example, researchers have rapidly adopted Amazon Mechanical Turk (MTurk) to recruit participants for online experiments.

![image.png](img/ex3.png)

First, whereas most analog lab and field experiments have hundreds of participants, digital field experiments can have millions of participants.

Second, whereas most analog lab and field experiments treat participants as indistinguishable widgets, digital field experiments often use background information about participants in the design and analysis stages of the research. 

Third, whereas many analog lab and field experiments deliver treatments and measure outcomes in a relatively compressed amount of time, some digital field experiments happen over much longer timescales. 

While digital field experiments offer many possibilities, they also share some weaknesses with both analog lab and analog field experiments. 
- Experiments cannot be used to study the past, and they can only estimate the effects of treatments that can be manipulated. 
- Although experiments are undoubtedly useful to guide policy, the exact guidance they can offer is somewhat limited because of complications such as environmental dependence, compliance problems, and equilibrium effects (Banerjee and Duflo 2009; Deaton 2010). 
- Digital field experiments also magnify the ethical concerns created by field experiments.

## 4.4 Moving beyond simple experiments

- Validity 
- Heterogeneity of treatment effects
- Mechanisms

**Simple Experiments**

Simple experiments are narrowly focused on a much more specific question: 
> What is the average effect of this specific treatment with this specific implementation for this population of participants at this time? 

Unfortunately, loose phrasing about what “works” obscures the fact that narrowly focused experiments don’t really tell you whether a treatment “works” in a general sense. 


Simple experiments can provide valuable information, but they fail to answer many questions that are both important and interesting, such as 

> whether there are some people for whom the treatment had a larger or smaller effect; 

> whether there is another treatment that would be more effective; 

> whether this experiment relates to broader social theories.

A field experiment by P. Wesley Schultz and colleagues examined the relationship between social norms and energy consumption (Schultz et al. 2007).

![image.png](img/ex4.png)

Fortunately, Schultz and colleagues did not settle for this simplistic analysis. Before the experiment began, they reasoned that heavy users of electricity—people above the mean—might reduce their consumption, and that light users of electricity—people below the mean—might actually increase their consumption. 

![image.png](img/ex5.png)

Within-subjects design
![image.png](img/ex6.png)

### 4.4.1 Validity

Validity refers to the extent to which the results of a particular experiment support some more general conclusion. 

Social scientists split validity into four main types: 
- statistical conclusion validity, 
- internal validity, 
- construct validity, 
- external validity 

(Shadish, Cook, and Campbell 2001, chap. 2).

- Statistical conclusion validity centers around whether the statistical analysis of the experiment was done correctly. 
- Internal validity centers around whether the experimental procedures were performed correctly.
- Construct validity centers around the match between the data and the theoretical constructs.
- External validity centers around whether the results of this experiment can be generalized to other situations. 

The four types of validity provide a mental checklist to help researchers assess whether the results from a particular experiment support a more general conclusion. 

A company named Opower partnered with utilities in the United States to deploy the treatment more widely, inspired by Schultz et al. (2007). 
- In a first set of experiments involving 600,000 households from 10 different sites, Allcott (2011) found that the Home Energy Report lowered electricity consumption. 
- Further, subsequent research involving 8 million additional households from 101 different sites (Allcott 2015) again found that the Home Energy Report lowered electricity consumption. 

![image.png](img/ex7.png)

![image.png](img/ex8.png)

Action Step Module

![image.png](img/ex9.png)

**The size of the effect declined in the later experiments.** 

Allcott (2015) argues that a major source of this pattern is that

> sites with more environmentally-focused customers were more likely to adopt the program earlier. 

### 4.4.2 Heterogeneity of treatment effects

Experiments normally measure the average effect, but the effect is probably not the same for everyone.

Costa and Kahn (2013) speculated that the effectiveness of the Home Energy Report could vary based on a participant’s political ideology. They merged the Opower data with data purchased from a third-party aggregator. 
![image.png](img/ex10.png)
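
To make the idea concrete, here is a small simulation sketch in which the average effect hides two opposite subgroup effects, in the spirit of the heavy and light electricity users from Section 4.4. All numbers (sample size, effect sizes, noise) are invented for illustration and are not taken from any of the studies cited here.

```python
import random
from statistics import mean

rng = random.Random(0)

# Simulated households; every number here is made up for illustration.
households = []
for _ in range(4000):
    heavy = rng.random() < 0.5    # above-average electricity user?
    treated = rng.random() < 0.5  # randomized treatment assignment
    effect = (-1.0 if heavy else 1.0) if treated else 0.0
    change = effect + rng.gauss(0, 0.5)  # change in daily usage
    households.append((heavy, treated, change))

def effect_estimate(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [c for _, t, c in rows if t]
    control = [c for _, t, c in rows if not t]
    return mean(treated) - mean(control)

ate = effect_estimate(households)
cate_heavy = effect_estimate([r for r in households if r[0]])
cate_light = effect_estimate([r for r in households if not r[0]])

print(f"average effect:          {ate:+.2f}")
print(f"effect for heavy users:  {cate_heavy:+.2f}")
print(f"effect for light users:  {cate_light:+.2f}")
```

In this simulated setting the average effect is close to zero even though the treatment moves both subgroups substantially, which is why estimating effects within subgroups can change the conclusion.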

### 4.4.3 Mechanisms
- Patterns (captured by experiments) measure what happened. 
- Mechanisms explain why and how it happened.

Although experiments are good for estimating causal effects, they are often not designed to reveal mechanisms. Digital experiments can help us identify mechanisms in two ways: 
- (1) they enable us to collect more process data 
- (2) they enable us to test many related treatments.

![image.png](img/ex11.png)

One way to test possible mechanisms is by <u>collecting process data</u> about how the treatment impacted possible mechanisms.

In a follow-up study, Allcott and Rogers (2014) partnered with a power company that, through a rebate program, had acquired information about which consumers upgraded their appliances to more energy-efficient models. 

- slightly more people receiving the Home Energy Reports upgraded their appliances. 
- But this difference was so small that it could account for only 2% of the decrease in energy use in the treated households. 

In other words, appliance upgrades were not the dominant mechanism through which the Home Energy Report decreased electricity consumption.



A second way to study mechanisms is to run experiments with slightly different versions of the treatment, <u>which enables full factorial designs</u>.
- tips vs. peer effect
    - To assess the possibility that the tips alone might have been sufficient
        - Ferraro, Miranda, and Price (2011)




![image.png](img/ex12.png)

A full factorial design

| Treatment | Characteristics                  |
| :-------- | :------------------------------- |
| 1         | Control                          |
| 2         | Tips                             |
| 3         | Appeal                           |
| 4         | Peer Information                 |
| 5         | Tips + appeal                    |
| 6         | Tips + peer information          |
| 7         | Appeal + peer information        |
| 8         | Tips + appeal + peer information |


The digital age can enable full factorial designs.
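
The eight conditions in the table can be enumerated programmatically. The sketch below treats each condition as a subset of the three components; the variable names are illustrative.

```python
from itertools import combinations

components = ["tips", "appeal", "peer information"]

# Every subset of the components is one condition; the empty subset is the control.
conditions = [
    subset
    for size in range(len(components) + 1)
    for subset in combinations(components, size)
]

for number, subset in enumerate(conditions, start=1):
    label = " + ".join(subset) if subset else "control"
    print(number, label)
```

With three components there are 2³ = 8 conditions, and each additional component doubles the count, which is why full factorial designs quickly require large samples.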

## 4.5 Making it happen
- Even if you don’t work at a big tech company you can run digital experiments. 
- You can either do it yourself or partner with someone who can help you (and who you can help).

![image.png](img/ex13.png)

### 4.5.1 Use existing environments
You can run experiments inside existing environments, often without any coding or partnership.

Doleac and Stein’s iPod advertisements varied along three main dimensions:

- characteristics of the seller, which was signaled by the hand photographed holding the iPod [white, black, white with tattoo] 
- the asking price [90, 110, 130]. 
- the quality of the ad text [high-quality and low-quality (e.g., capitalization errors and speling errors)]. 

Thus, the authors had a 3 × 3 × 2 design which was deployed across more than 300 local markets, ranging from towns to mega-cities.

> Doleac, Jennifer L., and Luke C.D. Stein. 2013. “The Visible Hand: Race and Online Market Outcomes.” Economic Journal 123 (572):F469–F492. https://doi.org/10.1111/ecoj.12082.

Averaged across all conditions, the outcomes were better for the white sellers than the black sellers, with the tattooed sellers having intermediate results.
![image.png](img/ex14.png)

**Arnout van de Rijt and colleagues (2014): The keys to success**

> Puzzle: In many aspects of life, seemingly similar people end up with very different outcomes. 

One possible explanation for this pattern is that small—and essentially random—advantages can lock in and grow over time, a process that researchers call **cumulative advantage**. 


Rijt, Arnout van de, Soong Moon Kang, Michael Restivo, and Akshay Patil. 2014. “Field Experiments of Success-Breeds-Success Dynamics.” PNAS. 111 (19):6934–9. https://doi.org/10.1073/pnas.1316836111.

In order to determine whether small initial successes lock in or fade away, van de Rijt and colleagues (2014) intervened in four different systems, bestowing success on randomly selected participants, and then measured the subsequent impacts of this arbitrary success.

![image.png](img/ex15.png)


| Topic                                                        | References                                                   |
| :----------------------------------------------------------- | :----------------------------------------------------------- |
| Effect of barnstars on contributions to Wikipedia            | Restivo and Rijt (2012); Restivo and Rijt (2014); Rijt et al. (2014) |
| Effect of anti-harassment message on racist tweets           | Munger (2016)                                                |
| Effect of auction method on sale price                       | Lucking-Reiley (1999)                                        |
| Effect of reputation on price in online auctions             | Resnick et al. (2006)                                        |
| Effect of race of seller on sale of baseball cards on eBay   | Ayres, Banaji, and Jolls (2015)                              |
| Effect of race of seller on sale of iPods                    | Doleac and Stein (2013)                                      |
| Effect of race of guest on Airbnb rentals                    | Edelman, Luca, and Svirsky (2016)                            |
| Effect of donations on the success of projects on Kickstarter | Rijt et al. (2014)                                           |
| Effect of race and ethnicity on housing rentals              | Hogan and Berry (2011)                                       |
| Effect of positive rating on future ratings on Epinions      | Rijt et al. (2014)                                           |
| Effect of signatures on the success of petitions             | Vaillant et al. (2015); Rijt et al. (2014); Rijt et al. (2016) |

### 4.5.2 Build your own experiment

Building your own experiment might be costly, but it will enable you to create the experiment that you want.

Gregory Huber et al. (2012) explored three possible biases of voters:

- (1) they are focused on recent rather than cumulative performance; 
- (2) they can be manipulated by rhetoric, framing, and marketing;
- (3) they can be influenced by events unrelated to incumbent performance, such as the success of local sports teams and the weather. 

It was hard to isolate any of these factors. Therefore, they created a highly simplified voting environment in order to isolate, and then experimentally study, each of these three possible biases.

> Huber, Gregory A., Seth J. Hill, and Gabriel S. Lenz. 2012. Sources of Bias in Retrospective Decision Making: Experimental Evidence on Voters' Limitations in Controlling Incumbents. American Political Science Review 106 (4):720–41.

Huber and colleagues used MTurk to recruit participants. 
- Once a participant provided informed consent and passed a short test, she was told that she was participating in a 32-round game to earn tokens that could be converted into real money. 
- At the beginning of the game, each participant was told that she had been assigned an “allocator” that would give her free tokens each round and that some allocators were more generous than others. 
- Further, each participant was also told that she would have a chance to either keep her allocator or be assigned a new one after 16 rounds of the game. 

In total, Huber and colleagues recruited about 4,000 participants who were paid about $1.25 for a task that took about eight minutes.

To assess whether participants' voting decisions could be influenced by purely random events in their setting, Huber and colleagues added a lottery to their experimental system, held at either the 8th round or the 16th round.

![image.png](img/ex16.png)

Centola (2010) built a digital field experiment to study the effect of social network structure on the spread of behavior. 

Centola built a web-based health community. Centola recruited about 1,500 participants through advertising on health websites. When participants arrived at the online community—which was called the Healthy Lifestyle Network—they provided informed consent and were then assigned “health buddies.” Because of the way Centola assigned these health buddies, he was able to knit together different social network structures in different groups. 

Centola, D. 2010. “The Spread of Behavior in an Online Social Network Experiment.” Science 329 (5996):1194–7.

![image.png](img/ex17.png)

A popular hypothesis states that networks with many clustered ties and a high degree of separation will be less effective for behavioral diffusion than networks in which locally redundant ties are rewired to provide shortcuts across the social space. A competing hypothesis argues that when behaviors require social reinforcement, a network with more clustering may be more advantageous, even if the network as a whole has a larger diameter.

Then, Centola introduced a new behavior into each network: the chance to register for a new website with additional health information. Whenever anyone signed up for this new website, all of her health buddies received an email announcing this behavior. 

Centola found that this behavior—signing up for the new website—spread further and faster in the clustered network than in the random network, a finding that was contrary to some existing theories.
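
The intuition behind the competing hypothesis can be sketched with a toy simulation. This is not Centola's design: the ring lattice standing in for the clustered network, the network size, and the rule that a person adopts only after two neighbors have adopted are all simplifying assumptions made for illustration.

```python
import random

def ring_lattice(n, k=2):
    """Clustered network: each node is tied to its k nearest neighbors on each side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}

def random_network(n, n_edges, rng):
    """Random network with the same number of ties, placed between random pairs."""
    neighbors = {i: set() for i in range(n)}
    edges = 0
    while edges < n_edges:
        a, b = rng.sample(range(n), 2)
        if b not in neighbors[a]:
            neighbors[a].add(b)
            neighbors[b].add(a)
            edges += 1
    return neighbors

def spread(neighbors, seeds, threshold=2):
    """Nodes adopt once at least `threshold` of their neighbors have adopted."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node not in adopted and len(nbrs & adopted) >= threshold:
                adopted.add(node)
                changed = True
    return adopted

n = 200
rng = random.Random(1)
clustered = ring_lattice(n)
randomized = random_network(n, n_edges=2 * n, rng=rng)

seeds = {0, 1}  # two early adopters (adjacent in the clustered network)
clustered_adopters = len(spread(clustered, seeds))
random_adopters = len(spread(randomized, seeds))

print("adopters in clustered network:", clustered_adopters)
print("adopters in random network:   ", random_adopters)
```

Under this threshold assumption, overlapping neighborhoods in the clustered network supply the repeated exposure each person needs, while in the random network the two early adopters rarely share a neighbor, so the behavior stalls.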

Overall, building your own experiment gives you much more control; it enables you to construct the best possible environment to isolate what you want to study. Further, building your own system decreases ethical concerns around experimenting in existing systems. When you build your own experiment, however, you run into many of the problems that are encountered in lab experiments: recruiting participants and concerns about realism. A final downside is that building your own experiment can be costly and time-consuming.

### 4.5.3 Build your own product
Building your own product is a high-risk, high-reward approach. But, if it works, you can benefit from a positive feedback loop that enables distinctive research.

![image.png](img/ex18.png)

### 4.5.4 Partner with the powerful

Partnering can reduce costs and increase scale, but it can alter the kinds of participants, treatments, and outcomes that you can use.

While working on a commercial fermentation project to convert beet juice into alcohol, Pasteur discovered a new class of microorganism that eventually led to the germ theory of disease.

![image.png](img/ex19.png)

On November 2, 2010—the day of the US congressional elections—all 61 million Facebook users who lived in the United States and were 18 and older took part in an experiment about voting.

Bond, Robert M., Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. 2012. “A 61-Million-Person Experiment in Social Influence and Political Mobilization.” Nature 489 (7415):295–98.


| Topic                                                        | References                                                   |
| :----------------------------------------------------------- | :----------------------------------------------------------- |
| Effect of Facebook News Feed on information sharing          | Bakshy, Rosenn, et al. (2012)                                |
| Effect of partial anonymity on behavior on online dating website | Bapna et al. (2016)                                          |
| Effect of Home Energy Reports on electricity usage           | Allcott (2011); Allcott and Rogers (2014); Allcott (2015); Costa and Kahn (2013); Ayres, Raseman, and Shih (2013) |
| Effect of app design on viral spread                         | Aral and Walker (2011)                                       |
| Effect of spreading mechanism on diffusion                   | S. J. Taylor, Bakshy, and Aral (2013)                        |
| Effect of social information in advertisements               | Bakshy, Eckles, et al. (2012)                                |
| Effect of catalog frequency on sales through catalog and online for different types of customers | Simester et al. (2009)                                       |
| Effect of popularity information on potential job applications | Gee (2015)                                                   |
| Effect of initial ratings on popularity                      | Muchnik, Aral, and Taylor (2013)                             |
| Effect of message content on political mobilization          | Coppock, Guess, and Ternovski (2016)                         |




## 4.6 Advice
- When you are doing an experiment, you should think as much as possible before any data has been collected. 
- No single experiment is going to be perfect, so you should consider designing a series of experiments that reinforce each other.

Two pieces of advice are more specific to designing digital-age experiments: create zero variable cost data and build ethics into your design.

### 4.6.1 Create zero variable cost data
The key to running large experiments is to drive your variable cost to zero. The best ways to do this are automation and designing enjoyable experiments.

If you want to create experiments with zero variable cost data, you’ll need to ensure that everything is fully automated and that participants don’t require any payment.
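
The cost logic can be written as a one-line formula: total cost = fixed cost + variable cost per participant × number of participants. The sketch below compares two hypothetical cost structures; the figures are arbitrary and serve only to show how the comparison changes with scale.

```python
def total_cost(participants, fixed_cost, variable_cost):
    """Total cost = fixed cost + variable cost per participant * participants."""
    return fixed_cost + variable_cost * participants

# Hypothetical cost structures (arbitrary currency units).
analog = {"fixed_cost": 1_000, "variable_cost": 10}    # staff time, payments
digital = {"fixed_cost": 50_000, "variable_cost": 0}   # build once, fully automated

for n in (100, 10_000, 1_000_000):
    print(n, total_cost(n, **analog), total_cost(n, **digital))
```

With these made-up numbers the automated experiment is more expensive for 100 participants but far cheaper for a million, because its cost no longer grows with the number of participants.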

| Compensation                    | References                                                   |
| :------------------------------ | :----------------------------------------------------------- |
| Website with health information | Centola (2010)                                               |
| Exercise program                | Centola (2011)                                               |
| Free music                      | Salganik, Dodds, and Watts (2006); Salganik and Watts (2008); Salganik and Watts (2009b) |
| Fun game                        | Kohli et al. (2012)                                          |
| Movie recommendations           | Harper and Konstan (2015)                                    |

![image.png](img/ex20.png)

MusicLab was able to run at essentially zero variable cost because of the way that it was designed. First, everything was fully automated, so it was able to run unattended, even while the researchers were sleeping. Second, the compensation was free music, so there was no variable participant compensation cost. 

![image.png](img/ex21.png)

**Musiclab Data Release** 

https://opr.princeton.edu/archive/cm/

The release covers four experiments, run with a "multiple worlds" experimental design, involving a total of 27,267 participants. Included in this release are 167 data files containing the experimental results, mp3 files of the 48 songs, and the data documentation. The data files are in ASCII text, comma-separated values (CSV) format.

These data files are intended to let others reproduce, and hopefully expand upon, the analysis conducted in the dissertation project by Matthew J. Salganik, supervised by Duncan J. Watts. The experiments were conducted at the Department of Sociology at Columbia University between 2004 and 2007.

Please direct any questions to Prof. Matthew Salganik.

### 4.6.2 Build ethics into your design: replace, refine, and reduce
Make your experiment more humane by replacing experiments with non-experimental studies, refining the treatments, and reducing the number of participants.

**Kramer, Guillory, and Hancock (2014)** Facebook Emotion Contagion

The experiment concerned the Facebook News Feed, an algorithmically curated set of Facebook status updates from a user’s Facebook friends. There were two competing hypotheses:
- H0: because the News Feed has mostly positive posts—friends showing off their latest party—it could cause users to feel sad because their lives seemed less exciting in comparison. 
- H1: seeing your friend having a good time would make you feel happy. 

In order to address these competing hypotheses—and to advance our understanding of how a person’s emotions are impacted by her friends’ emotions—Kramer and colleagues ran an experiment. 
 


They placed about 700,000 users into four groups for one week: 
- a “negativity-reduced” group, for whom posts with negative words (e.g., “sad”) were randomly **blocked** from appearing in the News Feed; 
- a “positivity-reduced” group for whom posts with positive words (e.g., “happy”) were randomly **blocked**; 
- and two control groups. 

Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks.” Proceedings of the National Academy of Sciences of the USA 111 (24):8788–90.

![image.png](img/ex22.png)

- Construct Validity
    - counts of words
- There is no analysis of heterogeneity of treatment effects or of mechanisms
- The effect size in this experiment was very small
    - about 1 in 1,000 words
- An enormous outcry from both researchers and the press
    - manipulations of information exposure

**The first R is replace**: researchers should seek to replace experiments with less invasive and risky techniques, if possible. 

Lorenzo Coviello et al. (2014) used random variation in the weather (instrumental variable) to study the effect of changes in the Facebook News Feed without the need to intervene at all.


> Coviello, Lorenzo, Yunkyu Sohn, Adam D. I. Kramer, Cameron Marlow, Massimo Franceschetti, Nicholas A. Christakis, and James H. Fowler. 2014. “Detecting Emotional Contagion in Massive Social Networks.” PLoS ONE 9 (3):e90315.

Prior to estimation we use an instrument $X_{gt}$, the aggregated rainfall of the friends of the people in subpopulation $g$, to predict exogenous variation in the friends' emotional expression $Y_{gt}$:


$$Y_{gt} = \theta_{t}^{'} + c_g^{'} + \beta_1 X_{gt} + \beta_2 x_{gt} + \epsilon_{gt}^{'}  \space \space \space \space (1)$$

Using the fitted values $\hat Y_{gt}$, we fit the second stage of the 2SLS regression:

$$y_{gt} = \theta_t + c_g + \beta x_{gt} + \gamma \hat Y_{gt} + \epsilon_{gt}  \space \space \space \space (2)$$

where for time $t$, $y_{gt}$ is the average emotion of all people in subpopulation (city) $g$; $\theta_t$ and $c_g$ are time and subpopulation fixed effects; $x_{gt}$ is the average exogenous factor (rainfall) for people in subpopulation $g$; $Y_{gt}$ is a weighted average emotional expression of friends of people in subpopulation $g$; and $\epsilon_{gt}$ is an error term.
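
The two-stage logic of equations (1) and (2) can be sketched on simulated data. This is a deliberately stripped-down illustration, not the authors' analysis: it drops the fixed effects and the control $x_{gt}$, uses a single instrument, and every number in the data-generating process (including `true_effect`, which plays the role of $\gamma$) is made up.

```python
import random
from statistics import mean

def ols(x, y):
    """Intercept and slope of a one-variable least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(0)
n = 20_000
true_effect = 0.5  # made-up value of gamma that the simulation should recover

rain, friends, own = [], [], []
for _ in range(n):
    confounder = rng.gauss(0, 1)  # unobserved factor moving both emotions
    z = rng.gauss(0, 1)           # instrument: rainfall where the friends live
    big_y = z + confounder + rng.gauss(0, 1)                          # friends' emotion
    small_y = true_effect * big_y + 2 * confounder + rng.gauss(0, 1)  # own emotion
    rain.append(z)
    friends.append(big_y)
    own.append(small_y)

# A naive regression of own emotion on friends' emotion is biased by the confounder.
_, naive_estimate = ols(friends, own)

# Stage 1: predict friends' emotion from the instrument.
a1, b1 = ols(rain, friends)
friends_hat = [a1 + b1 * z for z in rain]

# Stage 2: regress own emotion on the fitted values from stage 1.
_, iv_estimate = ols(friends_hat, own)

print(f"naive estimate: {naive_estimate:.2f}")
print(f"2SLS estimate:  {iv_estimate:.2f} (simulated truth: {true_effect})")
```

In this simulation the naive estimate is pulled far above the simulated truth by the shared confounder, while the two-stage estimate uses only the variation in friends' emotion that comes from the instrument.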




![image.png](img/ex23.png)

The second of the three Rs is refine: researchers should seek to refine their treatments to make them as harmless as possible. 

- For example, rather than blocking content that was either positive or negative, the researchers could have boosted content that was positive or negative.



The third R is reduce: researchers should seek to reduce the number of participants in their experiment to the minimum needed to achieve their scientific objective. 

- One way to do this is an approach that is sometimes called a mixed design and sometimes called a difference-in-differences estimator. 

    For each participant, the researchers could:
    - create a change score (post-treatment behavior − pre-treatment behavior) 
    - compare the change scores of participants in the treatment and control conditions. 

This difference-in-differences approach is more efficient statistically, which means that researchers can achieve the same statistical confidence using much smaller samples.
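
The efficiency gain can be checked with a small simulation sketch. The sample size, baseline spread, noise level, and effect size below are all invented; the point is only to compare how much the two estimators vary across repeated hypothetical experiments.

```python
import random
from statistics import mean, stdev

def one_experiment(rng, n=100, effect=1.0):
    """Simulate one experiment and return both estimates of the treatment effect."""
    assignment = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(assignment)  # randomized treatment assignment
    post = {True: [], False: []}
    change = {True: [], False: []}
    for treated in assignment:
        baseline = rng.gauss(10, 3)  # stable differences between people
        pre = baseline + rng.gauss(0, 1)
        after = baseline + rng.gauss(0, 1) + (effect if treated else 0.0)
        post[treated].append(after)
        change[treated].append(after - pre)  # change score
    diff_in_means = mean(post[True]) - mean(post[False])
    diff_in_diffs = mean(change[True]) - mean(change[False])
    return diff_in_means, diff_in_diffs

rng = random.Random(0)
results = [one_experiment(rng) for _ in range(500)]
sd_means = stdev([r[0] for r in results])
sd_diffs = stdev([r[1] for r in results])
avg_diffs = mean([r[1] for r in results])

print(f"spread of difference-of-means estimates:        {sd_means:.2f}")
print(f"spread of difference-in-differences estimates:  {sd_diffs:.2f}")
```

Because the change score subtracts out each person's stable baseline, the difference-in-differences estimates cluster more tightly around the simulated effect, so fewer participants are needed for the same precision. The gain depends on how strongly pre-treatment and post-treatment behavior are correlated, which is an assumption built into this simulation.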

![image.png](img/chengjun2.png)