# Sampling and observational studies

## Valid Claims

### Example 1: Valid Claims

Santos wants to know what percentage of people who get married in his county knew each other in school.

**Which of the following survey methods will allow Santos to make a valid conclusion about the percentage of couples married in his county who knew each other in school?**

Sort a list of couples married in his county into a random order, then ask the first $30$ couples on the list.

---

Asking $30$ couples whose names are chosen at random from Santos's county will produce information from which he can make a valid conclusion.

The $30$ couples married most recently in Santos's county may not be representative of the entire population of couples married in the county.

The valid method:

Sort a list of couples married in his county into a random order, then ask the first $30$ couples on the list.
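The valid method above can be sketched in a few lines of Python. This is only an illustration: the list of couple IDs and the helper name `first_n_after_shuffle` are hypothetical stand-ins for Santos's real list.

```python
import random

def first_n_after_shuffle(couples, n=30, seed=None):
    """Put the list in random order, then keep the first n entries."""
    rng = random.Random(seed)
    shuffled = list(couples)   # copy so the original list is left untouched
    rng.shuffle(shuffled)      # random order: every ordering is equally likely
    return shuffled[:n]

# Hypothetical sampling frame: 500 couples married in the county.
couples = [f"couple_{i}" for i in range(1, 501)]
sample = first_n_after_shuffle(couples, n=30, seed=1)
print(len(sample))
```

Because the order is random, taking the first $30$ entries gives every couple on the list the same chance of being asked.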

# [Sampling (statistics)](https://en.wikipedia.org/wiki/Sampling_(statistics))

## [Sampling methods](https://en.wikipedia.org/wiki/Sampling_(statistics)#Sampling_methods)

- [Simple random sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Simple_random_sampling)
- [Systematic sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Systematic_sampling)
- [Stratified sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Stratified_sampling)
- [Probability-proportional-to-size sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Probability-proportional-to-size_sampling)
- [Cluster sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Cluster_sampling)
- [Quota sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Quota_sampling)
- [Minimax sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Minimax_sampling)
- [Convenience sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Accidental_sampling)
- [Voluntary sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Voluntary_Sampling)
- [Line-intercept sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Line-intercept_sampling)
- [Panel sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Panel_sampling)
- [Snowball sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Snowball_sampling)
- [Theoretical sampling](https://en.wikipedia.org/wiki/Sampling_(statistics)#Theoretical_sampling)

### Good ways to sample

**Simple random sample:** Every member and set of members has an equal chance of being included in the sample. Technology, random number generators, or some other sort of chance process is needed to get a simple random sample.

Example—A teacher puts students' names in a hat and chooses without looking to get a sample of students.

Why it's good: Random samples are usually fairly representative since they don't favor certain members.
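A small simulation can make the "equal chance" idea concrete. The sketch below uses a made-up class of $10$ students and a hypothetical helper `inclusion_frequencies`; it repeatedly draws a simple random sample of $3$ and records how often each student is included.

```python
import random
from collections import Counter

def inclusion_frequencies(population, k, trials=10_000, seed=None):
    """Estimate how often each member lands in a simple random sample of size k."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # random.sample draws k distinct members, like names from a hat
        counts.update(rng.sample(population, k))
    return {member: counts[member] / trials for member in population}

names = [f"student_{i}" for i in range(1, 11)]   # hypothetical class of 10
freqs = inclusion_frequencies(names, k=3, seed=0)
for name, freq in freqs.items():
    print(name, round(freq, 3))
```

Each frequency should land close to $3/10$, since no student is favored over another.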

**Stratified random sample:** The population is first split into groups. The overall sample consists of some members from every group. The members from each group are chosen randomly.

Example—A student council surveys $100$ students by getting random samples of $25$ freshmen, $25$ sophomores, $25$ juniors, and $25$ seniors.

Why it's good: A stratified sample guarantees that members from each group will be represented in the sample, so this sampling method is good when we want some members from every group.
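As a rough sketch of the student council example, the code below draws $25$ students at random from each grade. The grade sizes and the helper name `stratified_sample` are assumptions made for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical enrollment by grade.
grade_sizes = {"freshmen": 310, "sophomores": 290, "juniors": 275, "seniors": 260}
strata = {grade: [f"{grade}_{i}" for i in range(1, size + 1)]
          for grade, size in grade_sizes.items()}

sample = stratified_sample(strata, per_stratum=25, seed=7)
print({grade: len(chosen) for grade, chosen in sample.items()})
```

Every grade contributes exactly $25$ students, which is the guarantee a simple random sample of $100$ would not give.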

**Cluster random sample:** The population is first split into groups. The overall sample consists of every member from some of the groups. The groups are selected at random.

Example—An airline company wants to survey its customers one day, so they randomly select $5$ flights that day and survey every passenger on those flights.

Why it's good: A cluster sample gets every member from some of the groups, so it's good when each group reflects the population as a whole.
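The airline example might look like this in code. The flight and passenger names are invented, and `cluster_sample` is a hypothetical helper, not a library function.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical day of 12 flights with 5 passengers each (kept tiny on purpose).
flights = {f"flight_{i}": [f"flight_{i}_passenger_{j}" for j in range(1, 6)]
           for i in range(1, 13)}

surveyed = cluster_sample(flights, n_clusters=5, seed=11)
print(sorted(surveyed))
```

Note the contrast with the stratified sketch: here the randomness picks the groups, and then everyone in a picked group is surveyed.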

**Systematic random sample:** Members of the population are put in some order. A starting point is selected at random, and every $n^{\text{th}}$ member is selected to be in the sample.

Example—A principal takes an alphabetized list of student names and picks a random starting point. Every $20^{\text{th}}$ student is selected to take a survey.
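A minimal sketch of the principal's procedure, assuming a made-up alphabetized roster of $400$ students and choosing the random starting point among the first $20$ names so that the whole list is covered:

```python
import random

def systematic_sample(ordered_members, step, seed=None):
    """Pick a random start within the first `step` members, then take every step-th one."""
    rng = random.Random(seed)
    start = rng.randrange(step)          # random index from 0 to step - 1
    return ordered_members[start::step]

# Hypothetical alphabetized roster of 400 students.
roster = [f"student_{i:03d}" for i in range(1, 401)]
sample = systematic_sample(roster, step=20, seed=2)
print(len(sample))
```

With $400$ names and a step of $20$, the sample always contains $20$ students, spaced evenly through the list.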

### Example 1: Voluntary sampling

A restaurant leaves comment cards on all of its tables and encourages customers to participate in a brief survey to learn about their overall experience.

### Example 2: Simple random sampling

Each student at a school has a student identification number. Counselors have a computer generate $50$ random identification numbers and those students are asked to take a survey.

### Example 3: Cluster random sampling

A principal orders t-shirts and wants to check some of them to make sure they were printed properly. She randomly selects $2$ of the $10$ boxes of shirts and checks every shirt in those $2$ boxes.

### Example 4: Stratified random sampling

A school chooses $3$ randomly selected athletes from each of its sports teams to participate in a survey about athletics at the school.

### Example 5: Systematic random sampling

While students are lined up for school pictures, a teacher passes out a survey to every $10^{\text{th}}$ student.

---

## Errors in sample surveys

Survey results are typically subject to some error. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic biases as well as random errors.

### Sampling errors and biases

Sampling errors and biases are induced by the sample design. They include:

1.  **[Selection bias](https://en.wikipedia.org/wiki/Selection_bias "Selection bias")**: When the true selection probabilities differ from those assumed in calculating the results.
2.  **[Random sampling error](https://en.wikipedia.org/wiki/Sampling_error "Sampling error")**: Random variation in the results due to the elements in the sample being selected at random.

### Non-sampling error

Non-sampling errors are other errors which can impact final survey estimates, caused by problems in data collection, processing, or sample design. Such errors may include:

1.  **Over-coverage**: inclusion of data from outside of the population
2.  **Under-coverage**: sampling frame does not include elements in the population.
3.  **Measurement error**: e.g. when respondents misunderstand a question, or find it difficult to answer
4.  **Processing error**: mistakes in data coding
5.  **[Non-response or Participation bias](https://en.wikipedia.org/wiki/Participation_bias "Participation bias")**: failure to obtain complete data from all selected individuals


Biased wording of survey questions can cause people to favor certain responses over others. Suggesting that smoking is illegal might make it less likely for students who smoke to admit they do.

Response bias is when people are systematically dishonest when answering a question. High school students who smoke aren't likely to admit it to their counselor. At the same time, it's doubtful that students would lie in the other direction—students who don't smoke probably wouldn't say that they do.

Nonresponse is when people chosen for the sample cannot be reached or refuse to participate. In this scenario, a large majority of people did not respond to the phone calls.

Undercoverage is when the researcher systematically excludes members of the population from being in the sample. Random digit dialing will reach people with mobile phones and unlisted numbers, but people without phones are excluded from the sample. This might be an issue, but it doesn't seem as concerning as the very high rate of nonresponse.

Response bias isn't as concerning as a nonresponse rate of more than $90\%$, although it is possible that people may be concerned about privacy just from getting the phone call.

### Example 1: Voluntary response sampling

David hosts a podcast and he is curious how much his listeners like his show. He decides to start with an online poll. He asks his listeners to visit his website and participate in the poll.

The poll shows that $89\%$ of the $200$ respondents "love" his show.

**What is the most concerning source of bias in this scenario?**

Voluntary response sampling

**Which direction of bias is more likely in this scenario?**

$89\%$ is probably an overestimate of the percentage of all listeners that love the show.

### Example 2: Convenience sampling

David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the next $100$ listeners who send him fan emails.

They don't all respond, but $94$ of the $97$ listeners who responded said they "loved" his show.

**What is the most concerning source of bias in this scenario?**

Convenience sampling

**Which direction of bias is more likely in this scenario?**

The results are probably an overestimate of the percentage of all listeners that love the show.

---

# Types of statistical studies

> [Observational studies and experiments](https://www.khanacademy.org/math/statistics-probability/designing-studies/types-studies-experimental-observational/a/observational-studies-and-experiments)

- Sample Study
- Observational Study
- Experiment


- The purpose of a sample study is to estimate a certain parameter of a population while observational studies and experiments compare two parameters of a population.

- In an _observational study,_ we measure or survey members of a sample without trying to affect them.
- In a _controlled experiment,_ we assign people or things to groups and apply some treatment to one of the groups, while the other group does not receive the treatment.


|Type|Number of parameters|Result|
|:-|:-|:-|
|Sample Study|1|Estimate|
|Observational Study|2|Correlation|
|Experiment|2|Correlation or Causality|

### Example 1: Observational study

A research institute interested in factors that can either cause cardiovascular disease or prevent it conducted a long-term study of the $5209$ people of Framingham, Massachusetts. The study included extensive physical examinations and lifestyle interviews performed every $2$ years, which were analyzed for common patterns related to cardiovascular disease development.

**What type of statistical study did the researchers use?**

Observational study

The researchers found a negative association among males between the amount of daily physical activity and the occurrence of cardiovascular diseases.

**What valid conclusions can be made from this result?**

The result suggests that there is a positive correlation between high physical activity and low risk of cardiovascular disease among males in Framingham.

### Example 2: Observational study

Ana was interested in the relationship between funding sources for nutritional research and the conclusions of that research.
She gathered information on all research papers concerning soft drinks that were published between $1$ January, $1999$ and $31$ December, $2003$, and compared the source of their funding with their results.

**What type of a statistical study did Ana use?**

Observational study

Ana found that of the papers funded by food industry companies, $0\%$ had results unfavorable to the companies, while $37\%$ of the papers funded by other sources had results unfavorable to the soft drink industry.

**What valid conclusions can be made from this result?**

The result suggests that there is a correlation between the source of funding and the conclusions of soft drink research.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population while observational studies and experiments _compare_ two parameters of a population.

The study was designed to check the connection between the funding source of a research paper and its conclusions. It's looking for a comparison, so this _isn’t_ a sample study. It could either be an experiment or an observational study.

In an experiment, one parameter of the population is actively changed to see the effect on the other parameter. For this study to be an experiment, Ana would have to actively assign funding sources to individual research projects, which would be impractical. Therefore, this is an _observational study_.

An observational study cannot indicate a causal relationship between parameters. In this case, it's possible that a third factor is present in some of the papers that causes the ones funded by soft drink companies to report fewer unfavorable results. We can't claim that soft drink companies necessarily bias the research they fund.

If this were a randomized experiment with active assignment of treatments, we could be confident that other effects would be smoothed out by the randomness, and that the correlation is evidence of causation. However, this is not the case.

Ana observed research on soft drinks, so we can't conclude anything about pharmaceutical research from these results.

Ana used an _observational study_.

We can validly conclude that the result suggests that there is a correlation between the source of funding and the conclusions of soft drink research.

### Example 3: Experiment

Researchers were interested in assessing the effect of meditation on work stress. They randomly assigned $200$ full-time employees to two groups. One group was instructed to meditate $10$-$20$ minutes twice per day, and to participate in weekly $1$-hour sessions, while the other group wasn't given any special instructions.

Just before the randomization and also after a period of $8$ weeks, all participants were required to fill out the Psychological Strain Questionnaire (PSQ), an accepted measure of work stress. The researchers calculated the difference in questionnaire scores for all participants, where a positive change corresponds to a reduction in work stress. Then, they compared the average differences of each group.

**What type of a statistical study did the researchers use?**

Experiment

The researchers found that the mean change in PSQ scores of the group who meditated is $21$ points greater than the mean change in PSQ scores of the group who didn’t. Based on some re-randomization simulations, they concluded that the result is significant and not due to the randomization of the groups.

**What valid conclusions can be made from this result?**

The result suggests that meditation can reduce work stress.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population while observational studies and experiments _compare_ two parameters of a population.

The study was designed to check the connection between meditation and work stress. It's looking for a comparison, so this _isn’t_ a sample study. It could either be an experiment or an observational study.

In an experiment, one parameter of the population is actively changed to see the effect on the other parameter. In this case, the meditation regimen was actively assigned by the researchers to each subject to assess its effect on the subject's work stress. Therefore, this is indeed an _experiment_.

Randomized experiments are designed to suggest causation. In this case, the randomization of the subjects between the groups was intended to smooth out any factors other than meditation that may affect work stress.

Therefore, it is valid to conclude that meditation can reduce work stress. It is also valid to conclude that there's a positive correlation between the parameters, but correlation is weaker than causation. So if causation can be concluded, it is more suitable than correlation.

Even though experiments suggest causation, it would be too far-reaching to conclude that "meditation reduces work stress," which implies that meditation would reduce work stress for anyone who practiced it.

Furthermore, the experiment deals with _work stress_ and not any kind of stress, so it's wrong to conclude that meditation can reduce stress in general.

The researchers used an _experiment_.

We can validly conclude that the result suggests that meditation can reduce work stress.
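
The example mentions re-randomization simulations without showing one. The sketch below illustrates the general idea on a small, entirely made-up set of PSQ score changes (not the study's data): pool the two groups, reshuffle the group labels many times, and see how often chance alone produces a difference at least as large as the observed one. The function name `rerandomization_p_value` is hypothetical.

```python
import random
from statistics import mean

def rerandomization_p_value(treatment, control, n_iter=2000, seed=None):
    """Share of random re-groupings whose mean difference is >= the observed one."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    at_least_as_large = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)   # re-randomize who counts as "treatment"
        diff = mean(pooled[:n_treatment]) - mean(pooled[n_treatment:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_iter

# Made-up changes in PSQ score (positive = less work stress).
meditation_group = [30, 25, 18, 22, 27, 35, 20, 24]
control_group = [5, 2, -3, 8, 0, 4, 6, 1]

observed, p_value = rerandomization_p_value(meditation_group, control_group, seed=4)
print(observed, p_value)
```

A small proportion means the observed difference would rarely arise from the random grouping alone, which is what "significant and not due to the randomization of the groups" refers to.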

### Example 4: Observational study

British researchers were interested in the relationship between farmers’ approach to their cows and the cows’ milk yield. They prepared a survey questionnaire regarding the farmers' perception of the cows' mental capacity, the treatment they give to the cows, and the cows' yield. The survey was filled out by all the farms in Great Britain.

**What type of a statistical study did the researchers use?**

Observational study

After analyzing their results, they found that on farms where cows were called by name, milk yield was $258$ liters higher on average than on farms where this was not the case.

**What valid conclusions can be made from this result?**

The result suggests that there is a positive correlation between calling British cows by name and the cows' milk yield.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population while observational studies and experiments _compare_ two parameters of a population.

The study was designed to check the connection between farmers' approach to their cows and the cows' milk yield. It's looking for a comparison, so this _isn’t_ a sample study. It could either be an experiment or an observational study.

In an experiment, one parameter of the population is actively changed to see the effect on the other parameter. For this study to be an experiment, the researchers would have to actively assign different approaches to the farmers. Therefore, this is an _observational study_.

An observational study cannot indicate a causal relationship between parameters. In this case, it's possible that a third factor is present in some of the farms that causes the farmers to give special attention to their cows, _and_ causes the cows to produce more milk. It's not necessarily true that naming a cow will make her produce more milk.

If this were a randomized experiment with active assignment of treatments, we could be confident that other effects would be smoothed out by the randomness, and that the correlation is evidence of causation. However, this is not the case.

The researchers observed farms in Great Britain, which means they obtained valid results for British cows. Since we can't tell whether British farms and cows are representative of _all_ farms or _all_ cows, we can't claim the results are valid for cows in general.

The researchers used an _observational study_.

We can validly conclude that the result suggests that there is a positive correlation between calling British cows by name and the cows’ milk yield.

### Example 5: Experiment

The taste engineers at “Drinksoft” have developed a new formula for their major brand “Cola-Loca.” They wanted to know how tasty it is to teenagers compared to the old formula, so they decided to set up a blind taste test.

They randomly assigned $300$ blindfolded, teenaged participants to two groups. One group was given the old formula of “Cola-Loca,” and the other was given the new formula. Each participant was asked to fill out a formal $10$-point taste questionnaire, where $1$ is considered "awful" and $10$ is considered "delicious."

**What type of a statistical study did the engineers use?**

Experiment

The engineers found that the mean taste score of the new formula is $4$ points lower than the mean taste score of the old formula. Based on some re-randomization simulations, they concluded that the result is significant and not due to the randomization of the groups.

**What valid conclusions can be made from this result?**

The result suggests that the old formula tastes better among teenagers than the new one.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population while observational studies and experiments _compare_ two parameters of a population.

The study was designed to check the connection between the type of formula and its tastefulness. It's looking for a comparison, so this _isn’t_ a sample study. It could either be an experiment or an observational study.

In an experiment, one parameter of the population is actively changed to see the effect on the other parameter. In this case, the type of formula was actively assigned by the engineers to each participant to assess its effect on the participant's taste sensation. Therefore, this is indeed an _experiment_.

Randomized experiments are designed to suggest causation. In this case, the randomization of the participants between the groups was intended to smooth out any factors other than the formula that may affect the perceived tastefulness.

Therefore, it is valid to conclude that the old formula tastes better among teenagers than the new one. It is also valid to conclude that there's a correlation between the parameters, but correlation is weaker than causation. So if causation can be concluded, it is more suitable than correlation.

Even though experiments suggest causation, it would be too far-reaching to conclude that "the old formula tastes better than the new one," which implies that the old formula would taste better for _anyone_ who tried the two formulas.

Furthermore, the experiment was conducted with teenagers, and it isn't mentioned how many of them were boys. So any valid conclusion that can be drawn from it must apply to teenagers, no more, no less. We can't assume to know how the formulas would taste among teenage boys, or people in general.

The engineers used an _experiment_.

We can validly conclude that the result suggests that the old formula tastes better among teenagers than the new one.

### Example 6: Sample study

The CEO of a major bank wanted to assess how many of the bank’s customers are satisfied, so he decided to conduct a small survey.
At the time of the survey, the bank had several million customers across $300$ bank branches of varying sizes. For each of the $300$ branches, he picked one of the branch's customers randomly and surveyed that customer.

**What type of statistical study did the CEO use?**

Sample study

**Is the study appropriate for the statistical questions it's supposed to answer?**

No, because the randomization of the sampling was flawed.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population while observational studies and experiments _compare_ two parameters of a population.

The study was designed to estimate the percentage of satisfied bank customers. There's no comparison, only an estimation of a single parameter, so this is a _sample study_. We can also conclude that the type of study chosen was certainly appropriate for the question.

While it is true that the study didn't have a treatment and a control group, these aren't necessary for a sample study. A treatment group and control group are a part of a randomized experiment.

The rule of thumb for sample studies is that the size of the sample should be greater than $30$. Therefore, the sample of customers the CEO used was indeed big enough. But what about the randomization process? Is it valid?

A random sample of a population should be completely random, which means every individual in the population has the same probability of getting picked. In this case, each _branch_ has the same probability - they all have exactly one customer that got picked. However, the probability of each _individual customer_ of each branch to get picked changes according to the size of the branch!

Suppose a branch has $1000$ customers. Since exactly one of them gets randomly picked, each customer has a probability of $\displaystyle \frac{1}{1000}$ of being picked. Suppose another branch has only $100$ customers; then each customer there has a probability of $\displaystyle \frac{1}{100}$ of being picked. This means the randomization process is flawed.
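
The arithmetic for the two branches can be checked directly. The branch labels below are placeholders; the sizes are the ones used in the explanation.

```python
from fractions import Fraction

# One customer is picked per branch, so the chance of being picked is 1 / branch size.
branch_sizes = {"large_branch": 1000, "small_branch": 100}
pick_probability = {name: Fraction(1, size) for name, size in branch_sizes.items()}

ratio = pick_probability["small_branch"] / pick_probability["large_branch"]
print(pick_probability)
print(ratio)
```

A customer of the smaller branch is $10$ times as likely to be surveyed as a customer of the larger one, so individual customers do not all have the same chance of selection.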

The CEO used a _sample study_.

The study is _not_ appropriate for the question because the randomization of the sampling was flawed.

### Example 7: Experiment

A pedagogical research institute wanted to test a new method for teaching geometry by assessing whether it helps students achieve higher grades. They decided to conduct a study using $4$ volunteering high schools with a total of $750$ students.

At the beginning of the semester, $2$ out of the $4$ high schools were randomly picked to use the new teaching method, while the remaining $2$ schools kept teaching with traditional methods. The first pair had a total of $400$ students, and the second pair had a total of $350$ students.

By the end of the semester, the researchers compared the average scores in the geometry section of the final exam for each pair of schools.

**What type of statistical study did the researchers use?**

Experiment

**Is the study appropriate for the statistical questions it's supposed to answer?**

No, because the schools could be very different, meaning the randomization won't smooth out other effects.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population, while observational studies and experiments _compare_ two parameters of a population.

The study was designed to check the connection between the teaching method used and the students' grades. It's looking for a comparison, so this _isn’t_ a sample study. It could be either an experiment or an observational study.

In an experiment, one parameter of the population is actively changed to see the effect on the other parameter. In this case, the teaching method was actively assigned by the researchers to each school to assess its effect on the students' grades. Therefore, this is indeed an _experiment_.

Experiments are specifically designed to suggest causation, so we can't say the study is inappropriate due to its type.

Furthermore, while it is true that the groups don't have the same number of students, this isn't crucial for an experiment. The treatment group and the control group in an experiment don't have to be the same size.

The researchers indeed compared the grades in the geometry section only, which is good, but they neglected one important issue. For the randomization to really smooth out any possible factors other than the teaching method, the subjects should be individually randomized between the treatment group and the control group.

In this case, since only the schools themselves were randomized and not the students, the differences between the schools have too much influence on the test grades of their students. What if the schools in one of the groups are private schools with a generous budget and selection tests, while the schools of the other group are low-budget, public schools? Any obtained difference in grades couldn't be attributed to the teaching method alone.

The researchers used an _experiment_.

The study is _not_ appropriate for the question because the schools could be very different, meaning the randomization won't smooth out other effects (_i.e._, it is prone to bias due to flawed randomization).

### Example 8: Sample study

Alma has developed a new kind of antibiotic. For the antibiotic to be sufficiently effective, it has to kill at least $90\%$ of bacteria when applied to a harmful bacteria culture. She applied her antibiotic to a petri dish full of harmful bacteria, waited for it to take effect, and then tried to estimate the percentage of dead bacteria in it.

She took a random sample of $300$ bacteria and found that $94\%$ of them were dead. Then she calculated the margin of error and found that the true percentage of dead bacteria is most likely to be above $90\%$.

**What type of statistical study did Alma use?**

Sample study

**Is the study appropriate for the statistical questions it's supposed to answer?**

No, because the study didn't have a treatment and a control group.

---

The purpose of a sample study is to _estimate_ a certain parameter of a population, while observational studies and experiments _compare_ two parameters of a population.

The study was designed to estimate the percentage of dead bacteria in the petri dish. There's no comparison, only an estimation of a single parameter, so this is a _sample study_.

Alma is trying to figure out how effectively her antibiotic **causes** bacteria to die. To answer a question about a causal relationship, we need to perform an experiment with a treatment group and a control group.

It is also true that Alma can’t know _for certain_ that the true percentage of dead bacteria is above $90\%$, but the results of statistical studies are rarely completely certain. We can assign very high probability to our results, but we can't demand complete certainty.
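
The example does not say how Alma calculated the margin of error. One common approach, assumed here purely for illustration, is the normal approximation for a sample proportion at roughly $95\%$ confidence (critical value $1.96$).

```python
from math import sqrt

p_hat = 0.94   # sample proportion of dead bacteria (from the example)
n = 300        # sample size (from the example)
z = 1.96       # assumption: critical value for roughly 95% confidence

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z * standard_error
lower, upper = p_hat - margin_of_error, p_hat + margin_of_error

print(f"margin of error: {margin_of_error:.3f}")
print(f"interval: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the margin of error is about $0.027$, giving an interval of roughly $0.913$ to $0.967$. The whole interval sits above $0.90$, which matches the statement that the true percentage is most likely above $90\%$.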

Alma used a _sample study_.

The study _is not_ appropriate for the question because the study didn't have a treatment and a control group.

---

# [Experiment Design](https://en.wikipedia.org/wiki/Design_of_experiments)

The **design of experiments** (**DOE**, **DOX**, or **experimental design**) is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation.

## Principles of experiment design

> [The language of experiments](https://www.khanacademy.org/math/statistics-probability/designing-studies/experiments-stats-library/a/the-language-of-experiments)
> [Principles of experiment design](https://www.khanacademy.org/math/statistics-probability/designing-studies/experiments-stats-library/a/principles-of-experiment-design)

- An **explanatory variable** explains changes in another variable. Karina is curious if a kale-based diet can predict changes in blood pressure.
- A **response variable** measures the result of a study. Karina is measuring the change in blood pressure at the end of the study.
- A **treatment** is the specific level of the explanatory variable given to individuals in an experiment. If there are multiple explanatory variables, a treatment is a combination of specific levels from each explanatory variable.
- An **experimental unit** is who or what we are assigning to a treatment.

### Example 1: Features

Karina wants to determine if kale consumption has an effect on blood pressure. She recruits $100$ households and randomly assigns each household to either a kale-free diet plan or a kale-based diet. At the end of two months, she plans to record the original and final blood pressures for members of each household.

- Explanatory variable: Kale consumption
- Response variable: Change in blood pressure
- Treatments: The kale-based and kale-free diets
- Experimental units: The households

### Example 2: Control group

A footwear company wants to test the effectiveness of its new insoles designed to prevent shin splints resulting from running. They hire a group of physical trainers and a statistician, who recruits $100$ healthy adults between the ages of $18$ and $24$ to participate in a study.
The statistician randomly assigns $50$ of the adults to follow a weekly running schedule with the new insoles and the other $50$ to the same running schedule with the existing insoles the company already sells. After $10$ weeks, the statistician records the number of runners from each group that have developed shin splints.

**What is the primary purpose of having a group of $50$ runners use the existing insoles?**

They serve as a control group.

---

## Random Assignment

> [Random sampling vs. random assignment (scope of inference)](https://www.khanacademy.org/math/statistics-probability/designing-studies/experiments-stats-library/a/scope-of-inference-random-sampling-assignment)

**Note:** In the real world, we can't ethically take a random sample of people and make them participate in a study involving drugs. However, there are more advanced methods for controlling for this type of selection bias. When we rely on volunteers for testing new drugs and we see significant results, we need to be willing to assume that the volunteers are representative of the larger population. We can also repeat the study on a different group of volunteers to see if we get the same results.

**Key idea:** If a sample isn't randomly selected, it may not be representative of the larger population. On the AP test, be ready to apply this concept and some nuance when it comes to discussing if a sample is representative of the larger population.

<table><thead><tr><th scope="col"></th><th scope="col">Random sampling</th><th scope="col">Not random sampling</th></tr></thead><tbody><tr><td><strong>Random assignment</strong></td><td>Can determine causal relationship in population. <em>This design is relatively rare in the real world.</em></td><td>Can determine causal relationship in that sample only. <em>This design is where most experiments would fit.</em></td></tr><tr><td><strong>No random assignment</strong></td><td>Can detect relationships in population, but cannot determine causality. <em>This design is where many surveys and observational studies would fit.</em></td><td>Can detect relationships in that sample only, but cannot determine causality. <em>This design is where many unscientific surveys and polls would fit.</em></td></tr></tbody></table>

### Example 1: Completely randomized design

Estelle's cornfield is divided into $6$ regions. She wants to know if a new pesticide is as effective as the pesticide that she currently uses. She randomly assigns each of the $6$ regions to either the new or current pesticide for a month, and she'll compare the effectiveness of the pesticides.

**What type of experiment design is this?**

A completely randomized design
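In a completely randomized design, every experimental unit is assigned to a treatment purely by chance, with no grouping beforehand. The following is a minimal Python sketch of that idea for the six regions. The function name `completely_randomized`, the region labels, and the choice of equal group sizes ($3$ and $3$) are assumptions made for illustration; the example itself doesn't say how many regions receive each pesticide.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Dealing round-robin keeps group sizes within one of each other
    # (an assumption here; the example does not specify group sizes).
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

# Hypothetical labels for the 6 regions of the cornfield.
regions = [f"region {i}" for i in range(1, 7)]
assignment = completely_randomized(regions, ["new", "current"], seed=1)

for region in regions:
    print(region, "->", assignment[region])
```

Passing a different `seed` (or none) produces a different random assignment; the seed is fixed here only so the run can be repeated.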

### Example 2: Randomized block

A professor wants to study the effectiveness of a new study tool for a course. There are $150$ students registered for the course.
Half of the freshmen are randomly assigned to use the new study tool, and the other half are assigned to use the previous study tool. The same method is used to randomly assign half of the sophomores, half of the juniors, and half of the seniors to each study tool.

**What type of experiment design is this?**

Randomized block

The subjects are first split into groups—freshmen, sophomores, juniors, and seniors—before random assignment. This is called *blocking*. Random assignment is then carried out within each block.
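Blocking followed by random assignment within each block can be sketched in Python as below. This is an illustration under stated assumptions: the roster is hypothetical ($16$ made-up students rather than the $150$ in the course, since the class-year breakdown isn't given), and `randomized_block` is a name invented for this sketch.

```python
import random
from collections import defaultdict

def randomized_block(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomize separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomization happens within the block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical roster of (student id, class year) pairs.
years = ["freshman", "sophomore", "junior", "senior"]
roster = [(f"s{i:02d}", years[i % 4]) for i in range(16)]

assignment = randomized_block(
    roster, block_of=lambda student: student[1],
    treatments=["new tool", "previous tool"], seed=7,
)

for student in sorted(roster, key=lambda s: (years.index(s[1]), s[0])):
    print(student[1], student[0], "->", assignment[student])
```

Because the shuffle is done per block, each class year ends up split evenly between the two study tools, which is what separates this from a completely randomized design.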

### Example 3: Matched pairs

A group of researchers wants to study the effect of listening to music at different volumes on driving reaction times. They recruit $100$ volunteers.
While music plays in the car at one of two volume levels, each subject encounters a number of signs, some of which require braking, and their reaction times are recorded. The subjects repeat the experiment at the other volume level. The order of the conditions is randomly assigned for each subject.

**What type of experiment design is this?**

Matched pairs

Each subject is receiving *both* treatment conditions—the two volumes of music—in a random order.
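When every subject receives both treatments, the only thing left to randomize is the order. A minimal Python sketch follows; the condition labels `"low volume"` and `"high volume"` and the function name `randomize_order` are assumptions for illustration, since the example only says there are two volume levels.

```python
import random

def randomize_order(subjects, conditions, seed=None):
    """Give each subject every condition, in an independently shuffled order."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        order = list(conditions)
        rng.shuffle(order)
        orders[subject] = tuple(order)
    return orders

# Hypothetical labels for the 100 volunteers and the two volume levels.
subjects = [f"volunteer {i}" for i in range(1, 101)]
orders = randomize_order(subjects, ["low volume", "high volume"], seed=3)

for subject in subjects[:5]:
    print(subject, "->", " then ".join(orders[subject]))
```

Each volunteer serves as their own comparison, so the two reaction-time measurements for a volunteer form the matched pair.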

### Example 4: Matched pairs

An insurance company wants to study whether offering incentives for preventative care reduces overall health care costs. They select a random sample of $200$ of their customers.

The company ranks the customers by their total health care costs for the previous year. Working down the ranked list two customers at a time, it randomly assigns one customer in each pair to be offered the incentive and the other to the control group.

**What type of experiment design is this?**

Matched pairs

---

The customers were first paired with another customer who was as similar as possible on a common trait (the previous year's health care costs). Within each pair, it was then randomly determined which member was offered the incentive and which was assigned to the control group, so this study used a matched pairs design.
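The rank-then-pair procedure can be sketched in Python as follows. The customer ids and cost figures are made up for illustration, and `matched_pairs` is a hypothetical helper name, not something from the original example.

```python
import random

def matched_pairs(units, key, seed=None):
    """Rank units by `key`, pair neighbors, and randomize within each pair."""
    rng = random.Random(seed)
    ranked = sorted(units, key=key)
    assignment = {}
    # Step through the ranked list two at a time; with an odd number of
    # units, the last one is left unassigned in this sketch.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance alone decides who gets the incentive
        assignment[pair[0]] = "incentive"
        assignment[pair[1]] = "control"
    return assignment

# Hypothetical previous-year health care costs for 8 customers.
costs = {"c1": 1200, "c2": 300, "c3": 4500, "c4": 950,
         "c5": 310, "c6": 4300, "c7": 1250, "c8": 900}

assignment = matched_pairs(costs, key=costs.get, seed=5)

for customer in sorted(costs, key=costs.get):
    print(customer, costs[customer], "->", assignment[customer])
```

Pairing neighbors in the ranked list means the two members of each pair had similar costs last year, so a difference in this year's costs within a pair is less likely to reflect a pre-existing difference.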

### Example 5: Randomized block

A certain disease is classified into $4$ stages that distinguish how developed the disease is. Researchers studying a new potential treatment recruited over $100$ patients with varying stages of the disease for their study. Half of the patients with stage $1$ were randomly assigned to receive the new treatment, and the other half of the stage $1$ patients received a placebo. A similar strategy was used for patients in each of the other stages.

**What type of experiment design is this?**

A randomized block design with the stages as the blocks

  
The subjects are first split into groups (stages $1$ through $4$) before random assignment. This is called *blocking*. Random assignment is then carried out within each block, so each stage acts as a block.
