
# Introduction to Experimental Design

____

# Mixed models recap

- ### Include both fixed ($\beta$) and random ($u$) effects
  + ### $y = \mathbf{X}\beta + \mathbf{Z}u + e$
  
- ### Solutions for fixed effects are called **BLUEs** (best linear unbiased estimates)
- ### Solutions for random effects are called **BLUPs** (best linear unbiased predictions)

  
---

- ## As with the residual $e$ in a fixed effects model, we assume a distribution for the random effects: $u \sim N(0,\mathbf{G})$

- ## In the simplest case $\mathbf{G} = \sigma_{g}^{2}\mathbf{I}$

- ## Advantages of random effects 
 + ### Providing structures for information sharing
 + ### Proper accounting for covariance among observed phenotypes
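
For reference, the BLUEs and BLUPs can be obtained jointly from Henderson's mixed model equations. This is a sketch under one added assumption: the residuals follow $e \sim N(0,\mathbf{R})$, where $\mathbf{R}$ is a symbol introduced here and not defined on the slides above.

$$
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}y \\ \mathbf{Z}'\mathbf{R}^{-1}y \end{bmatrix}
$$

The $\mathbf{G}^{-1}$ term is what shrinks the random effect solutions toward zero and lets related levels share information.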

# Mixed model history

![](images/henderson.png)

---
# The perfect experiment

### The goal is to compare the height of several different varieties.
- ### Make everything exactly the same between varieties 
 + ### Exactly the same amount of light
 + ### Exactly the same soil
 + ### Monitor fertilizer so every plant has exactly the same nutrients

---
# Key principles
1. ### Randomize
2. ### Replicate
3. ### Control for nuisance variation (blocking)
4. ### Avoid confounding!

# Completely Randomized Design


- ### Treatments are randomly allocated to experimental units
- ### Characterized by number of reps and treatments
- ### This design does not explicitly control for major sources of variation but, through randomization, avoids confounding treatments with nuisance sources of variation
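
A minimal Python sketch of this randomization, assuming four hypothetical treatments labeled A to D with three reps each. The function name `crd_layout` and the seed are illustrative choices, not part of the course material.

```python
import random

def crd_layout(treatments, reps, seed=None):
    """Assign each treatment to `reps` experimental units in random order."""
    rng = random.Random(seed)
    # One entry per experimental unit, then shuffle across the whole experiment.
    units = [t for t in treatments for _ in range(reps)]
    rng.shuffle(units)
    return units

# Hypothetical example: 4 treatments x 3 reps = 12 experimental units.
layout = crd_layout(["A", "B", "C", "D"], reps=3, seed=1)
print(layout)
```

Position `i` in the returned list is the treatment assigned to experimental unit `i`.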



![](images/CRD.png)


---

# Randomized Complete Block Design


- ### Treatments are randomly allocated to experimental units within blocks

- ### Characterized by blocks (reps) and treatments

- ### Randomization is restricted to within blocks

- ### Blocks are used to control for major sources of variation (benches in a greenhouse, control chambers, locations for field experiments)
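
The restricted randomization can be sketched in Python as follows. The treatment labels and the function name `rcbd_layout` are hypothetical; the only point is that shuffling happens separately inside each block.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one randomized ordering of all treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # randomization is restricted to this block
        layout.append(block)
    return layout

# Hypothetical example: 4 treatments in 3 complete blocks.
blocks = rcbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=1)
for i, block in enumerate(blocks, start=1):
    print(f"block {i}: {block}")
```

Every block contains every treatment exactly once, which is what makes the blocks "complete".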






![](images/RCBD.png)



---

# Latin Square


- ### Two-way blocking
- ### Number of rows = number of columns = number of treatments
- ### Created by cycling and then permuting rows and columns
- ### Each treatment occurs in every row and column
- ### The Randomized Complete Block Design controls for one source of variation; the Latin Square controls for two (row and column effects)
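
The "cycle, then permute" construction can be sketched in Python. The function name `latin_square` and the treatment labels are illustrative assumptions.

```python
import random

def latin_square(treatments, seed=None):
    """Build a cyclic Latin square, then permute its rows and columns."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic square: row i is the treatment list shifted by i positions.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)          # permute rows
    col_order = list(range(n))
    rng.shuffle(col_order)       # permute columns
    return [[row[c] for c in col_order] for row in square]

square = latin_square(["A", "B", "C", "D"], seed=1)
for row in square:
    print(" ".join(row))
```

Permuting whole rows and whole columns keeps the defining property intact: each treatment still appears once in every row and once in every column.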



![](images/latin.png)


---

# Benefits of blocking

- ### More homogeneous conditions for comparing treatments
- ### Can ease experimental logistics
- ### General rule: block what you can; randomize what you cannot.
---
# Maize RILs Data set





|location |rep |block | plot|RIL    | pollen| silking| ASI| height|
|:--------|:---|:-----|----:|:------|------:|-------:|---:|------:|
|ARC      |1   |4     |   28|RIL-1  |     73|      77|   4|  182.0|
|ARC      |2   |6     |   47|RIL-1  |     74|      79|   5|  169.2|
|ARC      |1   |1     |    6|RIL-11 |     69|      71|   2|  181.6|
|ARC      |2   |2     |    9|RIL-11 |     69|      72|   3|  178.0|
|ARC      |1   |7     |   52|RIL-12 |     73|      74|   1|  192.0|
|ARC      |2   |8     |   60|RIL-12 |     74|      74|   0|  186.0|

---
# CRD Analysis
 
### ANOVA Table

|term      | df|     sumsq|    meansq| statistic| p.value|
|:---------|--:|---------:|---------:|---------:|-------:|
|RIL       | 61| 44725.205| 733.20009|  9.042266|       0|
|Residuals | 62|  5027.324|  81.08588|        NA|      NA|

### Coefficient Estimates

|term        | estimate| std.error|  statistic|   p.value|
|:-----------|--------:|---------:|----------:|---------:|
|(Intercept) |    175.6|  6.367334| 27.5782623| 0.0000000|
|RILRIL-11   |      4.2|  9.004770|  0.4664195| 0.6425500|
|RILRIL-12   |     13.4|  9.004770|  1.4881002| 0.1417933|
|RILRIL-14   |     26.6|  9.004770|  2.9539900| 0.0044283|
|RILRIL-15   |     24.4|  9.004770|  2.7096751| 0.0086975|

---
# RCBD analysis

### ANOVA Table

|term      | df|     sumsq|     meansq| statistic| p.value|
|:---------|--:|---------:|----------:|---------:|-------:|
|RIL       | 61| 44725.205|  733.20009|  15.73300|       0|
|rep       |  1|  2184.561| 2184.56090|  46.87629|       0|
|Residuals | 61|  2842.764|   46.60268|        NA|      NA|

### Coefficient Estimates

|term        | estimate| std.error| statistic|   p.value|
|:-----------|--------:|---------:|---------:|---------:|
|(Intercept) | 179.7973|  4.865919| 36.950329| 0.0000000|
|RILRIL-11   |   4.2000|  6.826616|  0.615239| 0.5406855|
|RILRIL-12   |  13.4000|  6.826616|  1.962905| 0.0542216|
|RILRIL-14   |  26.6000|  6.826616|  3.896514| 0.0002451|
|RILRIL-15   |  24.4000|  6.826616|  3.574245| 0.0006937|
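
The effect of blocking can be read off the two ANOVA tables with a little arithmetic. The Python sketch below recomputes the F statistics and the standard errors of RIL differences from the tabulated mean squares. It assumes two observations per RIL, which is inferred from the degrees of freedom (62 RILs, 124 plots) rather than stated on the slides.

```python
from math import sqrt

ms_ril = 733.20009    # RIL mean square (identical in both tables)
mse_crd = 81.08588    # residual mean square, CRD model
mse_rcbd = 46.60268   # residual mean square, RCBD model (rep removed)
reps = 2              # assumed: two observations per RIL

# F statistic = treatment mean square / residual mean square
f_crd = ms_ril / mse_crd
f_rcbd = ms_ril / mse_rcbd

# Standard error of a difference between two RIL means
se_diff_crd = sqrt(2 * mse_crd / reps)
se_diff_rcbd = sqrt(2 * mse_rcbd / reps)

print(f"F:       CRD {f_crd:.3f}  RCBD {f_rcbd:.3f}")
print(f"SE diff: CRD {se_diff_crd:.3f}  RCBD {se_diff_rcbd:.3f}")
```

The RIL estimates are unchanged between the two models; what changes is the residual mean square, and with it the standard errors and test statistics.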



---

# Factorial designs
- ### Obtained by crossing different factors in all possible combinations

- ### Useful for studying interactions

- ### In the RIL experiment we have locations; within each location we have reps, and within each rep we have RILs

---
# Interactions


|term         |  df|      sumsq|      meansq|  statistic|
|:------------|---:|----------:|-----------:|----------:|
|RIL          |  61| 154937.532|  2539.95954|  39.144816|
|location     |   3|  84931.331| 28310.44374| 436.308966|
|rep:location |   4|   3594.224|   898.55611|  13.848179|
|RIL:location | 183|  20999.404|   114.75084|   1.768493|
|Residuals    | 244|  15832.240|    64.88623|         NA|

---
# Nesting
## Sometimes it does not make sense to have a main effect for a factor.
## Rep 1 in location 1 has nothing to do with rep 1 in location 2. A main effect for rep makes no biological sense, so we nest rep within location.

|term         |  df|      sumsq|      meansq|  statistic|
|:------------|---:|----------:|-----------:|----------:|
|RIL          |  61| 154937.532|  2539.95954|  39.144816|
|location     |   3|  84931.331| 28310.44374| 436.308966|
|rep:location |   4|   3594.224|   898.55611|  13.848179|
|RIL:location | 183|  20999.404|   114.75084|   1.768493|
|Residuals    | 244|  15832.240|    64.88623|         NA|
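
The degrees of freedom in this table follow from the layout. The Python sketch below reproduces them, assuming 62 RILs, 4 locations, and 2 reps per location. These counts are inferred from the df column, not stated on the slides.

```python
n_ril, n_loc, n_rep = 62, 4, 2  # assumed counts, inferred from the df column

df = {
    "RIL": n_ril - 1,
    "location": n_loc - 1,
    "rep:location": n_loc * (n_rep - 1),        # reps nested within location
    "RIL:location": (n_ril - 1) * (n_loc - 1),  # crossed factors
}
n_obs = n_ril * n_loc * n_rep
# Whatever is left of the total (n_obs - 1) goes to the residual.
df["Residuals"] = (n_obs - 1) - sum(df.values())

for term, value in df.items():
    print(f"{term:13s} {value}")
```

Note that the nested term gets `n_loc * (n_rep - 1)` degrees of freedom: one rep contrast per location, with no pooled main effect for rep.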


![](images/nesting.png)


---

# Split-plot design

- ### Two different experimental units: whole plot and sub plot
- ### Usually done for logistical/practical reasons (e.g. in field experiments it is often not feasible to randomize water treatments to individual plots within a block)
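
The two levels of randomization can be sketched in Python. The example assumes water treatment as the whole-plot factor, as in the bullet above, and a hypothetical variety factor on the subplots. The function name `split_plot_layout` and all labels are illustrative.

```python
import random

def split_plot_layout(whole_trts, sub_trts, n_blocks, seed=None):
    """Return (block, whole-plot treatment, subplot order) tuples."""
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        wp_order = list(whole_trts)
        rng.shuffle(wp_order)        # randomize whole-plot treatments within the block
        for wp in wp_order:
            sp_order = list(sub_trts)
            rng.shuffle(sp_order)    # randomize subplot treatments within the whole plot
            layout.append((block, wp, sp_order))
    return layout

plan = split_plot_layout(["irrigated", "dry"], ["V1", "V2", "V3"], n_blocks=2, seed=1)
for block, wp, subplots in plan:
    print(block, wp, subplots)
```

Because the whole-plot factor is randomized to larger units than the subplot factor, the two factors are tested against different error terms in the analysis.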




![](images/split-plot.png)


---

# Aside: least-squares means



- ### Getting the overall effect of a factor averaged over other factors 
- ### Can be obtained using R package `emmeans` for lm objects or `predict.mmer()` for sommer mmer objects.




|RIL    |  lsmean|       SE|  df| lower.CL| upper.CL|
|:------|-------:|--------:|---:|--------:|--------:|
|RIL-1  | 182.100| 2.847943| 244| 176.4903| 187.7097|
|RIL-11 | 182.875| 2.847943| 244| 177.2653| 188.4847|
|RIL-12 | 185.200| 2.847943| 244| 179.5903| 190.8097|
|RIL-14 | 194.250| 2.847943| 244| 188.6403| 199.8597|
|RIL-15 | 195.775| 2.847943| 244| 190.1653| 201.3847|
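
In a balanced experiment the standard error in this table can be recovered by hand. The Python sketch below assumes each RIL mean averages 8 plots (4 locations x 2 reps, inferred from the ANOVA degrees of freedom) and uses the residual mean square 64.88623 from the multi-location ANOVA table.

```python
from math import sqrt

mse = 64.88623      # residual mean square from the multi-location ANOVA
n_per_ril = 4 * 2   # assumed: 4 locations x 2 reps per RIL

# Standard error of a RIL mean in a balanced design
se = sqrt(mse / n_per_ril)

# Half-width of the RIL-1 interval as tabulated, and the multiplier it implies
half_width = (187.7097 - 176.4903) / 2
multiplier = half_width / se

print(f"SE = {se:.6f}, implied multiplier = {multiplier:.3f}")
```

The implied multiplier is close to 1.97, consistent with a 95% interval based on a t distribution with 244 residual degrees of freedom.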



---

# Limitations of complete blocks

- ### Many treatments increase block size, which increases heterogeneity within blocks.

- ### The same level of replication is required for each treatment. This may be infeasible due to limited resources (e.g. not enough seed for each variety).

- ### Not all treatments may fit in one block (growth chamber, greenhouse bench).


---

# Incomplete block designs

- ### Useful when there are many treatments
- ### Can be balanced, meaning every pair of treatments occurs together in a block the same number of times
- ### Characterized by:
 + ### $t$ = treatments
 + ### $k$ = block size
 + ### $b$ = number of blocks
 + ### $r$ = replication of treatments
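
For a balanced incomplete block design these four parameters are tied together by two standard counting conditions: $bk = rt$, and $\lambda = r(k-1)/(t-1)$ must be a whole number, where $\lambda$ (a symbol introduced here) is the number of blocks in which any pair of treatments occurs together. The Python sketch below checks both. The function name `bibd_check` is illustrative, and the conditions are necessary but not sufficient for a design to exist.

```python
def bibd_check(t, k, b, r):
    """Return lambda if (t, k, b, r) passes the necessary BIBD conditions, else None."""
    if b * k != r * t:       # total plots counted by blocks must equal count by treatments
        return None
    pairs = r * (k - 1)
    if pairs % (t - 1) != 0: # lambda must be a whole number
        return None
    return pairs // (t - 1)

print(bibd_check(t=7, k=3, b=7, r=3))  # 7 treatments in 7 blocks of size 3
print(bibd_check(t=6, k=3, b=4, r=2))  # fails the whole-number condition
```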


---
# Problems with unbalanced data

- ### When data are unbalanced, the factors are not orthogonal
- ### The order in which terms are fitted affects the results
- ### Sequential (Type I) ANOVA p-values from an `lm()` fit depend on term order and can be misleading
- ### Use a likelihood ratio test (LRT) instead
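
The arithmetic of an LRT is simple once both models are fitted. The Python sketch below handles the one-degree-of-freedom case, assuming the two models are nested and were fitted by maximum likelihood. The log-likelihood values are hypothetical placeholders, not results from the RIL data.

```python
from math import erfc, sqrt

def lrt_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for a full and a reduced model.
stat, p = lrt_1df(loglik_full=-410.2, loglik_reduced=-413.9)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

Unlike sequential sums of squares, this comparison depends only on which term is dropped, not on the order in which the remaining terms were fitted.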

---
# Augmented designs

- ### Randomize controls (checks) according to the experimental design
- ### Augment design with unreplicated entries
- ### Replicated controls are used to correct for nuisance parameters and estimate error
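
One way to lay out such a design is sketched below in Python: every block receives all checks, and the unreplicated entries are spread across blocks. The function name `augmented_layout`, the check names, and the entry labels are hypothetical.

```python
import random

def augmented_layout(checks, entries, n_blocks, seed=None):
    """Replicate checks in every block; place each entry in exactly one block."""
    rng = random.Random(seed)
    entries = list(entries)
    rng.shuffle(entries)
    layout = []
    for b in range(n_blocks):
        # Every block gets all checks plus its share of the unreplicated entries.
        block = list(checks) + entries[b::n_blocks]
        rng.shuffle(block)  # randomize plot order within the block
        layout.append(block)
    return layout

new_lines = [f"E{i}" for i in range(1, 11)]  # 10 hypothetical unreplicated entries
design = augmented_layout(["CheckA", "CheckB"], new_lines, n_blocks=3, seed=1)
for i, block in enumerate(design, start=1):
    print(f"block {i}: {block}")
```

Only the checks contribute to the estimates of block effects and error, which is why they must appear in every block.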

---



# Analysis
## The key to modeling any of these experimental designs is to understand the layout of the experiment.
## Simple visualizations of the layout can help identify how factors may be nested. Never assume that an experimental design was implemented correctly.
## Failing to properly nest effects can lead to invalid results and conclusions.

# Power Analysis
## Prior to conducting a major experiment it is preferable to conduct a power analysis to determine optimal designs.
## When possible, power analysis and experimental designs should be determined using pilot experiments or previously collected data. Pilot experiments provide information on:
### * Residual variance
### * Factors and nuisance variables that need to be accounted for in the layout of the experiment.

# How to perform power analysis
## First you need to understand the statistical test(s) you will perform on the experimental data.
## * Using information (from a pilot experiment) or your best assumptions on residual variance you can construct the expected distribution of the test statistic.
## * Power is estimated by calculating the probability that a test statistic will exceed the corresponding critical value. This probability will be based on the residual variation, replication, and the magnitude of the effect you want to be able to detect.
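
As a rough illustration, the Python sketch below approximates power for detecting a difference between two treatment means. It swaps in a normal approximation for the exact t or F distribution, so it will be optimistic when residual degrees of freedom are small. The residual variance is the RCBD residual mean square from the earlier table, used as if it came from a pilot experiment; the 10-unit difference in height is a hypothetical target effect.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, resid_var, reps, alpha=0.05):
    """Normal-approximation power for a two-sided test of a difference in means."""
    z = NormalDist()
    se = sqrt(2.0 * resid_var / reps)        # SE of a difference between two means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)    # critical value of the test statistic
    shift = abs(delta) / se                  # expected test statistic under the effect
    # Probability the statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical target: detect a 10-unit height difference.
for reps in (2, 4, 8):
    print(reps, round(approx_power(delta=10.0, resid_var=46.60268, reps=reps), 3))
```

Running the loop over candidate replication levels shows how power grows with the number of reps, which is the trade-off a design has to balance against cost.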

# Your final project will be to analyze a pilot experiment and develop an appropriate experimental design.