### Learning objectives

1. AlphaSimR  
`AlphaSimR` is a package to simulate breeding populations and tasks. It is neither
completely intuitive nor completely well documented. We will ultimately want to use it
to optimize breeding schemes. As a way to introduce `AlphaSimR`, this
Rmarkdown script contains code to illustrate the "outbreak of variation" that
occurs when a heterozygous individual is self-fertilized. This phenomenon was
important in supporting the notion that quantitative traits were influenced by
many loci.  Here is an
[introduction](https://cran.r-project.org/web/packages/AlphaSimR/vignettes/intro.html)
showing code to simulate a very simple mass selection program.  Here is a
[deep dive](https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf)
into how `AlphaSimR` simulates traits.

2. Prepare a Homework  
Your homework will be to write a script that illustrates the interesting phenomenon of regression toward the mean between parents and progeny that we discussed in class.  

# Ordering of the script  
It's good to have all scripts in the same order with respect to standard tasks.  

### Load packages first  
If your script depends on external packages, load them at the beginning. This shows users early on what the script dependencies are.  

In [None]:
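# Load required packages, installing any that are missing
req_packages <- c("AlphaSimR", "tidyverse")

for (pkg in req_packages) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    # Attach the package after installing it; otherwise it stays unloaded
    library(pkg, character.only = TRUE)
  }
}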

Notice the conflicts report from loading tidyverse.  Two packages (`dplyr` and `stats`) both have a function called `filter`. Since `dplyr` was loaded *after* `stats`, a bare call to `filter` will go to the `dplyr` version.  It is dangerous to rely on the order in which packages were loaded to determine which `filter` function you get.  To prevent ambiguity, write either `dplyr::filter` or `stats::filter`.  Using that syntax will make your code more reproducible.
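
As a small illustration of namespace-qualified calls, the sketch below calls `stats::filter` explicitly, and calls `dplyr::filter` only if `dplyr` happens to be installed. The vector `x` and the data frame `df` are made up for this example.

```r
# stats::filter applies a linear filter; here, a 3-point moving average
x <- c(1, 2, 3, 4, 5)
moving_avg <- stats::filter(x, rep(1/3, 3))
print(moving_avg)  # the two end points are NA because the window is incomplete

# dplyr::filter keeps the rows of a data frame that satisfy a condition
if (requireNamespace("dplyr", quietly = TRUE)) {
  df <- data.frame(x = x)
  big_rows <- dplyr::filter(df, x > 3)
  print(big_rows)
}
```

Because each call names its package, the result does not depend on which package was attached last.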

### Set random seed  
AlphaSimR generates many random numbers (e.g., to simulate Mendelian random segregation).  If you want the result of the analysis to come out the same each time (there are pros and cons) you need to set the random seed. Note that `workflowr` does this by default. If you are not using that package, then be explicit.

In [None]:
random_seed <- 45678
set.seed(random_seed)
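
To see what setting the seed buys you, here is a minimal base R sketch. It uses `rnorm` as a stand-in for the random draws that `AlphaSimR` makes internally; the variable names are made up for this example.

```r
# Same seed, same random numbers
set.seed(45678)
first_draw <- rnorm(3)

set.seed(45678)
second_draw <- rnorm(3)

# Without resetting the seed, the generator just continues its stream
third_draw <- rnorm(3)

print(identical(first_draw, second_draw))  # TRUE: the seed was reset
print(identical(first_draw, third_draw))   # FALSE: a later part of the stream
```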

### Script parameters  
If the behavior of your script depends on parameters that you set, initialize them early on.

In [None]:
nFounders <- 100
nChr <- 10 # Number of chromosomes
nSitesPerChr <- 1000 # Number of segregating sites _per chromosome_
nQTLperChr <- 10 # Vary this parameter to get oligo- versus poly- genic traits
nF1s <- 200 # We are going to make F1s to test outbreak of variation
nF2s <- 200 # We are going to make F2s to test outbreak of variation

# Simulating some classical results  
This script uses `AlphaSimR` to simulate the "outbreak of variation" that arises when you self-fertilize a hybrid.  

### AlphaSimR populations  
The basic object of `AlphaSimR` is the population. To make founders, you first make founder haplotypes from a coalescent simulation, then you define simulation parameters that will link their genetic variation to phenotypic variation, then you make a first diploid population from the founder haplotypes.  

In [None]:
# Create haplotypes for founder population of outbred individuals
# Note: default effective population size for runMacs is 100
founderHaps <- AlphaSimR::runMacs(nInd=nFounders, nChr=nChr,
                                  segSites=nSitesPerChr)
# founderHaps <- AlphaSimR::quickHaplo(nInd=nFounders, nChr=nChr,
#                                  segSites=nSitesPerChr)

# New global simulation parameters from founder haplotypes
SP <- AlphaSimR::SimParam$new(founderHaps)
# Additive trait architecture
# By default, the genetic variance will be 1
SP$addTraitA(nQtlPerChr=nQTLperChr)

# Create a new population of founders
founders <- AlphaSimR::newPop(founderHaps, simParam=SP)
str(founders)

### Population information  
The population has `id`s. The `@mother` and `@father` ids are all zero because this population was made from founder haplotypes, and so does not have diploid parents. The genotypic values `gv` of the population are calculated for the trait created using `SP$addTraitA(nQtlPerChr=nQTLperChr)`. Given that there are `r nChr` chromosome`r ifelse(nChr > 1, "s", "")` and `r nQTLperChr` QTL per chromosome, and that each QTL has three possible genotypes, there are `3^(nChr*nQTLperChr)` = `r 3^(nChr*nQTLperChr)` possible multi-locus QTL genotypes, and so up to that many different genotypic values. The realized genotypic values are accessible with the function `gv(founders)`.
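
The count of possible genotypes can be checked by brute force for a tiny case. The sketch below enumerates all genotypes for two QTL with made-up additive effects. It scores a genotype as the allele dosage times the effect, which is a simplification for illustration and not necessarily how `AlphaSimR` scales genotypic values internally.

```r
n_qtl <- 2
effects <- c(0.5, -1.2)  # hypothetical additive effects, one per QTL

# Every combination of dosages 0, 1, 2 across the QTL
genotypes <- expand.grid(rep(list(0:2), n_qtl))
n_genotypes <- nrow(genotypes)
print(n_genotypes)  # 3^n_qtl

# Genotypic value of each genotype under the simplified additive scoring
gv_values <- as.vector(as.matrix(genotypes) %*% effects)
n_distinct <- length(unique(gv_values))
print(n_distinct)

# With 10 chromosomes and 10 QTL per chromosome the count is astronomically large
print(format(3^(10 * 10), digits = 3))
```

With effects that differ among loci, every genotype here gets its own genotypic value. If two loci had identical effects, some genotypes would share a value, which is why the count is an upper bound.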

From here, you can treat this population like a named vector using the square brackets extraction operator `[ ]`.  Extract individuals by their `@id` or just by their order in the population using an integer index. For example, pick five individuals from a population and list their ids. Then pick the one whose id comes first in alphabetical order. The ids are character strings, so they sort alphabetically rather than numerically.

In [None]:
test <- founders[c(2, 3, 5, 8, 13)]
testID <- test@id
firstID <- testID[order(testID)][1]
alphaInd <- test[firstID] # Extract individual with the first ID in order
print(testID)
print(alphaInd)
print(alphaInd@id)
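
The named-vector analogy, and the consequence of ids being character strings, can be seen in base R alone. In this sketch `ids` mimics the ids of the five individuals picked above, and `values` is a made-up named vector standing in for the population.

```r
ids <- as.character(c(2, 3, 5, 8, 13))
values <- setNames(c(10, 20, 30, 40, 50), ids)

# Character strings sort alphabetically, so "13" comes before "2"
sorted_alpha <- ids[order(ids)]
print(sorted_alpha)

# Convert to numeric first if you want numeric order
sorted_numeric <- ids[order(as.numeric(ids))]
print(sorted_numeric)

# Extraction by name and by position can pick different elements
by_name <- values["13"]   # the element whose name is "13"
by_position <- values[1]  # the first element, whose name is "2"
print(by_name)
print(by_position)
```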

### Outbreak of variation  
Emerson and East (1913) showed that if you crossed two inbreds, the hybrid had similar variation to each inbred, but if you then selfed the hybrid, the offspring varied more substantially. This code simulates that result. First, self the founders to homozygosity. The function `self` self-fertilizes individuals from the population. By default, it creates one selfed individual per parent (controllable with the parameter `nProgeny`), so this works nicely for single-seed descent.  

In [None]:
# Self-pollinate for a few generations
nGenSelf <- 3
inbredPop <- founders
for (gen in 1:nGenSelf){
  inbredPop <- AlphaSimR::self(inbredPop)
}

### Check homozygosity  
Just a sanity check that this has, in fact, created a population of `r nFounders` individuals that are appropriately homozygous.  Loci are coded 0, 1, 2. So `qtl == 1` represents the case where a locus is heterozygous. `sum(qtl == 1)` counts those cases.

In [None]:
qtl <- AlphaSimR::pullQtlGeno(inbredPop)
if (nrow(qtl) != nFounders) stop("The number of individuals is unexpected")
if (ncol(qtl) != nChr * nQTLperChr) stop("The number of QTL is unexpected")
fracHet <- sum(qtl == 1) / (nFounders * nChr * nQTLperChr)
cat("Expected fraction heterozygous", 1 / 2^nGenSelf, "\n",
    "Observed fraction heterozygous", fracHet, "+/-",
    round(2*sqrt(fracHet*(1-fracHet)/(nFounders*nChr*nQTLperChr)), 3), "\n")

> What was wrong with my reasoning about the Expected fraction heterozygous?

### Simulate outbreak of variation  
We will  
1. pick a random pair of inbred individuals  
2. cross that pair  
3. find out the variation in genotypic value among the pair's progeny  
4. pick a random F1 progeny  
5. self-fertilize that F1  
6. find out the variation in genotypic value among the F1's progeny  

*It's often useful to write very high-level outlines of what you want your code to do before you write the code.*  

We will assume a trait that has a heritability of 0.5 in the base, non-inbred population. An easy way to get that heritability is with genetic and error variances of 1.
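
The heritability arithmetic is h2 = varG / (varG + varE) = 1 / (1 + 1) = 0.5. The base R sketch below checks this numerically, assuming normally distributed and independent genotypic values and errors; the variable names are made up for this example.

```r
set.seed(1)
n <- 100000
var_g <- 1
var_e <- 1

g <- rnorm(n, mean = 0, sd = sqrt(var_g))  # genotypic values
e <- rnorm(n, mean = 0, sd = sqrt(var_e))  # environmental errors
p <- g + e                                 # phenotypes

h2_expected <- var_g / (var_g + var_e)
h2_observed <- var(g) / var(p)
cat("Expected heritability", h2_expected, "\n")
cat("Observed heritability", round(h2_observed, 3), "\n")
```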

### Pick random pair to make F1 hybrid

In [None]:
randomPair <- inbredPop[sample(nFounders, 2)]

### Cross the pair to make population of F1s  
The crossPlan is a **two-column matrix** with as many rows as the number of crosses you want to make, the first column is the `@id` or the index of the seed parent, and likewise for the pollen parent in the second column.  We will make `r nF1s` F1s from this random pair, so the matrix has `r nF1s` rows. You just want to cross individual 1 with individual 2, so each row contains 1:2.

In [None]:
crossPlan <- matrix(rep(1:2, nF1s), ncol=2, byrow=TRUE)
f1_pop <- AlphaSimR::makeCross(randomPair, crossPlan)
head(crossPlan)
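
It is worth seeing why `byrow=TRUE` matters when building the cross plan. This sketch uses a small, made-up number of crosses so the matrices are easy to read.

```r
n_crosses <- 4

# Filled row by row: every row is (1, 2), meaning parent 1 x parent 2
cross_plan <- matrix(rep(1:2, n_crosses), ncol = 2, byrow = TRUE)
print(cross_plan)

# An equivalent construction that binds two columns
alt_plan <- cbind(rep(1L, n_crosses), rep(2L, n_crosses))
same_plan <- all(cross_plan == alt_plan)
print(same_plan)

# Filled column by column (the default): rows become (1, 1) and (2, 2),
# which would specify selfs rather than crosses between the two parents
wrong_plan <- matrix(rep(1:2, n_crosses), ncol = 2)
print(wrong_plan)
```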

### Find out the phenotypic variation among F1s  
When you first make a population, `AlphaSimR` does not assume it has been phenotyped. You can phenotype it using the `setPheno` function.  Note that if you use `setPheno` on the same population a second time, that will overwrite the phenotypes from the first time. The genotypic variance can be retrieved using the function `varG`. Really, `varG` gives all we need to know, but of course that variance is generally not observable in non-simulated reality.

In [None]:
f1_pop <- AlphaSimR::setPheno(f1_pop, varE=1)
cat("Genotypic variance among F1s", round(AlphaSimR::varG(f1_pop), 3), "\n")
cat("Phenotypic variance among F1s", round(AlphaSimR::varP(f1_pop), 3), "\n")
hist(AlphaSimR::pheno(f1_pop), main="Histogram of F1 Phenotypes")

### Pick random F1

In [None]:
randomF1 <- f1_pop[sample(nF1s, 1)] # Sample among the nF1s F1s, not nFounders

### Make F2 and observe phenotypic variance  
Self-fertilize the random F1 with `self`, using `nProgeny` to make `r nF2s` F2s from that one parent. Then phenotype the F2s with `setPheno`, using the same error variance as for the F1s, so that the F1 and F2 variances can be compared directly.

In [None]:
f2_pop <- AlphaSimR::self(randomF1, nProgeny=nF2s)
f2_pop <- AlphaSimR::setPheno(f2_pop, varE=1)
cat("Genotypic variance among F2s", round(AlphaSimR::varG(f2_pop), 3), "\n")
cat("Phenotypic variance among F2s", round(AlphaSimR::varP(f2_pop), 3), "\n")
hist(AlphaSimR::pheno(f2_pop), main="Histogram of F2 Phenotypes")
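
The expected contrast between F1 and F2 genotypic variances can be reproduced without `AlphaSimR`. This base R sketch makes stronger assumptions than the simulation above: the two parents are fully homozygous and differ at every QTL, the loci are unlinked, and the additive effects are made up. Under those assumptions every F1 is heterozygous at every locus, and each F2 locus has dosage 0, 1, or 2 with probabilities 1/4, 1/2, 1/4.

```r
set.seed(2)
n_loci <- 20
n_progeny <- 2000
effects <- rnorm(n_loci)  # hypothetical additive effects

# F1: one allele from each homozygous parent, so dosage 1 at every locus
f1_geno <- matrix(1, nrow = n_progeny, ncol = n_loci)

# F2: selfing a heterozygote gives Binomial(2, 0.5) dosage at each unlinked locus
f2_geno <- matrix(rbinom(n_progeny * n_loci, size = 2, prob = 0.5),
                  nrow = n_progeny)

var_f1 <- var(as.vector(f1_geno %*% effects))
var_f2 <- var(as.vector(f2_geno %*% effects))

# Each locus contributes effect^2 * Var(dosage) = effect^2 * 0.5
expected_f2 <- sum(effects^2) / 2

cat("Genotypic variance among F1s", var_f1, "\n")
cat("Genotypic variance among F2s", round(var_f2, 3),
    "expected", round(expected_f2, 3), "\n")
```

In the `AlphaSimR` script the parents were selfed for only a few generations, so they retain some heterozygosity and the F1 genotypic variance need not be exactly zero.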

# Homework  
Illustrate regression to the mean between parents and offspring using `AlphaSimR`.  
1. (**1 pt**) You know how to create a population -- create a population of parents.  
2. (**1 pt**) You know how to get phenotypes from that population  
3. (**2 pts**) You want to randomly mate that population to get progeny that will be (we assume)
regressed to the mean. Create a `crossPlan` matrix like I did to generate the
F1s, except that each row should have randomly-picked parents, rather than 1 and
2 like for the F1s. There is also a command `AlphaSimR::randCross`.  Check out
its documentation. If you use that command, you will have to find the pedigree
of the progeny using the `@mother` and `@father` ids of the progeny population
and match those up to the parent population. If you make the crossPlan, then it
gives you the seed and pollen parent ids.
4. (**1 pt**) Having made the progeny population, phenotype it also  
5. (**2 pts**) Use each row of the `crossPlan` to find the two parents and calculate their phenotypic mean  
6. (**1 pt**) Make a scatterplot of the progeny phenotypes against the parent mean phenotypes  
7. (**2 pts**) What is the regression coefficient? Was there regression towards the mean?

### Homework grading  
  
I will subtract 0.5 points for every day the homework is late.  