# Central Limit Theorem as it applies to genetic effects

The Central Limit Theorem states that, under certain conditions, the properly normalized sum of independent random variables tends toward a normal distribution, even when the individual variables are not themselves normally distributed. This has important implications for quantitative genetics. For complex traits, modeling the contribution of each individual locus to the observed phenotype is difficult and in most cases not feasible; however, the sum of the effects of all loci contributing to a phenotype of interest will tend toward a normal distribution.
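For reference, the classical form of the theorem can be written as follows. This version assumes independent, identically distributed variables $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2 > 0$, which is a stronger assumption than the simulation in this notebook satisfies: loci differ in effect size and allele frequency, so more general versions of the theorem (for example the Lindeberg or Lyapunov conditions) are the relevant ones there.

$$
\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \;\xrightarrow{d}\; \mathcal{N}(0, 1) \quad \text{as } n \to \infty
$$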

Here we implement some simple code to demonstrate empirically how the Central Limit Theorem leads to normally distributed genetic effects for complex traits.

In [None]:
#Set the number of loci to sample
NLoci=30
#Initializing a vector to store effects
effects=rep(0,NLoci)
#First we sample allelic values from n loci using a uniform distribution (i.e. not normal)
for(i in c(1:NLoci)){
  #runif samples a number from a uniform distribution runif(n,min,max)
  effects[i]=runif(1,0,1)
}

## Allele Frequencies

Next we will simulate the frequency of the beneficial allele at each locus. To do this, we treat the beneficial allele as the major allele (the one with the higher frequency) and sample its frequency from a uniform distribution with min = 0.5 and max = 0.95.

In [None]:
# Initialize a vector to contain major allele frequencies
freqs=rep(0,NLoci)
for(i in c(1:NLoci)){
  freqs[i]=runif(1,.5,.95)
}

## Parental Genotypes

Now that we have allele effects and frequencies, let's randomly generate two parental genotypes.

In [None]:
# Here we simulate the genotypes of two diploid parents
# Rather than storing the genotypes we will store the simulated allele effects
# Initialize a 2-column matrix for each parent to store the allele effects (one column per chromosome copy)
p1=matrix(0,NLoci,2)
p2=matrix(0,NLoci,2)


for(i in c(1:NLoci)){
  #generate uniform random numbers to sample allele effects
  #each if/else statement determines whether a parent carries the major or minor allele at one chromosome copy of this locus
  if(runif(1)<=freqs[i]){
     p1[i,1]=effects[i]
  } else {
     #the minor allele effect is set to be -1 x major allele effect (i.e. the heterozygote genotype effect will be zero)
    p1[i,1]=-1*effects[i]
  }
  if(runif(1)<=freqs[i]){
     p1[i,2]=effects[i]
  } else {
    p1[i,2]=-1*effects[i]
  }
  if(runif(1)<=freqs[i]){
     p2[i,1]=effects[i]
  } else {
    p2[i,1]=-1*effects[i]
  }
  if(runif(1)<=freqs[i]){
     p2[i,2]=effects[i]
  } else {
    p2[i,2]=-1*effects[i]
  }
}
print(p1)
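The four if/else blocks above repeat the same draw for each chromosome copy. As a minimal sketch, the same sampling can be written in vectorized form. The helper name `sample_parent` and the `demo_` variables are hypothetical and not part of the original notebook; they are defined here only so the sketch runs by itself without overwriting `p1` and `p2`.

```r
# Hypothetical helper: sample one diploid parent as an n x 2 matrix of allele effects
sample_parent <- function(effects, freqs) {
  n <- length(effects)
  # TRUE where the major allele is drawn; each locus uses its own frequency
  # (the matrix fills column-wise, so each column sees freqs[1..n] in order)
  is_major <- matrix(runif(2 * n) <= rep(freqs, times = 2), nrow = n, ncol = 2)
  # Major allele contributes +effect, minor allele contributes -effect
  ifelse(is_major, 1, -1) * effects
}

# Stand-in inputs drawn the same way as effects and freqs above
demo_NLoci <- 30
demo_effects <- runif(demo_NLoci, 0, 1)
demo_freqs <- runif(demo_NLoci, 0.5, 0.95)
demo_p1 <- sample_parent(demo_effects, demo_freqs)
demo_p2 <- sample_parent(demo_effects, demo_freqs)
```

Each row of the result holds the two allele effects at one locus, matching the layout of `p1` and `p2`.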

# Question
1) What type of distribution are the genotypes sampled from?

# Recombination frequencies

As we know, loci do not segregate completely independently. To account for this, we will sample a recombination frequency for each pair of adjacent loci.

In [None]:
# Each recombination frequency is sampled as a uniform random variable between 0 and 0.5
# Initialize a vector to contain the recombination frequencies
recomb=rep(0,NLoci)
# The first locus has no preceding locus, so 0.5 makes the starting chromosome a coin flip
recomb[1]=.5
for(i in c(2:NLoci)){
  recomb[i]=runif(1,0,.5)
}

# Simulate Progeny

Now we will simulate the progeny of the two simulated parents. We will use the simulated parental genotypes and recombination frequencies to do this.

In [None]:
#The number of progeny we will simulate
NProg= 250
# We will use an indicator variable called switch to track recombination for each parent
# The switch variable alternates between -0.5 and 0.5
# Rather than store the genotypes of the progeny, we will store the genotypic value by summing the inherited allele effects
# Initialize a vector to store the genotypic value of each progeny
prog=rep(0,NProg)
# Initializing the switch indicator variables (one per parent)
switchp1=.5
switchp2=.5
#loop through the progeny
for(i in c(1:NProg)){
  #loop through the loci
  for (j in c(1:NLoci)){
    #simulating recombination in each parent
    if(runif(1)<recomb[j]){
      switchp1=switchp1*-1
    }
    if(runif(1)<recomb[j]){
      switchp2=switchp2*-1
    }
    #setting the column index for the parent genotype matrices (switch + 1.5 gives column 1 or 2)
    cp1=switchp1+1.5
    cp2=switchp2+1.5
    #sum allele effects
    prog[i]=prog[i]+p1[j,cp1]+p2[j,cp2]

  }
}
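The inner loop above builds one gamete per parent by walking along the loci and switching chromosome copies whenever a crossover occurs. A minimal sketch of that step as a standalone function is shown below. The name `make_gamete` and the `demo_` inputs are hypothetical, not part of the original notebook. Unlike the loop, this version starts every gamete on column 1 rather than carrying the switch state over from the previous progeny; because the first recombination frequency is 0.5, the starting column should be a coin flip either way.

```r
# Hypothetical helper: draw one gamete from a parent's n x 2 allele-effect matrix
make_gamete <- function(parent, recomb) {
  n <- nrow(parent)
  # TRUE where a crossover occurs just before locus j
  crossover <- runif(n) < recomb
  # An odd running count of crossovers reads column 2, an even count reads column 1
  column <- (cumsum(crossover) %% 2) + 1
  # Matrix indexing picks one allele effect per locus
  parent[cbind(seq_len(n), column)]
}

# Toy parent: column 1 is all +1, column 2 is all -1
demo_parent <- cbind(rep(1, 5), rep(-1, 5))
# Free choice of starting chromosome, then complete linkage (no crossovers)
demo_recomb <- c(0.5, rep(0, 4))
demo_gamete <- make_gamete(demo_parent, demo_recomb)
```

With complete linkage the toy gamete is one intact parental chromosome, so all five values share the same sign. Under this sketch, a progeny's genotypic value would be `sum(make_gamete(p1, recomb)) + sum(make_gamete(p2, recomb))`.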



# Question
1) Does the simulation use additive, dominance, or epistatic effects, or some combination of these?

## Histogram of progeny genotypic values

In [None]:
hist(prog)
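A histogram alone can make it hard to judge how close a distribution is to normal. One possible check, sketched below, overlays a normal curve with the sample mean and standard deviation and draws a normal Q-Q plot. The vector `demo_values` is a stand-in (sums of 30 uniform draws, not the genetic simulation) so that the sketch runs by itself; substituting `prog` would apply the same check to the simulated progeny.

```r
# Stand-in data: 250 sums of 30 uniform draws (an assumption for illustration only)
set.seed(42)
demo_values <- replicate(250, sum(runif(30, -1, 1)))

# Density-scaled histogram so a normal density can be drawn on the same axis
hist(demo_values, freq = FALSE, main = "Sum of 30 uniform draws", xlab = "Value")
# Normal curve using the sample mean and standard deviation
curve(dnorm(x, mean = mean(demo_values), sd = sd(demo_values)), add = TRUE)

# Points falling close to the reference line suggest approximate normality
qqnorm(demo_values)
qqline(demo_values)
```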