Skip to content

Extra features

Jakob Russel edited this page Dec 16, 2018 · 3 revisions

Pre-process data

It is advised to filter low abundant features before running any analyses. preDA is a convenient function for doing this. It accepts the same input as testDA. It is grouping all the counts of the filtered features in a single feature named "Others". This is ensures that the total counts per sample is retained, such that filtering doesn't affect non-filtered features after normalization. Filtering can be based on number of samples the features are present in, the total number of reads for the features, the mean relative abundance of the features, or a combination of all three.

data.new <- preDA(data, min.samples = 2, min.reads = 10, min.abundance = 0) 

The above retains features that are at least present in two samples and at least have at total count of 10 across all samples.

Plot p-value histograms for all the non-spiked features

# When mytest is the output from the testDA function:
plot(mytest, p = TRUE)

Post-hoc tests for linear models and ANOVAs

For anova and linear models we can run post-hoc tests for all pairwise comparisons of multi-class predictor/covars.

### Apply TukeyHSD for each feature for a selected variable and output the adjusted p-values:
# Works on "aov", "lao", "lao2", "aoc", "aoa"
results <- DA.aov(data, predictor, allResults = TRUE)
res.tukey <- DA.TukeyHSD(results, variable = "predictor") # variable can also be the name of a covar

### Apply lsmeans for each feature for a selected variable and output the adjusted p-values:
# This requires the lsmeans package.
# Works on "poi", "neb", "lrm", "llm", "llm2", "qpo", "znb", "zpo", "lmc", "lma"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor") # variable can also be the name of a covar

# For paired "lrm", "llm", "llm2", "lma", "lmc" the original predictor variable has to be supplied. For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID,  allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred)

# and if covars are used, they also need to be supplied for paired "lrm", "llm", "llm2", "lma", "lmc". 
# For example:
results <- DA.lrm(data, predictor = mypred, paired = SubjectID, covars = list(covar1 = mycovar),  allResults = TRUE)
res.lsm <- DA.lsmeans(results, variable = "predictor", predictor = mypred, covars = list(covar1 = mycovar))

Test significance of covars

For linear models the drop1/anova functions can be used to test significance of the predictor and covars variables:

### Apply drop1 for each feature and output the adjusted p-values:
# Works on "zpo", "znb", "qpo", "neb", "poi". Non-paired "lrm", "llm", "llm2", "lma", "lmc"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.drop1 <- DA.drop1(results)

### Apply anova for each feature and output the adjusted p-values:
# Works on "lrm", "llm", "llm2", "lma", "lmc". Non-paired "neb"
results <- DA.lrm(data, predictor, allResults = TRUE)
res.anova <- DA.anova(results)

Adding user-defined methods

"zzz" (DA.zzz) is a placeholder for user-defined methods. You have to supply it with a function whose input is: A count_table (data.frame, samples are columns, rownames indicate features), a predictor (vector), a paired variable (factor), and a covars argument (named list with vectors). It is OK if your function doesn't use paired/covars, but they have to be there in the arguments. The output from the user-defined function should be a data.frame that at least includes: The names of the features ("Feature"), nominal p-values ("pval"), and adjusted p-values ("pval.adj"), and name of method ("Method").

See example below on how to include a simple t-test on relative abundances:

# Define our function
myfun <- function(count_table, predictor, paired, covars){ # These fours arguments should not be altered
  
  # Relative abundance
  rel <- apply(count_table, 2, function(x) x/sum(x))
  
  # t-test function
  ## Wrapping this function in tryCatch(..., error = function(e){NA}) 
  ## ensures that our main function won't fail if t.test fails on some features
  tfun <- function(x){
    tryCatch(t.test(x ~ predictor)$p.value, error = function(e){NA}) 
  }
  
  # P-values for each feature
  pvals <- apply(rel, 1, tfun)
  
  # Collect and return data
  df <- data.frame(Feature = rownames(count_table),
                   pval = pvals)
  df$pval.adj <- p.adjust(df$pval, method = "fdr")
  df$Method <- "My own t-test"
  return(df)
  
}

# Test that it works on real data
res <- DA.zzz(data, predictor, FUN = myfun)

# Use it in the testDA function
mytest <- testDA(data, predictor, tests = c("zzz","ttt","wil"), args = list(zzz = list(FUN = myfun)))

# Several user-defined methods can be included by simply putting numbers after "zzz" in both the "tests" and "args" arguments:
mytest <- testDA(data, predictor, tests = c("zzz","zzz2","ttt","wil"), args = list(zzz = list(FUN = myfun),
                                                                                   zzz2 = list(FUN = myfun2)))

Caution: If "zzz" is in the tests argument, there will be no checks of whether any of the supplied methods are suitable for the data. If e.g. your predictor is quantitative and "ttt" is in the tests argument, it will try to run a t-test and fail miserably.

Passing arguments to the different tests

Additional arguments can be passed to the internal functions with the args argument. It should be structured as a list with elements named by the tests:

E.g. passing to the DA.per function that it should only run 1000 iterations:

mytest <- testDA(...,args = list(per=list(noOfIterations=1000)))

Include that the log t.test should use a pseudocount of 0.1:

mytest <- testDA(...,args = list(per=list(noOfIterations=1000), ltt=list(delta=0.1)))

Additional arguments are simply seperated by commas.

Below is an overview of which functions get the arguments that are passed to a specific test:

  • per - Passed to DA.per
  • bay - Passed to getPriors, getLikelihoods and DA.bay
  • adx - Passed to aldex and DA.adx
  • wil - Passed to wilcox.test and DA.wil
  • ttt - Passed to t.test and DA.ttt
  • ltt - Passed to t.test and DA.ltt
  • tta - Passed to t.test and DA.lta
  • ttc - Passed to t.test and DA.ltc
  • ltt2 - Passed to t.test and DA.ltt2
  • neb - Passed to glm.nb, glmer.nb and DA.neb
  • erq - Passed to calcNormFactors, estimateDisp, glmQLFit, glmQLFTest and DA.erq
  • ere - Passed to calcNormFactors, estimateCommonDisp, estimateTagwiseDisp, exactTest and DA.ere
  • msf - Passed to fitFeatureModel and DA.msf
  • zig - Passed to fitZig and DA.zig
  • ds2 - Passed to DESeq and DA.ds2
  • ds2x - Passed to DESeq and DA.ds2x
  • lim - Passed to eBayes, lmFit and DA.lim
  • lia - Passed to eBayes, lmFit and DA.lia
  • lic - Passed to eBayes, lmFit and DA.lic
  • lli - Passed to eBayes, lmFit and DA.lli
  • lli2 - Passed to eBayes, lmFit and DA.lli2
  • kru - Passed to kruskal.test and DA.kru
  • aov - Passed to aov and DA.aov
  • aoa - Passed to aov and DA.aoa
  • aoc - Passed to aov and DA.aoc
  • lao - Passed to aov and DA.lao
  • lao2 - Passed to aov and DA.lao2
  • lrm - Passed to lm, lme and DA.lrm
  • lma - Passed to lm, lme and DA.lma
  • lmc - Passed to lm, lme and DA.lmc
  • llm - Passed to lm, lme and DA.llm
  • llm2 - Passed to lm, lme and DA.llm2
  • rai - Passed to raida and DA.rai
  • spe - Passed to cor.test and DA.spe
  • pea - Passed to cor.test and DA.pea
  • poi - Passed to glm, glmer and DA.poi
  • qpo - Passed to glm and DA.qpo
  • vli - Passed to voom, eBayes, lmFit and DA.vli
  • zpo - Passed to zeroinfl and DA.zpo
  • znb - Passed to zeroinfl and DA.znb
  • fri - Passed to friedman.test and DA.fri
  • qua - Passed to quade.test and DA.qua
  • sam - Passed to SAMseq and DA.sam
  • mva - Passed to manyglm and summary.manyglm