The goal of yarn is to expedite large RNA-seq analyses by combining previously developed tools. yarn is meant to make it easier to compare conditions accurately by leveraging Bioconductor tools and a range of statistical and normalization techniques, while accounting for the heterogeneity and sparsity found in very large RNA-seq experiments.
You can install yarn from GitHub with:
# install.packages("devtools")
devtools::install_github("quackenbushlab/yarn")
A basic workflow looks like this in code:
- First, load the library.
library(yarn)
- Download the GTEx gene count data as an ExpressionSet object or load the sample skin dataset.
library(yarn)
data(skin)
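The code above only loads the bundled sample. For the download route mentioned in this step, a minimal sketch follows. It assumes that `downloadGTEx()` called with its default arguments returns the GTEx gene counts as an ExpressionSet; check the package help page before relying on it. The `run_download` flag is a hypothetical guard added here because the download is large.

```r
library(yarn)

# Assumption: downloadGTEx() with default arguments returns an ExpressionSet
# of GTEx gene counts. The guard flag below is hypothetical and only prevents
# an accidental large download.
run_download <- FALSE
if (run_download) {
  gtex <- downloadGTEx()
  dim(gtex)
}
```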
- Check for misannotation of gender or other phenotypes using group-specific genes
checkMisAnnotation(skin,"GENDER",controlGenes="Y",legendPosition="topleft")
- Decide which sub-groups should be merged
checkTissuesToMerge(skin,"SMTS","SMTSD")
- Filter lowly expressed genes
skin_filtered = filterLowGenes(skin,"SMTSD")
dim(skin)
dim(skin_filtered)
# Or filter group-specific genes, here those on chromosomes X, Y and MT
tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name")
# Keep only the genes on chromosomes X, Y and MT
tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name",keepOnly=TRUE)
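To make the idea of filtering lowly expressed genes concrete, here is a base R sketch of one common approach: keep a gene only if its counts per million (CPM) exceed a threshold in at least as many samples as the smallest group contains. This is an illustration under that assumption, not necessarily what `filterLowGenes` does internally. The helper `filter_low_counts` and the toy matrix are hypothetical.

```r
# Illustrative only: hypothetical helper, not yarn's implementation.
# Keeps genes whose CPM exceeds `threshold` in at least as many samples
# as the smallest group contains.
filter_low_counts <- function(counts, groups, threshold = 1) {
  # Scale each column (sample) to counts per million
  cpm <- t(t(counts) / colSums(counts)) * 1e6
  min_samples <- min(table(groups))
  keep <- rowSums(cpm > threshold) >= min_samples
  counts[keep, , drop = FALSE]
}

# Toy count matrix: 3 genes x 4 samples, two groups of two samples each
counts <- rbind(
  geneA = c(500, 620, 480, 510),
  geneB = c(0, 0, 1, 0),
  geneC = c(300, 0, 0, 280)
)
colnames(counts) <- paste0("s", 1:4)
groups <- c("A", "A", "B", "B")

filtered <- filter_low_counts(counts, groups)
rownames(filtered)
```

In the toy data, geneB is expressed in only one sample, which is fewer than the smallest group size of two, so it is dropped.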
- Normalize in a tissue or group-aware manner
plotDensity(skin_filtered,"SMTSD",main="log2 raw counts")
skin_filtered = normalizeTissueAware(skin_filtered,"SMTSD")
plotDensity(skin_filtered,"SMTSD",normalized=TRUE,main="Normalized")
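As a rough illustration of what "group-aware" normalization means, the base R sketch below applies plain quantile normalization separately within each group, so samples are made comparable within a tissue without forcing different tissues onto one shared distribution. This is a simplified stand-in chosen for clarity, not yarn's implementation; `normalizeTissueAware` may use a different method. All function names and the simulated data are hypothetical.

```r
# Illustrative only: hypothetical helpers, not yarn's implementation.

# Plain quantile normalization: every column gets the same set of values
# (the mean of the sorted columns), assigned according to its own ranks.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  reference <- rowMeans(sorted)
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}

# Apply the normalization separately to the samples of each group
normalize_by_group <- function(mat, groups) {
  out <- mat
  for (g in unique(groups)) {
    cols <- which(groups == g)
    out[, cols] <- quantile_normalize(mat[, cols, drop = FALSE])
  }
  out
}

# Simulated counts: 10 genes x 4 samples, two groups of two samples each
set.seed(1)
counts <- matrix(rpois(40, lambda = 20), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("s", 1:4)))
groups <- c("A", "A", "B", "B")

norm <- normalize_by_group(log2(counts + 1), groups)
```

After this step, samples in the same group share an identical distribution of values, while the distributions of groups A and B remain free to differ.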