Andersen Lab Linkage Mapping Pipeline
This package includes all data and functions necessary to complete a mapping for the phenotype of your choice using the recombinant inbred lines from Andersen, et al. 2015 (G3). Included with this package are the cross and map objects for this strain set as well a
markers.rds file containing a lookup table for the physical positions of all markers used for mapping.
This package presents a simple workflow for massively parallel linkage mapping experiments. As input, the functions take a data frame in "long" format with the following columns (column names specified below).
The strain name for each individual row.
The name of the condition to which that strain was exposed during the particular experiment. For example
etoposide. This column is optional, but plot names will be affected by an absence of a
condition column (e.g. ".mean.TOF" vs "amsacrine.mean.TOF").
The trait being measured. For example
The measured quantitative value for the phenotype. For example
You will first need a cross object to complete the mapping. Included with this package are several cross objects to choose from:
AF16xHK104cross- AF16/HK104 (C. briggsae)
N2xCB4856cross_full- N2/CB4856 QX and ECA strains with SNPs from whole-genome sequencing
Cross objects are either included or are downloaded from the internet. In order to access the objects in a mutable fashion, they should be read in, then assigned to another variable name as below:
load_cross_obj("N2xCB4856cross_full") cross <- N2xCB4856cross_full
You must have
wget installed to download cross objects. You can use homebrew to install wget. Use:
brew install wget
Merging the Phenotype Data Into the Cross Object
Phenotype data in the format outlined above can be merged with the cross object using the mergepheno function:
# Assuming the phenotype data is in a data frame named "pheno" phenocross <- mergepheno(cross, pheno)
As of this writing (7/24/2015) the genotypes for the first two sets of RIAILs are unclear and these strains do not contribute to the mapping. Consequently, it is advised to subset to only the second set of RIAILs prior to mapping when using the N2/CB4856 cross object, as below:
# Assuming the phenotype data is in a data frame named "pheno" phenocross <- mergepheno(cross, pheno, set = 2)
As of 4/10/2017 the most recent cross object, N2xCB4856cross_full, contains SNPs from whole-genome sequence data from three sets: QX1-239, QX240-598, and ECA1-667. Select the set of strains applicable for your mapping
Completing the Mapping
Mapping is completed by the
fsearch function. This function performs a mapping with forward search and returns a long-format data frame of all LOD scores across all traits in the given phenotype data.
In order to calculate significance cutoffs, the function needs to permute the phenotypes and remap a number of times to get either false discovery rate or genome-wide error rate. This step can take a while with 1000 iterations (the default value), so it is suggested that the number of permutations for testing large datasets locally be reduced. This will reduce the accuracy of the mappings, but will expedite exploratory analyses.
fsearch(cross, permutations = 10)
If your computer has an NVIDEA graphics card and you have installed the gputools package, you can alternatively enable
doGPU which will massively expedite the mapping process and allow you to complete the recommended 1000 permutations.
fsearch(cross, permutations = 1000, doGPU = TRUE)
If you only need to map a subset of a much larger trait set, you can set a value to the
phenotype parameter. This parameter and the subsequent trait selection follow the rules outlined by the
contains functions from the
fsearch(cross, phenotype = "bleomycin", permutations = 1000, doGPU = TRUE)
Set the markerset to be used for marker conversion at the end of mapping. If you use the N2xCB4856cross_full cross object, set the markerset to NA. Default is "N2xCB4856", other options are "N2xLSJ2" or "AF16xHK104".
fsearch(cross, phenotype = "bleomycin", permutations = 1000, doGPU = TRUE, markerset = NA)
Choosing a Thresholding Method
Without overwhelming you with the details, you should generally choose false discovery rate (FDR) as your thresholding method if you are mapping more than 100 traits. Otherwise use genome-wide error rate (GWER).
Traits > 100
fsearch(cross, threshold = "FDR")
Traits < 100
fsearch(cross, phenotype = "bleomycin", threshold = "GWER")
Annotating LOD scores
The final step in completing a mapping is to annotate all peak LOD scores with confidence interval bounds, effect size, and variance explained. This can be completed as below:
Below is a complete example of a mapping script run start to finish
# Install and load the package devtools::install_github("AndersenLab/linkagemapping") library("linkagemapping") # Get the cross object data("N2xCB4856cross") cross <- N2xCB4856cross # Get the phenotype data pheno <- readRDS("~/Dropbox/AndersenLab/LabFolders/PastMembers/Tyler/ForTrip/RIAILs2_processed.rds") # Merge the cross object and the phenotype data cross <- mergepheno(cross, pheno) # Perform a mapping with only 10 iterations of the phenotype data for FDR calc map <- fsearch(cross, permutations = 10) # Annotate the LOD scores annotatedlods <- annotate_lods(map, cross)
In order too keep marker positions up to date, it will be necessary to use Dan's liftover utilities occasionally. For the marker positions to be updated, the markers data files in the
/data subdirectory of the package must be updated as well as the internal markers files in
R/sysdata.rda. Use the
use_data function from the devtools package in order to update both the user accessible as well as the internal data sets. Additionally, make sure to update the documentation (in
R/datasets.R) to reflect the new genome build from which the marker positions are taken.