v0.3.2 (March 23, 2021)
The wgsHLAfiltR R package extracts reads that map to classical HLA loci (HLA-A, -C, -B, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPA1 and -DPB1) from paired or individual whole-genome or whole-exome sequencing (WGS/WES) fastq.gz files, and writes those reads to a new set of HLA-only fastq.gz files.
The package was developed for use with the HLA|COVID-19 Database's Omixon CLI Explore Genotyping Portal, with the aim of minimizing FASTQ upload time and read processing time.
Currently, wgsHLAfiltR (version 0.3.2) only functions on Unix, Linux and macOS systems.
wgsHLAfiltR requires R v4.0.0 or higher to run.
Bowtie 2 (versions 2.3 - 2.4) must be installed on the system running R in order for wgsHLAfiltR to function.
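The two requirements above can be checked from the R console before installing. The following is a minimal sketch, not part of wgsHLAfiltR: the helper name checkPrerequisites() is hypothetical, and it assumes the Bowtie 2 executable is named "bowtie2" and is on the system PATH. It does not verify that the installed Bowtie 2 version falls in the 2.3 - 2.4 range.

```r
# Hypothetical helper (not a wgsHLAfiltR function): reports whether the
# running R version is at least 4.0.0 and whether an executable with the
# given name can be found on the PATH.
# Assumption: the Bowtie 2 executable is called "bowtie2".
checkPrerequisites <- function(bowtieExecutable = "bowtie2") {
  # getRversion() returns a version object that compares against a string
  rVersionOk <- getRversion() >= "4.0.0"

  # Sys.which() returns "" when the executable is not found
  bowtiePath <- unname(Sys.which(bowtieExecutable))

  list(
    rVersionOk   = rVersionOk,
    bowtie2Found = nzchar(bowtiePath),
    bowtie2Path  = bowtiePath
  )
}

prereq <- checkPrerequisites()
if (!prereq$rVersionOk) message("R v4.0.0 or higher is required.")
if (!prereq$bowtie2Found) message("Bowtie 2 was not found on the PATH.")
```

If Bowtie 2 is installed under a different executable name or outside the PATH, the check will report it as missing even though it is present.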
To install the wgsHLAfiltR package in the R environment, first install the devtools package by running the following command in the R console.
> install.packages("devtools")
With devtools installed, install wgsHLAfiltR by running the following command in the R console.
> devtools::install_github(repo="COVID-HLA/wgsHLAfiltR/wgsHLAfiltRpackage",ref="main")
Please note that the package includes more than 130 MB of reference alignment data, so installation may take longer than expected.
While the package includes several functions, filterHLA() is the main function for filtering HLA reads. This function takes two arguments: inputDirectory, which identifies the path to the directory containing the WGS/WES FASTQ files, and outputDirectory, which specifies the directory into which the HLA-only FASTQ files should be written.
> readFilteringData <- filterHLA(inputDirectory=inputDir,outputDirectory=outputDir)
A value for inputDirectory is required, but a value for outputDirectory is optional. If outputDirectory is not specified, HLA-only FASTQ files will be written into a directory named "Results" in the R working directory. If the "Results" directory is not present in the R working directory, one will be created.
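The default output behavior described above can be illustrated with a short sketch. This is not the package's own code: resolveOutputDir() and its workingDir argument are hypothetical names used only to show the rule, and the sketch makes no assumption about what filterHLA() does when a user-supplied outputDirectory does not exist.

```r
# Hypothetical illustration of the documented default (not wgsHLAfiltR code):
# when no output directory is given, use "Results" inside the working
# directory and create it if it is not already there.
resolveOutputDir <- function(outputDirectory = NULL, workingDir = getwd()) {
  # A user-supplied directory is returned unchanged
  if (!is.null(outputDirectory)) {
    return(outputDirectory)
  }

  resultsDir <- file.path(workingDir, "Results")
  if (!dir.exists(resultsDir)) {
    dir.create(resultsDir)
  }
  resultsDir
}

# Demonstration in a temporary directory, so the real working directory
# is left untouched
demoWd <- tempfile("wd")
dir.create(demoWd)
defaultDir <- resolveOutputDir(workingDir = demoWd)
```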
Note that Bowtie 2 requires file paths and file names that do not contain whitespace. See the Bowtie 2 Manual for additional details.
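Paths can be screened for whitespace before calling filterHLA(). The following is a minimal sketch; hasWhitespace() is a hypothetical helper, not a wgsHLAfiltR function, and the example paths are made up for illustration.

```r
# Hypothetical helper (not a wgsHLAfiltR function): TRUE for every path
# that contains a space, tab or other whitespace character.
hasWhitespace <- function(paths) {
  grepl("[[:space:]]", paths)
}

# Made-up example paths
examplePaths <- c(
  "/data/wgs/sample_R1.fastq.gz",
  "/data/my files/sample_R1.fastq.gz"
)

flagged <- examplePaths[hasWhitespace(examplePaths)]
if (length(flagged) > 0) {
  message("Paths containing whitespace: ", paste(flagged, collapse = ", "))
}
```

In practice the same check could be applied to the input directory path and to the output of list.files(inputDir, full.names = TRUE).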
The filterHLA() function extracts a "name" from each FASTQ file based on the position of the first underscore in the FASTQ filename.
For example, if a pair of fastq.gz files are named "ABC123-45DE_67FG_R1.fastq.gz" and "ABC123-45DE_67FG_R2.fastq.gz", filterHLA() will identify the name for that pair of files as "ABC123-45DE", and the resulting HLA-only fastq.gz files generated will be named "ABC123-45DE_HLA_1.fastq.gz" and "ABC123-45DE_HLA_2.fastq.gz".
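The naming rule in this example can be reproduced with a short sketch, which may help when predicting output file names or spotting input files that would share a name. The helpers extractName() and hlaFileNames() are hypothetical and are not the package's internal implementation. How filterHLA() treats a filename with no underscore is not documented here; the sketch simply returns such a filename unchanged.

```r
# Hypothetical helpers (not wgsHLAfiltR functions) that mirror the
# documented naming rule.

# Keep everything before the first underscore in the file name
extractName <- function(fastqFile) {
  sub("_.*$", "", basename(fastqFile))
}

# Build the pair of HLA-only output file names for a given name
hlaFileNames <- function(name) {
  paste0(name, "_HLA_", 1:2, ".fastq.gz")
}

inputFiles <- c("ABC123-45DE_67FG_R1.fastq.gz", "ABC123-45DE_67FG_R2.fastq.gz")
sampleName <- unique(extractName(inputFiles))
outputFiles <- hlaFileNames(sampleName)
```

Because only the text before the first underscore is used, files from different samples whose names begin with the same prefix would be assigned the same name under this rule.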
The filterHLA() function returns a list containing one named element for each FASTQ "name" that was processed. Each element is itself a named list that identifies the parameters and file paths used in the read extraction process.
For additional information about the wgsHLAfiltR package, the HLA|COVID-19 Database or the COVID-19|HLA & Immunogenetics Consortium, email covid.hla@gmail.com.