msgen - R functions for interfacing with the Microsoft Genomics service on Azure
Colby T. Ford, Ph.D.
Description
The Microsoft Genomics service on Azure performs secondary analysis of genome sequencing data using a cloud implementation of the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK). The pipeline takes multiple FASTQ and BAM files as input and produces alignment and variant outputs. The msgen package provides an interface to the service from within R.
R Package Installation
You can install the latest stable version from GitHub with the remotes package (install it first if needed), then load msgen:
remotes::install_github("colbyford/msgen")
library(msgen)
Usage
Submit a workflow
submit_workflow(subscription_key = "04afabfc...",
                region = "eastus",
                process = "snapgatk",
                reference = "b37m1",
                description = "Submission from cford/msgen R package.",
                input_storage_account_name = "mygenomicsstorage",
                input_storage_account_key = "6GyBAbvgw5sqo2...",
                input_container_name = "myinputdata",
                blob_name_1 = "chr21_1.fq.gz",
                blob_name_2 = "chr21_2.fq.gz",
                output_container_name = "myoutputdata")
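The call above embeds the subscription key and storage account key directly in the script. One way to keep them out of source files is to read them from environment variables. The sketch below uses only base R; the helper name get_required_env and the variable names MSGEN_SUBSCRIPTION_KEY and MSGEN_STORAGE_KEY are hypothetical and not part of msgen.

```r
# Hypothetical helper (not part of msgen): read a secret from an
# environment variable and fail early if it is missing or empty.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Usage (not run here). The variable names are assumptions; set them in
# ~/.Renviron or in your shell before starting R.
# submit_workflow(subscription_key = get_required_env("MSGEN_SUBSCRIPTION_KEY"),
#                 input_storage_account_key = get_required_env("MSGEN_STORAGE_KEY"),
#                 region = "eastus",
#                 process = "snapgatk",
#                 reference = "b37m1",
#                 input_storage_account_name = "mygenomicsstorage",
#                 input_container_name = "myinputdata",
#                 blob_name_1 = "chr21_1.fq.gz",
#                 blob_name_2 = "chr21_2.fq.gz",
#                 output_container_name = "myoutputdata")
```

Failing early with a clear message avoids sending a request with an empty key.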
List all your workflows
list_workflows(subscription_key = "04afabfc...",
               region = "eastus")
Check the status of your workflow
get_workflow_status(subscription_key = "04afabfc...",
                    region = "eastus",
                    workflow_id = "12g3c5a...")
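Workflows can take a while to finish, so you may want to check the status repeatedly instead of calling the function by hand. The sketch below is a generic polling helper in base R. The name poll_until and its arguments are hypothetical, and it makes no assumption about what get_workflow_status() returns: you supply the is_done function after inspecting a real response.

```r
# Hypothetical helper (not part of msgen): call check_fn() repeatedly
# until is_done(result) is TRUE, or stop after max_attempts checks.
poll_until <- function(check_fn, is_done, interval_sec = 60, max_attempts = 120) {
  for (attempt in seq_len(max_attempts)) {
    result <- check_fn()
    if (isTRUE(is_done(result))) {
      return(result)
    }
    # Wait between checks, but not after the final one
    if (attempt < max_attempts) {
      Sys.sleep(interval_sec)
    }
  }
  stop("Not finished after ", max_attempts, " checks.", call. = FALSE)
}

# Usage (not run here). Replace my_is_done with a function that tests
# whatever get_workflow_status() actually returns for a finished workflow.
# final_status <- poll_until(
#   check_fn = function() get_workflow_status(subscription_key = "04afabfc...",
#                                             region = "eastus",
#                                             workflow_id = "12g3c5a..."),
#   is_done = my_is_done
# )
```

Capping the number of checks keeps a stuck workflow from blocking the R session indefinitely.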
Cancel a workflow
cancel_workflow(subscription_key = "04afabfc...",
                region = "eastus",
                workflow_id = "12g3c5a...")
Links
- Medium Blog Post on this Package
- msgen Python 2.7 command-line client
- Microsoft Genomics service on Azure
- Microsoft Genomics Documentation
License
This open source R package is licensed under the Apache 2.0 License. See the LICENSE file for details.
Note: The Microsoft Genomics service, Azure, and the msgen Python command-line interface are all Copyright (c) Microsoft Corporation. All rights reserved.