Genomenon CVR BigQuery VCF Annotation Pipeline

Dependencies

The following dependencies are required to annotate your BigQuery variant data with the Mastermind Cited Variants Reference public dataset.

To install these dependencies on a Mac, you can use Homebrew.

These are needed to run Google Cloud queries from the command line:

- Google Cloud Platform Billing Account
- Google Cloud SDK

brew install --cask google-cloud-sdk
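
Before running the pipeline, it can help to confirm that the SDK's command-line tools are on your PATH. The following is a minimal sketch, not part of this repository: `have_tool` and `check_tools` are hypothetical helper names, and the check covers only `gcloud` and `bq`, which ship with the Google Cloud SDK.

```shell
# Succeed if the named command is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print one "missing: NAME" line per absent tool; fail if any is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! have_tool "$tool"; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# gcloud and bq are installed together with the Google Cloud SDK.
check_tools gcloud bq || echo "Install the Google Cloud SDK before running the pipeline."
```

The same helper could be reused for `bcftools` before the optional export step.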

Running the Pipeline

If you already have variant data in BigQuery tables, you can skip to step 2.

1. Setting up BigQuery and importing your data
2. Annotating your data with the Mastermind CVR
3. Exporting your annotated data

Commands are based on these instructions.

Use the -h flag to print help text for any of the commands.

All arguments that specify BigQuery tables should include both the dataset and table name, separated by a period. For example, if your dataset is dataset_1 and your table is table_1, the argument would be dataset_1.table_1.
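
As an illustration of the dataset.table convention, the sketch below checks an argument before it is passed to any of the commands. `is_table_ref` is a hypothetical helper, not part of this repository, and it assumes that dataset and table names contain only letters, digits, and underscores.

```shell
# Succeed only if the argument looks like "dataset.table".
# Assumption: names use letters, digits, and underscores only.
is_table_ref() {
  case "$1" in
    *[!A-Za-z0-9_.]*) return 1 ;;   # contains a character outside the assumed set
    *.*.*)            return 1 ;;   # more than one period
    .*|*.)            return 1 ;;   # empty dataset or empty table name
    *.*)              return 0 ;;   # exactly one period with text on both sides
    *)                return 1 ;;   # no period at all
  esac
}

is_table_ref dataset_1.table_1 && echo "dataset_1.table_1 is valid"
is_table_ref table_1 || echo "table_1 is missing its dataset"
```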

1. Setting up BigQuery and importing your data

1. Create a new project: ./create-project [project-id] [billing-id]
    1. List billing IDs: ./list-billing-ids
2. Set active project: ./set-active-project [project-id]
3. Create a dataset: ./create-dataset [dataset-name]
4. Create a bucket: ./create-bucket [bucket-name]
5. Upload VCF to bucket:
    1. Local file: ./upload-to-bucket [bucket-name] [path-to-file]
    2. From URL: ./upload-url-to-bucket [bucket-name] [URL]
6. Convert VCF to BigQuery table: ./vcf-to-bq [bucket-name] [bucket-vcf-path] [VCF-table]
    1. Wait for the task to finish: ./watch-task [task-id]
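
To see the setup steps as one sequence, the sketch below prints the commands in order without running anything. `print_setup_plan` is a hypothetical helper and every value passed to it is a made-up placeholder. The path of the VCF inside the bucket is taken as its own argument because this README does not say how it is derived, and `[task-id]` is left as a placeholder for the same reason.

```shell
# Print the step 1 commands in order without executing them.
# Arguments: project-id billing-id dataset bucket local-vcf bucket-vcf-path table
print_setup_plan() {
  project_id=$1 billing_id=$2 dataset=$3 bucket=$4
  local_vcf=$5 bucket_vcf_path=$6 table=$7
  cat <<EOF
./create-project ${project_id} ${billing_id}
./set-active-project ${project_id}
./create-dataset ${dataset}
./create-bucket ${bucket}
./upload-to-bucket ${bucket} ${local_vcf}
./vcf-to-bq ${bucket} ${bucket_vcf_path} ${dataset}.${table}
./watch-task [task-id]
EOF
}

# Hypothetical values, chosen only to show the shape of each argument.
print_setup_plan my-project 000000-AAAAAA-000000 my_dataset my-bucket \
  ./sample.vcf sample.vcf my_table
```

Reviewing the printed plan first makes it easier to spot a table argument that is missing its dataset prefix.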

2. Annotating your data with the Mastermind CVR

Set active project:
```
./set-active-project [project-id]
```

Annotate VCF BigQuery table with CVR:

./annotate-vcf [VCF-table] [output-table] [assembly-version] [reference-name-type]

Example:

./annotate-vcf my_dataset.my_table my_dataset.my_annotated_table GRCh37 chr

- [VCF-table]: The input dataset table you want to annotate.
    - To list project datasets: bq ls
    - To list the tables in a dataset: bq ls [dataset]
- [output-table]: The output dataset table you want to create with the annotated variants.
- [assembly-version]: The assembly version your variants are using, either GRCh37 or GRCh38.
- [reference-name-type]: The format of the reference_name column in your input table. It matches the #CHROM column of the original VCF file and is one of the following:
    - contig: For example, NC_000014.9
    - chr: For example, 14
    - chr_prefix: For example, chr14
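
If you are unsure which [reference-name-type] applies, the first data line of the original VCF can be inspected. The sketch below is one possible way to do that. Both function names are hypothetical, and the classification assumes that contig names start with NC_, as in the example above; other accession prefixes are not handled.

```shell
# Print the #CHROM value of the first data line in an uncompressed VCF file.
first_chrom() {
  awk -F'\t' '!/^#/ { print $1; exit }' "$1"
}

# Map one #CHROM value to the matching [reference-name-type] argument.
reference_name_type() {
  case "$1" in
    NC_*)  echo contig ;;       # assumed accession style, e.g. NC_000014.9
    chr?*) echo chr_prefix ;;   # "chr" followed by a name, e.g. chr14
    ?*)    echo chr ;;          # bare name, e.g. 14
    *)     echo "empty reference name" >&2; return 1 ;;
  esac
}

reference_name_type NC_000014.9   # prints: contig
reference_name_type 14            # prints: chr
reference_name_type chr14         # prints: chr_prefix
```

With a local file, the two helpers combine as `reference_name_type "$(first_chrom my.vcf)"`, where `my.vcf` is a placeholder path.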

You can also list the Genomenon Mastermind CVR public datasets available for annotating your data. There is a GRCh37 and a GRCh38 version for each date the CVR was released as a BigQuery public dataset:

./list-cvr-tables -v [assembly-version]
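
Both annotate-vcf and list-cvr-tables take an assembly version, which this README gives as GRCh37 or GRCh38. A small guard like the one below can catch other spellings, such as hg19, before a command is run. `is_assembly_version` is a hypothetical helper, and the sketch only prints the command it would run.

```shell
# Succeed only for the two assembly versions named in this README.
is_assembly_version() {
  case "$1" in
    GRCh37|GRCh38) return 0 ;;
    *)             return 1 ;;
  esac
}

assembly=GRCh37   # hypothetical value chosen for illustration
if is_assembly_version "$assembly"; then
  echo "./list-cvr-tables -v $assembly"
else
  echo "unsupported assembly version: $assembly" >&2
fi
```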

3. Exporting your annotated data

This is only needed if you want to export your annotated variant data from BigQuery to an annotated VCF file.

You will need bcftools to run this, which can be installed on Mac with Homebrew:

brew install bcftools

Generate representative header:

bcftools merge [vcf-file] [cvr-file] --print-header -O z -o [header.vcf.gz]

Upload header file to bucket:

./upload-to-bucket [bucket-name] [path-to-header-file]

Convert table to VCF:

./bq-to-vcf [bucket-name] [annotated-table] [header-bucket-path]

Genomenon/mastermind-cvr-bigquery
