forked from Intel-HLS/GenomicsSampleAPIs
VCF to TileDB Import Usage
Nalini Ganapati edited this page Aug 3, 2018 · 1 revision
The VCF import process sorts, indexes, and compresses input VCFs, registers the appropriate items with MetaDB, produces the required configs for proper loading into TileDB, and (optionally) uses these configs to load the CSV into a specified TileDB array. A user can choose to either:
1. Perform a complete create and load process using `vcf2tile.py` to populate metadb, generate the required configs, and load Genomics DB.
2. Use `vcf2tile.py` to populate metadb and generate the required configs, but load Genomics DB using Genomics DB commands (see the Loading Genomics DB section for details).
- Create a TileDB Workspace if one doesn't already exist. Note that there can be multiple arrays in a given Workspace. Assuming GenomicsDB/bin is in your path (`export PATH=/path/to/GenomicsDB/bin:$PATH`), run: `create_tiledb_workspace /path/to/desired/Workspace`. The target workspace path must not already exist when you run this command.
- Edit `utils/example_configs/load_to_tile.cfg` to reflect the correct paths, as well as any optional preferred settings.
- Edit the tile loader json (as identified in `load_to_tile.cfg`) to specify the workspace and array name, as well as any desired optional settings. See the GenomicsDB documentation for more information on these fields.
- Specify a vcf config file as seen in the example, i.e. `store/utils/example_configs/vcf_import.config`. More information here.
- Make sure the workspace and array names are consistent across all config files.
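
The first step above ("create the workspace if one doesn't already exist") can be scripted so that reruns do not try to recreate an existing workspace. The following is a minimal sketch, not part of the repository: the helper names `workspace_creation_command` and `ensure_workspace` are hypothetical, and it assumes only what this page states, namely that `create_tiledb_workspace` takes the workspace path as its single argument and is reachable through `PATH`.

```python
import os
import shutil
import subprocess


def workspace_creation_command(workspace_path):
    """Return the argv that would create the workspace, or None if the path exists.

    Hypothetical helper: it only mirrors the command shown on this page.
    """
    if os.path.exists(workspace_path):
        # The workspace path must not already exist, so there is nothing to run.
        return None
    return ["create_tiledb_workspace", workspace_path]


def ensure_workspace(workspace_path):
    """Create the workspace only when it is missing.

    Returns True if create_tiledb_workspace was invoked, False if it was skipped.
    """
    command = workspace_creation_command(workspace_path)
    if command is None:
        return False
    if shutil.which(command[0]) is None:
        # Assumption from the page: GenomicsDB/bin has to be on PATH.
        raise FileNotFoundError(
            "create_tiledb_workspace not found; add GenomicsDB/bin to PATH first"
        )
    subprocess.run(command, check=True)
    return True
```

Calling `ensure_workspace("/path/to/desired/Workspace")` before editing the config files keeps the setup repeatable. Note that the existence check treats any existing path as "already created" and does not verify that the path is a valid TileDB workspace.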
- Inside the virtual environment, `cd utils`
- `python vcf2tile.py -h` to get the usage help.
- Run the vcf2tile script from inside the utils directory of the store repo, as follows:
python vcf2tile.py \
-c <path to project config file> \
-d <desired location to write output> \
-i <relative path to single or list of VCF files to be imported> \
-l <loader config to load data into tiledb (`example_configs/load_to_tile.cfg`)>

For the second option, follow the instructions under option #1 above, but run the script without the `-l` option, and then follow the instructions under Loading Genomics DB.
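
The only difference between the two options at the command line is whether `-l` is passed. The sketch below builds the `vcf2tile.py` argument list for either case. It is an illustration under assumptions rather than code from the repository: the function name `build_vcf2tile_command` is hypothetical, the file names in the usage note are placeholders, and it assumes that `-i` accepts several paths as separate arguments, which this page implies ("single or list of VCF files") but does not spell out. Check `python vcf2tile.py -h` before relying on it.

```python
import subprocess


def build_vcf2tile_command(project_config, output_dir, vcf_inputs, loader_config=None):
    """Build the argv for vcf2tile.py using the flags listed on this page.

    With loader_config set, the command matches option 1 (create and load).
    With loader_config left as None, -l is omitted, which matches option 2.
    """
    vcf_inputs = list(vcf_inputs)
    if not vcf_inputs:
        raise ValueError("at least one VCF path is required for -i")

    command = ["python", "vcf2tile.py", "-c", project_config, "-d", output_dir, "-i"]
    # Assumption: multiple VCF paths follow a single -i flag.
    command.extend(vcf_inputs)
    if loader_config is not None:
        command.extend(["-l", loader_config])
    return command


def run_vcf2tile(command, utils_dir="utils"):
    """Run the command from inside the utils directory, as the page instructs."""
    subprocess.run(command, cwd=utils_dir, check=True)
```

For example, `build_vcf2tile_command("project.cfg", "out", ["sample.vcf"], "example_configs/load_to_tile.cfg")` yields the option 1 invocation, while dropping the last argument yields the option 2 invocation that stops after generating the configs.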