
PacificBiosciences/reference_genomes


Reference genomes and annotations for PacBio data

About

While PacBio HiFi human data can be aligned to any reference genome, our tool development has focused on GRCh38. This repository contains a curated set of reference genomes and annotations for use with HiFi data. It also includes a table showing, for each reference genome, whether the annotation files (BED files) needed by PacBio-developed or PacBio-compatible tools have been released. We will continue to update this repository as new reference genomes and annotations become available.

Thanks to Heng Li for his 2017 blog post on the topic of human reference genomes, which was an influence on our early decisions.


Reference genomes

GRCh38 / hg38
Name: human_GRCh38_no_alt_analysis_set
Download bundle: tar.gz, md5
Use case: Choose this reference to take advantage of the full suite of HiFi variant calling tools and resources. PacBio tool development is primarily focused on the linked "no_alt analysis set" for GRCh38, which is equivalent to the hg38 reference that can be downloaded in SMRT Link.

GRCh37 / hg19
Name: human_hs37d5
Download bundle: tar.gz, md5
Use case: Choose this reference if you are limited to analysis in hg19.

CHM13 T2T
Name: human_chm13v2p0_maskedY_rCRS
Download bundle: tar.gz, md5
Use case: Choose this reference if you are interested in variation in regions poorly assembled in GRCh38.

Tools and resources:
DeepVariant
pbsv: tandem repeat annotations
HiPhase
HiFiCNV: expected CN and excluded regions
TRGT: repeat definitions
Paraphase: defined regions (161 regions; 11 regions)
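Each bundle above is distributed alongside an md5 checksum. The following is a minimal sketch of how a downloaded bundle could be checked in Python. It assumes the .md5 file follows the common `md5sum` layout (`<hex digest>  <file name>`); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path: Path) -> str:
    """Read the expected digest, assuming md5sum-style lines: '<digest>  <filename>'."""
    first_line = md5_path.read_text().strip().splitlines()[0]
    return first_line.split()[0].lower()


def verify_bundle(bundle: Path, md5_path: Path) -> bool:
    """Return True if the bundle's MD5 matches the value recorded in the .md5 file."""
    return md5_of_file(bundle) == expected_md5(md5_path)
```

For example, `verify_bundle(Path("human_GRCh38_no_alt_analysis_set.tar.gz"), Path("human_GRCh38_no_alt_analysis_set.tar.gz.md5"))` should return True for an intact download. These file names are assumed and may not match the actual downloads. Reading in chunks keeps memory use flat even for multi-gigabyte tarballs.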

Annotation types

annotation file description
trf.bed Tandem repeat annotation BED file to increase sensitivity and recall for pbsv (e.g., pbsv discover --tandem-repeats ref.trf.bed).

The TRGT repeat definition files are BED files containing the coordinates and structure of tandem repeat loci. These are currently available only for GRCh38.

repeat definition file description
pathogenic_repeats.bed 56 loci with known pathogenic expansions
repeat_catalog.bed >170,000 loci with polymorphic repeats
source
adotto_repeat_catalog.bed.gz >900,000 tandem repeat loci
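The following minimal Python sketch shows one way to read these files and illustrates what "coordinates and structure" means in practice. It assumes the TRGT convention in which the fourth BED column holds semicolon-separated KEY=VALUE fields such as ID, MOTIFS (comma-separated), and STRUC. The names RepeatLocus, parse_repeat_line, and read_repeat_file are hypothetical. Whether adotto_repeat_catalog.bed.gz uses exactly the same layout is also an assumption.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepeatLocus:
    """One tandem repeat locus from a repeat definition BED file (hypothetical type)."""

    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int  # 0-based, exclusive (BED convention)
    info: Dict[str, str]  # parsed KEY=VALUE fields from column 4

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def motifs(self) -> List[str]:
        # Assumes MOTIFS holds one or more comma-separated motifs.
        motifs = self.info.get("MOTIFS", "")
        return motifs.split(",") if motifs else []


def parse_repeat_line(line: str) -> RepeatLocus:
    """Parse a tab-separated line, assuming column 4 looks like 'ID=...;MOTIFS=...;STRUC=...'."""
    chrom, start, end, info_field = line.rstrip("\n").split("\t")[:4]
    info = dict(item.split("=", 1) for item in info_field.split(";") if item)
    return RepeatLocus(chrom, int(start), int(end), info)


def read_repeat_file(path: str) -> List[RepeatLocus]:
    """Read a plain or gzip-compressed repeat file, skipping blank and '#' lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [
            parse_repeat_line(line)
            for line in handle
            if line.strip() and not line.startswith("#")
        ]
```

For an illustrative record such as `chr4	3074876	3074966	ID=HTT;MOTIFS=CAG,CCG;STRUC=(CAG)nCAACAG(CCG)n`, the parser yields motifs `["CAG", "CCG"]` and a span of 90 bp. The record is not quoted from pathogenic_repeats.bed.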
annotation file description
expected_cn.hg38.{XX,XY}.bed By default, HiFiCNV expects each chromosome to have two full copies (i.e., a diploid genome) and reports only deviations from this expectation in the output VCF file. The expectation can be overridden by providing a BED file with expected copy number values; examples for XX and XY karyotypes are provided. These are currently available only for GRCh38 and hs37d5.
cnv.excluded_regions.bed.gz Regions known to cause artifacts during data processing (e.g., centromeres). This is currently available only for GRCh38 and hs37d5.
cnv.excluded_regions.common_50.bed.gz The regions in cnv.excluded_regions.bed.gz, plus regions frequently called as duplications or deletions in a population of 97 diverse samples from the Human Pangenome Reference Consortium (HPRC). This is currently available only for GRCh38 and is the recommended excluded-regions track for human sample analysis.
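This table describes two ideas: a default expected copy number of 2 that a BED file can override, and a track of regions to exclude. Both can be sketched with plain interval lookups. This is not HiFiCNV's implementation. The sketch assumes the expected-CN BED carries the copy number in its fourth column and treats any overlap with an excluded region as disqualifying. All function names are hypothetical.

```python
import bisect
from collections import defaultdict

DEFAULT_CN = 2  # diploid default, as described for HiFiCNV above


def load_intervals(lines, value_column=None):
    """Group BED records by chromosome as sorted (start, end, value) tuples.

    Coordinates follow BED: 0-based, half-open. If value_column is given, that
    column is parsed as an integer (assumed to hold expected copy number);
    otherwise every interval gets the value 0.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        value = int(fields[value_column]) if value_column is not None else 0
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), value))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def expected_cn(cn_track, chrom, pos):
    """Expected CN at 0-based position pos, or DEFAULT_CN if no interval covers it.

    Assumes intervals on a chromosome do not overlap each other.
    """
    intervals = cn_track.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect.bisect_right(intervals, (pos, float("inf"), float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return intervals[i][2]
    return DEFAULT_CN


def overlaps_any(track, chrom, start, end):
    """True if half-open [start, end) overlaps any interval on chrom (linear scan)."""
    return any(s < end and start < e for s, e, _ in track.get(chrom, []))


def is_reportable(cn_track, excluded, chrom, start, end, observed_cn):
    """Hypothetical filter: keep a segment only if it avoids excluded regions and
    its observed CN differs from the expected CN at the segment start."""
    if overlaps_any(excluded, chrom, start, end):
        return False
    return observed_cn != expected_cn(cn_track, chrom, start)
```

For example, the excluded-regions track could be loaded with `gzip.open("cnv.excluded_regions.common_50.bed.gz", "rt")` and passed to `load_intervals`, and an expected-CN file with `load_intervals(handle, value_column=3)`. A real caller would use an interval tree instead of the linear scan in `overlaps_any`; the scan is kept here for readability.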

Change log

release change
2023.12.04 Initial commit and bundle versions.

DISCLAIMER

TO THE GREATEST EXTENT PERMITTED BY APPLICABLE LAW, THIS WEBSITE AND ITS CONTENT, INCLUDING ALL SOFTWARE, SOFTWARE CODE, SITE-RELATED SERVICES, AND DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. ALL WARRANTIES ARE REJECTED AND DISCLAIMED. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THE FOREGOING. PACBIO IS NOT OBLIGATED TO PROVIDE ANY SUPPORT FOR ANY OF THE FOREGOING, AND ANY SUPPORT PACBIO DOES PROVIDE IS SIMILARLY PROVIDED WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A REPRESENTATION OR WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACBIO.
