Illumina/Polaris

Polaris

HiSeq™ X data

Summary

The Polaris project provides

  • Population sequencing resources on high throughput Illumina sequencing platforms
  • Variant calls from multiple technologies, validated by population genetics and Mendelian methods

Further details of the sequencing resources, input data sources, genotyping methods and validation methods can be found in the project wiki.

Variant calls

Our latest Structural Variant (SV) truth set is v2.1. Please see the release notes (release-notes/v2.1) for details.

Download the truth set VCF

To download the SV truth set and its tabix index, run:

Genome version hg38

wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.1-sv-truthset/all_merge.vcf.gz
wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.1-sv-truthset/all_merge.vcf.gz.tbi
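As a quick sanity check after downloading, the minimal Python sketch below tallies the records in all_merge.vcf.gz by SV type. It assumes the file follows the usual VCF convention of recording the variant class in an `SVTYPE` INFO key; the function name `count_sv_types` is hypothetical and not part of Polaris. The standard `gzip` module can read the bgzip-compressed file because BGZF is a series of ordinary gzip members.

```python
import gzip
from collections import Counter


def count_sv_types(vcf_path):
    """Tally data records in a (b)gzipped VCF by their SVTYPE INFO value.

    Assumption: SV records carry a standard `SVTYPE=<value>` entry in the
    INFO column (the 8th tab-separated field). Records without one are
    counted under "NA".
    """
    counts = Counter()
    # gzip.open reads multi-member files, so BGZF (bgzip) output works too.
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip "##" meta-information and the "#CHROM" header
            fields = line.rstrip("\n").split("\t")
            info = fields[7]
            svtype = "NA"
            for entry in info.split(";"):
                key, _, value = entry.partition("=")
                if key == "SVTYPE":
                    svtype = value
                    break
            counts[svtype] += 1
    return counts
```

For example, `count_sv_types("all_merge.vcf.gz")` returns a `Counter` mapping each SV type found in the file to its record count. For indexed region queries, a dedicated tool such as tabix or bcftools is the better choice; this sketch only streams the whole file once.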

Sequencing resources

Population cohorts with unrestricted access sequenced as part of Polaris are available through BaseSpace, the European Nucleotide Archive (ENA), and the Sequence Read Archive (SRA).

Additional cohorts are available through the European Genome-phenome Archive (EGA) or dbGaP under restricted access, subject to approval by a Data Access Committee. Polaris never reports variant calls for restricted-access cohorts.

Further information on the sequencing resources described below can be found in the project wiki.

HiSeq X PCR-Free Data (Polaris 1)

All HiSeq X PCR-Free data was generated by Illumina Laboratory Services (ILS) with a target whole genome coverage of 30X.
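To make the 30X target concrete, the mean coverage C of a whole-genome run follows the Lander-Waterman relation C = N × L / G, where N is the number of reads, L the read length and G the genome size. The Python sketch below applies it; the 150 bp read length and the round 3.1 Gb genome size are illustrative assumptions, not figures taken from the Polaris documentation, and the function names are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, genome_size):
    """Mean coverage C = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_coverage, read_length, genome_size):
    """Smallest read count N such that N * L / G reaches the target coverage.

    Ignores duplicates, unmapped reads and trimming, so real runs need
    somewhat more raw reads than this estimate.
    """
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative assumptions: 150 bp reads, ~3.1 Gb human genome.
READ_LENGTH = 150
GENOME_SIZE = 3_100_000_000
```

Under these assumptions, `reads_for_coverage(30, READ_LENGTH, GENOME_SIZE)` gives 620,000,000 reads, i.e. about 310 million read pairs for paired-end sequencing.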

There are currently four unrestricted access cohorts available in Polaris:

  1. Diversity Cohort (BaseSpace, ENA, SRA) — 150 samples selected to represent a diversity of populations
  2. Kids Cohort (BaseSpace, ENA, SRA) — 50 children whose parents were sequenced as part of the Diversity cohort
  3. PGx Cohort (BaseSpace, ENA, SRA) — 70 samples with orthogonally validated genotypes for 28 genes relevant to pharmacogenomics (PGx) [4]
  4. PGx 10X© Cohort (ENA, SRA) — the same 70 samples from the PGx cohort, prepared with the 10X Genomics Chromium Controller and sequenced on the HiSeq 4000

There is also a restricted access repeat expansion cohort available through EGA.

Associated resources

HiSeq 2000 PCR-free

HiSeq X PCR-Free

  • Parents & grandparents
    • ENA — pending
    • BaseSpace — pending
  • Children
    • dbGaP — pending

Pending cohorts

HiSeq X PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio

10X© Chromium

  • Platinum Genomes pedigree

NovaSeq 6000 S4 PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio

Citing Polaris

When citing the repeat expansion cohort, please cite the ExpansionHunter paper, in which the cohort was originally described:

Dolzhenko, Egor, et al. "Detection of long repeat expansions from PCR-free whole-genome sequence data." Genome Research 27.11 (2017): 1895-1903.

Issues

Please open an issue to provide feedback or ask questions.

References

  1. Eberle et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27:157-164. doi:10.1101/gr.210500.116
  2. English et al. (2015) Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
  3. Kehr et al. (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801
  4. Pratt et al. (2016) Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn. 18(1):109-123. doi:10.1016/j.jmoldx.2015.08.005
  5. Sedlazeck et al. (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 15:461-468.