Data and information about the Polaris study


HiSeq™ X data




The Polaris project provides:

  • Population sequencing resources on high throughput Illumina sequencing platforms
  • Variant calls from multiple technologies, validated by population genetics and Mendelian methods

Further details of the sequencing resources, input data sources, genotyping methods and validation methods can be found in the project wiki.

Variant calls

Our latest structural variant (SV) truth set is v2.1. See release-notes/v2.1 for details.

Download the truth set VCF

To download the SV truth set, please do:

Genome version hg38
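Once the VCF has been downloaded, a quick sanity check is to tally its records by SV type. The sketch below is a minimal, hedged example: it assumes the file follows the common VCF convention of an `SVTYPE` key in the INFO column and may be gzip-compressed. The function names and the file name used in the usage note are hypothetical, not part of the Polaris release.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_sv_types(lines: Iterable[str]) -> Counter:
    """Tally VCF data records by the SVTYPE value in the INFO column.

    Assumes standard VCF layout: header lines start with '#', and the
    INFO field is the 8th tab-separated column (index 7). Records
    without an SVTYPE key are counted under "UNKNOWN".
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record: no INFO column to inspect
        svtype = "UNKNOWN"
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                svtype = value
                break
        counts[svtype] += 1
    return counts


def count_sv_types_in_file(path: str) -> Counter:
    """Open a plain or gzip-compressed VCF (by extension) and tally SV types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_sv_types(handle)
```

For example, `count_sv_types_in_file("polaris_sv_truthset.vcf.gz").most_common()` (hypothetical file name; substitute the file you downloaded) lists SV types from most to least frequent. Streaming line by line keeps memory use flat regardless of file size.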


Sequencing resources

Unrestricted-access population cohorts sequenced as part of Polaris are available through BaseSpace, the European Nucleotide Archive (ENA), and the Sequence Read Archive (SRA).

Additional cohorts are available with restricted access through the European Genome-phenome Archive (EGA) or dbGaP, subject to approval by a Data Access Committee. Polaris never reports variant calls for restricted-access cohorts.

Further information on the sequencing resources described below can be found in the project wiki.

HiSeq X PCR-Free Data (Polaris 1)

All HiSeq X PCR-Free data was generated by Illumina Laboratory Services (ILS) with a target whole genome coverage of 30X.

There are currently four unrestricted access cohorts available in Polaris:

  1. Diversity Cohort (BaseSpace, ENA, SRA): 150 samples selected to represent a diversity of populations
  2. Kids Cohort (BaseSpace, ENA, SRA): 50 children whose parents were sequenced as part of the Diversity Cohort
  3. PGx Cohort (BaseSpace, ENA, SRA): 70 samples with orthogonally validated genotypes for 28 genes relevant to pharmacogenomics (PGx) [4]
  4. PGx 10X© Cohort (ENA, SRA): the same 70 samples from the PGx Cohort, prepared with the 10X Genomics Chromium Controller and sequenced on the HiSeq 4000

There is also a restricted access repeat expansion cohort available through EGA.

Associated resources

Platinum Genomes

HiSeq 2000 PCR-Free

HiSeq X PCR-Free

  • Parents & grandparents
    • ENA — pending
    • BaseSpace — pending
  • Children
    • dbGaP — pending

Pending cohorts

HiSeq X PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio

10X© Chromium

  • Platinum Genomes pedigree

NovaSeq 6000 S4 PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio


Please open an issue to provide feedback or ask questions.


  1. Eberle et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27:157-164. doi:10.1101/gr.210500.116
  2. English et al. (2015) Assessing structural variation in a personal genome – towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
  3. Kehr et al. (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801
  4. Pratt et al. (2016) Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn. 18(1):109-123. doi:10.1016/j.jmoldx.2015.08.005
  5. Sedlazeck et al. (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 15:461-468.