Ethnicity inference for Pediatric Preclinical Testing Consortium (PPTC) patient-derived xenograft models


This repository contains the methods that were used to infer the approximate ethnic backgrounds for 252 patient-derived xenograft (PDX) models from the Pediatric Preclinical Testing Consortium (PPTC) using SNP array genotyping data.

Authors: Laura Egolf, Zalman Vaksman, Jo Lynne Rokita (2018)

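Note that genotyping data derived from tumors are not ideal for inferring ethnicity, because somatic alterations such as copy-number changes and loss of heterozygosity can distort genotype calls. These methods and results are therefore meant to serve only as an approximation.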

Datasets used:

  • PDX SNP array data (Illumina Final Report files from GenomeStudio): Deposited in
  • HapMap 3 (release 2): Downloaded from
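Illumina Final Report files consist of a `[Header]` block followed by a tab-delimited `[Data]` table, while PLINK can read long-format genotypes from an `.lgen` file (FID IID SNP A1 A2). The function below is a minimal sketch of that conversion, not the repository's Bash script. The function name `final_report_to_lgen`, the choice of the `Allele1 - Top`/`Allele2 - Top` columns, and the reuse of the sample ID as the family ID are all assumptions for illustration; the actual allele columns depend on how the report was exported from GenomeStudio.

```python
import csv
from typing import Iterable, Iterator, Tuple


def final_report_to_lgen(lines: Iterable[str],
                         allele_cols: Tuple[str, str] = ("Allele1 - Top", "Allele2 - Top"),
                         ) -> Iterator[str]:
    """Yield PLINK .lgen rows (FID IID SNP A1 A2) from a Final Report's [Data] section.

    Hypothetical helper: the allele column names are an assumption and must match
    the columns actually present in the exported report.
    """
    it = iter(lines)
    # Skip the [Header] block; the tab-delimited table starts after the "[Data]" line
    for line in it:
        if line.strip() == "[Data]":
            break
    reader = csv.DictReader(it, delimiter="\t")
    for row in reader:
        # GenomeStudio writes "-" for a no-call; PLINK uses "0" for a missing allele
        a1, a2 = ("0" if row[c] == "-" else row[c] for c in allele_cols)
        sample = row["Sample ID"]
        # Assumption: the sample ID doubles as the family ID (FID = IID)
        yield " ".join((sample, sample, row["SNP Name"], a1, a2))
```

Reading genotypes with `--lfile` also requires matching `.map` and `.fam` files, which this sketch does not produce.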

Scripts:

  • Bash script used to convert the Illumina Final Report genotype files to PLINK format, merge them with HapMap 3, and run the principal component analysis (PCA)
  • plot_pdx_pca.R: R script used to plot the first two principal components (PCs) and assign an approximate ethnicity to each sample
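The core idea behind the PCA step is that samples with similar ancestry share allele frequencies, so they cluster together along the leading principal components of the genotype matrix. The sketch below illustrates this with NumPy on a samples-by-SNPs matrix of allele counts (0, 1, 2; NaN for missing). It is a simplified stand-in, not the repository's pipeline: PLINK 1.9 computes its PCs with its own normalization and missing-data handling, so the values will not match exactly. The function name `genotype_pca` and the EIGENSTRAT-style scaling are assumptions chosen for illustration.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: (n_samples, n_snps) allele counts 0/1/2, with NaN for missing calls.
    Returns an (n_samples, n_components) array of PC scores (signs are arbitrary).
    """
    g = np.asarray(genotypes, dtype=float)
    # Per-SNP mean allele count, ignoring missing calls
    means = np.nanmean(g, axis=0)
    # Mean-impute missing genotypes so they contribute nothing after centering
    g = np.where(np.isnan(g), means, g)
    # Allele frequency of the counted allele; drop monomorphic SNPs (zero variance)
    p = means / 2.0
    keep = (p > 0) & (p < 1)
    g, means, p = g[:, keep], means[keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation
    z = (g - means) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD: left singular vectors scaled by singular values are the PC scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Merging the PDX samples with HapMap 3 before this step places both in the same PC space, which is what lets the reference populations serve as landmarks when plotting PC1 against PC2.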

Supporting files:

  • filelist_InfiniumOmniExpress-24v1-2_A1.txt and filelist_humanomniexpress-24-v1-1-a.txt: List of SNP array data files for the 254 samples (separated by chip type)
  • snps_to_exclude.txt: List of 414 SNPs with problematic allele coding that caused errors in PLINK
  • ethnicity_coordinates_40kSNPs.txt: Coordinates used for assigning samples to general ethnicity groups
  • 2019-02-09-all-hist-colors.txt: Hexadecimal color codes used for plotting samples according to tumor histotype
  • pptc-pdx-clinical-web.txt: Clinical annotation for PDX samples
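One way to turn PC coordinates into general ethnicity groups is to define a region of the PC1/PC2 plane for each group and label each sample by the region it falls in. The sketch below assumes rectangular regions stored as tab-separated rows of `group, pc1_min, pc1_max, pc2_min, pc2_max`; that layout, the helper names `load_boxes` and `assign_group`, and the rectangle rule itself are hypothetical and may not match the actual contents of `ethnicity_coordinates_40kSNPs.txt`.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical rectangle in PC space: (pc1_min, pc1_max, pc2_min, pc2_max)
Box = Tuple[float, float, float, float]


def load_boxes(lines: Iterable[str]) -> Dict[str, Box]:
    """Parse hypothetical tab-separated rows: group, pc1_min, pc1_max, pc2_min, pc2_max."""
    boxes: Dict[str, Box] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5 or fields[0] == "group":  # skip header and malformed rows
            continue
        x_lo, x_hi, y_lo, y_hi = (float(v) for v in fields[1:])
        boxes[fields[0]] = (x_lo, x_hi, y_lo, y_hi)
    return boxes


def assign_group(pc1: float, pc2: float, boxes: Dict[str, Box]) -> Optional[str]:
    """Return the first group whose PC1/PC2 rectangle contains the sample, else None."""
    for group, (x_lo, x_hi, y_lo, y_hi) in boxes.items():
        if x_lo <= pc1 <= x_hi and y_lo <= pc2 <= y_hi:
            return group
    return None  # outside every region: leave the sample unassigned
```

Returning `None` rather than forcing the nearest group keeps ambiguous samples (for example, admixed individuals lying between clusters) visibly unassigned.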

Software used:

  • R version 3.4.3
  • PLINK 1.07
  • PLINK 1.9

Output files:

  • PCA.plink.eigenval
  • PCA.plink.eigenvec
  • PCA.plink.eigenvec.var
  • inferred_ethnicities_40kSNPs.txt
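PLINK's `.eigenvec` output is whitespace-delimited with one sample per row (FID, IID, then one column per PC), and `.eigenval` lists one eigenvalue per line. The sketch below reads these into Python for downstream plotting; the helper names are hypothetical, and the parser skips a leading `FID` row in case the file was written with a header. Because PLINK reports only the top PCs, the variance shares computed here are relative to the listed eigenvalues, not to the total genetic variance.

```python
from typing import Dict, Iterable, List, Tuple


def read_eigenvec(lines: Iterable[str], n_pcs: int = 2) -> Dict[Tuple[str, str], List[float]]:
    """Parse .eigenvec rows (FID IID PC1 PC2 ...) into {(FID, IID): [PC1, ..., PCn]}."""
    pcs: Dict[Tuple[str, str], List[float]] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "FID":  # skip blank lines and an optional header
            continue
        fid, iid = fields[0], fields[1]
        pcs[(fid, iid)] = [float(v) for v in fields[2:2 + n_pcs]]
    return pcs


def share_of_listed_variance(eigenvals: List[float]) -> List[float]:
    """Each eigenvalue as a fraction of the sum of the eigenvalues listed in .eigenval."""
    total = sum(eigenvals)
    return [ev / total for ev in eigenvals]
```

For example, `with open("PCA.plink.eigenvec") as fh: pcs = read_eigenvec(fh)` yields the PC1/PC2 pairs that would be plotted for each sample.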