- PLINK 2.00 software
- NB: This is neither PLINK 1.90 nor PLINK 1.07; the analysis requires the latest version of PLINK 2.0.
- R packages: please see details here.
- Pre-computed PCA loadings: see below.
- Reference allele frequencies: see below.
- Imputed genotypes in PLINK-compatible format
- Imputed dosages (.pgen/.pvar/.psam, .vcf, or .bgen) are preferred, but hard-called genotypes (.bed/.bim/.fam) are acceptable.
- Please follow the instructions below to convert non-PLINK files (.vcf or .bgen) to PLINK 2 binary format.
- Phenotype file
- Covariate file
- Please use the same files that you used for GWAS.
We provide the files with variant IDs in three formats: 1) chromosome:position:ref:alt in GRCh37, 2) chromosome:position:ref:alt in GRCh38, and 3) rsID. Please double-check the variant IDs in your dataset and choose the appropriate files below.
- Pre-computed PCA loadings: hgdp_tgp_pca_covid19hgi_snps_loadings.GRCh37.plink.tsv
- Reference allele frequencies: hgdp_tgp_pca_covid19hgi_snps_loadings.GRCh37.plink.afreq
- Pre-computed PCA loadings: hgdp_tgp_pca_covid19hgi_snps_loadings.GRCh38.plink.tsv
- Reference allele frequencies: hgdp_tgp_pca_covid19hgi_snps_loadings.GRCh38.plink.afreq
- Pre-computed PCA loadings: hgdp_tgp_pca_covid19hgi_snps_loadings.rsid.plink.tsv
- Reference allele frequencies: hgdp_tgp_pca_covid19hgi_snps_loadings.rsid.plink.afreq
For advanced users, we also provide the files in Hail format at gs://covid19-hg-public/pca_projection/hgdp_tgp_pca_covid19hgi_snps_loadings.ht.
Please refer to our example script and the Hail documentation for further information.
If you have imputed dosage files split by chromosome, you need to combine them before using them with plink2 --score. Please refer to the PLINK 2 documentation for more information. Depending on your file format, please use the following commands to 1) extract the relevant set of variants from each chromosome, and 2) merge them into PLINK 2 binary format for downstream processing.
To avoid creating overly large dosage files, we first extract the list of variants from the pre-computed loadings file and use it for filtering.
cut -f1 [path to the pre-computed loadings file] | tail -n +2 > variants.extract
If the variant IDs in your genotype files use a different format from those in the extracted list variants.extract, please create a variants.extract with the same ID format as your genotype files. After following these instructions, you will have imported your genotype files into PLINK 2 pfiles; please then rename the variants in the resulting .pvar file so that they match the IDs in the pre-computed loadings file.
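The two ID-conversion steps above can be sketched as shell functions. This is a minimal sketch under stated assumptions, not part of the official pipeline: the function names are hypothetical, the first helper assumes your IDs differ from the loadings IDs only by a leading "chr", and the second assumes you supply a tab-separated map file (your_id, then loadings_id). Renaming only touches the text .pvar; the .pgen does not store variant IDs.

```shell
# Hypothetical helpers for the two ID-conversion steps described above.

# 1) Build variants.extract in your genotype files' ID format. Assumes your
#    IDs differ from the loadings IDs only by a leading "chr"
#    (e.g. "chr1:12345:A:G" vs "1:12345:A:G"); adapt the sed expression.
make_chr_prefixed_extract() {
  local loadings="$1"                 # pre-computed loadings .tsv
  local out="${2:-variants.extract}"  # output variant list
  # Skip the header line, keep the first (ID) column, prepend "chr".
  cut -f1 "$loadings" | tail -n +2 | sed -e 's/^/chr/' > "$out"
}

# 2) After importing to pfiles, rewrite the .pvar ID column back to the
#    loadings IDs using a tab-separated map: your_id<TAB>loadings_id.
#    IDs absent from the map are left unchanged. The map must be non-empty.
#    Write to a new file, then move it over the original .pvar.
rename_pvar_ids() {
  local pvar_in="$1" id_map="$2" pvar_out="$3"
  awk -F'\t' -v OFS='\t' '
    NR == FNR { map[$1] = $2; next }      # first file: load the ID map
    /^##/     { print; next }             # pass meta lines through
    /^#CHROM/ {                           # header: locate the ID column
      for (i = 1; i <= NF; i++) if ($i == "ID") idcol = i
      print; next
    }
    idcol && ($idcol in map) { $idcol = map[$idcol] }
    { print }
  ' "$id_map" "$pvar_in" > "$pvar_out"
}
```

For example, rename_pvar_ids merged.pvar id_map.tsv merged.renamed.pvar followed by mv merged.renamed.pvar merged.pvar (all file names here are placeholders).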
If you have PLINK 2 pfiles (.pgen/.pvar/.psam), please run the following extraction command for each chromosome file:
plink2 \
--pfile [path to your per-chromosome pfile] \
--extract variants.extract \
--make-pfile \
--out [per-chromosome output name]
Next, collect the file names of the filtered per-chromosome pfiles from the previous step.
ls [the previous per-chromosome output prefix].*.pgen | sed -e 's/\.pgen$//' > merge-list.txt
Then, use plink2 --pmerge-list to merge them.
plink2 --pmerge-list merge-list.txt --out [all-chromosome output name]
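Before running plink2 --pmerge-list, you may want to confirm that every prefix in merge-list.txt points to a complete fileset, so a missing chromosome is caught early. A minimal sketch; the function name check_pfile_list is hypothetical, and it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical pre-merge check: confirm every prefix listed in merge-list.txt
# has a complete .pgen/.pvar/.psam trio. Returns non-zero if anything is missing.
check_pfile_list() {
  local list="${1:-merge-list.txt}" missing=0 prefix ext
  while IFS= read -r prefix; do
    [ -z "$prefix" ] && continue            # ignore blank lines
    for ext in pgen pvar psam; do
      if [ ! -e "${prefix}.${ext}" ]; then
        echo "missing: ${prefix}.${ext}" >&2
        missing=1
      fi
    done
  done < "$list"
  return "$missing"
}
```

For example, check_pfile_list merge-list.txt && plink2 --pmerge-list merge-list.txt --out [all-chromosome output name] only starts the merge when the check passes.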
If you have PLINK 1 binary files (.bed/.bim/.fam), please run the following extraction command for each chromosome file:
plink \
--bfile [path to your per-chromosome bfile] \
--extract variants.extract \
--make-bed \
--out [per-chromosome output name]
Next, collect the file names of the filtered per-chromosome bfiles from the previous step.
ls [the previous per-chromosome output prefix].*.bed | sed -e 's/\.bed$//' > merge-list.txt
Then, use plink --merge-list to merge them.
plink --merge-list merge-list.txt --out [all-chromosome output name]
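Whichever fileset type you merged, a quick sanity check is to count how many IDs in variants.extract actually reached the merged fileset; a low count usually means the ID formats did not match. A sketch with a hypothetical function name, assuming whitespace-delimited files as PLINK writes them: the ID is column 2 of a .bim, and the column named ID in the #CHROM header of a .pvar.

```shell
# Hypothetical sanity check: how many IDs from variants.extract are present
# in a merged .bim or .pvar? (variants.extract must be non-empty.)
count_present_ids() {
  local extract="$1" varfile="$2"
  awk '
    NR == FNR { want[$1] = 1; total++; next }          # load wanted IDs
    FILENAME ~ /\.bim$/ {                              # .bim: ID in column 2
      if (($2 in want) && !seen[$2]++) found++
      next
    }
    /^##/ { next }                                     # .pvar meta lines
    /^#CHROM/ { for (i = 1; i <= NF; i++) if ($i == "ID") idcol = i; next }
    idcol && ($idcol in want) && !seen[$idcol]++ { found++ }
    END { printf "%d of %d listed variants present\n", found, total }
  ' "$extract" "$varfile"
}
```

For example, count_present_ids variants.extract [all-chromosome output name].bim.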
For manipulating .bgen files, you additionally need to install bgenix and cat-bgen.
For each chromosome file, please run the following extraction command:
bgenix \
-g [path to your per-chromosome bgen] \
-incl-rsids variants.extract \
> [per-chromosome output name].bgen
If your .bgen files use a different variant ID format, please prepare an appropriate variants.extract. You can check the variant IDs with bgenix -g [path to bgen] -list.
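bgenix queries use a .bgi index file, which can be created with bgenix -g [path to bgen] -index. To see whether your IDs match before extracting, you could save the -list output to a file and count matches against variants.extract. The sketch below uses a hypothetical function name and assumes the listing contains a header row with a column named rsid, with comment lines starting with #; please verify this against the output of your bgenix version.

```shell
# Hypothetical helper: given the saved output of
#   bgenix -g [path to bgen] -list > listing.txt
# count how many IDs in variants.extract match the listing's "rsid" column.
count_rsid_matches() {
  local extract="$1" listing="$2"
  awk '
    NR == FNR { want[$1] = 1; total++; next }   # load wanted IDs
    /^#/ { next }                               # skip bgenix comment lines
    !col {                                      # first data line is the header
      for (i = 1; i <= NF; i++) if ($i == "rsid") col = i
      next
    }
    ($col in want) && !seen[$col]++ { found++ }
    END { printf "%d of %d listed variants match the rsid column\n", found, total }
  ' "$extract" "$listing"
}
```

A result near zero suggests that your .bgen uses another ID format, in which case variants.extract should be rebuilt accordingly.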
Then, concatenate the per-chromosome files with cat-bgen:
cat-bgen \
-g [path to your per-chromosome bgen 1] \
[path to your per-chromosome bgen 2] \
... \
[path to your per-chromosome bgen 22] \
-og [all-chromosome outname].bgen
Note that you can also pass all 22 .bgen files at once with a glob such as [prefix].*.bgen.
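Shell globs expand in lexicographic order (1, 10, 11, ..., 19, 2, 20, ...), so a glob such as [prefix].*.bgen will not list chromosomes numerically. If you would rather have the combined file in numeric chromosome order, here is a sketch that assumes files named [prefix].[chr].bgen for chromosomes 1 to 22; both the naming and the function name are assumptions for illustration.

```shell
# Hypothetical helper: print per-chromosome BGEN paths in numeric order
# (1, 2, ..., 22), assuming files are named "<prefix>.<chr>.bgen".
# Fails (non-zero) and reports the first missing file, if any.
#
# Example use (paths without spaces):
#   files=$(ordered_bgen_files mydata) && cat-bgen -g $files -og all_chr.bgen
ordered_bgen_files() {
  local prefix="$1" chr f
  for chr in $(seq 1 22); do
    f="${prefix}.${chr}.bgen"
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
    printf '%s\n' "$f"
  done
}
```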
Finally, please use the following command to import the .bgen file into PLINK 2 pfiles.
plink2 \
--bgen [path to all-chromosome bgen] [REF/ALT mode: see below] \
--make-pfile \
--out [output pfile name]
For [REF/ALT mode], please refer to the PLINK 2 documentation. In short, you can specify one of the following three options:
- 'ref-first': The first allele for each variant is REF.
- 'ref-last': The last allele for each variant is REF.
- 'ref-unknown': The last allele for each variant is treated as provisional-REF.
You can inspect the allele order of each variant with bgenix -g [path to bgen] -list.
For manipulating .vcf files, you additionally need to install bcftools.
For each chromosome file, please run the following extraction command:
bcftools view -Oz \
-i 'ID=@variants.extract' \
[path to your per-chromosome vcf file] \
> [per-chromosome outname].vcf.gz
Then, concatenate the per-chromosome files:
bcftools concat -Oz [per-chromosome vcf files] > [all-chromosome outname].vcf.gz
Finally, please use the following command to import the .vcf file into PLINK 2 pfiles.
plink2 \
--vcf [all-chromosome outname].vcf.gz \
dosage=[dosage field name: see below] \
--make-pfile \
--out [outname]
For the dosage field name, please refer to the following instructions from the PLINK 2 documentation.
To import the GP field (a posterior probability per possible genotype, not phred scaled), add 'dosage=GP' (or 'dosage=GP-force'; see the PLINK 2 documentation). To import Minimac4-style DS+HDS phased dosage, add 'dosage=HDS'. 'dosage=DS' (or anything else for now) causes the named field to be interpreted as a Minimac3-style dosage.
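To decide which dosage= modifier applies, it can help to see which of the FORMAT fields named in the excerpt above (HDS, DS, GP) are declared in your VCF header. The helper below is a sketch with a hypothetical name; a declared field is not guaranteed to be populated in every record, so treat its output as a hint rather than a decision.

```shell
# Hypothetical helper: print which of HDS, DS, GP are declared as FORMAT
# fields in a VCF header. Works on bgzipped or plain-text VCFs.
vcf_dosage_fields() {
  local vcf="$1"
  # gzip -cdf decompresses gzip/bgzip input and passes plain text through;
  # awk stops reading at the #CHROM line, i.e. the end of the header.
  gzip -cdf "$vcf" | awk '
    /^#CHROM/ { exit }
    /^##FORMAT=<ID=(HDS|DS|GP),/ {
      sub(/^##FORMAT=<ID=/, ""); sub(/,.*/, ""); print
    }
  '
}
```

For example, if vcf_dosage_fields [all-chromosome outname].vcf.gz prints HDS, the excerpt above suggests dosage=HDS; if it prints only DS, dosage=DS.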
Please use the same phenotype and covariate files that you used for GWAS. We expect the FID and IID values to exactly match those in your genotype files. If these files do not have FID and IID columns, please regenerate them with these columns.
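One way to check that the FID and IID values match is to list phenotype (or covariate) rows whose FID/IID pair does not appear in the .psam. A sketch with a hypothetical function name, assuming whitespace-delimited files where the phenotype header's first two columns are FID and IID and the .psam header line is "#FID IID ..."; a .psam without an FID column would need a different comparison.

```shell
# Hypothetical check: print FID/IID pairs present in a phenotype or covariate
# file but absent from the .psam. Empty output means every pair was found.
missing_samples() {
  local psam="$1" pheno="$2"
  awk '
    NR == FNR { if ($0 !~ /^#/) have[$1 FS $2] = 1; next }   # .psam samples
    FNR > 1 && !(($1 FS $2) in have) { print $1, $2 }        # skip pheno header
  ' "$psam" "$pheno"
}
```

For example, missing_samples [output pfile name].psam [phenotype file]; run it again with the covariate file.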