# Computing a Principal Component Analysis (PCA) of 1kGP

This notebook runs a principal component analysis (PCA) on genotype data from the 1000 Genomes Project (1kGP). The procedure has two steps:
- Converting the VCF file into PLINK binary format (`.bed`, `.bim`, `.fam`).
- Running PCA on the binary fileset with PLINK and keeping the top 10 principal components.


In [1]:
# Convert VCF to PLINK binary format
plink --vcf data/1kGP.fingerprinting.vcf.gz --make-bed --out working/1kGP_plink

PLINK v1.90b7 64-bit (16 Jan 2023)             www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to working/1kGP_plink.log.
Options in effect:
  --make-bed
  --out working/1kGP_plink
  --vcf data/1kGP.fingerprinting.vcf.gz

515606 MB RAM detected; reserving 257803 MB for main workspace.
--vcf: working/1kGP_plink-temporary.bed + working/1kGP_plink-temporary.bim +
working/1kGP_plink-temporary.fam written.
8687 variants loaded from .bim file.
3202 people (0 males, 0 females, 3202 ambiguous) loaded from .fam.
Ambiguous sex IDs written to working/1kGP_plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 3202 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping r

In [2]:
# List the generated PLINK binary files
ls working/1kGP_plink*

working/1kGP_plink.bed  working/1kGP_plink.fam  working/1kGP_plink.nosex
working/1kGP_plink.bim  working/1kGP_plink.log
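
The `.bed` file holds the genotypes in PLINK's compact binary encoding, `.bim` has one line per variant (chromosome, variant ID, genetic position, base-pair position, two alleles), and `.fam` has one line per sample (family ID, individual ID, father, mother, sex, phenotype). `.nosex` lists the samples whose sex is ambiguous, since the VCF carries no sex information, and `.log` is a copy of the console output. Because `.bim` and `.fam` are line-oriented, their line counts give a quick cross-check against the counts PLINK logged. The sketch below assumes a POSIX shell; `plink_fileset_summary` is a hypothetical helper, not part of PLINK.

```shell
# Hypothetical helper (not a PLINK command): confirm that a PLINK binary
# fileset exists and report its variant and sample counts.
plink_fileset_summary() {
  prefix=$1
  for ext in bed bim fam; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing $prefix.$ext" >&2
      return 1
    fi
  done
  # .bim has one line per variant and .fam one line per sample;
  # wc pads its output on some platforms, so strip the whitespace.
  variants=$(wc -l < "$prefix.bim" | tr -d '[:space:]')
  samples=$(wc -l < "$prefix.fam" | tr -d '[:space:]')
  echo "variants=$variants samples=$samples"
}
```

For this notebook, `plink_fileset_summary working/1kGP_plink` should print `variants=8687 samples=3202`, matching the counts in the PLINK log above.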


In [3]:
# Perform PCA on the PLINK binary data
plink --bfile working/1kGP_plink --pca 10 --out working/1kGP_pca

PLINK v1.90b7 64-bit (16 Jan 2023)             www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to working/1kGP_pca.log.
Options in effect:
  --bfile working/1kGP_plink
  --out working/1kGP_pca
  --pca 10

515606 MB RAM detected; reserving 257803 MB for main workspace.
8687 variants loaded from .bim file.
3202 people (0 males, 0 females, 3202 ambiguous) loaded from .fam.
Ambiguous sex IDs written to working/1kGP_pca.nosex .
Using up to 191 threads (change this with --threads).
Before main variant filters, 3202 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is in [0.9999995, 1).
8687 variants and 3202 people pass filters and QC.
Note: No phenotypes present.
Relationship matrix calculation co

In [4]:
# List the generated PCA files
ls working/1kGP_pca*

working/1kGP_pca.eigenval  working/1kGP_pca.log
working/1kGP_pca.eigenvec  working/1kGP_pca.nosex
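
With `--pca 10`, each row of `.eigenvec` should contain the family ID, the individual ID and 10 principal-component values, and `.eigenval` should contain 10 eigenvalues, one per line. A minimal sanity check under those assumptions is sketched below; `check_pca_output` is a hypothetical helper, and the expected column count (2 + number of PCs) reflects PLINK 1.9's default headerless output as seen in this notebook.

```shell
# Hypothetical helper: sanity-check PLINK 1.9 --pca output for a prefix.
# Assumes headerless output: each .eigenvec row is FID, IID, then one column
# per PC, and .eigenval has one eigenvalue per line.
check_pca_output() {
  prefix=$1
  n_pcs=$2
  # Count rows and rows whose field count differs from 2 + n_pcs;
  # awk exits non-zero if any row is malformed.
  awk -v want="$((n_pcs + 2))" '
    NF != want { bad++ }
    END {
      printf "rows=%d malformed=%d\n", NR, bad
      exit (bad > 0)
    }
  ' "$prefix.eigenvec" || return 1
  # wc pads its output on some platforms, so strip whitespace before comparing.
  ev=$(wc -l < "$prefix.eigenval" | tr -d '[:space:]')
  echo "eigenvalues=$ev"
  [ "$ev" -eq "$n_pcs" ]
}
```

If the files are intact, `check_pca_output working/1kGP_pca 10` should report 3202 rows (one per sample) with no malformed rows, and 10 eigenvalues.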


In [5]:
# Display the top rows of the PCA eigenvec file
head working/1kGP_pca.eigenvec

HG00096 HG00096 -0.00979116 0.0246988 0.0030017 0.0175045 -0.000177201 -0.0215443 -0.00710697 0.0040074 -0.0118802 -0.00355934
HG00097 HG00097 -0.00860299 0.0246735 0.00216549 0.0173607 -0.00553673 0.00307645 -0.0026379 9.36712e-05 0.00774549 0.00669956
HG00099 HG00099 -0.00940134 0.0242093 0.00428509 0.020703 -0.00727684 -0.0173237 -0.00154378 -0.00440629 -0.00473937 0.00448401
HG00100 HG00100 -0.00974342 0.0227575 -0.00062867 0.0176841 -0.00821521 -0.0113509 -0.00653149 0.0004087 -0.00456923 -0.00936117
HG00101 HG00101 -0.00949396 0.0236118 0.00404194 0.0183175 0.000795819 -0.014901 -0.00882091 0.00770109 0.00246361 0.0284314
HG00102 HG00102 -0.0100465 0.0232639 0.00285737 0.0144682 -0.00136625 -0.0188499 0.00368535 -0.0134158 -6.3343e-05 -0.00326615
HG00103 HG00103 -0.00922797 0.024111 0.00111021 0.0167639 0.00386243 -0.0146263 -0.00929536 0.00098754 -0.00662425 0.00630826
HG00105 HG00105 -0.00962561 0.023225 0.00578964 0.0173585 -0.0102911 -0.0126562 -0.00592642 0.0171561 -0.006739
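
The first two columns are the family and individual IDs (identical here, because PLINK sets both to the VCF sample ID on import), followed by PC1 through PC10. For a PC1-vs-PC2 scatter plot, only the IDs and the first two PC columns are needed. The sketch below, assuming a POSIX shell with `awk`, extracts them into a tab-separated table with a header row; `pcs_for_plot` and the output filename in the comment are hypothetical.

```shell
# Hypothetical helper: keep FID, IID, PC1 and PC2 from a headerless PLINK 1.9
# .eigenvec file (read on stdin) and emit a tab-separated table with a header.
pcs_for_plot() {
  awk 'BEGIN { OFS = "\t"; print "FID", "IID", "PC1", "PC2" }
       NF >= 4 { print $1, $2, $3, $4 }'
}

# Demo on the first two rows shown above. On the full file you might run
# (hypothetical output name):
#   pcs_for_plot < working/1kGP_pca.eigenvec > working/1kGP_pc1_pc2.tsv
pcs_for_plot <<'EOF'
HG00096 HG00096 -0.00979116 0.0246988 0.0030017 0.0175045 -0.000177201 -0.0215443 -0.00710697 0.0040074 -0.0118802 -0.00355934
HG00097 HG00097 -0.00860299 0.0246735 0.00216549 0.0173607 -0.00553673 0.00307645 -0.0026379 9.36712e-05 0.00774549 0.00669956
EOF
```

The values are passed through unchanged, so the table can be joined to sample metadata (for example, population labels) on the IID column before plotting.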

In [6]:
# Display the top rows of the PCA eigenval file
head working/1kGP_pca.eigenval

346.994
151.246
42.584
31.4449
5.55535
5.18856
4.60066
3.37955
3.36628
3.2925
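
Each eigenvalue corresponds to one column of `.eigenvec`, in decreasing order, and the first two (about 347 and 151) are much larger than the rest. A common way to summarize this is each component's share of the summed eigenvalues. Because `--pca 10` computes only the top 10 eigenvalues, the sketch below normalizes by their sum, so the percentages compare the top 10 components with one another rather than giving the proportion of total genetic variance. `pca_variance_share` is a hypothetical helper; the demo input is the 10 eigenvalues printed above.

```shell
# Hypothetical helper: read eigenvalues (one per line, as in a PLINK .eigenval
# file) on stdin and print each with its percentage of the summed values.
# The denominator is the sum of the supplied eigenvalues only, not the total
# variance, since --pca N computes just the top N eigenvalues.
pca_variance_share() {
  awk '
    NF { ev[++n] = $1; total += $1 }
    END {
      if (total == 0) exit 1
      for (i = 1; i <= n; i++)
        printf "PC%d\t%s\t%.2f\n", i, ev[i], 100 * ev[i] / total
    }
  '
}

# Demo with the eigenvalues from this notebook; on the file itself:
#   pca_variance_share < working/1kGP_pca.eigenval
pca_variance_share <<'EOF'
346.994
151.246
42.584
31.4449
5.55535
5.18856
4.60066
3.37955
3.36628
3.2925
EOF
```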
