Estimate Kinship and FST under Arbitrary Population Structure with popkin
The popkin ("population kinship") R package estimates the kinship matrix of individuals and FST from their biallelic genotypes.
Our estimation framework is the first to be practically unbiased under arbitrary population structures.
The stable version of the package is on CRAN and can be installed using
install.packages("popkin")
The current development version can be installed from the GitHub repository using
install.packages("devtools") # if needed
library(devtools)
install_github('StoreyLab/popkin')
The examples below assume the following R data variables are present for n individuals and m loci:
- The m-by-n genotype matrix X, containing only unphased biallelic variants encoded as 0,1,2 counting a given reference allele per locus.
- The length-n vector subpops that assigns each individual to a subpopulation.
The subpops vector is not required, but its use is recommended to improve estimation of the baseline kinship value treated as zero.
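To make the expected inputs concrete, the sketch below simulates a small genotype matrix and a matching label vector in base R. Everything in it is made up for illustration: the sizes n and m, the labels 'A' and 'B', and the allele frequency range are assumptions, and the genotypes are drawn independently, so this toy data has no real population structure.

```r
# Toy inputs with the shapes described above (simulated, not real data)
set.seed(1)
n <- 6      # number of individuals
m <- 100    # number of loci

# hypothetical subpopulation labels, one per individual
subpops <- c('A', 'A', 'A', 'B', 'B', 'B')

# one reference allele frequency per locus
p <- runif(m, min = 0.1, max = 0.9)

# genotypes are Binomial(2, p) counts of the reference allele;
# the matrix is filled column by column, so row i always uses p[i]
X <- matrix(rbinom(m * n, size = 2, prob = p), nrow = m, ncol = n)

dim(X)            # loci along rows, individuals along columns
length(subpops)   # one label per column of X
```

Real data would replace both objects; the only point here is that X holds 0, 1, 2 counts and that subpops has one entry per individual.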
If your data is in BED format, popkin will process it efficiently using BEDMatrix. Here file is the path to the BED file (excluding the .bed extension):
library(BEDMatrix)
X <- BEDMatrix(file) # load genotype matrix object
Synopsis of commands
This is a quick overview of every popkin function, covering estimation and visualization of kinship and FST from a genotype matrix.
First estimate the kinship matrix Phi from the genotypes X. All downstream analyses require Phi; none use X after this.
library(popkin)
Phi <- popkin(X, subpops) # calculate kinship from X and optional subpop labels
Plot the kinship matrix, marking the subpopulations. inbrDiag replaces the diagonal of Phi with inbreeding coefficients.
plotPopkin( inbrDiag(Phi), labs=subpops )
Extract inbreeding coefficients from Phi
inbr <- inbr(Phi)
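As a rough illustration of how inbreeding coefficients relate to a kinship matrix, the base R sketch below uses the standard identity between self-kinship and inbreeding, phi_jj = (1 + f_j) / 2. The matrix Phi_toy and its values are invented, and this is not the package's own code, only the arithmetic one would expect behind it.

```r
# made-up kinship matrix for three individuals
Phi_toy <- matrix(
    c(0.55, 0.10, 0.00,
      0.10, 0.50, 0.00,
      0.00, 0.00, 0.60),
    nrow = 3, byrow = TRUE
)

# invert phi_jj = (1 + f_j) / 2 to get inbreeding from the diagonal
inbr_toy <- 2 * diag(Phi_toy) - 1

# the same values placed on the diagonal, as one would do before plotting
Phi_plot <- Phi_toy
diag(Phi_plot) <- inbr_toy
```

Off-diagonal entries are left untouched, so only the diagonal changes scale.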
Estimate FST
w <- weightsSubpops(subpops) # weigh individuals so subpopulations are balanced
Fst <- fst(Phi, w) # use kinship matrix and weights to calculate fst
Fst <- fst(inbr, w) # estimate more directly from inbreeding vector (same result)
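The idea of balancing subpopulations can be sketched in base R, under the assumption that FST is summarized as a weighted mean of inbreeding coefficients and that each subpopulation receives the same total weight. The helper balanced_weights, the labels, and the inbreeding values below are all hypothetical and stand in for the package functions rather than reproduce them.

```r
# made-up labels and inbreeding coefficients for six individuals
subpops_toy <- c('A', 'A', 'A', 'A', 'B', 'B')
inbr_toy <- c(0.10, 0.12, 0.08, 0.10, 0.30, 0.34)

# hypothetical helper: every subpopulation gets total weight 1/K,
# split evenly among its members
balanced_weights <- function(labels) {
    counts <- table(labels)    # individuals per subpopulation
    K <- length(counts)        # number of subpopulations
    as.numeric(1 / (K * counts[as.character(labels)]))
}

w_toy <- balanced_weights(subpops_toy)
sum(w_toy)                        # weights sum to one

# weighted mean of inbreeding coefficients
fst_toy <- sum(w_toy * inbr_toy)

# unweighted mean, dominated by the larger subpopulation 'A'
fst_unweighted <- mean(inbr_toy)
```

With these toy values the weighted estimate is higher than the plain mean, because the smaller subpopulation 'B' is no longer outvoted by 'A'.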
Estimate and visualize the pairwise FST matrix
pwF <- pwfst(Phi) # estimated matrix
legTitle <- expression(paste('Pairwise ', F[ST])) # fancy legend label
plotPopkin(pwF, labs=subpops, legTitle=legTitle) # NOTE no need for inbrDiag() here!
Rescale the kinship matrix using different subpopulations (implicitly changes the most recent common ancestor population used as reference)
Phi2 <- rescalePopkin(Phi, subpops2)
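Changing the reference ancestral population amounts to choosing a new kinship value to treat as zero. A minimal base R sketch, assuming the rescaling has the form (Phi - phi_min) / (1 - phi_min), is shown below. The matrix and the baseline phi_min are invented, and how the package derives the baseline from the subpopulation labels is not shown here.

```r
# made-up kinship matrix for three individuals
Phi_toy <- matrix(
    c(0.60, 0.20, 0.10,
      0.20, 0.60, 0.10,
      0.10, 0.10, 0.55),
    nrow = 3, byrow = TRUE
)

# assumed new baseline: the kinship value to be treated as zero
phi_min <- 0.1

# shift and rescale so that phi_min maps to 0 while 1 stays at 1
Phi2_toy <- (Phi_toy - phi_min) / (1 - phi_min)

min(Phi2_toy)   # the least related pairs now sit at the new zero
```

All entries change together, so relative comparisons between pairs of individuals are preserved.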
Please see the popkin vignette for a description of the key parameters and more detailed examples, including complex plots with multiple kinship matrices and multi-level subpopulation labeling.
Ochoa, Alejandro, and John D. Storey. 2016a. "FST And Kinship for Arbitrary Population Structures I: Generalized Definitions." bioRxiv doi:10.1101/083915.
Ochoa, Alejandro, and John D. Storey. 2016b. "FST And Kinship for Arbitrary Population Structures II: Method of Moments Estimators." bioRxiv doi:10.1101/083923.