# Admixture Software Demo

In this note, we demonstrate how to calculate ethnic admixture using the **ADMIXTURE** software, which is freely available [here](http://www.genetics.ucla.edu/software/admixture/download.html).

The following commands assume a bash shell on macOS or Linux.

## Input data files

ADMIXTURE expects input files in PLINK binary format (`.bed`), PLINK ordinary text format (`.ped`), or EIGENSTRAT format (`.geno`).

The example data set distributed on the ADMIXTURE [website](http://www.genetics.ucla.edu/software/admixture/download.html) is in PLINK binary format.

In [1]:
;ls -l hapmap3.bed hapmap3.bim hapmap3.fam hapmap3.map

-rw-r--r--@ 1 huazhou  staff  1128171 Jun  4  2010 hapmap3.bed
-rw-r--r--@ 1 huazhou  staff   388672 Jun  4  2010 hapmap3.bim
-rw-r--r--@ 1 huazhou  staff     7136 Jun  4  2010 hapmap3.fam
-rw-r--r--@ 1 huazhou  staff   332960 Jun  4  2010 hapmap3.map


The `bed` file contains the genotypes in binary format (not human-readable). The `bim` file contains the SNP information, one SNP per line (chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2). There are 13,928 SNPs.

In [2]:
;head hapmap3.bim

1	rs10458597	0	554484	0	2
1	rs12562034	0	758311	1	2
1	rs2710875	0	967643	1	2
1	rs11260566	0	1168108	1	2
1	rs1312568	0	1375074	1	2
1	rs35154105	0	1588771	0	2
1	rs16824508	0	1789051	1	2
1	rs2678939	0	1990452	1	2
1	rs7553178	0	2194615	1	2
1	rs13376356	0	2396747	1	2


In [3]:
;wc -l hapmap3.bim

   13928 hapmap3.bim


The `fam` file contains the sample information, one individual per line. There are 324 individuals.

In [4]:
;head hapmap3.fam

2431 NA19916 0 0 1 -9
2424 NA19835 0 0 2 -9
2469 NA20282 0 0 2 -9
2368 NA19703 0 0 1 -9
2425 NA19901 0 0 2 -9
2427 NA19908 0 0 1 -9
2430 NA19914 0 0 2 -9
2470 NA20287 0 0 2 -9
2436 NA19713 0 0 2 -9
2426 NA19904 0 0 1 -9


In [5]:
;wc -l hapmap3.fam

     324 hapmap3.fam
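
As a quick consistency check, the size of the `bed` file can be predicted from the two line counts above. This sketch assumes the standard SNP-major PLINK binary layout (3 header bytes, then each SNP stored with 4 genotypes per byte, rounded up to whole bytes). The function name `expected_bed_size` is ours and is not part of PLINK or ADMIXTURE.

```shell
# Expected size in bytes of a SNP-major PLINK .bed file:
# 3 header bytes + n_snps * ceil(n_samples / 4).
expected_bed_size() {
  local n_snps=$1 n_samples=$2
  echo $(( 3 + n_snps * ((n_samples + 3) / 4) ))
}

# Counts taken from the wc -l output above.
expected_bed_size 13928 324
```

With 13,928 SNPs and 324 individuals this gives 3 + 13928 × 81 = 1,128,171 bytes, which matches the size that `ls -l` reports for `hapmap3.bed`.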


## Run with K = 3

Suppose the individuals in the sample are admixed from K = 3 ancestral populations. We run ADMIXTURE with K = 3.

In [6]:
;./admixture hapmap3.bed 3

****                   ADMIXTURE Version 1.3.0                  ****
****                    Copyright 2008-2015                     ****
****           David Alexander, Suyash Shringarpure,            ****
****                John  Novembre, Ken Lange                   ****
****                                                            ****
****                 Please cite our paper!                     ****
****   Information at www.genetics.ucla.edu/software/admixture  ****

Random seed: 43
Point estimation method: Block relaxation algorithm
Convergence acceleration algorithm: QuasiNewton, 3 secant conditions
Point estimation will terminate when objective function delta < 0.0001
Estimation of standard errors disabled; will compute point estimates only.
Size of G: 324x13928
Performing five EM steps to prime main algorithm
1 (EM) 	Elapsed: 0.228	Loglikelihood: -4.38757e+06	(delta): 2.87325e+06
2 (EM) 	Elapsed: 0.238	Loglikelihood: -4.25681e+06	(delta): 130762
3 (EM) 	Elapsed: 0.228	L

The program finishes the analysis in less than 20 seconds and outputs 2 files: `filename.K.P` and `filename.K.Q`.

In [7]:
;ls -l hapmap3.3.P hapmap3.3.Q

-rw-r--r--  1 huazhou  staff  376056 Sep 19 16:42 hapmap3.3.P
-rw-r--r--  1 huazhou  staff    8748 Sep 19 16:42 hapmap3.3.Q


`hapmap3.3.P` contains the estimated allele 1 frequencies of each SNP in the 3 populations: one row per SNP, one column per population.

In [8]:
;head hapmap3.3.P

0.999990 0.999990 0.999990
0.946581 0.934992 0.901852
0.989626 0.382598 0.918612
0.973109 0.682057 0.907595
0.678695 0.918927 0.129153
0.999990 0.999990 0.999990
0.999990 0.990119 0.999990
0.841989 0.203466 0.851233
0.967501 0.860690 0.622157
0.870693 0.862778 0.842376


In [9]:
;wc -l hapmap3.3.P

   13928 hapmap3.3.P


`hapmap3.3.Q` contains the estimated admixture proportions of each individual: one row per individual, one column per population.

In [10]:
;head hapmap3.3.Q

0.000010 0.896321 0.103669
0.009659 0.830876 0.159465
0.055770 0.725441 0.218790
0.000010 0.866447 0.133543
0.029255 0.888970 0.081775
0.009302 0.859576 0.131122
0.000010 0.715624 0.284366
0.013736 0.810352 0.175913
0.000010 0.727122 0.272868
0.034870 0.821125 0.144004


In [11]:
;wc -l hapmap3.3.Q

     324 hapmap3.3.Q
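
Because each row holds proportions, its entries should sum to 1 up to rounding. A minimal sketch of that check follows. The helper name `count_bad_q_rows` and the default tolerance of 1e-4 are our own choices, not part of ADMIXTURE.

```shell
# Print the number of rows in a .Q file whose entries do not sum to 1
# within a tolerance (default 1e-4, to allow for 6-decimal rounding).
count_bad_q_rows() {
  awk -v tol="${2:-1e-4}" '
    {
      s = 0
      for (i = 1; i <= NF; i++) s += $i   # sum all columns of this row
      d = s - 1
      if (d < 0) d = -d                   # absolute deviation from 1
      if (d > tol + 0) bad++
    }
    END { print bad + 0 }
  ' "$1"
}

# Usage (assumes hapmap3.3.Q is in the current directory):
# count_bad_q_rows hapmap3.3.Q
```

A result of 0 means every row passed the check.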


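To adjust for confounding by ancestry in a GWAS, include any $K-1$ columns of `hapmap3.3.Q` as covariates in the regression. Only $K-1$ columns are needed because the proportions in each row sum to 1, so the remaining column is determined by the others.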
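
One way to prepare such covariates is to pair the sample IDs in the `fam` file with the first $K-1$ columns of the `.Q` file. The sketch below assumes that the rows of `hapmap3.3.Q` are in the same order as the rows of `hapmap3.fam` and that the `fam` file has the usual 6 columns. `make_covariates` is a hypothetical helper, and the output layout (family ID, individual ID, covariates) is only one plausible choice.

```shell
# Join the first two .fam columns (family ID, individual ID) with all but
# the last column of the .Q file, line by line.
make_covariates() {
  local fam=$1 q=$2
  paste -d' ' "$fam" "$q" | awk '{
    printf "%s %s", $1, $2          # IDs from the .fam file
    for (i = 7; i < NF; i++)        # .Q columns start at field 7; skip the last
      printf " %s", $i
    printf "\n"
  }'
}

# Usage (assumes both files are in the current directory):
# make_covariates hapmap3.fam hapmap3.3.Q > hapmap3.3.cov
```

The output file name `hapmap3.3.cov` is illustrative; use whatever name and format your association software expects.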

## Multi-threading

To speed up the computation, we can turn on multi-threading with the `-j` option; for example, `-j4` requests 4 threads.

In [12]:
;./admixture hapmap3.bed 3 -j4

****                   ADMIXTURE Version 1.3.0                  ****
****                    Copyright 2008-2015                     ****
****           David Alexander, Suyash Shringarpure,            ****
****                John  Novembre, Ken Lange                   ****
****                                                            ****
****                 Please cite our paper!                     ****
****   Information at www.genetics.ucla.edu/software/admixture  ****

Parallel execution requested.  Will use 4 threads.
Random seed: 43
Point estimation method: Block relaxation algorithm
Convergence acceleration algorithm: QuasiNewton, 3 secant conditions
Point estimation will terminate when objective function delta < 0.0001
Estimation of standard errors disabled; will compute point estimates only.
Size of G: 324x13928
Performing five EM steps to prime main algorithm
1 (EM) 	Elapsed: 0.231	Loglikelihood: -4.38757e+06	(delta): 2.87325e+06
2 (EM) 	Elapsed: 0.234	Loglikelihood: -4

## Choose K using cross-validation

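To choose K, run ADMIXTURE with the cross-validation option (written `--cv` in the ADMIXTURE manual) for several candidate values of K and compare the reported cross-validation errors. See the ADMIXTURE documentation for details.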
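
A cross-validation sweep over several values of K might be scripted as below. This is a sketch under assumptions taken from the ADMIXTURE manual rather than from the runs in this note: the flag is written `--cv`, and each log contains a line of the form `CV error (K=3): <value>`. The helper names `run_cv` and `best_k` and the log file names `log<K>.out` are our own.

```shell
# Run ADMIXTURE with cross-validation for each K given, saving one log per K.
# Assumes the ./admixture binary is in the current directory.
run_cv() {
  local bed=$1; shift
  for K in "$@"; do
    ./admixture --cv "$bed" "$K" | tee "log${K}.out"
  done
}

# Print the K with the smallest cross-validation error found in the logs.
# Assumes lines of the form: CV error (K=3): <value>
best_k() {
  awk '
    /^CV error/ {
      k = $3
      gsub(/[^0-9]/, "", k)            # "(K=3):" -> "3"
      err = $4 + 0
      if (!seen || err < min) { min = err; best = k; seen = 1 }
    }
    END { if (seen) print best }
  ' "$@"
}

# Usage:
# run_cv hapmap3.bed 1 2 3 4 5
# best_k log*.out
```

A K with a low cross-validation error relative to the other candidates is generally preferred.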

## Standard errors

By default ADMIXTURE computes point estimates only. Use the `-B` option to turn on bootstrapping for standard errors.