Zilong-Li/phaseless

Imputation and Admixture for lcWGS in one goal

Phaseless is designed for genotype imputation and admixture inference using low-coverage sequencing data. The imputation model is in the spirit of the fastPHASE model but takes genotype likelihoods as input, much as STITCH works on raw reads. The admixture inference is then modeled on the haplotype cluster information from the fastPHASE model.

Build

The following assumes htslib is installed in your system path.

git clone https://github.com/Zilong-Li/phaseless
cd phaseless
make -j4

If you followed the htslib installation guide and installed it in a custom path:

make HTSLIB=/path/to/your/htslib -j4

Usage

phaseless provides several subcommands. Use phaseless -h to list them.

Imputation

By default, phaseless impute is parallelized for imputing the whole genome at once: multiple chunks run in parallel, each handled by one thread. See the --chunksize option.

phaseless impute -g data/bgl.gz -c 10 -n 4 -s 100000

However, you may only be interested in imputing a single chunk. The --single-chunk option switches the parallelism so that the threads work together on one chunk.

phaseless impute -g data/bgl.gz -c 10 -n 4 -S

Admixture

With the binary file output by the impute command above, we can run admixture inference for different values of k, the number of ancestries.

phaseless admix -b impute.pars.bin -k 3 -n 4
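
To compare several values of k, the admix command above can be wrapped in a loop. The following is a minimal sketch, not part of phaseless itself. It assumes that -o sets the output prefix for admix as described in the Output section, and the function name run_admix_range and the admix.k prefix are made up for illustration. By default it only prints the commands; set PHASELESS=phaseless to actually run them.

```shell
#!/bin/sh
# Sketch: run admixture inference for a range of K values.
# Assumption: -o sets the output prefix for `admix`, so each K gets its own files.
# Dry run by default (commands are printed); export PHASELESS=phaseless to execute.
PHASELESS="${PHASELESS:-echo phaseless}"

run_admix_range() {
    # $1 = parameter file written by impute, $2 = smallest K, $3 = largest K
    bin="$1"
    k="$2"
    kmax="$3"
    while [ "$k" -le "$kmax" ]; do
        # $PHASELESS is left unquoted on purpose so "echo phaseless" splits into two words
        $PHASELESS admix -b "$bin" -k "$k" -n 4 -o "admix.k$k"
        k=$((k + 1))
    done
}

run_admix_range impute.pars.bin 2 4
```

Using one prefix per K keeps each run from overwriting the previous one's files.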

Parameters

We can also inspect and manipulate the parameters of the fastPHASE model using the binary file output by the impute command.

phaseless parse -b impute.pars.bin -c 0 ## single chunk, all samples
phaseless parse -b impute.pars.bin -c -1 -s samples.txt ## all chunks, specific samples

Plotting

Now, we can do some interesting plotting.

./misc/plot_haplotype_cluster.R

misc/hapfreq.png

Output

Without specifying the output prefix -o, the output filenames of the above commands are as follows:

❯ tree -L 1
.
├── admix.Q
├── admix.log
├── parse.haplike.bin
├── parse.log
├── impute.recomb
├── impute.pi
├── impute.vcf.gz
├── impute.pars.bin
└── impute.log
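
After running all three commands, a quick check that the default output files are present can be sketched as follows. The helper name check_outputs is hypothetical, and the file list is copied from the listing above, so it only applies when -o was not used.

```shell
#!/bin/sh
# Sketch: report which of the default output files listed above are missing.
check_outputs() {
    # $1 = directory to inspect (defaults to the current directory)
    dir="${1:-.}"
    missing=0
    for f in impute.log impute.pars.bin impute.vcf.gz impute.pi impute.recomb \
             parse.log parse.haplike.bin admix.log admix.Q; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing files (0 means everything is there)
    return "$missing"
}

check_outputs . || echo "some outputs are not present yet"
```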

Changes

Check out the news file.