This repository contains code used to validate IBD caller benchmarking and optimization findings in empirical data sets.
The pipeline is part of a broader project bmibdcaller, which also includes the
following repositories:
- bmibdcaller_simulations for simulation analysis.
- ishare/ibdutils for efficiently comparing two sets of inferred IBD segments.
- tskibd for obtaining true IBD segments from simulated genealogical trees.
The repository has two main components:
- Scripts and notes to pre-process data available from MalariaGEN Pf7.
  - Data pre-filtering and downloading, inferring dominant alleles, and imputation. See notes ./input/Readme.md.
  - Constructing different datasets. See notes ./datasets/Readme.md.
- The Nextflow pipeline to benchmark the performance of multiple IBD callers before and after IBD caller parameter optimization:
The software environment and result folder structure of the empirical analysis pipeline are similar to those of the bmibdcaller_simulations repository. For details, follow the readme in the simulation repository.
Examples of running the pipeline can be found in