PyIGD is a Python-only parser for the Indexable Genotype Data (IGD) format.
For a C++ library that supports creating and parsing IGD, see picovcf (which also supports VCF -> IGD conversion).
You can install the latest release of PyIGD from PyPI via:
pip install pyigd
For development, you can clone the code and install it directly from the directory (this will automatically reflect any code changes you make):
pip install -e pyigd/
or build and install via the wheel:
cd pyigd/ && python setup.py bdist_wheel
pip install --force-reinstall dist/*.whl
The pyigd.IGDReader class reads IGD data from a buffer. See the example script that loads an IGD file, prints out some metadata, and then iterates over the genotype data for all variants. Generally the usage pattern is:
import pyigd

# filename is the path to an IGD file; it must be opened in binary mode
with open(filename, "rb") as f:
    igd_reader = pyigd.IGDReader(f)
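Once the reader is open, a typical next step is to walk every variant. The sketch below assumes, and you should verify this against your installed pyigd version, that the reader exposes a num_variants property and a get_samples(i) method returning a (position, is_missing, sample_list) tuple. summarize_variants is a hypothetical helper, and FakeReader is a hypothetical in-memory stand-in so the sketch runs without an IGD file.

```python
from typing import List, Tuple


def summarize_variants(reader) -> List[Tuple[int, bool, int]]:
    """Return (position, is_missing, number_of_samples_with_allele) per variant.

    Assumes `reader` exposes `num_variants` and `get_samples(i)` returning
    (position, is_missing, sample_list), as pyigd's IGDReader is assumed to do.
    """
    summary = []
    for variant_index in range(reader.num_variants):
        position, is_missing, sample_list = reader.get_samples(variant_index)
        summary.append((position, is_missing, len(sample_list)))
    return summary


class FakeReader:
    """Hypothetical stand-in for pyigd.IGDReader, used only for illustration."""

    def __init__(self, variants):
        # Each entry is (position, is_missing, sample_list)
        self._variants = variants

    @property
    def num_variants(self) -> int:
        return len(self._variants)

    def get_samples(self, variant_index: int):
        return self._variants[variant_index]


if __name__ == "__main__":
    fake = FakeReader([(100, False, [0, 3, 7]), (250, True, [2]), (900, False, [])])
    for row in summarize_variants(fake):
        print(row)
```

With a real file, the call would be summarize_variants(igd_reader) inside the with block shown above.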
There is also the pyigd.IGDWriter class to construct IGD files. Related is pyigd.IGDTransformer, which is a way to create a copy of an IGD while modifying its contents. See the IGDTransformer sample list example and bitvector example.
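Conceptually, a transformer reads each variant's sample list, lets user code rewrite it, and writes the result to a new file. The sketch below shows that copy-with-modification pattern on plain Python lists; it is not pyigd's IGDTransformer interface (see the linked examples for that). transform_sample_lists and drop_samples are hypothetical names, and treating a None return value as "drop this variant" is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Set, Tuple

# A variant here is (position, sample_list); this layout is an illustrative assumption.
Variant = Tuple[int, List[int]]
Modifier = Callable[[int, List[int]], Optional[List[int]]]


def transform_sample_lists(variants: List[Variant], modify: Modifier) -> List[Variant]:
    """Copy `variants`, passing each sample list through `modify`.

    If `modify` returns None, the variant is omitted from the copy;
    otherwise the returned list (kept in ascending order) replaces the
    original sample list. The input is never mutated.
    """
    result: List[Variant] = []
    for position, samples in variants:
        new_samples = modify(position, list(samples))  # pass a copy so the input stays intact
        if new_samples is None:
            continue
        result.append((position, sorted(new_samples)))
    return result


def drop_samples(to_remove: Set[int]) -> Modifier:
    """Hypothetical modifier: remove some sample indices and drop variants left empty."""

    def modify(position: int, samples: List[int]) -> Optional[List[int]]:
        kept = [s for s in samples if s not in to_remove]
        return kept if kept else None

    return modify


if __name__ == "__main__":
    original = [(100, [0, 3, 7]), (250, [3]), (900, [1, 2])]
    print(transform_sample_lists(original, drop_samples({3})))
    # → [(100, [0, 7]), (900, [1, 2])]
```

Separating the traversal from the per-variant callback keeps the copying logic in one place, so each kind of edit only needs a small function.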
IGD can be highly performant for a few reasons:
- It stores sparse data sparsely. Low-frequency variants are stored as sample lists. Medium/high frequency variants are stored as bit vectors.
- It is indexable (you can jump directly to the data for the i-th variant). Because the index is stored in its own section of the file, scanning it is extremely fast, so looking only at variants in a particular range of the genome is also very fast. In this case you would use pyigd.IGDReader.get_position_and_flags() to find the first variant index within the range, and then use pyigd.IGDReader.get_samples() from that index onward.
- The genotype data is stored in one of two very simple binary formats. This makes parsing fast, and the compact nature of the file makes reading from disk/memory fast as well.
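The sparse-storage point above can be made concrete. A sample list costs roughly one integer per carrier, while a bit vector costs one bit per sample regardless of frequency, so the list wins for rare variants and the vector for common ones. The sketch below picks whichever is smaller, assuming 4-byte sample indices and an LSB-first bit order within each byte; these sizes, the bit order, and choose_encoding are illustrative assumptions, not IGD's actual on-disk layout or threshold.

```python
from typing import List, Tuple, Union

BYTES_PER_SAMPLE_INDEX = 4  # assumption: sample indices stored as 32-bit integers


def bitvector_size(num_samples: int) -> int:
    """Bytes needed for one bit per sample, rounded up to whole bytes."""
    return (num_samples + 7) // 8


def sample_list_size(num_carriers: int) -> int:
    """Bytes needed to store each carrier's sample index."""
    return num_carriers * BYTES_PER_SAMPLE_INDEX


def to_bitvector(samples: List[int], num_samples: int) -> bytes:
    """Set bit i (LSB-first within each byte) for every sample index i."""
    buf = bytearray(bitvector_size(num_samples))
    for s in samples:
        buf[s // 8] |= 1 << (s % 8)
    return bytes(buf)


def from_bitvector(data: bytes, num_samples: int) -> List[int]:
    """Inverse of to_bitvector: return the sorted indices of set bits."""
    return [i for i in range(num_samples) if data[i // 8] & (1 << (i % 8))]


def choose_encoding(samples: List[int], num_samples: int) -> Tuple[str, Union[List[int], bytes]]:
    """Pick whichever representation is smaller for this variant."""
    if sample_list_size(len(samples)) < bitvector_size(num_samples):
        return ("sample_list", sorted(samples))
    return ("bitvector", to_bitvector(samples, num_samples))


if __name__ == "__main__":
    n = 1000  # a bit vector always costs 125 bytes here
    rare = [5, 742]  # 2 carriers -> 8-byte sample list
    common = list(range(0, n, 2))  # 500 carriers -> 2000-byte list vs 125-byte bit vector
    print(choose_encoding(rare, n)[0])    # → sample_list
    print(choose_encoding(common, n)[0])  # → bitvector
```

Under these assumed sizes the crossover sits near n/32 carriers, i.e. roughly 3% of samples.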
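The range query in the indexing point above amounts to a binary search over variant positions, which is cheap because only the index needs to be touched. The sketch below assumes variants are sorted by position and takes a position lookup as a callable, so it runs without a file. With pyigd one might pass lambda i: reader.get_position_and_flags(i)[0], assuming the first element of the returned tuple is the position. first_variant_at_or_after and variants_in_range are hypothetical helpers.

```python
from typing import Callable, Iterator


def first_variant_at_or_after(get_position: Callable[[int], int], num_variants: int, start: int) -> int:
    """Return the smallest variant index whose position is >= start.

    Assumes positions are non-decreasing in variant index. Returns
    num_variants if every variant lies before `start`.
    """
    lo, hi = 0, num_variants
    while lo < hi:
        mid = (lo + hi) // 2
        if get_position(mid) < start:
            lo = mid + 1
        else:
            hi = mid
    return lo


def variants_in_range(
    get_position: Callable[[int], int], num_variants: int, start: int, end: int
) -> Iterator[int]:
    """Yield variant indices with start <= position < end (half-open range)."""
    i = first_variant_at_or_after(get_position, num_variants, start)
    while i < num_variants and get_position(i) < end:
        yield i
        i += 1


if __name__ == "__main__":
    positions = [100, 250, 250, 900, 1500, 4000]
    print(list(variants_in_range(positions.__getitem__, len(positions), 250, 1500)))
    # → [1, 2, 3]
```

Each yielded index would then be passed to get_samples() to fetch the genotype data for just that region.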
To convert VCF to IGD and use IGD from your own project:
- Clone picovcf and follow the instructions in its README to build the tools for that library.
  - If you want to be able to convert .vcf.gz (compressed VCF) to IGD, make sure you build with -DENABLE_VCF_GZ=ON
- One of the built tools will be igdtools, which can convert from VCF to IGD, among other things (such as filtering IGD files).
- Do one of the following:
  - If your project is C++, copy picovcf.hpp into your project, #include it somewhere, and then use it according to the documentation.
  - If your project is Python, clone pyigd and install it per the README instructions.