Skip to content

auerlab/vcf2hap

Repository files navigation

vcf2hap

Description

vcf2hap is a simple tool for generating a .hap file from a VCF. The .hap file is required by haplohseq.

vcf2hap is extremely fast (about 19 times faster than format_hap.py) and requires a trivial amount of memory regardless of the size of the VCF file.

Design and Implementation

The code is organized following basic object-oriented design principals, but implemented in C to minimize overhead and keep the source code accessible to scientists who don't have time to master the complexities of C++.

Structures are treated as classes, with accessor macros and mutator functions provided, so dependent applications and libraries need not access structure members directly. Since the C language cannot enforce this, it's up to application programmers to exercise self-discipline.

Building and installing

vcf2hap is intended to build cleanly in any POSIX environment. Please don't hesitate to open an issue if you encounter problems on any Unix-like system.

Primary development is done on FreeBSD with clang, but the code is frequently tested on Linux, MacOS, NetBSD, and OpenIndiana as well. MS Windows is not supported, unless using a POSIX environment such as Cygwin or Windows Subsystem for Linux.

The Makefile is designed to be friendly to package managers, such as Debian packages, FreeBSD ports, MacPorts, pkgsrc, etc. End users should install using a package manager. Note that pkgsrc can be used by anyone, on virtually any POSIX operating system, with or without administrator privileges..

I maintain a FreeBSD port and a pkgsrc package, which is sufficient to install cleanly on virtually any POSIX platform. If you would like to see a package in another package manager, please consider creating a package yourself. This will be one of the easiest packages in the collection and hence a good vehicle to learn how to create packages.

For an overview of available package managers, see the Repology website.

Installing vcf2hap on FreeBSD:

FreeBSD is a highly underrated platform for scientific computing, with over 2,000 scientific libraries and applications in the FreeBSD ports collection (of more than 30,000 total), modern clang compiler, fully-integrated ZFS filesystem, and renowned security, performance, and reliability. FreeBSD has a somewhat well-earned reputation for being difficult to set up and manage compared to user-friendly systems like Ubuntu. However, if you're a little bit Unix-savvy, you can very quickly set up a workstation, laptop, or VM using desktop-installer. GhostBSD offers an experience very similar to Ubuntu, but is built on FreeBSD rather than Debian Linux. GhostBSD packages lag behind FreeBSD ports slightly, but this is not generally an issue and there are workarounds.

To install the binary package on FreeBSD:

pkg install vcf2hap

You can just as easily build and install from source. This is useful for FreeBSD ports with special build options, for building with non-portable optimizations such as -march=native, and for work-in-progress ports, for which binary packages are not yet maintained.

cd /usr/ports/biology/vcf2hap && env CFLAGS='-march=native -O2' make install
cd /usr/ports/wip/vcf2hap && make install

Installing via pkgsrc

pkgsrc is a cross-platform package manager that works on any Unix-like platform. It is native to NetBSD and well-supported on Illumos, MacOS, RHEL/CentOS, and many other Linux distributions. Using pkgsrc does not require admin privileges. You can install a pkgsrc tree in any directory to which you have write access and easily install any of the nearly 20,000 packages in the collection. The auto-pkgsrc-setup script will help you install pkgsrc in about 10 minutes. Just download it and run

sh auto-pkgsrc-setup

Then, assuming you selected current packages and the default prefix

source ~/Pkgsrc/pkg/etc/pkgsrc.sh   # Or pkgsrc.csh for csh or tcsh
cd pkgsrc-dir/biology/vcf2hap
sbmake install clean clean-depends

Instructions for packagers

If you would like to add this project to another package manager rather than use FreeBSD ports or pkgsrc, basic manual build instructions for package can be found here. Your contribution is greatly appreciated!