I am a software engineer and computational biologist with a strong background in algorithm design, machine learning, and genomic data analysis. I am comfortable with low-level performance engineering in languages such as C++ and Rust, and experienced in high-level data analysis and visualization with Python and JavaScript.
- Machine learning for multi-modal data integration (genomic, transcriptomic, and proteomic)
- Protein language models
- Sequence alignment algorithms
- Microbiome and metagenomics
- Flamino. A Flax NNX-based reimplementation of the ESM-2 protein language model.
- POASTA. A new algorithm for Partial Order Alignment (POA), a form of DNA sequence-to-DAG alignment. POA is a common component of pangenome graph construction pipelines, such as those used to construct the human pangenome reference graph [1, 2]. POASTA outperformed existing tools by 4.1x and enabled alignments that were not previously possible, while retaining the guarantee of optimality. Published in Oxford Bioinformatics (2025).
- Pyfrost. A high-performance, low-memory Python library to construct and analyze compacted, colored de Bruijn graphs (ccDBGs). The ccDBG is a commonly used data structure in de novo genome assemblers. The library includes Python bindings to Bifrost, a fast and memory-efficient C++ ccDBG library, and provides a NetworkX-like API.
- Tesserae. A recombination-aware DNA sequence aligner that uses a hidden Markov model (HMM) to determine the optimal alignment of a query sequence to a panel of potential reference sequences. This is an improved, much faster version of the HMM described in a paper analyzing de novo genetic variants in experimental crosses of the malaria parasite Plasmodium falciparum [1].
- Strain Genome Explorer (StrainGE). A toolkit to detect and characterize low-abundance bacterial strain-level genetic diversity in whole metagenomic data. Published in Genome Biology (2022). I used StrainGE to obtain detailed insights into the gut and bladder E. coli dynamics of women with recurrent UTIs in a year-long study published in Nature Microbiology (2022).
- Zed WDL. An extension for the code editor Zed providing syntax highlighting and code completion support for the Workflow Description Language (WDL).
- dotfiles. My personal configuration files for various tools and editors.
- Reddit /r/place headless pixel placement bot. A CLI tool written in Python that automatically placed pixels on Reddit's /r/place canvas during its 2022 April Fools' event. It obtains instructions from a command-and-control server, then logs in to Reddit to submit the correct pixel color.
- TLC5940 Raspberry Pi Driver. The TLC5940 is a commonly used LED driver chip that can drive up to 16 LEDs (or more when multiple chips are chained). This C++ library provides a simple interface to control the TLC5940 from a Raspberry Pi.
- OpenGL-powered graph visualization system in VisPy. As part of Google Summer of Code 2015, I added the OpenGL visuals required for displaying nodes and edges, and implemented several automatic graph layout algorithms. I also blogged about how to draw arbitrary shapes using OpenGL points.
- Visualizing the height of the Netherlands. Data preprocessing and visualization scripts to explore the height of the Netherlands. Built with PostgreSQL+PostGIS and d3.js. For the visualization, see my blog post.