Learn interpretable computational phenotyping models from k-merized genomic data




Kover is an out-of-core implementation of rule-based machine learning algorithms that has been tailored for genomic biomarker discovery. It produces highly interpretable models, based on k-mers, that explicitly highlight genotype-to-phenotype associations.
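To make the idea of a k-mer-based rule model concrete, here is a minimal in-memory sketch. It is not Kover's code or API. The function names (`kmers`, `learn_conjunction`, `predict`) and the toy sequences are hypothetical, and the learner is a simplified greedy set-covering heuristic that tolerates no errors on positive examples. Kover itself works out-of-core and supports more general trade-offs.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def learn_conjunction(genomes, labels, k, max_rules=3):
    """Greedily build a conjunction of k-mer presence/absence rules.

    Simplified heuristic (an assumption, not Kover's exact algorithm):
    at each step, keep only rules satisfied by every positive example and
    pick the one that rejects the most remaining negative examples.
    """
    profiles = [kmers(g, k) for g in genomes]
    candidates = sorted(set().union(*profiles))
    # A rule is (kmer, required_presence); a profile satisfies it
    # when (kmer in profile) == required_presence.
    rules = [(km, present) for km in candidates for present in (True, False)]
    positives = [p for p, y in zip(profiles, labels) if y == 1]
    negatives = [p for p, y in zip(profiles, labels) if y == 0]

    model = []
    while negatives and len(model) < max_rules:
        best, best_rejected = None, 0
        for km, present in rules:
            if not all((km in p) == present for p in positives):
                continue  # rule would misclassify a positive example
            rejected = sum((km in p) != present for p in negatives)
            if rejected > best_rejected:
                best, best_rejected = (km, present), rejected
        if best is None:
            break  # no remaining rule rejects any negative example
        model.append(best)
        km, present = best
        # Negatives that still satisfy every chosen rule remain to be rejected.
        negatives = [p for p in negatives if (km in p) == present]
    return model


def predict(model, genome, k):
    """Predict 1 if the genome satisfies every rule in the conjunction."""
    profile = kmers(genome, k)
    return int(all((km in profile) == present for km, present in model))


# Toy data, invented for illustration: 1 = resistant, 0 = susceptible.
genomes = ["ACGTTGCA", "TTACGTTG", "ACGATGCA", "TTACGATG"]
labels = [1, 1, 0, 0]
model = learn_conjunction(genomes, labels, k=4)
print(model)
```

The learned model is a short list of (k-mer, presence) pairs, so it can be read directly as a statement such as "the phenotype is predicted when this k-mer is absent". This readability is what "interpretable" refers to here.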


Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potential new ones. An open-source disk-based implementation that is both memory and computationally efficient is included with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.

Drouin, A., Letarte, G., Raymond, F., Marchand, M., Corbeil, J. & Laviolette, F. (2018). Interpretable genotype-to-phenotype classifiers with performance guarantees. Submitted. [Preprint]

Drouin, A., Giguère, S., Déraspe, M., Marchand, M., Tyers, M., Loo, V. G., Bourgault, A. M., Laviolette, F. & Corbeil, J. (2016). Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons. BMC Genomics, 17(1), 754. [PDF]

Video lecture:

The Set Covering Machine implementation in Kover was featured in the following video lecture:

Interpretable Models of Antibiotic Resistance with the Set Covering Machine Algorithm, Google, Cambridge, Massachusetts (February 2017) [slides]



For installation instructions, see: http://aldro61.github.io/kover/doc_installation.html


For tutorials on how to use Kover with your data, see: http://aldro61.github.io/kover/doc_tutorials.html


The documentation can be found at: http://aldro61.github.io/kover/


If you need help using Kover or want to report a bug, please use Biostars.