findNeighbour2 is a server application for investigating bacterial relatedness using reference-mapped data. Accessible via RESTful webservices, findNeighbour2 maintains a sparse distance matrix in a database for a sequence collection. A maximum storable distance (e.g. 20 or 50 SNPs) must be supplied; distances greater than this are not stored.
findNeighbour2 supports incremental addition of samples, and, for a given sample, allows queries identifying similar sequences with millisecond response times.
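The idea of a sparse distance matrix with a storage cutoff can be illustrated with a minimal in-memory sketch. This is not findNeighbour2's implementation: the class `SparseDistanceStore` and its methods are hypothetical names, a dictionary stands in for the database, and distances are computed by naive pairwise comparison of equal-length strings.

```python
class SparseDistanceStore:
    """Toy stand-in for a sparse distance matrix (hypothetical, for illustration)."""

    def __init__(self, max_distance):
        self.max_distance = max_distance  # largest distance that is stored
        self.sequences = {}               # sample id -> sequence string
        self.neighbours = {}              # sample id -> {other id: distance}

    @staticmethod
    def snp_distance(a, b):
        """Count positions at which two equal-length sequences differ."""
        if len(a) != len(b):
            raise ValueError("sequences must have equal length")
        return sum(1 for x, y in zip(a, b) if x != y)

    def insert(self, sample_id, sequence):
        """Add one sample incrementally, storing only distances within the cutoff."""
        self.neighbours[sample_id] = {}
        for other_id, other_seq in self.sequences.items():
            d = self.snp_distance(sequence, other_seq)
            if d <= self.max_distance:
                self.neighbours[sample_id][other_id] = d
                self.neighbours[other_id][sample_id] = d
        self.sequences[sample_id] = sequence

    def query(self, sample_id, threshold):
        """Return (distance, id) pairs within threshold, nearest first."""
        if threshold > self.max_distance:
            raise ValueError("threshold exceeds the maximum stored distance")
        stored = self.neighbours.get(sample_id, {})
        return sorted((d, other) for other, d in stored.items() if d <= threshold)


store = SparseDistanceStore(max_distance=2)
store.insert("s1", "AAAAAA")
store.insert("s2", "AAAAAT")
store.insert("s3", "TTTTTT")
print(store.query("s1", threshold=1))
```

Because only distances at or below the cutoff are kept, a query reduces to a lookup over a short stored neighbour list rather than a comparison against every sequence in the collection.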
The inputs to the service are strings containing DNA sequence information, typically generated by mapping and basecalling, followed by storage in FASTA or other formats. The service can be queried with strings containing DNA sequence information and a single nucleotide polymorphism (SNP) threshold; it returns a list of similar samples. The software is designed for, and has been extensively tested with, mapped data from bacterial genome sequencing.
It was produced as part of the Modernising Medical Microbiology initiative, which uses it, as does Public Health England.
findNeighbour2 is written entirely in Python and has three major components:
- webservice-server-rest, which is a Flask application implementing RESTful endpoints. This calls the XML-RPC endpoints transparently.
- webservice-server, which is an XML-RPC server.
- seqComparer, a class which implements reference-based compression, in-memory storage of a compressed representation of the sequence, fast sequence comparisons, and disc-based persistence.
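Reference-based compression, as named in the seqComparer description, can be sketched as follows. This is an illustrative assumption about the general technique, not the seqComparer code: the functions `compress`, `decompress` and `distance` are hypothetical, and the sketch ignores ambiguous or uncalled bases.

```python
def compress(reference, sequence):
    """Keep only the positions where the sequence differs from the reference."""
    if len(reference) != len(sequence):
        raise ValueError("sequence must be the same length as the reference")
    return {i: b for i, (r, b) in enumerate(zip(reference, sequence)) if b != r}


def decompress(reference, variants):
    """Rebuild the full sequence from the reference and the stored differences."""
    bases = list(reference)
    for position, base in variants.items():
        bases[position] = base
    return "".join(bases)


def distance(variants_a, variants_b):
    """SNP distance computed from two compressed representations only.

    A position counts if the two samples carry different bases there; a
    position absent from one dictionary means that sample matches the reference.
    """
    positions = variants_a.keys() | variants_b.keys()
    return sum(1 for p in positions if variants_a.get(p) != variants_b.get(p))


reference = "ACGTACGT"
sample_a = compress(reference, "ACGTACGA")
sample_b = compress(reference, "TCGTACGA")
print(sample_a, sample_b, distance(sample_a, sample_b))
```

For closely related bacterial genomes mapped to the same reference, the stored dictionaries are small relative to the genome length, so both memory use and comparison time scale with the number of variant positions rather than with the full sequence.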
How to test it
Endpoints: in detail, in brief
A publication describing this work is in BMC Bioinformatics: "BugMat and FindNeighbour: command line and server applications for investigating bacterial relatedness", DOI: 10.1186/s12859-017-1907-2 (https://dx.doi.org/10.1186/s12859-017-1907-2)
Test data sets for N. meningitidis, M. tuberculosis and S. enterica are available to download here. These are .tar.gz files totalling 80 GB.
During development, findNeighbour2 was referred to as ElephantWalk2, and you may find references to ElephantWalk2 or EW2 in the code base.