The method is described in https://doi.org/10.1101/2022.03.18.484650 (bioRxiv preprint), which is now published in eLife.
All analysis notebooks can be found in the analysis directory. The appropriate conda environment is specified at the top of each notebook.
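A minimal sketch for checking which conda environment each notebook expects before running it. It assumes that the environment note is in the first cell of each notebook and that the script is run from the repository root. Both are assumptions, not something this README states. The sketch uses only the Python standard library to parse the notebook JSON (nbformat v4). It then prints the first cell of every .ipynb file in the analysis directory.

```python
import json
from pathlib import Path


def first_cell_text(notebook_path):
    """Return the source text of the first cell of a Jupyter notebook (nbformat v4 JSON)."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    cells = nb.get("cells", [])
    if not cells:
        return ""
    source = cells[0].get("source", "")
    # nbformat stores a cell's source either as one string or as a list of lines
    return source if isinstance(source, str) else "".join(source)


def list_notebook_headers(analysis_dir="analysis"):
    """Map each notebook file name in analysis_dir to the text of its first cell.

    Assumption: the conda environment is noted in that first cell.
    """
    return {
        path.name: first_cell_text(path)
        for path in sorted(Path(analysis_dir).glob("*.ipynb"))
    }


if __name__ == "__main__":
    # Assumes the working directory is the repository root
    for name, header in list_notebook_headers().items():
        print(f"== {name}\n{header}\n")
```

If the environment is noted somewhere other than the first cell, first_cell_text is the only function that needs to change.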
Only a subset of these notebooks is needed to perform species assignment for a test dataset. To do so, follow the steps in 0_NNoVAE_assignment.ipynb. That notebook explains how to run the two relevant notebooks in the analysis directory and gives the parameter values to use. The tracking directory contains a subdirectory for each of the four independent datasets analysed in the preprint.
Sample information and processed haplotypes are in the data directory. ENA accessions for newly published samples are also provided there. Accessions for samples from Madagascar will be added soon.
The pipeline for in silico extractions from whole-genome sequencing (WGS) data is coming soon!