Releases: microsoft/PhyloD
PhyloD 2.0.0-alpha initial release
PhyloD v2.0.0-alpha
Welcome to the Microsoft Research PhyloD suite of virus adaptation tools.
A website that hosts these tools is at https://phylod.research.microsoft.com. Most users will find it much easier to use the website.
This package contains a suite of tools for analyzing within-host virus adaptation. These tools are primarily designed for identifying and analyzing HLA-mediated escape mutations, though the algorithms can be applied to adaptation from other sources as well. This release includes tools that identify HLA-amino acid associations, that train adaptation models of the form Pr(HIV sequence | HLA), and that evaluate adaptation on new sequences given trained models. These tools are all built around statistical models that account for the phylogenetic distribution of viral sequences.
For an overview of escape associations in HIV, see Carlson et al., Trends in Microbiology, 2015.
License
These binaries are licensed under the MIT License. Please read LICENSE.txt and THIRD-PARTY-NOTICES.txt before using these tools.
Contact
For questions and comments, please email phylod@microsoft.com, or reach out to Jonathan Carlson. Contact info is available at http://research.microsoft.com/~carlson
Release Notes
This is a major update to the code that was released to CodePlex in 2008. There are too many changes to enumerate, and the API has changed substantially.
This code corresponds to that used in our "Impact of pre-adapted HIV transmission" publication.
Citation
Please cite the papers underlying the individual tools.
- Finding escape associations:
Jonathan M. Carlson, Zabrina Brumme, Christine Rousseau, Chanson Brumme, Philippa Matthews, Carl Kadie, James Mullins, Bruce D. Walker, P. Richard Harrigan, Philip J.R. Goulder, David Heckerman
Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag
PLoS Computational Biology, 4(11):e1000225, November 2008.
- Adaptation and HLA similarity:
Jonathan M. Carlson, Victor Y. Du, Nico Pfeifer, Anju Bansal, Vincent Y.F. Tan, Karen Power, Chanson J. Brumme, Anat Kreimer, Charles E. DeZiel, Nicolo Fusi, Malinda Schaefer, Mark A. Brockman, Jill Gilmour, Matt A. Price, William Kilembe, Richard Haubrich, Mina John, Simon Mallal, Roger Shapiro, John Frater, P. Richard Harrigan, Thumbi Ndung’u, Susan Allen, David Heckerman, John Sidney, Todd M. Allen, Philip J.R. Goulder, Zabrina L. Brumme, Eric Hunter, Paul A. Goepfert
Impact of pre-adapted HIV transmission
Nature Medicine, doi:10.1038/nm.4100, May 2016.
- Transmission index:
Jonathan M. Carlson*#, Malinda Schaefer*, Daniela C. Monaco, Rebecca Batorsky, Daniel T. Claiborne, Jessica Prince, Martin J. Deymier, Zachary S. Ende, Nichole R. Klatt, Charles E. DeZiel, Tien-Ho Lin, Jian Peng, Aaron M. Seese, Roger Shapiro, John Frater, Thumbi Ndung'u, Jianming Tang, Paul Goepfert, Jill Gilmour, Matt A. Price, William Kilembe, David Heckerman, Philip J. R. Goulder, Todd M. Allen, Susan Allen and Eric Hunter#
Selection bias at the heterosexual HIV-1 transmission bottleneck
Science, 345(6193):1254031, July 2014.
- For a review of HIV adaptation and escape, see:
Jonathan M. Carlson, Anh Q Le, Aniqa Shahid and Zabrina L Brumme
HIV-1 adaptation to HLA: a window into virus–host immune interactions
Trends in Microbiology, 23(4):212-224, April 2015.
For a partial list of papers that use this software, refer to Jonathan's publication page.
INSTALLATION
These tools rely on .NET 4.0. Please ensure that you have installed the .NET runtime. While there are cross-platform versions of the .NET runtime, we have only tested the tools on Windows.
These tools rely on Sho, which is licensed separately. To run these tools, you must install Sho:
- Download Sho
- Install Sho to the default directory (this places the binaries in your Program Files folder)
- Run copyShoBinaries.bat from within the PhyloD directory. This will copy the relevant Sho binaries to the PhyloD directory.
EXAMPLES
Examples of the core programs are in the examples/ folder. These batch files can be modified to point to your data. Note that many of these will take some time to run. Each has a parameter that specifies the number of threads to use. If you have a multicore machine, set these parameters to the number of cores to improve performance. Really big jobs should be run on a cluster. Contact us if you want help with this.
- findAssociations.bat Runs the original PhyloD code for identifying HLA-amino acid associations. Modify the batch file to match the number of cores on your machine and to choose between the long and short example.
- computeHlaAdaptation.bat Computes adaptation scores given a pre-trained model.
- maximizeMultistateLikelihood.bat Trains a new adaptation model from the associations identified in findAssociations.bat.
- trainAdaptationModel.bat Does an end-to-end run of training a new adaptation model, then computes adaptation for the example sequences using the model you trained.
- computeTransmissionIndex.bat Computes the transmission index.
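The thread-count advice above can be sketched in a few lines of Python. This is only an illustration, not part of PhyloD: the helper name multithreaded_distribute_arg is hypothetical, and the argument form distribute:MultithreadedTasks(parallelOptions:N) is taken from the ComputeHlaAdaptation example in the documentation overview. Whether every example batch file exposes its thread count through that same argument is an assumption, so check the batch file you are editing.

```python
import os


def multithreaded_distribute_arg(cores=None):
    """Build a 'distribute' argument sized to the machine (hypothetical helper).

    os.cpu_count() reports logical cores and may return None, so fall
    back to a single thread when the count is unavailable.
    """
    n = cores if cores is not None else (os.cpu_count() or 1)
    return f"distribute:MultithreadedTasks(parallelOptions:{n})"


# Paste the printed text into the batch file in place of the existing value.
print(multithreaded_distribute_arg())
```

On machines with hyper-threading the logical core count can be higher than the physical core count, so it may be worth trying a smaller value as well.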
DOCUMENTATION OVERVIEW
This is going to be sparse. Email us if you have more questions...
All tools are executed using PhyloShell.exe. We've built the system using .NET reflection in such a way that PhyloShell.exe will scan all the DLLs for "executable" classes, which are those implementing IRunnable. The upshot is that you "execute" these classes using syntax reminiscent of function calls. This syntax is recursive, so that arguments can themselves be complex types.
The basic syntax is
PhyloShell.exe TOOLNAME(Key1:Value1[,Key2:Value2,...])
Keys are argument names (these can be omitted if the argument is required). Values specify the type, and can themselves be complex.
Here's an example that executes ComputeHlaAdaptation (see examples\computeHlaAdaptation.bat for the breakdown):
phyloshell.exe ComputeHlaAdaptation(model:subtypeC,PidToHlas:rousseau.hla.completed.txt,SequenceDB:SequenceDataBank(InputDir:.,SequenceFileFormatString:rousseau.{0}.fasta),mustHaveAllProteins:false,alignToReference:false,proteins:(gag,nef),OutAdaptationFormat:adaptationResults\Example{0}.txt,distribute:MultithreadedTasks(parallelOptions:4))
Typically these are best constructed using a bat script, as in the examples. Let's take this apart:
phyloshell.exe ComputeHlaAdaptation(^
model:subtypeC,^ # Specifies a simple string value for the model parameter
PidToHlas:rousseau.hla.completed.txt,^ # Specifies an HLA file to an argument that is parsed by an HlaFileParser
SequenceDB:SequenceDataBank(InputDir:.,SequenceFileFormatString:rousseau.{0}.fasta),^ # Takes something of type SequenceDataBank, which itself has arguments InputDir and SequenceFileFormatString
mustHaveAllProteins:false,^ # Simple boolean input
alignToReference:false,^ # Simple boolean input
proteins:(gag,nef),^ # A collection (list, HashSet, etc). Note the comma-delimited list inside parentheses. If we wanted only gag, we could use proteins:(gag) or proteins:gag
OutAdaptationFormat:adaptationResults\Example{0}.txt,^ # A simple string. This one happens to be a format string
distribute:MultithreadedTasks(parallelOptions:4)) # An argument of type IDistributor. The specific instance we're creating is MultithreadedTasks, which takes an argument parallelOptions. If we wanted to send the job to the cluster, we'd use OnHpc(...)
The best way to use these tools is going to be to modify the example batch files, but hopefully this gives some explanation for the syntax.
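If you would rather generate the argument string than edit it by hand, the recursive syntax above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of PhyloD: the Call class and render function are hypothetical names, and the sketch does no escaping, because this document does not describe how values containing commas, colons or parentheses should be written. It reproduces the ComputeHlaAdaptation example from above.

```python
class Call:
    """A tool or complex-typed value, e.g. SequenceDataBank(InputDir:.,...).

    Hypothetical helper: keyword arguments become Key:Value pairs, in the
    order given.
    """

    def __init__(self, name, **args):
        self.name = name
        self.args = args


def render(value):
    """Render a Python value in the call-like syntax described above."""
    if isinstance(value, Call):
        # Complex types recurse: each argument value is rendered in turn.
        body = ",".join(f"{key}:{render(val)}" for key, val in value.args.items())
        return f"{value.name}({body})"
    if isinstance(value, bool):
        # Booleans are written in lowercase in the example command.
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Collections are comma-delimited lists inside parentheses.
        return "(" + ",".join(render(item) for item in value) + ")"
    return str(value)


command = Call(
    "ComputeHlaAdaptation",
    model="subtypeC",
    PidToHlas="rousseau.hla.completed.txt",
    SequenceDB=Call(
        "SequenceDataBank",
        InputDir=".",
        SequenceFileFormatString="rousseau.{0}.fasta",
    ),
    mustHaveAllProteins=False,
    alignToReference=False,
    proteins=["gag", "nef"],
    OutAdaptationFormat=r"adaptationResults\Example{0}.txt",
    distribute=Call("MultithreadedTasks", parallelOptions=4),
)

argument = render(command)
print("phyloshell.exe " + argument)
```

Modelling complex values as nested Call objects mirrors the way the syntax nests, so adding or removing an argument is a one-line change rather than an exercise in balancing parentheses.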
Getting Help
There are two types of help: (1) What are the arguments for a type? (2) What subtypes can I specify?
To get the arguments, use "help" as an argument. For example:
>PhyloShell.exe ComputeHlaAdaptation(help)
Help for parsing type ComputeHlaAdaptation
USAGE: computeHlaadaptation([OPTIONS],PidToHlas)
Options are specified using command-delimited name:value pairs.
Use 'listsubtypes' as the value for complex options for a list of implementing types. Use 'help' to list arguments.
Required arguments can be named like optionals.
[No documentation]
REQUIRED:
PidToHlas: <HlaFileParser>
The HLAs. These must be at 4-digit resolution. Those at 2-digit resolution will be ignored. HLA files are of the form (tab delimited)
subjectID, A1, A2, B1, B2, C1, C2
The output from HlaCompletion is also accepted (includes two additional fields for completion data).
OPTIONS:
JobName: <String> default: ComputeHlaAdaptation
[No documentation]
Distribute: <IDistribute> default: Locally
[No documentation]
CopyLocal: <bool> default: False
[No documentation]
PidFile: <InputFile> default: null
File containing either a list of Pids, with no header. If null, will take all pids from the union of sequence files. Special case: if header is Pid [tab] TestGroup, then will parse as a list of Key-Value pairs of pid and CV test group number.
HlaForPositionalAdaptationScor...