Releases: microsoft/PhyloD

PhyloD 2.0.0-alpha initial release

16 May 00:45
Pre-release

PhyloD v2.0.0-alpha

Welcome to the Microsoft Research PhyloD suite of virus adaptation tools.

A website that hosts these tools is at https://phylod.research.microsoft.com. Most users will find it much easier to use the website.

This package contains a suite of tools for analyzing within-host virus adaptation. These tools are primarily designed for identifying and analyzing HLA-mediated escape mutations, though the algorithms can be applied to adaptation from other sources as well. This release includes tools that identify HLA-amino acid associations, that train adaptation models of the form Pr(HIV sequence | HLA), and that evaluate adaptation on new sequences given trained models. These tools are all built around statistical models that account for the phylogenetic distribution of viral sequences.

For an overview of escape associations in HIV, see Carlson et al., Trends in Microbiology, 2015.

License

These binaries are licensed under the MIT License. Please read LICENSE.txt and THIRD-PARTY-NOTICES.txt before using these tools.

Contact

For questions and comments, please email phylod@microsoft.com, or reach out to Jonathan Carlson. Contact info is available at http://research.microsoft.com/~carlson

Release Notes

This is a major update to the code that was released to CodePlex in 2008. There are too many changes to enumerate, and the API has changed substantially.

This code corresponds to that used in our "Impact of pre-adapted HIV transmission" publication.

Citation

Please cite the papers underlying the individual tools.

  • Finding escape associations:
    Jonathan M. Carlson, Zabrina Brumme, Christine Rousseau, Chanson Brumme, Philippa Matthews, Carl Kadie, James Mullins, Bruce D. Walker, P. Richard Harrigan, Philip J.R. Goulder, David Heckerman
    Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag
    PLoS Computational Biology, 4(11):e1000225, November 2008.
  • Adaptation and HLA similarity:
    Jonathan M. Carlson, Victor Y. Du, Nico Pfeifer, Anju Bansal, Vincent Y.F. Tan, Karen Power, Chanson J. Brumme, Anat Kreimer, Charles E. DeZiel, Nicolo Fusi, Malinda Schaefer, Mark A. Brockman, Jill Gilmour, Matt A. Price, William Kilembe, Richard Haubrich, Mina John, Simon Mallal, Roger Shapiro, John Frater, P. Richard Harrigan, Thumbi Ndung’u, Susan Allen, David Heckerman, John Sidney, Todd M. Allen, Philip J.R. Goulder, Zabrina L. Brumme, Eric Hunter, Paul A. Goepfert
    Impact of pre-adapted HIV transmission
    Nature Medicine, doi:10.1038/nm.4100, May 2016.
  • Transmission Index
    Jonathan M. Carlson*#, Malinda Schaefer*, Daniela C. Monaco, Rebecca Batorsky, Daniel T. Claiborne, Jessica Prince, Martin J. Deymier, Zachary S. Ende, Nichole R. Klatt, Charles E. DeZiel, Tien-Ho Lin, Jian Peng, Aaron M. Seese, Roger Shapiro, John Frater, Thumbi Ndung'u, Jianming Tang, Paul Goepfert, Jill Gilmour, Matt A. Price, William Kilembe, David Heckerman, Philip J. R. Goulder, Todd M. Allen, Susan Allen and Eric Hunter#
    Selection bias at the heterosexual HIV-1 transmission bottleneck
    Science, 345(6193):1254031, July 2014.
  • For a review of HIV adaptation and escape, see
    Jonathan M. Carlson, Anh Q Le, Aniqa Shahid and Zabrina L Brumme
    HIV-1 adaptation to HLA: a window into virus–host immune interactions
    Trends in Microbiology, 23(4):212-224, April 2015.

For a partial list of papers that use this software, refer to Jonathan's publication page

INSTALLATION

These tools rely on .NET 4.0. Please ensure you have installed the .NET runtime. While there
are cross-platform versions of the .NET runtime, we have only tested the tools on Windows.

These tools rely on Sho, which is licensed separately. To run these tools, you must install Sho:

  1. Download Sho
  2. Install Sho to the default directory (this places the binaries in your Program Files folder)
  3. Run copyShoBinaries.bat from within the PhyloD directory. This will copy the relevant Sho binaries to the PhyloD directory.

EXAMPLES

Examples of the core programs are in the examples/ folder. These batch files can be modified to point to your data. Note that many of these will take some time to run. Each has a parameter that specifies the number of threads to use. If you have a multicore machine, set this parameter to the number of cores to improve performance. Really big jobs should be run on a cluster. Contact us if you want help with this.

  • findAssociations.bat Runs the original PhyloD code for identifying HLA-amino acid associations. Modify the bat file to optimize for the number of cores on your machine and to choose between the long and short example.
  • computeHlaAdaptation.bat Computes adaptation scores given a pre-trained model.
  • maximizeMultistateLikelihood.bat Trains a new adaptation model from the associations identified in findAssociations.bat.
  • trainAdaptationModel.bat Does an end-to-end run of training a new adaptation model, then computing adaptation for the example sequences using the model you trained.
  • computeTransmissionIndex.bat Computes the transmission index.
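
As a rough aid for choosing the thread count, here is a minimal Python sketch, assuming Python is available on the machine. It reads the logical core count and formats a distribute argument in the form used by the ComputeHlaAdaptation example in the documentation overview. The helper name `distribute_argument` is hypothetical, and other example scripts may name their thread parameter differently.

```python
import os


def distribute_argument(threads=None):
    """Format a distribute argument for a multithreaded local run."""
    if threads is None:
        # os.cpu_count() reports logical cores and can return None.
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Argument shape copied from examples\computeHlaAdaptation.bat.
    return f"distribute:MultithreadedTasks(parallelOptions:{threads})"


print(distribute_argument())
```

Paste the printed value into the bat file in place of the existing distribute argument.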

DOCUMENTATION OVERVIEW

This is going to be sparse. Email us if you have more questions...

All tools are executed using PhyloShell.exe. We've built the system using .NET reflection in such a way that PhyloShell.exe will scan all the DLLs for "executable" classes, which are those implementing IRunnable. The upshot is that you "execute" these classes using syntax reminiscent of function calls. This syntax is recursive, so that arguments can themselves be complex types.

The basic syntax is

PhyloShell.exe TOOLNAME(Key1:Value1[,Key2:Value2,...])

Keys are argument names (these can be omitted if the argument is required). Values specify the type, and can themselves be complex.
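
To make the recursive syntax concrete, here is a minimal Python sketch that renders nested values into a TOOLNAME(Key:Value,...) string. It is not part of PhyloD: the `Call` class and `render` function are hypothetical helpers, and they only build the text. They do not check that a tool or argument exists.

```python
class Call:
    """A tool or complex-typed value, rendered as NAME(Key1:Value1,...)."""

    def __init__(self, name, **arguments):
        self.name = name
        self.arguments = arguments  # keyword order is preserved

    def render(self):
        pairs = ",".join(f"{key}:{render(value)}"
                         for key, value in self.arguments.items())
        return f"{self.name}({pairs})"


def render(value):
    """Render one value: booleans, collections, nested calls, or plain text."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Collections are comma-delimited inside parentheses.
        return "(" + ",".join(render(item) for item in value) + ")"
    if isinstance(value, Call):
        return value.render()
    return str(value)


command = Call(
    "ComputeHlaAdaptation",
    model="subtypeC",
    PidToHlas="rousseau.hla.completed.txt",
    proteins=["gag", "nef"],
    mustHaveAllProteins=False,
    distribute=Call("MultithreadedTasks", parallelOptions=4),
)
print("PhyloShell.exe " + command.render())
```

Because `Call` values can appear inside other `Call` values, the nesting in the rendered string mirrors the nesting of the Python objects.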

Here's an example that runs ComputeHlaAdaptation (see examples\computeHlaAdaptation.bat for the breakdown):

phyloshell.exe ComputeHlaAdaptation(model:subtypeC,PidToHlas:rousseau.hla.completed.txt,SequenceDB:SequenceDataBank(InputDir:.,SequenceFileFormatString:rousseau.{0}.fasta),mustHaveAllProteins:false,alignToReference:false,proteins:(gag,nef),OutAdaptationFormat:adaptationResults\Example{0}.txt,distribute:MultithreadedTasks(parallelOptions:4))

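Typically these are best constructed using a bat script, as in the examples. Let's take this apart (the # annotations are explanatory only and must not appear in a real bat file, where ^ has to be the last character on each continued line):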

phyloshell.exe ComputeHlaAdaptation(^
model:subtypeC,^                            # Passes a simple string value to the model parameter
PidToHlas:rousseau.hla.completed.txt,^      # Specifies an HLA file to an argument that is parsed by an HlaFileParser
SequenceDB:SequenceDataBank(InputDir:.,SequenceFileFormatString:rousseau.{0}.fasta),^          # Takes something of type SequenceDataBank, which itself has arguments InputDir and SequenceFileFormatString
mustHaveAllProteins:false,^                 # Simple boolean input
alignToReference:false,^                    # Simple boolean input
proteins:(gag,nef),^                        # A collection (list, HashSet, etc). Note the comma-delimited list inside parentheses. If we wanted only gag, we could use proteins:(gag) or proteins:gag
OutAdaptationFormat:adaptationResults\Example{0}.txt,^     # A simple string. This one happens to be a format string
distribute:MultithreadedTasks(parallelOptions:4))    # An argument of type IDistributor. The specific instance we're creating is MultithreadedTasks, which takes an argument parallelOptions. If we wanted to send to the cluster, we'd use OnHpc(...)

The best way to use these tools is going to be to modify the example batch files, but hopefully this gives some explanation for the syntax.
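
If you generate batch files rather than editing them by hand, the caret-continued layout can be produced mechanically. A minimal Python sketch, assuming the top-level arguments are already written as Key:Value strings. The function name `to_batch_command` is hypothetical and not part of PhyloD.

```python
def to_batch_command(tool, arguments, exe="phyloshell.exe"):
    """Lay out one PhyloShell call across caret-continued batch lines."""
    if not arguments:
        raise ValueError("expected at least one argument")
    lines = [f"{exe} {tool}(^"]
    # Every argument except the last ends with a comma and a caret.
    lines.extend(f"{argument},^" for argument in arguments[:-1])
    # The last argument closes the call and has no caret.
    lines.append(f"{arguments[-1]})")
    # Lines are left unindented, matching the example batch layout.
    return "\n".join(lines)


print(to_batch_command("ComputeHlaAdaptation", [
    "model:subtypeC",
    "PidToHlas:rousseau.hla.completed.txt",
    "proteins:(gag,nef)",
    "distribute:MultithreadedTasks(parallelOptions:4)",
]))
```

The continuation lines are deliberately not indented, since leading spaces would end up inside the joined command line.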

Getting Help

There are two types of help: (1) What are the arguments for a type? (2) What subtypes can I specify?

To get the arguments, use "help" as an argument. For example:

>PhyloShell.exe ComputeHlaAdaptation(help)

Help for parsing type ComputeHlaAdaptation

USAGE: computeHlaadaptation([OPTIONS],PidToHlas)

Options are specified using command-delimited name:value pairs.
Use 'listsubtypes' as the value for complex options for a list of implementing types. Use 'help' to list arguments.
Required arguments can be named like optionals.

[No documentation]

REQUIRED:
    PidToHlas: <HlaFileParser>
      The HLAs. These must be at 4-digit resolution. Those at 2-digit resolution will be ignored. HLA files are of the form (tab delimited)

      subjectID, A1, A2, B1, B2, C1, C2

      The output from HlaCompletion is also accepted (includes two additional fields for completion data).


OPTIONS:
     JobName: <String> default: ComputeHlaAdaptation
           [No documentation]

     Distribute: <IDistribute> default: Locally
           [No documentation]

     CopyLocal: <bool> default: False
           [No documentation]

     PidFile: <InputFile> default: null
           File containing either a list of Pids, with no header. If null, will take all pids from the union of sequence files.     Special case: if header is Pid [tab] TestGroup, then will parse as a list of  Key-Value pairs of pid and CV test group number.

     HlaForPositionalAdaptationScor...
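
The PidToHlas help text above describes a tab-delimited file with the columns subjectID, A1, A2, B1, B2, C1, C2, and says that alleles at 2-digit resolution are ignored. Here is a minimal Python sketch for spotting such alleles before a run. It assumes resolution can be judged by counting the digits in an allele name (for example, A*02:01 counts as 4-digit and A*03 as 2-digit). The allele notation PhyloShell actually expects is not documented here, so treat this as a heuristic. The function name and the sample row are made up for illustration.

```python
import csv
import io

COLUMNS = ["subjectID", "A1", "A2", "B1", "B2", "C1", "C2"]


def low_resolution_alleles(handle):
    """Yield (subject_id, column, allele) for alleles with fewer than 4 digits."""
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: a header row, if present, starts with "subjectID".
        if not row or row[0] == COLUMNS[0]:
            continue
        subject_id = row[0]
        # Only the six allele columns are checked; extra fields are skipped.
        for column, allele in zip(COLUMNS[1:], row[1:7]):
            digits = sum(character.isdigit() for character in allele)
            if digits < 4:
                yield subject_id, column, allele


# Made-up sample row, not taken from the PhyloD example data.
sample = "P1\tA*02:01\tA*03\tB*07:02\tB*57:01\tC*07:02\tC*06:02\n"
for subject_id, column, allele in low_resolution_alleles(io.StringIO(sample)):
    print(f"{subject_id}: {column}={allele} is below 4-digit resolution")
```

An empty allele field is also reported, because it contains no digits.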