Merge pull request #56 from danielparton/master
MolProbity validation feature addition
danielparton committed Sep 24, 2015
2 parents 2e6a120 + c2d5696 commit 3f59bfb
Showing 23 changed files with 722 additions and 142 deletions.
13 changes: 9 additions & 4 deletions docs/cli_docs.rst
@@ -25,9 +25,14 @@ The ``ensembler`` tool is operated via a number of subcommands, which should be
ensembler refine_explicit
ensembler package_models

Furthermore, the ``ensembler quickmodel`` subcommand allows the entire modeling
pipeline to be run in one go for a single target and a small number of
templates. Note that this command will not work with MPI.
The optional ``ensembler validate`` subcommand uses the
`MolProbity <http://molprobity.biochem.duke.edu/>`_ command-line tools to
conduct model quality validation based on criteria such as Ramachandran angles,
backbone distortion, and atom clashes.

The ``ensembler quickmodel`` subcommand allows the entire modeling pipeline to
be run in one go for a single target and a small number of templates. Note that
this command will not work with MPI.

To print helpstrings for each subcommand, pass the ``-h`` flag.

@@ -96,7 +101,7 @@ Additional Tools
Ensembler includes a ``tools`` submodule, which allows the user to conduct
various useful tasks which are not considered core pipeline functions. The
use-cases for many of these tools are quite specific, so they may not be
applicable to every project, and should also be used with caution.
applicable to every project, and should be used with caution.

Residue renumbering according to UniProt sequence coordinates
-------------------------------------------------------------
6 changes: 6 additions & 0 deletions docs/examples.rst
@@ -94,6 +94,12 @@ Determines the number of waters to add when solvating models with explicit water

Solvates models using the number of waters determined in the previous step, then performs a short molecular dynamics simulation (default: 100 ps), using ``OpenMM``. The final structure is written to the compressed PDB file: ``explicit-refined.pdb.gz``, as well as serialized versions of the OpenMM System, State and Integrator objects.

::

$ ensembler validate

(Optional; requires `MolProbity <http://molprobity.biochem.duke.edu/>`_ command-line tools) Validates model quality using MolProbity, which uses criteria such as Ramachandran angles, backbone distortions, and atom clashes. The ``package_models`` command can filter models based on validation score, using the ``--model_validation_score_cutoff`` and ``--model_validation_score_percentile`` flags.
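
The two filtering flags can be illustrated with a minimal sketch. It is not Ensembler's implementation: the function names, the linear-interpolation percentile, and the rule that the stricter threshold wins when both flags are given are all assumptions made for illustration. The only details taken from the documentation are that models are kept when their score is less than the cutoff, or less than the value at the given percentile.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def select_models(scores, score_cutoff=None, score_percentile=None):
    """Return the IDs of models whose score is below the threshold, sorted by ID.

    scores maps a (hypothetical) model ID to its validation score, where a
    lower score indicates a better model. If both arguments are given, the
    stricter (smaller) threshold is applied; this is an assumption.
    """
    threshold = math.inf
    if score_cutoff is not None:
        threshold = min(threshold, score_cutoff)
    if score_percentile is not None:
        threshold = min(threshold, percentile(list(scores.values()), score_percentile))
    return sorted(model for model, score in scores.items() if score < threshold)


# Illustrative scores only; these are not real validation results.
example_scores = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}
selected = select_models(example_scores, score_percentile=50)
```

With no cutoff and no percentile the threshold stays at infinity, so every model is kept, which matches the flags being optional.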

::

$ ensembler package_models --package_for FAH --nfahclones 3
30 changes: 29 additions & 1 deletion docs/installation.rst
@@ -34,7 +34,7 @@ Then, to install Ensembler with Conda, use the following commands ::
$ conda config --add channels http://conda.anaconda.org/salilab
$ conda install ensembler

Conda will automatically install all dependencies except for the optional dependency `Rosetta <https://www.rosettacommons.org/software>`_. This requires a license (free for academic non-profit use), and will have to be installed according to the instructions for that package.
Conda will automatically install all dependencies except for the optional dependencies `Rosetta <https://www.rosettacommons.org/software>`_ and `MolProbity <http://molprobity.biochem.duke.edu/>`_. These require licenses (free for academic non-profit use), and will have to be installed according to the instructions for those packages. Some limited installation instructions are included below, but these are not guaranteed to be up to date.

Install from Source
-------------------
@@ -113,6 +113,10 @@ Optional packages:
Some functionality, including the ``quickmodel`` and ``inspect``
functions, requires pandas.

`MolProbity <http://molprobity.biochem.duke.edu/>`_
For model validation. The ``package_models`` function can use this
data to filter models by validation score.

Manually Installing the Dependencies
------------------------------------

@@ -159,3 +163,27 @@ databases such as UniProt, or are excluded from the unit tests due to being
slow. To run them: ::

$ nosetests ensembler -a non_conda_dependencies -a network -a slow

Installation of Dependencies Unavailable Through Conda
======================================================

(Note: only limited instructions are included here, and these are not guaranteed to be up to date. If you encounter problems, please consult the relevant support or installation instructions for that software dependency.)

MolProbity
----------

Download the `MolProbity 4.2 release source <https://github.com/rlabduke/MolProbity/archive/molprobity_4.2.zip>`_ from the GitHub repo.

Extract the zip file, enter the created directory, and run the following command: ::

$ ./configure.sh

This was all that was required when tested on a MacBook running OS X 10.8.

On a Linux cluster, it was first necessary to edit the file ``configure.sh``, uncommenting the following line and commenting out the ``make`` command: ::

$ ./binlibtbx.scons -j 1

This forces the build to use only a single core. The build ran rather slowly, but using more cores resulted in build failure, likely due to memory issues. After running ``./configure.sh``, it was then also necessary to run ``./setup.sh``.

Binaries can be found in the ``[MolProbity source dir]/cmdline`` directory.
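
Before running validation, it can help to confirm that a MolProbity command-line tool is actually reachable. The sketch below is a hypothetical helper, not part of Ensembler: the function name and the ``cmdline_dir`` parameter are assumptions, and the tool name is passed in by the caller rather than assumed, since the executables available depend on the MolProbity build.

```python
import os
import shutil


def find_molprobity_tool(tool_name, cmdline_dir=None):
    """Return the full path to an executable, or None if it cannot be found.

    cmdline_dir, if given, is searched before the directories on PATH. It is
    meant to point at the "[MolProbity source dir]/cmdline" directory.
    """
    search_path = os.environ.get("PATH", "")
    if cmdline_dir:
        search_path = os.pathsep.join([cmdline_dir, search_path])
    # shutil.which only returns files that exist and are executable.
    return shutil.which(tool_name, path=search_path)
```

A ``None`` result suggests that either the build did not complete or the ``cmdline`` directory has not been added to ``PATH``.
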
2 changes: 2 additions & 0 deletions ensembler/cli_commands/__init__.py
@@ -10,6 +10,7 @@
'refine_implicit',
'solvate',
'refine_explicit',
'validate',
'package_models',
'quickmodel',
'renumber_residues',
@@ -27,6 +28,7 @@
from . import refine_implicit
from . import solvate
from . import refine_explicit
from . import validate
from . import package_models
from . import quickmodel
from . import renumber_residues
10 changes: 5 additions & 5 deletions ensembler/cli_commands/build_models.py
@@ -16,7 +16,7 @@
model for a protein kinase domain target""",

"""\
--template_seqid_cutoff <cutoff> Select only templates with sequence identity (percentage)
--model_seqid_cutoff <cutoff> Select only templates with sequence identity (percentage)
greater than the given cutoff.""",
]

@@ -62,10 +62,10 @@ def dispatch(args):
else:
templates = False

if args['--template_seqid_cutoff']:
template_seqid_cutoff = float(args['--template_seqid_cutoff'])
if args['--model_seqid_cutoff']:
model_seqid_cutoff = float(args['--model_seqid_cutoff'])
else:
template_seqid_cutoff = False
model_seqid_cutoff = False

if args['--verbose']:
loglevel = 'debug'
@@ -75,7 +75,7 @@ def dispatch(args):
ensembler.modeling.build_models(
process_only_these_targets=targets,
process_only_these_templates=templates,
template_seqid_cutoff=template_seqid_cutoff,
model_seqid_cutoff=model_seqid_cutoff,
write_modeller_restraints_file=args['--write_modeller_restraints_file'],
loglevel=loglevel
)
6 changes: 5 additions & 1 deletion ensembler/cli_commands/cluster.py
@@ -52,4 +52,8 @@ def dispatch(args):
else:
loglevel = 'info'

ensembler.modeling.cluster_models(process_only_these_targets=targets, loglevel=loglevel, **dispatch_args)
ensembler.modeling.cluster_models(
process_only_these_targets=targets,
loglevel=loglevel,
**dispatch_args
)
28 changes: 16 additions & 12 deletions ensembler/cli_commands/general.py
@@ -16,33 +16,37 @@
ensembler align [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--templates <templates>] [--templatesfile <templatesfile>] [--substitution_matrix <matrix>]
[-v | --verbose]
ensembler build_models [-h | --help] [--targets <target>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--template_seqid_cutoff <cutoff>]
ensembler build_models [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--model_seqid_cutoff <cutoff>]
[--write_modeller_restraints_file] [-v | --verbose]
ensembler cluster [-h | --help] [--targets <target>] [--targetsfile <targetsfile>]
ensembler cluster [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--cutoff <cutoff>] [-v | --verbose]
ensembler refine_implicit [-h | --help] [--targets <target>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--template_seqid_cutoff <cutoff>]
ensembler refine_implicit [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--model_seqid_cutoff <cutoff>]
[--gpupn <gpupn>] [--openmm_platform <platform>] [--simlength <simlength>]
[--retry_failed_runs] [--ff <ffname>] [--water_model <modelname>] [--api_params <params>]
[-v | --verbose]
ensembler solvate [-h | --help] [--targets <target>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--template_seqid_cutoff <cutoff>]
ensembler solvate [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--model_seqid_cutoff <cutoff>]
[--padding <padding>] [--select_nwaters_at_percentile <value>] [--ff <ffname>]
[--water_model <modelname>] [-v | --verbose]
ensembler refine_explicit [-h | --help] [--targets <target>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--template_seqid_cutoff <cutoff>]
ensembler refine_explicit [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--templates <template>] [--templatesfile <templatesfile>] [--model_seqid_cutoff <cutoff>]
[--gpupn <gpupn>] [--openmm_platform <platform>] [--simlength <simlength>]
[--retry_failed_runs] [--write_solvated_model] [--ff <ffname>] [--water_model <modelname>]
[--api_params <params>] [-v | --verbose]
ensembler package_models [-h | --help] [--package_for <choice>] [--targets <target>]
ensembler validate [-h | --help] [--targets <targets>] [--targetsfile <targetsfile>]
[--method <method>] [--modeling_stage <stage>] [-v | --verbose]
ensembler package_models [-h | --help] [--package_for <choice>] [--targets <targets>]
[--targetsfile <targetsfile>] [--templates <template>] [--templatesfile <templatesfile>]
[--template_seqid_cutoff <cutoff>] [--nfahclones <n>] [--compressruns] [-v | --verbose]
[--model_seqid_cutoff <cutoff>] [--model_validation_score_cutoff <cutoff>]
[--model_validation_score_percentile <percentile>] [--nfahclones <n>] [--compressruns]
[-v | --verbose]
ensembler testrun_pipeline [-h | --help]
ensembler quickmodel [-h | --help] [--targetid <id>] [--templateids <ids>]
[--target_uniprot_entry_name <entry_name>] [--uniprot_domain_regex <regex>]
[--template_pdbids <pdbids>] [--template_chainids <chainids>]
[--template_uniprot_query <query>] [--template_seqid_cutoff <cutoff>] [--no-loopmodel]
[--template_uniprot_query <query>] [--model_seqid_cutoff <cutoff>] [--no-loopmodel]
[--package_for_fah] [--nfahclones <nfahclones>] [--structure_dirs <structure_dirs>]
ensembler renumber_residues [-h | --help] [--target <targetid>] [-v | --verbose]
68 changes: 44 additions & 24 deletions ensembler/cli_commands/package_models.py
@@ -12,43 +12,51 @@

helpstring_unique_options = [
"""\
--package_for <choice> Specify which packaging method to use (required).
- transfer: compress results into a single .tgz file
- FAH: set up the input files and directory structure
necessary to start a Folding@Home project.""",
--package_for <choice> Specify which packaging method to use (required).
- transfer: compress results into a single .tgz file
- FAH: set up the input files and directory structure
necessary to start a Folding@Home project.""",

"""\
--nfahclones <n> If packaging for Folding@Home, select the number of clones
to use for each model [default: 1].""",
--nfahclones <n> If packaging for Folding@Home, select the number of clones
to use for each model [default: 1].""",

"""\
--compressruns If packaging for Folding@Home, choose whether to compress
each RUN into a .tgz file.""",
--compressruns If packaging for Folding@Home, choose whether to compress
each RUN into a .tgz file. [default: False]""",

"""\
--model_validation_score_cutoff <cutoff> Select only models with MolProbity validation score
less than the given cutoff.""",

"""\
--model_validation_score_percentile <percentile> Select only models with MolProbity validation score
less than the value at the given percentile.""",
]

helpstring_nonunique_options = [
"""\
--targetsfile <targetsfile> File containing a list of target IDs to work on (newline-separated).
Comment targets out with "#".""",
--targetsfile <targetsfile> File containing a list of target IDs to work on (newline-separated).
Comment targets out with "#".""",

"""\
--targets <target> Define one or more target IDs to work on (comma-separated), e.g.
"--targets ABL1_HUMAN_D0,SRC_HUMAN_D0" (default: all targets)""",
--targets <target> Define one or more target IDs to work on (comma-separated), e.g.
"--targets ABL1_HUMAN_D0,SRC_HUMAN_D0" (default: all targets)""",

"""\
--templates <template> Define one or more template IDs to work on (comma-separated), e.g.
"--templates ABL1_HUMAN_D0_1OPL_A" (default: all templates)""",
--templates <template> Define one or more template IDs to work on (comma-separated), e.g.
"--templates ABL1_HUMAN_D0_1OPL_A" (default: all templates)""",

"""\
--templatesfile <templatesfile> File containing a list of template IDs to work on (newline-separated).
Comment targets out with "#".""",
--templatesfile <templatesfile> File containing a list of template IDs to work on (newline-separated).
Comment targets out with "#".""",

"""\
--template_seqid_cutoff <cutoff> Select only templates with sequence identity (percentage)
greater than the given cutoff.""",
--model_seqid_cutoff <cutoff> Select only models with sequence identity (percentage)
greater than the given cutoff.""",

"""\
-v --verbose """,
-v --verbose """,
]

helpstring = '\n\n'.join([helpstring_header, '\n\n'.join(helpstring_unique_options), '\n\n'.join(helpstring_nonunique_options)])
@@ -77,10 +85,20 @@ def dispatch(args):
else:
templates = False

if args['--template_seqid_cutoff']:
template_seqid_cutoff = float(args['--template_seqid_cutoff'])
if args['--model_seqid_cutoff']:
model_seqid_cutoff = float(args['--model_seqid_cutoff'])
else:
template_seqid_cutoff = False
model_seqid_cutoff = False

if args['--model_validation_score_cutoff']:
model_validation_score_cutoff = float(args['--model_validation_score_cutoff'])
else:
model_validation_score_cutoff = None

if args['--model_validation_score_percentile']:
model_validation_score_percentile = int(args['--model_validation_score_percentile'])
else:
model_validation_score_percentile = None

if args['--nfahclones']:
n_fah_clones = int(args['--nfahclones'])
@@ -107,8 +125,10 @@ def dispatch(args):
ensembler.packaging.package_for_fah(
process_only_these_targets=targets,
process_only_these_templates=templates,
template_seqid_cutoff=template_seqid_cutoff,
model_seqid_cutoff=model_seqid_cutoff,
model_validation_score_cutoff=model_validation_score_cutoff,
model_validation_score_percentile=model_validation_score_percentile,
nclones=n_fah_clones,
archive=archive,
loglevel=loglevel,
)
)
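
The ``dispatch`` function above repeats the same if/else pattern for each optional flag. As a sketch only, and not something this commit does, the pattern could be condensed into a small helper. The name ``parse_option`` and the sample ``args`` dictionary are hypothetical; the truthiness check and the per-flag defaults (``False`` for the sequence identity cutoff, ``None`` for the validation options) mirror the code in the diff.

```python
def parse_option(args, flag, cast, default=None):
    """Return cast(args[flag]) if the flag was given, otherwise default.

    Mirrors the `if args[flag]: ... else: ...` blocks in dispatch(): a missing,
    None, or empty value falls back to the default.
    """
    value = args.get(flag)
    if value:
        return cast(value)
    return default


# Hypothetical parsed command-line arguments, for illustration only.
args = {
    '--model_seqid_cutoff': '80',
    '--model_validation_score_cutoff': None,
    '--model_validation_score_percentile': '50',
}

model_seqid_cutoff = parse_option(args, '--model_seqid_cutoff', float, default=False)
model_validation_score_cutoff = parse_option(args, '--model_validation_score_cutoff', float)
model_validation_score_percentile = parse_option(args, '--model_validation_score_percentile', int)
```

Keeping the default as an explicit argument preserves the distinction between the ``False`` and ``None`` defaults used by different flags in the existing code.
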
10 changes: 5 additions & 5 deletions ensembler/cli_commands/quickmodel.py
@@ -60,7 +60,7 @@
/Users/partond/tmp/kinome-MSMSeeder/structures/sifts\"""",

"""\
--template_seqid_cutoff <cutoff> e.g. "80\"""",
--model_seqid_cutoff <cutoff> e.g. "80\"""",
]

helpstring = '\n\n'.join([helpstring_header, '\n\n'.join(helpstring_unique_options), '\n\n'.join(helpstring_nonunique_options)])
@@ -88,10 +88,10 @@ def dispatch(args):
else:
chainids_dict = None

if args['--template_seqid_cutoff']:
template_seqid_cutoff = float(args['--template_seqid_cutoff'])
if args['--model_seqid_cutoff']:
model_seqid_cutoff = float(args['--model_seqid_cutoff'])
else:
template_seqid_cutoff = None
model_seqid_cutoff = None

if args['--nfahclones']:
nfahclones = int(args['--nfahclones'])
@@ -103,4 +103,4 @@ def dispatch(args):
else:
structure_paths = None

QuickModel(targetid=args['--targetid'], templateids=templateids, target_uniprot_entry_name=args['--target_uniprot_entry_name'], uniprot_domain_regex=args['--uniprot_domain_regex'], pdbids=pdbids, chainids=chainids_dict, template_uniprot_query=args['--template_uniprot_query'], template_seqid_cutoff=template_seqid_cutoff, loopmodel=not args['--no-loopmodel'], package_for_fah=args['--package_for_fah'], nfahclones=nfahclones, structure_dirs=structure_paths)
QuickModel(targetid=args['--targetid'], templateids=templateids, target_uniprot_entry_name=args['--target_uniprot_entry_name'], uniprot_domain_regex=args['--uniprot_domain_regex'], pdbids=pdbids, chainids=chainids_dict, template_uniprot_query=args['--template_uniprot_query'], model_seqid_cutoff=model_seqid_cutoff, loopmodel=not args['--no-loopmodel'], package_for_fah=args['--package_for_fah'], nfahclones=nfahclones, structure_dirs=structure_paths)
10 changes: 5 additions & 5 deletions ensembler/cli_commands/refine_explicit.py
@@ -61,7 +61,7 @@
See OpenMM documentation for other water model options""",

"""\
--template_seqid_cutoff <cutoff> Select only templates with sequence identity (percentage)
--model_seqid_cutoff <cutoff> Select only templates with sequence identity (percentage)
greater than the given cutoff.""",

"""\
@@ -93,10 +93,10 @@ def dispatch(args):
else:
templates = False

if args['--template_seqid_cutoff']:
template_seqid_cutoff = float(args['--template_seqid_cutoff'])
if args['--model_seqid_cutoff']:
model_seqid_cutoff = float(args['--model_seqid_cutoff'])
else:
template_seqid_cutoff = False
model_seqid_cutoff = False

if args['--gpupn']:
gpupn = int(args['--gpupn'])
@@ -121,7 +121,7 @@ def dispatch(args):
sim_length=sim_length,
process_only_these_targets=targets,
process_only_these_templates=templates,
template_seqid_cutoff=template_seqid_cutoff,
model_seqid_cutoff=model_seqid_cutoff,
retry_failed_runs=args['--retry_failed_runs'],
write_solvated_model=args['--write_solvated_model'],
ff=args['--ff'],
