
ScoreFlow

Cédric Bouysset edited this page Sep 11, 2017 · 13 revisions

Description

ScoreFlow is a tool designed to perform rescoring :

  • using empirical scoring functions as well as end-point free energy approaches
  • in a reasonable amount of time
  • with a proper organization of the results.

It includes the following scoring functions : PLANTS, AutoDock Vina, and MM(PB,GB)SA.

Usage

Input files

ScoreFlow requires 3 things from the user :

  • A directory that will contain all the files and folders mentioned below : the run folder.
  • A directory containing the docking results : this folder will either be "docking" or "input_files/lig", depending on the mode and scoring function chosen (see below).
  • A configuration file : ScoreFlow.config

Configuration file : ScoreFlow.config

Mode

For now, 3 different ways to perform the rescoring are available :

Name | Performs rescoring on
ALL  | All docking poses from DockFlow
BEST | A selection of docking poses from DockFlow
PDB  | Crystal structures from PDB files

For the BEST mode, the user must run LigFlow.
For the ALL or BEST mode, the receptor must be provided. The file type (PDB or MOL2) depends on the scoring function used : PDB for MM(PB,GB)SA or Vina, MOL2 for PLANTS or Vina.
For the PDB mode, a directory containing the complexes in PDB files has to be provided.
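
The receptor format rule above can be checked before launching a run. The following Python sketch is illustrative only and is not part of ScoreFlow: the helper name `receptor_format_ok` and the family labels are hypothetical, and it assumes that the file extension reflects the actual file format.

```python
from pathlib import Path

# Receptor formats accepted by each scoring function family, as stated above.
# The keys are hypothetical labels, not ScoreFlow option values.
ACCEPTED_RECEPTOR_FORMATS = {
    "PLANTS": {".mol2"},
    "VINA": {".mol2", ".pdb"},
    "MMPBSA": {".pdb"},  # stands for MM(PB,GB)SA
}


def receptor_format_ok(family: str, receptor_path: str) -> bool:
    """Return True if the receptor file extension suits the scoring family."""
    suffix = Path(receptor_path).suffix.lower()
    return suffix in ACCEPTED_RECEPTOR_FORMATS[family]
```

For example, `receptor_format_ok("PLANTS", "receptor.pdb")` returns `False`, because PLANTS needs a MOL2 receptor.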

Scoring Function

7 scoring functions have been implemented in ScoreFlow so far. Depending on the function chosen, additional information may be required from the user.

  • PLANTS :

    Name    | Description
    plp95   | Piecewise Linear Potential (PLP) from Gehlhaar DK et al
    plp     | PLANTS version of the PLP
    chemplp | PLANTS version of the PLP implementing some of GOLD's terms

    PLANTS scoring functions require a MOL2 file for the receptor.
    They also require SYBYL atom types. LigFlow can manage the conversion.

  • AutoDock Vina :

    Name | Description
    vina | AutoDock Vina's scoring function

    Vina accepts either MOL2 or PDB files for the receptor.
    The conversion to PDBQT files is done automatically with AutoDock's MGLTools.

  • MM(PB,GB)SA :

    Name | Description     | Radii for Gpol | SASA for Gnp | Surface Tension (γ) | Surface Offset (b)
    PB3  | MMPBSA, model 3 | Parse          | Molsurf      | 0.00542             | 0.920000
    GB5  | MMGBSA, model 5 | mbondi2        | LCPO         | 0.00500             | 0.000000
    GB8  | MMGBSA, model 8 | mbondi3        | LCPO         | atom dependent      | 0.195141

    MM(PB,GB)SA scoring functions require a PDB file for the receptor. It will be prepared with the ff14SB force field from Amber.
    To run these calculations, the docking poses should have GAFF atom types. LigFlow can manage the conversion with the following options : -at gaff --amber.

    The MM(PB,GB)SA calculations can be run in 2 different fashions :

    • based on a single snapshot extracted directly from the docking pose : 1F. A conjugate gradient minimization in implicit solvent (Generalized Born model) can be performed with minab prior to the MM(PB,GB)SA calculations.
    • based on a short MD simulation (hundreds of picoseconds of production) in an implicit solvent (Generalized Born model) : MD.

    The user must provide masks of atoms to generate files for the complex. Since Amber will ignore chains and renumber the residues accordingly, this step should be done with extra care. The ligand will always be named MOL, and will be the last residue of the complex.

    The salt concentration is 0.150 mol·L⁻¹.
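
The "Surface Tension (γ)" and "Surface Offset (b)" columns of the MM(PB,GB)SA table parameterize the non-polar solvation term, which in Amber-style calculations is conventionally a linear function of the solvent-accessible surface area : Gnp = γ × SASA + b. The Python sketch below only illustrates this arithmetic and is not ScoreFlow code. It assumes γ is in kcal·mol⁻¹·Å⁻² and b in kcal·mol⁻¹, the function name `nonpolar_energy` is hypothetical, and GB8 is left out because its surface tension is atom dependent.

```python
# Constant-gamma parameters (gamma, b) copied from the table above.
# GB8 is omitted: its surface tension depends on the atom.
NONPOLAR_PARAMS = {
    "PB3": (0.00542, 0.92),
    "GB5": (0.00500, 0.0),
}


def nonpolar_energy(model: str, sasa: float) -> float:
    """Return Gnp = gamma * SASA + b for a constant-gamma model.

    sasa is the solvent-accessible surface area, assumed to be in square angstroms.
    """
    gamma, offset = NONPOLAR_PARAMS[model]
    return gamma * sasa + offset
```

With a made-up SASA of 500 Å², `nonpolar_energy("PB3", 500.0)` gives roughly 3.63 and `nonpolar_energy("GB5", 500.0)` gives 2.5.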

Resources for the calculations :

Finally, the user can choose how to run the experiment :

  • local : run on the current computer in serial,
  • parallel : run locally using GNU parallel for a more efficient use of your computer resources,
  • mazinger : run on a compute cluster equipped with PBS.
    Once all calculations have been submitted, you can stop the ScoreFlow process by pressing Ctrl+C, and resume it later with ScoreFlow --resume.
    You can also kill ScoreFlow jobs that are still running with ScoreFlow --kill.

You can modify the current ScoreFlow.config file directly from the command line interface.
To list the available options, run ScoreFlow -h. For more extensive help, run ScoreFlow -hh.

Running

Once all of the above requirements are met, you can launch ScoreFlow from the run folder :

ScoreFlow

⚠️ If you plan on comparing several MM(PB,GB)SA functions/models, run ScoreFlow with the --purge flag to delete the previous topology and coordinate files. Alternatively, rename the com folder inside the input_files directory : ScoreFlow searches for data inside directories named 'lig' and 'com', and writes any structural data to the 'com' folder.

Results

Common to all scoring functions

All CSV tables mentioned below are located inside a sub-directory in the rescoring folder.

  • ranking.csv :
    Contains the score of all docking poses that were rescored. A decomposition of each energy term is given as well. This table is not sorted.
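
Since ranking.csv is not sorted, a ranked list has to be produced afterwards. The Python sketch below shows one way to do it, under stated assumptions : the column name `score` is hypothetical (check the header of your own ranking.csv), and the default ordering assumes that a lower score means a better pose, which may need to be reversed depending on the scoring function.

```python
import csv


def sort_ranking(csv_path, score_column="score", ascending=True):
    """Return the rows of a ranking table sorted by a numeric score column.

    score_column is a hypothetical column name; adapt it to the real header.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return sorted(
        rows,
        key=lambda row: float(row[score_column]),
        reverse=not ascending,
    )
```

Each returned row is a dictionary keyed by the column names of the CSV header, so the energy decomposition columns remain available after sorting.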

In case of errors :

  • errors.csv :
    Contains the name of the docking pose for which an error was produced, the directory containing the pose's file, and the step at which the error was produced.
  • directories :
    When an error is produced, ScoreFlow will keep all files inside a unique sub-folder, identified by the name of the docking pose, and stored inside a directory with the ligand's name. To have a more precise description of the cause of the error, the user should look for files with a .job extension.
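
To collect the .job files mentioned above without browsing every sub-folder by hand, a recursive search is enough. This is a minimal Python sketch, not a ScoreFlow feature : the function name `find_job_files` is hypothetical, and the directory passed to it is whatever rescoring folder your run produced.

```python
from pathlib import Path


def find_job_files(rescoring_dir):
    """Return every file with a .job extension below rescoring_dir, sorted by path."""
    return sorted(Path(rescoring_dir).rglob("*.job"))
```

The parent directories of the returned paths identify the ligand and the docking pose that failed, following the folder layout described above.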

PLANTS

  • features.csv :
    Contains a more detailed energy term decomposition.
  • protein_bindingsite_fixed.mol2 :
    Contains information on the residues present in the binding site (defined by the user in the configuration file). This file can be used to run a protein-ligand interaction fingerprint analysis with PyPLIF if needed (not included in ChemFlow for now).

VINA

  • Conversion of the MOL2 files to PDBQT :
    The converted files are located inside input_files/lig/.

MM(PB,GB)SA

All the topology and coordinate files of the complexes are located in the input_files/com folder.
A time series decomposition is also available :

  • For calculations on a single snapshot :
    • 1F_min.csv :
      Contains the time series decomposition of the minimization for every energy term of every docking pose rescored.
  • For calculations on a short MD simulation :
    • MD_min.csv :
      Contains the time series decomposition of the minimization for every energy term of every docking pose rescored.
    • MD_prod.csv :
      Contains the time series decomposition of the production for every energy term of every docking pose rescored.

FAQ

I did a virtual screening without DockFlow. Can I still use ScoreFlow ?

Yes, but you will need to adapt the structure of your files to the one DockFlow uses :

  1. Start by creating a directory. It will contain all of the files mentioned below.
  2. Put all of your docking poses in one (or more) sub-folder.
    Each docking pose must follow this scheme :
    • MOL2 file, with a single molecule per file
    • The name of each docking pose must be unique. It starts with a name common to all binding modes of the same ligand, directly followed by an underscore "_", followed by any string. The ligand name cannot contain any underscore. Examples :
      ✅ ligand-1_conf_001.mol2
      ✅ lig1_index_05_conf_007.mol2
      ✅ ZINC00967532_conf10.mol2
      ❌ ZINC_00967532_conf10.mol2
      ❌ ZINC00967532-conf10.mol2
      ❌ lig_12_003.mol2

Once these 2 steps are done, you might have to run LigFlow to prepare your docking poses. See LigFlow for more information.

  3. In ScoreFlow.config :
    • Use the "ALL" or "BEST" mode (depending on whether you used LigFlow).
    • Provide a path to the directory created at step "1." as folder="path/to/directory".
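
The naming scheme from step 2 can be checked before running ScoreFlow. The Python sketch below is illustrative only. It assumes that the ligand name is everything before the first underscore, which is what the rule "the ligand name cannot contain any underscore" implies. The function name `split_pose_name` is hypothetical.

```python
from pathlib import Path


def split_pose_name(filename):
    """Split '<ligand>_<string>.mol2' into (ligand, string) at the first underscore.

    Raises ValueError if the file is not a MOL2 file or has no underscore separator.
    """
    path = Path(filename)
    if path.suffix.lower() != ".mol2":
        raise ValueError(f"{filename}: not a MOL2 file")
    ligand, separator, suffix = path.stem.partition("_")
    if not separator or not ligand or not suffix:
        raise ValueError(f"{filename}: expected <ligand>_<string>.mol2")
    return ligand, suffix
```

Under this assumption, `split_pose_name("ZINC00967532_conf10.mol2")` returns `("ZINC00967532", "conf10")`, while `ZINC_00967532_conf10.mol2` is split into the ligand `ZINC` and the string `00967532_conf10`, so every ZINC compound would be grouped under the same ligand. `ZINC00967532-conf10.mol2` raises a `ValueError` because it contains no underscore.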

I've just run ScoreFlow on mazinger but now I would like to stop it. Is there a way to do it without killing all my other jobs ?
Run ScoreFlow --kill to kill the jobs related to the last rescoring you have performed.