Reaction generation using multiprocessing #1459

Merged
merged 28 commits on Jun 3, 2019
Changes from all commits
Commits
28 commits
e03b6b9
Allow reaction generation with multiple processes
ajocher May 30, 2019
f2c7765
Allow QMTP file generation with multiple processes
ajocher May 30, 2019
07e75fc
Deprecate scoop functionality
ajocher May 30, 2019
444906f
Remove retrieveNewSpecies
ajocher May 30, 2019
cd40e16
Addressed some of Codacy/PR Quality Review issues.
ajocher May 30, 2019
8cff1f8
QMTP parallel for new species.
ajocher May 30, 2019
0c3e9e6
Update RMG documentation for parallel runs
ajocher May 30, 2019
0da0280
Minor code cleanup
ajocher May 30, 2019
4cbaa1d
Fixes to increase coverage.
ajocher May 30, 2019
636fc1f
Refactor family splitting and some code style changes
mliu49 May 30, 2019
bff7c1b
Function to determine procnum based on RAM.
ajocher May 30, 2019
519dcde
Minor changes to logging in rmg.react
ajocher May 30, 2019
cf1fb49
Relocate maxproc check in RMG.initialize
ajocher May 30, 2019
e1b744e
Alternate approach to QMTP parallelization
mliu49 May 30, 2019
3994346
Add QMTP parallelization when loading training reactions
ajocher May 30, 2019
4e5ce61
For Species, regenerate resonance structures if atom IDs are invalid
mliu49 May 30, 2019
3f3ada2
Restore retrieve species to keep pdep functional.
ajocher May 30, 2019
490ad14
Moved thermo pruning to happen directly after thermo data calculation.
ajocher May 30, 2019
318e074
Fix pickle error for QMTP parallel.
ajocher May 30, 2019
d59da68
Update arkane/explorerTest.py.
ajocher May 30, 2019
a5085dd
Auto-format spacing in qm.mainTest
mliu49 May 30, 2019
0585607
Make generate_QMfiles a method of QMCalculator
mliu49 May 30, 2019
fcda6b6
Move generate_QMfiles unit test to qm.mainTest
mliu49 May 30, 2019
80b288a
Moving determine_procnum_from_RAM() to rmgpy.rmg.main.
ajocher May 30, 2019
cd22300
Update tests for moving determine_procnum_from_RAM() to rmgpy.rmg.main.
ajocher May 30, 2019
189d5c9
Changed available number of processes to available number of processors.
ajocher May 30, 2019
7318b37
Refactor species labeling based on thermo label
mliu49 May 30, 2019
f749f49
Improvements to reactTest
mliu49 May 30, 2019
2 changes: 1 addition & 1 deletion arkane/explorerTest.py
@@ -76,7 +76,7 @@ def test_reactions(self):
"""
test that the right number of reactions are in output network
"""
self.assertEqual(len(self.explorerjob.networks[0].pathReactions), 6)
self.assertEqual(len(self.explorerjob.networks[0].pathReactions), 5)

def test_isomers(self):
"""
67 changes: 32 additions & 35 deletions documentation/source/users/rmg/running.rst
@@ -4,7 +4,7 @@
Running a Job
*************

Running a basic RMG job is straightforward. However, depending on your case you might want to add the flags outlined in the following examples.

**Note:** In all these examples ``rmg.py`` should be the path to your installed RMG (eg. yours might be ``/Users/joeblogs/Code/RMG-Py/rmg.py``) and ``input.py`` is the path to the input file you wish to run (eg. yours might be ``RMG-runs/hexadiene/input.py``). If you get an error like ``python: can't open file 'rmg.py': [Errno 2] No such file or directory`` then probably the first of these is wrong. If you get an error like ``IOError: [Errno 2] No such file or directory: '/some/path/to/input.py'`` then probably the second of these is wrong.

@@ -20,86 +20,83 @@ Run with CPU profiling::

python rmg.py input.py -p

We recommend you make a job-specific directory for each RMG simulation. Some jobs can take quite a while to complete, so we also recommend using a job scheduler if working in a Linux environment.

The instructions below describe special cases for running an RMG job.

Running RMG in parallel with SLURM
----------------------------------

RMG has the option to use multiple processes on one node for reaction generation and on-the-fly Quantum Mechanics Thermodynamic Property (QMTP) calculations. Here is an example submission script for an RMG-Py job with a SLURM scheduler.

The job reserves 24 tasks on a single node, but uses only 12 processes in parallel during the RMG-Py simulation.

Make sure that:

- the queue named ``debug`` exists on your SLURM scheduler.
- you modify the path to the parent folder of the RMG-Py installation folder.
- you have an anaconda environment named ``rmg_env`` that contains RMG-Py's dependencies.
- the working directory from which you launched the job contains the RMG-Py input file ``input.py``.



.. code:: bash

    #!/bin/bash

    #SBATCH -p debug
    #SBATCH -J jobname
    #SBATCH -n 24

    Processes=12
    RMG_WS=/path/to/RMG/parent/folder
    export PYTHONPATH=$PYTHONPATH:$RMG_WS/RMG-Py/

    source activate rmg_env
    python $RMG_WS/RMG-Py/rmg.py -n $Processes input.py
    source deactivate

Running RMG in parallel with SGE
--------------------------------

RMG has the option to use multiple processes on one node for reaction generation and on-the-fly Quantum Mechanics Thermodynamic Property (QMTP) calculations. Here is an example submission script for an RMG-Py job with an SGE scheduler.

The job reserves 24 tasks on a single node, but uses only 12 processes in parallel during the RMG-Py simulation.

Make sure that:

- the queue named ``debug`` exists on your SGE scheduler.
- you modify the path to the parent folder of the RMG-Py installation folder.
- you have an anaconda environment named ``rmg_env`` that contains RMG-Py's dependencies.
- the working directory from which you launched the job contains the RMG-Py input file ``input.py``.

.. code:: bash

    #! /bin/bash

    #$ -o job.log
    #$ -l debug
    #$ -N jobname
    #$ -pe singlenode 24

    Processes=12
    RMG_WS=/path/to/RMG/parent/folder
    export PYTHONPATH=$PYTHONPATH:$RMG_WS/RMG-Py/

    source activate rmg_env
    python $RMG_WS/RMG-Py/rmg.py -n $Processes input.py
    source deactivate


Details on the implementation
--------------------------------

Currently, multiprocessing is implemented for reaction generation and for the generation of QM files when using the QMTP option to compute thermodynamic properties of species. The processes are spawned and closed within each function. The number of processes is determined from the ratio of currently available RAM to currently used RAM, and the user can set the maximum number of allowed processes on the command line. For each reaction generation or QMTP call, the number of processes used is the minimum of the user-specified limit and the value obtained from the RAM ratio. The RAM limitation is employed because multiprocessing forks the base process, and the memory limit (swap + RAM) might be exceeded when using too many processes with a base process that is large in memory.
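The heuristic above can be sketched as follows. Note the signature here is hypothetical for illustration: the real ``determine_procnum_from_RAM`` in ``rmgpy.rmg.main`` queries system memory itself rather than taking it as arguments.

```python
def determine_procnum_from_RAM(user_maxproc, available_ram, parent_ram):
    """Sketch of the process-count heuristic described above.

    Because multiprocessing forks the base process, each worker may
    duplicate the parent's memory, so only as many workers are allowed
    as copies of the parent fit into the currently available RAM,
    capped by the user-supplied -n/--maxproc value.
    """
    ram_limit = max(1, available_ram // max(parent_ram, 1))
    return max(1, min(user_maxproc, ram_limit))

# With 16 GiB free and a 2 GiB parent process, at most 8 workers fit:
procnum = determine_procnum_from_RAM(12, 16 * 2**30, 2 * 2**30)
# procnum == 8
```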

In Python 3.4, the new start methods 'spawn' and 'forkserver' became available. These methods create new processes that share nothing (or only limited state) with the parent, and all memory passing is explicit. Once RMG is ported to Python 3, it is recommended to use the spawn or forkserver start method, which could potentially allow an increased number of processes.
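As a quick check, the start methods a given interpreter supports can be queried from the standard library (shown in Python 3 syntax, which RMG did not yet use at the time of this PR):

```python
import multiprocessing as mp

# 'fork' copies the parent's memory (the behavior described above);
# 'spawn' and 'forkserver' start fresh workers that share nothing or
# only limited state with the parent, so every object passed to them
# must be explicitly picklable.
methods = mp.get_all_start_methods()
default = mp.get_start_method()
```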
10 changes: 9 additions & 1 deletion rmg.py
@@ -82,6 +82,10 @@ def parse_command_line_arguments(command_line_args=None):
parser.add_argument('-t', '--walltime', type=str, nargs=1, default='00:00:00:00',
metavar='DD:HH:MM:SS', help='set the maximum execution time')

# Add option to select max number of processes for reaction generation
parser.add_argument('-n', '--maxproc', type=int, nargs=1, default=1,
help='max number of processes used during reaction generation')

# Add option to output a folder that stores the details of each kinetic database entry source
parser.add_argument('-k', '--kineticsdatastore', action='store_true',
help='output a folder, kinetics_database, that contains a .txt file for each reaction family '
@@ -99,6 +103,9 @@ def parse_command_line_arguments(command_line_args=None):
if args.walltime != '00:00:00:00':
args.walltime = args.walltime[0]

if args.maxproc != 1:
args.maxproc = args.maxproc[0]

# Set directories
input_directory = os.path.abspath(os.path.dirname(args.file))

@@ -119,7 +126,7 @@ def main():
args = parse_command_line_arguments()

if args.postprocess:
print "Postprocessing the profiler statistics (will be appended to RMG.log)"
logging.info("Postprocessing the profiler statistics (will be appended to RMG.log)")
else:
# Initialize the logging system (resets the RMG.log file)
level = logging.INFO
@@ -136,6 +143,7 @@
kwargs = {
'restart': args.restart,
'walltime': args.walltime,
'maxproc': args.maxproc,
'kineticsdatastore': args.kineticsdatastore
}

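The ``--maxproc`` handling above relies on an argparse quirk: ``nargs=1`` stores a user-supplied value as a one-element list, while the default remains a plain ``int``, hence the unwrapping step. A self-contained sketch of that pattern:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs=1 stores user-supplied values as a one-element list, while the
# default stays a bare int, hence the unwrap below.
parser.add_argument('-n', '--maxproc', type=int, nargs=1, default=1,
                    help='max number of processes used during reaction generation')

args = parser.parse_args(['-n', '4'])
if args.maxproc != 1:
    # A list (even [1]) never compares equal to the int 1, so this
    # branch runs exactly when the user passed the flag.
    args.maxproc = args.maxproc[0]
# args.maxproc == 4

default_args = parser.parse_args([])
# default_args.maxproc == 1
```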
4 changes: 2 additions & 2 deletions rmgpy/data/kinetics/database.py
@@ -478,6 +478,8 @@ def generate_reactions_from_families(self, reactants, products=None, only_famili
# Check if the reactants are the same
# If they refer to the same memory address, then make a deep copy so
# they can be manipulated independently
if isinstance(reactants, tuple):
reactants = list(reactants)
same_reactants = 0
if len(reactants) == 2:
if reactants[0] is reactants[1]:
@@ -512,8 +514,6 @@ def generate_reactions_from_families(self, reactants, products=None, only_famili
same_reactants = 2

# Label reactant atoms for proper degeneracy calculation (cannot be in tuple)
if isinstance(reactants, tuple):
reactants = list(reactants)
ensure_independent_atom_ids(reactants, resonance=resonance)

combos = generate_molecule_combos(reactants)
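The reordering above matters because both the identity check and the later atom relabeling need a mutable list rather than a tuple. A minimal stand-alone sketch of the guard (a hypothetical helper, not the RMG function itself):

```python
import copy

def prepare_reactants(reactants):
    # Tuples are immutable, so convert to a list before any in-place
    # manipulation of the reactants.
    if isinstance(reactants, tuple):
        reactants = list(reactants)
    same_reactants = 0
    if len(reactants) == 2 and reactants[0] is reactants[1]:
        # Same memory address: deep-copy so the two reactants can be
        # manipulated independently.
        reactants[1] = copy.deepcopy(reactants[1])
        same_reactants = 2
    return reactants, same_reactants

mol = ['C', 'H', 'H', 'H', 'H']
reactants, same = prepare_reactants((mol, mol))
# The copies compare equal but are no longer the same object.
```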
13 changes: 13 additions & 0 deletions rmgpy/data/kinetics/family.py
@@ -373,6 +373,7 @@ def applyReverse(self, struct, unique=True):

################################################################################


class KineticsFamily(Database):
"""
A class for working with an RMG kinetics family: a set of reactions with
@@ -1071,6 +1072,10 @@ def addKineticsRulesFromTrainingSet(self, thermoDatabase=None,trainIndices=None)
logging.info('Must be because you turned off the training depository.')
return

# Determine number of parallel processes.
from rmgpy.rmg.main import determine_procnum_from_RAM
procnum = determine_procnum_from_RAM()

tentries = depository.entries

index = max([e.index for e in self.rules.getEntries()] or [0]) + 1
@@ -1155,6 +1160,14 @@ def addKineticsRulesFromTrainingSet(self, thermoDatabase=None,trainIndices=None)
# trainingSet=True is used later so that species do not match a liquid phase library and get corrected thermo, which would affect the reverse rate calculation
item = Reaction(reactants=[Species(molecule=[m.molecule[0].copy(deep=True)], label=m.label) for m in entry.item.reactants],
products=[Species(molecule=[m.molecule[0].copy(deep=True)], label=m.label) for m in entry.item.products])

if procnum > 1:
# If QMTP and multiprocessing write QMTP files here in parallel.
from rmgpy.rmg.input import getInput
quantumMechanics = getInput('quantumMechanics')
if quantumMechanics:
quantumMechanics.runJobs(item.reactants+item.products, procnum=procnum)

for reactant in item.reactants:
reactant.generate_resonance_structures()
reactant.thermo = thermoDatabase.getThermoData(reactant, trainingSet=True)
47 changes: 46 additions & 1 deletion rmgpy/qm/main.py
@@ -29,13 +29,15 @@
###############################################################################

import os
from multiprocessing import Pool

import logging

import rmgpy.qm.mopac
import rmgpy.qm.gaussian
from rmgpy.data.thermo import ThermoLibrary


class QMSettings():
"""
A minimal class to store settings related to quantum mechanics calculations.
@@ -226,7 +228,50 @@ def getThermoData(self, molecule):
else:
raise Exception("Unknown QM software '{0}'".format(self.settings.software))
return thermo0


def runJobs(self, spc_list, procnum=1):
"""
Run QM jobs for the provided species list (in parallel if requested).
"""
mol_list = []
for spc in spc_list:
if spc.molecule[0].getRadicalCount() > self.settings.maxRadicalNumber:
for molecule in spc.molecule:
if self.settings.onlyCyclics and molecule.isCyclic():
saturated_mol = molecule.copy(deep=True)
saturated_mol.saturate_radicals()
if saturated_mol not in mol_list:
mol_list.append(saturated_mol)
else:
if self.settings.onlyCyclics and spc.molecule[0].isCyclic():
if spc.molecule[0] not in mol_list:
mol_list.append(spc.molecule[0])
if mol_list:
# Zip arguments for use in map.
qm_arg_list = [(self, mol) for mol in mol_list]

if procnum == 1:
logging.info('Writing QM files with {0} process.'.format(procnum))
map(_write_QMfiles_star, qm_arg_list)
elif procnum > 1:
logging.info('Writing QM files with {0} processes.'.format(procnum))
p = Pool(processes=procnum)
p.map(_write_QMfiles_star, qm_arg_list)
p.close()
p.join()


def _write_QMfiles_star(args):
"""Wrapper to unpack zipped arguments for use with map"""
return _write_QMfiles(*args)


def _write_QMfiles(quantumMechanics, mol):
"""
Worker function: compute thermo (and thereby write the QM files) for a single molecule; run in parallel when procnum > 1.
"""
quantumMechanics.getThermoData(mol)


def save(rmg):
# Save the QM thermo to a library if QM was turned on