# Parallelized Calculations and Jobmapping

`molli` implements a `jobmap` function that applies external drivers in parallel to `MoleculeLibrary` or `ConformerLibrary` objects. `molli` currently provides 4 drivers covering geometry optimization, conformer generation, and property calculation methods. These parallelized calculations can be run either on a local computer or on a computer cluster. For information about the structure and available methods, check the [molli.pipeline: External Drivers](../API/molli.pipeline/pipeline-notes.md) section!
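
Conceptually, this is a parallel map: several workers each take an entry from the source library, hand it to a driver method, and collect the result. The sketch below illustrates that pattern with the standard library's `concurrent.futures`. It is not molli's implementation, and `map_jobs` and `fake_search` are hypothetical names. Threads are used because, with external drivers, the heavy computation runs in a separate program, so each worker mostly waits. Failures are recorded instead of raised, loosely mirroring how the cache keeps outputs whether or not a calculation succeeds.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def map_jobs(func, items, n_workers=4):
    """Apply `func` to every value in the `items` dict using `n_workers` threads.

    Returns (results, errors), both dicts keyed by the item name.
    A failing item is stored in `errors` so it does not stop the batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Submit every job up front and remember which name each future belongs to
        futures = {pool.submit(func, obj): name for name, obj in items.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:  # record the failure and keep going
                errors[name] = exc
    return results, errors


# Hypothetical stand-in for a driver method such as a conformer search
def fake_search(n):
    if n < 0:
        raise ValueError("invalid input")
    return n * 2


results, errors = map_jobs(fake_search, {"a": 1, "b": 2, "c": -1}, n_workers=2)
print(sorted(results.items()))  # [('a', 2), ('b', 4)]
print(list(errors))  # ['c']
```

Results arrive in completion order rather than submission order, which is why they are keyed by name.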

## Example 1: Running a Job on a Local Computer

An example script is shown below:

```python

##Necessary imports
import molli as ml
from molli.pipeline.crest import CrestDriver

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This configures the driver; nprocs is the number of processes each worker uses. Memory usage can also be specified.
crest = CrestDriver("crest", nprocs=16)

ml.pipeline.jobmap(
    crest.conformer_search,
    source=source, #Source of molecules
    destination=destination, #Where conformers will be written
    cache_dir="./conf_cache", #Where final outputs will be written, successful or not!
    scratch_dir="./scratch_dir", #Scratch Directory where calculations will be run
    n_workers=4, #Number of workers to use. In this case, 4 workers, each with 16 processors as defined in the driver.
    kwargs={
        "method": "gfnff", #GFNFF method to be used
        "temp": 298.15, #Temperature to assume
        "chk_topo": True, #Will check topology
    }, #These are arguments used in the conformer_search function and can be specified directly
    progress=True, #Will print out progress
    verbose=True, #Will print out extra information
)
```
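
With these settings the machine must supply `n_workers × nprocs` cores, here 4 × 16 = 64. If it has fewer, the workers will likely compete for cores. The hypothetical helper below (not part of `molli`) picks a worker count that fits the available cores for a given `nprocs`; it falls back to a single worker when one worker alone needs more cores than are available.

```python
import os


def workers_for(total_cores, nprocs):
    """Hypothetical helper: how many workers of `nprocs` cores fit on `total_cores`."""
    if nprocs <= 0:
        raise ValueError("nprocs must be positive")
    # Always run at least one worker, even if that worker is oversubscribed
    return max(1, total_cores // nprocs)


# os.cpu_count() can return None, so default to 1 core in that case
n_workers = workers_for(os.cpu_count() or 1, nprocs=16)
print(n_workers)
```

The result could then be passed as `n_workers` to `jobmap`.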

This will create a Conformer Library at `example_result.clib` and write all inputs/outputs to `./conf_cache`. The cache directory contains an `input` folder with the formatted inputs used to submit the calculations, and an `output` folder with the encoded outputs (i.e. written in bytes).
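
Because the `output` folder is written to as jobs finish, it can be used to see which molecules have not produced an output yet, for example after an interrupted run. The sketch below assumes the naming used in Example 3 further down this page (one `<molecule name>.out` file per molecule in `<cache_dir>/output`); `missing_outputs` is a hypothetical helper, not a `molli` function. Note that an existing output file does not mean the calculation succeeded, since outputs are cached successful or not.

```python
from pathlib import Path


def missing_outputs(cache_dir, names):
    """Return the sorted names that have no `<name>.out` file in `<cache_dir>/output`.

    Assumes one output file per molecule, named after the molecule.
    """
    out_dir = Path(cache_dir) / "output"
    # Path.glob yields nothing if the folder does not exist yet
    done = {p.stem for p in out_dir.glob("*.out")}
    return sorted(set(names) - done)


print(missing_outputs("./conf_cache", ["mol1", "mol2"]))
```

The list of names would normally come from the source library that was passed to `jobmap`.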

## Example 2: Running a Job on a Cluster

Most users will want to run on a computational cluster, so a separate function, `jobmap_sge`, submits jobs through the scheduler. It was designed for clusters configured with Oracle Grid Engine (also known as Sun Grid Engine) for batch submission of jobs. It has the same functionality as `jobmap`; the only differences are that the collection of all `JobInput` instances is passed to a process that runs a `qsub` command instead of to a local executor, and that `n_workers` no longer needs to be specified.



```python
#Necessary imports
import molli as ml
from molli.pipeline.crest import CrestDriver

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This configures the driver; nprocs is the number of processes each job uses. Memory usage can also be specified.
crest = CrestDriver("crest", nprocs=16)

ml.pipeline.jobmap_sge(
    crest.conformer_search,
    source,
    destination,
    cache_dir="./conf_cache", #Where final outputs will be written, successful or not!
    scratch_dir="./scratch_dir", #Scratch Directory where calculations will be run
    kwargs={
        "method": "gfnff", #GFNFF method to be used
        "temp": 298.15, #Temperature to assume
        "chk_topo": True, #Will check topology
    }, #These are arguments used in the conformer_search function and can be specified directly
    progress=True, #Will print out progress
    verbose=True, #Will print out extra information
    qsub_header="#$ -pe orte 16\n", #This specifies the parallel environment and number of slots
)
```
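
The `qsub_header` string holds Grid Engine directives (lines starting with `#$`). In the example, the slot count in `-pe orte 16` matches the `nprocs=16` given to the driver, so deriving both from one variable helps keep them consistent. The helper below is a hypothetical sketch, not part of `molli`; the parallel environment name (`orte` here) is configured by each cluster's administrators, so check which one your site provides.

```python
def sge_header(slots, pe="orte", extra=()):
    """Hypothetical helper: build a Grid Engine header requesting `slots` slots.

    `extra` holds additional directives without the leading "#$ ", e.g. "-cwd".
    """
    lines = [f"#$ -pe {pe} {slots}"]
    for directive in extra:
        lines.append(f"#$ {directive}")
    # Terminate with a newline, as in the qsub_header used above
    return "\n".join(lines) + "\n"


nprocs = 16
print(repr(sge_header(nprocs)))  # '#$ -pe orte 16\n'
```

In the script above this would become `crest = CrestDriver("crest", nprocs=nprocs)` together with `qsub_header=sge_header(nprocs)`.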

## Example 3: Loading Encoded Output Files

If you need additional information from an output file, or if a library was written incorrectly, the encoded outputs in the cache can be loaded directly with `JobOutput.load` and processed manually. An example of this is shown below:

```python
#Necessary imports
import molli as ml
from glob import glob
from pathlib import Path
from tqdm import tqdm

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This reads and writes to the respective files
with source.reading(), destination.writing():
    for file in tqdm(glob('./conf_cache/output/*.out')):
        res = ml.pipeline.JobOutput.load(file) # Loads the Output file from the cache directory
        name = Path(file).stem #Gives name of file
        m = source[name] #Retrieves matching name from the source library

        #This retrieves the conformer geometry
        all_geoms = ml.CartesianGeometry.loads_all_xyz(
            res.files["crest_conformers.xyz"].decode()
        )
        
        # This creates a conformer ensemble
        result = ml.ConformerEnsemble(m, n_conformers=len(all_geoms))

        # This updates the coordinates of all the conformers
        for blank_conf, conf_geom in zip(result, all_geoms):
            blank_conf.coords = conf_geom.coords

        destination[name] = result
```
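
The `crest_conformers.xyz` file read above holds every conformer as consecutive frames of a multi-frame XYZ file: an atom-count line, a comment line, then one `element x y z` line per atom. `ml.CartesianGeometry.loads_all_xyz` handles this parsing in the example. The pure-Python stand-in below (`split_xyz_frames` is hypothetical, not a `molli` function) only illustrates the format and can be handy for sanity checks, such as counting conformers or spotting a truncated file, before building an ensemble.

```python
def split_xyz_frames(text):
    """Split a multi-frame XYZ string into a list of (comment, atoms) frames.

    `atoms` is a list of (element, x, y, z) tuples. Raises ValueError if a
    frame has fewer atom lines than its header promises.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[i])
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        atoms = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            element, x, y, z = line.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        if len(atoms) != n_atoms:
            raise ValueError(f"frame starting at line {i + 1} is truncated")
        frames.append((comment, atoms))
        i += 2 + n_atoms
    return frames


xyz = "2\nconf 1\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n2\nconf 2\nH 0 0 0\nH 0 0 0.75\n"
print(len(split_xyz_frames(xyz)))  # 2
```

Inside the loop above, `len(split_xyz_frames(res.files["crest_conformers.xyz"].decode()))` should agree with `len(all_geoms)`.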