**MODEL GENERATION STAGES**

For all mutant sequences

* Docking with Conformational classification (GEOMETRY ANALYSIS)
* Simulation of Dynamics (XXMD)
    * EMMD
    * PRMD
    * URMD

**INTERACTIONS ANALYSIS**

* Generate DB of Interactions per modeling stage
* Gather all DBs and determine Super-Base Interchain Interactions
* Interaction decomposition per model (mutant, conformation:XXMD-simulation:frame, model name)
* Dimensionality reduction (Filter out meaningless base interactions)
* Statistics and Comparison between *Model Generation Stages* for all mutants

**WAITING FOR SIMS WITH MISSING RESIDUES**

*NOTES*

* Perform Geometry-Interactions Analysis simultaneously to avoid DB id-equivalence
* Tagging per _Model Generation Stage_

> <span style="color:blue">None</span> : For unclassified data

> <span style="color:blue">ConformationN</span> : For conformationally classified data

> <span style="color:blue">ConformationN:XXMD:Protein_n</span> : For conformationally classified data, after XXMD stage, for Protein frame `n`

Packages used:

* pysftp
* fabric
* pydispatch

# MD stages

Series of MD stages
* **Preparation**

>This initial stage encompasses several substages: 1) *MD setup* (of essential GROMACS files), 2) *embedding* (of `Protein` in `POPC` bilayer), and 3) *re-solvation* (with ice-water). Protocols are available for all of these. See *Section X*.

* **Energy-Minimization MD** (EMMD)

>Minimisation of the `System`'s total energy down to the nearest local minimum, via the steepest-descent algorithm.

* **Position-Restrained MD** (PRMD)

>Position restraint of `Protein` backbone atoms (`N`, `C`, `O`) for NPT-equilibration of the `System`.

* **Unrestrained MD** (URMD)

>Standard MD production

Sequence of paired-stages

```bash
prep, embed
embed, emmd
emmd, prmd
prmd, urmd
```
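The paired stages above are simply consecutive entries of the ordered stage list. A minimal sketch, assuming the stage names shown in the block above and an illustrative variable name `stage_pairs`:

```python
# Ordered list of stages, as named in the pairs above
stages = ['prep', 'embed', 'emmd', 'prmd', 'urmd']

# Pair each stage with the one that follows it
stage_pairs = list(zip(stages, stages[1:]))

for previous, current in stage_pairs:
    print(previous, current)
```

Keeping the ordered list as the single source means that refining the list of stages automatically refines the pairs.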

Check previous, and refine list

In [12]:

mpmodeling_dir = '/home/ba13026/mpmodeling/'

engines = {
    'pre-md': {
            'setup': mpmodeling_dir+'gmx_protocols/prepare_protein_com.sh',
            'embed': mpmodeling_dir+'protocols/gmx_protocols/embed_protein.sh',
            'resolvate': mpmodeling_dir+'protocols/gmx_protocols/resolvate2.sh'
        },
    'xxmd' : {
        # Create initiation files
        'prepare': '',
        # Use templates to make submission files and submit jobs to one or both clusters
        'submit' : '',
        'manage' : ''
        },
    'analysis': {
        #Geometry-Interaction DB# 
        'get-frames' : '',
        'setup-db' : '',
        'fill-db' : '',
        'make-nb' : '' 
        },
    'backup' : {}
}

files4check = {
    'setup': {
        'in':['confout_pep'], 
        'out': ['for_embedding']
    },
    'embed': {
        'in':['for_embedding'] ,
        'out':['confout']
    },
    'resolvate': {
        'in':['confout'],
        'out':['ionise']
    }
}
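
The `files4check` dictionary can be used to decide, per model directory, whether a pre-MD substage is done, ready to run, or blocked. The sketch below is only one possible reading: the helper names `missing_files` and `substage_status` are hypothetical, and it assumes that each entry is a file basename to be matched against any extension.

```python
import glob
import os

files4check = {
    'setup':     {'in': ['confout_pep'],   'out': ['for_embedding']},
    'embed':     {'in': ['for_embedding'], 'out': ['confout']},
    'resolvate': {'in': ['confout'],       'out': ['ionise']},
}

def missing_files(model_dir, basenames):
    """Return the basenames for which no file `<basename>.*` exists in model_dir."""
    missing = []
    for basename in basenames:
        # Assumption: entries are basenames, so any extension is accepted
        pattern = os.path.join(model_dir, basename + '.*')
        if not glob.glob(pattern):
            missing.append(basename)
    return missing

def substage_status(model_dir, substage, files4check):
    """Classify a pre-MD substage as 'done', 'ready' or 'blocked'."""
    spec = files4check[substage]
    if not missing_files(model_dir, spec['out']):
        return 'done'     # all expected outputs already exist
    if not missing_files(model_dir, spec['in']):
        return 'ready'    # inputs present, outputs still to be produced
    return 'blocked'      # at least one input is missing

# Hypothetical model directory; a directory without the input files is reported as 'blocked'
print(substage_status('mydir/mutantA/unclassified/model_001', 'setup', files4check))
```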


# Multi-cluster Job Management System

The system relies on a **master-slave** communication model, where a single cluster is in charge of preparing all run files per model (GROMACS `.tpr` initiation files; cluster independent) and all relevant submission files (`slurm` files; cluster dependent: resources and expected job performance).

The workflow below is described for two clusters only, although the overall logic can be extended to more than two. All clusters are assumed to be accessed via SSH by a common user (although this is extensible to two users), and all jobs are assumed to be either `cpu` or `gpu`.

## Set-up: Cluster access

1.  Decide identity of master and slave clusters: `C_master` and `C_slave`,
2.  By default, the `local` and `remote` clusters will correspond to the **master** and **slave** clusters, respectively,
3.  Set up *working directories* (`workdir`) on both clusters, with a **common basename**, e.g., `mydir`, at some relevant path on each cluster.
4.  Define `C_master` and `C_slave` as instances of the Python class `Cluster`, with the relevant attributes:

```python
class Cluster(object):
    def __init__(self):
        self.type = '' # Either 'local' or 'remote'
        self.name = '' # Cluster name, e.g., 'bluegem.acrc.bris.ac.uk'
        self.user = '' # Username, e.g., 'ba13026'
        self.pwd = '' # SSH Password
        self.workdir = '' # Absolute path: /path_to_dir_in_cluster/mydir/
        self.slurm_cpu_template = '' # Can be defined either as concatenated strings or a method
        self.slurm_gpu_template = '' # Same as above
```

saving all Python lines in a file named `ClusterComm_setup.py`
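
As an illustration of step 4, the two objects could be filled in as follows. This is a sketch only: the slave host name and both working-directory paths are placeholders, the template attributes are omitted, and the class is repeated in shortened form so that the block is self-contained.

```python
import os

class Cluster(object):
    def __init__(self):
        self.type = ''      # Either 'local' or 'remote'
        self.name = ''      # Cluster name
        self.user = ''      # Username
        self.pwd = ''       # SSH password (left empty here on purpose)
        self.workdir = ''   # Absolute path ending in the common basename

C_master = Cluster()
C_master.type = 'local'
C_master.name = 'bluegem.acrc.bris.ac.uk'
C_master.user = 'ba13026'
C_master.workdir = '/path_to_dir_in_master/mydir/'    # placeholder path

C_slave = Cluster()
C_slave.type = 'remote'
C_slave.name = 'slave.cluster.example'                # placeholder host name
C_slave.user = 'ba13026'
C_slave.workdir = '/path_to_dir_in_slave/mydir/'      # placeholder path

# Step 3 requires both working directories to share the same basename
basenames = {os.path.basename(os.path.normpath(c.workdir)) for c in (C_master, C_slave)}
same_basename = len(basenames) == 1
print(same_basename)
```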

Once ready, the function `transfer` defined within `ClusterComm_methods.py`, should be able to easily exchange files between `C_master` and `C_slave`, either *sequentially* or in *parallel* as shown in **Subsec X**.

## Set-up: I/O Folder Tree Structure

For unambiguous allocation of MD data per classified docked model, we use a *tree structure* of folders within `mydir`, which must be identical in both cluster working directories. The tree follows the list of classification tags associated with each PDB model,

> Model classification tags: ( `mutant`, `group`, `name` )

*Recall*: Here, `mutant` corresponds to an element in a list of sequence names (`str`); `group` can be defined as `unclassified`, `conformation1`, or `conformation1:md`, for instance; and `name` is the name of the model as output by docking, usually featuring a unique number.

The pseudo-code below illustrates how to create a folder-tree structure according to the intrinsic hierarchy of the classification tags via nested loops


```python
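import os

# `cluster` is a Cluster instance; `mutant_seqnames`, `defined_groups` and `PDBs`
# are assumed to be defined beforehand. os.makedirs only acts on the local
# file system, so the same loops must be run on each cluster.
for mutant in mutant_seqnames:
    # First-level dir: one per mutant
    mutant_dir = os.path.join(cluster.workdir, mutant)
    os.makedirs(mutant_dir, exist_ok=True)

    for group in defined_groups[mutant]:
        # Second-level dir: one per classification group
        group_dir = os.path.join(mutant_dir, group)
        os.makedirs(group_dir, exist_ok=True)

        for pdb_name in PDBs[group]:
            # Third-level dir (leaf-folder): one per model
            model_dir = os.path.join(group_dir, pdb_name)
            os.makedirs(model_dir, exist_ok=True)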
```

Thus, given a list of paths to each individual model directory (*leaf-folder*), our MD Python scripts can be used for preparation or submission on any cluster.
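
Because the tree is identical under both working directories, a leaf-folder path on one cluster maps directly onto the other. The sketch below lists the leaf-folders and translates them between clusters; the function names, the example tags and both working-directory paths are hypothetical.

```python
import posixpath

def leaf_dirs(workdir, mutant_seqnames, defined_groups, PDBs):
    """List the path of every model directory (leaf-folder) under workdir."""
    return [
        posixpath.join(workdir, mutant, group, pdb_name)
        for mutant in mutant_seqnames
        for group in defined_groups[mutant]
        for pdb_name in PDBs[group]
    ]

def to_other_cluster(path, workdir_from, workdir_to):
    """Translate a path under one cluster's workdir to the other cluster's workdir."""
    relative = posixpath.relpath(path, workdir_from)
    return posixpath.join(workdir_to, relative)

# Hypothetical classification tags
mutant_seqnames = ['mutantA']
defined_groups = {'mutantA': ['unclassified', 'conformation1']}
PDBs = {'unclassified': ['model_001'], 'conformation1': ['model_002', 'model_003']}

local_workdir = '/home/user/mydir/'       # placeholder path on the local cluster
remote_workdir = '/scratch/user/mydir/'   # placeholder path on the remote cluster

local_leaves = leaf_dirs(local_workdir, mutant_seqnames, defined_groups, PDBs)
remote_leaves = [to_other_cluster(p, local_workdir, remote_workdir) for p in local_leaves]

for local_path, remote_path in zip(local_leaves, remote_leaves):
    print(local_path, '->', remote_path)
```

`posixpath` is used rather than `os.path` so that the paths keep forward slashes regardless of where the script runs.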

## Remote cluster manipulation 

## File exchange between clusters

## Inter-cluster communication for job queueing

In [None]:
from pydispatch import dispatcher
SIGNAL = 'my-first-signal'

In [None]:
def handle_event( sender ):
    """Simple event handler"""
    print('Signal was sent by', sender)

dispatcher.connect( handle_event, signal=SIGNAL, sender=dispatcher.Any )

In [None]:
first_sender = object()
second_sender = {}

def main( ):
    dispatcher.send( signal=SIGNAL, sender=first_sender )
    dispatcher.send( signal=SIGNAL, sender=second_sender )

In [None]:
main()

# Job-tracker and manager

## Framework

Get list of model tags, per job name for submission.

Say job 

`name` = `md_100ns`

and we get the list of all models for which PRMD stage has been finalised. 

Once the list is given, determine `queueing state`, per model (`model_id`) on each cluster: 

* `q_master(model_id)`
* `q_slave(model_id)`

A queueing state per model can take the following values per cluster:

* Running : `R` 
* Pending: `PD`
* Done: `D` 
* Failed: `F`
* Not Submitted: `NS`

Transitional states: `NS`, `PD`, `R`, and `F` 

For a transition from one state to another to happen, an `action` is usually performed per model, as listed below:

* <span style="color:red">Submission</span>: `NS` to `PD`
* Execution: `PD` to `R`
* Completion: `R` to `D`
* Queueing Failure: `PD` to `F`
* Execution failure: `R` to `F`
* <span style="color:red">Resubmission</span>: `F` to `PD` (up to a number of trials)

Note that some `actions` are simply performed automatically by the cluster, whereas for others, we need a script to be executed by either the master or the slave cluster. The actions in red are the only ones that need a manager.

Stationary states: `D`
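
The states and actions listed above form a small state machine. A minimal sketch, in which the action keys and the helper names `apply_action` and `allowed_actions` are illustrative choices:

```python
# action -> (state before, state after), as listed above
TRANSITIONS = {
    'submission':        ('NS', 'PD'),
    'execution':         ('PD', 'R'),
    'completion':        ('R',  'D'),
    'queueing_failure':  ('PD', 'F'),
    'execution_failure': ('R',  'F'),
    'resubmission':      ('F',  'PD'),
}

# Only these actions need a manager; the rest are performed by the cluster itself
MANAGED_ACTIONS = {'submission', 'resubmission'}

def apply_action(state, action):
    """Return the state reached by applying `action` to a job in `state`."""
    source, target = TRANSITIONS[action]
    if state != source:
        raise ValueError('action %r is not allowed from state %r' % (action, state))
    return target

def allowed_actions(state):
    """Return the actions that can start from `state`, in alphabetical order."""
    return sorted(a for a, (source, _) in TRANSITIONS.items() if source == state)

print(apply_action('NS', 'submission'))
print(allowed_actions('PD'))
print(allowed_actions('D'))
```

No action starts from `D`, which is what makes it the only stationary state.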



**Inter-Cluster Job Management** will consist of controlling two actions

* _Job Queueing_: Submissions and  Resubmissions

* _Job Cancellation_: Forced Execution-failure and Queueing-failure 


The inter-cluster interaction rules (<span style="color:blue">HERE'S WHERE THE MANAGEMENT ACTUALLY COMES INTO PLAY</span>) apply to any of the clusters, either `local` or `remote`.


**Rules for Inter-Cluster Job Management**

Either use: `sbatch`, `scancel`, or `None`

* _Job Queueing_

Given a model, with `model_id`


IF

`q_X(model_id) in ['NS','F']`, for `X = local` and `remote`

THEN

Submit job = 1 , for `Cluster_local` and `Cluster_remote`


Code:

```python
####################################
# States from which a job can be submitted or resubmitted
tSS = ['NS','F']
f = lambda x: x in tSS

# Queueing state of the model on each cluster
q_X = {}
for ctype in ['local', 'remote']:
    Cluster_X = Clusters[ctype]
    q_X[ctype] = get_submission_state(model_id, Cluster_X)

# Submit to both clusters only if the job is in 'NS' or 'F' on every cluster
if all(map(f, q_X.values())):
    for ctype in ['local', 'remote']:
        Cluster_X = Clusters[ctype]
        submit2cluster(model_id, Cluster_X)
####################################
```

* _Job Cancellation_

IF 

`q_X(model_id) in ['R','D']`, for `X = local` or `remote`

and 

`q_Y(model_id) in ['PD', 'F', 'NS']`, for `Y != X`

THEN 

KILL JOB on `Cluster_Y`, if `q_Y(model_id) == 'PD'`

DO NOTHING, if `q_Y(model_id) in ['F','NS']`


ELSE


`q_X(model_id) in ['R', 'D', '']` for some `X = local` or `remote`

Submit job = 0, for both clusters 


Code:

```python
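####################################
# States in which the job is already running or done on a cluster
tSS0 = ['R','D']
# States in which the job is still waiting, failed or not submitted
tSS1 = ['PD', 'F', 'NS']

f0 = lambda x: x in tSS0
f1 = lambda x: x in tSS1

# Queueing state of the model on each cluster
q_X = {}
for ctype in ['local', 'remote']:
    Cluster_X = Clusters[ctype]
    q_X[ctype] = get_submission_state(model_id, Cluster_X)

# If one cluster already runs (or finished) the job, cancel it where it is still pending
if any(map(f0, q_X.values())) and any(map(f1, q_X.values())):
    for ctype in ['local', 'remote']:
        Cluster_X = Clusters[ctype]
        if q_X[ctype] == 'PD':
            kill_job(model_id, Cluster_X)
####################################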
```
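
Both rules can also be folded into a single pure function that only decides which command (`sbatch`, `scancel`, or `None`) applies on each cluster, leaving the actual submission and cancellation to the caller. This is a sketch of the rules as written above, with `decide_actions` as a hypothetical name:

```python
def decide_actions(states):
    """Map {cluster_type: state} to {cluster_type: 'sbatch' | 'scancel' | None}."""
    # Job Queueing: the job is neither queued, running nor done on any cluster
    if all(s in ('NS', 'F') for s in states.values()):
        return {ctype: 'sbatch' for ctype in states}
    # Job Cancellation: some cluster runs or has finished the job,
    # so pending copies elsewhere are no longer needed
    if any(s in ('R', 'D') for s in states.values()):
        return {ctype: ('scancel' if s == 'PD' else None) for ctype, s in states.items()}
    # Otherwise (e.g. pending everywhere): wait
    return {ctype: None for ctype in states}

print(decide_actions({'local': 'R', 'remote': 'PD'}))
print(decide_actions({'local': 'NS', 'remote': 'F'}))
print(decide_actions({'local': 'PD', 'remote': 'PD'}))
```

Separating the decision from its execution makes the rules easy to check against every combination of states without touching a queue.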

In [40]:
import sys

# Import modules from folder
modules_path = "/home/ba13026/mpmodeling/protocols/"
if modules_path not in sys.path:
    sys.path.append(modules_path)

def kill_job():
    print("I killed job")

def submit2cluster():
    print("I submitted jobs to clusters")

def get_submission_state():
    SS = {'local': 'R', 'remote':'PD'}
    return SS

In [51]:
from cluster_transfer import BG, BC4

SS = get_submission_state()

In [46]:
CLUSTERS = [BG, BC4]

In [48]:
# Test Submission States
tSS = ['NS','F'] 
tSS0 = ['R','D']
tSS1 = ['PD', 'F', 'NS']
# Test functions
f = lambda x: x in tSS
f0 = lambda x: x in tSS0
f1 = lambda x: x in tSS1

In [53]:
SS.values()

dict_values(['R', 'PD'])

In [60]:
any(map(f0, SS.values()))

True

**COMBINED CODE**

```python
####################################
import sys
import concurrent.futures
from cluster_transfer import BG, BC4
# `some_lib` stands for the library that is still to be written
from some_lib import get_submission_state, submit2cluster, kill_job 

CLUSTERS = [BG, BC4]

# Test Submission States
tSS = ['NS','F'] 
tSS0 = ['R','D']
tSS1 = ['PD', 'F', 'NS']
# Test functions
f = lambda x: x in tSS
f0 = lambda x: x in tSS0
f1 = lambda x: x in tSS1

JobsDB = 'path/jobs.db'
job_name = 'md_100ns'

def manage_submission(model_id):
    # Get all cluster submission states
    SS = {} 
    for Cluster in CLUSTERS:
        SS[Cluster.type] = get_submission_state(model_id, job_name, JobsDB, Cluster)
        
    # Job Queueing: Submit/Resubmit
    if all(map(f, SS.values())):
        for Cluster in CLUSTERS:
            submit2cluster(model_id, job_name, JobsDB, Cluster)

    # Job Cancellation: Force Running/Queueing failure
    if any(map(f0, SS.values())) and any(map(f1, SS.values())):
        for Cluster in CLUSTERS:
            if SS[Cluster.type] == 'PD':
                kill_job(model_id, job_name, JobsDB, Cluster)
   
def main(func, param_list, n_cpus):
    with concurrent.futures.ProcessPoolExecutor(max_workers = n_cpus) as executor:
        # Consume the results so that exceptions raised in workers surface here
        list(executor.map(func, param_list))


if __name__ == '__main__':
    # Command-line arguments are strings; max_workers must be an integer
    n_cpus = int(sys.argv[1])
    # NOTE: `model_IDS` (list of model ids) is not defined in this draft
    param_list = model_IDS
    func = manage_submission
    main(func, param_list, n_cpus)
             
####################################
```

**<span style="color:red">CONTINUE FROM HERE</span>**

Work on libraries for `get_submission_state, submit2cluster, kill_job`

In [63]:
import sys
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

modules_path = "/home/ba13026/mpmodeling/protocols/"
if modules_path not in sys.path:
    sys.path.append(modules_path)

# from setup_db_JobManager  import Tags, Jobs, Base
from cluster_transfer import BG, BC4

import importlib
import setup_db_JobManager
importlib.reload(setup_db_JobManager)

dbfile = BG.workdir+'jobs.db'
engine = create_engine('sqlite:///'+dbfile)
setup_db_JobManager.Base.metadata.bind = engine
DBSession = sessionmaker()
DBSession.bind = engine
session = DBSession()

# JobsDB = dbfile


job_name = 'md_100ns'
queue = 'local'
state = 'PD'

model_ids = [x[0] for x in session.query(setup_db_JobManager.Jobs.id).filter_by(
                state = state,
                queue = queue,
                job_name = job_name
            ).all()]

# filtered_tags = []
# for id in model_ids:
#     x = session.query(
#             setup_db_JobManager.Tags.mutant,
#             setup_db_JobManager.Tags.group,
#             setup_db_JobManager.Tags.pdb_name
#         ).filter_by(id=id).all()
#     filtered_tags.append(json.dumps(list(x[0])))

# def get_submission_state(model_id, job_name, JobsDB, Cluster):
#     pass
    

In [67]:
session.query(
    setup_db_JobManager.Jobs.state
    ).filter_by(id = 24).all()
    

[('PD')]

In [None]:
SS = {}
model_jobname_id = 24

for Cluster in CLUSTERS:
    # DRAFT: this query still has to return one (queue_id, state) pair per cluster
    queue_id, state = session.query(
        setup_db_JobManager.Jobs.states
        ).filter_by(id = model_jobname_id).all()
    
    SS[Cluster.type] = {
        'job_id': queue_id,
        'state': state
        }
    
# Intended shape of SS:
# SS = {
#     'local': [queue_id, state],
#     'remote': [queue_id, state]
# }

**<span style="color:red">ISSUE</span>**


IF MANAGER IS LAUNCHED EVERY 15 SECS, THEN THINK ABOUT WHAT TO DO WITH CONSTANTLY FAILING JOBS.
ALSO, THINK THAT SUBMIT AND CANCEL FUNCTIONS SHOULD UPDATE THE DATABASE OF JOB STATES.
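
One possible way to handle constantly failing jobs is to cap the number of submissions per model, in line with the "up to a number of trials" condition on resubmission. The sketch below keeps the job record in a plain dictionary instead of the database; `MAX_TRIALS`, `can_submit` and `register_submission` are hypothetical names, and the limit of 3 is an arbitrary choice.

```python
MAX_TRIALS = 3   # arbitrary limit, for illustration only

def can_submit(record, max_trials=MAX_TRIALS):
    """A job may be (re)submitted from 'NS' or 'F' while trials remain."""
    return record['state'] in ('NS', 'F') and record['trials'] < max_trials

def register_submission(record):
    """Update the job record after a (re)submission, as the DB update would."""
    record['trials'] += 1
    record['state'] = 'PD'
    return record

# Simulate ten manager passes over a job that fails every time
record = {'state': 'NS', 'trials': 0}
submissions = 0
for _ in range(10):
    if can_submit(record):
        register_submission(record)
        submissions += 1
    record['state'] = 'F'    # the job fails again before the next pass

print(submissions, record)
```

With the cap in place, the manager can be launched as often as needed without resubmitting a broken job indefinitely.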


Future improvement: 

Priority for CPU or GPU jobs per cluster can be defined

**How to determine queueing states**

Different states can be determined either 1) using `scontrol` alone, knowing the `job_id` beforehand (stored in DB), or 2) with any other alternative criteria as indicated below. 

If the `job_id` is unknown, then it must be extracted using `squeue`, associated with each model on each cluster, and fed into the DB.
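
When the `job_id` is known, the text printed by `scontrol show job <job_id>` contains a `JobState=` field that can be mapped onto the state codes used here. A minimal sketch: the function name is hypothetical, the sample text is a shortened illustration rather than real output, and treating `CANCELLED`, `TIMEOUT` and `NODE_FAIL` as `F` is an assumption.

```python
import re

# Slurm job states -> state codes used in this notebook
SLURM_TO_QUEUE_STATE = {
    'PENDING':   'PD',
    'RUNNING':   'R',
    'COMPLETED': 'D',
    'FAILED':    'F',
    'CANCELLED': 'F',   # assumption: treated as a failure
    'TIMEOUT':   'F',   # assumption: treated as a failure
    'NODE_FAIL': 'F',   # assumption: treated as a failure
}

def parse_scontrol_state(scontrol_output):
    """Extract JobState from `scontrol show job` text; None if absent or unmapped."""
    match = re.search(r'JobState=(\w+)', scontrol_output)
    if match is None:
        return None
    return SLURM_TO_QUEUE_STATE.get(match.group(1))

# Shortened, illustrative sample of the text to parse
sample = 'JobId=42 JobName=md_100ns\n   JobState=RUNNING Reason=None'
print(parse_scontrol_state(sample))
```

A return value of `None` signals that the state has to be determined by the alternative criteria, such as checking the output files.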

Queueing states are:

* Running : `R` 

> Can be determined using `scontrol` if `job_id` on cluster available

* Pending: `PD`

> Can be determined using `scontrol` if `job_id` on cluster available

* Done: `D` 

> Determined either using `scontrol` if `job_id` on cluster available or just checking presence of expected output files: `name.gro` and NO ERROR MESSAGE

* Failed:  `F` 

> Determined either using `scontrol` if `job_id` on cluster available or just checking presence of expected error output files:  ............. and NO ERROR MESSAGE

* Not Submitted: `NS`

> Determined either using `scontrol` if `job_id` on cluster available or just checking presence of expected output files: `name.log, .xtc, .edr`

**How to get `job_id`s per model on a cluster**

* Job submitted for the first time: use the `sbatch` output to get the id and record it in the DB
* Job already on the queue but no `job_id` recorded in the DB: the `model_id` is needed too (use model tags)
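
Both cases come down to parsing Slurm text output. On success, `sbatch` prints a line of the form `Submitted batch job <id>`, and `squeue -h -o "%i %j"` prints one `job_id job_name` pair per line. The sketch below uses hypothetical function names and assumes that the job name carries enough of the model tags to identify the model; the sample job names are invented.

```python
import re

def parse_sbatch_job_id(sbatch_output):
    """Return the job id from sbatch's 'Submitted batch job <id>' line, or None."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    return int(match.group(1)) if match else None

def parse_squeue_lines(squeue_output):
    """Parse 'job_id job_name' lines into {job_id: job_name}."""
    jobs = {}
    for line in squeue_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and ids that are not plain integers (e.g. job arrays)
        if len(parts) == 2 and parts[0].isdigit():
            jobs[int(parts[0])] = parts[1].strip()
    return jobs

print(parse_sbatch_job_id('Submitted batch job 123456'))

# Invented job names that encode the model tags
queued = parse_squeue_lines('101 mutantA_conformation1_model_002\n102 mutantA_unclassified_model_001\n')
print(queued)
```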

