### PyRosettaCluster 
## Tutorial 3. Multiple decoys

Tutorial 3 demonstrates various ways to efficiently run protocols multiple times. 

Parallelization can be accomplished by passing multiple tasks to PyRosettaCluster, by setting PyRosettaCluster's `nstruct` argument, or both. 

In addition, protocols can `yield` or `return` multiple `Pose` objects.

### 1. Import packages

In [1]:
import bz2
import glob
import json
import logging
logging.basicConfig(level=logging.INFO)
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.packed_pose as packed_pose
import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
import pyrosetta.distributed.tasks.score as score
import pyrosetta.distributed.viewer as viewer
import random
import tempfile

from pyrosettacluster import PyRosettaCluster, get_instance_kwargs, reproduce

### 2. Initialize a compute cluster using `dask`

1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>

![title](images/dask_labextension_1.png)

3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

![title](images/dask_labextension_2.png)

Inject client code here, then run the cell:

In [2]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:45657")
client

Client: Scheduler: tcp://127.0.0.1:45657, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 4, Cores: 4, Memory: 16.63 GB


### 3. Define the user-provided paths:

In [3]:
my_PyRosettaCluster_git_repo = '/shared/home/aloshbaugh/PyRosettaCluster'

in_dir = os.path.join(my_PyRosettaCluster_git_repo, 'tutorials/input')
work_dir = os.path.join( 
    my_PyRosettaCluster_git_repo, 
    'tutorials/3_Multiple_decoys' )

### 4. A protocol that returns multiple poses

PyRosettaCluster automatically passes poses through protocols supplied by the user. If a protocol produces `n` poses, the subsequent protocol runs `n` times, once for each pose. `Pose` objects returned by the final protocol are written to disk.

Multiple poses can be yielded one at a time, returned as a list, or returned as a comma-separated tuple:

Yield:

    for _ in range(n_results):
        yield backrub(ppose.pose.clone())

Return list:

    return list_of_poses

Return comma-separated:

    return pose1, pose2, pose3
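
The fan-out behavior and the three return styles above can be sketched in plain Python, with strings standing in for `Pose` objects. The helpers `as_pose_list` and `run_chain` are hypothetical names introduced for illustration; this is not PyRosettaCluster's implementation, only a minimal model of how each result of one protocol becomes a separate input to the next.

```python
import inspect


def as_pose_list(result):
    """Normalize a protocol's output into a flat list (hypothetical helper)."""
    if result is None:
        return []
    # Generators, lists, and tuples all count as "multiple results".
    if inspect.isgenerator(result) or isinstance(result, (list, tuple)):
        return list(result)
    return [result]


def yields_three(pose):
    # Stand-in for a protocol that yields poses one at a time.
    for i in range(3):
        yield f"{pose}.y{i}"


def returns_list(pose):
    # Stand-in for a protocol that returns a list of poses.
    return [f"{pose}.l{i}" for i in range(2)]


def returns_tuple(pose):
    # Stand-in for a protocol that returns comma-separated poses (a tuple).
    return f"{pose}.a", f"{pose}.b"


def run_chain(protocols, start):
    """Run each protocol once per result of the previous protocol."""
    current = [start]
    for protocol in protocols:
        current = [out for item in current for out in as_pose_list(protocol(item))]
    return current


final = run_chain([yields_three, returns_list, returns_tuple], "pose")
print(len(final))  # 3 x 2 x 2 = 12
```

The number of final results is the product of the number of results each protocol produces, which is the same arithmetic used later in this tutorial to count decoys.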


In [4]:
def protocol1(packed_pose_in=None, **kwargs):
    """
    Performs backrub on a pose, which is either (a) passed in as `packed_pose_in`
    or (b) loaded from the file path stored under the 's' key of kwargs.
    
    Args:
        packed_pose_in: A `PackedPose` object. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        Multiple `PackedPose` objects.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
    
    input_protocol = """
        <ROSETTASCRIPTS>
          <MOVERS>
            <Backrub 
              name="backrub" 
              pivot_residues="22A,23A,24A,25A,26A,27A" 
            />
          </MOVERS>
          <PROTOCOLS>
            <Add mover="backrub"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    backrub = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)
    
    if packed_pose_in is None:
        packed_pose_in = io.pose_from_file(kwargs['s'])
    
    n_results = 3
    for _ in range(n_results):
        yield backrub(packed_pose_in.pose.clone())


def protocol2(packed_pose_in, **kwargs):
    """
    Performs sequence design (Thr24-->ALLAAxc) on an input pose (Top7, pdb:1qys).
    
    Args:
        packed_pose_in: A `PackedPose` object to be designed.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object that has been designed.
    """
    import pyrosetta
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    input_protocol = """
        <ROSETTASCRIPTS>
          <RESIDUE_SELECTORS>
            <Index name="T24" resnums="24A" />
            <Not name="not24" selector="T24" />
          </RESIDUE_SELECTORS>
          <TASKOPERATIONS>
            <ResfileCommandOperation 
              name="T24_ALLAA" command="ALLAAxc" residue_selector="T24"/>
            <OperateOnResidueSubset name="restrict_others" selector="not24">
              <PreventRepackingRLT/>
            </OperateOnResidueSubset>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover 
              name="design_mover" task_operations="T24_ALLAA,restrict_others" 
            />
          </MOVERS>
          <PROTOCOLS>
            <Add mover="design_mover"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    
    design_protocol = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)
    packed_pose_out = design_protocol(packed_pose_in.pose.clone())
    
    return packed_pose_out

### 5. Define the user-provided kwargs:

Returning multiple task dictionaries from `create_tasks` allows the user to run the first protocol multiple times on different inputs. While this example simply passes the same PDB file twice, the same approach can be used to cycle through different input PDB files.

In [5]:
dictionary_of_options = {
    "-out:level":"300",
    "-multithreading:total_threads":"1",
}

def create_tasks():
    return [ 
        {
            "options": "-ex1",
            "extra_options":dictionary_of_options,
            "s":os.path.join(in_dir, '1QYS.pdb'),
        }, 
        {
            "options": "-ex1",
            "extra_options":dictionary_of_options,
            "s":os.path.join(in_dir, '1QYS.pdb'),
        }, 
    ]
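
As a sketch of cycling through different inputs, the function below builds one task dictionary per PDB file in a directory. The name `create_tasks_from_dir` and the demo file names are hypothetical; the dictionary keys mirror the ones used in `create_tasks` above. It assumes PyRosettaCluster accepts the resulting list as `tasks` (or a zero-argument callable returning it, as in this tutorial).

```python
import glob
import os
import tempfile


def create_tasks_from_dir(pdb_dir, extra_options=None):
    """Build one task dictionary per PDB file in `pdb_dir` (hypothetical helper)."""
    extra_options = extra_options or {
        "-out:level": "300",
        "-multithreading:total_threads": "1",
    }
    tasks = []
    # Sort so that the task order is stable between runs.
    for pdb_path in sorted(glob.glob(os.path.join(pdb_dir, "*.pdb"))):
        tasks.append({
            "options": "-ex1",
            "extra_options": dict(extra_options),  # copy, so tasks do not share state
            "s": pdb_path,
        })
    return tasks


# Demo on a throwaway directory containing two empty ".pdb" files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("b.pdb", "a.pdb", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_tasks = create_tasks_from_dir(demo_dir)

print([os.path.basename(task["s"]) for task in demo_tasks])
```

Each task still reaches the first protocol through `kwargs['s']`, so `protocol1` would not need to change.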

### 6. Launch the original simulation using `distribute()`

The protocols produce multiple decoys, which are written to disk under `output_path`.

In this example we use the `PyRosettaCluster` `nstruct` argument. `nstruct` is an `int` object specifying the number of repeats of the first user-provided PyRosetta protocol.

In [11]:
protocols = [protocol1, protocol2, protocol1]

PyRosettaCluster(
    tasks=create_tasks,
    client=client,
    scratch_dir=work_dir,
    output_path=work_dir,
    nstruct=2
).distribute(protocols=protocols)

While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!

In the "Dask" tab, click the various diagnostic tools _(arrows)_ to open new tabs:

![title](images/dask_labextension_4.png)

Arrange the diagnostic tool tabs within Jupyter Lab as you see fit by clicking and dragging them:

![title](images/dask_labextension_3.png)

### 7. Visualize the resultant decoys

Gather the poses from disk into memory:

In [12]:
results = glob.glob(os.path.join(work_dir, "decoys/*/*.pdb.bz2"))
packed_poses = []
for bz2file in results:
    with open(bz2file, "rb") as f:
        packed_poses.append(
            io.pose_from_pdbstring(bz2.decompress(f.read()).decode())
        )
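
When only the PDB text is needed, the same files can be read with `bz2.open` in text mode, which decompresses and decodes in one step. The sketch below is a PyRosetta-free variant of the cell above; `read_pdb_strings`, the `0000` subdirectory, and the toy file contents are hypothetical and exist only to make the example runnable on its own.

```python
import bz2
import glob
import os
import tempfile


def read_pdb_strings(pattern):
    """Return {path: decompressed PDB text} for every file matching `pattern`."""
    pdb_strings = {}
    for path in sorted(glob.glob(pattern)):
        # Mode "rt" decompresses and decodes to str in one step.
        with bz2.open(path, "rt") as handle:
            pdb_strings[path] = handle.read()
    return pdb_strings


# Demo on a throwaway file laid out like decoys/<subdir>/<name>.pdb.bz2.
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "decoys", "0000")
    os.makedirs(sub)
    with bz2.open(os.path.join(sub, "toy.pdb.bz2"), "wt") as handle:
        handle.write("REMARK toy decoy\nEND\n")
    loaded = read_pdb_strings(os.path.join(tmp, "decoys", "*", "*.pdb.bz2"))

print(len(loaded))
```

Each returned string could then be passed to `io.pose_from_pdbstring`, as in the cell above.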

View the poses in memory. 

Your designed Top7 (1qys) is shown in rainbow ribbon, with side chains in white sticks. The default view shows position 24 at top middle with blue ribbon. Backbone flexibility was modeled at positions 22-27, and position 24 was allowed to design to any amino acid except cysteine.

There are 36 result poses: 2 (kwargs) x 2 (nstruct) x 3 (protocol1) x 1 (protocol2) x 3 (protocol1)
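
The count above is a simple product of the fan-out at each stage, which can be checked with a few lines of Python. The dictionary labels are only descriptive; the numbers are the ones given in this tutorial.

```python
import math

# Fan-out contributed by each stage of the run described above.
factors = {
    "tasks (kwargs dictionaries)": 2,
    "nstruct": 2,
    "protocol1 (first pass)": 3,
    "protocol2": 1,
    "protocol1 (second pass)": 3,
}

expected_decoys = math.prod(factors.values())
print(expected_decoys)  # 36
```

In the notebook, comparing this number against `len(packed_poses)` is a quick way to check that no decoys are missing.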

Click and drag to rotate; scroll with the mouse to zoom in and out.

In [14]:
view = viewer.init(packed_poses, window_size=(800, 600))
view.add(viewer.setStyle())
view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
view.add(viewer.setHydrogenBonds())
view.add(viewer.setHydrogens(polar_only=True))
view.add(viewer.setDisulfides(radius=0.25))
view()


### Congrats! 
You have successfully run a multi-protocol Rosetta trajectory using `PyRosettaCluster`! This ends Tutorial 3.