## PyRosettaCluster: A Framework For Reproducible Computational Protein Design

This Jupyter Lab example generates a decoy using PyRosettaCluster, then reproduces the simulation to generate an identical copy of the decoy.

### 1. Import packages

In [10]:
import bz2
import glob
import json
import logging
logging.basicConfig(level=logging.INFO)
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.packed_pose as packed_pose
import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
import pyrosetta.distributed.tasks.score as score
import pyrosetta.distributed.viewer as viewer
import random
import tempfile

from pyrosettacluster import PyRosettaCluster, get_instance_kwargs, reproduce

### 2. Initialize a compute cluster using `dask`

1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>

![title](images/dask_labextension_1.png)

3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

![title](images/dask_labextension_2.png)

Inject client code here, then run the cell:

In [19]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:37835")
client

| Client | Cluster |
| --- | --- |
| Scheduler: tcp://127.0.0.1:37835 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 4 |
| | Memory: 16.63 GB |


### 3. Define the user-provided output paths:

In [2]:
workdir = '/shared/home/aloshbaugh/PyRosettaCluster/tutorials/tutorial_1/logs'  # passed as `scratch_dir` below
output_path = '/shared/home/aloshbaugh/PyRosettaCluster/tutorials/tutorial_1/output'

### 2. Define the user-provided protocol:

In [3]:
def protocol1(packed_pose, **kwargs):
    """
    Repack the input `PackedPose` object using a RosettaScripts protocol.
    
    Args:
        packed_pose: an input `PackedPose` object.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    # Import inside the function so the protocol does not depend on notebook globals.
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    # Restrict the packer to repacking (no design), then pack rotamers once.
    input_protocol = """
        <ROSETTASCRIPTS>
          <TASKOPERATIONS>
            <RestrictToRepacking name="only_pack"/>
          </TASKOPERATIONS>

          <MOVERS>
            <PackRotamersMover name="pack" task_operations="only_pack" />
          </MOVERS>

          <PROTOCOLS>
            <Add mover="pack"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    repack = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)
    repack.setup()  # set up the XML protocol before applying it

    return repack(packed_pose)

def my_first_protocol(packed_pose, **kwargs):
    """
    Repack the input `PackedPose` object.
    
    Args:
        packed_pose: an input `PackedPose` object.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        Three `Pose` objects.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    from pyrosetta.rosetta.protocols.minimization_packing import (
        PackRotamersMover,
    )

    # Convert to a `Pose` so the mover can modify it in place.
    pose = io.to_pose(packed_pose)
    pack_rotamers = PackRotamersMover(
        scorefxn=pyrosetta.create_score_function("ref2015.wts"),
        task=pyrosetta.standard_packer_task(pose),
        nloop=10,
    )
    pack_rotamers.apply(pose)

    # Two extra poses, returned only to show that a protocol may return several.
    dummy_pose_1 = io.to_pose(io.pose_from_sequence("W" * 6))
    dummy_pose_2 = io.to_pose(io.pose_from_sequence("F" * 6))
    return pose, dummy_pose_1, dummy_pose_2

### 2. Define the user-provided input PDB:

In [4]:
pdb_string = !curl https://files.rcsb.org/download/1qys.pdb
pdb_string = "\n".join(pdb_string)
input_packed_pose = score.ScorePoseTask()(io.pose_from_pdbstring(pdb_string,'1qys'))

INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/aloshbaugh/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.


### 3. Execute protocol

### 4. Launch the first simulation to generate a decoy to later reproduce

In [5]:
def create_tasks():
    yield {
        "options": "-ex1",
        "extra_options": "-out:level 300 -multithreading:total_threads 1",
    }

# Ordered list of protocols to run; this tutorial uses only `protocol1`.
protocols = [protocol1]

In [None]:
PyRosettaCluster(
    tasks=create_tasks,
    input_packed_pose=input_packed_pose,
    protocols=protocols,
    client=client,
    scratch_dir=workdir,
    output_path=output_path,
).distribute(protocols=protocols)
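
The `create_tasks` generator in this tutorial yields a single task dictionary. As a minimal sketch, assuming one task per set of Rosetta command-line flags is wanted, a generator can yield several dictionaries in a loop. The name `create_tasks_sweep` and the particular flag combinations are illustrative assumptions, not part of the tutorial; only the `options` and `extra_options` keys are taken from the cell above.

```python
def create_tasks_sweep(option_sets=("-ex1", "-ex1 -ex2")):
    """Yield one task dictionary per set of Rosetta flags (hypothetical helper)."""
    for options in option_sets:
        yield {
            "options": options,
            # Same extra flags as the single-task example in this tutorial.
            "extra_options": "-out:level 300 -multithreading:total_threads 1",
        }


tasks = list(create_tasks_sweep())
for task in tasks:
    print(task["options"])
```

With more than one task, more than one decoy would presumably be written, so the later assumption that `results[0]` is the only decoy would no longer hold.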

While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!

In the "Dask" tab, click the various diagnostic tools _(arrows)_ to open new tabs:

![title](images/dask_labextension_4.png)

Arrange the diagnostic tool tabs within Jupyter Lab as you see fit by clicking and dragging them:

![title](images/dask_labextension_3.png)

### 5. Visualize the results

Gather poses from disk into memory:

In [6]:
results = glob.glob(os.path.join(output_path, "decoys/*/*.pdb.bz2"))
packed_poses = []
for bz2file in results:
    with open(bz2file, "rb") as f:
        packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
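
Up to the final `io.pose_from_pdbstring` call, the gathering step uses only `glob` and `bz2` from the standard library. The sketch below isolates that file-handling part so it can be tried without PyRosetta: it writes one placeholder file into a temporary directory laid out to match the `decoys/*/*.pdb.bz2` pattern and reads it back. The helper name `read_compressed_decoys`, the subdirectory name `0000`, the file name, and the placeholder PDB text are all made up for illustration.

```python
import bz2
import glob
import os
import tempfile


def read_compressed_decoys(output_path):
    """Return {path: decompressed PDB text} for every decoy under output_path."""
    pattern = os.path.join(output_path, "decoys", "*", "*.pdb.bz2")
    pdb_strings = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            pdb_strings[path] = bz2.decompress(f.read()).decode()
    return pdb_strings


# Build a throwaway directory tree containing one placeholder decoy.
demo_output_path = tempfile.mkdtemp()
decoy_dir = os.path.join(demo_output_path, "decoys", "0000")
os.makedirs(decoy_dir)
placeholder_pdb = "REMARK placeholder decoy\nEND\n"
with open(os.path.join(decoy_dir, "demo.pdb.bz2"), "wb") as f:
    f.write(bz2.compress(placeholder_pdb.encode()))

pdb_strings = read_compressed_decoys(demo_output_path)
print(len(pdb_strings))
```

Sorting the glob results makes the order of the returned paths stable between runs, which matters when a decoy is later selected by index.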

View the poses in memory. Click and drag to rotate; zoom in and out with the mouse scroll wheel.

In [7]:
view = viewer.init(packed_poses, window_size=(800, 600))
view.add(viewer.setStyle())
view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
view.add(viewer.setHydrogenBonds())
view.add(viewer.setHydrogens(polar_only=True))
view.add(viewer.setDisulfides(radius=0.25))
view()

### Reproduce the decoy.

The `PyRosettaCluster` instance keyword arguments to reproduce this decoy are recovered using `get_instance_kwargs()`. 

The protocol produced only one decoy, which is accessed by index zero of results: `results[0]`.

In [12]:
instance_kwargs = get_instance_kwargs(input_file=results[0])
instance_kwargs

{'ami_id': '',
 'compressed': True,
 'cores': 1,
 'dashboard_address': ':8787',
 'decoy_dir_name': 'decoys',
 'decoy_ids': [0],
 'dry_run': False,
 'ignore_errors': False,
 'instance_id': '',
 'logging_level': 'INFO',
 'logs_dir_name': 'logs',
 'max_workers': 1000,
 'memory': '4g',
 'min_workers': 1,
 'nstruct': 1,
 'output_path': '/shared/home/aloshbaugh/PyRosettaCluster/tutorials/tutorial_1/output',
 'processes': 1,
 'project_name': '2020.05.08.18.04.10.293556',
 'protocols': ['protocol1'],
 'save_all': False,
 'scheduler': None,
 'scorefile_name': 'scores.json',
 'scratch_dir': '/shared/home/aloshbaugh/PyRosettaCluster/tutorials/tutorial_1/logs',
 'seeds': ['998948069'],
 'sha1': '',
 'simulation_name': '2020.05.08.18.04.10.293556',
 'tasks': {'options': '-ex1',
  'extra_options': '-out:level 300 -multithreading:total_threads 1'},
 'timeout': 0.5}

### 6. Launch the second simulation to reproduce a decoy

The `input_packed_pose`, `client`, and `protocols` variables need to be specified along with the `PyRosettaCluster` instance keyword arguments needed to reproduce the desired trajectory:

In [20]:
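# Alternative manual route (commented out): rebuild the instance from the
# recovered keyword arguments and call `distribute` again.
# PyRosettaCluster(
#     input_packed_pose=input_packed_pose,
#     client=client,
#     **get_instance_kwargs(input_file=results[0]),
# ).distribute(protocols=protocols)

# Reproduce the decoy from the file written by the first simulation.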
reproduce(
    input_file=results[0],
    protocols=protocols,
    instance_kwargs={"client": client},
)

#     Given an input file that was written by PyRosettaCluster (or a scorefile
#     and a decoy name that was written by PyRosettaCluster), an iterable
#     of user-defined PyRosetta protocols, and any additional PyRosettaCluster
#     instance kwargs, reproduce a given decoy using a new instance of PyRosettaCluster.
#     Args:
#         input_file: A `str` object specifying the path to the '.pdb' or '.pdb.bz2'
#             file from which to extract PyRosettaCluster instance kwargs. If input_file
#             is provided, then ignore the scorefile and decoy_name argument parameters.
#             Default: None
#         scorefile: A `str` object specifying the path to the JSON-formatted scorefile
#             from which to extract PyRosettaCluster instance kwargs. If scorefile
#             is provided, decoy_name must also be provided.
#             Default: None
#         decoy_name: A `str` object specifying the decoy name for which to extract
#             PyRosettaCluster instance kwargs. If decoy_name is provided, scorefile
#             must also be provided.
#             Default: None
#         protocols: An iterable object of function or generator objects specifying
#             an ordered sequence of user-defined PyRosetta protocols to execute for
#             the reproduction.
#             Default: None
#         instance_kwargs: A `dict` object of valid PyRosettaCluster attributes which
#             will override any PyRosettaCluster attributes that were used to generate
#             the original decoy.
#             Default: None
#     Returns:
#         None

ProcessError: Worker thread killed!

### 7. Visualize the reproduced decoy

In [45]:
for bz2file in glob.glob(os.path.join(output_path, "decoys/*/*.pdb.bz2")):
    if bz2file not in results:
        reproduced_result = bz2file
        break

with open(reproduced_result, "rb") as f:
    reproduced_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())
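
The loop above takes the first decoy path that is not already in `results`. If the reproduction wrote nothing, `reproduced_result` is never assigned and the `open` call raises a `NameError`. A sketch of a stricter lookup follows; `find_new_decoys` and `single_new_decoy` are hypothetical helpers, not part of PyRosettaCluster, and the paths are made up.

```python
def find_new_decoys(paths_before, paths_after):
    """Return the sorted paths present in paths_after but not in paths_before."""
    return sorted(set(paths_after) - set(paths_before))


def single_new_decoy(paths_before, paths_after):
    """Return the one new path, or raise if there is not exactly one."""
    new_paths = find_new_decoys(paths_before, paths_after)
    if len(new_paths) != 1:
        raise RuntimeError(f"Expected exactly 1 new decoy, found {len(new_paths)}")
    return new_paths[0]


# Made-up paths standing in for the glob results before and after reproduction.
before = ["output/decoys/0000/a.pdb.bz2"]
after = ["output/decoys/0000/a.pdb.bz2", "output/decoys/0000/b.pdb.bz2"]
print(single_new_decoy(before, after))
```

Raising an explicit error when no new file appears makes a failed reproduction visible at this step rather than in a later cell.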

In [36]:
view = viewer.init(reproduced_packed_pose, window_size=(800, 600))
view.add(viewer.setStyle())
view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
view.add(viewer.setHydrogenBonds())
view.add(viewer.setHydrogens(polar_only=True))
view.add(viewer.setDisulfides(radius=0.25))
view()

### 8. Optionally, perform sanity checks to confirm that the reproduced pose is identical to the original:

PyRosetta trajectories are _deterministic_ given the input random number generator seed(s)!
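
The same principle can be seen with Python's own pseudorandom number generator, used here purely as an analogy. The sketch does not call PyRosetta, and `toy_trajectory` is a made-up stand-in for a stochastic protocol. Two runs seeded identically produce identical output; a run with a different seed does not.

```python
import random


def toy_trajectory(seed, n_steps=5):
    """Return n_steps pseudorandom 'moves' drawn from a generator seeded with seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_steps)]


# The first seed is the one listed under 'seeds' in the instance kwargs above.
original = toy_trajectory(seed=998948069)
reproduced = toy_trajectory(seed=998948069)
different = toy_trajectory(seed=1)

print(original == reproduced)
print(original == different)
```
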

In [37]:
original_pose = packed_poses[0].pose  # the only decoy gathered from `results`
reproduced_pose = reproduced_packed_pose.pose

#### Assert that the sequences are identical:

In [38]:
assert original_pose.sequence() == reproduced_pose.sequence()

#### Assert that the `total_score`s are identical:

In [40]:
scorefxn = pyrosetta.create_score_function("ref2015.wts")
assert scorefxn(original_pose) == scorefxn(reproduced_pose)

#### Assert that the C$_{\alpha}$–C$_{\alpha}$ root-mean-square deviation (RMSD) is `0.0` Å:

Note: There is no need to first superimpose the `original_pose` and `reproduced_pose`, because both were generated from the same `input_packed_pose`.

In [44]:
assert pyrosetta.rosetta.core.scoring.CA_rmsd(original_pose, reproduced_pose) == 0.0
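
For reference, an RMSD computed in place over $N$ paired atoms is $\sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$, which is zero only when every pair of coordinates coincides. The pure-Python sketch below evaluates that formula on made-up coordinates. `rmsd_no_superposition` is a hypothetical helper written for illustration; it is not a description of how `CA_rmsd` is implemented in Rosetta.

```python
import math


def rmsd_no_superposition(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, compared in place."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates: the second set is the first shifted by 1 along x.
ca_original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_shifted = [(x + 1.0, y, z) for x, y, z in ca_original]

print(rmsd_no_superposition(ca_original, ca_original))
print(rmsd_no_superposition(ca_original, ca_shifted))
```

An exact comparison with `0.0` is reasonable only because the two poses are expected to be bit-for-bit identical; for poses that are merely similar, a tolerance such as `math.isclose` would be the safer check.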

#### The `original_pose` and `reproduced_pose` are identical because the `seeds`, `decoy_ids`, and `protocols` attributes were identical in both `PyRosettaCluster` simulations:

In [52]:
for attribute in ["seeds", "decoy_ids", "protocols"]:
    assert get_instance_kwargs(input_file=reproduced_result)[attribute] == get_instance_kwargs(input_file=results[0])[attribute]