### PyRosettaCluster 
## Tutorial 1A. Simple protocol

Tutorial 1A is a Jupyter Lab notebook that generates a decoy using PyRosettaCluster. It is the simplest use case: one protocol takes one input `.pdb` file and returns one output `.pdb` file.

The information needed to reproduce the simulation is included in the output `.pdb` file. Tutorial 1B reproduces Tutorial 1A.

### 1. Import packages

In [1]:
import bz2
import glob
import json
import logging
logging.basicConfig(level=logging.INFO)
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.packed_pose as packed_pose
import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
import pyrosetta.distributed.tasks.score as score
import pyrosetta.distributed.viewer as viewer
import random
import tempfile

from pyrosettacluster import PyRosettaCluster, get_instance_kwargs, reproduce

### 2. Initialize a compute cluster using `dask`

1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>

![title](images/dask_labextension_1.png)

3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

![title](images/dask_labextension_2.png)

Inject client code here, then run the cell:

In [2]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:45657")
client

Client
  Scheduler: tcp://127.0.0.1:45657
  Dashboard: http://127.0.0.1:8787/status
Cluster
  Workers: 4
  Cores: 4
  Memory: 16.63 GB


The `client` argument is optional for `distribute()` and `reproduce()`. Supplying a client lets you monitor the jobs from within this Jupyter Lab notebook.

If you don't supply a client, `PyRosettaCluster` instantiates a local cluster, or an SGE cluster if you pass the `scheduler` option, e.g.:

    PyRosettaCluster(
        ...
        scheduler=None,  # Run locally
        ...
    )
    
    PyRosettaCluster(
        ...
        scheduler=your_SGE_scheduler,  # Run on SGE cluster
        ...
    )

### 3. Define the user-provided paths:

Provide the location of your PyRosettaCluster git repository:

In [3]:
path_to_PyRosettaCluster_git_repo = '/shared/home/aloshbaugh/PyRosettaCluster'

in_dir = os.path.join( path_to_PyRosettaCluster_git_repo, 'tutorials/input')
work_dir = os.path.join( 
    path_to_PyRosettaCluster_git_repo, 'tutorials/1A_Simple_protocol'
)
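
Before launching anything, it can help to confirm that these directories exist. The following is a minimal sketch using only the standard library. The helper name `resolve_tutorial_dirs` is hypothetical (it is not part of PyRosettaCluster), and creating the working directory when it is missing is an assumption about what you want.

```python
import os


def resolve_tutorial_dirs(repo_path):
    """Return (in_dir, work_dir) for a PyRosettaCluster checkout.

    Hypothetical helper: raises if the input directory is missing and
    creates the working directory if it does not exist yet.
    """
    in_dir = os.path.join(repo_path, "tutorials", "input")
    work_dir = os.path.join(repo_path, "tutorials", "1A_Simple_protocol")
    if not os.path.isdir(in_dir):
        raise FileNotFoundError(f"Input directory not found: {in_dir}")
    os.makedirs(work_dir, exist_ok=True)
    return in_dir, work_dir
```

It could be called as `in_dir, work_dir = resolve_tutorial_dirs(path_to_PyRosettaCluster_git_repo)`, which fails early with a clear message if the repository path is wrong.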

### 4. Define the user-provided protocol:

In [4]:
def protocol1(packed_pose_in=None, **kwargs):
    """
    Repack the input `PackedPose` object.
    
    Args:
        packed_pose_in: A `PackedPose` object to be repacked. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
    
    input_protocol = """
        <ROSETTASCRIPTS>
          <TASKOPERATIONS>
            <RestrictToRepacking name="only_pack"/>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover name="pack" task_operations="only_pack" />
          </MOVERS>
          <PROTOCOLS>
            <Add mover="pack"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    pack_rotamers = rosetta_scripts.SingleoutputRosettaScriptsTask(
        input_protocol
    )
    
    packed_pose_in = io.pose_from_file(kwargs['s'])
    packed_pose_out = pack_rotamers(packed_pose_in.pose.clone())
    
    return packed_pose_out
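
A typo in the RosettaScripts XML only surfaces once a worker runs the protocol, so a quick local check can save a round trip. The sketch below uses the standard library's XML parser to confirm that the script is well-formed and that every referenced name is defined. The helper `undefined_references` is hypothetical, and it does not validate against Rosetta's own XML schema.

```python
import xml.etree.ElementTree as ET

# Same RosettaScripts XML as in `protocol1`.
INPUT_PROTOCOL = """
<ROSETTASCRIPTS>
  <TASKOPERATIONS>
    <RestrictToRepacking name="only_pack"/>
  </TASKOPERATIONS>
  <MOVERS>
    <PackRotamersMover name="pack" task_operations="only_pack" />
  </MOVERS>
  <PROTOCOLS>
    <Add mover="pack"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>
"""


def undefined_references(xml_text):
    """Return names used in the script that are never defined.

    Hypothetical helper: checks well-formedness and name references
    only, not Rosetta's XML schema.
    """
    root = ET.fromstring(xml_text.strip())  # raises ParseError if malformed
    task_ops = {e.get("name") for e in root.findall("TASKOPERATIONS/*")}
    movers = {e.get("name") for e in root.findall("MOVERS/*")}
    missing = []
    for mover in root.findall("MOVERS/*"):
        # `task_operations` holds a comma-separated list of names.
        for name in filter(None, mover.get("task_operations", "").split(",")):
            if name.strip() not in task_ops:
                missing.append(name.strip())
    for step in root.findall("PROTOCOLS/Add"):
        mover_name = step.get("mover")
        if mover_name is not None and mover_name not in movers:
            missing.append(mover_name)
    return missing


print(undefined_references(INPUT_PROTOCOL))  # prints []
```

An empty list means every `task_operations` entry and every `<Add mover=.../>` step points at a name defined earlier in the script.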

### 5. Define the user-provided kwargs:
`options` and `extra_options` are concatenated before initialization. Specifying `extra_options` overrides the default `"-out:levels all:warning"`, and specifying `options` overrides the default `"-ex1 -ex2aro"`.

In [5]:
def kwargs_for_tasks():
    yield {
        "options": "-ex1",
        "extra_options":"-out:level 300 -multithreading:total_threads 1", 
        "s":os.path.join( in_dir, '1QYS.pdb' ),
        }
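
The override behavior described above can be modeled in a few lines of plain Python. This is an illustration of the stated semantics only, not the library's implementation. The function name `effective_flags` is hypothetical, and the joining order (`options` first, then `extra_options`) is an assumption.

```python
# Defaults quoted from the text above.
DEFAULT_OPTIONS = "-ex1 -ex2aro"
DEFAULT_EXTRA_OPTIONS = "-out:levels all:warning"


def effective_flags(task):
    """Model how a task's flags combine into one initialization string.

    Illustration only: a user-supplied value replaces the corresponding
    default, then the two strings are joined. The joining order
    (`options` first) is an assumption.
    """
    options = task.get("options", DEFAULT_OPTIONS)
    extra_options = task.get("extra_options", DEFAULT_EXTRA_OPTIONS)
    return f"{options} {extra_options}".strip()


task = {
    "options": "-ex1",
    "extra_options": "-out:level 300 -multithreading:total_threads 1",
}
print(effective_flags(task))
# prints: -ex1 -out:level 300 -multithreading:total_threads 1
print(effective_flags({}))
# prints: -ex1 -ex2aro -out:levels all:warning
```

Note that supplying `"options": "-ex1"` drops `-ex2aro` entirely, because the supplied string replaces the default rather than extending it.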

### If you must manipulate your pose outside `PyRosettaCluster`...
#### Avoid using `create_tasks()` with `Pose` objects.
You might notice that the above code passes the protein structure information to `PyRosettaCluster` as a `str` giving the location of a file. The `Pose` object is instantiated from that string within `PyRosettaCluster`, using the random seed that `PyRosettaCluster` saves.

You may be tempted to instantiate your pose before `PyRosettaCluster` and pass a `Pose` object in `create_tasks()` (the task generator, named `kwargs_for_tasks()` in this tutorial). However, in that case Rosetta is initialized with a random seed outside `PyRosettaCluster`, and that seed won't be saved by `PyRosettaCluster`. As a consequence, any action taken on the pose (e.g. filling in missing heavy atoms) will not be reproducible by `reproduce()`.

If you must instantiate your pose before `PyRosettaCluster`, initialize PyRosetta with a constant seed:

    import pyrosetta
    pyrosetta.init( "-run:constant_seed 1" )
    
and instantiate `PyRosettaCluster` with the additional argument `input_packed_pose`, e.g.:

    PyRosettaCluster(
        ...
        input_packed_pose=input_packed_pose,
        ...
    )

to ensure reproducibility. The `constant_seed` flag defaults to seed `1111111` ([documentation](https://www.rosettacommons.org/docs/latest/rosetta_basics/options/run-options)).

For an initialization example, see Tutorial 4.

In summary, best practice is to give `create_tasks` the information that the distributed protocol needs to create a pose within `PyRosettaCluster`. In edge cases, you may supply a pose and a constant seed from outside `PyRosettaCluster`.
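
The reasoning about seeds can be illustrated with a loose analogy in plain Python. This sketch uses the standard `random` module as a stand-in for Rosetta's random number generator. The function `stochastic_step` is hypothetical and does not touch PyRosetta at all.

```python
import random


def stochastic_step(seed):
    """Stand-in for any randomized operation applied to a pose."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


# Inside the cluster: a seed is drawn, used, and recorded with the result.
recorded_seed = random.SystemRandom().randrange(2**31)
original = stochastic_step(recorded_seed)

# Reproduction replays the recorded seed and gets identical numbers.
assert stochastic_step(recorded_seed) == original

# Outside the cluster nothing records the seed, so fixing it by hand
# (the role of a constant seed) is the only way to make a rerun match.
assert stochastic_step(1111111) == stochastic_step(1111111)
```

Without either a recorded seed or a constant one, a rerun would draw a new seed and the stochastic step would generally produce a different result.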

### 6. Launch the original simulation using `distribute()`

The protocol produces a decoy, which we will reproduce at a later step.

Running `distribute()` logs the line `INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-mute all -multithreading:total_threads 1', 'extra_options': '-run:constant_seed 1 -run:jran 1111111', 'set_logging_handler': None, 'notebook': None, 'silent': True}`, which may cause confusion about whether the seed is constant or random. This line records initialization with the default constant seed on the master node, which controls the worker nodes. The worker nodes run the actual protocols; each worker node initializes Rosetta with a random seed, and that is the seed saved by PyRosettaCluster. Initializing the master node with a constant seed is simply good practice.

In [6]:
protocols = [protocol1]

PyRosettaCluster(
    tasks=kwargs_for_tasks,
    protocols=protocols,
    client=client,
    scratch_dir=work_dir,
    output_path=work_dir,
).distribute(protocols=protocols)

INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-mute all -multithreading:total_threads 1', 'extra_options': '-run:constant_seed 1 -run:jran 1111111', 'set_logging_handler': None, 'notebook': None, 'silent': True}
INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/aloshbaugh/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.


While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!

In the "Dask" tab, click the various diagnostic tools _(arrows)_ to open new tabs:

![title](images/dask_labextension_4.png)

Arrange the diagnostic tool tabs within Jupyter Lab however you see fit by clicking and dragging them:

![title](images/dask_labextension_3.png)

### 7. Visualize the resultant decoy

Gather the poses from disk into memory:

In [7]:
results = glob.glob(os.path.join(work_dir, "decoys/*/*.pdb.bz2"))
packed_poses = []
for bz2file in results:
    with open(bz2file, "rb") as f:
        packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
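
Because each decoy is a bz2-compressed PDB text file, it can also be inspected without PyRosetta. The sketch below counts `ATOM` records as a quick check that a decoy was written completely. The helper name `count_atom_records` is hypothetical, and it relies only on the standard PDB convention that atom lines begin with `ATOM`.

```python
import bz2


def count_atom_records(bz2file):
    """Count ATOM records in a bz2-compressed PDB file."""
    # "rt" decompresses on the fly and yields text lines.
    with bz2.open(bz2file, "rt") as f:
        return sum(1 for line in f if line.startswith("ATOM"))
```

For example, `for bz2file in results: print(bz2file, count_atom_records(bz2file))` lists each decoy alongside its atom count.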

View the poses in memory. Click and drag to rotate; zoom in and out with the mouse scroll wheel.

In [8]:
view = viewer.init(packed_poses, window_size=(800, 600))
view.add(viewer.setStyle())
view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
view.add(viewer.setHydrogenBonds())
view.add(viewer.setHydrogens(polar_only=True))
view.add(viewer.setDisulfides(radius=0.25))
view()

### Congrats! 
You have successfully performed a Rosetta simulation using `PyRosettaCluster`! The next tutorial will reproduce this exact simulation.