Author: Lester Hedges<br>
Email:&nbsp;&nbsp; lester.hedges@bristol.ac.uk

___Jupyter Recap___:
* Press Shift+Enter to execute a cell and move to the cell below.
* Press Ctrl+Enter to execute a cell and remain in that cell.
* Run a shell command on the underlying operating system by prefixing the command with an exclamation mark, !
* Remember that the flow is in the order that you execute cells, which is not necessarily linear in the notebook. Keep track of the numbers in brackets to the left of the cell!


# Running nodes

The previous notebook showed you how to write a node to perform minimisation of a molecular system within an interactive Jupyter notebook. This notebook introduces you to some of the other ways of running BioSimSpace nodes, showing how the same script can be used in several different ways. 

## Running nodes on the command-line

The typical way of interacting with BioSimSpace is by running a workflow component, or _node_, from the command-line. A node is just a normal Python script that is run using the `python` interpreter. Let's use the molecular minimisation example from the previous notebook, which we've provided as a Python script called `minimisation.py` within the `nodes` directory. (This is just the previous notebook, downloaded as a regular Python script.)

From the command-line, we can query the node to see what it does and get information about the inputs:

In [None]:
!python nodes/minimisation.py --help

In the previous notebook, input was achieved via a graphical user interface where the user could configure options and upload files. On the command-line, inputs must be set as command-line arguments. From the information provided in the node itself, i.e. the description and the definitions of inputs and outputs, BioSimSpace has autogenerated a nicely formatted [argparse](https://docs.python.org/3/library/argparse.html) help message that describes how the node works. The message shows all of the inputs and outputs, lets us know which inputs are optional, and specifies any default values or constraints.

Note that it's possible to pass options to the node in various ways, e.g. directly on the command-line, using a [YAML](https://en.wikipedia.org/wiki/YAML) configuration file, or even using environment variables. This provides a lot of flexibility in the way in which BioSimSpace nodes can be run. For now we'll just pass arguments on the command-line.
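To make the idea of layered options concrete, here is a minimal sketch using only the standard library. It is not how BioSimSpace implements its own option handling. The `build_parser` function, the `DEMO_STEPS` environment variable, and the fallback value of 10000 are all hypothetical names and values chosen for illustration. A value given on the command-line takes precedence, otherwise the environment variable is used, and the hard-coded default applies last.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults can be overridden by environment variables.

    Illustrative only: DEMO_STEPS is a hypothetical variable name and is not
    something that BioSimSpace itself reads.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Hypothetical minimisation node.")
    # Precedence: command-line value, then DEMO_STEPS, then the fallback.
    parser.add_argument(
        "--steps",
        type=int,
        default=int(env.get("DEMO_STEPS", 10000)),
        help="Number of minimisation steps.",
    )
    parser.add_argument("--files", nargs="+", required=True, help="Input files.")
    return parser


# No --steps on the command-line, so the environment value is used.
args = build_parser(env={"DEMO_STEPS": "500"}).parse_args(["--files", "ala.crd", "ala.top"])
print(args.steps, args.files)  # 500 ['ala.crd', 'ala.top']
```

Passing `env` explicitly keeps the sketch easy to test without modifying the real process environment.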

Try running the node without any arguments and seeing what the output is:

In [None]:
!python nodes/minimisation.py

Thankfully we've provided some files for you. As before, these are found in the `input` directory.

In [None]:
!ls input/amber

(The files define a solvated alanine dipeptide system in [AMBER](http://ambermd.org) format.)

Let's now run the minimisation node using these files as input. In the interests of time, let's also reduce the number of steps to 1000. The files can be passed to the script in various ways. All of the following are allowed:

```bash
!python nodes/minimisation.py --steps=1000 --files="input/amber/ala.crd, input/amber/ala.top"
!python nodes/minimisation.py --steps=1000 --files input/amber/ala.crd input/amber/ala.top
!python nodes/minimisation.py --steps=1000 --files input/amber/*
```
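The sketch below shows one way in which a script could treat the quoted, comma-separated form and the space-separated form as equivalent. It assumes nothing about how BioSimSpace parses its arguments internally, and `normalise_files` is a hypothetical helper written for this example. Note that a wildcard such as `input/amber/*` is expanded by the shell before Python sees it, so it arrives in the space-separated form.

```python
def normalise_files(values):
    """Flatten a mix of separate and comma-separated file names into one list."""
    files = []
    for value in values:
        # Split on commas and drop surrounding whitespace and empty entries.
        for name in value.split(","):
            name = name.strip()
            if name:
                files.append(name)
    return files


# One quoted, comma-separated argument.
quoted = normalise_files(["input/amber/ala.crd, input/amber/ala.top"])
# Two separate arguments, as produced by spaces or by shell wildcard expansion.
separate = normalise_files(["input/amber/ala.crd", "input/amber/ala.top"])

print(quoted == separate)  # True
```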

In [None]:
!python nodes/minimisation.py --steps=1000 --files input/amber/*

Once the process has finished running (the asterisk to the left of the cell will disappear) we should find that the minimised molecular system has been written to the working directory.

In [None]:
!ls minimised.*

Note that the files have been written in the same format as the original molecular system, i.e. AMBER.

We have also provided some GROMACS format input files.

In [None]:
!ls input/gromacs

Let's now run the node using these files as input. This is a larger system so the minimisation will take a little longer.

In [None]:
!python nodes/minimisation.py --steps=1000 --files input/gromacs/*

There should now be two additional GROMACS format output files in the working directory.

In [None]:
!ls minimised.*

## Running nodes from within BioSimSpace

BioSimSpace also provides functionality for running nodes internally. This allows you to call a node from within a script, thereby using existing nodes as building blocks for more complicated workflows. To see what nodes are available we can use the `list` function from the `BSS.Node` package:

In [None]:
import BioSimSpace as BSS
BSS.Node.list()

To get information about a particular node, we simply pass its name to the help function:

In [None]:
BSS.Node.help("minimisation")

To execute a node we use the `run` function. Let's see how this works:

In [None]:
help(BSS.Node.run)

The function takes a dictionary of input values and returns another dictionary containing the outputs. Let's generate a valid input dictionary:

In [None]:
input = {"files" : ["input/amber/ala.crd", "input/amber/ala.top"], "steps" : 1000}

We can now run the `minimisation` node, passing the dictionary from above:

In [None]:
output = BSS.Node.run("minimisation", input)

Finally, let's print the output dictionary to see the result of running the node:

In [None]:
print(output)
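When calling nodes from a script it can help to check the inputs before the node is launched. The following is a minimal sketch of such a check. `build_input` is a hypothetical helper, not part of BioSimSpace, and it only assumes the `"files"` and `"steps"` keys used in the dictionary above.

```python
from pathlib import Path


def build_input(files, steps=1000):
    """Return a node input dictionary after checking that every file exists."""
    missing = [str(f) for f in files if not Path(f).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    if steps <= 0:
        raise ValueError("steps must be a positive integer")
    return {"files": [str(f) for f in files], "steps": int(steps)}


# Usage, following the cells above (requires BioSimSpace and the tutorial files):
# node_input = build_input(["input/amber/ala.crd", "input/amber/ala.top"], steps=1000)
# output = BSS.Node.run("minimisation", node_input)
```

Using a name such as `node_input` also avoids shadowing Python's built-in `input` function.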

That completes this section of the tutorial. We hope you have a better understanding of the aims of the BioSimSpace project and have the knowledge needed to write your own interoperable workflow nodes, which will be needed for the exercises below.


# Exercises

In the following exercises you will write several workflow nodes. Make sure to use the built-in help if you are unsure of what to do. In particular, the following might be useful:

* BioSimSpace packages can query themselves to find out what functionality is available. For example, `BSS.IO.fileFormats()` returns a list of supported file formats and `BSS.Process.packages()` returns a list of the supported molecular dynamics packages. These can be useful for setting allowed values for input requirements.
* `help(BSS.Gateway)`: Get information about the `Gateway` package and the different input `Requirement` types that are available.
* `help(BSS.Types)`: Get information about the `Types` package. These can be used to define default values for input requirements (of matching type). Remember that the `Units` package provides a convenient shortcut for declaring `Types`, e.g. `10*BSS.Units.Length.nanometer` is equivalent to `BSS.Types.Length(10, "nanometer")`.

In [None]:
10 * BSS.Units.Length.nanometer == BSS.Types.Length(10, "nanometer")

1. Write a node to convert between file formats. This should take a set of molecular files as input and convert them to a single file in another format. If you get stuck, an answer can be found [here](nodes/conversion.ipynb).

2. Write a node to generate input files for different molecular dynamics protocols and packages. As input, this node should take a set of molecular files, the name of a protocol, and the name of a supported molecular dynamics package. As output, it should return a list of the input files that are generated by BioSimSpace. If you get stuck, an answer can be found [here](nodes/input_files.ipynb).

3. Write a node to equilibrate a molecular system. Consider supporting some of the options that are available in `BSS.Protocol.Equilibration`. At a minimum, the node should take a set of molecular files as input and return a different set of files representing the equilibrated system as output. If you get stuck, an answer can be found [here](nodes/equilibration.ipynb).