Author: Lester Hedges<br>
Email:&nbsp;&nbsp; lester.hedges@bristol.ac.uk

# Introduction

Welcome to the [CCP-BioSim](http://www.ccpbiosim.ac.uk) workshop on [BioSimSpace](https://biosimspace.org), the EPSRC flagship software project. In this workshop you'll learn how to use BioSimSpace to write robust and portable molecular workflow components, and how to interact directly with running molecular dynamics processes inside a Jupyter notebook.

As a computational chemist you are likely overwhelmed by the number of different software packages that are available to you. Having choice is a good thing, but too much can become a burden. I'm sure you have all, at times, come across at least one of the following conundrums:

* How do I know what is the best tool for the job?
* How much time should I invest in learning this new package?
* How can I possibly become an expert in everything?
* Why won't the incredible script I want to share with a collaborator work on their computer?
* Will my workflow break if I upgrade to the latest version of my favourite package?
* Will I get the same result if I re-run my analysis from last month? 

Solving these problems is the core goal of BioSimSpace. The wealth of fantastic software in our community is a real asset but _interoperability_ is currently a problem. Since there is no point reinventing the wheel, BioSimSpace is not an attempt to produce yet another molecular simulation package that reproduces all of the functionality from existing programs. This would result in just another tool for you to learn, along with yet another set of standards and formats. Instead, BioSimSpace is essentially just a set of _shims_, or bits of _glue_, that connect together existing software packages, allowing you to interact with them using a consistent Python interface.

With BioSimSpace you will be able to:

* Write generic workflow components _once_ in a package-agnostic language.
* Run the same script from the command-line, Jupyter, or within your workflow engine of choice.
* Know that you are running your job using the most suitable package that is available on your computer.
* Continue using your favourite package X but be able to share scripts with your collaborator who prefers package Y.
* Publish workflow components online and search our database for others that may be of use to you.
* Be able to take advantage of new software packages and hardware resources as and when they become available.

## Running a node from the command-line

The typical way of interacting with BioSimSpace is by running a workflow component, or _node_, from the command-line. A node is just a regular Python script that is run using the `python` interpreter. It can be as complex or as simple as you like, but typically it should execute a single task that has a clear input and output. An example could be the energy minimisation of a molecular configuration. We've provided a node for you in the `nodes` directory called `minimisation.py`.

From the command-line, we can query the node to see what it does and get information about the inputs:

In [None]:
!python nodes/minimisation.py --help

From the output we can see that there is an optional `steps` argument that sets the number of minimisation steps. If this argument is omitted then a default of 10000 steps is used. There is also one required argument, `files`, that specifies the set of input files that define the molecular configuration. The square brackets and ellipsis indicate that this argument can take multiple values, i.e. one or more files. Since the argument is required there is no default value. Note that the documentation doesn't specify the format of the input files.

The help message also describes the output of the node. Here there is a single output, `minimised`, which is a set of files representing the minimised molecular system.
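
For intuition, an interface like the one described above could be declared with Python's standard `argparse` module. This is only a sketch under that assumption, not the actual implementation of `minimisation.py`: the function name `build_parser` and the help strings are invented for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mimics the interface described in the help message.

    Hypothetical stand-in: the real node may declare its inputs differently.
    """
    parser = argparse.ArgumentParser(
        description="Illustrative stand-in for a minimisation node."
    )
    # Optional argument with a default, as described for `steps`.
    parser.add_argument(
        "--steps", type=int, default=10000,
        help="The number of minimisation steps.",
    )
    # Required argument that accepts one or more values, as described for `files`.
    parser.add_argument(
        "--files", nargs="+", required=True,
        help="The files that define the molecular configuration.",
    )
    return parser


args = build_parser().parse_args(["--steps=1000", "--files", "ala.crd", "ala.top"])
print(args.steps, args.files)  # prints: 1000 ['ala.crd', 'ala.top']
```

With `nargs="+"` the parser collects every value that follows `--files` into a list, which is what the square brackets and ellipsis in a help message conventionally indicate.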

Note that it's possible to pass options to the node in various ways, e.g. from the command-line, using a [YAML](https://en.wikipedia.org/wiki/YAML) configuration file, or even using environment variables. This provides a lot of flexibility in the way in which BioSimSpace nodes can be run. (BioSimSpace nodes can even be run within BioSimSpace, but that's for later.) For now we'll just pass arguments on the command-line.
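
To make the idea of layered option sources concrete, here is a minimal sketch of how a single option might be resolved from several places. Everything in it is an assumption made for illustration: the helper `resolve_option`, the precedence order (command-line, then configuration file, then environment, then default), and the upper-cased environment variable name. The configuration is passed in as a plain dictionary, such as one loaded from a YAML file, so that the example needs only the standard library.

```python
import os


def resolve_option(name, cli_value=None, config=None, env=None,
                   default=None, cast=str):
    """Return the first value found for `name`.

    Assumed precedence (hypothetical): command-line value, then config
    dictionary, then environment variable, then the default.
    """
    config = config or {}
    env = os.environ if env is None else env

    if cli_value is not None:
        return cast(cli_value)
    if name in config:
        return cast(config[name])
    # Assumed naming convention: the option name in upper case.
    env_key = name.upper()
    if env_key in env:
        return cast(env[env_key])
    return default


# No command-line value is given, so the config entry is used.
steps = resolve_option("steps", config={"steps": 5000},
                       env={"STEPS": "2000"}, default=10000, cast=int)
print(steps)  # prints: 5000
```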

Try running the node without any arguments and seeing what the output is:

In [None]:
!python nodes/minimisation.py

Since `files` is a required argument, the node should complain that it is missing. Thankfully we've provided some files for you. These are found in the `input` directory.

In [None]:
!ls input/amber

The files define a solvated alanine dipeptide system in [AMBER](http://ambermd.org) format.

Let's now run the minimisation node using these files as input. In the interests of time, let's also reduce the number of steps to 1000. The files can be passed to the script in various ways. All of the following are allowed:

```bash
!python nodes/minimisation.py --steps=1000 --files="input/amber/ala.crd, input/amber/ala.top"
!python nodes/minimisation.py --steps=1000 --files input/amber/ala.crd input/amber/ala.top
!python nodes/minimisation.py --steps=1000 --files input/amber/*
```
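
These forms can be equivalent because a list of files is easy to normalise once it reaches Python. The sketch below shows one way this might be done. The helper `normalise_files` is hypothetical and is not taken from BioSimSpace. Note that in the wildcard form the shell expands `*` into separate file names before Python ever sees the arguments, so it arrives looking like the space-separated form.

```python
def normalise_files(values):
    """Flatten a list of arguments into individual file names.

    Each argument may be a single file name or a comma-separated
    string of file names (hypothetical helper for illustration).
    """
    files = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part:  # Skip empty fragments, e.g. from a trailing comma.
                files.append(part)
    return files


# One quoted, comma-separated argument.
quoted = normalise_files(["input/amber/ala.crd, input/amber/ala.top"])
# Two separate arguments.
separate = normalise_files(["input/amber/ala.crd", "input/amber/ala.top"])

print(quoted == separate)  # prints: True
```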

In [None]:
!python nodes/minimisation.py --steps=1000 --files input/amber/*

Once the process has finished running (the asterisk to the left of the cell will disappear) we should find that the minimised molecular system has been written to the working directory.

In [None]:
!ls minimised.*

Note that the files have been written in the same format as the original molecular system, i.e. AMBER.

We have also provided some GROMACS format input files.

In [None]:
!ls input/gromacs

Let's now run the node using these files as input. This is a larger system so the minimisation will take a little longer.

In [None]:
!python nodes/minimisation.py --steps=1000 --files input/gromacs/*

There should now be two additional GROMACS format output files in the working directory.

In [None]:
!ls minimised.*
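
Outside a notebook, the same runs could be scripted from plain Python using only the standard library. The following is a sketch rather than a recommended BioSimSpace workflow: the helpers `node_command` and `run_node` are hypothetical, and it assumes the node accepts the `--steps` and `--files` options exactly as used in the cells above.

```python
import glob
import os
import subprocess
import sys


def node_command(node, files, steps=1000):
    """Build the argument list for running a node with the current interpreter."""
    return [sys.executable, node, f"--steps={steps}", "--files", *files]


def run_node(node, input_dir, steps=1000):
    """Run a node on every file in `input_dir`, raising an error if it fails."""
    # Expand the wildcard in Python, since no shell is involved here.
    files = sorted(glob.glob(os.path.join(input_dir, "*")))
    if not files:
        raise FileNotFoundError(f"No input files found in {input_dir!r}")
    return subprocess.run(node_command(node, files, steps), check=True)
```

Calling `run_node("nodes/minimisation.py", "input/amber")` followed by `run_node("nodes/minimisation.py", "input/gromacs")` from the workshop directory would then mirror the two notebook cells above.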

Congratulations, you have just run your first BioSimSpace node! We'll now look at the node in more detail and learn how to write a generic, portable workflow component.