<vcenter><center>**w_ipa: Automated, Convenient Analysis!**</center></vcenter>

<vcenter><center>by Audrey Pratt</center></vcenter>

In [1]:
# This is just a setup cell that allows us to use w_ipa in a notebook environment.
# I need to run this to ensure that everything works, but I don't want to present this slide, as
# we'll be running things on a cluster and using the terminal utilities.

%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import w_ipa
w = w_ipa.WIPI()
# At startup, it will load or run the analysis schemes specified in the configuration file (typically west.cfg)
w.main()
w.interface = 'matplotlib'




Welcome to w_ipa (WESTPA Interactive Python Analysis) v. 1.0B!
Run w.introduction for a more thorough introduction, or w.help to see a list of options.
Running analysis & loading files.

Complete!
Setting iteration to iter 1500.


<center>**w_ipa: WESTPA Interactive Python Analysis**</center>

w_ipa is a WESTPA tool that automates analysis using 'analysis schemes', and allows for interactive analysis of WESTPA data.  In this tutorial, I'll cover how to use this tool to do the following:

1. Calculate rate constants/fluxes
2. Adjust and use alternate state definitions
3. Trace segments (weight, pcoord, auxdata)
4. Plot all of the above in the terminal!

To begin running this tutorial, in your terminal on H2P, run the following:

```
git clone https://github.com/ajoshpratt/westpa2018analysis.git
cd westpa2018analysis
./create_conda_environment.sh
```
You'll need to answer 'y' to all the prompts.  Once it has finished:
```
./start.sh
cd westpa2018analysis
source env.sh
cd SHORT_TEST
w_ipa -t
```

Let's begin!

<center>**How does analysis in WESTPA normally work?**</center>

To calculate rate constants, fluxes, and state populations from a WE simulation run under equilibrium conditions, you must first split your simulation into separate steady-state ensembles and solve from there.  WESTPA comes with tools that can calculate these properties using either of the following methods:

1. Non-Markovian reweighting (a matrix-based calculation)
2. Direct calculation (*i.e.*, directly tracing trajectories and directly calculating properties)

These tools, **w_reweight** and **w_direct** respectively, require that your parameter space be split into bins and that some bins correspond to a known 'state'.  The tool that accomplishes this is **w_assign**, which bins trajectory walkers (or *segments*), assigns them to different steady-state ensembles, and writes the output to a file called **assign.h5** by default.

**w_reweight/w_direct** will calculate and average rate constants, fluxes, and state populations according to whatever averaging scheme you choose.  Importantly, in a typical project, **you may wish to use various state definitions and/or averaging schemes!**
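
To make the idea of an averaging scheme concrete, here is a minimal sketch (not WESTPA's implementation) that contrasts a cumulative average with a blocked one. The function names and the flux values are made up for illustration.

```python
def cumulative_average(values):
    """Running mean: entry i averages values[0] through values[i]."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def blocked_average(values, block_size):
    """Mean of each consecutive, non-overlapping block of block_size values."""
    averages = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        averages.append(sum(block) / len(block))
    return averages


# Hypothetical per-iteration flux values (arbitrary units).
flux = [0.0, 2.0, 4.0, 6.0]

print(cumulative_average(flux))   # [0.0, 1.0, 2.0, 3.0]
print(blocked_average(flux, 2))   # [1.0, 5.0]
```

The same data give different curves under the two schemes, which is why it matters to know which scheme produced a given output file.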

<center>**What is the typical analysis workflow?**</center>

A typical analysis workflow is as follows:

1. Create a bin space and define states that are appropriate for your system.
2. Run w_assign with your bin scheme/state definitions and create **assign.h5**.
3. Run w_reweight/w_direct with the averaging options you desire and create **reweight.h5/direct.h5**.

If you choose to change bin definitions, state definitions, or your averaging scheme, you will overwrite assign.h5 and/or direct.h5/reweight.h5 when you re-run those tools.  ***It is critical that you keep track of which files pertain to which analysis scheme***.

* assign.h5 contains information about the bins and state definitions used.
* direct.h5/reweight.h5 contain information about what averaging scheme was used, but no information about the actual states.

This introduces the possibility of error!

<center>**What if we wish to analyze the simulation as it progresses?**</center>

Running w_assign & w_direct/w_reweight while the simulation is on a cluster is straightforward.  However, visualization (such as creating graphs to analyze the evolution of the rate constant) typically requires creating PDFs, which can involve copying data to and from a local computer.

In addition, analyzing the progress coordinate or auxiliary data (such as the RMSD of a protein or molecule) typically requires loading the main data file (typically **west.h5**) into python and running a custom analysis.

Often, we simply want a fast answer to the question, *how is my simulation proceeding?*

<center>**What about tracing the properties of a particular trajectory?**</center>

One of the major strengths of the WE method is that it maintains complete trajectories.  Up until now, tracking properties of individual pathways over the course of the simulation has typically required writing custom python code and becoming familiar with the structure of the main data file (west.h5). 
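
At its core, tracing a trajectory means following parent links backwards, one iteration at a time. The sketch below shows only that logic on toy data. It assumes each segment records the index of its parent in the previous iteration and that a negative parent marks the start of a trajectory. The `parents` table and the `trace` function are hypothetical and do not read west.h5.

```python
# Toy stand-in for per-iteration parent records (NOT read from west.h5).
# parents[n][seg_id] is the parent's seg_id in iteration n - 1;
# a negative value marks a segment that started from an initial state.
parents = {
    1: [-1, -1],
    2: [0, 0, 1],
    3: [2, 0, 1],
}


def trace(parents, n_iter, seg_id):
    """Return [(n_iter, seg_id), ...] from the earliest ancestor to the given segment."""
    path = []
    while n_iter >= 1 and seg_id >= 0:
        path.append((n_iter, seg_id))
        seg_id = parents[n_iter][seg_id]  # step to the parent...
        n_iter -= 1                       # ...which lives one iteration earlier
    return path[::-1]


print(trace(parents, 3, 0))  # [(1, 1), (2, 2), (3, 0)]
```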

<center>**w_ipa solves all of these problems!**</center>

w_ipa allows for the creation of 'analysis schemes', stored in the west.cfg file, which involve the following:

1. Bin and state definitions
2. Averaging options

w_ipa will automatically run any analysis necessary, and will update when any options are changed AND when the main data file (west.h5) has more iterations than when the analysis was previously run.  Therefore, **the results of w_ipa are always up to date with the options specified in the west.cfg file.**
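
The update rule can be pictured as a simple staleness check. This is a sketch of the logic only: `needs_rerun` and the layout of the `saved` record are hypothetical and are not taken from w_ipa's source.

```python
def needs_rerun(saved, current_options, current_n_iter):
    """Decide whether stored analysis results are out of date.

    saved is None (analysis never run) or a dict holding the options and
    the iteration count that were in effect when the analysis last ran.
    """
    if saved is None:
        return True
    if saved["options"] != current_options:
        return True  # bins, states, or averaging options changed
    return current_n_iter > saved["n_iter"]  # the simulation has advanced


options = {"evolution": "cumulative", "step_iter": 1}
saved = {"options": dict(options), "n_iter": 98}

print(needs_rerun(saved, options, 98))                        # False
print(needs_rerun(saved, options, 120))                       # True
print(needs_rerun(saved, {**options, "step_iter": 10}, 98))   # True
```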

Once w_ipa has finished updating any analysis files necessary, it will drop you into an ipython prompt in which the object **w** gives easy access to analysis data, trajectory tracing functions, and tools that allow you to generate plots within the terminal (using ncurses). 

<center>**How is w_ipa configured?**</center>

A WESTPA simulation primarily uses the **west.cfg** file, which can be considered the master parameter file. w_ipa reads its settings from the same file, in a section titled **analysis**.

A more detailed example is available in the IPython notebook on the GitHub site.  The general format of the analysis section is as follows:

```
analysis:
  directory: ANALYSIS # Where are we storing the analysis files?
  OPTIONS THAT SHOULD APPLY TO EVERY SCHEME
  analysis_schemes:
    SCHEME_NAME:
      OPTIONS THAT APPLY ONLY TO THIS SCHEME.  These take precedence over the general ones.
      bins!
      states!
```

The assign.h5, reweight.h5, and direct.h5 files are stored under ANALYSIS/SCHEME_NAME.
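
As a small illustration of that layout, the sketch below builds the expected file locations for one scheme. `scheme_files` is a hypothetical helper written for this tutorial, not part of WESTPA.

```python
from pathlib import PurePosixPath


def scheme_files(analysis_dir, scheme_name):
    """Map each analysis file name to its expected location for one scheme."""
    scheme_dir = PurePosixPath(analysis_dir) / scheme_name
    return {
        name: scheme_dir / name
        for name in ("assign.h5", "direct.h5", "reweight.h5")
    }


for name, path in scheme_files("ANALYSIS", "SCHEME_NAME").items():
    print(name, "->", path)
```

Because every scheme gets its own subdirectory, changing a state definition in one scheme never overwrites the files that belong to another.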

The optional arguments that can be passed in to w_assign, w_direct, and w_reweight can be specified by creating a section with the tool name and using argument: value pairs.  We'll see an example in a minute!

If you're reading this, you've downloaded the ipython notebook!  Excellent!

The best example really is what's in the west.cfg already, but here you go.
```
analysis:
  directory: ANALYSIS # this is the directory where the analysis files are stored
  postanalysis: True # should we run the Non-Markovian reweighting scheme?
  kinetics:
    # Command line arguments that we want applied to both w_direct and w_reweight for EVERY scheme
  w_assign:
    # Optional command line arguments for w_assign.  This would normally apply when you wish to use
    # an auxiliary dataset for analysis.
  analysis_schemes:
    SCHEME_NAME:
      enabled: True
      kinetics:
        # command line arguments that we want applied to this scheme only.
      states: 
        - label: STATE_NAME
          coords: [[X]] # A value for the state that will map to a bin.
      bins:
        - type: RectilinearBinMapper # WESTPA bin mapper type.
          boundaries: [[0.0, 1000000]] # Should encompass all possible pcoord values.
```
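
The precedence rule (scheme-specific options override the general ones) behaves like a dictionary merge. Here is a minimal sketch, assuming each option section is a plain mapping. `effective_options` is a hypothetical helper, not w_ipa code, and the option values are examples only.

```python
def effective_options(general, scheme_specific):
    """Combine option mappings; scheme-specific values win on conflicts."""
    merged = dict(general or {})
    # A section that contains only comments is read from YAML as None,
    # so treat a missing or empty section as "no overrides".
    merged.update(scheme_specific or {})
    return merged


general_kinetics = {"evolution": "cumulative", "step_iter": 1}
scheme_kinetics = {"step_iter": 10}   # set for one scheme only

print(effective_options(general_kinetics, scheme_kinetics))
# {'evolution': 'cumulative', 'step_iter': 10}
```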

<center>**What's in OUR west.cfg?**</center>

If you were to open west.cfg, you'd see the following at the bottom:

```
analysis:
    directory: ANALYSIS
    postanalysis: True
    #w_assign:
    #  construct_dataset: system.distpcoord
    kinetics:
      step_iter: 1
      evolution: cumulative
      extra: [ 'disable-correl' ]
    analysis_schemes:
      CANONICAL:
        enabled: True
        states:
          - label: unbound
            coords: [[24.0]]
          - label: bound
            coords: [[2.19]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0,2.2,24.00,100000]]
```
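
To see why these state coordinates fit these boundaries, here is a sketch of rectilinear bin assignment in one dimension. It assumes each bin includes its lower boundary and excludes its upper one. `bin_index` is a hypothetical helper written for illustration, not a WESTPA function.

```python
from bisect import bisect_right

# Boundaries copied from the CANONICAL scheme above.
boundaries = [0.0, 2.2, 24.0, 100000]


def bin_index(coord, boundaries):
    """Index of the half-open bin [boundaries[i], boundaries[i + 1]) containing coord."""
    if not boundaries[0] <= coord < boundaries[-1]:
        raise ValueError(f"{coord} lies outside the bin space")
    return bisect_right(boundaries, coord) - 1


print(bin_index(2.19, boundaries))  # 0 -> the bin labeled 'bound'
print(bin_index(24.0, boundaries))  # 2 -> the bin labeled 'unbound'
print(bin_index(10.0, boundaries))  # 1 -> belongs to neither state
```

Under that assumption, a coordinate of 2.19 sits just inside the first bin and 24.0 sits on the lower edge of the last one, so each state label maps to exactly one bin.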

Once you've run w_ipa, you should see something that looks like the following:
```
Welcome to w_ipa (WESTPA Interactive Python Analysis) v. 1.0B!
Run w.introduction for a more thorough introduction, or w.help to see a list of options.
Running analysis & loading files.

Complete!
Setting iteration to iter 98.
Your current scheme, system and iteration are : CANONICAL, /gscratch2/lchong/ajp105/westpa2018analysis/SHORT_TEST, 98

In [1]: 
```

We can now begin analyzing this simulation!  In the directory SHORT_TEST, we're only analyzing the initial portion of the simulation, since the run time of the analysis tools grows with the number of iterations.