# BilayerAnalyzer

The BilayerAnalyzer class is the primary tool in the bilayer_analyzer module. It is used to construct a set of analyses and (at the moment, a limited set of) plot builders. The BilayerAnalyzer class can be imported from the bilayer_analyzer module:

In [1]:
#import the BilayerAnalyzer class 
from vorbilt.bilayer_analyzer.bilayer_analyzer import BilayerAnalyzer

## Constructing a BilayerAnalyzer instance

We can then build an analyzer instance and construct our analysis set. The BilayerAnalyzer can be initialized in three ways.

### 1. via psf_file, trajectory, and selection keyword options.

In [2]:
#initialize analyzer with keyword options--and default analyses
sel_string = "resname POPC or resname DOPE or resname TLCL2"
ba = BilayerAnalyzer(
    psf_file='../vorbilt/sample_bilayer/sample_bilayer.psf',
    trajectory='../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd',
    selection=sel_string,
)

parsing inputs...
setting up analysis protocol:
build objects:
mda_frame
com_frame
with analysis:
Analysis: Mean squared displacement.
  with analysis_id: msd_1 
   and settings: 
    leaflet: both 
    resname: all 
setting up plot protocol
('trajectory', '../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd')
('selection', 'resname POPC or resname DOPE or resname TLCL2')
('psf', '../vorbilt/sample_bilayer/sample_bilayer.psf')
building the MDAnalysis objects...


This constructs an analyzer for the given structure (psf_file) and trajectory (trajectory). The selection keyword value is an MDAnalysis selection string that picks out the bilayer lipids from the rest of the system. Although a single file path is used for 'trajectory' in this example, a list of trajectory files can also be passed to the trajectory keyword.

> Note: Although the keyword psf_file implies that a CHARMM psf file should be used for the structure file, any valid structure file input to MDAnalysis can be used. See the [topology readers](https://pythonhosted.org/MDAnalysis/documentation_pages/topology/init.html) MDAnalysis page for more details.

> Note: In addition to a filename string, the trajectory keyword argument also accepts a list of filename strings for loading multiple trajectory files.  

> Note: Each lipid is assumed to be a unique residue. See the [selections](https://pythonhosted.org/MDAnalysis/documentation_pages/selections.html) page for details on making MDAnalysis selections.
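
For systems with several lipid types, the selection string can be assembled from a list of residue names rather than typed by hand. The following is a minimal sketch; the helper name `resname_selection` is hypothetical and is not part of vorbilt or MDAnalysis. It only builds the string, which would then be passed as the selection keyword.

```python
def resname_selection(resnames):
    """Join residue names into an MDAnalysis-style 'resname A or resname B' string.

    Hypothetical helper for illustration; not part of vorbilt or MDAnalysis.
    """
    if not resnames:
        raise ValueError("at least one residue name is required")
    return " or ".join("resname {}".format(name) for name in resnames)


# Residue names of the bilayer lipids used in this example.
sel_string = resname_selection(["POPC", "DOPE", "TLCL2"])
print(sel_string)
```

The resulting string matches the sel_string defined by hand in the cell above.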

The analyzer ba is initialized with a single default mean squared displacement (MSD) analysis, as shown in the stdout text:

    with analysis:
    Analysis: Mean squared displacement.
      with analysis_id: msd_1 
       and settings: 
        leaflet: both 
        resname: all 

The MSD analysis has the analysis_id 'msd_1'. Each analysis in the set is assigned a unique analysis_id, which is used to reference that particular analysis. We can also see that the msd_1 analysis has the settings 'leaflet' and 'resname'. Each analysis may have settings that are initialized with preset defaults and that (outside of this default MSD analysis) can be specified by the user.

#### Listing the valid analyses that can be added to the BilayerAnalyzer instance
A set of analyses is predefined and can be assigned as part of the built-in analysis protocols of a BilayerAnalyzer instance. A function in the bilayer_analyzer module is provided to print these to stdout:

In [3]:
#let's import the function
from vorbilt.bilayer_analyzer.bilayer_analyzer import print_valid_analyses

We can call the function to get a complete list of the built-in analyses available to BilayerAnalyzer instances. For each analysis, the list gives the analysis_key, a short description of what the analysis does, and its adjustable settings (a settings entry of 'none' means that the analysis has no adjustable settings).

In [4]:
print_valid_analyses()

analysis_key: ndcorr ---> Normal dimension displacement-lipid type cross correlation.
  with settings:
    none --> <type 'NoneType'>
analysis_key: mass_dens ---> Mass density.
  with settings:
    n_bins --> <type 'int'>
    selection_string --> <type 'str'>
analysis_key: bilayer_thickness ---> Bilayer thickness using lipid_grid.
  with settings:
    none --> <type 'NoneType'>
analysis_key: nnf ---> Lateral order nearest neighbor fraction.
  with settings:
    leaflet --> <type 'str'>
    n_neighbors --> <type 'int'>
    resname_2 --> <type 'str'>
    resname_1 --> <type 'str'>
analysis_key: apl_grid ---> Area per lipid using lipid_grid
  with settings:
    none --> <type 'NoneType'>
analysis_key: disp_vec ---> Displacement vectors.
  with settings:
    leaflet --> <type 'str'>
    resname --> <type 'str'>
    interval --> <type 'int'>
    wrapped --> <type 'bool'>
analysis_key: dc_cluster ---> Distance cutoff clustering.
  with settings:
    resname --> <type 'str'>
    leaflet --> <

Importantly, each type of analysis has a unique analysis_key (e.g. 'msd' and 'apl_box'). The analysis_key is used to specify the analysis type when adding analyses to the analyzer. 

### 2. via an input file

The analyzer can also be created using an input file with the necessary commands. Let's look at an example, the file 'VORBILT/tests/sample_1.in', which reads:
> #set the structure file (psf) for the system 
 
> psf ../sample_bilayer/sample_bilayer.psf

> #set the trajectory file

> trajectory ../sample_bilayer/sample_bilayer_10frames.dcd

> #MDAnalysis syntax selection text to select the bilayer

> selection "not resname CLA and not resname TIP3 and not resname POT"

> #define an analysis for mean squared displacement (msd): named 'msd_1'

> analysis msd msd_1

> #define an analysis for mean squared displacement (msd) for (resname) POPC lipids in the
> #upper leaflet of the bilayer: named 'msd_2' 

> analysis msd msd_2 leaflet upper resname POPC

> #define a plot for mean squared displacement data (msd) including computes 'msd_1' and 'msd_2': named 'msd_p'

> plot msd msd_p msd_1 DOPE-U msd_2 POPC-U


In this input script there are five different command types that will be parsed by the analyzer during initialization. The first three, 'psf', 'trajectory', and 'selection', are required (similar to their keyword counterparts in initialization method 1).

The other two command types used in this input script are 'analysis' and 'plot'. 'analysis' commands are used to add analyses to the analyzer's set of protocols. They have the basic format:

> analysis analysis_key analysis_id

and additionally the analysis settings can be set using the format:

> analysis analysis_key analysis_id setting_key value
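
To make the structure of an 'analysis' command concrete, here is a minimal sketch that splits one command line into its analysis_key, analysis_id, and setting_key/value pairs. The function `parse_analysis_command` is hypothetical and is not the parser used inside BilayerAnalyzer. It assumes whitespace-separated tokens and keeps every value as a string, so it would not handle a setting value that itself contains spaces.

```python
def parse_analysis_command(line):
    """Split an 'analysis' command into (analysis_key, analysis_id, settings).

    Illustrative sketch only; not the parser used by BilayerAnalyzer.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "analysis":
        raise ValueError(
            "expected: analysis analysis_key analysis_id [setting_key value ...]"
        )
    analysis_key, analysis_id = tokens[1], tokens[2]
    rest = tokens[3:]
    if len(rest) % 2 != 0:
        raise ValueError("settings must be given as setting_key/value pairs")
    # Pair up alternating tokens: even positions are keys, odd are values.
    settings = dict(zip(rest[0::2], rest[1::2]))
    return analysis_key, analysis_id, settings


key, analysis_id, settings = parse_analysis_command(
    "analysis msd msd_2 leaflet upper resname POPC"
)
print(key, analysis_id, settings)
```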

In the same spirit, the 'plot' command is used to add 'auto' plot builders to the set of protocols in the analyzer.

> Note: Currently, development of the plotting protocols lags behind that of the analysis protocols, so not all built-in analyses have corresponding built-in plot protocols. Some additional plotting tools are provided in vorbilt's plot_generation module, although many of the newer analyses don't have plot functions in that module either and will require direct use of matplotlib (or another tool) to generate plots.

The plot command has a very similar format to that of the analysis command with the type of plotting specified by a 'plot_key' and the particular plot identified with a 'plot_id'. 
The plot command in the input script for generating MSD time series plots has the format:

> plot plot_key plot_id analysis_id legend_name ...
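
The MSD plot command can be read as a plot_key, a plot_id, and then a repeating sequence of analysis_id/legend_name pairs. The sketch below illustrates that reading; `parse_plot_command` is a hypothetical name and this is not the analyzer's own parsing code. It assumes whitespace-separated tokens and legend names without spaces.

```python
def parse_plot_command(line):
    """Split an MSD-style 'plot' command into (plot_key, plot_id, series).

    'series' is a list of (analysis_id, legend_name) pairs.
    Illustrative sketch only; not the parser used by BilayerAnalyzer.
    """
    tokens = line.split()
    if len(tokens) < 5 or tokens[0] != "plot":
        raise ValueError(
            "expected: plot plot_key plot_id analysis_id legend_name ..."
        )
    plot_key, plot_id = tokens[1], tokens[2]
    rest = tokens[3:]
    if len(rest) % 2 != 0:
        raise ValueError("each analysis_id needs a matching legend_name")
    # Pair each analysis_id with the legend name that follows it.
    series = list(zip(rest[0::2], rest[1::2]))
    return plot_key, plot_id, series


plot_key, plot_id, series = parse_plot_command(
    "plot msd msd_p msd_1 DOPE-U msd_2 POPC-U"
)
print(plot_key, plot_id, series)
```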

Now let's actually initialize the analyzer using the input script:

In [5]:
ba = BilayerAnalyzer(input_file="../tests/sample_1.in")

parsing input file '../tests/sample_1.in'...
msd msd_1
msd msd_2 leaflet upper resname POPC
msd msd_p msd_1 DOPE-U msd_2 POPC-U
setting up analysis protocol:
build objects:
lipid_grid
mda_frame
com_frame
with analysis:
Analysis: Mean squared displacement.
  with analysis_id: msd_1 
   and settings: 
    leaflet: both 
    resname: all 
Analysis: Mean squared displacement.
  with analysis_id: msd_2 
   and settings: 
    leaflet: upper 
    resname: POPC 
setting up plot protocol
('trajectory', '../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd')
('selection', 'not resname CLA and not resname TIP3 and not resname POT')
('psf', '../vorbilt/sample_bilayer/sample_bilayer.psf')
('analysis', ['msd msd_1', 'msd msd_2 leaflet upper resname POPC'])
('plot', ['msd msd_p msd_1 DOPE-U msd_2 POPC-U'])
building the MDAnalysis objects...
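
The command summary printed above suggests that the input script is collected into single values for 'psf', 'trajectory', and 'selection' and lists of command strings for 'analysis' and 'plot'. The following is a rough sketch of how a script in this format could be read into such a structure. It is an assumption-laden illustration, not the parser inside BilayerAnalyzer: the function name `parse_input_script` is hypothetical, and it assumes that lines starting with '#' are comments and that quotes around the selection text should be stripped.

```python
import shlex

# Commands from the sample script (comments abbreviated).
SAMPLE_INPUT = """\
#set the structure file (psf) for the system
psf ../sample_bilayer/sample_bilayer.psf
#set the trajectory file
trajectory ../sample_bilayer/sample_bilayer_10frames.dcd
#MDAnalysis syntax selection text to select the bilayer
selection "not resname CLA and not resname TIP3 and not resname POT"
analysis msd msd_1
analysis msd msd_2 leaflet upper resname POPC
plot msd msd_p msd_1 DOPE-U msd_2 POPC-U
"""

REQUIRED = ("psf", "trajectory", "selection")


def parse_input_script(text):
    """Collect input-script commands into a dictionary (illustrative only)."""
    commands = {"analysis": [], "plot": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, _, argument = line.partition(" ")
        argument = argument.strip()
        if command in ("analysis", "plot"):
            # These commands may repeat, so keep every occurrence.
            commands[command].append(argument)
        elif command in REQUIRED:
            # shlex.split removes the quotes around the selection text.
            commands[command] = " ".join(shlex.split(argument))
        else:
            raise ValueError("unknown command: {!r}".format(command))
    missing = [name for name in REQUIRED if name not in commands]
    if missing:
        raise ValueError("missing required commands: {}".format(missing))
    return commands


parsed = parse_input_script(SAMPLE_INPUT)
for name, value in parsed.items():
    print((name, value))
```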


### 3. Using an input dictionary

Finally, the analyzer can be initialized using an input dictionary. The dictionary should have at least the three required keys 'psf', 'trajectory', and 'selection':

In [6]:
# define the input dictionary
input_dict = {'psf' : '../vorbilt/sample_bilayer/sample_bilayer.psf',
              'trajectory' : '../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd',
              'selection' : 'resname POPC or resname DOPE or resname TLCL2'
             }

#now initialize the analyzer
ba = BilayerAnalyzer(input_dict=input_dict)

              

setting up analysis protocol:
build objects:
lipid_grid
mda_frame
com_frame
with analysis:
Analysis: Mean squared displacement.
  with analysis_id: msd_1 
   and settings: 
    leaflet: both 
    resname: all 
setting up plot protocol
('trajectory', '../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd')
('selection', 'resname POPC or resname DOPE or resname TLCL2')
('psf', '../vorbilt/sample_bilayer/sample_bilayer.psf')
building the MDAnalysis objects...
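
Before handing a dictionary to the analyzer, it can be convenient to check that the required entries are present. The sketch below assumes the required key names are the ones used in the dictionary in the cell above ('psf', 'trajectory', and 'selection'). The helper `missing_keys` is hypothetical and is not part of vorbilt.

```python
# Key names taken from the example dictionary above (an assumption about
# what the analyzer requires, not a statement of its API).
REQUIRED_KEYS = ("psf", "trajectory", "selection")


def missing_keys(input_dict):
    """Return the required keys that are absent from input_dict."""
    return [key for key in REQUIRED_KEYS if key not in input_dict]


input_dict = {
    "psf": "../vorbilt/sample_bilayer/sample_bilayer.psf",
    "trajectory": "../vorbilt/sample_bilayer/sample_bilayer_10frames.dcd",
    "selection": "resname POPC or resname DOPE or resname TLCL2",
}

missing = missing_keys(input_dict)
if missing:
    raise KeyError("input_dict is missing required keys: {}".format(missing))
print("all required keys present")
```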
