# Tutorial for ENSANE (Enhanced iNSANE)

The main code of ENSANE was created independently from insane, but the lipid, solvent and protein definitions used in insane have been converted to ENSANE data format.

# General feature overview:

General:
- Can be either imported and run from a script or run from the terminal
- Currently can only be used to create a "cubic" (rectangular) box

Membrane creation:
- All membranes are created in the xy-plane.
- Individual leaflets in membranes can be created independently of each other.
- Any number of membranes can be created, though overlaps between different membranes are not checked for, so use with caution.
- The number of lipids in a leaflet is calculated exactly, based on the APL (area per lipid) of the leaflet, taking into account the leaflet surface area occupied by proteins.
- All calculations are done on a leaflet-by-leaflet basis
- Multiple methods available for optimizing ratio between lipid types within a given leaflet

Protein insertion:
- Structure files (pdb/gro) can be inserted as "proteins"
- Any number of structure files can be given
- Structures can be moved and rotated based on the structure's center
  - Multiple methods available for designating the structure's center

Solvation:
- The number of non-ionic solvent molecules is calculated based on the free volume of the box
  - The free volume is estimated using the number of other particles present in the box
- The number of ionic solvent molecules is calculated based on the solvent volume
  - Estimated using the number of non-ionic solvent particles created using the same command
- Solvent placement algorithm ensures no solvent is placed within the hydrophobic volume of membranes and ensures a minimum distance between solvent and other particles
- Any number of solvations can be given
- Solvent molecules can contain any number of beads but must currently be a single residue
- Structures (pdb/gro) can be imported and used as solvent molecules but must currently be a single residue

Charge calculations:
- Topology files can be given from which charge information will be read
- Understands definitions and recursively runs "#include Path/To/top.itp" statements
  - For definitions and #include statements to work correctly they must be given in the same order as one would use for "gmx grompp"
- Lipids and solvents have their names matched directly against the names in the topology
- Protein files must have their name(s) specified in their command
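
The recursive handling of "#include" statements described above can be pictured with the following minimal sketch. It is not ENSANE's actual implementation: the function name `expand_includes`, the in-memory `files` dictionary (standing in for the file system) and all file names are hypothetical, and "#define"/"#ifdef" handling is left out entirely.

```python
import re

# Matches lines such as: #include "path/to/file.itp"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')


def expand_includes(name, files, _ancestors=frozenset()):
    """Return the lines of `name` with every #include replaced by the included lines.

    `files` is a hypothetical stand-in for the file system: a dict that maps
    file names to their text content.
    """
    if name in _ancestors:
        raise ValueError(f"circular #include detected for {name!r}")
    ancestors = _ancestors | {name}

    lines = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # The included file is expanded at the position of the statement,
            # which is why the order of the statements matters.
            lines.extend(expand_includes(match.group(1), files, ancestors))
        else:
            lines.append(line)
    return lines


# Hypothetical example files
files = {
    "system.top": '#include "forcefield.itp"\n#include "lipids.itp"\n[ system ]',
    "forcefield.itp": "[ defaults ]",
    "lipids.itp": '#include "popc.itp"',
    "popc.itp": "[ moleculetype ]",
}
print(expand_includes("system.top", files))
```

Because each statement is expanded in place, swapping two "#include" lines changes the order in which definitions are seen, which mirrors the ordering requirement mentioned above.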

Dev-related features
- 2D visualizations of leaflets with intersecting protein beads can be plotted
- The system data can be pickled for later examination

# ENSANE workflow
A step is only run if its prerequisites are fulfilled
1. Ensure commands given are understood
1. Process imported topology file(s)
1. Import structures to structure libraries
1. Preprocess lipid definitions
1. Preprocess protein, leaflet and solvent commands
1. Membrane/leaflet initial creation
1. Protein placement
1. Leaflet-protein overlap checks if both leaflets and proteins are present
1. Adjust leaflets if overlaps are found
1. Solvate the system
1. Plotting and pickling if requested
1. pdb/gro, top and log files are written



# Command and subcommand syntax

ENSANE functions on a system of commands and subcommands as shown here:\
Run from script:
- ENSANE(box = [10,10,10], prot = "Structure.pdb prot_name:PROTEIN1:LIGAND1")

Run from command line:
- python ENSANE.py -box 10 10 10 -prot Structure.pdb prot_name:PROTEIN1:LIGAND1

Here the main command is "prot"/"-prot". For "prot" specifically the first subcommand given must be the name of the file to be used.

Subsequent subcommands are given as SETTING:VAL1:VAL2:VAL3 pairs where the string prior to the first colon designates the setting to be changed, while following values separated by colons are the values to be given to the setting.

The number of values a setting can be given depends on the specific setting.

Subcommands given to a command call only affect that specific command call.
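
To make the SETTING:VAL1:VAL2 convention concrete, here is a small sketch of how such a command string could be split into its parts. The function `parse_command` and its `first_is_file` argument are hypothetical names used for illustration only and do not reflect how ENSANE parses its input internally.

```python
def parse_command(command, first_is_file=False):
    """Split a command string into (file path or None, list of (setting, values))."""
    tokens = command.split()
    # For "prot"-style commands the first token is the file name
    path = tokens.pop(0) if first_is_file and tokens else None

    subcommands = []
    for token in tokens:
        # The text before the first colon is the setting, the rest are its values
        setting, *values = token.split(":")
        subcommands.append((setting, values))
    return path, subcommands


print(parse_command("Structure.pdb prot_name:PROTEIN1:LIGAND1", first_is_file=True))
print(parse_command("solv:W:5 solv:SW:2 neg:CL"))
```

A list of tuples is used rather than a dictionary so that repeated settings (such as several "solv" entries) and their order are preserved.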

*Note that when running from a script, multiple calls to the same command (such as "prot") must be given as individual strings contained in a list. When running from the command line, each call must instead be given as a separate "-prot" command, as shown here:*

Run from script:
- ENSANE(box = [10,10,10], prot = ["Struc1.pdb prot_name:PROTEIN1:LIGAND1", "Struc2.pdb prot_name:PROTEIN2:LIGAND2"])

Run from command line:
- python ENSANE.py -box 10 10 10 -prot Struc1.pdb prot_name:PROTEIN1:LIGAND1 -prot Struc2.pdb prot_name:PROTEIN2:LIGAND2
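
The correspondence between the two calling styles can be sketched as a small helper that turns script-style keyword arguments into the equivalent terminal command. The helper `to_command_line` is hypothetical and not part of ENSANE; it only illustrates that a list of strings becomes one repeated flag per string, while a list of numbers (such as the box) becomes the values of a single flag.

```python
def to_command_line(script="ENSANE.py", **commands):
    """Build the terminal command matching script-style keyword arguments."""
    parts = ["python", script]
    for name, value in commands.items():
        if isinstance(value, (str, int, float)):
            calls = [str(value)]  # a single call
        elif all(isinstance(item, str) for item in value):
            calls = list(value)  # several calls, one string per call
        else:
            # a list of numbers, e.g. box = [10, 10, 10], is one call
            calls = [" ".join(str(item) for item in value)]
        parts.extend(f"-{name} {call}" for call in calls)
    return " ".join(parts)


print(to_command_line(
    box=[10, 10, 10],
    prot=["Struc1.pdb prot_name:PROTEIN1:LIGAND1", "Struc2.pdb prot_name:PROTEIN2:LIGAND2"],
))
```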


# Importing and command explanations

This section will explain the various commands and showcase some of their more useful subcommands

## Packages

The following packages are needed for all features of ENSANE to run:
- ast
- itertools
- re
- math
- numpy as np
- random
- copy
- scipy.spatial
- alphashape
- shapely
- shapely.plotting
- inspect
- sys
- os
- argparse
- matplotlib
- pickle

## General system commands

The "-box X Y Z" command is used to set the side lengths of the box in x/y/z. The commands "x"/"y"/"z" can instead be used to set the side lengths individually
- box = [10, 10, 10]
- x = 10, y = 10, z = 10

The "-ff string" command can be used to set the default force field used by "leaf" and "solv" commands. This makes it possible to keep multiple versions of the same lipid. "ff" can also be set individually per "leaf"/"solv" command or for specific lipids, as shown further down.
- ff = "M3" (default)
- ff = "dev18"
- ff = "PhosV13"

The "-sn string" command can be used to set the name of the outputted system.
- sn = "Tutorial System"

The "-rand number" command can be used to set the random seed.
- rand = 5

"-h" will cause the program to print the help.


## Protein insertions ("prot" / "-prot")

The "prot" command is used to insert structures into specific positions

The first subcommand must always be a path to the file that is to be imported
- prot = "Protein.pdb"

The structure can be moved using tx/ty/tz (in nm) and rotated using rx/ry/rz (in degrees)
- prot = "Protein.pdb tx:5 tz:3 ry:90"
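
As an illustration of what "ry:90" followed by "tx:5 tz:3" means geometrically, the sketch below rotates a set of bead coordinates around the y-axis through a chosen center and then translates them. The function names, the right-handed rotation convention and the choice to rotate before translating are assumptions made for this example and may differ from what ENSANE does internally.

```python
import math


def rotate_y(points, degrees, center):
    """Rotate (x, y, z) points around an axis parallel to y that passes through `center`."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, _, cz = center
    rotated = []
    for x, y, z in points:
        dx, dz = x - cx, z - cz
        rotated.append((cx + cos_t * dx + sin_t * dz, y, cz - sin_t * dx + cos_t * dz))
    return rotated


def translate(points, tx=0.0, ty=0.0, tz=0.0):
    """Shift all points by (tx, ty, tz), given in nm."""
    return [(x + tx, y + ty, z + tz) for x, y, z in points]


# Two hypothetical beads centered on (1, 0, 0)
beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = translate(rotate_y(beads, 90, center=(1.0, 0.0, 0.0)), tx=5, tz=3)
print([tuple(round(value, 6) for value in bead) for bead in moved])
```

Rotating around the structure's center rather than the origin keeps the structure in place while it turns, so the translation afterwards moves the center by exactly tx/ty/tz.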

The centering method can be changed using "cen_method", which accepts the following options:
- "cen_method:mean" - Centers on the mean coordinate of all beads (default)
- "cen_method:axis" - Centers on the axial mean coordinate
- "cen_method:bead:**beadnr**" - Centers on a specific bead number
- "cen_method:res:**resnr**" - Centers on the mean coordinate of a specific residue number
- "cen_method:point:**x**:**y**:**z**" - Centers on a specific point. Uses the coordinate system of the imported file.
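
Three of the centering options listed above can be sketched in a few lines. The function `structure_center` is a hypothetical illustration: bead numbers are assumed to start at 1, and the "axis" and "res" options are left out because their exact definitions are not spelled out here.

```python
def structure_center(beads, method="mean", *values):
    """Return the center of a list of (x, y, z) beads for a given centering method."""
    if method == "mean":
        # Mean coordinate of all beads
        count = len(beads)
        return tuple(sum(bead[axis] for bead in beads) / count for axis in range(3))
    if method == "bead":
        # A specific bead, assuming 1-based bead numbers
        return tuple(beads[int(values[0]) - 1])
    if method == "point":
        # An explicit point in the coordinate system of the imported file
        return tuple(float(value) for value in values[:3])
    raise ValueError(f"unsupported centering method: {method!r}")


beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 3.0, 0.0)]
print(structure_center(beads))                    # cen_method:mean
print(structure_center(beads, "bead", 3))         # cen_method:bead:3
print(structure_center(beads, "point", 1, 2, 3))  # cen_method:point:1:2:3
```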

If one wants to use topology files to obtain the charge of a structure, then the name(s) to look up can be designated using "prot_names". If the protein names cannot be found in the topology, the program falls back to looking up amino acid charges.
- prot = "Protein.pdb prot_names:PROTEIN1:LIGAND1:LIGAND1"
- This will ensure that the charge from both PROTEIN1 and two instances of LIGAND1 are considered



## Membrane (leaflet) creation ("leaf" / "-leaf")

The "leaf" command is used to create membranes.

Lipids can be added to a "leaf" command by simply adding their name to the command along with a number indicating the ratios between lipids in the leaflet. If another string is added after the number, then the lipid will be looked for in that specific force field.
- leaf = "POPC:5 POPE:2.5 CHOL:1:dev18"

The "ff" subcommand can be used to set the default force field for the specific "leaf" command. Lipid-specific ff designations override the leaflet ff designation if given:
- leaf = "POPC:5 POPE:2.5 CHOL:1:dev18 ff:PhosDev13" - POPC and POPE would use "PhosDev13" and CHOL would use "dev18"

Whether a "leaf" command should create a "bilayer", or standalone "upper" or "lower" leaflets can be designated using the "type" subcommand:
- "type:bilayer" (default)
- "type:upper"
- "type:lower"

An asymmetric membrane can be created by giving separate "upper" and "lower" leaflet commands:
- leaf = ["type:upper POPC:5 POPE:2 CHOL:1:dev18", "type:lower POPC:3 POPE:3 CHOL:2:dev18"]

Leaflets fill the entire xy-plane by default, but their sizes can be changed using x/y along with their center. This can be used to create patches with different starting lipid compositions. Note that the system is centered on the origin during calculations, so center values should be given relative to the box center and in ${nm}$. Each leaflet is also treated independently in calculations, so APL calculations might be slightly off due to multiple roundings.
- box = [10,10,10], leaf = ["POPC:5 CHOL:1 x:5 center:2.5:0:0", "POPC:4 CHOL:2 x:5 center:-2.5:0:0"]
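
The area covered by each patch in the example above can be worked out with a short sketch. It assumes that the box spans from -L/2 to +L/2 along each axis (origin at the box center) and that a missing x or y size means "use the full box length"; the function name `patch_bounds` is hypothetical.

```python
def patch_bounds(box_xy, size=(None, None), center=(0.0, 0.0)):
    """Return [(x_min, x_max), (y_min, y_max)] for a leaflet patch, in nm."""
    bounds = []
    for box_length, length, middle in zip(box_xy, size, center):
        # Fall back to the full box length when no size is given for this axis
        length = box_length if length is None else length
        bounds.append((middle - length / 2, middle + length / 2))
    return bounds


# "x:5 center:2.5:0:0" and "x:5 center:-2.5:0:0" in a 10 x 10 nm plane
print(patch_bounds((10, 10), size=(5, None), center=(2.5, 0)))
print(patch_bounds((10, 10), size=(5, None), center=(-2.5, 0)))
```

Under these assumptions the two patches cover x from 0 to 5 nm and from -5 to 0 nm respectively, so together they tile the full plane without overlapping.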

Leaflet APL can be set using "apl" and should be given in ${nm^2}$
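
The relation between APL and the number of lipids can be illustrated with a back-of-the-envelope sketch. The function `lipids_in_leaflet` is hypothetical, rounding down to a whole lipid is an assumption, and the protein area is simply passed in as a number rather than computed from a structure.

```python
import math


def lipids_in_leaflet(x, y, apl, protein_area=0.0):
    """Estimate the number of lipids that fit in an x by y leaflet (lengths in nm, areas in nm^2)."""
    free_area = x * y - protein_area  # area not occupied by proteins
    return math.floor(free_area / apl)


print(lipids_in_leaflet(10, 10, apl=0.6))                      # default APL
print(lipids_in_leaflet(10, 10, apl=0.536))                    # smaller APL, more lipids
print(lipids_in_leaflet(10, 10, apl=0.6, protein_area=12.5))   # protein takes up space
```

A smaller APL packs more lipids into the same area, while any area occupied by a protein reduces the count.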

The method used for converting lipid ratios to actual lipids can be chosen using "lipid_optim"
- "lipid_optim:avg_optimal" (default) - Lipids are optimized such that the mean deviation of the lipids from the expected ratios is minimized.
- "lipid_optim:abs_val" - Treats lipid ratios as actual number of lipids
- "lipid_optim:fill" - Attempts to fill the leaflet regardless of how skewed the ratios would become. Stops if a perfect ratio is reached
- "lipid_optim:force_fill" - Same as "fill" but forces the leaflet to be filled completely
- "lipid_optim:no" - Does not attempt to optimize the lipid ratios (Works identically to insane.py)
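
To give a feeling for why an optimization step is needed at all, the sketch below converts lipid ratios into whole-number lipid counts using largest-remainder rounding. This is a generic technique chosen for illustration and is not claimed to match any of the "lipid_optim" methods listed above; the function name `ratios_to_counts` is hypothetical.

```python
import math


def ratios_to_counts(ratios, total):
    """Distribute `total` lipids over lipid types according to `ratios` (largest-remainder rounding)."""
    ratio_sum = sum(ratios.values())
    ideal = {name: total * ratio / ratio_sum for name, ratio in ratios.items()}

    # Start from the rounded-down counts
    counts = {name: math.floor(value) for name, value in ideal.items()}

    # Hand out the remaining lipids to the types with the largest fractional parts
    leftover = total - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda name: ideal[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(ratios_to_counts({"POPC": 5, "POPE": 2.5, "CHOL": 1}, total=166))
```

Since a leaflet holds a whole number of lipids, the requested ratios can rarely be met exactly, and the different methods above represent different trade-offs between filling the leaflet and staying close to the requested ratios.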





## Solvations ("solv" / "-solv")

The "solv" command is used to solvate the system.

Non-ionic solvent can be added using "solv" while positive and negative ionic solvent can be added using "pos" and "neg" respectively:
- solv = "solv:W pos:NA neg:CL"

The "ff" subcommand can be used to set the default force field for the specific "solv" command. Solvent-specific ff designations are currently not available.

Different solvents and ions can be added in different ratios designated by a number after the solvent name:
- solv = "solv:W:5 solv:SW:2 pos:NA:5 pos:CA:1 neg:CL"

The molarity (atomistic molarity) of the solvent and ions can be set using "solv_molarity" and "salt_molarity" respectively
- solv = "solv:W pos:NA neg:CL solv_molarity:55.56 salt_molarity:0.15" (default values shown)
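
The conversion from a molarity to a number of molecules in a given volume follows directly from Avogadro's number, as the sketch below shows. The function name and the 1000 nm³ example volume are illustrative assumptions; in a real system the free volume is smaller than the box volume, and a coarse-grained bead that represents several molecules would reduce the number of beads accordingly.

```python
AVOGADRO = 6.02214076e23  # particles per mole
LITRES_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def molecules_from_molarity(molarity, volume_nm3):
    """Number of molecules corresponding to `molarity` (mol/L) in `volume_nm3` (nm^3)."""
    return round(molarity * AVOGADRO * LITRES_PER_NM3 * volume_nm3)


volume = 10 * 10 * 10  # hypothetical free volume in nm^3
print(molecules_from_molarity(55.56, volume))  # water molecules at the default solv_molarity
print(molecules_from_molarity(0.15, volume))   # ions of each kind at the default salt_molarity
```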


Multiple solvations can be made, for example to insert a fixed number of ligands. Note that the solvation command that fills the box with water and ions should go last. The "count" subcommand causes the molarity value to be interpreted as the actual number of solvent molecules.
- solv = ["solv:TRP count:True solv_molarity:20", "solv:W pos:NA neg:CL"]


## Import structures for solvations ("imp_struc" / "-imp_struc")

The "imp_struc" command can be used to import single-residue structures, which are added to the solvent/ion libraries.

The first subcommand should be a path to the structure file. By default the structures will be placed in the ff "imp" but it can be changed using the "ff" subcommand. The structures contained can then be used as shown:
- imp_struc = ["Extra_ions.gro ff:M3", "ligands.pdb ff:ligands", "lipids_in_solv.pdb ff:lipids"]
- solv = ["solv:LIG1 count:True solv_molarity:20 ff:ligands", "solv:W pos:NA neg:CL"]


# Example commands

from ENSANE import ENSANE
''' Very simple system

Contains a symmetric membrane composed of POPC and CHOL in a 5:1 ratio

The system is solvated using default molarities for water, sodium ions and chloride ions

Charge data is gathered from the "top_for_ENSANE.top" file

Output file names are automatically generated as "output.pdb" and "topol.top" and no log file is written
'''
### From script:
ENSANE(
    box = [10, 10, 10],
    leaf = "POPC:5 CHOL:1",
    solv = "solv:W pos:NA neg:CL",
    imp_top = "top_for_ENSANE.top",
    backup = False,
)
### From terminal:
# The \ indicates a continuation of the command on the next line
cmd = "python ENSANE.py \
-box 10 10 10 \
-leaf POPC:5 CHOL:1 \
-solv solv:W pos:NA neg:CL \
-imp_top top_for_ENSANE.top"

from ENSANE import ENSANE
''' Simple system

Contains a symmetric membrane composed of POPC and CHOL in a 4.7:1.3 ratio with an APL of 0.536
causing more lipids to be placed in the membrane than with the default APL of 0.6

A protein is inserted at the center of the system with its topology name being "PROT_1"

The system is solvated using default molarities for water, sodium ions and chloride ions

Charge data is gathered from the "top_for_ENSANE.top" file

Written files are given custom names and a log file is written
'''

### From script:
ENSANE(
    box = [10, 10, 10],
    prot = "output_martinize.pdb prot_name:PROT_1",
    leaf = "apl:0.536 POPC:4.7 CHOL:1.3",
    solv = "solv:W pos:NA neg:CL",
    imp_top = "top_for_ENSANE.top",
    out = "simple_system.pdb",
    top = "simple_topol.top",
    log = "Log_ENSANE_simple.txt",
    backup = False,
)

### From terminal:
# The \ indicates a continuation of the command on the next line
cmd = "python ENSANE.py \
-box 10 10 10 \
-prot output_martinize.pdb prot_name:PROT_1 \
-leaf apl:0.536 POPC:4.7 CHOL:1.3 \
-solv solv:W pos:NA neg:CL \
-imp_top top_for_ENSANE.top \
-out simple_system.pdb \
-top simple_topol.top \
-log Log_ENSANE_simple.txt"

from ENSANE import ENSANE
''' Asymmetric membrane system

Contains an asymmetric membrane
- The upper leaflet is composed of POPC and CHOL in a 4.7:1.3 ratio with an APL of 0.573
- The lower leaflet is composed of POPC and CHOL in a 3.8:2.8 ratio with an APL of 0.487
- The APLs will cause the lower leaflet to contain more lipids than the upper one

No protein is inserted in this system

The system is solvated using default molarities for water, sodium ions and chloride ions
- The system contains a 5:3 ratio of regular water "W" to small water "SW"
- The system contains a 5:1 ratio of sodium ions to calcium ions
- The system's only negative ion is chloride

Charge data is gathered from the "top_for_ENSANE.top" file

Written files are given custom names and a log file is written
'''


### From script:
ENSANE(
    box = [10, 10, 10],
    leaf = ["type:upper apl:0.573 POPC:4.7 CHOL:1.3", "type:lower apl:0.487 POPC:3.8 CHOL:2.8"],
    solv = "solv:W:5 solv:SW:3 pos:NA:5 pos:CA:1 neg:CL",
    imp_top = "top_for_ENSANE.top",
    out = "asym_system.pdb",
    top = "asym_topol.top",
    log = "Log_ENSANE_asym.txt",
    backup = False,
)
### From terminal:
# The \ indicates a continuation of the command on the next line
cmd = "python ENSANE.py \
-box 10 10 10 \
-leaf type:upper apl:0.573 POPC:4.7 CHOL:1.3 \
-leaf type:lower apl:0.487 POPC:3.8 CHOL:2.8 \
-solv solv:W:5 solv:SW:3 pos:NA:5 pos:CA:1 neg:CL \
-imp_top top_for_ENSANE.top \
-out asym_system.pdb \
-top asym_topol.top \
-log Log_ENSANE_asym.txt"

from ENSANE import ENSANE
''' Flooding with imported ligand system

Contains a symmetric membrane composed of POPC and CHOL in a 5:1 ratio with the default APL of 0.6

A protein is inserted at the center of the system with its topology name being "PROT_1"

The system is solvated first with 30 molecules of LIG1, after which it is solvated
using default molarities for water, sodium ions and chloride ions

Charge data is gathered from the "top_for_ENSANE.top" file

Written files are given custom names and a log file is written
'''


### From script:
ENSANE(
    box = [10, 10, 10],
    prot = "output_martinize.pdb prot_name:PROT_1",
    leaf = "POPC:5 CHOL:1",
    solv = ["solv:LIG1 count:True solv_molarity:30 ff:Ligands", "solv:W pos:NA neg:CL"],
    imp_top = "top_for_ENSANE.top",
    imp_struc = "Ligand.pdb ff:Ligands",
    out = "flooding_system.pdb",
    top = "flooding_topol.top",
    log = "Log_ENSANE_simple.txt",
    backup = False,
)

### From terminal:
# The \ indicates a continuation of the command on the next line
cmd = "python ENSANE.py \
-box 10 10 10 \
-prot output_martinize.pdb prot_name:PROT_1 \
-leaf POPC:5 CHOL:1 \
-solv solv:LIG1 count:True solv_molarity:30 ff:Ligands \
-solv solv:W pos:NA neg:CL \
-imp_top top_for_ENSANE.top \
-imp_struc Ligand.pdb ff:Ligands \
-out flooding_system.pdb \
-top flooding_topol.top \
-log Log_ENSANE_simple.txt"