# This notebook details the production of an analysis pipeline for molecular dynamics simulations carried out with OpenMM

The flow of the analysis is broken down and handled by classes as follows: <br>

1. **master_anal** - this class contains information about an entire simulation carried out. <br>

    - Dictionary of all files output by the simulation, organised by simulation step (i.e. thermal ramping files, equilibration files and minimization files are grouped together). These files can then be looked up and MDAnalysis universes created for each part of the simulation with ease. The class also holds information about the residues in the simulation: a list of unique residue codes and a dictionary matching residue IDs to polymers. This is necessary because polymers made with AMBER are built from a series of residues, so a 10-mer is actually made up of 10 residues. This dictionary makes selecting individual polymers much easier! <br>

2. **universe** - this class creates an MDAnalysis universe that we can pass to the analysis methods <br>

3. **anal_methods** - this class contains the MDAnalysis analysis methods

  
   

The first thing to do is set up the manager class - something that is consistent across all of these notebooks!

In [1]:
from modules.sw_directories import *
from modules.sw_basic_functions import *
from modules.sw_analysis import *
import os

manager = SnippetSimManage(os.getcwd())





The system I will be analysing is "3HB_10_polymer_5_5_array_crystal"; however, you can use any system name that has simulations associated with it. The first step is to locate the required simulation files and return their paths. The function below can return filepaths to multiple directories if the simulation has been carried out multiple times.

In [2]:
system_name = "3HB_10_polymer_5_5_array_crystal"
base_molecule_name = "3HB_10_polymer"
sim_avail = manager.simulations_avail(system_name)

Output contains paths to simulation directories.


Print out the paths of the simulation directories. This is a list with one entry per unique simulation run.

In [3]:
sim_avail

['/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036']
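When a system has been simulated more than once, the list holds one directory per run. Assuming the directory names follow the `YYYY-MM-DD_HHMMSS` timestamp pattern seen above, a minimal sketch for picking the most recent run could look like this. `latest_run` is a hypothetical helper, not part of the `modules` package, and the second path is invented purely for illustration.

```python
from datetime import datetime
from pathlib import Path


def latest_run(sim_dirs):
    """Return the directory whose name encodes the most recent timestamp.

    Assumes each directory name looks like '2024-10-04_174036'.
    """
    def stamp(path):
        return datetime.strptime(Path(path).name, "%Y-%m-%d_%H%M%S")

    return max(sim_dirs, key=stamp)


# Hypothetical example paths (the second run is made up for this sketch)
runs = [
    "systems/example_system/2024-10-04_174036",
    "systems/example_system/2024-10-05_091500",
]
print(latest_run(runs))
```

With a single run, as in this notebook, `sim_avail[0]` is of course equivalent.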

Everything for the analysis is contained within this path with the exception of the topology file which can be returned in different ways using the manager class. <br>

Now we want to set up *master_anal* class which will contain the steps of the simulation we want to analyse. This class requires: <br>

    - Manager object
    - System name (i.e. "3HB_10_polymer_5_5_array_crystal")
    - Base molecule name (i.e. "3HB_10_polymer")
    - Simulation directory (a directory from the list returned by "manager.simulations_avail(system_name)")
    - Polymer length (i.e. 10)

We can set this up with the following line: *masterclass = master_anal(manager, system_name, base_molecule_name, sim_avail[i], poly_length)*

This class exposes many attributes, each of which can be accessed with *masterclass.attribute*: <br>

    - masterclass.manager : *returns the manager object; it is passed to the class so that its functions can be used within the methods implemented here*
    - masterclass.system_name : *returns the system name* 
    - masterclass.topology_file : *returns the filepath to the topology file for the given system name* 
    - masterclass.simulation_directory : *returns the path to the directory where all simulation files are located* 
    - masterclass.simulation_files : *returns a dictionary where simulation files are grouped based on their simulation stage* 
    - masterclass.min_filepath : *returns the filepath to the minimized pdb at the first stage of the simulation* 
    - masterclass.simulation_stages : *returns a list of the stages the simulation carried out*
    - masterclass.base_pdb : *returns a filepath to the pdb file of the base polymer*
    - masterclass.base_poly_vol : *returns the volume of the base polymer*
    - masterclass.system_vol : *returns the volume of the system (i.e. base_poly_vol * number_of_polymers)*

*Poly_length* can be omitted when this class is initialised, but if it is specified we gain access to these attributes: <br>

    - masterclass.poly_length : *returns length of the polymers in the system*
    - masterclass.residue_codes : *returns residue codes of the polymer units* 
    - masterclass.poly_sel_dict : *returns resids of the individual polymers so we can get the values of each*
    - masterclass.number_of_polymers : *returns the number of polymers in the system*

Examples of each of these will be included below.

# Set up masterclass for simulation analysis

The *master_anal* class doesn't do any analysis itself but contains information that is very useful for analysis. We need to pass the manager, the system name, the base molecule name, the simulation folder we want to analyse and the length of the polymers (if this is applicable). Below is an example of each attribute described above.

In [4]:
# sim_avail[x] comes from the manager class and is the full filepath to a directory containing simulation outputs
masterclass = master_anal(manager, system_name, base_molecule_name, sim_avail[0], 10)

In [30]:
# System name
masterclass.system_name

'3HB_10_polymer_5_5_array_crystal'

In [5]:
# Topology file
masterclass.topology_file

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/3HB_10_polymer_5_5_array_crystal.prmtop'

In [31]:
# Simulation directory
masterclass.simulation_directory

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036'

In [6]:
# Simulation files
masterclass.simulation_files

defaultdict(list,
            {'1_atm': ['3HB_10_polymer_5_5_array_crystal_1_atm_2024-10-04_174036.pdb',
              '3HB_10_polymer_5_5_array_crystal_1_atm_2024-10-04_174036.dcd',
              '3HB_10_polymer_5_5_array_crystal_1_atm_2024-10-04_174036.txt'],
             'temp_ramp_heat': ['3HB_10_polymer_5_5_array_crystal_temp_ramp_heat_300_700_2024-10-04_174036.txt',
              '3HB_10_polymer_5_5_array_crystal_temp_ramp_heat_300_700_2024-10-04_174036.pdb',
              '3HB_10_polymer_5_5_array_crystal_temp_ramp_heat_300_700_2024-10-04_174036.dcd'],
             'min': ['min_3HB_10_polymer_5_5_array_crystal.pdb'],
             'temp_ramp_cool': ['3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.dcd',
              '3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.txt',
              '3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.pdb']})
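The grouping shown above can be sketched as follows. This is not the actual `master_anal` implementation; `group_files_by_stage` and `KNOWN_STAGES` are hypothetical names, and the sketch assumes the stage labels are known in advance and appear in each filename delimited by underscores.

```python
from collections import defaultdict

# Assumed stage labels, taken from the keys of the output above
KNOWN_STAGES = ["temp_ramp_heat", "temp_ramp_cool", "1_atm", "min"]


def group_files_by_stage(filenames, stages=KNOWN_STAGES):
    """Group filenames into a dict keyed by the stage label they contain."""
    grouped = defaultdict(list)
    for name in filenames:
        for stage in stages:
            # The label is either a prefix ('min_...') or sits between underscores
            if name.startswith(f"{stage}_") or f"_{stage}_" in name:
                grouped[stage].append(name)
                break
    return grouped


files = [
    "3HB_10_polymer_5_5_array_crystal_1_atm_2024-10-04_174036.pdb",
    "3HB_10_polymer_5_5_array_crystal_1_atm_2024-10-04_174036.dcd",
    "3HB_10_polymer_5_5_array_crystal_temp_ramp_heat_300_700_2024-10-04_174036.dcd",
    "min_3HB_10_polymer_5_5_array_crystal.pdb",
]
grouped = group_files_by_stage(files)
print(list(grouped))  # the stage labels, in order of first appearance
```

A `defaultdict(list)` avoids having to check whether a stage key already exists before appending to it.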

In [32]:
# Filepath to minimized pdb
masterclass.min_filepath

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036/min_3HB_10_polymer_5_5_array_crystal.pdb'

In [34]:
# Simulations stages
masterclass.simulation_stages

['1_atm', 'temp_ramp_heat', 'min', 'temp_ramp_cool']

In [36]:
# Filepath to base polymer pdb
masterclass.base_pdb

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer/3HB_10_polymer.pdb'

In [38]:
# Volume of the base polymer in angstroms cubed
masterclass.base_poly_vol

863.8880000000003

In [40]:
# Volume of all the polymers in the system
masterclass.system_vol

21597.200000000008
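As a quick sanity check, the system volume is the base polymer volume multiplied by the number of polymers (25 for this system). The conversion to nm^3 is an addition for illustration only, assuming the volumes are in cubic angstroms as the comment above states; it makes the value easier to compare with box volumes reported in nm^3.

```python
base_poly_vol = 863.8880000000003  # cubic angstroms, value printed above
number_of_polymers = 25            # number of polymers in this system

# Volume occupied by all polymers in the system
system_vol = base_poly_vol * number_of_polymers

# 1 nm = 10 angstroms, so 1 nm^3 = 1000 cubic angstroms
system_vol_nm3 = system_vol / 1000.0

print(system_vol, system_vol_nm3)
```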

In [41]:
# Length of the polymers
masterclass.poly_length

10

In [7]:
# Residue codes of the polymers
masterclass.residue_codes

{'hAD', 'mAD', 'tAD'}

In [43]:
# Resids for the different polymers we can select
masterclass.poly_sel_dict

{'Polymer_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 'Polymer_2': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 'Polymer_3': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 'Polymer_4': [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 'Polymer_5': [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 'Polymer_6': [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 'Polymer_7': [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 'Polymer_8': [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 'Polymer_9': [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 'Polymer_10': [91, 92, 93, 94, 95, 96, 97, 98, 99, 100],
 'Polymer_11': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
 'Polymer_12': [111, 112, 113, 114, 115, 116, 117, 118, 119, 120],
 'Polymer_13': [121, 122, 123, 124, 125, 126, 127, 128, 129, 130],
 'Polymer_14': [131, 132, 133, 134, 135, 136, 137, 138, 139, 140],
 'Polymer_15': [141, 142, 143, 144, 145, 146, 147, 148, 149, 150],
 'Polymer_16': [151, 152, 153, 154, 155, 156, 157, 158, 159, 160],
 'Polymer_17': [161, 162, 163, 164, 165, 166,

In [42]:
masterclass.number_of_polymers

25
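The polymer selection dictionary follows a simple pattern: residue IDs are consecutive and 1-based, and each polymer owns `poly_length` of them. A minimal sketch that reproduces this layout is given below. `build_poly_sel_dict` and `resid_selection` are hypothetical helpers, not functions from the `modules` package.

```python
def build_poly_sel_dict(number_of_polymers, poly_length):
    """Map 'Polymer_n' to its consecutive, 1-based residue IDs."""
    return {
        f"Polymer_{i + 1}": list(range(i * poly_length + 1, (i + 1) * poly_length + 1))
        for i in range(number_of_polymers)
    }


def resid_selection(resids):
    """Build a selection string of the form 'resid 1 2 3'."""
    return "resid " + " ".join(str(r) for r in resids)


poly_sel_dict = build_poly_sel_dict(25, 10)
print(poly_sel_dict["Polymer_2"])
print(resid_selection(poly_sel_dict["Polymer_1"]))
```

A string built this way uses the MDAnalysis `resid` selection keyword, so it could be passed to `select_atoms` on an MDAnalysis universe to pick out a single polymer.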

# Universe class

Now we can set up a universe for our simulation. We want to pass the masterclass class to **Universe** alongside a string coming from **masterclass.simulation_stages** as follows: <br>

universe = Universe(masterclass, '1_atm') <br>

This will create an MDAnalysis universe for a specific part of the simulation using the .pdb trajectory. We can also specify the trajectory format we want to use (i.e. '.pdb' or '.dcd'); if nothing is specified, the '.pdb' trajectory is used by default. <br>

universe = Universe(masterclass, '1_atm', '.dcd') <br>

In [45]:
# Pass a simulation stage and file format, the simulation stage lets us know what trajectory file we want to access
universe = Universe(masterclass, 'temp_ramp_cool', ".dcd")
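To illustrate how a stage name and a file format can be resolved to a single trajectory file, here is a minimal sketch. It is an assumption about the general approach, not the actual code of the **Universe** class, and `pick_trajectory` is a hypothetical helper.

```python
import os


def pick_trajectory(simulation_directory, simulation_files, stage, traj_format=".pdb"):
    """Return the full path of the trajectory file for one simulation stage."""
    matches = [f for f in simulation_files[stage] if f.endswith(traj_format)]
    if not matches:
        raise FileNotFoundError(f"No {traj_format} file found for stage {stage!r}")
    return os.path.join(simulation_directory, matches[0])


# Values copied from the masterclass attributes shown earlier
sim_dir = "/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036"
sim_files = {
    "temp_ramp_cool": [
        "3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.dcd",
        "3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.txt",
        "3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.pdb",
    ]
}
trajectory = pick_trajectory(sim_dir, sim_files, "temp_ramp_cool", ".dcd")
print(trajectory)
```

A path resolved like this, together with the topology file, is what `MDAnalysis.Universe(topology, trajectory)` expects as its first two arguments.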



The **universe** object contains many attributes. These are in addition to all of the attributes from the **master_anal** class, which are passed down to the **universe** object. The attributes in the universe class are: <br>

    - universe.traj_format : *returns the format of the trajectory file - supports .pdb and .dcd*
    - universe.sim_stage : *returns the simulation stage being analysed*
    - universe.masterclass : *the masterclass instance we have already initialised*
    - universe.topology : *the topology file*
    - universe.trajectory : *the trajectory file*
    - universe.output_filename : *the output path prefix for any graphs generated throughout the analysis - this is tied to a specific simulation directory*
    - universe.universe : *the classic MDAnalysis universe object*
    - universe.data_file : *filepath to the data file output for the given stage of a simulation*
    - universe.data : *the data from the data file, loaded into a pandas dataframe*
    

In [46]:
# Trajectory format
universe.traj_format

'.dcd'

In [47]:
# The simulation stage we are analysing
universe.sim_stage

'temp_ramp_cool'

In [48]:
# The original masterclass object
universe.masterclass

<modules.sw_analysis.master_anal at 0x7f4cebe57350>

In [49]:
# Topology file
universe.topology

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/3HB_10_polymer_5_5_array_crystal.prmtop'

In [50]:
# Trajectory file
universe.trajectory

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036/3HB_10_polymer_5_5_array_crystal_temp_ramp_cool_300_700_2024-10-04_174036.dcd'

In [15]:
# Path where any outputted graphs will end up
universe.output_filename

'/home/dan/polymersimulator/pdb_files/systems/3HB_10_polymer_5_5_array_crystal/2024-10-04_174036/3HB_10_polymer_5_5_array_crystal_temp_ramp_cool'

In [16]:
# MDAnalysis universe object
universe.universe

<Universe with 3075 atoms>

In [17]:
# Dataframe containing all the data from this stage of the simulation
universe.data

| Unnamed: 0 | #"Progress (%)" | Step | Time (ps) | Potential Energy (kJ/mole) | Kinetic Energy (kJ/mole) | Total Energy (kJ/mole) | Temperature (K) | Box Volume (nm^3) | Density (g/mL) | Speed (ns/day) | Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0% | 1000 | 1.000000 | -25115.301993 | 18640.853699 | -6474.448294 | 584.457605 | 38.765671 | 0.941194 | 0.00 | 0.000231 |
| 1 | 0.0% | 2000 | 2.000000 | -22984.163637 | 20445.845135 | -2538.318502 | 641.050559 | 38.884793 | 0.938311 | 6.11 | 14.141430 |
| 2 | 0.0% | 3000 | 3.000000 | -22207.232071 | 21687.085684 | -520.146388 | 679.967901 | 39.295464 | 0.928505 | 6.12 | 28.242461 |
| 3 | 0.0% | 4000 | 4.000000 | -21275.433820 | 22604.436956 | 1329.003136 | 708.730153 | 38.514641 | 0.947329 | 6.12 | 42.335292 |
| 4 | 0.1% | 5000 | 5.000000 | -21498.174381 | 22345.699989 | 847.525608 | 700.617821 | 39.464460 | 0.924529 | 6.13 | 56.420394 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9994 | 100.0% | 9995000 | 9995.000002 | -36960.375464 | 9719.748180 | -27240.627284 | 304.748958 | 31.623401 | 1.153767 | 6.05 | 142759.173750 |
| 9995 | 100.0% | 9996000 | 9996.000002 | -37217.966354 | 9361.407441 | -27856.558913 | 293.513691 | 31.813964 | 1.146856 | 6.05 | 142773.638903 |
| 9996 | 100.0% | 9997000 | 9997.000002 | -37140.970462 | 9574.010816 | -27566.959646 | 300.179569 | 31.824723 | 1.146468 | 6.05 | 142788.077985 |
| 9997 | 100.0% | 9998000 | 9998.000002 | -37093.903853 | 9411.511642 | -27682.392211 | 295.084638 | 32.051156 | 1.138369 | 6.05 | 142802.541960 |
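The first column header carries a leading `#` and stray quotes, which makes it awkward to index. The sketch below shows one way such a data file could be loaded and tidied with pandas. It assumes a comma-separated file with a header row, and uses only the first three rows and a subset of the columns from the output above, embedded as text so the example is self-contained.

```python
import io

import pandas as pd

# First three rows of the output above, reduced to a few columns for brevity
csv_text = '''#"Progress (%)",Step,Time (ps),Temperature (K),Density (g/mL)
0.0%,1000,1.0,584.457605,0.941194
0.0%,2000,2.0,641.050559,0.938311
0.0%,3000,3.0,679.967901,0.928505
'''

data = pd.read_csv(io.StringIO(csv_text))

# Strip the leading '#' and the stray quotes from the header names
data.columns = [col.strip('#"') for col in data.columns]

mean_temperature = data["Temperature (K)"].mean()
max_density = data["Density (g/mL)"].max()
print(data.columns.tolist())
print(mean_temperature, max_density)
```

The same column cleaning could be applied to `universe.data` before plotting quantities such as temperature or density against time.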
