# Session 4: Advanced Universe Creation and Attributes

<a id='trajanalysis'></a>

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="right"/></a>

Authors: 

- Dr Micaela Matta - [@micaela-matta](https://github.com/micaela-matta)
- Dr Richard Gowers - [@richardjgowers](https://github.com/richardjgowers) 

This notebook is adapted from materials developed for the [2018 Workshop/Hackathon](https://github.com/MDAnalysis/WorkshopHackathon2018).

## Learning Outcomes


This notebook contains examples of more complicated `Universe` construction.


#### Additional resources

 - During the workshop, feel free to ask questions at any time
 - For more on how to use MDAnalysis, see the [User Guide](https://userguide.mdanalysis.org/2.0.0-dev0/) and [documentation](https://docs.mdanalysis.org/2.0.0-dev0/)
 - Ask questions on the [GitHub Discussions forum](https://github.com/MDAnalysis/mdanalysis/discussions) or on [Discord](https://discord.gg/fXTSfDJyxE)
 - Report bugs on [GitHub](https://github.com/MDAnalysis/mdanalysis/issues?)


# Google Colab package installs

This installs the necessary packages for Google Colab. Please only run these if you are using Colab.

In [None]:
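# NBVAL_SKIP
!pip install condacolab
import condacolab
# Installs conda into the Colab runtime; the kernel restarts once this finishes
condacolab.install()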


In [None]:
# NBVAL_SKIP
import condacolab
condacolab.check()
!mamba install -c conda-forge mdanalysis mdanalysistests mdanalysisdata nglview rdkit

In [None]:
# NBVAL_SKIP
# enable third party jupyter widgets
from google.colab import output
output.enable_custom_widget_manager()

In [None]:
import warnings
warnings.filterwarnings("ignore") 

import MDAnalysis as mda
import MDAnalysisData as data


## 1. `transfer_to_memory`

The MDAnalysis data model only loads a single frame of trajectory data into memory at any point.  This is because loading an entire trajectory at once would require a large amount of memory.

By using the `in_memory` keyword at `Universe` creation (or by calling the `Universe.transfer_to_memory()` method afterwards),
the entire trajectory can be read into memory.
This requires significantly more memory on the workstation,
typically a similar amount to the file size of the trajectory.
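
As a rough back-of-the-envelope check before loading a trajectory into memory, the footprint of the coordinate array can be estimated from the number of atoms and frames. The sketch below assumes coordinates are held as 32-bit floats (4 bytes per value); the helper name and the system size are hypothetical and chosen only for illustration.

```python
def estimate_memory_bytes(n_atoms, n_frames, bytes_per_value=4):
    """Approximate size of an (n_frames, n_atoms, 3) coordinate array."""
    return n_frames * n_atoms * 3 * bytes_per_value


# Hypothetical system: 10,000 atoms and 5,000 frames
size_bytes = estimate_memory_bytes(n_atoms=10_000, n_frames=5_000)
print(f"{size_bytes / 1024**2:.1f} MiB")
```

Velocities or forces, if present, would add to this estimate.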

In [None]:
adk = data.datasets.fetch_adk_equilibrium()

In [None]:
regular_u = mda.Universe(adk['topology'], adk['trajectory'])

%timeit [ts.frame for ts in regular_u.trajectory]

Iterating through a trajectory can be much faster without having to read from the trajectory file for each frame.

In [None]:
memory_u = mda.Universe(adk['topology'], adk['trajectory'], in_memory=True)

%timeit [ts.frame for ts in memory_u.trajectory]

Transferring a trajectory to memory converts the `Universe.trajectory` object to a `MemoryReader`.
One notable difference with this `Reader` is that any changes made to atom positions are permanent!
With a file-based reader, such changes are discarded as soon as another frame is loaded.
This can be useful when you want to apply a coordinate transformation (e.g. aligning the structure) and then analyse the result afterwards.

In [None]:
print(memory_u.trajectory)

## 2. `guess_bonds`

By default, bond information is only present in a `Universe` if the topology file provides it.
This means that attributes which rely on bonds, such as `.fragments`, will not work.

In [None]:
nhaa = data.datasets.fetch_nhaa_equilibrium()

nhaa_u = mda.Universe(nhaa['topology'])

nhaa_u.atoms.fragments

It is possible to guess bonds based on the separations between atoms.
A bond is guessed when the distance between two atoms ($d_{ij}$) is no larger than the sum of their van der Waals radii ($r_i$ and $r_j$) multiplied by a fudge factor ($f = 0.55$ by default).

$$ d_{ij} \leq f \cdot (r_i + r_j) $$

Some van der Waals radii are built into `MDAnalysis`; any missing radii can be supplied via the `vdwradii` keyword:

In [None]:
nhaa_u = mda.Universe(nhaa['topology'], guess_bonds=True, vdwradii={'CL': 2.0, 'NA': 2.0})

In [None]:
nhaa_u.atoms.fragments
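
The distance criterion behind bond guessing can be sketched in plain Python. This is only an illustration of the inequality above, not the MDAnalysis implementation: the function name, the radii, the coordinates and the explicitly passed fudge factor are all hypothetical values chosen for the example.

```python
import math
from itertools import combinations

# Hypothetical radii (in Angstrom) and coordinates, for illustration only
radii = {"O": 1.5, "H": 1.1}
atoms = [
    ("O", (0.00, 0.00, 0.0)),
    ("H", (0.96, 0.00, 0.0)),
    ("H", (-0.24, 0.93, 0.0)),
    ("O", (3.00, 0.00, 0.0)),
]


def guess_bonds_naive(atoms, radii, fudge_factor):
    """Return index pairs (i, j) with d_ij <= fudge_factor * (r_i + r_j)."""
    bonds = []
    for (i, (type_i, pos_i)), (j, (type_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = fudge_factor * (radii[type_i] + radii[type_j])
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


bonds = guess_bonds_naive(atoms, radii, fudge_factor=0.55)
print(bonds)
```

This naive version checks every pair of atoms, so its cost grows quadratically with system size and it ignores periodic boundaries; it is meant only to make the criterion concrete.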

## 3. ChainReader

MD trajectories are often created in a series of discrete simulations.
When a list of trajectory filenames is supplied at `Universe` creation,
the files are read in sequence by the `ChainReader` class and presented as a single continuous trajectory.

In [None]:
adk_dims = data.datasets.fetch_adk_transitions_DIMS()

print(adk_dims['trajectories'][:5])

In [None]:
chain_u = mda.Universe(adk_dims['topology'], adk_dims['trajectories'])

In [None]:
print(chain_u.trajectory)
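
Conceptually, reading files in sequence means that a frame index in the combined trajectory has to be mapped to a particular file and a frame within that file. The sketch below illustrates that bookkeeping in plain Python; it is not taken from the `ChainReader` source, and the function name and frame counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(global_frame, frames_per_file):
    """Map a frame index in the joined trajectory to (file index, local frame)."""
    if not 0 <= global_frame < sum(frames_per_file):
        raise IndexError(f"frame {global_frame} is out of range")
    # Cumulative frame counts mark where each file ends in the joined trajectory
    ends = list(accumulate(frames_per_file))
    file_index = bisect_right(ends, global_frame)
    start = ends[file_index - 1] if file_index > 0 else 0
    return file_index, global_frame - start


# Hypothetical frame counts for three consecutive trajectory files
frames_per_file = [98, 102, 105]
print(locate_frame(0, frames_per_file))
print(locate_frame(98, frames_per_file))
print(locate_frame(304, frames_per_file))
```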

## 4. `fetch_mmtf`

You can load structures from the Protein Data Bank using the `fetch_mmtf` function.
This downloads the `mmtf` data for the given PDB ID and creates a `Universe` from it:

In [None]:
u = mda.fetch_mmtf('5YVL')

print(u)

## 5. Creating new systems with MDAnalysis

Whilst `MDAnalysis` is designed for reading pre-existing simulation files, there are also some features which allow new systems to be constructed.

### Universe.empty and adding new attributes

The `Universe` object can also be constructed from the `Universe.empty` method, which is similar to `np.zeros`.

In [None]:

mda.Universe.empty?

Here we create a 21-atom `Universe` with 7 residues and a trajectory attached. The positions of all atoms will initially be zero.

In [None]:
u = mda.Universe.empty(n_atoms=21, n_residues=7,
                       trajectory=True)

In [None]:
print(u.atoms)
print(u.residues)

In [None]:
# positions belong to the atoms, not to the Universe itself
print(u.atoms.positions)

In [None]:
# assign three consecutive atoms to each residue
for i, res in enumerate(u.residues):
    u.atoms[i * 3: (i + 1) * 3].residues = res
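
Residue membership can also be expressed as one residue index per atom. The following is a minimal sketch, in plain Python, of building such an index list for 21 atoms split evenly into 7 residues; the variable names are illustrative.

```python
n_atoms, n_residues = 21, 7
atoms_per_residue = n_atoms // n_residues  # 3 atoms in each residue

# One entry per atom: atoms 0-2 map to residue 0, atoms 3-5 to residue 1, and so on
atom_resindex = [i // atoms_per_residue for i in range(n_atoms)]
print(atom_resindex)
```

A list like this can be passed to `Universe.empty` through its `atom_resindex` keyword, so that atoms are placed in the intended residues when the `Universe` is created.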

We can then add various topology attributes to these atoms:

In [None]:
u.add_TopologyAttr('masses', values=[10.0] * 21)
u.add_TopologyAttr('names', values=['A'] * 21)
u.add_TopologyAttr('types', values=['Ca'] * 21)
u.add_TopologyAttr('resids', values=range(7))


And finally we can write this `Universe` out to a file:

In [None]:
u.atoms.write('new.gro')