# <center>Day 1, Session 1: Molecules</center>

<center>MDAnalysis Workshop - May 2021 - PRACE/SURF</center>

## Getting started with MDAnalysis

**Installing MDAnalysis**

We've already done the hard work of installing MDAnalysis into the JupyterHub environment for you.

If you want to install on your own computer, this is normally done through **pip** or **conda** (see https://www.mdanalysis.org/pages/installation_quick_start/). Note though that in this workshop, we're using the new MDAnalysis 2.0.0 beta version, which is installed using pip by:

`pip install MDAnalysis==2.0.0b0`

If you want to use the example data used here, you'll also need MDAnalysisTests:

`pip install MDAnalysisTests==2.0.0b0`

### The general structure of MDAnalysis

The two fundamental classes of MDAnalysis are the `Universe` and the `AtomGroup`.



<center><img src="imgs/mdaclasses.png" alt="mda" style="width: 1000px;"/></center>

- **The `Universe`** contains everything about a molecular dynamics system
  - Static information: atoms and their connectivities
  - Dynamic information: The trajectory
  
- The atoms in a `Universe` can be accessed through a hierarchy of containers:
  - *Atoms* can be grouped together into **an `AtomGroup`**
    - *Residues* are made up of *atoms*. They can be grouped into `ResidueGroups`
      - *Segments* are made up of *residues*. They can be grouped into `SegmentGroups`.

**A (very) basic workflow for an analysis in MDAnalysis:**

1. import MDAnalysis
2. load a Universe
3. define an atomgroup
4. collect position data
5. analyse!

# `Universe`

**The basic command for loading a universe is:**

 `u = mda.Universe(topology, trajectory)`

- The *topology* file must contain the atom information 
- The (optional) *trajectory* file(s) contains the positions of atoms with time - more on this next Session. 

Note that some files can double as both a *topology* and a *trajectory*.  

MDAnalysis supports [over 40 input file types](https://userguide.mdanalysis.org/2.0.0-dev0/formats/index.html#formats).

In [1]:
# First we import MDAnalysis
import MDAnalysis as mda

# Let's get some example data
from MDAnalysis.tests.datafiles import PSF, DCD

# and now load our universe!
u = mda.Universe(PSF, DCD)
print(u)

<Universe with 3341 atoms>


**Key properties of a `Universe`:**

- `atoms`: an `AtomGroup` containing all of the system's atoms
    - similarly, `segments` and `residues`; a `SegmentGroup` and a `ResidueGroup`, respectively
    
- Various bond and angle information, as `TopologyGroups`: `bonds`, `angles`, `dihedrals`, `impropers` (if found in the topology file)

- `trajectory`: next Session!

In [2]:
u.bonds

<TopologyGroup containing 3365 bonds>

# AtomGroups

**An `AtomGroup` is an "array" of atoms.**

We can get various properties of each atom contained in an `AtomGroup` through attributes, e.g.:

   - `names`
   - `resnames`
   - `resids` 
   - `charges`
   - `masses`

Exactly which properties you can get depends on what is read from the topology (see the [documentation](https://userguide.mdanalysis.org/2.0.0-dev0/formats/index.html#formats)).

In [3]:
ag = u.atoms
ag.names

array(['N', 'HT1', 'HT2', ..., 'C', 'OT1', 'OT2'], dtype=object)

**`ResidueGroup`s and `SegmentGroup`s work similarly.**


We can get various properties of each residue or segment through attributes, and move between the levels of the hierarchy using `atoms`, `residues` and `segments`.

In [4]:
rg = u.residues
rg.resids

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103, 104,
       105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
       118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
       131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
       144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
       157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
       170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 18

We don't usually want to work with the whole set of atoms in a trajectory. We need a way to create `AtomGroups` containing selected atoms...

# Atom selections

### But first... visualising selections

[nglview](https://github.com/nglviewer/nglview#usage) will allow us to view MDAnalysis Universes and AtomGroups inside Jupyter notebooks.

We've installed it for you on the workshop JupyterHub, but you can also install it on your own computer (*Beware: it's not always straightforward to get it working. Make sure you install nglview and launch the notebook from the same conda environment*).

In [5]:
# first, import nglview
import nglview as nv
  
# add a universe (or atomgroup)
view_u = nv.show_mdanalysis(u)

# launch the viewer
view_u



NGLWidget(max_frame=97)

## Selecting atoms to create AtomGroups

 - indexing

In [6]:
ag = u.atoms[0:2]

In [7]:
view = nv.show_mdanalysis(ag)
view

NGLWidget(max_frame=97)

**Selection strings and `select_atoms`**

We can use the `select_atoms()` method of an `AtomGroup` or `Universe` to return an `AtomGroup` based on a selection string.

There are many options for selection strings (see the [UserGuide](https://userguide.mdanalysis.org/2.0.0-dev0/selections.html)), including:

 - selection by attribute (e.g. residue name (`resname`)), including presets like `protein`
 - wildcard matching (`*`)
 - boolean operators (`and`, `or`, `not`)
 - geometric (e.g. `around`, `sphzone`, ...)
 - and more!
 
 

In [8]:
ag = u.select_atoms('protein')
view_ag = nv.show_mdanalysis(ag)
view_ag

NGLWidget(max_frame=97)

In [9]:
view_ag.add_licorice()

# Working with coordinates

**The most useful attribute of our atoms is their coordinates, available in the `positions` attribute of an `AtomGroup`.**

The positions are returned as a numpy array, which we can then readily manipulate.

There are some built-in functions based on position data, e.g. `center_of_mass()`, `center_of_geometry()`

In [10]:
pos = u.atoms.positions
print(pos)

[[ 11.736044    8.500797  -10.445281 ]
 [ 12.365119    7.839936  -10.834842 ]
 [ 12.0919485   9.441535  -10.724611 ]
 ...
 [  6.512604   18.447018   -7.134053 ]
 [  6.300186   19.363485   -7.935916 ]
 [  5.5854015  17.589624   -6.9656615]]


This is just data from one frame - in the next session, you'll learn how to work with trajectories to get data across a whole simulation.

## A summary of Lecture 1

Most simulation analysis will involve extracting position data from certain atoms.

- A `Universe` contains all information about a simulation system

- An `AtomGroup` contains information about a group of atoms

- We can use `Universe.select_atoms()` to create an `AtomGroup` containing specific atoms from a `Universe`

- Positions of atoms in an AtomGroup are accessed through `AtomGroup.positions`

### Now - on to the first tutorial!

Find the tutorial notebook at https://jupyter.lisa.surfsara.nl/jhlsrf005/hub/ - look for Day1-Session1-Practical/session1_practical.ipynb

**Remember:**
- Go at your own pace!
- Ask questions!
- Be respectful!
- Take breaks!