### This notebook introduces the fundamental objects of `MDAnalysis`:

 - the `Universe`,
 - `AtomGroup` and `Atom` objects,
 - selecting and manipulating `AtomGroup`s,
 - `Residue`s and `Segment`s, and
 - `Bond`, `Angle`, and `Dihedral` objects.

# 1. Fundamental objects

## `Universes` and `AtomGroups`

> "If you wish to make an apple pie from scratch, you must first invent the Universe." 

> ~ Carl Sagan

First, we need to import `MDAnalysis`, giving us access to all the components in its namespace:

```python
import MDAnalysis as mda
import MDAnalysisData as data
print(mda.__version__)
print(data.__version__)
```

One of the most fundamental objects in the `MDAnalysis` data model is the `Universe` object.
A `Universe` can be thought of as an interface to all the data of a simulation.
At the least, it contains all of a simulation's topology information (names, charges, masses, etc.).
It usually also includes trajectory information (positions, velocities, etc.).

In order to do anything, we need some actual molecular dynamics data to work with.
You should have downloaded a dataset package for this workshop.
Let's load a dataset from it.

To make a Universe, we need at the very least a topology file. We'll use one for a simulation system that includes adenylate kinase with the water removed:

We now have a `Universe` object. Since the topology (PSF) file we loaded contained both atom identities and bond information, the `Universe` is able to access these details.

We can access all atoms in the `Universe` through the `Universe.atoms` attribute.
This returns an `AtomGroup`, which is probably the most important class we will learn about.

We previously learnt about `numpy` arrays, which represent a series of numbers.

An `AtomGroup` is like an array of atoms. It offers access to the data of these atoms through various attributes, such as `names`, `resnames`, and `masses`.

All of these attributes of an `AtomGroup` return numpy arrays of the same length as the `AtomGroup` itself;
that is, each element corresponds to each atom in the `AtomGroup`, in order.

In general, `MDAnalysis` will try to extract as much information as possible from the files given to `Universe`.

## Working with individual atoms

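By indexing an `AtomGroup` with a single integer, we can access an individual `Atom` object.
These `Atom` objects have singular versions of the various attributes of `AtomGroup`s (for example, `name` instead of `names`).

In general, working with individual `Atom` objects is discouraged. Looping over atoms one at a time in Python is inefficient and leads to poor performance compared with using the array attributes of an `AtomGroup`.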

# 2. Selecting atoms

It is also rare that we want to operate on all atoms in the system!

`MDAnalysis` offers a few different ways to select atoms.
In this section we will go over the most useful ones.

## 2.1 Numpy style selections

As previously mentioned, an `AtomGroup` is like an array of atoms,
and therefore we can slice it exactly like we would slice a `numpy` array.

### Fancy indexing

The simplest way to select specific atoms is fancy indexing. You can specify the atom indices in a list

or as a range (slice):

### Boolean indexing

You can also create an array with `True`/`False` values of the same length as the `AtomGroup`. Every atom for which the array is set to `True` will be selected.

We can create such a boolean array by comparing a `numpy` array attribute against a value:

or shorter:

## 2.2 Selection Strings and `select_atoms`

We've already seen that complex selections can be performed on `AtomGroups` using numpy-style indexing.
However, `MDAnalysis` also features a CHARMM-style atom selection mechanism that is often more convenient.
We can consult the docstring for `select_atoms` to see the available selection keywords:

```python
u.atoms.select_atoms?
```

Boolean selections work well enough for picking atoms out of `AtomGroup`s. However, the selection language makes more complex selections possible with less effort.

For example, we can create selections of all atoms in a particular residue type:

For name-like selections, we can also use the `*` symbol as a wildcard.

For example, `name OD*` selects `OD1`, `OD2`, `OD3`, etc.

As a shortcut, multiple values can be given, and these will be implicitly OR'd together.
For example, to select all atoms named NZ or NH* in residues named LYS or ARG:

### Geometric selections

The `select_atoms` method also has various geometric keywords that make selecting atoms based on geometric criteria much easier.

Suppose we want only the residues that are involved in salt bridges. We can pass existing `AtomGroup`s into a selection string with the `group` keyword. The `around` operator then keeps only atoms within 4 Å of another group. At the end, the `.residues` attribute gives us the full residues back.

## 2.3 Set operations

`AtomGroup`s can also be combined using `&` for "and" and `|` for "or":

These two selections are identical:

We can also concatenate `AtomGroup`s with `+`. Unlike `|`, this preserves order and keeps duplicate atoms.

By design, an `AtomGroup` can have repeats of the same atom, for example through this selection:

The `unique` property will return a version of the `AtomGroup` with only one of each Atom:

### Challenge:

- #### count the number of glycine residues in the protein
- #### select all nitrogen atoms within 5.0 Å of a carbon atom
- #### select all oxygens that are bonded to an alpha carbon

# 3. `ResidueGroups` and `SegmentGroups`

The `Universe` also provides higher-order topology objects, including `ResidueGroups` and `SegmentGroups`. We can access all residues in the `Universe` with:

And all segments with:

`ResidueGroups` and `SegmentGroups` also behave similarly to `AtomGroups`, with many of their methods returning `numpy` arrays with each element corresponding to a single residue or segment, respectively.

We can more easily get the number of glycine residues using a `ResidueGroup`.

or perhaps more Pythonic:

# 4. Accessing coordinates

The most important attribute of your atoms is undoubtedly their positions!

Again, the position information is made available via an `AtomGroup` in the `.positions` attribute:

This returns a `numpy` array, which can be manipulated as you have previously learnt.

### Challenge: 

- #### calculate the center of geometry of the C$_\alpha$ atoms
- #### select all atoms that are below the plane x=4.0

## 4.1 Trajectory manipulation

Only a single frame of trajectory information is accessible from an `AtomGroup` at any one time.
When a `Universe` is created, the first frame of the trajectory is loaded.

To access data from a different frame, we must manipulate the `Universe.trajectory` object.

By indexing the `.trajectory` attribute of a `Universe` with a frame number, we change the currently loaded frame:

After we have changed the currently loaded frame, the data we access from an `AtomGroup` will now correspond to the data from that frame:

A more common pattern for moving through the trajectory data is to slice the `.trajectory` object inside a `for` loop:

The trajectory can be sliced in any way that a `numpy` array or `AtomGroup` can be sliced, for example:

## Challenges

- #### How could you iterate through the trajectory backwards?


- #### How could you slice the trajectory to correspond to when the system time is after 600 ps?

## 4.2 The `Timestep` object

When we index the trajectory with a single frame number, or iterate over it, we get a `Timestep` object.
This object represents the data in the currently loaded trajectory frame.

It has some useful attributes such as:

# 5. Bonds, angles, and dihedrals

We can also access connectivity information between atoms, such as bonds, angles, and dihedrals:

Want the actual value?

These groups work much like `AtomGroup`s: they can be sliced, and indexing them gives individual bonds, angles, or dihedrals.

## Special Functions

The `split` method returns a list of `AtomGroup`s, one for each unit at the specified level. For example, `u.atoms.split('residue')` returns one `AtomGroup` per residue.

Another method is `groupby`, which groups atoms by a topology attribute (*resnames*, *names*, *masses*, ...). `groupby` returns a dictionary whose keys are the unique values of the given attribute. Its values are `AtomGroup`s containing the atoms that have each value.

Multiple topology attributes can also be given, in which case the keys are tuples of values.