# Note
* pytraj/cpptraj use Amber masks for atom selection.
* To stay compatible with cpptraj output, pytraj uses two types of indexing: 0-based and 1-based. Whenever you index with an integer, pytraj uses a 0-based index; whenever you index with a `string` (an atom mask), pytraj uses a 1-based index.

In [1]:
from pytraj import io

# load a Topology file for demonstration.
# if you downloaded `pytraj` from GitHub, you can find the top file below in $PYTRAJHOME/tests/data/
top = io.load("../tests/data/Tc5b.top")
traj = io.load("../tests/data/md1_prod.Tc5b.x", top)

### Two types of indexing in pytraj

In [2]:
# 0-based indexing whenever using integer
top[0]

<N-atom, resnum=0, n_bonds=4>

In [3]:
# 1-based indexing whenever using `string` for atom mask
top['@1']

<N-atom, resnum=0, n_bonds=4>

## Atom Mask Selection in AMBER

(from Amber14 manual: http://ambermd.org/doc12/Amber14.pdf)

#### Amber Masks
A "mask" is a notation which selects atoms or residues for special treatment. A frequent usage is fixing or
tethering selected atoms or residues during minimization or molecular dynamics.
The following lines are partially copied from the original AMBER documentation. For more details, refer to the
entire section of that documentation describing the ambmask utility.
The "mask" selection expression is composed of "elementary selections". *These start with ":" to select by
residues, or "@" to select by atoms.* Residues can be selected by numbers (given as numbers separated by commas,
or as ranges separated by a dash) or by names (given as a list of residue names separated by commas). The same
holds true for atom selections by atom numbers or atom names. In addition, atoms can be selected by AMBER
atom type, in which case "@" must be immediately followed by "%". The notation ":*" means all residues and
"@*" means all atoms. The following examples show the usage of this syntax.

Residue Number List Examples
```
:1-10 = "residues 1 to 10"
:1,3,5 = "residues 1, 3, and 5"
:1-3,5,7-9 = "residues 1 to 3 and residue 5 and residues 7 to 9"
```
Residue Name List Examples
```
:LYS = "all lysine residues"
:ARG,ALA,GLY = "all arginine and alanine and glycine residues"
```
Atom Number List Examples

Note that these masks use the actual sequential numbers of atoms in the file.
This is tricky and a serious source of error. You must know these numbers correctly. Using the atom numbers of
a PDB file written out by an AMBER tool is an appropriate way to avoid pitfalls. Do not use the original atom
numbers from the raw PDB file you started with.
 
@12,17 = "atoms 12 and 17"
@54-85 = "all atoms from 54 to 85"
@12,54-85,90 = "atom 12 and all atoms from 54 to 85 and atom 90"
 
Atom Name List Examples  
@CA = "all atoms with the name CA (i.e., all C-alpha atoms)"  
@CA,C,O,N,H = "all atoms with names CA or C or O or N or H (i.e., the entire protein backbone)"
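
The number-list syntax in the examples above (comma-separated numbers and inclusive ranges, counted from 1) can be sketched in plain Python. The helper `number_list_to_indices` is hypothetical and is not part of pytraj or cpptraj; it only illustrates how 1-based mask numbers relate to the 0-based integer indices that pytraj expects.

```python
def number_list_to_indices(number_list):
    """Expand an Amber-style number list such as '1-3,5,7-9' into 0-based indices.

    Hypothetical helper for illustration only; it is not part of pytraj.
    """
    indices = []
    for part in number_list.split(","):
        if "-" in part:
            start, stop = part.split("-")
            # mask numbers are 1-based and ranges include both ends
            indices.extend(range(int(start) - 1, int(stop)))
        else:
            indices.append(int(part) - 1)
    return indices


print(number_list_to_indices("1-3,5,7-9"))  # [0, 1, 2, 4, 6, 7, 8]
print(number_list_to_indices("12,17"))      # [11, 16]
```

Under this reading, the mask `@12,17` and the integer list `[11, 16]` refer to the same two atoms.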
 

### Let's try some examples

``` python
    >>> top[':1-10'] # a list of atoms in residues 1 to 10
    >>> top[':1,3,5'] # a list of atoms in residues 1, 3, and 5 (index starts from 1 when using a string index)
    >>> top[[0, 2, 4]] # a list of atoms with indices 0, 2, 4 (index starts from 0 when using an integer index)
    >>> top['@CA'] # a list of CA atoms
    >>> top[':2-10@CA'] # a list of CA atoms in residues 2 to 10 (index starts from 1)
```

In [4]:
top[[0, 2, 4]]

[<N-atom, resnum=0, n_bonds=4>,
 <H2-atom, resnum=0, n_bonds=1>,
 <CA-atom, resnum=0, n_bonds=4>]

In [5]:
print(traj)
# get a new Trajectory, keeping only the coords of residues 1 to 3, 5, and 7 to 9 (index starts from 1)
t = traj[':1-3,5,7-9']
print(t)
print(t.top.residue_names)

<pytraj.Trajectory with 10 frames: <Topology with 1 mols, 20 residues, 304 atoms, 310 bonds, non-PBC>>
           
<pytraj.Trajectory with 10 frames: <Topology with 3 mols, 7 residues, 126 atoms, 124 bonds, non-PBC>>
           
{'ASP ', 'ASN ', 'LEU ', 'LYS ', 'TYR ', 'GLN '}


In [6]:
# all carbons except backbone alpha and carbonyl carbon
top['@C= & !@CA,C'][:5] # print only first 5 atoms

[<CB-atom, resnum=0, n_bonds=4>,
 <CG-atom, resnum=0, n_bonds=3>,
 <CB-atom, resnum=1, n_bonds=4>,
 <CG-atom, resnum=1, n_bonds=4>,
 <CD1-atom, resnum=1, n_bonds=4>]

In [7]:
# all SER and ARG atoms, excluding atoms in residues 1-10 and any CA or CB atoms
top[':SER,ARG & !(:1-10 | @CA,CB)'][:5] # print only first 5 atoms

[<N-atom, resnum=12, n_bonds=3>,
 <H-atom, resnum=12, n_bonds=1>,
 <HA-atom, resnum=12, n_bonds=1>,
 <HB2-atom, resnum=12, n_bonds=1>,
 <HB3-atom, resnum=12, n_bonds=1>]
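
The way the two masks above combine selections can be sketched with Python sets: `&` is intersection, `|` is union, and `!` is the complement against all atoms. This is a toy model on made-up atoms, not how pytraj or cpptraj evaluate masks internally. The helpers `by_resname`, `by_resnum`, and `by_atomname` are hypothetical, and treating `=` as a `*`-style wildcard is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Toy atoms as (residue number, residue name, atom name); residue numbers are 1-based.
atoms = [
    (1, "ASN", "N"), (1, "ASN", "CA"), (1, "ASN", "CB"), (1, "ASN", "HA"),
    (11, "SER", "N"), (11, "SER", "CA"), (11, "SER", "HA"),
    (13, "SER", "N"), (13, "SER", "CA"), (13, "SER", "CB"), (13, "SER", "HA"),
]
everything = set(range(len(atoms)))


def by_resname(*names):
    """0-based indices of atoms whose residue name is one of `names`."""
    return {i for i, (_, resname, _) in enumerate(atoms) if resname in names}


def by_resnum(first, last):
    """0-based indices of atoms in residues first..last (1-based, inclusive)."""
    return {i for i, (resnum, _, _) in enumerate(atoms) if first <= resnum <= last}


def by_atomname(*patterns):
    """0-based indices of atoms whose name matches any pattern ('=' acts as '*')."""
    globs = [p.replace("=", "*") for p in patterns]
    return {i for i, (_, _, name) in enumerate(atoms)
            if any(fnmatchcase(name, g) for g in globs)}


# ':SER,ARG & !(:1-10 | @CA,CB)'
selected = by_resname("SER", "ARG") & (everything - (by_resnum(1, 10) | by_atomname("CA", "CB")))
print(sorted(selected))  # [4, 6, 7, 10]

# '@C= & !@CA,C'
carbons = by_atomname("C=") & (everything - by_atomname("CA", "C"))
print(sorted(carbons))  # [2, 9]
```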

In [8]:
# keep only heavy atoms: strip all H atoms and get the result as a new Topology (copy=True)
new_top = top.strip_atoms('@H=', copy=True)
print(new_top.atom_names)

{'NE  ', 'O   ', 'CA  ', 'OE1 ', 'CD  ', 'ND2 ', 'CZ  ', 'NZ  ', 'CG  ', 'CG2 ', 'CE1 ', 'CE2 ', 'CE3 ', 'CG1 ', 'CD2 ', 'N   ', 'C   ', 'OH  ', 'NH2 ', 'CZ2 ', 'OD1 ', 'NH1 ', 'CD1 ', 'OD2 ', 'NE2 ', 'OXT ', 'OG  ', 'CE  ', 'NE1 ', 'CZ3 ', 'CB  ', 'CH2 '}


In [9]:
# keep only the H atoms of LYS residues by stripping everything else
new_top = top.strip_atoms('!(:LYS@H=)', copy=True)
print(new_top.atom_names)
print(new_top.residue_names)
print(new_top)

{'HA  ', 'HB2 ', 'HE3 ', 'H   ', 'HD3 ', 'HZ2 ', 'HG3 ', 'HD2 ', 'HB3 ', 'HG2 ', 'HE2 ', 'HZ1 ', 'HZ3 '}
{'LYS '}
<Topology with 13 mols, 1 residues, 13 atoms, 0 bonds, non-PBC>


In [16]:
# strip all H atoms in place
top.strip_atoms('@H=')
print(top.atom_names)

{'NE  ', 'O   ', 'CA  ', 'OE1 ', 'CD  ', 'ND2 ', 'CZ  ', 'NZ  ', 'CG  ', 'CG2 ', 'CE1 ', 'CE2 ', 'CE3 ', 'CG1 ', 'CD2 ', 'N   ', 'C   ', 'OH  ', 'NH2 ', 'CZ2 ', 'OD1 ', 'NH1 ', 'CD1 ', 'OD2 ', 'NE2 ', 'OXT ', 'OG  ', 'CE  ', 'NE1 ', 'CZ3 ', 'CB  ', 'CH2 '}


### What if we are only interested in atom indices?

In [10]:
# use () instead of []. [] is normally used to index a list or dictionary, while () is used to call a function
top('@CA').indices

array('i', [4, 18, 37, 58, 77, 94, 118, 137, 159, 171, 178, 193, 199, 210, 221, 228, 260, 274, 288, 294])

In [11]:
top('@CA') # returns an AtomMask object that can be passed around

<pytraj.AtomMask.AtomMask at 0x2aaac918f048>

### pytraj/cpptraj also support distance-based mask selection
```
    we need to load a Frame object (a trajectory snapshot with xyz coords and the methods that come with it)
```

In [12]:
# load traj with trajectory filename and preloaded Topology
traj = io.load("../tests/data/md1_prod.Tc5b.x", top)
traj

<pytraj.Trajectory with 10 frames: <Topology with 1 mols, 20 residues, 304 atoms, 310 bonds, non-PBC>>
           

In [13]:
# to use distance-based mask selection, we need to call set_reference_frame
# example: use the last frame as the reference frame
top.set_reference_frame(traj[-1])

# do the mask selection. Pick all atoms of every residue that has an atom within 5.0 Angstrom of atom 1 (atom 0 if indexing with an integer)

top("@1 <:5.0").indices

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249])
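
The indices above come in contiguous blocks because `<:` selects by residue: once any atom of a residue is close enough, the whole residue is included. A minimal pure-Python sketch of that idea follows, using toy coordinates. The function `residues_within` is hypothetical and not part of pytraj, and the strict `<` comparison at the cutoff is an assumption.

```python
import math


def residues_within(coords, resnums, ref_indices, cutoff):
    """Return 0-based indices of all atoms in residues that have at least one
    atom within `cutoff` of any reference atom (a sketch of the '<:' operator).
    """
    close_residues = {
        resnums[i]
        for i, xyz in enumerate(coords)
        if any(math.dist(xyz, coords[r]) < cutoff for r in ref_indices)
    }
    return [i for i, resnum in enumerate(resnums) if resnum in close_residues]


coords = [
    (0.0, 0.0, 0.0),   # atom 0, residue 0 (reference atom)
    (1.5, 0.0, 0.0),   # atom 1, residue 0
    (4.0, 0.0, 0.0),   # atom 2, residue 1: inside the cutoff
    (9.0, 0.0, 0.0),   # atom 3, residue 1: outside, but in the same residue as atom 2
    (20.0, 0.0, 0.0),  # atom 4, residue 2: far away
]
resnums = [0, 0, 1, 1, 2]

print(residues_within(coords, resnums, ref_indices=[0], cutoff=5.0))  # [0, 1, 2, 3]
```

Atom 3 is 9.0 Angstrom from the reference atom, yet it is selected because atom 2 of the same residue lies inside the cutoff.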

In [14]:
# what can we do with an atom mask?
# we can use an AtomMask object to create a new traj containing only the selected atoms

# store the AtomMask object so it can be passed around
atm = top("@1 <:5.0")

# get a new Trajectory object containing only the selected atoms

print("before")
print(traj)
print("after")
new_traj = traj[atm]
print(new_traj)

before
<pytraj.Trajectory with 10 frames: <Topology with 1 mols, 20 residues, 304 atoms, 310 bonds, non-PBC>>
           
after
<pytraj.Trajectory with 10 frames: <Topology with 2 mols, 3 residues, 59 atoms, 57 bonds, non-PBC>>
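
Conceptually, indexing a trajectory with a mask keeps every frame but only the selected atoms in each frame. The sketch below shows that on nested Python lists. The function `select_atoms` is hypothetical, and pytraj's own Trajectory also carries a matching stripped Topology, which this toy version ignores.

```python
def select_atoms(frames, indices):
    """Keep only the atoms at `indices` (0-based) in every frame."""
    return [[frame[i] for i in indices] for frame in frames]


# two toy frames with three atoms each, as (x, y, z) tuples
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],  # frame 0
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (2.1, 0.0, 0.0)],  # frame 1
]

subset = select_atoms(frames, [0, 2])
print(len(subset), len(subset[0]))  # 2 2
print(subset[1][1])                 # (2.1, 0.0, 0.0)
```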
           


In [15]:
# want to get coords from new_traj?

# tolist(): coords of the first atom of the first frame
print(new_traj.tolist()[0][0])

[-16.492000000000001, 12.433999999999999, -11.018000000000001]


``` python
    >>> # try these too
    >>> print(new_traj.to_ndarray())
    >>> print(new_traj.tolist())
    >>> print(new_traj.xyz)
    >>> print(new_traj[0, :])
```