# Structural Bioinformatics

## Table of contents

1. [Structural-Bioinformatics](#Structural-Bioinformatics)
   1. [Table of contents](#Table-of-contents)
2. [Basics](#Basics)
   1. [3D-QSAR](#3D-QSAR)
   2. [Molecular Dynamics simulations](#Molecular-Dynamics-simulations)
   3. [Homology modelling](#Homology-modelling)
   4. [Docker](#Docker)
   5. [Structure Assessment](#Structure-Assessment)
3. [Protein Structures](#Protein-Structures)
4. [PyMol](#PyMol)
   1. [Working with PyMol](#Working-with-PyMol)
   2. [Getting Structure Files](#Getting-Structure-Files)
   3. [Selections](#Selections)
      1. [Selection criteria](#Selection-criteria)
      2. [Picking multiple values](#Picking-multiple-values)
      3. [Picking one protein/Macromolecule](#Picking-one-protein/Macromolecule)
   4. [PyMol Scripts](#PyMol-Scripts)
   5. [Log Files](#Log-Files)
   6. [Creating Movies with PyMol](#Creating-Movies-with-PyMol)
   7. [Measurements in PyMol](#Measurements-in-PyMol)
      1. [Measurements using commands](#Measurements-using-commands)
   8. [States](#States)
5. [Modelling Structures](#Modelling-Structures)
   1. [Homology modelling](#Homology-modelling)
   2. [Threading](#Threading)
   3. [Ab initio modelling](#Ab-initio-modelling)
   4. [AlphaFold](#AlphaFold)
   5. [SwissModel](#SwissModel)
   6. [Comparing Structures](#Comparing-Structures)
6. [Biopython PDB](#Biopython-PDB)
   1. [Reading in pdb files](#Reading-in-pdb-files)
   2. [Accessing atoms](#Accessing-atoms)
   3. [Simple calculations](#Simple-calculations)
      1. [Distance between two atoms](#Distance-between-two-atoms)
      2. [Angles](#Angles)
      3. [Dihedrals](#Dihedrals)
7. [Exercises](#Exercises)
   1. [Exercise 1 - Using PyMol](#Exercise-1---Using-PyMol)
   2. [Exercise 2 - New pdb files](#Exercise-2---New-pdb-files)
   3. [Exercise 3 - Selections in PyMol](#Exercise-3---Selections-in-PyMol)
   4. [Exercise 4 - PyMol Scripts](#Exercise-4---PyMol-Scripts)
   5. [Exercise 5 - A simple Log-File](#Exercise-5---A-simple-Log-File)
   6. [Exercise 6 - Measuring in PyMol](#Exercise-6---Measuring-in-PyMol)
   7. [Exercise 7 - Measurement Script](#Exercise-7---Measurement-Script)
   8. [Exercise 8 - Modelling with SwissModel](#Exercise-8---Modelling-with-SwissModel)
   9. [Exercise 9 - Comparing structures](#Exercise-9---Comparing-structures)
   10. [Exercise 12 - Structural calculations](#Exercise-12---Structural-calculations)

# Basics

Structural bioinformatics works with the 3D structures of molecules: proteins, DNA, RNA, and small organic molecules like ATP.

Some subfields:

- Drug design with 3D-QSAR
- Analysis of protein dynamics with molecular dynamics simulations
- Creation of new protein structures in silico with the help of algorithms, using homology modelling, threading or ab initio modelling
- Research of protein-protein or protein-ligand interactions with docking programs ("Docker")
- Assessment of structures

## 3D-QSAR

3D quantitative structure-activity relationship

- analyzing a binding pocket regarding bio-physical/biochemical properties
- finding small organic molecules fitting these criteria
- calculating potential binding constants to determine whether they bind better or worse than existing drugs

## Molecular Dynamics simulations

- molecular dynamics simulations, or MD simulations for short
- use energy functions and force-field calculations to simulate the motion of molecules
- e.g. the contents of a part of a cell, including all proteins present there

## Homology modelling

- we don't know the 3D structures of all proteins
- to be able to analyze these proteins, we create models for the proteins whose structure we don't know
- may not be necessary for much longer, since AlphaFold has predicted millions of structures and made them freely available to the scientific community. It is a neural network trained on sequences and structures from the PDB that now generates very good structures, although how it arrives at them is not well understood

## Docker

- analyzes the binding of small organic molecules, proteins or nucleic acids to other proteins
- calculates potential binding configurations
- second step after QSAR

## Structure Assessment

- 3D structures, especially older ones, are rather error-prone
- it is therefore necessary to analyze the quality of a structure
  - simple geometric calculations (bond lengths, angles)
  - approximations of the overall energy state (calculated from the distances between all non-covalently bound atoms)

# Protein Structures

3D structures are the result of many different interactions:

- Hydrogen bonds
- Van der Waals interactions (non-polar interactions)
- Disulfide bonds
- Salt bridges
- Dipole interactions

Folding depends on factors like temperature, pH, solvent, and others. Often we don't have the structure with the lowest possible energy, because other requirements have to be met:

- Creating a hydrophilic surface
- Creating a hydrophobic core
- Having a functional active centre
- Being able to bind molecules like ATP and GTP
- Catalyzing essential reactions in the cell

Structures:

- primary structure
  - aa sequence
- secondary structure
  - alpha helix (or other special cases of helix)
  - beta sheet (parallel or anti-parallel)
  - turn/loop
- tertiary structure
  - complete 3D structure of one macromolecule, i.e. one aa chain
- quaternary structure
  - assembly of multiple proteins into one bigger complex
  - homogeneous (i.e. always the same structure)
  - heterogeneous (i.e. several different structures)


Naming of the C atoms

- starts with the chiral C (C alpha or CA)
- continues down the side chain: beta, gamma etc.
- Note that e.g. Valine has *2* C gamma atoms (CG1 and CG2)

![image.png](attachment:acfbac81-115e-4c42-9696-b496a82a096a.png)

![image.png](attachment:7dd608d7-c6ff-4198-ae9e-8ae2b324ad44.png)

# PyMol

- one of the most common tools to look at protein structures
- can produce high-quality 3D images of small molecules or biological macromolecules
- at its core a visualization tool, to which editing and measurement functions were added
- structure files end in `.pdb`

## Working with PyMol

- structures can be loaded into a PyMol session through
  - File -> Open
  - File -> get PDB; opens a window where we can directly search and download the data from the PDB into "Lisa" or your working directory
  - the default folder can be changed with `set fetch_path, path`, e.g. to create a folder specifically for all structures
- all structures are put in the same window, i.e. session
- once structures are loaded they appear on the right side and can be activated by clicking on them
- all structures are there but only activated structures are displayed, the rest are hidden
- default representation:
  - proteins and nucleic acids (i.e. everything in the Atom section) are in cartoon representation
  - organic cofactors in the HeteroAtom section are in stick representation
  - inorganic cofactors (e.g. sulfates etc.) are in sphere representation

each structure has five menus:

- Action (A)
  - center: sets the center of the canvas to the center of this protein
- Show (S)
  - allows the choice how to display the protein
  - show as -> choice will choose one single representation
  - show -> choice will add the representation to the already existing picture
  - options:
    - lines
    - spheres
    - mesh
    - ribbon
    - cartoon
    - sticks
    - dots
    - surface
- Hide (H)
  - the opposite of Show, removes the chosen representation layer
  - e.g. if I have 5 representations on top of each other and only want to remove one, I don't have to reset everything but can just remove that one through Hide
- Label (L)
- Color (C)
  - allows coloring according to specific criteria:
  - element:
    - yellow: sulfur
    - red: oxygen
    - blue: nitrogen
    - or any other as chosen
  - chain:
  - secondary structure element: helices, sheets, loops
  - by representation (e.g. color of the cartoon)
  - spectrum: start at one end of the aa chain and continue to the other end changing color gradually

3 Button Viewing:

- left click: rotating
- right click (or two-finger click on a trackpad): zooming
- "mouse wheel" (works on apple mouse as well, on trackpad moving two fingers up and down): scrolling through layers (clipping)
- cmd + right click: moves the whole camera

Moving a single molecule:

- 2 Button editing
- cmd + shift + left click

The viewing plane:

We can adjust which "layer" we see either by scrolling with the mouse wheel or through `clip`. The layer is called a `slab`, that is, everything visible between the `near` plane and the `far` plane. Using `clip` in the command line we can adjust it:

- `clip near, x` moves the near plane by x Angstrom (negative number: further away, positive number: closer to me); this changes the size of the slab!
- `clip far, x` moves the far plane by x Angstrom (negative number: further away, positive number: closer to me); this changes the size of the slab
- `clip move, x` moves the whole slab, keeping its size
- `clip slab, x` sets the width of the slab to x Angstrom. For normal proteins 500 should be enough to see everything



Creating images

- switch to a white background: Display -> Background -> White (alpha background = no background in exported image)
- render the image; the image size in pixels can be added
- either through the command line or with the Draw/Ray button at the top right
- renders everything in the window exactly as it appears

```
ray

# with size 1024x768 pixels
ray 1024,768
```

- to save: File -> Export Image As -> PNG...
- or having used the button Draw/Ray simply copy or save

![image.png](attachment:fedcea84-7cb1-4395-911a-86b7c7463ccc.png)

## Getting Structure Files

There are two formats of structure files, `.pdb` and the newer mmCIF (`.cif`). PyMol and Python can both work with both formats; some programs only work with one or the other.

Databases for structure files:

- pdb files from RCSB-PDB can be accessed
  - from the website (www.rcsb.org)
  - directly in PyMol using File -> get PDB
  - directly in PyMol using the command `fetch` followed by four letter structure code

```
fetch 1bl8
```

- AlphaFold (https://alphafold.ebi.ac.uk) from the website
- Modbase (https://modbase.compbio.ucsf.edu)
  - Download through "Perform action on this model" -> Coordinate File (PDB)
  - created by a lab when the RCSB-PDB stopped accepting modelled structures
  - rather old modelled structures


## Selections

PyMol lets you create selections containing only atoms that fulfill certain criteria, such as:

- chains
- specific amino acids (e.g. all aromatic)
- specific atoms
- ligands

Remember the makeup of a pdb file to write the criterion correctly!

A selection is created by typing it into the command line:

General syntax:

```
select selection_name, criterion value
```

If it was done correctly, a selection object appears in the sidebar with the name in brackets: `(selection_name)`

### Selection criteria

|Statement|Description|
|--|--|
|symbol|The atom symbol in the last column of pdb files, e.g. O or N|
|name|The atom name, e.g. CA or CB|
|resn|The name of the residue, e.g. HIS or ALA|
|resi|The residue id or residue number, e.g. 200 or 250|
|chain|The protein chain, e.g. A or B|
|ss|The secondary structure of the protein: `H` (helix), `S` (sheet), `L` (loop), `""` (unstructured)|
|id|The atom number; not very practical as long as you don't know the exact atom numbers in your selection|

Examples:

```
# choosing all cysteine residues
select CYS, resn cys

# choosing the N-terminus
select nterm, resi 1-10

# choosing a helix in chain A
select HelixA, chain A & ss H

# choosing cysteine residues and the N-terminus
select big_selection, resn cys | resi 1-10
```

- logical AND is `&`; logical OR is `|`
- a criterion can be negated with `NOT`
- wildcards are accepted (`*`)
 
example: all secondary structures except undefined:

```
select sec_stru, NOT ss ""
```

- including a measurement: selecting all residues that are within 5 Angstrom of Obj1:

```
select Obj2, byres all within 5 of Obj1
```

**Website with the naming of the atoms in amino acids:**

- activate Toggle labels to see the names
- Replace the last 3 letters with the respective 3 letter code to see other amino acids

https://www.rcsb.org/ligand/TRP

**Side note**

Go to `Display -> Sequence` to display the aa sequence (including all ligands and associated water molecules) at the top, above the picture. `O` stands for water (the oxygen in H2O, since H is not represented).

- Most often residue counting starts at the protein and ends with the ligands, but this is not always the case!!!
- something to keep in mind when selecting by `resi`
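
Since the selection keywords mirror the columns of a pdb file, it can help to see where each one sits in an `ATOM` record. The following is a minimal sketch in plain Python, assuming the standard fixed-width column layout of the PDB format. The function `parse_atom_line` and the sample record are hypothetical, and the occupancy and B-factor values are made up for illustration:

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into the fields PyMol selections refer to."""
    return {
        "id": int(line[6:11]),          # atom serial number -> PyMol `id`
        "name": line[12:16].strip(),    # atom name          -> PyMol `name`
        "resn": line[17:20].strip(),    # residue name       -> PyMol `resn`
        "chain": line[21],              # chain identifier   -> PyMol `chain`
        "resi": int(line[22:26]),       # residue number     -> PyMol `resi`
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "symbol": line[76:78].strip(),  # element symbol     -> PyMol `symbol`
    }


# Hypothetical record, assembled field by field so the column widths stay visible.
sample = (
    f"{'ATOM':<6}{1:>5} {' CA ':<4}{'':1}{'PRO':>3} {'A':1}{47:>4}{'':4}"
    f"{45.990:8.3f}{59.640:8.3f}{30.020:8.3f}{1.00:6.2f}{20.00:6.2f}{'':10}{'C':>2}"
)

atom = parse_atom_line(sample)
print(sample)
print(atom)
```

An atom like this one would be matched by `select test, chain A & resi 47 & name CA`.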

### Picking multiple values

- for `name` or `resn` multiple arguments can be added together by `+`
- for `resi` or `id` use ranges
- `AND` or `&`; `OR` or `|`; both versions possible
- a criterion can be negated with `NOT`

Examples:

```
select proteinbackbone, name CA+C+N+O
select Aromatic, resn HIS+PHE+TYR+TRP
select nterm, resi 1-10
select ntermAt, id 1-50
select Test, id 1-50+100-150

# this is short for:
select proteinbackbone, name CA | name C | name N | name O
select Aromatic, resn HIS | resn PHE | resn TYR | resn TRP
# etc.
```
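
The equivalence between the `+` shorthand and the long `|` form can be sketched in a few lines of Python. This only manipulates the text of a selection and does not call PyMol; `expand_plus` is a hypothetical helper name:

```python
def expand_plus(keyword, values):
    """Rewrite e.g. ('name', 'CA+C+N+O') into the long form joined by '|'."""
    return " | ".join(f"{keyword} {value}" for value in values.split("+"))


print(expand_plus("name", "CA+C+N+O"))
print(expand_plus("resn", "HIS+PHE+TYR+TRP"))
print(expand_plus("id", "1-50+100-150"))
```

Ranges such as `1-50` are passed through unchanged, so `id 1-50+100-150` becomes `id 1-50 | id 100-150`.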

### Picking one protein/Macromolecule

- by default, selections are executed on all structures/objects available
- if the selection should apply to only one object (which can also be a previous selection), it has to be specified:

```
select selection_name, object_name & criterion value
```

Example:

```
select TRP_3FQK, 3FQK & resn TRP
select CA_TRP_3FQK, TRP_3FQK & name CA
```






## PyMol Scripts

Especially useful to create the exact same image over and over again or when things have to be measured. PyMol scripts have the extension `.pml`

The working directory is where the PyMol script is saved!

Helpful website:

https://pymolwiki.org/index.php/Category:Commands

- The first command should always be `reinitialize` to reset the session to an empty one
- `cd DIR_NAME` changes the working directory
- `load` opens a file
- `fetch` downloads the file
- `show representation, selection` activates a representation for our structure or for a selection of the structure
  - `show cartoon`: activates the cartoon for everything
  - `show lines, Modell`: activates lines only for the object Modell
  - `show dots, ss h`: shows the dots only for the helices in (all) object(s)
- `show_as representation, objectname` changes the representation as specified
- `hide`: hides certain representations
  - e.g. `hide everything`
- `color`: switches between coloring options
  - `color red`
  - `color atomic`
    - to choose a particular color scheme for atomic, follow the above with: `util.cbay objectname`
    - cbay = color by atom, y = yellow to turn the carbon yellow
    - cbab to turn carbon blue, cbag to turn carbon green etc.
  - also possible to color only parts of a structure, multiple conditions have to be placed in brackets
    - `color yellow, ss h`
    - `color red, (1kim and name C*)`
- `bg colorname` sets the background
  - `bg white`
- `center objectname`: places a certain object at the center of the session
- `zoom objectname`: zooms into a certain object
  - `zoom complete=1`: shows everything
  - just `zoom` goes to the default zoom
- `orient` also shows all objects
- `ray` renders the image
- `png imagename.png` saves the image
- comments can be added with `#`


The script is started through File -> Run Script... where you open the script and it is automatically executed



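Example:
```
# start from an empty session
reinitialize
# download the structure and show it as cartoon only
fetch 1KIM
hide everything
show cartoon
# set the background before rendering, then save the rendered image
bg white
ray
png image1.png
```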
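
When the same image is needed for many structures, the script text itself can be generated. The following is a minimal sketch in Python that only builds the text of a `.pml` script from the commands listed above; it does not call PyMol. The function name `build_pml`, the output file names and the default image size are assumptions for illustration, and `bg_color` is used as the full name of the background command:

```python
def build_pml(pdb_ids, representation="cartoon", width=1024, height=768):
    """Return the text of a PyMol script that renders one PNG per structure."""
    lines = []
    for pdb_id in pdb_ids:
        lines += [
            "reinitialize",                        # start from an empty session
            f"fetch {pdb_id}",                     # download the structure
            "hide everything",
            f"show {representation}",
            "bg_color white",                      # background is set before rendering
            f"ray {width},{height}",
            f"png {pdb_id}_{representation}.png",  # hypothetical file name pattern
            "",
        ]
    return "\n".join(lines)


script = build_pml(["1KIM", "3FQK"])
print(script)
```

The printed text can be saved with the extension `.pml` and started through File -> Run Script... like any other PyMol script.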

## Log Files

Simple way to make sessions repeatable

- everything done is recorded and saved in a log file which can be reused later

Version 1 to use it via the File menu:

- File -> Log File -> Open: create the log file
- File -> Log File -> Close: saves everything in the log file
- WHEN DOING THIS NEVER USE `reinitialize`!!!

Version 2 to use it via the command line:

- `log_open filename` to create the log file, which is saved either in the last folder where I opened something or in my default folder (unless I use an absolute path with the filename)
- `log_close` to close and save all changes in the log file

Log files can be run like normal PyMol Scripts: File -> 

## Creating Movies with PyMol

https://pymolwiki.org/index.php/MovieSchool

General approach:

1. Reset the movie maker

```
mclear
```

2. Define the number of frames, i.e. images, in your movie. By default, PyMol movies have 30 frames/second

```
mset state xNumber_of_frames

mset 1 x36
```

3. Add some images to your frames

e.g. rotating the image:
```
util.mroll(1,36,1)
```

4. (Interpolate between the images in your frames)

5. Export the movie

File -> Export Movie As
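
For step 2, the number of frames follows from the desired length of the movie. A small sketch of the arithmetic in Python, assuming the frame rate of 30 frames/second mentioned above and assuming that a roll covers one full 360 degree turn spread evenly over the frames; the helper names are hypothetical:

```python
FPS = 30  # frames per second, as stated above


def frames_for(seconds, fps=FPS):
    """Number of frames needed for a movie of the given length."""
    return round(seconds * fps)


def degrees_per_frame(n_frames, total_degrees=360.0):
    """Rotation step per frame if the turn is spread evenly over all frames."""
    return total_degrees / n_frames


n_frames = 36  # as in `mset 1 x36`
print(f"{n_frames} frames last {n_frames / FPS} seconds")
print(f"rotation per frame: {degrees_per_frame(n_frames)} degrees")
print(f"a 4 second movie needs: mset 1 x{frames_for(4)}")
```

With 36 frames the movie is just over one second long, so a slower rotation needs a larger frame count in `mset` and a matching range in `util.mroll`.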

## Measurements in PyMol

Wizard -> Measurement opens the Measurement Wizard Sidebar. 

Best to work in stick representation -> makes it easier to click the atoms of interest

> The measurements cannot be exported from PyMol; this is really meant for a few simple measurements and their graphical representation. For many measurements or further statistical analysis use Biopython, where the same measurements can also be calculated

Measurement modes are under `Distances`

- `Distances`: distance between two atoms
  - choose 2 atoms, result is in Angstrom
- `Distances to Rings`: distance between the center of an aromatic ring and other atoms
  - choose any atom of a ring and the atom of interest
  - if two atoms are chosen that are not part of a ring it's the same as the default `Distances`
- `Angles`: measures the angle that three atoms form in 3D space
   - choose 3 atoms, the angle of the triangle is calculated at the tip that is defined by the second atom chosen
- `Dihedral`: rotation angle around a bond defined by four atoms
  - rotation angle over a chemical bond
  - we choose 4 atoms that are in a row
  - bond of interest is between the 2nd and 3rd atom chosen
  - the result is the angle between the bond 1-2 and the bond 3-4 when looking along the bond 2-3, i.e. the angle between the plane through atoms 1, 2, 3 and the plane through atoms 2, 3, 4
- some more measurements for neighboring atoms

Dihedral angles:
![image.png](attachment:fc1e9e7b-7231-448f-ae43-b650d83a8e4d.png)


![image.png](attachment:c3052ad3-6c25-41c9-a6ab-dd695bd749d6.png)

### Measurements using commands

- `distance ,selection1, selection2`
- `angle ,selection1, selection2, selection3`
- `dihedral ,selection1, selection2, selection3, selection4`

**NOTE the space AFTER `distance` and BEFORE the `,`**: this is a placeholder for the name, which is left empty here

the measurements can be named:

```
distance name, selection1, selection2
```

The selections can contain a single atom or multiple atoms, but they always refer to actual atoms; it is impossible to measure to rings

If several atoms are selected, the distance is calculated for all combinations

Examples:

```
distance , resi 10 and name CA and chain A, resi 40 and name CA and chain A
distance , resi 10 and name CA and chain A, resi 35-42 and name CA and chain A
```

For angle and dihedral, or for several measurements, the command gets very long. It is useful to save the atom(s) in a selection and run the measurement on that

```
select atom1, resi 544 and name CB and chain A
select atom2, resi 544 and name CG and chain A
select atom3, resi 544 and name CD and chain A
select atom4, resi 544 and name NE and chain A

dihedral ,atom1, atom2, atom3, atom4
```



## States

Structures can have different states (models) of the structure present in the pdb file (e.g. 6ANF). They are all in the pdb file, as several coordinate sections under Model 1, Model 2 etc.

- This is especially prevalent when dealing with NMR structures and allows us to compare different conformations
- You can switch between different states with the "movie menu" by clicking on the different arrow buttons or at the bottom at Global Frames

![image.png](attachment:e84b59b0-b8ad-46ce-a152-28771f067ce6.png)
![image.png](attachment:21da30d3-4551-483e-b593-db3d4ac7a489.png)
![image.png](attachment:30dfc674-dc72-44ef-a402-e0ce4819434c.png)

- You can also split up the structure into its different states via the action menu
- Afterwards you have multiple copies of the structure, one for each state, available in the sidebar 


![image.png](attachment:e4e76103-600c-4968-b620-eb5af524cecf.png)

# Modelling Structures

Determining protein structures experimentally via X-ray crystallography, cryo-EM or NMR can't keep up with the number of sequences known through modern sequencing approaches.

To fill this gap, *in silico* methods to find protein structures were developed

Anfinsen's postulate (Christian Anfinsen):

For a given primary structure there is only one stable structure. The 3D structure of a native protein in its normal physiological milieu is the one in which the Gibbs free energy of the whole system is lowest; that is, that the native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment.

Levinthal's Paradox:

The number of possible conformations of a protein is extremely large, given all possible dihedral angles and bond angles, yet only one is observed in nature.

The protein folding problem:

1. What is the folding code?
2. What is the folding mechanism?
3. Can we predict the native structure of a protein from its aa sequence?
   - according to AlphaFold: yes, but we don't know how.

Three methods for protein structure modelling:

1. Homology modelling
2. Threading
3. Ab initio modelling

## Homology modelling

- Modelled using a known protein structure, e.g. from X-ray crystallography
- unknown sequence is aligned to the known sequence
  - gaps are filled using a knowledge based approach
  - gaps should be in unstructured regions
- protein backbone is generated using the coordinates of the template as blue-print
- side chains of the aa are added to the structure, again using the coordinates of the template as a partial blueprint

## Threading

- also requires a template
- template is based on structural data as well as sequence alignment
- can use several templates, e.g. for different domains
- remaining process is similar to homology modelling

## Ab initio modelling

- Creates a protein structure without a template
- The structure is generated from scratch with the help of energy functions
- Very time and resource consuming
- Only feasible for small proteins

## AlphaFold

- state-of-the-art AI developed by DeepMind
- computationally predicts protein structures with unprecedented accuracy and speed
- released > 200 million protein structure predictions together with EMBL
  - these include nearly all catalogued protein sequences known
- dominated CASP
  - Critical Assessment of Structure Prediction
  - was long dominated by the Zhang Lab (I-TASSER) and the Baker Lab (Rosetta)
  - groups have to create models for proteins whose structure has been resolved experimentally but not yet published
  - the models are compared and a winner is determined
  - scores were usually 70-75; AlphaFold scored 90 and has refrained from entering the competition since

## SwissModel

One web page you can use to generate Homology Models is the SwissModel Repository 

https://swissmodel.expasy.org/interactive

- paste your sequence into an entry field or upload a fasta file (blue)
- afterwards, start the process by clicking on Build Model
- gives back one or more results; for each result:
  - info about the template
  - which parts of the sequence were used
  - quality measurements
    - GMQE (Global Model Quality Estimate; between 0 and 1, higher is better), an estimate of how reliable the model is
  - can be downloaded by clicking on the name
- if the result is not satisfactory, more models can be generated using different templates by going to the tab Templates

## Comparing Structures

To find the ideal structure

- load them all into PyMol and superpose them structurally
- pick one structure onto which the others will be superposed (usually the experimental one is best)
- Action Menu -> Align -> all to this

# Biopython PDB

Biopython contains a large submodule for working with pdb files

https://biopython.org/docs/latest/api/Bio.PDB.html

- reading in pdb-files
- calculating bond-lengths, bond-angles and dihedral angles
- NearestNeighbor searches using kD-Trees

## Reading in pdb files

- requires the `PDBParser` class from the `Bio.PDB.PDBParser` module
- create a parser object
- read in the pdb file with the `get_structure()` method of the parser object
- `id` is a name we can freely choose and `filename.pdb` is the file. Both are strings

```
from Bio.PDB.PDBParser import PDBParser

# create the parser and read the structure; both arguments are strings
parser = PDBParser()
structure = parser.get_structure("id", "filename.pdb")
```

Note that with files from the RCSB-PDB there could be a warning, which is most often ok to ignore:

```
PDBConstructionWarning: WARNING: Chain A is discontinuous at line
```




In [3]:
from Bio.PDB.PDBParser import PDBParser

parser = PDBParser()
structure = parser.get_structure('1kim', '1kim.pdb')

print(structure)

<Structure id=1kim>




## Accessing atoms

We access atoms following the logic of nested lists/dictionaries:

```
structure[ModelNo][Chain][ResidueNumber][AtomName]
```

- ModelNo refers to which state we want to look at if there are several. If there is only one it is always 0
- Chain is the macromolecule (chain in PyMol)
- ResidueNumber as index (resi in PyMol)
- AtomName (name in PyMol)

Coordinates of atoms are then accessed using the `get_vector()` method:

- to get access to the coordinates of the CA-atom of the residue number 47 in the macro-molecule chain A of the first model of our file you could use:

```
structure[0]["A"][47]["CA"].get_vector()
```

Another way is to loop over the whole structure and make use of the attributes and methods of the residues:

```
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.resname == "PRO":
                print("Proline")
            # has_id() checks that the residue actually contains a CA atom
            if residue.has_id("CA"):
                print("C-alpha has coordinates", residue["CA"].get_vector())
```



In [20]:
print(structure[0])
print(structure[0]["A"])
print(structure[0]["A"][47])
print(structure[0]["A"][47]["CA"])

<Model id=0>
<Chain id=A>
<Residue PRO het=  resseq=47 icode= >
<Atom CA>


In [31]:
print(structure[0]["A"][47]["CA"].get_vector())

<Vector 45.99, 59.64, 30.02>


In [47]:
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.resname == "PRO":
                print("Proline with id", residue.id)
                if residue.has_id("CA"):
                    print(f"C-alpha has coordinates: {residue['CA'].get_vector()}")

Proline with id (' ', 47, ' ')
C-alpha has coordinates: <Vector 45.99, 59.64, 30.02>
Proline with id (' ', 57, ' ')
C-alpha has coordinates: <Vector 48.31, 76.39, 58.42>
Proline with id (' ', 82, ' ')
C-alpha has coordinates: <Vector 53.06, 77.96, 43.53>
Proline with id (' ', 84, ' ')
C-alpha has coordinates: <Vector 50.14, 83.40, 44.03>
Proline with id (' ', 131, ' ')
C-alpha has coordinates: <Vector 43.85, 82.56, 41.75>
Proline with id (' ', 141, ' ')
C-alpha has coordinates: <Vector 36.48, 70.64, 32.35>
Proline with id (' ', 154, ' ')
C-alpha has coordinates: <Vector 52.03, 55.90, 36.60>
Proline with id (' ', 155, ' ')
C-alpha has coordinates: <Vector 50.10, 59.20, 35.97>
Proline with id (' ', 165, ' ')
C-alpha has coordinates: <Vector 40.43, 76.78, 47.11>
Proline with id (' ', 173, ' ')
C-alpha has coordinates: <Vector 37.58, 82.90, 56.78>
Proline with id (' ', 184, ' ')
C-alpha has coordinates: <Vector 29.89, 83.76, 60.25>
Proline with id (' ', 195, ' ')
C-alpha has coordinates: <

In [23]:
print(dir(structure[0]["A"][47]))

['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_generate_full_id', '_id', '_reset_full_id', 'add', 'center_of_mass', 'child_dict', 'child_list', 'copy', 'detach_child', 'detach_parent', 'disordered', 'flag_disordered', 'full_id', 'get_atoms', 'get_full_id', 'get_id', 'get_iterator', 'get_level', 'get_list', 'get_parent', 'get_resname', 'get_segid', 'get_unpacked_list', 'has_id', 'id', 'insert', 'internal_coord', 'is_disordered', 'level', 'parent', 'resname', 'segid', 'set_parent', 'transform', 'xtra']


In [14]:
print(dir(structure))

['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_generate_full_id', '_id', '_reset_full_id', 'add', 'atom_to_internal_coordinates', 'center_of_mass', 'child_dict', 'child_list', 'copy', 'detach_child', 'detach_parent', 'full_id', 'get_atoms', 'get_chains', 'get_full_id', 'get_id', 'get_iterator', 'get_level', 'get_list', 'get_models', 'get_parent', 'get_residues', 'has_id', 'header', 'id', 'insert', 'internal_to_atom_coordinates', 'level', 'parent', 'set_parent', 'transform', 'xtra']


In [17]:
print(structure.id)
print()
print(structure.header)

1kim

{'name': 'crystal structure of thymidine kinase from herpes simplex virus type i complexed with deoxythymidine', 'head': 'transferase', 'idcode': '1KIM', 'deposition_date': '1997-11-12', 'release_date': '1998-05-20', 'structure_method': 'x-ray diffraction', 'resolution': 2.14, 'structure_reference': ["p.p.tung,j.respass,w.c.summers 3'-amino thymidine affinity matrix for the purification of herpes simplex virus thymidine kinase yale j.biol.med. v. 69 495 1996 issn 0044-0086 ", 'd.g.brown,r.visse,g.sandhu,a.davies,p.j.rizkallah, c.melitz,w.c.summers,m.r.sanderson crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and ganciclovir nat.struct.biol. v. 2 876 1995 issn 1072-8368 ', 'd.j.mcgeoch,m.a.dalrymple,a.j.davison,a.dolan, m.c.frame,d.mcnab,l.j.perry,j.e.scott,p.taylor the complete dna sequence of the long unique region in the genome of herpes simplex virus type 1 j.gen.virol. v. 69 1531 1988 issn 0022-1317 ', 'm.r.sanderson,

## Simple calculations

### Distance between two atoms

Simply subtract the two atom objects; the resulting distance in Å is returned as a single float

```
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CA") and residue.has_id("N"):
                distance = residue["CA"] - residue["N"]
```

`residue.has_id()` ensures that the atoms are actually present in the residue!!
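
To make explicit what this distance is, here is a minimal sketch in plain Python (no Biopython), assuming atom positions are given as `(x, y, z)` tuples in Å. The function name and the coordinates are made up for illustration:

```python
import math


def distance(atom_a, atom_b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.dist(atom_a, atom_b)


# hypothetical coordinates in Angstrom, not taken from a real structure
n_atom = (0.0, 0.0, 0.0)
ca_atom = (1.0, 2.0, 2.0)

# sqrt(1 + 4 + 4), i.e. 3 Angstrom
print(distance(n_atom, ca_atom))
```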


### Angles

- uses the `calc_angle()` function of the PDB submodule
- returns the angle in radians!!
- to get the angle in degrees multiply it by 180/pi: `(180/math.pi) * x`
- the angle is calculated at vector2!!


```
import Bio.PDB as BP
import math

# in radians
BP.calc_angle(vector1, vector2, vector3)

# in degrees
(180/math.pi) * BP.calc_angle(vector1, vector2, vector3)
```

Example:

```
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("N") and residue.has_id("CA") and residue.has_id("C"):
                # backbone angle N-CA-C in degrees, measured at CA
                angle = (180/math.pi) * BP.calc_angle(residue["N"].get_vector(), residue["CA"].get_vector(), residue["C"].get_vector())
                print(residue.get_id()[1], angle)
```
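
The geometry behind the angle calculation can be sketched without Biopython. This is a plain-Python illustration, assuming points are `(x, y, z)` tuples; it is not Biopython's own implementation, and `angle_deg` is a hypothetical name. The angle is measured at the second point, matching the note above:

```python
import math


def angle_deg(p1, p2, p3):
    """Angle in degrees at p2, formed by the points p1-p2-p3."""
    v1 = [a - b for a, b in zip(p1, p2)]  # vector from p2 to p1
    v2 = [a - b for a, b in zip(p3, p2)]  # vector from p2 to p3
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # clamp to [-1, 1] to guard against rounding errors before acos
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# made-up points: a right angle at the origin
print(angle_deg((1, 0, 0), (0, 0, 0), (0, 1, 0)))
# made-up points: three points on a line give a straight angle
print(angle_deg((1, 0, 0), (0, 0, 0), (-2, 0, 0)))
```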




### Dihedrals

- uses the `calc_dihedral()` function of the PDB submodule
- returns the angle in radians!!
- to get the angle in degrees multiply it by 180/pi: `(180/math.pi) * x`

```
import Bio.PDB as BP
import math

# in radians
BP.calc_dihedral(vector1, vector2, vector3, vector4)

# in degrees
(180/math.pi) * BP.calc_dihedral(vector1, vector2, vector3, vector4)
```

Example:

```
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("N") and residue.has_id("CA") and residue.has_id("C"):
                (180/math.pi) * BP.calc_dihedral(residue["CA"].get_vector(), residue["CB"].get_vector(),residue["CG"].get_vector(), residue["CD1"].get_vector())
```
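A dihedral is the signed angle between the plane through points 1-2-3 and the plane through points 2-3-4, measured around the central 2-3 bond. The plain-Python sketch below illustrates this with a common projection-based formula. The helper names and test points are hypothetical, it is not Biopython's implementation, and the sign convention should be checked against `calc_dihedral()` before comparing values.

```python
import math


def _sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def _dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)  # central bond
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical points: cis (0°) and a quarter turn (90°)
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(cis, 3), round(quarter, 3))
```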

# Exercises

## Exercise 1 - Using PyMol

Use PyMol to look at the ten different pdb files in the Nextcloud and fulfill the tasks below by saving images of the corresponding structure:

1. For 3FQK and 7T10 color by chain and create a cartoon representation
2. For 4P5J and 7Y5I color by elements and create a stick representation
3. For 7PY8 and 7XCM color by secondary structures and create a ribbon representation
4. For 6ANF and 7QOV color by spectrum and create a dots representation
5. For 1KIM and 7UN2 color by chain and atom and create a cartoon and a stick representation

**color by chain and create a cartoon representation**

3FQK (rendered):

![image.png](attachment:7ac9b018-f53d-4693-805b-d4278bc00ec5.png)

7T10 (drawn): 

![image.png](attachment:ec28b352-4d10-4105-a10a-41f01ddec960.png)

**color by elements and create a stick representation**

4P5J: 

![image.png](attachment:5681e257-db76-4fea-9e57-def51a5ee771.png)

7Y5I:

![image.png](attachment:e3f3f131-7519-4c63-b769-230ca09f64b7.png)

**color by secondary structures and create a ribbon representation**

7PY8

![image.png](attachment:219b0f8e-fc85-447a-bf0d-12cb8562837d.png)

7XCM

![image.png](attachment:b5f9d594-8319-486c-9281-5e7e7ad32d17.png)

**color by spectrum and create a dots representation**

6ANF 

![image.png](attachment:69f447e3-b37b-4828-a7ca-ea6153e3d9e1.png)

7QOV 

![image.png](attachment:77b3a00f-f773-4624-b9a3-8c4727b79e4b.png)

**color by chain and atom and create a cartoon and a stick representation**

1KIM 

![image.png](attachment:f740f82c-2624-4dba-b958-8a276db125ae.png)

7UN2 

![image.png](attachment:dc1009cd-7970-47cf-848b-2a32f622e6bc.png)

## Exercise 2 - New pdb files

1. Download the structures in the left column from the RCSB-PDB
   - Load all of them into a PyMol session and color by chains
2. Use the fetch command to load the structures in the right column of the table into a second PyMol session
   - Show a Stick representation and color by Element
3. Use the UniProt identifiers in the third column of the table to search for structures in the AlphaFold database (first result)
   - Open all of them in a third session
   - Show a Lines representation and color by spectrum
- Create an image at the end for each session

|1|2|3|
|-|-|-|
|5HJK| 1OIU|B4DTL8|
|2ASD| 3BVC|A0A5N3XQ52|
|3QWE| 5SDG|O14905|
|7YXC| 7KOL|P05067|
|1ZUI| 1BL8|P09850|
|4BNM| 4LCZ|Q96FZ2|
|6GHJ| 2GA||







![image.png](attachment:b1a8b34d-8e1e-4b87-a2d8-336a1332fdac.png)


![image.png](attachment:d86d8604-8c10-466b-9fda-b1af33aabe80.png)

![image.png](attachment:7c83ed6c-283c-4ae9-84e1-a882919f9fae.png)

## Exercise 3 - Selections in PyMol 

- Create the Selections in the table below

|Selection| Structures|
|-|-|
|All CG Atoms, including CG1 and CG2 |1KIM, 7TF2, 7XCM, 1BL8|
|All β-sheets |7PY8, 7QR2, 7T10, 3FQK|
|All helices |7T10, 1BL8, 3FQK, 4P5J|
|All unstructured regions |7PY8, 3FQK, 7TF2, 7QOV|
|All aromatic amino acids (TYR, PHE, HIS, TRP)| 7QOV, 3FQK, 1BL8, 1T10|
|All Glutamines (GLN) and Glutamates (GLU) |1BL8, 1KIM, 1QOV, 7TF2|

**Variant 1: load the structures for each selection into their own session**

```
fetch 1kim
fetch 7TF2
fetch 7XCM
fetch 1BL8
select CG_all, (1kim | 7TF2 | 7XCM | 1BL8) & name CG*

fetch 7PY8
fetch 7QR2
fetch 7T10
fetch 3FQK
select beta, (7PY8 | 7QR2| 7T10 | 3FQK) & ss S

fetch 7T10
fetch 1BL8
fetch 3FQK
fetch 4P5J
select helix, (7T10 | 1BL8 | 3FQK | 4P5J) & ss H

fetch 7PY8
fetch 3FQK
fetch 7TF2
fetch 7QOV
select unstr, (7PY8| 3FQK | 7TF2 | 7QOV) & ss ""

fetch 7QOV
fetch 3FQK
fetch 1BL8
fetch 1T10
select arom, (7QOV | 3FQK | 1BL8 | 1T10) & resn TYR+PHE+HIS+TRP

fetch 1BL8
fetch 1KIM
fetch 1QOV
fetch 7TF2
select gl, (1BL8 | 1KIM | 1QOV | 7TF2) & resn GLU+GLN

```
Note: `resn GL*` would also select GLY, so GLU and GLN are listed explicitly.

**Variant 2: load all structures into one session and create every selection there**

```
fetch 1BL8
fetch 1KIM
fetch 1QOV
fetch 1T10
fetch 3FQK
fetch 4P5J
fetch 7TF2
fetch 7PY8
fetch 7QOV
fetch 7QR2
fetch 7T10
fetch 7XCM


select CG_all, (1KIM | 7TF2 | 7XCM | 1BL8) & name CG*
select beta, (7PY8 | 7QR2| 7T10 | 3FQK) & ss S
select helix, (7T10 | 1BL8 | 3FQK | 4P5J) & ss H
select unstr, (7PY8| 3FQK | 7TF2 | 7QOV) & ss ""
select arom, (7QOV | 3FQK | 1BL8 | 1T10) & resn TYR+PHE+HIS+TRP
select gl, (1BL8 | 1KIM | 1QOV | 7TF2) & resn GLU+GLN

```
Note: `resn GL*` would also select GLY, so GLU and GLN are listed explicitly.
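Because every selection above follows the same pattern (a group of objects combined with `|`, then restricted with `&`), the command lines can also be generated instead of typed by hand. The following is only a sketch: `build_select` is a hypothetical helper, not a PyMOL function, and it merely produces text that can be pasted into PyMOL or saved in a `.pml` script.

```python
def build_select(name, objects, criterion):
    """Return a PyMOL 'select' command restricted to the given objects."""
    object_part = " | ".join(objects)
    return f"select {name}, ({object_part}) & {criterion}"


# Two of the selections from the table, written as data
commands = [
    build_select("CG_all", ["1KIM", "7TF2", "7XCM", "1BL8"], "name CG*"),
    build_select("helix", ["7T10", "1BL8", "3FQK", "4P5J"], "ss H"),
]

for command in commands:
    print(command)
```

Keeping the object names in one list per selection makes it easier to spot a mismatch between the `fetch` lines and the `select` lines.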

## Exercise 4 - PyMol Scripts

Write a PyMol Script that downloads the structure 7Y5I from the RCSB-PDB and fulfills the following tasks with it:

- Create a cartoon representation and color it red
- Create an image of the centered structure
- Hide the structure
- Create a selection of the Ligand TLA and create a stick representation of it
- Create an image of the centered and zoomed in Ligand
- Do the same for the Ligand ARG in the chain A with a residue id of 402 (resi 402)

```
# Module 3.4 Exercise 4

#setting the session
reinitialize

#fetching the structure
fetch 7Y5I

# showing the cartoon in red
hide everything
show cartoon
color red

#center the structure
center 7Y5I
clip slab, 500

#create an image
bg white
ray
png Exercise4_Image1.png

#hide everything
hide everything

# create selection, show as stick, center and zoom
select TLA, resn TLA
show_as stick, TLA
center TLA
zoom TLA
clip slab, 500

#create an image
ray
png Exercise4_Image2_TLA.png


#hide everything
hide everything

# create selection, show as stick, center and zoom
select ARG, resi 402 & chain A
show_as stick, ARG
center ARG
zoom ARG
clip slab, 500

#create an image
bg white
ray
png Exercise4_Image3_ARG.png

```


## Exercise 5 - A simple Log-File

- Open a new PyMol Session
- Open a log-file and then execute the following tasks:
  - Download the PDB-Structure 8I35 via fetch
  - Hide everything and show only a cartoon representation of the structure
  - Color the structure by chain and save a first image
  - Create a selection of the chain A, and show only this selection in a stick representation
  - Save a second image
  - Show the whole structure as a surface representation and save a third image
- Close the log-file at the end

## Exercise 6 - Measuring in PyMol

Use the Measurement Tool in PyMol to measure the following in the structure 7T10:

- Distances between SG atoms of Cysteines
  - In disulfide bonds the SG atoms of the two Cysteines are about 2 Å apart; all other SG pairs are farther apart (the closest at 3.6 Å)
- Distances between the nitrogen atoms of Histidine (ND1 and NE2)
  - 2.1 Å
- The CA-CB-SG angle in Cysteines
  - 116.9°
- The NE-CZ-NH1 and NE-CZ-NH2 angles in Arginines
  - 121.5° and 119.3°
- The CB-CG-NE-CD dihedral in Arginines
  - 83.7°, 177.2°, -63.4°
- The CG-CD-CE-NZ dihedral in Lysines
  - -176.6°, -42.1°, -177.6°
- Distances between the ring systems of aromatic amino acids (Histidine, Phenylalanine, Tyrosine and Tryptophan)

## Exercise 7 - Measurement Script

- Write a PyMol Script that downloads the structure 7QR2 and measures the distances, angles and dihedrals listed in the table below
- The atoms are written with their atom name and the residue number in brackets

|Distances |Angles |Dihedrals|
|-|-|-|
|CA (72) – CA (87)| S (501) – S (502) – S (503)| O1 (505) – C1 (505) – C2 (505) – O2 (505)|
|CA (353) – CA (370)| S (501) – S (502) – S (504)| O1 (506) – C1 (506) – C2 (506) – O2 (506)|
|S (501) – S (502)| S (501) – S (503) – S (504)| O1 (507) – C1 (507) – C2 (507) – O2 (507)|
|S (501) – S (503)| S (502) – S (503) – S (504)||
|S (501) – S (504)|||
|S (502) – S (503)|||
|S (502) – S (504)|||
|S (503) – S (504)|||


```
#Exercise 7

reinitialize

log_open Exercise7

fetch 7QR2

distance dis_CA1, resi 72 & name CA, resi 87 and name CA
distance dis_CA2, resi 353 and name CA, resi 370 and name CA

distance dis_S50x, resi 501+502+503 and name S, resi 502+503+504 and name S

angle ang_S50x, resi 501+502 and name S, resi 502+503 and name S, resi 503+504 and name S

dihedral dih_505, resi 505 & name O1, resi 505 and name C1, resi 505 and name C2, resi 505 and name O2
dihedral dih_506, resi 506 & name O1, resi 506 and name C1, resi 506 and name C2, resi 506 and name O2
dihedral dih_507, resi 507 & name O1, resi 507 and name C1, resi 507 and name C2, resi 507 and name O2

```
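The table lists all six S–S pairs among residues 501 to 504. One way to get a separately named measurement per table entry is to generate one `distance` command for each pair. The sketch below does this with `itertools.combinations`. The object names (`dis_S501_S502`, ...) are a hypothetical naming scheme, and the output is plain text meant to be pasted into the PyMol script.

```python
from itertools import combinations

# Residue numbers of the four S atoms from the table
s_residues = [501, 502, 503, 504]

commands = []
for r1, r2 in combinations(s_residues, 2):
    commands.append(
        f"distance dis_S{r1}_S{r2}, resi {r1} and name S, resi {r2} and name S"
    )

print("\n".join(commands))
```

The same idea can be applied to the angle triplets and dihedral quadruplets by storing them as tuples and formatting `angle` and `dihedral` lines instead.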


## Exercise 8 - Modelling with SwissModel

Use the Mystery Protein Sequences to create some models with the help of the SwissModel Repository

I looked at Mystery1:

```
INGTEGPNVYVPFSNVTGVVESPFEQPQYYCAEPWQFSMLCPYMFLLIVCGFRINF
LTAYMSVHHKKLRTNLNYILLNLTVADLFMVFGGFTTTLYCGLHGYFVYGPTGCSLE
GFFATLGGEIALNSQVVLAIERYIHVEEPMSIFVFGEGHVIMGHQFTCIMALACAAP
PLVGWSRYIGSCIQCTWGIDYYTSKPEYNNKSFVIRRFVVHFTIPMIVIFFCYGQLV
FCVKEAAAQWLESATTQKAEKEVTRMHIIMVIFFLICWGPYASVAFYIHTHQGSNFG
PWFMTLKAFFAKMSIEYNCVIYIMLNIQFRMCMLTTLCCGKNKLGEDDDSATASKTE
TMNVQP
```


A

## Exercise 9 - Comparing structures

- Use at least one of the fasta-files **B4DTL8**, A0A1C7D1B9 and B3JI28 to create models with the help of the SwissModel repository
- Look up known structures for these entries in UniProt (https://www.uniprot.org/) as well and download them (both PDB and AlphaFold versions)
- Compare the structures by superimposing them onto each other in PyMol

AlphaFold

![image.png](attachment:403fb23d-8af0-42e5-9d71-18856dec01cd.png)
![image.png](attachment:51dbea79-f884-4105-bd3a-bb3db39ba4af.png)
![image.png](attachment:ab1cb800-6f93-4b1d-9b64-9386c51b10ca.png)
![image.png](attachment:bf317792-faf6-4beb-b080-0cbca5352c69.png)
![image.png](attachment:2b3c9705-1204-4fec-bb57-922a8e27068b.png)
![image.png](attachment:71032bfc-3216-47cf-97cd-692a06d2a381.png)
![image.png](attachment:36e84b92-5796-4e17-af76-9d5ef19f5eee.png)
![image.png](attachment:19487855-3e95-415c-98e9-afee97c412a2.png)
![image.png](attachment:a31947cb-1b0e-4fce-adf3-288140a8174f.png)

> The models correspond very well to the AlphaFold structure but not at all to the PDB structure?

## Exercise 12 - Structural calculations

- Write a Python script that reads in the structure 7UN2
- The script should calculate the distances, angles and dihedrals described in the table below

|Type |Amino acid |Atoms|
|--|--|--|
|Distance |ASP |CG – OD1|
|Distance |ASP| CG – OD2|
|Distance| HIS |CE1 – ND1|
|Distance| HIS| CE1 – NE2|
|Angle| VAL| CA – CB – CG1|
|Angle| TYR| CE1 – CZ – OH|
|Dihedral |LYS |CG – CD – CE – NZ|
|Dihedral| GLN |CB – CG – CD – OE1|

```
from Bio.PDB.PDBParser import PDBParser
import Bio.PDB as BP
import math


parser = PDBParser()
structure = parser.get_structure("7UN2", "7UN2.pdb")
```



```
# 1. CG and OD1 of ASP
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CG") and residue.has_id("OD1") and residue.resname == "ASP":
                distance = residue["CG"] - residue["OD1"]
                print(f"{residue.resname} no {residue.id[1]} has {distance} A between CG and OD1")
```

Output (truncated):

```
ASP no 2 has 1.24886953830719 A between CG and OD1
ASP no 10 has 1.255157470703125 A between CG and OD1
ASP no 16 has 1.2497494220733643 A between CG and OD1
ASP no 19 has 1.2505394220352173 A between CG and OD1
ASP no 28 has 1.2520105838775635 A between CG and OD1
ASP no 71 has 1.249117136001587 A between CG and OD1
ASP no 78 has 1.2458832263946533 A between CG and OD1
ASP no 80 has 1.2483317852020264 A between CG and OD1
ASP no 82 has 1.2528865337371826 A between CG and OD1
ASP no 136 has 1.2517863512039185 A between CG and OD1
ASP no 139 has 1.2613810300827026 A between CG and OD1
ASP no 145 has 1.248380422592163 A between CG and OD1
ASP no 151 has 1.2579995393753052 A between CG and OD1
ASP no 192 has 1.2444372177124023 A between CG and OD1
ASP no 203 has 1.2496263980865479 A between CG and OD1
ASP no 208 has 1.2313071489334106 A between CG and OD1
ASP no 218 has 1.2541840076446533 A between CG and OD1
ASP no 235 has 1.2478957176208496 A between CG and OD1
ASP no 2 has 1.2471270561
```

```
# 2. CG and OD2 of ASP
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CG") and residue.has_id("OD2") and residue.resname == "ASP":
                distance = residue["CG"] - residue["OD2"]
                print(f"{residue.resname} no {residue.id[1]} has {distance} A between CG and OD2")
```

Output (truncated):

```
ASP no 2 has 1.249785304069519 A between CG and OD2
ASP no 10 has 1.252985954284668 A between CG and OD2
ASP no 16 has 1.2502385377883911 A between CG and OD2
ASP no 19 has 1.2500027418136597 A between CG and OD2
ASP no 28 has 1.2386349439620972 A between CG and OD2
ASP no 71 has 1.2527213096618652 A between CG and OD2
ASP no 78 has 1.2456270456314087 A between CG and OD2
ASP no 80 has 1.2561410665512085 A between CG and OD2
ASP no 82 has 1.2465587854385376 A between CG and OD2
ASP no 136 has 1.2451852560043335 A between CG and OD2
ASP no 139 has 1.2465533018112183 A between CG and OD2
ASP no 145 has 1.2492443323135376 A between CG and OD2
ASP no 151 has 1.2543292045593262 A between CG and OD2
ASP no 192 has 1.246705174446106 A between CG and OD2
ASP no 203 has 1.2508825063705444 A between CG and OD2
ASP no 208 has 1.2355178594589233 A between CG and OD2
ASP no 218 has 1.2540134191513062 A between CG and OD2
ASP no 235 has 1.2515993118286133 A between CG and OD2
ASP no 2 has 1.25642228
```

```
# 3. CE1 and ND1 of HIS
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CE1") and residue.has_id("ND1") and residue.resname == "HIS":
                distance = residue["CE1"] - residue["ND1"]
                print(f"{residue.resname} no {residue.id[1]} has {distance} A between CE1 and ND1")
```

Output:

```
HIS no 24 has 1.3222111463546753 A between CE1 and ND1
HIS no 51 has 1.3227673768997192 A between CE1 and ND1
HIS no 127 has 1.3233673572540283 A between CE1 and ND1
HIS no 180 has 1.3230255842208862 A between CE1 and ND1
HIS no 205 has 1.3226341009140015 A between CE1 and ND1
HIS no 24 has 1.3201824426651 A between CE1 and ND1
HIS no 51 has 1.3200249671936035 A between CE1 and ND1
HIS no 121 has 1.3258552551269531 A between CE1 and ND1
HIS no 127 has 1.3227794170379639 A between CE1 and ND1
HIS no 180 has 1.3215783834457397 A between CE1 and ND1
HIS no 205 has 1.3237192630767822 A between CE1 and ND1
```


```
# 4. CE1 and NE2 of HIS
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CE1") and residue.has_id("NE2") and residue.resname == "HIS":
                distance = residue["CE1"] - residue["NE2"]
                print(f"{residue.resname} no {residue.id[1]} has {distance} A between CE1 and NE2")
```

Output:

```
HIS no 24 has 1.3234599828720093 A between CE1 and NE2
HIS no 51 has 1.3198864459991455 A between CE1 and NE2
HIS no 127 has 1.3221014738082886 A between CE1 and NE2
HIS no 180 has 1.3205078840255737 A between CE1 and NE2
HIS no 205 has 1.3218494653701782 A between CE1 and NE2
HIS no 24 has 1.3234623670578003 A between CE1 and NE2
HIS no 51 has 1.3184579610824585 A between CE1 and NE2
HIS no 121 has 1.320971131324768 A between CE1 and NE2
HIS no 127 has 1.3220763206481934 A between CE1 and NE2
HIS no 180 has 1.3200384378433228 A between CE1 and NE2
HIS no 205 has 1.3247451782226562 A between CE1 and NE2
```


```
# 5. Angle between CA CB CG1 in VAL
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CA") and residue.has_id("CB") and residue.has_id("CG1") and residue.resname == "VAL":
                A = residue["CA"].get_vector()
                B = residue["CB"].get_vector()
                C = residue["CG1"].get_vector()
                angle1 = (180/math.pi) * BP.calc_angle(A,B,C)
                print(f"{residue.resname} no {residue.id[1]} has {angle1} degrees at CA to CB to CG1")
```

Output (truncated):

```
VAL no 5 has 111.51190802842757 degrees at CB
VAL no 7 has 110.80083680343076 degrees at CB
VAL no 32 has 111.14907856947588 degrees at CB
VAL no 47 has 110.13878291339982 degrees at CB
VAL no 57 has 109.84810814936323 degrees at CB
VAL no 64 has 110.08078910240448 degrees at CB
VAL no 65 has 110.33794976488223 degrees at CB
VAL no 75 has 110.94831902023704 degrees at CB
VAL no 79 has 110.12541041549987 degrees at CB
VAL no 84 has 109.92257468738701 degrees at CB
VAL no 89 has 111.8473503550374 degrees at CB
VAL no 91 has 109.54890906424384 degrees at CB
VAL no 129 has 110.27413953239213 degrees at CB
VAL no 159 has 110.56551129860755 degrees at CB
VAL no 170 has 110.28244079859995 degrees at CB
VAL no 179 has 110.09826721992697 degrees at CB
VAL no 187 has 110.97073646787268 degrees at CB
VAL no 188 has 109.97144398334997 degrees at CB
VAL no 5 has 110.5233714963317 degrees at CB
VAL no 7 has 110.09735840834435 degrees at CB
VAL no 32 has 111.16955219694611 degrees at CB
VAL no 47 has
```

```
# 6. Angle between CE1 CZ OH in TYR
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id("CE1") and residue.has_id("CZ") and residue.has_id("OH") and residue.resname == "TYR":
                A = residue["CE1"].get_vector()
                B = residue["CZ"].get_vector()
                C = residue["OH"].get_vector()
                angle1 = (180/math.pi) * BP.calc_angle(A,B,C)
                print(f"{residue.resname} no {residue.id[1]} has {angle1} degrees at CE1 to CZ to OH")
```

Output:

```
TYR no 12 has 120.05840632065686 degrees at CE1 to CZ to OH
TYR no 22 has 112.73145007324213 degrees at CE1 to CZ to OH
TYR no 54 has 119.9932844098675 degrees at CE1 to CZ to OH
TYR no 67 has 119.67371877374983 degrees at CE1 to CZ to OH
TYR no 77 has 120.60352627301945 degrees at CE1 to CZ to OH
TYR no 100 has 118.89684050432474 degrees at CE1 to CZ to OH
TYR no 176 has 119.3231149901595 degrees at CE1 to CZ to OH
TYR no 12 has 119.09185865618662 degrees at CE1 to CZ to OH
TYR no 22 has 111.53181105830656 degrees at CE1 to CZ to OH
TYR no 54 has 119.98252284995023 degrees at CE1 to CZ to OH
TYR no 67 has 119.1921350142701 degrees at CE1 to CZ to OH
TYR no 77 has 121.3503288621001 degrees at CE1 to CZ to OH
TYR no 100 has 119.86383658044667 degrees at CE1 to CZ to OH
TYR no 176 has 119.09679251452762 degrees at CE1 to CZ to OH
```


```
# 7. Dihedral between CG CD CE NZ in LYS
a1 = "CG"
a2 = "CD"
a3 = "CE"
a4 = "NZ"
res = "LYS"

for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id(a1) and residue.has_id(a2) and residue.has_id(a3) and residue.has_id(a4) and residue.resname == res:
                A = residue[a1].get_vector()
                B = residue[a2].get_vector()
                C = residue[a3].get_vector()
                D = residue[a4].get_vector()
                dihed = (180/math.pi) * BP.calc_dihedral(A,B,C,D)
                print(f"{residue.resname} no {residue.id[1]} has {dihed} degrees at {a1}, {a2}, {a3}, {a4}")
```

Output (truncated):

```
LYS no 30 has -176.2860543372668 degrees at CG, CD, CE, NZ
LYS no 35 has -172.96201386463198 degrees at CG, CD, CE, NZ
LYS no 36 has -65.89048125270061 degrees at CG, CD, CE, NZ
LYS no 39 has 68.53858342661715 degrees at CG, CD, CE, NZ
LYS no 46 has -75.09111727906695 degrees at CG, CD, CE, NZ
LYS no 59 has -61.32128221934154 degrees at CG, CD, CE, NZ
LYS no 101 has -97.2195233194012 degrees at CG, CD, CE, NZ
LYS no 114 has -163.62967527780003 degrees at CG, CD, CE, NZ
LYS no 116 has 151.98336199363547 degrees at CG, CD, CE, NZ
LYS no 135 has 62.85693328172327 degrees at CG, CD, CE, NZ
LYS no 138 has -70.88501658317315 degrees at CG, CD, CE, NZ
LYS no 200 has -157.26789201047984 degrees at CG, CD, CE, NZ
LYS no 30 has -169.67055897236364 degrees at CG, CD, CE, NZ
LYS no 35 has -176.56063152061498 degrees at CG, CD, CE, NZ
LYS no 36 has -67.58098855589655 degrees at CG, CD, CE, NZ
LYS no 39 has -67.18315356957433 degrees at CG, CD, CE, NZ
LYS no 46 has -80.71030566378906 degrees at CG, 
```

```
# 8. Dihedral between CB CG CD OE1 in GLN
a1 = "CB"
a2 = "CG"
a3 = "CD"
a4 = "OE1"
res = "GLN"

for model in structure:
    for chain in model:
        for residue in chain:
            if residue.has_id(a1) and residue.has_id(a2) and residue.has_id(a3) and residue.has_id(a4) and residue.resname == res:
                A = residue[a1].get_vector()
                B = residue[a2].get_vector()
                C = residue[a3].get_vector()
                D = residue[a4].get_vector()
                dihed = (180/math.pi) * BP.calc_dihedral(A,B,C,D)
                print(f"{residue.resname} no {residue.id[1]} has {dihed} degrees at {a1}, {a2}, {a3}, {a4}")
```

Output:

```
GLN no 43 has 26.152529279650025 degrees at CB, CG, CD, OE1
GLN no 132 has -14.370071429775123 degrees at CB, CG, CD, OE1
GLN no 137 has 53.908260558222615 degrees at CB, CG, CD, OE1
GLN no 143 has -34.03305918166893 degrees at CB, CG, CD, OE1
GLN no 166 has 109.61751765635107 degrees at CB, CG, CD, OE1
GLN no 43 has 33.778301757555575 degrees at CB, CG, CD, OE1
GLN no 132 has -9.53130320743544 degrees at CB, CG, CD, OE1
GLN no 137 has 56.70215426106122 degrees at CB, CG, CD, OE1
GLN no 143 has -33.44904290047449 degrees at CB, CG, CD, OE1
GLN no 166 has 15.124680689881318 degrees at CB, CG, CD, OE1
```
