# Beginner's Python—Session One Biochemistry Answers

## DBE calculator

The **double bond equivalent (DBE)** of an organic molecule is its number of unsaturations: the total number of double bonds and rings in the molecule (a triple bond counts as two). The equation used to calculate this value is shown below.

$$\text{DBE} = c + 1 - \frac{h}{2} - \frac{x}{2} + \frac{n}{2}     $$

$$\begin{aligned}
c &= \text{no. of carbon atoms in the molecule} \\
h &= \text{no. of hydrogen atoms in the molecule} \\
x &= \text{no. of halogen (F, Cl, Br or I) atoms in the molecule} \\
n &= \text{no. of nitrogen atoms in the molecule}
\end{aligned}$$

For the molecule below, create variables `c`, `h`, `x` and `n` holding the atom counts defined for the formula above.



<center><img src="https://raw.githubusercontent.com/warwickdatascience/beginners-python/master/session_one/session_one_subject_questions/resources/molecule.png" width="200" align="center"/></center>

In [2]:
c = 10
h = 9
x = 1
n = 0

Use the variables defined above to create a new variable, `dbe`, using the formula above. Compare this result with one obtained by counting the double bonds and rings within the molecule. 

In [4]:
dbe = c + 1 - (h/2) - (x/2) + (n/2)
print("There are", dbe, "double bond equivalents")

There are 6.0 double bond equivalents
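As a quick sanity check, the same formula can be tried on a molecule whose answer is easy to count by hand. The sketch below uses benzene (C6H6), which is not part of the original exercise and is chosen only for illustration; the variable names are likewise made up here. Benzene has one ring and three double bonds, so the formula should give 4.

```python
# Benzene, C6H6 (an illustrative molecule, not the one pictured above)
c_benzene = 6  # carbon atoms
h_benzene = 6  # hydrogen atoms
x_benzene = 0  # halogen atoms
n_benzene = 0  # nitrogen atoms

# Same DBE formula as above, applied to the benzene atom counts
benzene_dbe = c_benzene + 1 - (h_benzene / 2) - (x_benzene / 2) + (n_benzene / 2)
print("Benzene has", benzene_dbe, "double bond equivalents")  # Benzene has 4.0 double bond equivalents
```

Separate variable names are used so that the values of `c`, `h`, `x` and `n` for the exercise molecule are not overwritten.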


Convert the value of `dbe` into an integer using the `int()` function.

In [5]:
print(int(dbe))

6
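Note that `int()` does not round: it discards everything after the decimal point. That makes no difference for `6.0`, but it matters for values that are not whole numbers. A small illustration, using a made-up value rather than a real DBE:

```python
# int() truncates towards zero; round() goes to the nearest whole number
example = 5.9  # illustrative value only

truncated = int(example)
rounded = round(example)

print(truncated)  # 5
print(rounded)    # 6
```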


## Determining Protein Composition of Enzymes

Proteins are crucial for the existence of life as we know it, and **enzymes**, the proteins that catalyse biochemical reactions, are among the most important. Amino acids join end-to-end in a chain, forming the basis of all proteins.

Below is an example of a simple protein, with each <span style="color:#FF200F">**c**</span><span style="color:#FF820F">**o**</span><span style="color:#FFC80F">**l**</span><span style="color:#A2F80F">**o**</span><span style="color:#4682E2">**u**</span><span style="color:#3B2FCE">**r**</span> representing a different amino acid.

<center><img src="https://raw.githubusercontent.com/warwickdatascience/beginners-python/master/session_one/session_one_subject_questions/resources/aa_chain_edited.png" width="600" align="center"/></center>

The enzyme **CrmG** is a transaminase that is **523 amino acids** long. Its sequence is shown below.

```
MTHPSGEPVYADAVLNGWLTSMGLGVEYVRAEGNTVYYLDDEGREVPVLDHACGFGSLIFGHNHP
EIIAHAKAALDAGTVVHAQLSRQPRANQISRILNDIMRRETGRDDRYNAIFANSGAEANEICMKH
AELERQERITALFAEIDAELDTAREALTTGTATLDTASLPLVGGGAGDVDGVIADIHRHNDERRA
ERPLFLTLDGSFHGKLVGSIQLTQNEPWRTPFTALSSPARFLPADEPELIGKIVEDERRSVLTLS
LDKDTVRVVERDFPVVAAIFVEPVRGGSGMKTVTPELAEELHRLRDTLGCPLVVDEVQTGIGRTG
AFFGSALLGIRGDYYTLAKAIGGGIVKNSVALIRQDRFLPAMEVIHSSTFAKDGLSASIALKVLE
MVEADGGRVYQRVRERGQRLEAMLESVRADHSDVVSAVWGTGLMLALELRDQSNATSQAIREKAA
HGFLGYVLAGFLLREHHIRVLPAGPRSGFLRFSPSLYITDEEIDRTETALRSLFTALRDQDGDRLVLS
```

Each amino acid is represented by a unique single-letter code (e.g. **M** represents the amino acid methionine).

Once a string has been assigned to a variable, the `.count()` method (we'll learn more about methods later) can be used to count how many times a specific character occurs in the string. An example is shown below.

In [6]:
# Store CrmG's amino acid sequence as a string
crmg = "MTHPSGEPVYADAVLNGWLTSMGLGVEYVRAEGNTVYYLDDEGREVPVLDHACGFGSLIFGHNHPEIIAHAKAALDAGTVVHAQLSRQPRANQISRILNDIMRRETGRDDRYNAIFANSGAEANEICMKHAELERQERITALFAEIDAELDTAREALTTGTATLDTASLPLVGGGAGDVDGVIADIHRHNDERRAERPLFLTLDGSFHGKLVGSIQLTQNEPWRTPFTALSSPARFLPADEPELIGKIVEDERRSVLTLSLDKDTVRVVERDFPVVAAIFVEPVRGGSGMKTVTPELAEELHRLRDTLGCPLVVDEVQTGIGRTGAFFGSALLGIRGDYYTLAKAIGGGIVKNSVALIRQDRFLPAMEVIHSSTFAKDGLSASIALKVLEMVEADGGRVYQRVRERGQRLEAMLESVRADHSDVVSAVWGTGLMLALELRDQSNATSQAIREKAAHGFLGYVLAGFLLREHHIRVLPAGPRSGFLRFSPSLYITDEEIDRTETALRSLFTALRDQDGDRLVLS"

# Count the occurrences of three amino acids
arginine_count = crmg.count('R')
cysteine_count = crmg.count('C')
glycine_count = crmg.count('G')

# Print the results
print("There are", arginine_count, "arginines,",
      cysteine_count, "cysteines and",
      glycine_count, "glycines" )

There are 45 arginines, 3 cysteines and 47 glycines


Calculate the total number of lysines **(K)**, serines **(S)** and alanines **(A)** in the enzyme CrmG using the `.count()` method, as shown above. Once the code cell above has been run, the variable `crmg` is already defined, so you can use it in your code without re-entering the sequence.

In [7]:
lysine_count = crmg.count('K')
serine_count = crmg.count('S')
alanine_count = crmg.count('A')

print("There are", lysine_count, "lysines,",
      serine_count, "serines and",
      alanine_count, "alanines" )

There are 11 lysines, 30 serines and 54 alanines
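Calling `.count()` once per amino acid works well for a handful of letters. If every amino acid needed counting, one possible alternative is `Counter` from Python's standard `collections` module, which tallies all characters in a single pass. This goes beyond what the session covers and is only a sketch; the short `peptide` string below is made up for illustration and is not CrmG.

```python
from collections import Counter

# Short made-up peptide used purely for illustration (not CrmG)
peptide = "MKKAGSAAKS"

# Counter tallies every distinct character in one pass
residue_counts = Counter(peptide)

print(residue_counts["K"])  # 3
print(residue_counts["S"])  # 2
print(residue_counts["A"])  # 3
```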


Remember, case matters in Python. Does `.count('k')` give the same result as `.count('K')`?

In [8]:
print(crmg.count('k'))

0


We obtain a count of zero because the sequence contains no lowercase `k` characters, only uppercase `K`.
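If a sequence might contain a mixture of upper- and lowercase letters, one way to count regardless of case is to convert the whole string to uppercase first with the `.upper()` string method. The `mixed_case` string below is a made-up example, not real sequence data.

```python
# Made-up sequence with mixed case, for illustration only
mixed_case = "mkKAkS"

# Convert to uppercase before counting so 'k' and 'K' are treated alike
total_k = mixed_case.upper().count("K")

print(mixed_case.count("K"))  # 1
print(total_k)                # 3
```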

What type of value does `.count()` return? Check using `type()`.

In [9]:
print(type(lysine_count))

<class 'int'>


## Percentage Make-up

Convert the totals for lysine **(K)**, serine **(S)** and alanine **(A)** into percentages of the CrmG protein's amino acid content and print the results.

**NOTE:** There are **523** amino acids in CrmG

In [10]:
k_percent = (lysine_count / 523) * 100
s_percent = (serine_count / 523) * 100
a_percent = (alanine_count / 523) * 100

print("Lysine Percentage:", k_percent)
print("Serine Percentage:", s_percent)
print("Alanine Percentage:", a_percent)

Lysine Percentage: 2.1032504780114722
Serine Percentage: 5.736137667304015
Alanine Percentage: 10.325047801147228
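Two optional refinements, sketched here on a short made-up `peptide` string rather than on CrmG: the length can be obtained with the built-in `len()` instead of being typed by hand, and `round()` can trim the long run of decimal places.

```python
# Short made-up peptide used purely for illustration (not CrmG)
peptide = "MKKAGSA"

total = len(peptide)  # number of amino acids (characters) in the string
lysine_percent = 100 * peptide.count("K") / total

print(total)                     # 7
print(round(lysine_percent, 1))  # 28.6
```

For the real protein, `len(crmg)` could be used in place of the hard-coded `523`, assuming the sequence has been entered without gaps or line breaks.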


## What Does the CrmG Enzyme Look Like?

We can visualise the structure of the CrmG protein using a tool called PyMOL, which was itself developed using Python. The amino acid chain is depicted in <span style="color:#3B2FCE">**purple**</span>, and the water molecules that surround it in <span style="color:#FFA41B">**yellow**</span>. The small compound in <span style="color:#1FED2C">**green**</span> (centre) is the coenzyme PLP.

<center><img src="https://raw.githubusercontent.com/warwickdatascience/beginners-python/master/session_one/session_one_subject_questions/resources/crmg_qual.png" width="800" align="center"/></center>