# [01] Chemistry/Biology in Python (Beginner's) 

## **Basic** - DBE calculator

The **double bond equivalent (DBE)** of an organic molecule is the number of unsaturations it contains. In this context, that is the total number of double bonds and rings within the molecule. The equation to calculate this value is shown below.

$$DBE = c + 1 - \frac{h}{2} - \frac{x}{2} + \frac{n}{2}     $$

$$
\begin{aligned}
c &= \text{no. of carbon atoms in the molecule} \\
h &= \text{no. of hydrogen atoms in the molecule} \\
x &= \text{no. of halogen (Cl, Br, I or F) atoms in the molecule} \\
n &= \text{no. of nitrogen atoms in the molecule}
\end{aligned}
$$
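As a quick check of the formula, the sketch below wraps it in a small helper and applies it to three well-known molecules. The function name `double_bond_equivalents` and the choice of example molecules are illustrative assumptions, not part of the exercise.

```python
def double_bond_equivalents(c, h, x=0, n=0):
    """Return the DBE for c carbons, h hydrogens, x halogens and n nitrogens."""
    return c + 1 - h / 2 - x / 2 + n / 2


# Benzene, C6H6: one ring plus three double bonds
print(double_bond_equivalents(c=6, h=6))        # 4.0
# Chlorobenzene, C6H5Cl: swapping an H for a halogen leaves the DBE unchanged
print(double_bond_equivalents(c=6, h=5, x=1))   # 4.0
# Pyridine, C5H5N: one ring plus three double bonds
print(double_bond_equivalents(c=5, h=5, n=1))   # 4.0
```

Each result matches what you get by counting the rings and double bonds in the structure by hand.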

### **Task 1**

For the molecule below, count the atoms and assign the values to the variables `c`, `h`, `x` and `n`.



<center><img src="./STUFF/molecule1.png" width="200" align="center"/></center>

**NOTE:** Variable names are case sensitive. For the equations to work, **keep the names in lower case, exactly as written.**

In [26]:
#Your variables (c, h, x and n) and their values go here
c = 10
h = 9
x = 1
n = 0

Once you have assigned values to all 4 variables, create an equation to calculate `dbe`. Compare the result with the one obtained by counting the double bonds and rings within the molecule.

In [27]:
#Create an equation that calculates DBE and print the result
dbe = c + 1 - (h/2) - (x/2) + (n/2)
print("\nThere are", dbe, "double bond equivalents.\n")


There are 6.0 double bond equivalents.



Convert the result for `dbe` into an integer using the `int()` function.

In [21]:
print(int(dbe))

6
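Note that `int()` does not round: it discards everything after the decimal point. That is harmless for a value such as `6.0`, but a miscounted atom can produce a fractional DBE that `int()` would silently truncate. A minimal sketch of checking for a whole number first, using made-up example values:

```python
good_dbe = 6.0   # whole number, as obtained in Task 1
bad_dbe = 5.5    # hypothetical result of miscounting one hydrogen

print(int(good_dbe))          # 6
print(int(bad_dbe))           # 5: the .5 is dropped, not rounded

# float.is_integer() reports whether a float has no fractional part
print(good_dbe.is_integer())  # True
print(bad_dbe.is_integer())   # False
```

A fractional result is usually a sign that one of the atom counts should be rechecked.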


## **Intermediate** - Determining protein composition of Enzymes

### Background

Proteins are crucial for the existence of life as we know it, and among the most important of them are **enzymes**. Amino acids link together one after another in a chain, forming the basis of all proteins.

Below is an example of a simple protein; each <span style="color:#FF200F">**c**</span><span style="color:#FF820F">**o**</span><span style="color:#FFC80F">**l**</span><span style="color:#A2F80F">**o**</span><span style="color:#4682E2">**u**</span><span style="color:#3B2FCE">**r**</span> representing a different amino acid.

<center><img src="./STUFF/aa_chain_edited.png" width="600" align="center"/></center>

The enzyme **CrmG** is a transaminase with a length of **523 amino acids.** Its sequence is below.

*MTHPSGEPVYADAVLNGWLTSMGLGVEYVRAEGNTVYYLDDEGREVPVLDHACGFGSLIFGHNHP
EIIAHAKAALDAGTVVHAQLSRQPRANQISRILNDIMRRETGRDDRYNAIFANSGAEANEICMKH
AELERQERITALFAEIDAELDTAREALTTGTATLDTASLPLVGGGAGDVDGVIADIHRHNDERRA
ERPLFLTLDGSFHGKLVGSIQLTQNEPWRTPFTALSSPARFLPADEPELIGKIVEDERRSVLTLS
LDKDTVRVVERDFPVVAAIFVEPVRGGSGMKTVTPELAEELHRLRDTLGCPLVVDEVQTGIGRTG
AFFGSALLGIRGDYYTLAKAIGGGIVKNSVALIRQDRFLPAMEVIHSSTFAKDGLSASIALKVLE
MVEADGGRVYQRVRERGQRLEAMLESVRADHSDVVSAVWGTGLMLALELRDQSNATSQAIREKAA
HGFLGYVLAGFLLREHHIRVLPAGPRSGFLRFSPSLYITDEEIDRTETALRSLFTALRDQDGDRLVLS*

*Each amino acid is represented by a unique single-letter code (e.g. **M** represents the amino acid methionine).*

### Example

Once a string has been assigned to a variable, the `.count('')` method can be used to count how many times a given character occurs in that string.

In [16]:
#Run this cell
#Inputting CrmG's amino acid sequence as a string
crmg = "MTHPSGEPVYADAVLNGWLTSMGLGVEYVRAEGNTVYYLDDEGREVPVLDHACGFGSLIFGHNHPEIIAHAKAALDAGTVVHAQLSRQPRANQISRILNDIMRRETGRDDRYNAIFANSGAEANEICMKHAELERQERITALFAEIDAELDTAREALTTGTATLDTASLPLVGGGAGDVDGVIADIHRHNDERRAERPLFLTLDGSFHGKLVGSIQLTQNEPWRTPFTALSSPARFLPADEPELIGKIVEDERRSVLTLSLDKDTVRVVERDFPVVAAIFVEPVRGGSGMKTVTPELAEELHRLRDTLGCPLVVDEVQTGIGRTGAFFGSALLGIRGDYYTLAKAIGGGIVKNSVALIRQDRFLPAMEVIHSSTFAKDGLSASIALKVLEMVEADGGRVYQRVRERGQRLEAMLESVRADHSDVVSAVWGTGLMLALELRDQSNATSQAIREKAAHGFLGYVLAGFLLREHHIRVLPAGPRSGFLRFSPSLYITDEEIDRTETALRSLFTALRDQDGDRLVLS"

#Defining a count variable for each amino acid and using .count() to tally how often it occurs
arginine_count = crmg.count('R')
cysteine_count = crmg.count('C')
glycine_count = crmg.count('G')

#Printing the results ("\n" within a string creates a new line)
print("\nThere are", arginine_count, "arginines,", cysteine_count, "cysteines and", glycine_count, "glycines.")


There are 45 arginines, 3 cysteines and 47 glycines.


### **Task 2**

Calculate the total number of lysines **(K)**, serines **(S)** and alanines **(A)** in the enzyme CrmG using the `.count` method as shown above. Once the code cell above has been run, the variable `crmg` is already defined and can be used in your code without re-entering the sequence.

*Amino acid codes are shown in brackets and bold font*

**TIP:** Remember! Case matters in Python. Does `.count('k')` give the same result as `.count('K')`? If not, why?

**ANSWER:** No. String matching is case sensitive and every letter in `crmg` is written in upper case, so `.count('k')` returns 0.
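A short sketch of this behaviour, using a hypothetical mixed-case string rather than `crmg`, also shows one way to make a count independent of how the sequence was typed:

```python
# Hypothetical mixed-case peptide, used only for illustration
peptide = "mKkSA"

print(peptide.count('K'))          # 1: only the upper-case K matches
print(peptide.count('k'))          # 1: only the lower-case k matches
# Converting to upper case first counts both
print(peptide.upper().count('K'))  # 2
```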

In [32]:
#There is no need to redefine crmg as it is stored already
lysine_count = crmg.count('K')
serine_count = crmg.count('S')
alanine_count = crmg.count('A')

#Printing the results ("\n" within a string creates a new line)
print("\nThere are", lysine_count, "lysines,", serine_count, "serines and", alanine_count, "alanines.")


There are 11 lysines, 30 serines and 54 alanines.
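Calling `.count()` once per amino acid becomes repetitive when all twenty are of interest. One alternative, not used in the exercise itself, is `collections.Counter` from the Python standard library, which tallies every character in a single pass. The short peptide below is a hypothetical stand-in for `crmg`.

```python
from collections import Counter

peptide = "MKKSAAAGLS"  # hypothetical stand-in for a real sequence

# Counter builds a dictionary-like tally of every character in the string
composition = Counter(peptide)

print(composition['A'])            # 3
print(composition['K'])            # 2
print(composition['W'])            # 0: absent letters count as zero instead of raising an error
print(composition.most_common(1))  # [('A', 3)]
```

Replacing `peptide` with `crmg` should give the same totals as the individual `.count()` calls.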


What type of variable is produced when `.count('')` is used? Check using `type()`.

In [47]:
type(alanine_count)

int

### Further calculation

Convert the amino acid totals for Lysine **(K)**, Serine **(S)** and Alanine **(A)** into a percentage make-up of the CrmG protein and print the results.

**NOTE:** There are **523** amino acids in CrmG

In [46]:
#Divide each count by the sequence length (523) and multiply by 100
k_percent = (lysine_count / 523) * 100
s_percent = (serine_count / 523) * 100
a_percent = (alanine_count / 523) * 100
print("Lysine:", k_percent, "%", "\nSerine: ", s_percent, "%", "\nAlanine:", a_percent, "%")

Lysine: 2.1032504780114722 % 
Serine:  5.736137667304015 % 
Alanine: 10.325047801147228 %
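The long decimal tails above can be trimmed with an f-string format specifier, and the sequence length can be taken from `len()` instead of being typed in by hand. The sketch below is one possible approach under those assumptions; the helper name `percent_of` and the short peptide are hypothetical and stand in for the real `crmg` sequence.

```python
peptide = "MKKSAAAGLS"  # hypothetical 10-residue stand-in for crmg


def percent_of(sequence, residue):
    """Percentage of `sequence` made up of the one-letter code `residue`."""
    return sequence.count(residue) / len(sequence) * 100


for name, code in [("Lysine", "K"), ("Serine", "S"), ("Alanine", "A")]:
    # :.1f formats the float with one digit after the decimal point
    print(f"{name}: {percent_of(peptide, code):.1f}%")
```

For this 10-residue example the loop prints 20.0% for lysine, 20.0% for serine and 30.0% for alanine.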


### What the CrmG enzyme looks like, visualized in PyMOL

The amino acid chain is depicted in <span style="color:#3B2FCE">**purple**</span>, and the water molecules that surround it in <span style="color:#FFA41B">**yellow**</span>. The small molecule in <span style="color:#1FED2C">**green**</span> (center) is the coenzyme PLP.

<center><img src="./STUFF/crmg_qual.png" width="800" align="center"/></center>

If you have any questions or issues regarding these exercises, do not hesitate to get in touch: r.cvek@warwick.ac.uk