# Week 46

In [1]:
try:
    import fysisk_biokemi
    print("Already installed")
except ImportError:
    %pip install -q "fysisk_biokemi[colab] @ git+https://github.com/au-mbg/fysisk-biokemi.git"

In [2]:
import numpy as np

## Extinction coefficient of human myoglobin

The protein sequence of human myoglobin is given below.

In [3]:
sequence = """GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGA
TVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNY
KELGFQG"""

We want to calculate the *extinction coefficient* of this protein. We
have seen that this can be calculated using the formula

<span id="eq-extinction">$$
\epsilon(280 \mathrm{nm}) = N_{Trp} A + N_{Tyr} B + N_{Cys} C 
 \qquad(1)$$</span>

where $N_{Trp}$ is the number of tryptophan residues in the protein (and
likewise for the other two terms), and the three constants $A$, $B$ and
$C$ are given as

$$
\begin{align}
A &= 5500 \ \mathrm{M^{-1} cm^{-1}} \\
B &= 1490 \ \mathrm{M^{-1} cm^{-1}} \\
C &= 125 \ \mathrm{M^{-1} cm^{-1}}
\end{align}
$$

To evaluate the formula we need to know the count of each relevant
residue. We can use Python to get that. For example, we can count the
number of tryptophan residues like so:

In [4]:
N_trp = sequence.count("W")
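
As an alternative sketch, `collections.Counter` from the Python standard library tallies every character in a single pass, which can be handy when several residue types are needed at once. The short peptide `example_sequence` below is a made-up illustration, not the myoglobin sequence.

```python
from collections import Counter

# Hypothetical short peptide used only for illustration (not myoglobin).
example_sequence = "MWYCAWKY"

# Counter maps each one-letter code to the number of times it occurs.
residue_counts = Counter(example_sequence)

n_trp = residue_counts["W"]  # tryptophan
n_tyr = residue_counts["Y"]  # tyrosine
n_cys = residue_counts["C"]  # cysteine
print(n_trp, n_tyr, n_cys)
```

A `Counter` returns 0 for letters that never occur, so residues absent from a sequence need no special handling. For a multi-line string it also counts the newline characters, which does not affect the residue counts.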

#### (a) Determine the number of residues

In the cell below, find the number of tyrosine and cysteine residues.

In [5]:
N_tyr = sequence.count("Y")
N_cys = sequence.count("C")

In [7]:
print(f"{N_trp = }")
print(f"{N_tyr = }")
print(f"{N_cys = }")

N_trp = 2
N_tyr = 2
N_cys = 1

#### (b) Calculate the extinction coefficient

Use
<a href="#eq-extinction" class="quarto-xref">Equation 1</a> to
calculate the extinction coefficient of human myoglobin.

In [8]:
A = 5500
B = 1490
C = 125

In [10]:
epsilon = A * N_trp + B * N_tyr + C*N_cys
print(epsilon)

14105

What are the units of this value?
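
Equation 1 can also be wrapped in a small reusable function. The sketch below is one possible way to do that; the function name `extinction_coefficient_280`, the constant names and the test peptide are assumptions made for illustration, while the three coefficients are the values of $A$, $B$ and $C$ given above.

```python
# Coefficients A, B and C from Equation 1, in M^-1 cm^-1.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYS = 125


def extinction_coefficient_280(seq: str) -> int:
    """Evaluate Equation 1 for a sequence written in one-letter codes."""
    return (
        seq.count("W") * EPS_TRP
        + seq.count("Y") * EPS_TYR
        + seq.count("C") * EPS_CYS
    )


# Hypothetical peptide with 1 Trp, 2 Tyr and 1 Cys.
example_epsilon = extinction_coefficient_280("AWYYCK")
print(example_epsilon)
```

Calling the function on `sequence` should reproduce the value calculated in the cell above.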

#### (c) Compare residue count to ProtParam

ProtParam is an online tool that calculates various physical and
chemical parameters from a given protein sequence and is used worldwide
in research laboratories.

<figure>
<img
src="https://raw.githubusercontent.com/au-mbg/fysisk-biokemi/refs/heads/main/lessons/figures/week_46/protpram_input.png"
alt="ProtParam tool input page." />
<figcaption aria-hidden="true">ProtParam tool input page.</figcaption>
</figure>

Go to ProtParam at this link: <https://web.expasy.org/protparam/>, paste
the sequence and click **Compute Parameters**. You should then see the
calculated parameters, similar to the image below.

<figure>
<img
src="https://raw.githubusercontent.com/au-mbg/fysisk-biokemi/refs/heads/main/lessons/figures/week_46/protpram_output.png"
alt="ProtParam tool calculated properties." />
<figcaption aria-hidden="true">ProtParam tool calculated
properties.</figcaption>
</figure>

On the output page you will see the number of each residue. Does that
match your calculation?

#### (d) ProtParam extinction coefficient

Scrolling further down, you will find that ProtParam also gives the
extinction coefficient for the protein. As the absorbance of Cys
residues varies slightly depending on whether disulfide bridges
(cystines) are present or not, ProtParam lists the extinction coefficient
for both the reduced and the oxidized state of the Cys residues.

Consider the following questions

-   Why are the extinction coefficients of myoglobin the same whether
    cystines are formed or not?

-   Does the extinction coefficient computed in ProtParam match the
    value you have calculated?

#### (e) Calculate the absorbance

Using the extinction coefficient and the molecular weight given by
ProtParam, calculate the absorbance at 280 nm of a myoglobin solution at
a concentration of 1 mg/mL in a cuvette with a light path of 1 cm.

In [11]:
molecular_weight = 17052.61 # g/mol
path_length = 1 # cm
concentration = 1 # mg/mL

Remember to convert the concentration to $\mathrm{mol/L}$.

In [14]:
# 1 mg/mL = 1 g/L, so dividing by the molecular weight [g/mol] gives mol/L
A280 = concentration / molecular_weight * path_length * epsilon
print(A280)

0.827146108425631

This value is what is known as the A280(0.1%) of a protein, i.e. the
absorbance of a given protein at a concentration of 0.1% weight/volume
(= 1 g/L = 1 mg/mL).
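
Written out with units, and using the values from the cells above, the calculation in (e) follows the Lambert-Beer law $A = \epsilon \, c \, l$:

$$
\begin{align}
c &= \frac{1 \ \mathrm{g/L}}{17052.61 \ \mathrm{g/mol}} \approx 5.86 \times 10^{-5} \ \mathrm{M} \\
A_{280} &= \epsilon \, c \, l = 14105 \ \mathrm{M^{-1} cm^{-1}} \times 5.86 \times 10^{-5} \ \mathrm{M} \times 1 \ \mathrm{cm} \approx 0.827
\end{align}
$$

All units cancel, so the absorbance is a dimensionless number.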

#### (f) Compare the absorbance to ProtParam

Does the number reported by ProtParam correspond to the number you just
calculated in (e)?

## Average properties of amino acids and proteins.

In the accompanying Excel file (`AA_frequency.xlsx`), you will find a
table that contains the molecular weight of the 20 common amino acid
residues, i.e. their weight as residues in a peptide chain.
Additionally, you will find their relative frequency in *E. coli*
proteins, where a frequency of 0.01 means that this residue constitutes
1 % of the residues in a protein.

#### (a) Load the data file.

Use the widget below to load the `AA_frequency.xlsx` file.

In [15]:
from IPython.display import display 
from fysisk_biokemi.widgets import DataUploader
uploader = DataUploader()
uploader.display()

In [16]:
df = uploader.get_dataframe()
display(df)

In [17]:
from IPython.display import display 
from fysisk_biokemi.datasets import load_dataset
df = load_dataset('AA_frequency')
display(df)

#### (b) Average molecular weight

Calculate the average molecular weight of a residue in a protein.

> **Tip**
>
> You can use `np.sum` to sum all values in an array.

In [19]:
average_mw = np.sum(df['MW of AA residue'] * df['Frequency in proteins'])
print(f"{average_mw = :3.3f}")

average_mw = 110.566
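
The sum above is a weighted average, with the frequencies acting as weights. As a cross-check sketch, `np.average` with its `weights` argument gives the same result when the weights sum to 1, and it divides by the sum of the weights if they do not. The three masses and frequencies below are toy numbers chosen for illustration, not values from `AA_frequency.xlsx`.

```python
import numpy as np

# Toy values for illustration only (not taken from AA_frequency.xlsx).
toy_mw = np.array([57.0, 71.0, 186.0])  # residue masses in g/mol
toy_freq = np.array([0.5, 0.3, 0.2])    # fractions that sum to 1

# Explicit weighted sum, as in the cell above.
weighted_sum = np.sum(toy_mw * toy_freq)

# Same quantity via np.average, which divides by the sum of the weights.
weighted_avg = np.average(toy_mw, weights=toy_freq)

print(weighted_sum, weighted_avg)
```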

#### (c) Weight of 300-residue protein

What would the molecular weight of a 300-residue protein most likely be,
if you did not know its sequence?

In [21]:
mw_300 = average_mw * 300
print(f"{mw_300 = :3.3f}")

mw_300 = 33169.941

------------------------------------------------------------------------

In many experiments, you will be working with a mixture of proteins. This
could, for example, be a cell lysate or a biological fluid for protein
abundance analysis, or the early stages of a protein purification
process. In these situations, you cannot work with a molecule-specific
extinction coefficient. Instead, we use average values, which we will
determine below.

------------------------------------------------------------------------

#### (d) Average concentration of residues

Calculate the average concentration of amino acid residues in a protein
mixture at 1 mg/mL.

In [23]:
# 1 mg/mL = 1 g/L; dividing by the average residue weight [g/mol] gives mol/L
c_residue_avg = 1 / average_mw
print(f"{c_residue_avg = :3.3f} M")

c_residue_avg = 0.009 M

#### (e) Absorbance

Calculate the absorbance from such a mixture under the assumption that
only Trp and Tyr contribute.

In [25]:
freq = df.set_index("Name")["Frequency in proteins"]
f_trp = freq["Tryptophan (Trp/W)"]
f_tyr = freq["Tyrosine (Tyr/Y)"]

c_trp = c_residue_avg * f_trp
c_tyr = c_residue_avg * f_tyr
print(f"{c_trp = :3.5f}")
print(f"{c_tyr = :3.5f}")

c_trp = 0.00012
c_tyr = 0.00030

In [27]:
L = 1 # Path length in cm
A280 = L * (5500 * c_trp + 1490 * c_tyr)
print(f"{A280 = :3.3f}")

A280 = 1.091

#### (f) Absorbance $\rightarrow$ concentration

For a cell lysate, you measure an absorbance of 0.78 at a path length
of 0.5 cm. What is the protein concentration?

In [29]:
# Set known values:
A = 0.78 # Unitless
l = 0.5 # cm

# Extract frequencies
freq = df.set_index("Name")["Frequency in proteins"]
f_trp = freq["Tryptophan (Trp/W)"]
f_tyr = freq["Tyrosine (Tyr/Y)"]

# Calculate the average extinction coefficient per residue in [L/(mol cm)]
eps_mix = 5500 * f_trp + 1490 * f_tyr
# Calculate molar concentration of residues
c_res = A / (l * eps_mix) # [mol/L]
# Convert to mass concentration:
conc_mg_per_mL = c_res * average_mw # [mol/L] * [g/mol] = [g/L] = [mg/mL]
print(f"Protein concentration = {conc_mg_per_mL:.3f} mg/mL")

Protein concentration = 1.429 mg/mL
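
The same estimate can be reached by a shortcut: part (e) gave an absorbance of about 1.091 for a 1 mg/mL mixture at 1 cm, and the Lambert-Beer law is linear in both concentration and path length. The sketch below inverts that relation. The function name `concentration_mg_per_ml` and its parameter names are assumptions for illustration, and the default constant is the rounded value from part (e), so the last digit may differ slightly from the result above.

```python
def concentration_mg_per_ml(absorbance: float, path_length_cm: float,
                            a280_per_mg_ml: float = 1.091) -> float:
    """Invert A = a * c * l, where a is the A280 of a 1 mg/mL solution at 1 cm."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (a280_per_mg_ml * path_length_cm)


# Values from part (f): A = 0.78 measured with a 0.5 cm path length.
conc = concentration_mg_per_ml(0.78, 0.5)
print(f"{conc:.2f} mg/mL")
```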