In [1]:
# Setting up the Colab environment. DO NOT EDIT!
try:
  from applied_biostats import setup_environment
except ImportError:
  !pip -q install applied-biostats-helper
  from applied_biostats import setup_environment
finally:
  grader = setup_environment('Module02_lab')

# Lab

## Introduction

In this lab, we will explore how viral samples are prepared for nanopore sequencing of COVID-19 samples.
Specifically, we will compare the yield of two different RT-PCR methods: the Paragon system and the PacBio system.

Before we can sequence our samples, we need to make sure that we have a sufficient amount of DNA to work with.
This means calculating the appropriate dilutions of our samples to ensure that we have enough material to generate accurate and reliable results.

The Paragon system uses over 200 pairs of overlapping PCR primers that generate DNA fragments that are, on average, 285 basepairs long.
On the other hand, the PacBio system uses 35 overlapping fragments that have an average length of 2200 bp.
It will be interesting to see which method produces a higher yield of DNA and is more efficient in preparing our samples for sequencing.

Get ready to dive into some math and use Python to compare these two RT-PCR methods!

## Learning Objectives

By the end of this activity, you will be able to:

 - Apply basic math in Python to summarize an experimental result.
 - Create f-strings to display dynamic results.
 - Compare and contrast the Paragon and PacBio systems in terms of sample preparation for nCov2 sequencing.
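
As a quick preview of the f-string objective, here is a minimal sketch. The variable names and values are hypothetical and are not part of the lab data; they only illustrate how braces embed a variable and how a format specifier such as `:0.1f` controls rounding.

```python
# Hypothetical values, chosen only to illustrate f-string formatting.
sample_name = 'example'
concentration = 12.3456  # ng/ul

# Braces embed a variable; ':0.1f' displays the number with one decimal place.
message = f'The {sample_name} sample has a concentration of {concentration:0.1f} ng/ul'
print(message)
```

The same pattern appears in the check cells throughout this lab.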

# Protocol Evaluation

In the COVID sequencing lab, we often have to evaluate multiple techniques for preparing viral samples for sequencing.
There are *many* aspects to this comparison, but an important one is the yield of the reaction that prepares DNA for sequencing.
This reaction is called RT-PCR because it first _reverse-transcribes_ the nCov2 RNA into DNA and then amplifies that DNA through a _polymerase chain reaction_.

Currently, our lab uses a system with over 200 pairs of overlapping PCR primers that generate DNA fragments that are, on average, **285 basepairs** long.
We'll call this the _Paragon_ system.
Short RT-PCR fragments are ideal when dealing with degraded RNA because they allow for the amplification of even shredded RNA.
However, because no fragment is longer than ~300 bp, it is difficult to tell whether a mutation in one gene is 'linked' with a mutation in another gene.

So, we are exploring another technique, which we'll call the _PacBio_ system.
This system uses 35 overlapping fragments that have an average length of **2200 bp**.
These longer fragments help with understanding the linkage between more distant locations but are more difficult to run on degraded samples.

In this lab, you will work through a basic exercise in evaluating the yield of the _Paragon_ and _PacBio_ systems.

## Q1: Extract the relevant information from the text above

|               |    |
| --------------|----|
| Points        | 2  |
| Public Checks | 2  |
| Hidden Tests  | 0  |

In [1]:
# It is often useful to define all of your variables at the beginning.
# It is helpful to include units using '#' to keep track of your calculations

dna_weight = 650  # g/mole/bp

# bp
paragon_amplicon_length = 285  # SOLUTION
pacbio_amplicon_length = 2200  # SOLUTION

In [2]:
print(f'paragon_amplicon_length = {paragon_amplicon_length}')

paragon_amplicon_length = 285


In [3]:
print(f'pacbio_amplicon_length = {pacbio_amplicon_length}')

pacbio_amplicon_length = 2200


## Q2: Calculate the molecular weight of each template

In this lab, we are comparing the yield of two different RT-PCR methods for preparing viral samples for nanopore sequencing.
In order to accurately compare the yield of each method, it is important to know the molecular weight of the templates being used.
Therefore, as a part of this lab, we need to calculate the molecular weight of each template.
This will allow us to accurately compare the yield of each method and understand how the different characteristics of each method may impact the overall results.

|               |    |
| --------------|----|
| Points        | 2  |
| Public Checks | 4  |
| Hidden Tests  | 0  |

_Feeling stuck?_

Try doing it on **paper** first.
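
If the units feel confusing, the following sketch works through the same kind of calculation for a hypothetical 1000 bp fragment, which is not one of the lab's amplicons. It assumes the same approximation of 650 g/mole per base pair used for `dna_weight` above; the names `example_length`, `weight_per_bp`, and `example_weight` are illustrative only.

```python
# Hypothetical fragment, used only to show how the units combine.
example_length = 1000  # bp
weight_per_bp = 650  # g/mole/bp, the same approximation as dna_weight

# bp * (g/mole/bp) leaves g/mole
example_weight = example_length * weight_per_bp  # g/mole
print(f'A {example_length} bp fragment weighs about {example_weight} g/mole')
```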

In [6]:
paragon_template_weight = paragon_amplicon_length*dna_weight  # SOLUTION
pacbio_template_weight = pacbio_amplicon_length*dna_weight  # SOLUTION

In [7]:
# Complete the cell above before running this one
print(f'The Paragon templates weigh {paragon_template_weight} g/mole')
print(f'The PacBio templates weigh {pacbio_template_weight} g/mole')

The Paragon templates weigh 185250 g/mole
The PacBio templates weigh 1430000 g/mole


In [8]:
print('Is paragon_template_weight an int or float:', isinstance(paragon_template_weight, (float, int)))

Is paragon_template_weight an int or float: True


In [9]:
print('Is pacbio_template_weight an int or float:', isinstance(pacbio_template_weight, (float, int)))

Is pacbio_template_weight an int or float: True


In [10]:
print(f'paragon_template_weight = {paragon_template_weight:0.1f}')

paragon_template_weight = 185250.0


In [11]:
print(f'pacbio_template_weight = {pacbio_template_weight:0.1f}')

pacbio_template_weight = 1430000.0


In order to investigate the impact of degradation on the yield of each protocol, you will examine two samples.
One sample has been freshly isolated and the other has been left at room temperature for 72 hours before preparation.
We then ran each sample according to the manufacturer's guidelines.

We obtained the following results after quantification of **15 ul** of final volume.

| Sample  | Fresh      | Degraded   |
|---------|------------|------------|
| Paragon | 21.4 ng/ul | 19.3 ng/ul |
| PacBio  | 38.1 ng/ul | 7.4 ng/ul  |

## Q3: What is the _molarity_ of each _Paragon_ sample?

|               |    |
| --------------|----|
| Points        | 4  |
| Public Checks | 4  |
| Hidden Tests  | 2  |

_Feeling stuck?_

Try doing it on **paper** first.

A typical molarity for this protocol is between 50 and 500 fmoles/ul.
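
The conversion from a mass concentration (ng/ul) to a molar concentration (fmoles/ul) can be broken into three unit steps. The helper below is a sketch, not part of the lab's required code: the function name `ng_per_ul_to_fmol_per_ul` and the example inputs are hypothetical.

```python
def ng_per_ul_to_fmol_per_ul(conc_ng_per_ul, template_weight):
    """Convert ng/ul to fmoles/ul for a template weighing template_weight g/mole."""
    grams_per_ul = conc_ng_per_ul * 1E-9  # ng -> g
    moles_per_ul = grams_per_ul / template_weight  # g / (g/mole) -> moles
    return moles_per_ul / 1E-15  # moles -> fmoles


# Hypothetical sample: 10 ng/ul of a template weighing 100,000 g/mole
example_molarity = ng_per_ul_to_fmol_per_ul(10.0, 100_000)
print(f'Example molarity: {example_molarity:0.1f} fmoles/ul')
```

Writing each unit step on its own line makes it easier to spot a misplaced power of ten.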

In [12]:
# BEGIN SOLUTION NO PROMPT
paragon_fresh_conc = 21.4  # ng/ul
paragon_degraded_conc = 19.3  # ng/ul
# END SOLUTION
""" # BEGIN PROMPT
# Add variables to hold the concentration of each sample
"""; # END PROMPT

# Answer in fmoles/ul
paragon_fresh_molarity = paragon_fresh_conc * 1E-9 / paragon_template_weight / 1E-15 # SOLUTION
paragon_degraded_molarity = paragon_degraded_conc * 1E-9 / paragon_template_weight / 1E-15 # SOLUTION

In [13]:
# If the cell above is correct, this will print the results.
print(f'The fresh Paragon-prepped sample has a concentration of {paragon_fresh_molarity:0.1f} fmoles/ul')
print(f'The degraded Paragon-prepped sample has a concentration of {paragon_degraded_molarity:0.1f} fmoles/ul')

The fresh Paragon-prepped sample has a concentration of 115.5 fmoles/ul
The degraded Paragon-prepped sample has a concentration of 104.2 fmoles/ul


In [14]:
print('Is paragon_fresh_molarity a float:', isinstance(paragon_fresh_molarity, float))

Is paragon_fresh_molarity a float: True


In [15]:
print('Is paragon_degraded_molarity a float:', isinstance(paragon_degraded_molarity, float))

Is paragon_degraded_molarity a float: True


In [16]:
print('Is paragon_fresh_molarity reasonable [50, 500]:',
      (paragon_fresh_molarity > 50) & (paragon_fresh_molarity < 500))

Is paragon_fresh_molarity reasonable [50, 500]: True


In [17]:
print('Is paragon_degraded_molarity reasonable [50, 500]:',
      (paragon_degraded_molarity > 50) & (paragon_degraded_molarity < 500))

Is paragon_degraded_molarity reasonable [50, 500]: True


In [18]:
# HIDDEN
print(f'paragon_fresh_molarity = {paragon_fresh_molarity:0.1f}')

paragon_fresh_molarity = 115.5


In [19]:
# HIDDEN
print(f'paragon_degraded_molarity = {paragon_degraded_molarity:0.1f}')

paragon_degraded_molarity = 104.2


## Q4: What is the yield of each _PacBio_ sample?

Now, let's calculate the yield of the fresh and degraded PacBio samples.
The yield is the total amount of DNA produced in the reaction (as measured in femtomoles).
This information is important because it allows us to compare the efficiency of the PacBio system in both fresh and degraded samples and determine whether it is a suitable method for our purposes.
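
Because molarity is an amount per unit volume, the yield is the molarity multiplied by the final volume. The sketch below uses a hypothetical molarity, not one of the lab results, together with the 15 ul final volume stated above; the variable names are illustrative.

```python
# Hypothetical molarity, used only to show the relationship.
example_molarity = 20.0  # fmoles/ul
final_volume = 15  # ul, the final volume that was quantified

# (fmoles/ul) * ul leaves fmoles
example_yield = example_molarity * final_volume  # fmoles
print(f'Example yield: {example_yield:0.1f} fmoles')
```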

In [20]:
# BEGIN SOLUTION NO PROMPT
pacbio_fresh_conc = 38.1 # ng/ul
pacbio_degraded_conc = 7.4  # ng/ul
# END SOLUTION
""" # BEGIN PROMPT
# Add variables to hold the concentration of each sample in ng/ul
"""; # END PROMPT

In [21]:
# Calculate the molarity of each sample

pacbio_fresh_molarity = pacbio_fresh_conc * 1E-9 / pacbio_template_weight / 1E-15 # SOLUTION
pacbio_degraded_molarity = pacbio_degraded_conc * 1E-9 / pacbio_template_weight / 1E-15 # SOLUTION

In [22]:
print(f'The fresh PacBio-prepped sample had a molarity of {pacbio_fresh_molarity:0.1f} fmoles/ul')
print(f'The degraded PacBio-prepped sample had a molarity of {pacbio_degraded_molarity:0.1f} fmoles/ul')

The fresh PacBio-prepped sample had a molarity of 26.6 fmoles/ul
The degraded PacBio-prepped sample had a molarity of 5.2 fmoles/ul


In [23]:
# Calculate the total yield

pacbio_fresh_yield = pacbio_fresh_molarity*15  # SOLUTION
pacbio_degraded_yield = pacbio_degraded_molarity*15  # SOLUTION

In [24]:
print(f'The fresh PacBio-prepped sample had a yield of {pacbio_fresh_yield:0.1f} fmoles')
print(f'The degraded PacBio-prepped sample had a yield of {pacbio_degraded_yield:0.1f} fmoles')

The fresh PacBio-prepped sample had a yield of 399.7 fmoles
The degraded PacBio-prepped sample had a yield of 77.6 fmoles


In [25]:
print('Is pacbio_fresh_yield a float:', isinstance(pacbio_fresh_yield, float))

Is pacbio_fresh_yield a float: True


In [26]:
print('Is pacbio_degraded_yield a float:', isinstance(pacbio_degraded_yield, float))

Is pacbio_degraded_yield a float: True


In [27]:
print('Is pacbio_fresh_yield reasonable [50, 500]:',
      (pacbio_fresh_yield > 50) & (pacbio_fresh_yield < 500))

Is pacbio_fresh_yield reasonable [50, 500]: True


In [28]:
print('Is pacbio_degraded_yield reasonable [50, 500]:',
      (pacbio_degraded_yield > 50) & (pacbio_degraded_yield < 500))

Is pacbio_degraded_yield reasonable [50, 500]: True


In [29]:
print(f'pacbio_fresh_molarity = {pacbio_fresh_molarity:0.1f}')

pacbio_fresh_molarity = 26.6


In [30]:
print(f'pacbio_degraded_molarity = {pacbio_degraded_molarity:0.1f}')

pacbio_degraded_molarity = 5.2


In [31]:
print(f'pacbio_fresh_yield = {pacbio_fresh_yield:0.1f}')

pacbio_fresh_yield = 399.7


In [32]:
print(f'pacbio_degraded_yield = {pacbio_degraded_yield:0.1f}')

pacbio_degraded_yield = 77.6


## Q5: Which samples are _usable_?

To determine which samples are suitable for the sequencing protocol, you must first check whether each one contains DNA at a sufficient concentration.
As stated in the protocol, you need to provide 200 fmoles of DNA in 10 ul of dH2O.
Based on the molarities and yields that you calculated for the fresh and degraded Paragon and PacBio samples, determine which samples are concentrated enough to meet this requirement.
In other words, a sample is usable only if 200 fmoles of its DNA fits within the 10 ul allowed by the protocol.
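
One way to frame this check is to compute the volume that holds the target amount of DNA and compare it with the volume limit. The sketch below is an assumption about how to organize the logic, not the required solution: the function names `volume_needed` and `is_usable`, and the example molarities, are hypothetical.

```python
def volume_needed(target_fmoles, molarity):
    """Return the volume (ul) that contains target_fmoles at the given molarity (fmoles/ul)."""
    return target_fmoles / molarity


def is_usable(molarity, target_fmoles=200, max_volume=10):
    """A sample is usable if the target amount fits within max_volume ul."""
    return volume_needed(target_fmoles, molarity) <= max_volume


# Hypothetical molarities, not the lab results
concentrated = 40.0  # fmoles/ul
dilute = 10.0  # fmoles/ul

print(f'Concentrated: {volume_needed(200, concentrated):0.1f} ul needed, usable: {is_usable(concentrated)}')
print(f'Dilute: {volume_needed(200, dilute):0.1f} ul needed, usable: {is_usable(dilute)}')
```

A sample that needs more than 10 ul to supply 200 fmoles is too dilute for the protocol.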

|               |    |
| --------------|----|
| Points        | 6  |
| Public Checks | 1  |
| Hidden Tests  | 4  |

In [35]:
# BEGIN SOLUTION NO PROMPT

# Do the necessary calculations in this cell or create more cells above.
# Remember, you can use variables that you have calculated in cells before.

wanted_dna = 200  # fmoles

pfa = wanted_dna/paragon_fresh_molarity
pda = wanted_dna/paragon_degraded_molarity

bfa = wanted_dna/pacbio_fresh_molarity
bda = wanted_dna/pacbio_degraded_molarity

print(f'Paragon Fresh Required: {pfa:0.2f} ul', f'Paragon Degraded Required: {pda:0.2f} ul')
print(f'PacBio Fresh Required: {bfa:0.2f} ul', f'PacBio Degraded Required: {bda:0.2f} ul')

# END SOLUTION
""" # BEGIN PROMPT
# Do the necessary calculations in this cell or create more cells above.
# Remember, you can use variables that you have calculated in cells before.
# Finally, write a set of formatted print statements that describe the volume of DNA required for each reaction.
"""; # END PROMPT

Paragon Fresh Required: 1.73 ul Paragon Degraded Required: 1.92 ul
PacBio Fresh Required: 7.51 ul PacBio Degraded Required: 38.65 ul


In [37]:
# put a 'yes' or 'no' for each sample

paragon_fresh_usable = 'yes'  # SOLUTION
paragon_degraded_usable = 'yes'  # SOLUTION

pacbio_fresh_usable = 'yes'  # SOLUTION
pacbio_degraded_usable = 'no'  # SOLUTION

In [38]:
choices = {'yes', 'no'}
names = ['paragon_fresh_usable', 'paragon_degraded_usable', 
          'pacbio_fresh_usable', 'pacbio_degraded_usable']
answers = [paragon_fresh_usable, paragon_degraded_usable, 
          pacbio_fresh_usable, pacbio_degraded_usable]
for name, ans in zip(names, answers):
    print(f'Is {name} a str:', isinstance(ans, str))
    print(f'Is {name} "yes" or "no":', ans in choices)

Is paragon_fresh_usable a str: True
Is paragon_fresh_usable "yes" or "no": True
Is paragon_degraded_usable a str: True
Is paragon_degraded_usable "yes" or "no": True
Is pacbio_fresh_usable a str: True
Is pacbio_fresh_usable "yes" or "no": True
Is pacbio_degraded_usable a str: True
Is pacbio_degraded_usable "yes" or "no": True


In [39]:
# HIDDEN
print(f'paragon_fresh_usable = "{paragon_fresh_usable.lower()}"')

paragon_fresh_usable = "yes"


In [40]:
# HIDDEN
print(f'paragon_degraded_usable = "{paragon_degraded_usable.lower()}"')

paragon_degraded_usable = "yes"


In [41]:
# HIDDEN
print(f'pacbio_fresh_usable = "{pacbio_fresh_usable.lower()}"')

pacbio_fresh_usable = "yes"


In [42]:
# HIDDEN
print(f'pacbio_degraded_usable = "{pacbio_degraded_usable.lower()}"')

pacbio_degraded_usable = "no"


# Conclusion

Congratulations on completing this lab! You have successfully compared the yield of two different RT-PCR methods for preparing viral samples for genomic sequencing.
You have determined that the Paragon system is ideal for degraded samples due to its shorter PCR fragments, while the PacBio system is better for fresher samples and is able to produce longer sequencing fragments.
This is just the beginning of your journey with Python and data analysis.
Keep up the great work as we continue to expand on these techniques and explore more complicated results in future assignments.

--------------------------------------------

## Submission

Check:
 - That all tables and graphs are rendered properly.
 - Code completes without errors by using `Restart & Run All`.
 - All checks **pass**.
 - Excess code cells and print statements have been removed to create a _clean_ submission.

Remember, as this is a lab, there are hidden tests that you will be evaluated against.
Just because all checks pass does not mean everything is correct.
Double-check your work!

Then save the notebook and download it using `File` -> `Download` -> `Download .ipynb`. Upload this file to BBLearn.