![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fhackathon&branch=master&subPath=ColonizingMars/ChallengeTemplates/challenge-option-1-should-we-colonize-Mars.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

In [126]:
# remove the '#' below and run this line if you are having an error that says "no module named Bio"
#!pip install biopython

# 🚀 Spacer's Challenge : Using Data and Biology to Decide What We Should Grow On Mars


## 📌 Mission Statement

You are a top scientist sent from Earth on the most important mission of all: ensuring our survival on other planets. Before sending you off to space, your team gave you a variety of crop seeds. However, there's a problem! The bag containing the seeds split open and spilled everywhere during your journey! You picked up the ones you could find, but you now have no idea what they grow or what is special about them.

Now on Mars, it is your job to [decide which crops to grow](https://www.sciencealert.com/growing-potatoes-on-mars-might-actually-work-hints-a-new-experiment). We will use some basic bioinformatics concepts to work this out. To help you, you will work with ***Crop-3PO***, a special robot helper who will write code for you.
![](http://biowyse.eu/wp-content/uploads/2016/05/agrospace_01.jpg)


**🤖 Crop-3PO:** Welcome! The people of Mars are counting on you! To load and explore our data, we need to import a special library by typing `import pandas`. We'll also need a few other libraries.

**🤖 Crop-3PO:** I wrote it down in the code cell below. However, because I'm just a robot, I can't execute the code! I'll need your help to do that. The easiest way is to click `Cell > Run All`. If you don't see anything or think something's broken, you can click `Run` in the toolbar above while you have the code cell selected.


In [1]:
# ❗️ Run this cell!
# import libraries
import pandas 
import plotly.express as px
import plotly.graph_objects as go
from Bio import SeqIO

In [6]:
# ❗️ Run this cell!
# import data
cropdata = pandas.read_csv("./data/cropdatafromearth.csv")
traitdata = pandas.read_csv("./data/isaaa_croptraits.csv")
GMtraitsdata = pandas.read_csv("./data/traitsofGMcrops.csv")
approvaldata = pandas.read_csv("./data/foodapprovals.csv")

## 🌽 Before we get started, what even is a GMO? 

*According to [gmotesting.com](http://www.gmotesting.com/GMOs/What-is-a-GMO):*

> **GMO** is an acronym for genetically modified organism and is commonly used to refer to genetically altered crops that are grown in many areas around the globe.  Genetic engineering provides the ability to confer desired traits on plants such as herbicide tolerance and/or virus or insect resistance.  

> The process of genetic modification involves splicing, or cutting, genes from one organism, such as a bacterium, virus, or animal, and inserting them into a recipient organism, such as a plant, so that the recipient is now able to express new traits provided by the donor genes.  The genetic material (commonly called a transgene) is inserted into the nucleus of a plant cell where it integrates into the plant DNA.  If integration of the DNA is successful, the plant cell, now described as a transgenic cell, divides and grows into a genetically modified (or transgenic) plant.  The genetic modification is permanent and will be passed on to the seeds of the transgenic (GMO) plant.

### **🤖 Crop-3PO:** GMOs have grown in popularity internationally over the last 30 years as our scientific methods have become more reliable. Take a look at these charts, which show which countries started adopting GMOs earlier than others.

In [124]:
# ❗️ Run this cell!
# Display a bar chart of the number of approved GMOs by country (1992–2003)
fig = px.bar(approvaldata, x="Country", y = '1992–2003', title="Number of Approved GMOs between 1992-2003")
fig.update_yaxes(range=[0, 400])
fig.show()

In [125]:
# ❗️ Run this cell!
# Display a bar chart of the number of approved GMOs by country (2004–2014)
fig = px.bar(approvaldata, x="Country", y = '2004–2014', title="Number of Approved GMOs between 2004-2014")
fig.show()

### 📚 Look at the trends of approved GMOs by countries over the last three decades. What differences do you see? List some reasons why this might be.


Fill in the boxes marked with ✏️ by double-clicking each one and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

## 🌽 How are GMOs made?

![](https://i1.wp.com/sitn.hms.harvard.edu/wp-content/uploads/2015/08/Untitled2.png?w=900)
Image from: http://sitn.hms.harvard.edu/flash/2015/how-to-make-a-gmo/

## 🌽  What are plant traits? How are traits selected for?

Plant traits are phenotypes (the observable expression of a gene, and its functional use), and the desirable ones can be almost anything as long as they are useful. Century after century, farmers and botanists have selected plants with desirable traits (sweet taste, size, etc.) and crossbred them over time to produce more of that crop. With modern technology, however, we have identified various genes that result in different plant traits. Some traits are not the addition or enhancement of something but a suppression (for instance, delayed ripening). Instead of breeding over many generations, we can insert the gene directly and get more direct and reliable results.

To learn more about the science of how this is actually done, read here: https://www.nationalgeographic.com/environment/future-of-food/food-technology-gene-editing/

In [4]:
# ❗️ Run this cell!
# Display a histogram breaking down the GMO varieties in most popular crops
fig = px.histogram(cropdata, x="Crop", color="Variety", title = "Breakdown of GMO Varieties in Top 90% of Crops")
fig.show()

## 🌽 How are plant genes and proteins related to traits? 


Genes are actually only one part of the story. A gene becomes useful when the cell uses it as the instructions for building a protein (the gene is transcribed into RNA, which is then translated into a protein). This protein then goes on to have different functions based on its physical properties.
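
To make the gene-to-protein step concrete, here is a minimal sketch that turns a DNA sequence into a chain of amino acids using the standard genetic code. It uses plain Python only. The names `CODON_TABLE` and `translate` are illustrative and are not part of this notebook's libraries, and the sketch skips the intermediate RNA step and assumes the sequence is read from its first base.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acid codes ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore leftover bases at the end
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 21 bases of the made-up cry1Ab_1 sequence used later in this notebook.
print(translate("GACCCCACCAACCCAGCCCTG"))
```

Because the sequences in this notebook are made up, the resulting amino acid chain is only an illustration of the mechanism, not a real protein.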

*From [The University of Kentucky's College of Agriculture:](https://entomology.ca.uky.edu/ef130)* 
> To transform a plant into a GMO plant, the gene that produces a genetic trait of interest is identified and separated from the rest of the genetic material from a donor organism. Most organisms have thousands of genes, a single gene represents only a tiny fraction of the total genetic makeup of an organism. 

>A donor organism may be a bacterium, fungus or even another plant. In the case of Bt corn, the donor organism is a naturally occurring soil bacterium, Bacillus thuringiensis, and the gene of interest produces a protein that kills Lepidoptera larvae, in particular, European corn borer. This protein is called the Bt delta endotoxin. Growers use Bt corn as an alternative to spraying insecticides for control of European and southwestern corn borer. 

In [102]:
# ❗️ Run this cell!
# histogram of different genes associated with traits
fig = px.histogram(traitdata, x="Trait", title="Number of genes associated with traits", 
                   labels={'Trait':'Phenotypic Trait'}).update_xaxes(categoryorder="total descending")
fig.show()

### 📚 Look at the distribution of traits in the above two graphs. Which traits are the most popular? Why do you think that is?  


Fill in the boxes marked with ✏️ by double-clicking each one and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

### **🤖 Crop-3PO:** As you can see below, there are many genes associated with traits! 

In [113]:
# ❗️ Run this cell!
# View the ENTIRE data frame
pandas.set_option('display.max_rows', None)
traitdata

Unnamed: 0,Trait,Gene,Product,Source,Function
0,"2,4-D herbicide tolerance",aad-12,Delftia acidovorans,aryloxyalkanoate di-oxygenase 12 (AAD-12) protein,"catalyzes the side chain degradation of 2,4-D herbicide"
1,Altered lignin production,ccomt (inverted repeat),Medicago sativa (alfalfa),dsRNA that suppresses endogenous S-adenosyl-L-methionine: trans-caffeoyl CoA 3-O-methyltransferase (CCOMT gene) RNA transcript levels via the RNA interference (RNAi) pathway,reduces content of guaiacyl (G) lignin
2,Altered lignin production,EgCAld5H,Eucalyptus grandis,CAld5H enzyme,regulates the syringyl monolignol pathway;
3,Anti-allergy,7crp,synthetic form of tolerogenic protein from Cryptomeria japonica,modified cry j 1 and cry j 2 pollen antigens containing seven major human T cell epitopes,triggers mucosal immune tolerance to cedar pollen allergens
4,Antibiotic resistance,aad,Escherichia coli,3''(9)-O-aminoglycoside adenylyltransferase enzyme,allows selection for resistance to aminoglycoside antibiotics such as spectinomycin and streptomycin
5,Antibiotic resistance,aph4 (hpt),Escherichia coli,hygromycin-B phosphotransferase (hph) enzyme,allows selection for resistance to the antibiotic hygromycin B
6,Antibiotic resistance,bla,Escherichia coli,beta lactamase enzyme,detoxifies beta lactam antibiotics such as ampicillin
7,Antibiotic resistance,hph,Streptomyces sp.,hygromycin phosphotransferase,allows selection for resistance to the antibiotic hygromycin B
8,Antibiotic resistance,nptII,Escherichia coli Tn5 transposon,neomycin phosphotransferase II enzyme,allows transformed plants to metabolize neomycin and kanamycin antibiotics during selection
9,Antibiotic resistance,spc,Escherichia coli,spectinomycin adenyl transferase enzyme (not expressed in plant tissues),"confers resistance to spectinomycin/streptomycin antibiotics, which permits prokaryotic selection"


# 🧬 💻  Bioinformatics Hands-on portion

Now that you understand a little more about what GMOs are, it's time to find out what our seeds are. Luckily, you have a sequencing machine that can tell you the DNA sequence of a seed. Crop-3PO also has a list of genes, but only knows a few of their DNA sequences. (Note that these sequences are made up, and that actual DNA sequences are much longer!)

You sort your seeds into 10 groups based on colour, size, and hardness, and you believe the seeds within each group are the same crop. Each group has only a few seeds. To avoid wasting seeds, you tell Crop-3PO to sequence only one seed from each group. You call these your samples.

### **🤖 Crop-3PO:** I know lots of genes and sequences! Here is an example of what `cry1Ab` looks like:

In [7]:
# ❗️ Run this cell!
# Here, we can see the gene sequence stored under the ID cry1Ab_1
genes_dict = SeqIO.to_dict(SeqIO.parse("./data/genes.fa", "fasta"))
print(genes_dict["cry1Ab_1"].format("fasta"))

>cry1Ab_1
GACCCCACCAACCCAGCCCTGCGCGAGGAGATGCGCATCCAGTTCAACGACATGAACTCT
GCCCTGACCACCGCCATCCCACTCTTCGCTGTCCAGAACTACCAGGTCCCTCTCCTGTCT
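
The output above is in FASTA format: a header line starting with `>` gives the record name, and the lines after it hold the sequence. As a rough sketch of what `SeqIO.parse` does for us, the hypothetical `parse_fasta` helper below reads that layout with plain Python. It is a simplification for illustration, not Biopython's actual implementation.

```python
def parse_fasta(text: str) -> dict:
    """Map each record name to its full sequence (sequence lines joined together)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]  # the ID is the first word after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {record: "".join(parts) for record, parts in records.items()}


fasta_text = """>cry1Ab_1
GACCCCACCAACCCAGCCCTGCGCGAGGAGATGCGCATCCAGTTCAACGACATGAACTCT
GCCCTGACCACCGCCATCCCACTCTTCGCTGTCCAGAACTACCAGGTCCCTCTCCTGTCT
"""

genes = parse_fasta(fasta_text)
print(len(genes["cry1Ab_1"]), "bases in cry1Ab_1")
```

Joining the wrapped lines matters: the gene is one continuous sequence, and the line breaks are only there to keep the file readable.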



### **🤖 Crop-3PO:** Here are all the genes I know, although for some I only know part of the sequence. If a name here looks similar to a gene name in the trait table, you can treat them as the same gene. For instance, `tNOS` can be considered `nos` for our purposes.

In [23]:
# ❗️ Run this cell!
# it will also be helpful for us to get a list of all the genes we have
geneslist = list(genes_dict.keys())
geneslist

['cry1Ab_1', 'tNOS', 'cry1Ac', 'Cry2Ab2', 'Epsps_1', 'Prsv_cp_1', 'CEL1', 'bar', 'pat']

### **🤖 Crop-3PO:** Here is a list of all the samples I successfully sequenced!

In [24]:
# ❗️ Run this cell!
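# load the samples first, so this cell also works when the notebook is run top to bottom
samples_dict = SeqIO.to_dict(SeqIO.parse("./data/samples.fa", "fasta"))
# list of our total samples
sampleslist = list(samples_dict.keys())
sampleslist

['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6', 'sample7', 'sample8', 'sample9', 'sample10']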

### **🤖 Crop-3PO:** Here's an example of what the sequence for Sample 1 looks like.

In [8]:
# ❗️ Run this cell!
# let's look at some of the information in our first sample
samples_dict = SeqIO.to_dict(SeqIO.parse("./data/samples.fa", "fasta"))
print(samples_dict["sample1"].format("fasta"))

>sample1
GTCTGTGGTTGCTGTTATAGGCCTTCCAAACGATCCATCTGTTAGGTTGCATGAGGCTTT
GGGATACACAGCCCGGGGTACATTGCGCGCAGCTGGATACAAGCATGGTGGATGGCATGA
CCTTTGGGTCACGATCTCCCACCTTACTGGAATTTAGTCCCTGCTATAATTTGCCTTGCA
TATAAGTTGCGTTACTTCAGCGTCCTAACCGCACCCTTAGCACGAAGACAGATTTGTTCA
TTCCCATACTCCGGCGTTGGCAGGGGGTTCGCATGTCCCACGTGAAACGTTGCTAAACCC
TCAGGTTTCTGAGCGACAAAAGCTTTAAACGGGAGTTCGCGCTCATAACTTGGTCCGAAT
GCGGGTTCTTGCATCGTTCGACTGAGTTTGTTTCATGTAGAACGGGCGCAAAGTATACTT
AGTTCAATCTTCAATACCTCGTATCATTGTACACCTGCCGGTCACCACCCGACCCCACCA
ACCCAGCCCTGCGCGAGGAGATGCGCATCCAGTTCAACGACATGAACTCTGCCCTGACCA
CCGCCATCCCACTCTTCGCTGTCCAGAACTACCAGGTCCCTCTCCTGTCTAACGATGTGG
GGACGGCGTTGCAACTTCGAGGACCTAATCTGACCGACCTAGATTCGGCACTGTGGGCAA
TATGAGGTATTGGCAGACACCCAGTGCCGAACAACACCTGACCTAACGGTAAGAGAGTCT
CATAATGCGTCCGGCCGCGTGCCCAGGGTATATTTGGACAGTATCGAATGGACTGAGATG
AACCTTTACACCGATCCGGAAACGGGTGCGTGGATTAGCCAGGAGCAAACGAAAAATCCT
GGGCTACTTGATGTCTTGTGACGTTCTTAGAGATGGACGAAATGTTTCACGACCTAGGAT
AAGGTCGCCCTACAAAATAGGCCAAGACTTTCTTTAGTCCGCTGATGGGACACTATATGA
AAAGCGTTCTAAGCA

### **🤖 Crop-3PO:** That's probably pretty difficult for a human to read. Luckily, as a machine I can very quickly match the sequences of our genes and tell you whether a sample is likely to have the gene or not. Let me try doing this, and you can see my output down below.

In [98]:
# ❗️ Run this cell!
# now we're going to match the genes to each sample
x = []  # sample names
y = []  # sample sequences
for sample in samples_dict:
    x.append(samples_dict[sample].name)
    y.append(samples_dict[sample].seq)
    temp2 = []  # genes found in this sample
    for gene in genes_dict:
        temp1 = []
        # exact match: the whole gene sequence must appear inside the sample sequence
        if genes_dict[gene].seq in samples_dict[sample].seq:
            temp1.append(genes_dict[gene].name)
            temp2.append(temp1)
    print("🔬\033[1;36m", samples_dict[sample].name, "\033[1;0m contains matches for \033[1;32m", temp2, "\033[1;0m")

🔬 sample1 contains matches for [['cry1Ab_1'], ['Prsv_cp_1'], ['bar'], ['pat']]
🔬 sample2 contains matches for [['CEL1'], ['pat']]
🔬 sample3 contains matches for [['cry1Ab_1'], ['Prsv_cp_1'], ['CEL1']]
🔬 sample4 contains matches for [['cry1Ab_1']]
🔬 sample5 contains matches for [['Cry2Ab2'], ['Epsps_1']]
🔬 sample6 contains matches for [['Cry2Ab2'], ['Epsps_1']]
🔬 sample7 contains matches for [['cry1Ab_1'], ['cry1Ac']]
🔬 sample8 contains matches for [['tNOS'], ['Epsps_1'], ['pat']]
🔬 sample9 contains matches for [['tNOS'], ['bar'], ['pat']]
🔬 sample10 contains matches for [['cry1Ac'], ['Prsv_cp_1'], ['bar']]
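
The matching above is an exact test: a gene counts as found only if its whole sequence appears, letter for letter, somewhere in the sample. The sketch below shows the same idea in plain Python and also reports where the gene sits. It then adds a more forgiving version that tolerates a few changed bases, which is an extension beyond what Crop-3PO does here. Both function names and the toy sequences are made up for illustration.

```python
def find_exact(gene: str, sample: str) -> int:
    """Return the 0-based start of gene in sample, or -1 if it is absent."""
    return sample.find(gene)


def find_with_mismatches(gene: str, sample: str, max_mismatches: int = 1) -> int:
    """Return the first start position where gene lines up with at most
    max_mismatches changed bases, or -1 if there is no such position."""
    for start in range(len(sample) - len(gene) + 1):
        window = sample[start:start + len(gene)]
        mismatches = sum(a != b for a, b in zip(gene, window))
        if mismatches <= max_mismatches:
            return start
    return -1


# Toy sequences, much shorter than real reads.
gene = "GACCCCACCAAC"
sample = "GTCACCACCC" + gene + "AACGATGTGG"
mutated = "GTCACCACCC" + "GACCCCTCCAAC" + "AACGATGTGG"  # one base changed

print("exact match in sample starts at:", find_exact(gene, sample))
print("exact match in mutated sample:", find_exact(gene, mutated))
print("match allowing one mismatch:", find_with_mismatches(gene, mutated))
```

A single changed base is enough to make the exact test fail, which is why a match from the exact method is strong evidence that the gene is present, while a miss does not prove it is absent.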


###  **🤖 Crop-3PO:** On a larger scale, if we had sequence matches for every single gene, we would likely see something like this:
![](./images/genome.png)

Image obtained from: https://www.nature.com/articles/s41598-019-51668-x/figures/3

### **🤖 Crop-3PO:** Pretty cool trick, huh? Now all that's left to do is figure out which samples are useful for us. You'll want to look at the genes in each sample, think about which traits they're associated with, and decide whether those traits are useful to have here on Mars. I've listed the traits for our genes below.

In [12]:
# trait data narrowed down to our genes
traitdata.loc[traitdata['Gene'].isin(['cry1Ab','nos','cry1Ac','cry2Ab2','epsps (Ag)','prsv_cp','cel1','bar','pat'])]

Unnamed: 0,Trait,Gene,Product,Source,Function
34,Glufosinate herbicide tolerance,bar,Streptomyces hygroscopicus,phosphinothricin N-acetyltransferase (PAT) enzyme,eliminates herbicidal activity of glufosinate ...
35,Glufosinate herbicide tolerance,pat,Streptomyces viridochromogenes,phosphinothricin N-acetyltransferase (PAT) enzyme,eliminates herbicidal activity of glufosinate ...
39,Glyphosate herbicide tolerance,epsps (Ag),Arthrobacter globiformis,5-enolpyruvylshikimate-3-phosphate-synthase en...,confers tolerance to glyphosate herbicides
53,Lepidopteran insect resistance,cry1Ab,Bacillus thuringiensis subsp. kurstaki,Cry1Ab delta-endotoxin,confers resistance to lepidopteran insects by ...
56,Lepidopteran insect resistance,cry1Ac,Bacillus thuringiensis subsp. Kurstaki strain ...,Cry1Ac delta-endotoxin,confers resistance to lepidopteran insects by ...
60,Lepidopteran insect resistance,cry2Ab2,Bacillus thuringiensis subsp. kumamotoensis,Cry2Ab delta-endotoxin,confers resistance to lepidopteran insects by ...
121,Nopaline synthesis,nos,Agrobacterium tumefaciens strain CP4,nopaline synthase enzyme,"catalyses the synthesis of nopaline, which per..."
137,Viral disease resistance,prsv_cp,Papaya ringspot virus (PRSV),coat protein (CP) of the papaya ringspot virus...,confers resistance to papaya ringspot virus (P...
144,Volumetric Wood Increase,cel1,Arabidopsis thaliana,CEL1 recombinant protein,promotes a faster growth
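
To go from matched sequence IDs to traits, one option is a pair of lookups: first from the sequence ID to the gene name used in the trait table, then from the gene name to its trait. The sketch below uses plain Python dictionaries. The ID-to-gene pairing is an assumption based on name similarity (as Crop-3PO suggested for `tNOS` and `nos`), the gene-to-trait entries are copied from the table above, and `traits_for` is a hypothetical helper name.

```python
# Sequence ID -> gene name in the trait table (assumed pairing by name similarity).
ID_TO_GENE = {
    "cry1Ab_1": "cry1Ab",
    "tNOS": "nos",
    "cry1Ac": "cry1Ac",
    "Cry2Ab2": "cry2Ab2",
    "Epsps_1": "epsps (Ag)",
    "Prsv_cp_1": "prsv_cp",
    "CEL1": "cel1",
    "bar": "bar",
    "pat": "pat",
}

# Gene name -> trait, copied from the table above.
GENE_TO_TRAIT = {
    "bar": "Glufosinate herbicide tolerance",
    "pat": "Glufosinate herbicide tolerance",
    "epsps (Ag)": "Glyphosate herbicide tolerance",
    "cry1Ab": "Lepidopteran insect resistance",
    "cry1Ac": "Lepidopteran insect resistance",
    "cry2Ab2": "Lepidopteran insect resistance",
    "nos": "Nopaline synthesis",
    "prsv_cp": "Viral disease resistance",
    "cel1": "Volumetric Wood Increase",
}


def traits_for(sequence_ids):
    """Return the sorted, de-duplicated traits for a list of matched sequence IDs."""
    return sorted({GENE_TO_TRAIT[ID_TO_GENE[seq_id]] for seq_id in sequence_ids})


# sample4 matched only cry1Ab_1 in the output above.
print(traits_for(["cry1Ab_1"]))
```

Using a set removes duplicates, which matters when two different genes (such as `bar` and `pat`) give the same trait.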


### 📚 Based on the information available to us, what traits does each sample have?

Fill in the boxes marked with ✏️ by double-clicking each one and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

### 📚 Assuming the samples are all the same plant (e.g. all are corn), which two samples would you try to cross first to achieve "stacked" traits?

### A crop with stacked traits carries two or more engineered traits at once (for example, insect resistance together with herbicide tolerance).
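
One way to reason about a cross is to treat each plant's traits as a set and combine them. The sketch below does this for three hypothetical plants (they are not the notebook's samples) and assumes an idealized cross in which the offspring keeps every trait from both parents, which real breeding does not guarantee. The names `plant_traits`, `stacked_traits`, and `best_cross` are made up for illustration.

```python
from itertools import combinations

# Hypothetical plants and traits, for illustration only.
plant_traits = {
    "plant_A": {"insect resistance"},
    "plant_B": {"herbicide tolerance", "insect resistance"},
    "plant_C": {"virus resistance"},
}


def stacked_traits(first: str, second: str) -> set:
    """Traits of an idealized cross: the union of both parents' traits."""
    return plant_traits[first] | plant_traits[second]


def best_cross() -> tuple:
    """Return the pair of plants whose idealized cross has the most distinct traits."""
    pairs = combinations(sorted(plant_traits), 2)
    return max(pairs, key=lambda pair: len(stacked_traits(*pair)))


pair = best_cross()
print(pair, "->", sorted(stacked_traits(*pair)))
```

Counting traits is only a starting point. On Mars, some traits may matter far more than others, so the pair with the most traits is not automatically the most useful one.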

Fill in the boxes marked with ✏️ by double-clicking each one and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

## 📌 References 
#### The data used in this notebook was obtained from the following sources:
 
Rhodora R Aldemita, Ian Mari E Reaño, Renando O Solis & Randy A Hautea (2015) Trends in global approvals of biotech crops (1992–2014), GM Crops & Food, 6:3, 150-166, DOI: 10.1080/21645698.2015.1056972 

Debode, F., Hulin, J., Charloteaux, B. et al. Detection and identification of transgenic events by next generation sequencing combined with enrichment technologies. Sci Rep 9, 15595 (2019). https://doi.org/10.1038/s41598-019-51668-x

ISAAA's GM Approval Database. http://www.isaaa.org/gmapprovaldatabase/

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)