![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fhackathon&branch=master&subPath=ColonizingMars/ChallengeTemplates/challenge-option-1-should-we-colonize-Mars.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# 🚀 Spacer's Challenge: Using Data and Biology to Decide What We Should Grow On Mars


## 📌 Mission Statement

You are a top scientist sent from Earth tasked with the most important mission of all: ensuring our survival on other planets. Before sending you off to space, your team gave you a variety of crop seeds and a data set containing information on them, including some biological traits that were designed to give them an edge in this new environment. 

Now on Mars, it is your job to use this data to [decide which crops to grow](https://www.sciencealert.com/growing-potatoes-on-mars-might-actually-work-hints-a-new-experiment). As part of this mission, you will need to convince the other astronauts by explaining your reasoning using computational thinking and data science skills. 

![](http://biowyse.eu/wp-content/uploads/2016/05/agrospace_01.jpg)

## 📌 Obtaining the Crop Data

### OBJECTIVE

Our first mission objective is to access this data. To make sure the data arrived safely, your team placed it in a special device. You plug the device into your computer, but you realize you need a special computer command to open it. On this mission, you will work with a special robot helper, ***Crop-3PO***, who will help write code for you.

**🤖 Crop-3PO:** Welcome! The people of Mars are counting on you! In order to use the special command, we need to import a special library by typing `import pandas`. We'll also need a few other libraries. 

**🤖 Crop-3PO:** I wrote it down in the code cell below. However, because I'm just a robot, I can't execute the code! I'll need your help to do that. Whenever you see `# ❗️ Run this cell!`, click the run icon in the toolbar above while the code cell is selected. Otherwise we will get error messages. Here, I'll start in the cell below!

In [4]:
# Remove the leading "#" and run this cell only if Biopython is not installed yet
#!pip install biopython



In [84]:
# ❗️ Run this cell!
import pandas 
import plotly.express as px
import plotly.graph_objects as go
from Bio import SeqIO

**🤖  Crop-3PO:** Now that we've told the computer we are going to use a special set of commands, I'll grab the data using code and give each table a variable name. Make sure you run the code cells so that it works!

In [85]:
# ❗️ Run this cell!
cropdata = pandas.read_csv("./cropdatafromearth.csv")
traitdata = pandas.read_csv("./isaaa_croptraits.csv")
GMtraitsdata = pandas.read_csv("./traitsofGMcrops.csv")
approvaldata = pandas.read_csv("./foodapprovals.csv")
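
Before plotting, it can help to peek at what a freshly loaded table contains. The sketch below is only an illustration: it builds a tiny made-up table in memory (the column names and rows are invented, not taken from `cropdatafromearth.csv`) and reads it the same way `pandas.read_csv` reads a file.

```python
import io
import pandas

# Hypothetical stand-in for a CSV file on disk; the real files may look different.
fake_csv = io.StringIO(
    "Crop,Trait\n"
    "Maize,Insect resistance\n"
    "Soybean,Herbicide tolerance\n"
    "Papaya,Viral disease resistance\n"
)

demo = pandas.read_csv(fake_csv)

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # the column names
print(demo.head(2))        # the first two rows
```

The same three calls should work on `cropdata`, `traitdata`, `GMtraitsdata`, and `approvaldata` once the cell above has run.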

In [98]:
#explore percent of all corn, soybean, etc planted - viz 


## 📌 GMO Approvals over time

In [105]:
fig = px.bar(approvaldata, x="Country", y = '1992–2003', title="Number of Approved GMOs between 1992-2003")
fig.update_yaxes(range=[0, 400])
fig.show()

In [104]:
fig = px.bar(approvaldata, x="Country", y = '2004–2014', title="Number of Approved GMOs between 2004-2014")
fig.show()
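
Two separate charts can be hard to compare, especially when their y-axes use different ranges. One option, sketched below, is to reshape the table so both periods sit in a single column. The numbers here are invented placeholders, and the sketch assumes `approvaldata` has a `Country` column plus one column per period, as the plotting calls above suggest.

```python
import pandas

# Made-up numbers, only to show the reshaping step.
demo_approvals = pandas.DataFrame({
    "Country": ["A", "B"],
    "1992–2003": [10, 4],
    "2004–2014": [25, 9],
})

# melt() turns the two period columns into (Period, Approvals) pairs.
long_form = demo_approvals.melt(
    id_vars="Country", var_name="Period", value_name="Approvals"
)
print(long_form)

# Change between the two periods for each country.
demo_approvals["Change"] = demo_approvals["2004–2014"] - demo_approvals["1992–2003"]
print(demo_approvals[["Country", "Change"]])
```

A long-form table like `long_form` can be passed to `px.bar` with `color="Period"` and `barmode="group"` to draw both periods side by side on one shared axis.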

# 📚 Look at the trends of approved GMOs by country across the two periods (1992–2003 and 2004–2014). What differences do you see? List some reasons why this might be.


Fill in the boxes marked with ✏️ by double-clicking on them and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

### 🌽 Why is this useful in food supply? 
http://sitn.hms.harvard.edu/flash/2015/how-to-make-a-gmo/

# Genotypes and Phenotypes

### 🌽  What are plant traits? 
https://www.nationalgeographic.com/environment/future-of-food/food-technology-gene-editing/

![](./images/natgeo_geneedit.jpg)

In [79]:
## explore crop traits

In [102]:
# histogram of different genes associated with traits
fig = px.histogram(traitdata, x="Trait", title="Number of genes associated with traits", 
                   labels={'Trait':'Phenotypic Trait'}).update_xaxes(categoryorder="total descending")
fig.show()
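
The histogram counts how many rows (genes) fall under each trait. The same counts can be read as plain numbers with `value_counts`. This sketch uses only a handful of rows, so the counts are not the real totals; it assumes the `Trait` and `Gene` column names used by `traitdata`.

```python
import pandas

# A few example rows; the real table has many more.
demo_traits = pandas.DataFrame({
    "Trait": ["Viral disease resistance", "Viral disease resistance",
              "Visual marker", "Visual marker", "Antibiotic resistance"],
    "Gene": ["wmv_cp", "zymv_cp", "dsRed2", "uidA", "aad"],
})

# Number of genes listed under each trait, largest first.
counts = demo_traits["Trait"].value_counts()
print(counts)
```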

# 📚 Look at the distribution of traits above. Which traits are the most popular? Why do you think that is?  


Fill in the boxes marked with ✏️ by double-clicking on them and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

### 🌽 How are plant genes and proteins related to traits? 

https://www.isaaa.org/gmapprovaldatabase/geneslist/default.asp


In [6]:
# explore genes associated with crop traits

In [88]:
traitdata[:145]

| Unnamed: 0 | Trait | Gene | Product | Source | Function |
| --- | --- | --- | --- | --- | --- |
| 0 | 2,4-D herbicide tolerance | aad-12 | Delftia acidovorans | aryloxyalkanoate di-oxygenase 12 (AAD-12) protein | catalyzes the side chain degradation of 2,4-D ... |
| 1 | Altered lignin production | ccomt (inverted repeat) | Medicago sativa (alfalfa) | dsRNA that suppresses endogenous S-adenosyl-L-... | reduces content of guaiacyl (G) lignin |
| 2 | Altered lignin production | EgCAld5H | Eucalyptus grandis | CAld5H enzyme | regulates the syringyl monolignol pathway; |
| 3 | Anti-allergy | 7crp | synthetic form of tolerogenic protein from Cry... | modified cry j 1 and cry j 2 pollen antigens c... | triggers mucosal immune tolerance to cedar pol... |
| 4 | Antibiotic resistance | aad | Escherichia coli | 3''(9)-O-aminoglycoside adenylyltransferase en... | allows selection for resistance to aminoglycos... |
| ... | ... | ... | ... | ... | ... |
| 140 | Viral disease resistance | wmv_cp | Watermelon Mosaic Potyvirus 2 (WMV2) | coat protein of watermelon mosaic potyvirus 2 ... | confers resistance to watermelon mosaic potyvi... |
| 141 | Viral disease resistance | zymv_cp | Zucchini Yellow Mosaic Potyvirus (ZYMV) | coat protein of zucchini yellow mosaic potyvir... | confers resistance to zucchini yellow mosaic p... |
| 142 | Visual marker | dsRed2 | Discosoma sp. | red fluorescent protein | produces red stain on transformed tissue, whic... |
| 143 | Visual marker | uidA | Escherichia coli | beta-D-glucuronidase (GUS) enzyme | produces blue stain on treated transformed tis... |


# 🧬 💻  Bioinformatics Hands-on portion

Now that you understand a little more about what GMOs are, it's time to find out what our seeds are. Luckily, you have a sequencing machine that can tell you the DNA sequence of a seed. Crop-3PO also has a list of genes, but only knows a few of their DNA sequences.  

You sort your seeds into 10 groups based on colour, size, and hardness, because you believe the seeds within each group are the same. Each group has only a few seeds, so to avoid wasting them you tell Crop-3PO to sequence just one seed from each group. You call these your samples.

### **🤖 Crop-3PO:** I know lots of genes and sequences! Here is an example of what `cry1Ab` looks like:

In [81]:
# Here, we can see the gene sequence stored under the name cry1Ab_1
genes_dict = SeqIO.to_dict(SeqIO.parse("genes.fa", "fasta"))
print(genes_dict["cry1Ab_1"].format("fasta"))

>cry1Ab_1
GACCCCACCAACCCAGCCCTGCGCGAGGAGATGCGCATCCAGTTCAACGACATGAACTCT
GCCCTGACCACCGCCATCCCACTCTTCGCTGTCCAGAACTACCAGGTCCCTCTCCTGTCT
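
The output above is in FASTA format: a header line starting with `>` gives the record's name, and the lines after it hold the sequence. `SeqIO.parse` handles this for us. The sketch below is a simplified, hypothetical stand-in (`read_fasta` is not part of Biopython) that shows the idea on short, made-up records.

```python
def read_fasta(text):
    """Return {name: sequence} from FASTA-formatted text (simplified sketch)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word after '>' is the ID
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequence lines are joined together
    return records

# Invented records, much shorter than real genes.
demo_text = ">geneX\nGACCCC\nACCAAC\n>geneY\nTTGA\n"
demo_records = read_fasta(demo_text)
print(demo_records)
```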



### **🤖 Crop-3PO:** Here are all the genes I know, though for some I only know part of the sequence. If a gene name looks similar to one in the trait table, you can treat them as the same. For instance, `tNOS` can be considered `nos` for our purposes.

In [23]:
# It will also be helpful to get a list of all the genes we have
geneslist = list(genes_dict.keys())
print(geneslist)

['cry1Ab_1', 'tNOS', 'cry1Ac', 'Cry2Ab2', 'Epsps_1', 'Prsv_cp_1', 'CEL1', 'bar', 'pat']
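
The IDs in `genes.fa` do not always match the spelling in the `Gene` column of `traitdata` (for example `tNOS` versus `nos`). One possible way to line them up is a small lookup of aliases plus some tidying. Everything in this sketch is an assumption for illustration: the `to_trait_name` helper, the rule of dropping a trailing `_1`, and the alias table are not part of the original notebook.

```python
import re

# Hand-written aliases; extend as needed. Only tNOS -> nos comes from the text above.
ALIASES = {"tnos": "nos"}

def to_trait_name(gene_id):
    """Guess a lowercase trait-table spelling for a gene ID from genes.fa."""
    name = gene_id.lower()
    name = re.sub(r"_\d+$", "", name)  # drop a trailing _1, _2, ...
    return ALIASES.get(name, name)

gene_ids = ['cry1Ab_1', 'tNOS', 'cry1Ac', 'Cry2Ab2', 'Epsps_1',
            'Prsv_cp_1', 'CEL1', 'bar', 'pat']
for gene_id in gene_ids:
    print(gene_id, "->", to_trait_name(gene_id))
```

Because the helper lowercases everything, the gene names in the trait table would need to be lowercased too before comparing.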

### **🤖 Crop-3PO:** Here is a list of all the samples I successfully sequenced!

In [24]:
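# list of our total samples
# (load the sample sequences first so that samples_dict exists before we use it)
samples_dict = SeqIO.to_dict(SeqIO.parse("samples.fa", "fasta"))
sampleslist = list(samples_dict.keys())
print(sampleslist)

['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6', 'sample7', 'sample8', 'sample9', 'sample10']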

### **🤖 Crop-3PO:** Here is the DNA sequence I read from the first sample:

In [26]:
#let's look at some of the information in our first sample
samples_dict = SeqIO.to_dict(SeqIO.parse("samples.fa", "fasta"))
print(samples_dict["sample1"].format("fasta"))

>sample1
GTCTGTGGTTGCTGTTATAGGCCTTCCAAACGATCCATCTGTTAGGTTGCATGAGGCTTT
GGGATACACAGCCCGGGGTACATTGCGCGCAGCTGGATACAAGCATGGTGGATGGCATGA
CCTTTGGGTCACGATCTCCCACCTTACTGGAATTTAGTCCCTGCTATAATTTGCCTTGCA
TATAAGTTGCGTTACTTCAGCGTCCTAACCGCACCCTTAGCACGAAGACAGATTTGTTCA
TTCCCATACTCCGGCGTTGGCAGGGGGTTCGCATGTCCCACGTGAAACGTTGCTAAACCC
TCAGGTTTCTGAGCGACAAAAGCTTTAAACGGGAGTTCGCGCTCATAACTTGGTCCGAAT
GCGGGTTCTTGCATCGTTCGACTGAGTTTGTTTCATGTAGAACGGGCGCAAAGTATACTT
AGTTCAATCTTCAATACCTCGTATCATTGTACACCTGCCGGTCACCACCCGACCCCACCA
ACCCAGCCCTGCGCGAGGAGATGCGCATCCAGTTCAACGACATGAACTCTGCCCTGACCA
CCGCCATCCCACTCTTCGCTGTCCAGAACTACCAGGTCCCTCTCCTGTCTAACGATGTGG
GGACGGCGTTGCAACTTCGAGGACCTAATCTGACCGACCTAGATTCGGCACTGTGGGCAA
TATGAGGTATTGGCAGACACCCAGTGCCGAACAACACCTGACCTAACGGTAAGAGAGTCT
CATAATGCGTCCGGCCGCGTGCCCAGGGTATATTTGGACAGTATCGAATGGACTGAGATG
AACCTTTACACCGATCCGGAAACGGGTGCGTGGATTAGCCAGGAGCAAACGAAAAATCCT
GGGCTACTTGATGTCTTGTGACGTTCTTAGAGATGGACGAAATGTTTCACGACCTAGGAT
AAGGTCGCCCTACAAAATAGGCCAAGACTTTCTTTAGTCCGCTGATGGGACACTATATGA
AAAGCGTTCTAAGCA

### **🤖 Crop-3PO:** That's probably pretty difficult for a human to read. Luckily, as a machine I can very quickly match the sequences of our genes and tell you whether a sample is likely to have the gene or not. Let me try doing this, and you can see my output down below.

In [98]:
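# now we're going to match the genes to each sample
x = []  # sample names, in the order they were checked
y = []  # sample sequences, in the same order
for sample in samples_dict:
    x.append(samples_dict[sample].name)
    y.append(samples_dict[sample].seq)
    temp2 = []  # every gene found in this sample
    for gene in genes_dict:
        temp1 = []
        # `in` is True only when the whole gene sequence appears exactly inside the sample
        if genes_dict[gene].seq in samples_dict[sample].seq:
            temp1.append(genes_dict[gene].name)
            temp2.append(temp1)
    print("🔬\033[1;36m", samples_dict[sample].name, "\033[1;0m contains matches for \033[1;32m", temp2, "\033[1;0m")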

🔬 sample1 contains matches for [['cry1Ab_1'], ['Prsv_cp_1'], ['bar'], ['pat']]
🔬 sample2 contains matches for [['CEL1'], ['pat']]
🔬 sample3 contains matches for [['cry1Ab_1'], ['Prsv_cp_1'], ['CEL1']]
🔬 sample4 contains matches for [['cry1Ab_1']]
🔬 sample5 contains matches for [['Cry2Ab2'], ['Epsps_1']]
🔬 sample6 contains matches for [['Cry2Ab2'], ['Epsps_1']]
🔬 sample7 contains matches for [['cry1Ab_1'], ['cry1Ac']]
🔬 sample8 contains matches for [['tNOS'], ['Epsps_1'], ['pat']]
🔬 sample9 contains matches for [['tNOS'], ['bar'], ['pat']]
🔬 sample10 contains matches for [['cry1Ac'], ['Prsv_cp_1'], ['bar']]
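
The matching above relies on Python's `in` operator, which checks whether one sequence appears exactly, letter for letter, inside another. Here is a minimal sketch of the same idea on made-up toy sequences, using plain strings instead of Biopython records. It only finds exact matches on the strand as written, so a gene with even one changed letter would be missed.

```python
# Toy data: invented sequences, far shorter than real genes.
toy_genes = {"geneA": "ACGT", "geneB": "GGGCC", "geneC": "TTTT"}
toy_samples = {
    "sampleX": "AAACGTAAGGGCCA",
    "sampleY": "CCCTTTTCC",
}

def find_matches(samples, genes):
    """Map each sample name to the list of gene names found inside it."""
    matches = {}
    for sample_name, sample_seq in samples.items():
        matches[sample_name] = [
            gene_name for gene_name, gene_seq in genes.items()
            if gene_seq in sample_seq  # exact substring test
        ]
    return matches

toy_matches = find_matches(toy_samples, toy_genes)
print(toy_matches)
```

Returning a dictionary rather than printing inside the loop keeps the results available for later steps, such as looking up traits.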


### **🤖 Crop-3PO:** Pretty cool trick, huh? Now all that's left to do is figure out which ones are useful for us. You'll probably want to look at the genes in each sample and think about which traits they're associated with, and whether or not those traits are useful for us to have here on Mars. 

In [90]:
# trait data narrowed down to our genes
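
One way this narrowing step might look is sketched below. It is not the notebook's original code: it uses a tiny invented table with the same `Trait` and `Gene` column names as `traitdata`, and it compares gene names in lowercase so that differences in capitalisation do not hide a match.

```python
import pandas

# Invented rows; the real traitdata has many more rows and columns.
demo_traits = pandas.DataFrame({
    "Trait": ["Trait one", "Trait two", "Trait three"],
    "Gene": ["geneA", "GeneB", "geneC"],
})

our_genes = ["genea", "geneb"]  # placeholder names, already lowercased

# Keep only the rows whose gene name (lowercased) is in our list.
mask = demo_traits["Gene"].str.lower().isin(our_genes)
narrowed = demo_traits[mask]
print(narrowed)
```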

# 📚 Based on the information available to us, what traits does each sample have?

Fill in the boxes marked with ✏️ by double-clicking on them and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

# 📚 Assuming the samples are all the same kind of plant (e.g. all corn), which two samples would you try to cross first to achieve a "stacked" trait? 

### A stacked trait means one plant carries two or more useful traits at the same time.
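
To shortlist candidate crosses, one rough approach is to check which pair of samples would bring together the most distinct genes. The sketch below uses the gene matches from Crop-3PO's output. Treat it as a starting point only: counting genes is an assumption made here for simplicity, since different genes can give the same trait and not every gene is useful on Mars.

```python
from itertools import combinations

# Gene matches copied from Crop-3PO's output above.
sample_genes = {
    "sample1": {"cry1Ab_1", "Prsv_cp_1", "bar", "pat"},
    "sample2": {"CEL1", "pat"},
    "sample3": {"cry1Ab_1", "Prsv_cp_1", "CEL1"},
    "sample4": {"cry1Ab_1"},
    "sample5": {"Cry2Ab2", "Epsps_1"},
    "sample6": {"Cry2Ab2", "Epsps_1"},
    "sample7": {"cry1Ab_1", "cry1Ac"},
    "sample8": {"tNOS", "Epsps_1", "pat"},
    "sample9": {"tNOS", "bar", "pat"},
    "sample10": {"cry1Ac", "Prsv_cp_1", "bar"},
}

def union_size(pair):
    """Number of distinct genes the two samples carry between them."""
    a, b = pair
    return len(sample_genes[a] | sample_genes[b])

pairs = list(combinations(sample_genes, 2))
best = max(union_size(p) for p in pairs)
best_pairs = [p for p in pairs if union_size(p) == best]

print("Most distinct genes in one cross:", best)
for a, b in best_pairs:
    print(a, "x", b, "->", sorted(sample_genes[a] | sample_genes[b]))
```

Several pairs tie, so the final choice still depends on which traits those genes stand for.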

Fill in the boxes marked with ✏️ by double-clicking on them and typing your answer. Remember to save your answers by clicking `File > Save and Checkpoint`!

✏️

✏️

✏️

# References 
### The data used in this notebook was obtained from the following sources:
https://www.tandfonline.com/doi/full/10.1080/21645698.2015.1056972

ISAAA's GM Approval Database. http://www.isaaa.org/gmapprovaldatabase/

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)