First, link your Google Drive to the notebook so that it can access the files needed for this module.

Run the cell below and follow the instructions to mount the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Installing Biopython

At the beginning of each module, we will install **Biopython**, a large open-source library (application programming interface, API) used both in bioinformatics software development and in everyday scripts for common bioinformatics tasks. It contains several packages that you will import to run the analyses required for this project.

REF:
* Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics (Oxford, England), 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163


In [None]:
!pip install biopython
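
A quick way to confirm that the install succeeded is to check whether Python can locate the package. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, and it relies on the fact that Biopython is installed as `biopython` but imported under the module name `Bio`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Biopython is installed as "biopython" but imported as "Bio"
print("Biopython available:", is_installed("Bio"))
```

If this prints `False`, rerun the install cell above before continuing.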

# Investigating the biological impact of the mutation and its possible role in human disease
For this section, your research will focus on investigating the biological impact of the mutation you are studying. To do this, you will use the OMIM and KEGG databases.

## OMIM Search for information on genetic diseases

The **OMIM** (Online Mendelian Inheritance in Man) database contains short, referenced reviews about genetic loci and genetic diseases. It can be a very useful resource for finding out what research has been done on a gene or a disease.

REF:
* http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

## Install and import the necessary packages:

The **romim package** was created to query the OMIM database but it runs in R. 

**R** is another programming language so you will need to install **rpy2** to run R code in Google Colab.

**remotes** is an R package used to install other packages from GitHub, and **methods** is a package that ships with R and supports the functions used in the code.

The **XML** package will be used to read the results that you obtain from your database searches. 

REF:
* https://github.com/davetang/romim

In [None]:
# Load the rpy2 extension so that notebook cells can run R code
%load_ext rpy2.ipython

In [None]:
%%R
# This line must come first in every Colab cell that contains R code

# Install the helper packages first: remotes is needed to install from GitHub
install.packages('remotes')
install.packages('XML')
# methods ships with base R, so it does not need to be installed

# Install the main package
# Note how different it is from Python code
remotes::install_github('davetang/romim')

# Load the libraries associated with the packages
library(romim)
library(XML)

# Press 1 and ENTER when prompted

## Obtaining the ID number (called mim number) associated with the SOD1 gene entry in OMIM

In [None]:
%%R
# This line must come first in every Colab cell that contains R code

# To access OMIM, we will use this key, which works as our password to the database
set_key('4PUvWRqSSD2BuprIVAP_VQ') 

# Write SOD1 between the quotes below to obtain the entry for SOD1
gene_to_omim('###', show_query=TRUE)

The URL printed above points to an XML file that contains the search results. Let's download and parse it.

In [None]:
%%R
# Saving the file:

# Paste the url between the quotes below
url <- '#####'


# Set destination (write the file path and a file name followed by .xml)
destfile <- '/content/drive/MyDrive/Colab_Notebooks/sod_project_files/FILE NAME HERE.xml'

# Download the file
download.file(url, destfile)


In [None]:
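%%R
# Load the XML-reading packages: xmlParse() comes from XML; read_xml() and xml_find_all() come from xml2
# (run install.packages('xml2') first if xml2 is not installed)
library(XML)
library(xml2)

# Parse the file for readability
# Find the file in your Drive and paste its path where it says WHERE IS THE FILE? Keep the quotation marks!
result <- xmlParse(file = 'WHERE IS THE FILE?')

# Read the file
read <- read_xml('WHERE IS THE FILE?')

# Find the mim number
num <- xml_find_all(read, ".//mimNumber")

# Display the number
num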

Write down the mim number for the entry from the results above.

Answer here
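
The same lookup can also be sketched in Python with the standard library. This is a minimal illustration, not the notebook's method: `SAMPLE_XML` is a hypothetical, heavily trimmed stand-in for the downloaded file (the real file has many more elements), the number `123456` is a placeholder rather than the real answer, and `find_mim_numbers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for an OMIM search result.
# 123456 is a placeholder value, not a real MIM number for this exercise.
SAMPLE_XML = """
<omim>
  <searchResponse>
    <entryList>
      <entry>
        <mimNumber>123456</mimNumber>
      </entry>
    </entryList>
  </searchResponse>
</omim>
"""


def find_mim_numbers(xml_text):
    """Return the text of every <mimNumber> element, at any depth."""
    root = ET.fromstring(xml_text.strip())
    return [node.text for node in root.iter("mimNumber")]


print(find_mim_numbers(SAMPLE_XML))
```

To try it on your own download, read the file into a string first and pass that string to `find_mim_numbers`.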

## Using OMIM to obtain more information about the gene
This time you will search the OMIM 'SOD1' entry for information.

The function 'get_omim' does just that: setting certain arguments to 'TRUE' returns specific information about the entry.

Run the next cell to see a list of the arguments that you can set.

In [None]:
%%R
help(get_omim)
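
Each argument set to 'TRUE' presumably asks the OMIM server to include one more section of the entry in its reply. The Python sketch below shows how such a request URL could be assembled. It rests on assumptions that should be checked against the OMIM API documentation: the endpoint `https://api.omim.org/api/entry` and the query parameters `mimNumber`, `include`, `format` and `apiKey`. `build_entry_url` is a hypothetical helper and is not part of romim; the MIM number and key are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the OMIM API documentation before relying on it
BASE_URL = "https://api.omim.org/api/entry"


def build_entry_url(mim_number, api_key, **sections):
    """Build a request URL that asks for every section flagged True."""
    wanted = [name for name, flag in sections.items() if flag]
    params = {"mimNumber": mim_number, "format": "xml", "apiKey": api_key}
    if wanted:
        # Assumption: several sections can be requested as a comma-separated list
        params["include"] = ",".join(wanted)
    return BASE_URL + "?" + urlencode(params)


# Placeholder MIM number and key, for illustration only
print(build_entry_url(123456, "KEY HERE", text=True, allelicVariantList=False))
```

Only the sections flagged `True` end up in the URL, which mirrors how the 'TRUE' arguments of 'get_omim' select what is returned.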

### Obtain information from the entry

In [None]:
%%R
# Search OMIM again to obtain description information
set_key('KEY HERE')

# Using mim number to get the entry, set 'text' argument to true
omim_result <- get_omim(###, text = ###)

# Save the xml to a file 
saveXML(omim_result, file='FILE NAME HERE.xml')

In [None]:
#@title Load the results by providing the file name in this form (include file extension .xml)

# MAKING RESULTS LOOK GOOD
import xml.etree.ElementTree as ET
import csv
import pandas as pd

file_name = "" #@param {type:"string"}

tree = ET.parse(file_name)
root = tree.getroot()
 
# Write one CSV row per text section: title in the first column, content in the second
with open('refdata4.csv', 'w', newline='') as Ref_data4:
    csvwriter = csv.writer(Ref_data4)

    # Header row: the XML tag names of the two columns
    csvwriter.writerow(['textSectionTitle', 'textSectionContent'])

    for member in root.findall('.//textSection'):
        # findtext returns '' instead of failing when an element is missing
        des = member.findtext('.//textSectionTitle', default='')
        mut = member.findtext('.//textSectionContent', default='')
        csvwriter.writerow([des, mut])

data4= pd.read_csv("refdata4.csv")
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

data4

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

Read the section titled **Description**.

1. What metal does SOD2 contain?  


Answer here

2. Where is SOD2 found in cells?



Answer here

Within the section **Animal Model**, search for the paragraph about the H46R mutation. (Hint: look for “Liu et al.” at the start of the paragraph.) You can use CTRL+F on your machine.

3. What did Liu et al. (2000) find out about the copper binding site in the H46R mutant?


Answer here
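
As an alternative to CTRL+F, the paragraph search described above can be sketched in Python. This is only an illustration: it assumes paragraphs are separated by blank lines, `paragraphs_containing` is a hypothetical helper name, and `sample_section` is invented placeholder text rather than content from OMIM.

```python
def paragraphs_containing(text, keyword):
    """Return the paragraphs of `text` that mention `keyword` (case-insensitive)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if keyword.lower() in p.lower()]


# Invented placeholder text, not taken from OMIM
sample_section = (
    "Smith et al. (1999) described a first placeholder finding.\n\n"
    "Liu et al. (2000) described a second placeholder finding.\n\n"
    "Jones et al. (2001) described a third placeholder finding."
)

for paragraph in paragraphs_containing(sample_section, "Liu et al."):
    print(paragraph)
```

To use it on your results, pass the content of the **Animal Model** row from the table instead of `sample_section`.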


### Now read information about allelic variants 
An allele is one of two or more versions of a gene that differ in DNA sequence.

Allelic variation describes the presence or number of different allele forms at a particular locus (plural loci, meaning place) on a chromosome.

REF:  
* https://warwick.ac.uk/fac/sci/lifesci/research/vegin/geneticimprovement/diversitycollection/allelicvariation/
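
To make the idea of alleles differing in sequence concrete, here is a small Python sketch that lists the positions where two versions of a sequence differ. The two sequences are invented toy examples, not real gene data, and `differing_positions` is a hypothetical helper name.

```python
def differing_positions(allele_a, allele_b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must have the same length")
    return [
        position
        for position, (base_a, base_b) in enumerate(zip(allele_a, allele_b), start=1)
        if base_a != base_b
    ]


# Invented toy sequences, not real gene data
allele_1 = "ATGGCCACG"
allele_2 = "ATGGCCGCG"

print(differing_positions(allele_1, allele_2))
```

This simple comparison only handles substitutions; insertions and deletions change the sequence length and would need an alignment step first.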


In [None]:
%%R
# Search OMIM again, this time with the SOD1 mim number, to obtain the allelic variant list
set_key('KEY HERE')

omim_result <- get_omim(###, allelicVariantList = ###)

saveXML(omim_result, file='NAME OF FILE.xml')

### Display the results in the form of a table

In [None]:
#@title Load the results by providing the file name in this form (include file extension .xml)

# MAKING RESULTS LOOK GOOD
import xml.etree.ElementTree as ET
import csv
import pandas as pd

file_name = "" #@param {type:"string"}

tree = ET.parse(file_name)
root = tree.getroot()
 
# Write one CSV row per allelic variant: mutation in the first column, description in the second
with open('refdata4.csv', 'w', newline='') as Ref_data4:
    csvwriter = csv.writer(Ref_data4)

    # Header row: the XML tag names of the two columns
    csvwriter.writerow(['mutations', 'text'])

    for member in root.findall('.//allelicVariant'):
        # findtext returns '' instead of failing when an element is missing
        des = member.findtext('.//mutations', default='')
        mut = member.findtext('.//text', default='')
        csvwriter.writerow([des, mut])

data4= pd.read_csv("refdata4.csv")
pd.set_option('display.max_colwidth',10000)

data4



## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

4. How does the mean survival after disease onset differ between patients with ALS who have the H46R SOD1 mutation and patients with other types of mutations?



Answer here

5. What did Aoki et al. (1993) report about why the H46R SOD1 enzyme is less active?

Answer here
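
The label H46R used in the questions above follows the standard shorthand for an amino acid substitution: the first letter is the original residue, the number is its position in the protein, and the last letter is the replacement. A minimal Python sketch of reading such a label is shown below; `parse_substitution` is a hypothetical helper name, and the lookup table only covers the two residues needed here.

```python
import re

# One-letter codes for the two residues in H46R;
# extend the table if you look at other variants
AMINO_ACIDS = {"H": "histidine", "R": "arginine"}


def parse_substitution(label):
    """Split a label such as 'H46R' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


original, position, replacement = parse_substitution("H46R")
print(
    f"{AMINO_ACIDS[original]} at position {position} "
    f"is replaced by {AMINO_ACIDS[replacement]}"
)
```

The same helper can be used to scan the mutations column of the table for other single-residue substitutions.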

## Now search for information about ALS in the OMIM database

In [None]:
%%R
# This line must come first in every Colab cell that contains R code

# To access OMIM, we will use this key, which works as our password to the database
set_key('KEY HERE') 

# First, let's get a list of the entries associated with SOD1
# Write SOD1 between the quotes below to create a list of the entries
my_list <- gene_to_omim('####')

# Now let's retrieve the OMIM entry for each mim number in the list (with get_omim)
my_list_omim <- sapply(my_list, get_omim)

# This will extract the title of each entry
sapply(my_list_omim, get_title)

Write down the ID number for 'AMYOTROPHIC LATERAL SCLEROSIS' from the results above. (ID precedes the entry title)

Answer here

## Using OMIM to obtain more information about the disease
This time you will search the OMIM 'AMYOTROPHIC LATERAL SCLEROSIS' entry for information.

### Start by setting 'text' to TRUE

In [None]:
%%R
set_key('####') # The key must be added before every request

# Using the mim number to get the entry text
# Write the mim number inside the parentheses and set 'text' to 'TRUE'
omim_result <- get_omim(### , ### = ###)

# Save the results as an XML file
saveXML(omim_result, file='NAME OF FILE.xml') # Write a file name

# The file name will be displayed as the output of this cell


In [None]:
#@title Load the results by providing the file name in this form (include file extension .xml)

# MAKING RESULTS LOOK GOOD
import xml.etree.ElementTree as ET
import csv
import pandas as pd

file_name = "" #@param {type:"string"}

tree = ET.parse(file_name)
root = tree.getroot()
 
# Write one CSV row per text section: title in the first column, content in the second
with open('refdata4.csv', 'w', newline='') as Ref_data5:
    csvwriter = csv.writer(Ref_data5)

    # Header row: the XML tag names of the two columns
    csvwriter.writerow(['textSectionTitle', 'textSectionContent'])

    for member in root.findall('.//textSection'):
        # findtext returns '' instead of failing when an element is missing
        des = member.findtext('.//textSectionTitle', default='')
        mut = member.findtext('.//textSectionContent', default='')
        csvwriter.writerow([des, mut])

data5= pd.read_csv("refdata4.csv")
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

data5

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

Read the section titled **Clinical features**.

6. What distinguishes Type 1, Type 2, and Type 3 forms of ALS?


Answer here