First you need to link your Google Drive to the notebook in order to access the files needed for this module.

Run the cell below and follow instructions to mount the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Installing Biopython

At the beginning of each module, we will install **Biopython**. Biopython is a large open-source application programming interface (API) used both in bioinformatics software development and in everyday scripts for common bioinformatics tasks. It contains several packages that you will import to run the analyses required for this project.

REF:
* Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics (Oxford, England), 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163


In [None]:
!pip install biopython
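
After the install finishes, it can help to confirm that the package is importable before moving on. The sketch below uses only the Python standard library; `is_installed` is a hypothetical helper name introduced here for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Biopython is imported under the name "Bio", not "biopython"
if is_installed("Bio"):
    print("Biopython is ready to import.")
else:
    print("Biopython was not found; re-run the install cell above.")
```

If the second message appears, re-running the install cell and then this check is usually enough.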

# Investigating the biological impact of the mutation and its possible role in human disease
For this section, your research will focus on investigating the biological impact of the mutation you are studying. To do this, you will use the OMIM and KEGG databases.

## OMIM Search for information on genetic diseases

The **OMIM** (Online Mendelian Inheritance in Man) database contains short, referenced reviews about genetic loci and genetic diseases. It can be a very useful resource for finding out what type of research has been done on a gene or a disease.

REF:
* http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

## Install and import the necessary packages:

The **romim package** was created to query the OMIM database but it runs in R. 

**R** is a separate programming language, so you will need the **rpy2** interface to run R code in Google Colab.

**methods** and **remotes** are R packages: remotes installs the romim package from GitHub, and methods supports the functions used in the code.

The **XML** package will be used to read the results that you obtain from your database searches. 

REF:
* https://github.com/davetang/romim

In [None]:
# Load the rpy2 extension so that cells can run R code
%load_ext rpy2.ipython

In [None]:
%%R # This must precede all R code in Colab, to allow R code to run 

# Install the helper packages first
# (remotes must be installed before install_github can be called)
install.packages('remotes')
install.packages('XML')
# 'methods' ships with base R, so it does not need to be installed

# Install the main package from GitHub
# Note how different it is from Python code
remotes::install_github('davetang/romim')

# Press 1 and ENTER when prompted

# Import the library associated with the package
library(romim)

## Obtaining the ID number (called mim number) associated with the HMG gene entry in OMIM

In [None]:
%%R # This must precede all R code in Colab, to allow R code to run 

# To access OMIM, we will use this key which will work as our password to access the database
set_key('4PUvWRqSSD2BuprIVAP_VQ') 

# Write HMGCR in the parentheses below to obtain the entry for the HMG gene
gene_to_omim('HMGCR', show_query=TRUE)


The URL printed above points to an XML file that contains the results of our query. Let's download and parse it.

In [None]:
%%R
# Saving the file:

# Paste the url between the quotes below
url <- '####'

# Set destination (write the file path and a file name followed by .xml)
destfile <- '/content/drive/MyDrive/Colab_Notebooks/hmg_project_files/FILE NAME HERE.xml'

# Download the file
download.file(url, destfile)
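
The next cell asks for the path of the file you just saved. As a convenience, the following Python sketch lists the `.xml` files in the project folder so that the path can be copied from its output. It assumes the folder used in `destfile` above; `list_xml_files` is a hypothetical helper name, and the cell must be run as a regular Python cell (without `%%R`).

```python
from pathlib import Path


def list_xml_files(folder):
    """Return the sorted paths of all .xml files directly inside folder."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # folder is missing, e.g. the Drive is not mounted yet
    return sorted(str(path) for path in folder.glob("*.xml"))


# Folder taken from the destfile path used in the download cell
project_dir = "/content/drive/MyDrive/Colab_Notebooks/hmg_project_files"
for xml_path in list_xml_files(project_dir):
    print(xml_path)
```

If nothing is printed, either the download did not complete or the Drive is not mounted at `/content/drive`.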


In [None]:
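%%R
# Load the XML-reading packages (xml2 provides read_xml and xml_find_all;
# run install.packages('xml2') first if it is not available)
library(XML)
library(xml2)

# Parse the file for readability
# Find the file in your Drive and copy and paste the path
result <- xmlParse(file = 'WHERE IS THE FILE?')

# Read the file
read <- read_xml('WHERE IS THE FILE?')

# Find the mim number
num <- xml_find_all(read, ".//mimNumber")

# Display the number
num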
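
The same lookup can be sketched in Python, which may help in seeing what the `.//mimNumber` search does: it collects every `mimNumber` element anywhere in the document. The XML below is a hypothetical stand-in for the downloaded file; its element layout is an assumption, and `000000` is a placeholder rather than a real mim number. `find_mim_numbers` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded file; the nesting is assumed and
# 000000 is a placeholder, not a real mim number.
sample_xml = """
<omim>
  <searchResponse>
    <entryList>
      <entry>
        <mimNumber>000000</mimNumber>
      </entry>
    </entryList>
  </searchResponse>
</omim>
""".strip()


def find_mim_numbers(xml_text):
    """Return the text of every mimNumber element found at any depth."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(".//mimNumber")]


print(find_mim_numbers(sample_xml))
```

To use it on your own file, read the file contents into a string and pass that string to `find_mim_numbers`.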

Write down the mim number for the entry from the results above.

Answer here

## Using OMIM to obtain more information about the gene
This time you will search the OMIM 'HMGCR' entry for information.

The function 'get_omim' retrieves an entry by its mim number; setting one of its arguments to 'TRUE' returns that specific part of the entry.

Run the next cell to see a list of the arguments that you can set.

In [None]:
%%R
help(get_omim)

# Obtain information from the entry

In [None]:
%%R
# Search OMIM again to obtain description information
set_key('KEY HERE')

# Using mim number to get the entry, set 'text' argument to true
omim_result <- get_omim(###, text = ###)

# Save the xml to a file
saveXML(omim_result, file="FILE NAME HERE.xml")

In [None]:
#@title Load the results by providing the file name in this form (include file extension .xml)

# MAKING RESULTS LOOK GOOD
import xml.etree.ElementTree as ET
import csv
import pandas as pd

file_name = "" #@param {type:"string"}

tree = ET.parse(file_name)
root = tree.getroot()
 
# Write each text section's title and content to a CSV file.
# The 'with' block closes the file automatically when writing is done.
with open('refdata4.csv', 'w', newline='') as ref_data4:
    csvwriter = csv.writer(ref_data4)

    # Header row: the XML tag names of the two fields
    csvwriter.writerow(['textSectionTitle', 'textSectionContent'])

    # One row per text section in the OMIM entry
    for member in root.findall('.//textSection'):
        title = member.find('.//textSectionTitle').text
        content = member.find('.//textSectionContent').text
        csvwriter.writerow([title, content])

data4= pd.read_csv("refdata4.csv")
pd.set_option('display.max_colwidth',10000)

data4
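
To see what the cell above does without needing the OMIM file, here is a minimal, self-contained sketch of the same idea: find each `textSection`, pull out its title and content, and write them as CSV rows. The XML is a hypothetical stand-in with placeholder text, and `sections_to_rows` is an illustrative helper name. It uses `findtext` with a default value, which returns an empty string instead of raising an error when an element is missing.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an OMIM entry; titles and contents are placeholders.
sample_xml = """<entry>
  <textSectionList>
    <textSection>
      <textSectionTitle>Example Title A</textSectionTitle>
      <textSectionContent>Placeholder content A.</textSectionContent>
    </textSection>
    <textSection>
      <textSectionTitle>Example Title B</textSectionTitle>
      <textSectionContent>Placeholder content B.</textSectionContent>
    </textSection>
  </textSectionList>
</entry>"""


def sections_to_rows(xml_text):
    """Return a (title, content) tuple for every textSection element."""
    root = ET.fromstring(xml_text)
    rows = []
    for section in root.findall(".//textSection"):
        title = section.findtext("textSectionTitle", default="")
        content = section.findtext("textSectionContent", default="")
        rows.append((title, content))
    return rows


rows = sections_to_rows(sample_xml)

# Write to an in-memory buffer instead of a file so nothing is left on disk
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["textSectionTitle", "textSectionContent"])
writer.writerows(rows)
print(buffer.getvalue())
```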

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

Read the sections titled **Gene Function** and **Biochemical Features**.

1. What other role(s), if any, does HMG-CoA Reductase play besides cholesterol biosynthesis?



Answer here

2. If an inhibitor were found that irreversibly inhibited HMG-CoA Reductase, so that no cholesterol could be formed, would this be desirable? Why or why not?



Answer here

3. Describe how statin molecules bind HMG-CoA Reductase.  Are statins competitive or non-competitive inhibitors? (Refer to https://chem.libretexts.org/Courses/University_of_Arkansas_Little_Rock/CHEM_4320_5320%3A_Biochemistry_1/05%3A_Michaelis-Menten_Enzyme_Kinetics/5.4%3A_Enzyme_Inhibition for a review, if needed.)



Answer here

4. Before the crystal structure of statins bound to HMG-CoA Reductase was solved, researchers already knew what type of inhibitor statins were from enzyme assays performed in the presence of statins and plotted as a Lineweaver-Burk plot. Based on your answer to question 3, which of the following plots describes what the researchers found?



In [None]:
from IPython.display import IFrame
IFrame('https://drive.google.com/file/d/1NRe4pv2vR3l4sVlhA7LbtcjkDStOQcnr/preview', width=600, height=300)

Answer here
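
For review, a Lineweaver-Burk plot is the double-reciprocal form of the Michaelis-Menten equation, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so the y-intercept is 1/Vmax and the slope is Km/Vmax. The sketch below shows how Km and Vmax are read off such a plot for an enzyme with no inhibitor present. The Vmax and Km values are made up for illustration and are not measurements, and the function names are hypothetical.

```python
def michaelis_menten(substrate, vmax, km):
    """Reaction velocity predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def lineweaver_burk_fit(substrates, velocities):
    """Fit a line to (1/[S], 1/v) by least squares and return (Km, Vmax)."""
    xs = [1.0 / s for s in substrates]
    ys = [1.0 / v for v in velocities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept  # y-intercept is 1/Vmax
    km = slope * vmax       # slope is Km/Vmax
    return km, vmax


# Illustrative values only (arbitrary units)
substrates = [1.0, 2.0, 5.0, 10.0, 20.0]
velocities = [michaelis_menten(s, vmax=100.0, km=4.0) for s in substrates]

km_est, vmax_est = lineweaver_burk_fit(substrates, velocities)
print(f"Km = {km_est:.2f}, Vmax = {vmax_est:.2f}")
```

Repeating the fit on data collected with an inhibitor present, and comparing the slope and intercept with the uninhibited line, is how the type of inhibition is identified from this kind of plot.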

## Using OMIM to obtain information about Hypercholesterolemia
This time you will search the OMIM 'HYPERCHOLESTEROLEMIA' entry for information.

As before, the function 'get_omim' retrieves the entry, and setting one of its arguments to 'TRUE' returns that specific part of the entry.

Refer to the 'help(get_omim)' output from earlier for the list of arguments that you can set.

In [None]:
%%R
# Search OMIM again to obtain description information
set_key('KEY HERE')

# Using mim number to get the entry, set 'text' argument to true
omim_result <- get_omim(###, ### = ###)

# Save the xml to a file 
saveXML(omim_result, file='FILE NAME HERE.xml')

In [None]:
#@title Load the results by providing the file name in this form (include file extension .xml)

# MAKING RESULTS LOOK GOOD
import xml.etree.ElementTree as ET
import csv
import pandas as pd

file_name = "" #@param {type:"string"}

tree = ET.parse(file_name)
root = tree.getroot()
 
# Write each text section's title and content to a CSV file.
# The 'with' block closes the file automatically when writing is done.
with open('refdata4.csv', 'w', newline='') as ref_data4:
    csvwriter = csv.writer(ref_data4)

    # Header row: the XML tag names of the two fields
    csvwriter.writerow(['textSectionTitle', 'textSectionContent'])

    # One row per text section in the OMIM entry
    for member in root.findall('.//textSection'):
        title = member.find('.//textSectionTitle').text
        content = member.find('.//textSectionContent').text
        csvwriter.writerow([title, content])

data4= pd.read_csv("refdata4.csv")
pd.set_option('display.max_colwidth',10000)

data4

Read the **Description**, **Pathogenesis**, **Clinical Management**, and **Gene Therapy** sections. 

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

5. What is the primary genetic defect in the disease?


Answer here

6. How is HMG-CoA Reductase related to the disease, familial hypercholesterolemia?



Answer here

7. Is your patient the only known case of someone with familial hypercholesterolemia not responding to statin therapy?  Explain your answer.


Answer here

8. What types of gene therapy have been tried? 


Answer here

9. Who are the authors of the article describing these experiments?


Answer here