First you need to link your Google Drive to the notebook in order to access the files needed for this module.

Run the cell below and follow the instructions to mount the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Installing Biopython

At the beginning of each module, we will install **Biopython**. Biopython is a large open-source Python library used both in bioinformatics software development and in everyday scripts for common bioinformatics tasks. It contains several packages that you will import to run the analyses required for this project.

REF:
* Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics (Oxford, England), 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163


In [None]:
!pip install biopython

## KEGG Search for Metabolic Pathways

**KEGG** (Kyoto Encyclopedia of Genes and Genomes) is a website used for
accessing metabolic pathways. On this website, you can search for a process, gene, protein, or metabolite and obtain diagrams of all the metabolic pathways
associated with your query.

REF:    
* http://www.genome.ad.jp/kegg/
* https://widdowquinn.github.io/2018-03-06-ibioic/02-sequence_databases/09-KEGG_programming.html


## Install and import the necessary packages from Biopython, Python and IPython:

**ReportLab** is an open-source library for generating PDFs and graphics.

**Image** is a class that is part of the **display** submodule of **IPython** (interactive Python). It allows for images to be displayed in the notebook.

**Bio.KEGG** is a Biopython module with code to work with data from the KEGG database. **REST** is a submodule that provides access to KEGG's online REST-style API, so we can query KEGG directly from the notebook.

**Pandas** is a Python library that is used to analyze data. It will help create dataframes (tables of data) to store the data obtained from KEGG.

**io** is a module in the Python standard library that will allow us to manage files and to treat text as if it were a file.

REF:
* https://pypi.org/project/reportlab/
* https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html
* https://biopython.org/docs/dev/api/Bio.KEGG.html
* https://www.w3schools.com/python/pandas/default.asp
* https://www.journaldev.com/19178/python-io-bytesio-stringio

In [None]:
!pip install reportlab

In [None]:
# Show images inline
from IPython.display import Image

# Import Biopython modules to interact with KEGG
from Bio.KEGG import REST

# Import Pandas, so we can use dataframes
import pandas as pd

# Import io to manage files
import io

In [None]:
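# Import HTML, which the helper below needs to embed a PDF in the notebook
from IPython.display import HTML

# This function will help us display the PDF output
def PDF(filename):
    return HTML('<iframe src="%s" width=700 height=350></iframe>' % filename)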

# This code returns a Pandas dataframe, given tabular text
def to_df(result):
    return pd.read_table(io.StringIO(result), header=None)
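
To see what the `to_df` helper does before using it on real results, the sketch below applies the same `pd.read_table` call to a small string. It assumes the input is plain text with one record per line and tab-separated fields. The sample rows are placeholders invented for illustration, not real KEGG output.

```python
import io

import pandas as pd


def to_df(result):
    # Same helper as above: parse tab-separated text into a dataframe
    return pd.read_table(io.StringIO(result), header=None)


# Made-up sample text (placeholder values, not real KEGG output)
sample = (
    "org:0001\tGENE1; first example gene\n"
    "org:0002\tGENE2; second example gene\n"
)

df = to_df(sample)
print(df.shape)        # (number of rows, number of columns)
print(df[0].tolist())  # values in the first column
```

Because `header=None` is set, the columns are labelled 0 and 1 instead of being named after the first row of the text.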

In [None]:
# Find a specific entry with a precise search term 

# Use the gene's identification number obtained in Notebook #3

# Store results in a variable called result
result = REST.kegg_find('genes', '3156').read()

# This just lets us see more of the columns
pd.set_option('display.max_colwidth',10000)

# Display results as a dataframe (df)
to_df(result)

Write down the code in the second column that corresponds to HMGCR below.

Answer here

In [None]:
# Get the entry information with the ID from the previous cell

# Store it in a variable called result2
#### = REST.kegg_get('####').read()

# Display the results
print(####)

From PATHWAY above, select Terpenoid backbone biosynthesis and write down the code that precedes it:


Answer here
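
Reading the code off the printed entry by eye works, but the same lookup can also be sketched in code. The example below is a minimal sketch, not part of Biopython: `extract_section` is a hypothetical helper, and it assumes the entry text has the layout seen in the printed output, where a section name starts at the left margin and its continuation lines are indented. The sample entry and its `xxx` codes are made-up placeholders.

```python
def extract_section(entry_text, section):
    """Return the lines of one section from flat-file style entry text."""
    lines = []
    inside = False
    for line in entry_text.splitlines():
        if line[:1] not in (" ", ""):
            # Text at the left margin starts a new section
            inside = line.split()[0] == section
            if inside:
                lines.append(line[len(section):].strip())
        elif inside and line.strip():
            # Indented lines continue the current section
            lines.append(line.strip())
    return lines


# Made-up entry text (placeholder codes and names, not real KEGG output)
sample_entry = """\
ENTRY       0000              CDS       T00000
NAME        EXAMPLE
PATHWAY     xxx00001  Example pathway one
            xxx00002  Example pathway two
BRITE       Example hierarchy
///
"""

# Map each pathway code to its name
pathways = dict(
    item.split(None, 1) for item in extract_section(sample_entry, "PATHWAY")
)

# Look up the code that precedes a given pathway name
matches = [code for code, name in pathways.items() if name == "Example pathway two"]
print(matches)
```

To try it on the real entry, pass the text stored in `result2` instead of `sample_entry` and search for the pathway name you need.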

In biochemistry, a metabolic pathway is a linked series of chemical reactions occurring within a cell. The reactants, products, and intermediates of an enzymatic reaction are known as metabolites, which are modified by a sequence of chemical reactions catalyzed by enzymes.

In the next cell, you will obtain a chart of **Terpenoid backbone biosynthesis**. 

Terpenoids, also known as isoprenoids, are a large class of natural products consisting of isoprene (C5) units. There are two biosynthetic pathways, the mevalonate pathway and the non-mevalonate (MEP/DOXP) pathway, for the terpenoid building blocks: isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). The action of prenyltransferases then generates higher-order building blocks: geranyl diphosphate (GPP), farnesyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP), which are the precursors of monoterpenoids (C10), sesquiterpenoids (C15), and diterpenoids (C20), respectively. Condensation of these building blocks gives rise to the precursors of sterols (C30) and carotenoids (C40). The MEP/DOXP pathway is absent in higher animals and fungi, but in green plants the MEP/DOXP and mevalonate pathways co-exist in separate cellular compartments. The MEP/DOXP pathway, operating in the plastids, is responsible for the formation of essential oil monoterpenes and linalyl acetate, some sesquiterpenes, diterpenes, carotenoids, and phytol. The mevalonate pathway, operating in the cytosol, gives rise to triterpenes, sterols, and most sesquiterpenes.
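
The carbon counts in the paragraph above follow from counting C5 isoprene units. The short sketch below works through that arithmetic. The unit counts are inferred from the C10, C15, and C20 sizes given in the text, and the pairing of two FPP for C30 and two GGPP for C40 is one reading of "condensation of these building blocks", stated here as an assumption.

```python
ISOPRENE_CARBONS = 5  # one isoprene unit (C5)

# Number of C5 units in each prenyl diphosphate named in the text
units = {"GPP": 2, "FPP": 3, "GGPP": 4}

# Carbon count of each building block
carbons = {name: n * ISOPRENE_CARBONS for name, n in units.items()}
print(carbons)

# Assumption: two FPP give the C30 precursor, two GGPP give the C40 precursor
c30 = 2 * carbons["FPP"]
c40 = 2 * carbons["GGPP"]
print(c30, c40)
```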

In [None]:
# Get map of Terpenoid backbone biosynthesis

# Input the ID for Terpenoid backbone biosynthesis, preceded by hsa
# Store results in variable called result3

#### = REST.kegg_get('ID HERE', 'image').read()

# Display image
Image(####)


## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.


1. What metabolite is directly reduced by inhibiting HMG-CoA Reductase and what other pathways (besides cholesterol biosynthesis) does this affect?
 

Answer here