# How Cells Divide


In [None]:
# imports
from datascience import Table
import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')

In this lab, we will be looking at RNA-seq data from three phases of cell division: the G1, G2 and M phases.

The data in this lab are from "Regulation of mRNA translation during mitosis" by Tanenbaum et al. The paper can be found at https://elifesciences.org/articles/07957. The data for this lab were taken from GEO and can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67902.


First, let's load in expression data for multiple genes during different phases in the cell division process.

In [None]:
# Read the CSV file into a table
table = Table.read_table('https://raw.githubusercontent.com/data-8/mcb-88-connector/gh-pages/data/lab5/GSE67902_Supplemental_table_S1_mRNA_analysis.csv')

table

This table contains gene expression quantifications for about 9800 genes in the G2, M and G1 phases of cell division.

<h2 style="color:red">** Question 1**</h2> 
What do M, G1 and G2 stand for? Briefly describe each of these three phases.


---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---


<h2 style="color:red">** Question 2**</h2> 
Name and explain the 4 basic phases of mitosis. Here is a reference describing the 4 phases: https://www.khanacademy.org/science/biology/cellular-molecular-biology/mitosis/a/phases-of-mitosis

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

## Looking at the data

Notice that the table has six columns with numerical data. The first three numeric columns are RPKM values, one for each phase. The next three columns are total read counts.

<h2 style="color:red">** Question 3**</h2> 

What does RPKM stand for? How might RPKM values differ from the total read counts that are also found in the data table? (Hint: If you are unsure of the relationship between RPKM and total read counts, you may want to look up the definition of RPKM)

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

<h2 style="color:red">** Question 4**</h2> 

Why might we prefer using RPKM values, instead of total read counts, when comparing gene expression values from RNA-seq data?

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

## Genes Upregulated in the G1 Checkpoint

Next, we will find all genes that are up-regulated in the G1 checkpoint. To do this, we will calculate the ratio of each gene's RPKM in G1 phase to its RPKM in M phase, and keep only the genes where G1/M > 2. With this filter, we are searching for genes whose expression in G1 phase is more than twice their expression in M phase.

In [None]:
# append a column with the ratio G1 RPKM/M RPKM
table.append_column('G1/M', table['G1 mRNA RPKM']/table['M mRNA RPKM'])

# now, filter the table to keep genes where this ratio is > 2
filteredG1 = table.where(table['G1/M'] > 2)

filteredG1
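To make the ratio filter concrete, here is a minimal sketch in plain Python. The gene names and RPKM values are made up for illustration and are not taken from the real table, and `fold_change` is a hypothetical helper rather than part of the `datascience` library. It also shows one way to handle a gene whose M-phase RPKM is zero. In the cell above, NumPy division by zero yields `inf` or `nan` instead of an error, so such genes may deserve a second look if any appear in the data.

```python
# Hypothetical RPKM values for three made-up genes (NOT from the real table)
toy_rpkm = {
    "geneA": {"G1": 10.0, "M": 4.0},  # G1 is 2.5x M
    "geneB": {"G1": 6.0, "M": 5.0},   # G1 is 1.2x M
    "geneC": {"G1": 3.0, "M": 0.0},   # M is zero, so the ratio is undefined
}


def fold_change(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Compute the G1/M ratio for every toy gene
ratios = {gene: fold_change(v["G1"], v["M"]) for gene, v in toy_rpkm.items()}

# Keep only genes with a defined ratio greater than 2
upregulated = [gene for gene, r in ratios.items() if r is not None and r > 2]

print(ratios)
print(upregulated)
```

In this toy example only `geneA` passes the filter: `geneB` is higher in G1 but by less than twofold, and `geneC` is skipped because its ratio is undefined.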

<h2 style="color:red">** Question 5**</h2> 

Why might we filter for genes with a ratio of G1/M > 2, instead of G1/M > 1, to discover genes up-regulated in G1 phase?

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

<h2 style="color:red">** Question 6**</h2> 

For how many genes is expression in G1 phase more than twice the expression in M phase? (Hint: Count the rows in the filtered table we produced above)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

<h2 style="color:red">** Question 7**</h2> 

Run a similar comparison analysis to the one above for G2 phase: find all of the genes that are up-regulated in G2 phase, compared to M phase. Print the number of genes that are up-regulated in G2 phase, relative to M phase.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Finding similarities in gene expression in the G1 and G2 Phases

Next, we will find the genes that are up-regulated in both the G1 and G2 phases, relative to the M phase. To do this, we will use the intersect() function defined below. This function takes two parameters, a and b, which are both lists, and returns a list of the items that appear in both.

In [None]:
def intersect(a, b):
    """Return a list of the items found in both a and b (order not guaranteed)."""
    return list(set(a) & set(b))

As an example, say we have two lists, geneList1 and geneList2. We can use this function to calculate the intersection of the two lists:

In [None]:
geneList1 = ["TP53", "TRIM52", "RRAD"]
geneList2 = ["TP53", "TRIM52", "DUSP5"]

intersect(geneList1, geneList2)

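This means that there are two genes that appear in both lists: TRIM52 and TP53. Because the function uses sets, which are unordered, the two genes may be returned in either order.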
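If a stable order is helpful, for example when comparing results between runs, one option is to sort the result. The sketch below uses a hypothetical variant named `intersect_sorted`, which is not part of the lab's provided code, on the same two example lists.

```python
def intersect_sorted(a, b):
    """Hypothetical variant of intersect() that returns the shared items sorted."""
    return sorted(set(a) & set(b))


geneList1 = ["TP53", "TRIM52", "RRAD"]
geneList2 = ["TP53", "TRIM52", "DUSP5"]

shared = intersect_sorted(geneList1, geneList2)
print(shared)       # ['TP53', 'TRIM52']
print(len(shared))  # 2
```

Sorting does not change which genes are shared, only the order in which they are listed.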

<h2 style="color:red">** Question 8**</h2> 

Using the intersect() function, find the intersection of genes that are up-regulated in both G1 and G2 phases, relative to the M phase.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

<h2 style="color:red">** Question 9**</h2> 

Notice that the gene CDKN1A is in the intersecting list you just produced. Go to GeneCards (http://www.genecards.org/) and search for the gene CDKN1A. Give a short description of CDKN1A's role in cell division checkpoints, and of any interactions it may have with the p53 gene.


---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

<h2 style="color:red">** Question 10**</h2> 
Now, on GeneCards (http://www.genecards.org/), search for TP53 and navigate to the section called 'Function'. Read this section. According to this section, is p53 important in the G1 phase, the G2 phase, or both?

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

<h2 style="color:red">** Question 11**</h2> 
Now, filter the original data table for the TP53 gene, and observe its expression in the G1, G2 and M phases. Do these data support or contradict the role of TP53 in G1 and G2 described in the previous question? Explain.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

** Explanation here:**

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---

<h2 style="color:red">** Question 12**</h2> 
Given what you know about G1 and G2 phases, how might you expect the RNA expression of checkpoint genes to change in cancer cells?

---
## <span style="color:red">Student Answer</span>

*Double-click and add your answer between the lines*

---