# Lab 6: Immune Response Checkpoints

Today, we will be looking at a couple of effector genes that play a crucial role in the inhibition or activation of T cells. By the end of this lab, you will know the structure and function of some of these genes.

This lab and the labs that follow use the datascience API. For more information about the datascience Table API, see http://data8.org/datascience/tutorial.html#getting-started.

In [26]:
# imports
from datascience import Table
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')
from sklearn.cluster import KMeans

First, we will load time series data from a subset of the immgen database. This dataset includes the relative expression of 40 genes over 100 days. Many of these genes can be classified into different categories. Two of these categories, which we will see today, are 'naive and late memory' genes and 'effector response' genes.

Let's load the immgen dataset now.

In [None]:
# load in data for part 2
table = Table.read_table('https://raw.githubusercontent.com/data-8/mcb-88-connector/gh-pages/data/lab2/immgen_timeseries.csv')

# Show the table
table

### Filtering data by Category

Let's first look at the names of the genes that are in the 'Naive/late memory' category.

In [None]:
## filter by category

## example: filtered category for naive and late memory cells
table.where('Category', 'Naive/late memory')

The naive and late memory category includes genes with the highest expression in naive and memory CD8+ T cells. Some of these genes encode molecules suspected to have roles in suppressing the immune response, such as Cnr2.


Today, however, we will be looking at effector response genes. These genes are upregulated shortly after differentiation from naive cells and, after this initial upregulation, remain highly expressed compared to naive cells. Some of these genes encode early effector molecules.

<div style="color:red">** Question 1**:</div> 
Using the table 'table', select only the genes whose Category is 'effector response'.

In [15]:
## Answer here:
effectors = ...

<div style="color:red">** Question 2**:</div> 
How many genes are there in the effector response category in the table? (Note: this list is not exhaustive; it is a small subset of effector genes.)

In [6]:
# You will need to use code to answer this question


## Missing Data
Now we will graph the expression over time for all effector response genes. But we are missing values. Missing values are denoted in the table by 'nan' (also written 'NaN'), which stands for 'Not a Number'. If you look at the table for effector response genes, you will see 'nan' entries. Let's look at where we are missing values.
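
As a hint for the next two questions, here is a minimal sketch of how missing values can be located with pandas. It uses a small made-up table (the gene names `GeneA`, `GeneB`, `GeneC` and all values are hypothetical, not taken from the immgen data). A datascience Table can be converted to a pandas dataframe with `to_df()`, so the same approach should carry over, assuming the time points are stored as columns.

```python
import numpy as np
import pandas as pd

# Toy expression table (hypothetical gene names and values, NOT the immgen data)
toy = pd.DataFrame({
    'Gene': ['GeneA', 'GeneB', 'GeneC'],
    'Day 0': [1.0, 2.0, 3.0],
    'Day 5': [np.nan, 2.5, 3.5],
    'Day 10': [1.5, np.nan, 4.0],
})

# Only the time-point columns hold expression values
value_cols = toy.columns.drop('Gene')

# Time points (columns) that contain at least one missing value
cols_with_nan = [c for c in value_cols if toy[c].isna().any()]

# Genes (rows) that contain at least one missing value
genes_with_nan = toy.loc[toy[value_cols].isna().any(axis=1), 'Gene'].tolist()

print(cols_with_nan)
print(genes_with_nan)
```

`isna()` marks each missing cell as `True`; `any()` then collapses those marks per column, and `any(axis=1)` collapses them per row.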

<div style="color:red">** Question 3**:</div> In the table of effector response genes, for what times are we missing data?

In [8]:
# Run any code you want for this question


<div style="color:red">Answer here:</div>


<div style="color:red">** Question 4**:</div> In the table of effector response genes, for what genes are values missing? 

In [27]:
# Run any code you want for this question


<div style="color:red">Answer here:</div>

## Plotting Gene Expression over Time
Now, let's graph the time series data for effector response genes. 

In [None]:
## prepare data to plot

# convert to pandas dataframe
effectors_df = effectors.to_df()

# drop columns that will not be plotted
# transpose data matrix
data = effectors_df.drop(effectors_df.columns[[0, 1, 3]], axis=1).transpose()

# replace headers with gene names
data.columns = data.iloc[0]
data = data[1:]

# show results!
data
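
To see what the transpose and header replacement in the cell above do, here is a small sketch on a made-up table (gene names and values are hypothetical). The conversion to `float` at the end is an extra step added for this illustration; it is not part of the lab's own cell.

```python
import pandas as pd

# Toy table with one row per gene (hypothetical names and values)
toy = pd.DataFrame({
    'Gene': ['GeneA', 'GeneB'],
    'Day 0': [1.0, 2.0],
    'Day 5': [1.5, 2.5],
})

# After transposing, each original column becomes a row,
# so the first row now holds the gene names
t = toy.transpose()

# Promote the first row (gene names) to column headers, then drop that row
t.columns = t.iloc[0]
t = t[1:].astype(float)  # extra step for this sketch: make the values numeric

print(t)
```

The result has one row per time point and one column per gene, which is the layout that `plot()` draws as one line per gene.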

In [None]:
# now, plot the data
data.plot()

TODO: ask question about molecule presence (for eukaryotic cells)

### Something is wrong with the plot!
You may notice that in the chart above, the lines for Ctla4 and Il2 are cut off. This is because of missing data. We will impute these missing values **for visualization purposes only**.

In [None]:
# fill in missing values with column means

## get column means
means = data.mean()
print(means)


## fillna is a pandas dataframe function that fills empty values using a specific method.
## For our purposes, we will fill empty values by the mean expression for a given gene.
filled = data.fillna(means)
filled
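
The following sketch shows the same mean imputation on a tiny made-up table (hypothetical gene names and values), so that the filled-in numbers are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Toy data: rows are time points, columns are genes (hypothetical values)
toy = pd.DataFrame(
    {'GeneA': [1.0, np.nan, 3.0], 'GeneB': [4.0, 6.0, np.nan]},
    index=['Day 0', 'Day 5', 'Day 10'],
)

# mean() skips NaN by default, so each mean uses only the observed values
col_means = toy.mean()

# Passing a Series to fillna fills each column with the value
# whose label matches that column's name
filled_toy = toy.fillna(col_means)

print(filled_toy)
```

Here the missing GeneA value becomes 2.0 (the mean of 1.0 and 3.0) and the missing GeneB value becomes 5.0 (the mean of 4.0 and 6.0). Because every gap is replaced by a single constant per gene, imputed stretches appear as flat points on the plot and should not be read as real measurements.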

As you can see in the resulting table shown above, we no longer have missing values. Now, let's plot these imputed values.

<div style="color:red">** Question 5**:</div> Plot the imputed data. *Hint:* Look at how we plotted the data in one of the previous cells.

In [None]:
## Answer here:


Using this plot, answer the following questions:

<div style="color:red">** Question 6**:</div> At what point in time do all genes have their highest expression?

<div style="color:red">Answer here:</div>


<div style="color:red">** Question 7**:</div> As stated, this plot shows RNA expression over time. However, increased presence of the molecule a gene encodes may come at a different time from gene expression. Do you expect increased presence of a molecule **before**, **during** or **after** the increased expression of the gene that encodes it?

<div style="color:red">Answer here:</div>


## Exploring CTLA4
Now, let's explore the CTLA4 gene in more depth.

1. Navigate to http://www.genecards.org/cgi-bin/carddisp.pl?gene=CTLA4.
2. Scroll to the section labeled 'Summaries'. Read this section and answer the following questions.


<div style="color:red">** Question 8**:</div>Does the protein that CTLA4 encodes inhibit or activate T cell responses?

<div style="color:red">Answer here:</div>


<div style="color:red">** Question 9**:</div> From this summary, what are the names of two ligands which CTLA4 binds to? (Remember these names; you will be using them in the next parts of the lab.)

<div style="color:red">Answer here:</div>


## Exploring the CTLA4 Gene Network

Now, we will visually explore genes that bind to CTLA4.

1. Navigate to http://string-db.org/cgi/network.pl?taskId=QqmR370Wjf30
Here, you will see a network of genes, including CTLA4. Edges between genes demonstrate how the genes are connected. 

2. Click on the network. You can move genes around and click on them.

The genes you wrote down as an answer to **Question 9** should appear in this network. Click on the first gene.

<div style="color:red">** Question 10**:</div> What is the name and function of the **first** gene? What two molecules does it bind to?

<div style="color:red">Answer here:</div>


<div style="color:red">** Question 11**:</div>What is the name and function of the **second** gene? What two molecules does it bind to?

<div style="color:red">Answer here:</div>


<div style="color:red">** Question 12**:</div> Scroll down below the network to view the color codings for the graph edges. What is the meaning of each edge color that appears between CTLA4 and the genes found in **Question 9**?

<div style="color:red">Answer here:</div>


<div style="color:red">** Question 13**:</div> Based on what you have read about CTLA4, infer whether binding of the two molecules CD80 and CD86 to CTLA4 inhibits or activates T cell differentiation.

<div style="color:red">Answer here:</div>


## Exploring CD28
As you may have seen in the gene network of CTLA4, CD28 is another gene that appears to be connected to CTLA4.
Now, let's explore CD28 in more depth.

1. Navigate to http://www.genecards.org/cgi-bin/carddisp.pl?gene=CD28&keywords=CD28.
2. Scroll to the section labeled 'Summaries'. Read this section and answer the following questions.

<div style="color:red">** Question 14**:</div>Does the protein that CD28 encodes inhibit or activate T cell responses?

<div style="color:red">Answer here:</div>


Scroll down to the section labeled 'Proteins for CD28 Gene'. Read this section and answer the following question: 

<div style="color:red">** Question 15**:</div>What other proteins does CD28 bind to? Are these the same as the proteins that bind to CTLA4?

<div style="color:red">Answer here:</div>


## Protein Structure for CTLA-4 and CD28

Now, we will view the 3-D structure of CTLA-4 and CD28. 

### Protein Structure for CTLA-4
Navigate to https://www.ncbi.nlm.nih.gov/Structure/pdb/5GGV. Here, you can click on the bottom-left corner of the protein structure image. This will take you to a new window that allows you to click and explore the protein.

<div style="color:red">** Question 16**:</div> 
What do the pink, blue and brown colors represent?

<div style="color:red">Answer here:</div>


### Protein Structure for CD28
Navigate to https://www.ncbi.nlm.nih.gov/Structure/pdb/1YJD. Here, you can click on the bottom-left corner of the protein structure image. This will take you to a new window that allows you to click and explore the protein.


<div style="color:red">** Question 17**:</div> 
What molecule names do both CTLA-4 and CD28 share in the structure?

<div style="color:red">Answer here:</div>
