Assignment#2

Ruth Isserlin edited this page Mar 1, 2020 · 4 revisions

Assignment 2 - Differential Gene expression and Preliminary ORA

Overview

  • Remember to document your work in your journal concurrently with your progress. Journal entries that are uploaded in bulk at the end of your work will not be considered evidence of ongoing engagement.
  • Your task will involve writing an R Notebook. Place your notebook code in your bcb-420 GitHub repo in the bcb420-2020 organization and link to the Notebook from your Journal.
  • Your work must be completed by 20:00 on the due date.
Your task is to take the normalized expression data that was created in Assignment #1 and rank your genes according to differential expression. Once your list is ranked you will perform thresholded over-representation analysis to highlight dominant themes in your top set of genes.

Include a brief introduction section that summarizes the normalization you performed and the results you obtained in Assignment #1. Assume that the person reading the report has not read your Assignment #1 report. Including basic statistics from that analysis will be helpful (for example, data downloaded from GEO with id X, …).

Differential Gene Expression

Conduct differential expression analysis with your normalized expression set from Assignment #1. Define the model design used to calculate differential expression, and revisit your MDS plot from Assignment #1 to justify your choice of factors in the model.
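
As a minimal sketch of what "defining a model design" can look like, the base R function model.matrix turns a sample annotation table into a design matrix. The sample table below (three patients, two conditions) is entirely hypothetical; your own factors should come from what separates samples on your MDS plot.

```r
# Hypothetical sample annotation: 3 patients, each measured under 2 conditions.
samples <- data.frame(
  sample    = paste0("S", 1:6),
  patient   = factor(rep(c("P1", "P2", "P3"), each = 2)),
  condition = factor(rep(c("control", "treated"), times = 3),
                     levels = c("control", "treated"))
)

# Simple design: expression is modelled by condition only.
design_simple <- model.matrix(~ condition, data = samples)

# Paired design: also accounts for patient-to-patient variability, which
# may be warranted if samples group by patient on the MDS plot.
design_paired <- model.matrix(~ patient + condition, data = samples)

print(colnames(design_simple))
print(colnames(design_paired))
```

Whichever design you choose, the coefficient for the condition of interest is the one you would test for differential expression.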

  1. Calculate p-values for each of the genes in your expression set. How many genes were significantly differentially expressed? What thresholds did you use and why?
  2. Multiple hypothesis testing - correct your p-values using a multiple hypothesis correction method. Which method did you use, and why? How many genes passed correction?
  3. Show the number of differentially expressed genes using an MA plot or a volcano plot. Highlight genes of interest.
  4. Visualize your top hits using a heatmap. Do your conditions cluster together? Explain why or why not.
  • Make sure all your figures have proper headings and labels. Every figure included in the report should have a detailed figure legend.
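
The steps above can be sketched end to end in base R. This is only an illustration on simulated data: the expression matrix, the group labels, the spiked-in effect, and the 0.05 thresholds are all assumptions, and a per-gene Welch t-test stands in for the model fitting you would normally do with a dedicated package such as limma or edgeR.

```r
set.seed(420)  # arbitrary seed so the simulated example is reproducible

# Simulated log2 expression: 1000 genes x 6 samples (3 control, 3 treated).
n_genes <- 1000
group <- factor(rep(c("control", "treated"), each = 3))
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0(group, 1:3)))

# Spike in a true shift of 2 log2 units for the first 50 genes.
expr[1:50, group == "treated"] <- expr[1:50, group == "treated"] + 2

# Step 1: per-gene p-values (Welch t-test as a simple stand-in).
p_value <- apply(expr, 1, function(x) t.test(x ~ group)$p.value)
log_fc  <- rowMeans(expr[, group == "treated"]) -
  rowMeans(expr[, group == "control"])

# Step 2: Benjamini-Hochberg correction, which controls the false
# discovery rate across all tested genes.
fdr <- p.adjust(p_value, method = "BH")

results <- data.frame(gene = rownames(expr), log_fc, p_value, fdr,
                      stringsAsFactors = FALSE)
results <- results[order(results$p_value), ]

cat("p < 0.05 before correction:", sum(results$p_value < 0.05), "\n")
cat("FDR < 0.05 after correction:", sum(results$fdr < 0.05), "\n")

# Step 3: volcano plot with genes passing correction highlighted.
hit <- results$fdr < 0.05
plot(results$log_fc, -log10(results$p_value),
     col = ifelse(hit, "red", "grey50"), pch = 20,
     xlab = "log2 fold change (treated vs control)",
     ylab = "-log10(p-value)",
     main = "Volcano plot of simulated data")

# Step 4: heatmap of the 30 top-ranked genes, scaled per gene (row).
top_genes <- head(results$gene, 30)
heatmap(expr[top_genes, ], scale = "row",
        main = "Top 30 genes (simulated)")
```

Comparing the two printed counts shows how many nominally significant genes survive correction, which is the kind of number the first two questions ask you to report and justify.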

Thresholded over-representation analysis

With your significantly up-regulated and down-regulated sets of genes, run a thresholded gene set enrichment analysis.

  1. Which method did you choose and why?
  2. What annotation data did you use and why? What version of the annotation are you using?
  3. How many gene sets were returned, and with what thresholds?
  4. Run the analysis using the up-regulated set of genes and the down-regulated set of genes separately. How do these results compare to using the whole list (i.e., all differentially expressed genes together vs. the up-regulated and down-regulated differentially expressed genes separately)?
  • Present your results with the use of tables and screenshots. All figures should have appropriate figure legends.
  • If using figures, create a figures directory in your repo and make sure all references to the figures in your Rmarkdown notebook use relative paths.
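
To make the idea of thresholded over-representation analysis concrete, here is a minimal sketch using the hypergeometric test from base R. The gene universe, the list of hits, and both gene sets are made up for illustration, and the helper name ora is hypothetical; in your assignment you would use a real annotation source and most likely an established tool rather than this hand-rolled test.

```r
# Hypothetical inputs: a background of 2000 measured genes, 100 of which
# are significantly up-regulated, and two made-up gene sets.
universe <- paste0("gene", 1:2000)
up_genes <- universe[1:100]
gene_sets <- list(
  enriched_set   = universe[c(1:30, 1001:1020)],   # 30 of 50 members are hits
  background_set = universe[c(98:100, 1501:1547)]  # 3 of 50 members are hits
)

# Hypothetical helper: one-sided hypergeometric test for a single gene set.
ora <- function(hits, set, universe) {
  set  <- intersect(set, universe)
  hits <- intersect(hits, universe)
  overlap <- length(intersect(hits, set))
  # P(X >= overlap) when drawing length(hits) genes without replacement
  # from a universe that contains length(set) annotated genes.
  p <- phyper(overlap - 1, length(set), length(universe) - length(set),
              length(hits), lower.tail = FALSE)
  data.frame(set_size = length(set), overlap = overlap, p_value = p)
}

ora_results <- do.call(rbind, lapply(gene_sets, ora,
                                     hits = up_genes, universe = universe))

# Correct across all tested gene sets, then apply a threshold of your choice.
ora_results$fdr <- p.adjust(ora_results$p_value, method = "BH")
print(ora_results)
```

The same helper can be called with the up-regulated list, the down-regulated list, and the combined list in turn, which is one way to set up the comparison asked for in question 4.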

Interpretation

The most important aspect of the analysis is relating your results back to the initial data and question.

  1. Do the over-representation results support the conclusions or mechanisms discussed in the original paper?
  2. Can you find evidence, i.e. publications, to support some of the results that you see? How does this evidence support your results?

What to hand in

Name your Rmarkdown file A2_<yourname>.Rmd and place it in the root directory of your repo.

This report should be well organized and easy to read. Make sure to include the following elements:

  • Student name and title
  • Data source used for the analysis
  • Introduction to the data used. Include figures to describe the data. If using figures from the paper make sure to cite them.
  • Table of contents
  • Headers
  • References - reference all packages that you used!
Utilize rmarkdown features to make the report more readable.

All of your work should be written out as an RNotebook. Explanations of each step should be outlined using rmarkdown in the notebook interspersed with the code needed to perform the described task. Make sure all the questions listed in the interpretation section are addressed.

  • Knit your RNotebook to HTML.
    • Submit your HTML notebook to Quercus.
    • Create a link to your HTML notebook in the README of your repository.
    • Add the link to the student wiki.
We will be pulling your repos and compiling your notebooks as well. Make sure that all additional packages not in the base docker image are checked for and installed if needed.
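
One possible way to check for packages and install them only when needed is sketched below. The helper names ensure_cran and ensure_bioc are hypothetical, and the sketch assumes a CRAN mirror is already configured in the environment where the notebook is compiled.

```r
# Hypothetical helper: install CRAN packages only if they are missing.
# Assumes a CRAN mirror is already configured.
ensure_cran <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
  }
}

# Hypothetical helper: install Bioconductor packages only if they are missing.
ensure_bioc <- function(pkgs) {
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) {
    ensure_cran("BiocManager")
    BiocManager::install(missing, ask = FALSE)
  }
}

# Demonstration with packages that ship with R, so nothing is installed here.
# Replace these with the packages your notebook actually loads.
ensure_cran(c("stats", "utils"))
```

Placing a chunk like this near the top of the notebook, before any library calls, lets the notebook compile in a fresh container without manual setup.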

Before handing in your assignment, test that it compiles with our docker image by running the following command:

docker run --rm -it -v "$(pwd)":/home/rstudio/projects --user rstudio risserlin/bcb420-base-image /usr/local/bin/R -e "rmarkdown::render('/home/rstudio/projects/name_of_rmd.Rmd',output_file='/home/rstudio/projects/name_of_html.html')" > processing_output_filename

Further reading, links and resources

  • Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6. [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)