Assignment #2
- Remember to document your work in your journal concurrently with your progress. Journal entries that are uploaded in bulk at the end of your work will not be considered evidence of ongoing engagement.
- Your task will involve writing an R Notebook. Place your notebook code in your bcb-420 GitHub repo in the bcb420-2020 organization and link to the notebook from your journal.
- Your work must be completed by 20:00 on the due date.
Include a brief introduction section that summarizes the normalization and results from the first assignment. Assume that the reader has not read your Assignment #1 report. Including basic statistics from that analysis will be helpful (for example, data downloaded from GEO with id X, …).
Conduct differential expression analysis with your normalized expression set from Assignment #1. Define the model design used to calculate differential expression, and revisit your MDS plot from Assignment #1 to justify your choice of factors in the model.
- Calculate p-values for each of the genes in your expression set. How many genes were significantly differentially expressed? What thresholds did you use and why?
- Multiple hypothesis testing - correct your p-values using a multiple hypothesis correction method. Which method did you use, and why? How many genes passed correction?
- Show the number of differentially expressed genes using an MA plot or a volcano plot. Highlight genes of interest.
- Visualize your top hits using a heatmap. Do your conditions cluster together? Explain why or why not.
- Make sure all your figures have proper headings and labels. Every figure included in the report should have a detailed figure legend.
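The steps above can be sketched in base R as follows. This is a minimal illustration on simulated data, not a recommended pipeline: the two-group design, the sample sizes, the 0.05 thresholds, and every variable name are assumptions made for the example. In a real notebook the per-gene t-test would normally be replaced by a dedicated package such as edgeR or limma, applied to your own normalized expression set.

```r
# Toy example only: simulated log2 expression values, not real data.
set.seed(42)
n_genes <- 200
groups <- factor(rep(c("control", "treated"), each = 4))  # hypothetical design

expr <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0(groups, "_", rep(1:4, times = 2))))
# Shift the first 20 genes up by 3 log2 units in the treated samples.
expr[1:20, groups == "treated"] <- expr[1:20, groups == "treated"] + 3

# Model design: an intercept plus one coefficient for the treated group.
design <- model.matrix(~ groups)

# Per-gene Welch t-test (a simple stand-in for edgeR/limma model fitting).
p_values <- apply(expr, 1, function(x) {
  t.test(x[groups == "treated"], x[groups == "control"])$p.value
})
log_fc <- rowMeans(expr[, groups == "treated"]) -
  rowMeans(expr[, groups == "control"])

# Benjamini-Hochberg correction controls the false discovery rate.
fdr <- p.adjust(p_values, method = "BH")

results <- data.frame(gene = rownames(expr), log_fc, p_values, fdr)
n_nominal <- sum(results$p_values < 0.05)   # genes passing the raw threshold
n_corrected <- sum(results$fdr < 0.05)      # genes passing after correction
cat("p < 0.05:", n_nominal, " FDR < 0.05:", n_corrected, "\n")

# Volcano plot: colour genes by direction among those passing correction.
status <- ifelse(results$fdr < 0.05 & results$log_fc > 0, "up",
                 ifelse(results$fdr < 0.05 & results$log_fc < 0, "down", "ns"))
plot(results$log_fc, -log10(results$p_values), pch = 16,
     col = c(down = "blue", ns = "grey", up = "red")[status],
     xlab = "log2 fold change (treated - control)",
     ylab = "-log10(p-value)",
     main = "Volcano plot (simulated data)")

# Heatmap of the 20 genes with the smallest p-values, scaled per gene.
top_hits <- head(order(results$p_values), 20)
heatmap(expr[top_hits, ], scale = "row", main = "Top 20 hits (simulated data)")
```

Reporting both `n_nominal` and `n_corrected` answers the two counting questions directly, and the `status` vector can be reused later to split the hits into up-regulated and down-regulated lists.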
Using your significantly up-regulated and down-regulated sets of genes, run a thresholded gene set enrichment analysis (over-representation analysis).
- Which method did you choose and why?
- What annotation data did you use and why? What version of the annotation are you using?
- How many gene sets were returned, and with what thresholds?
- Run the analysis using the up-regulated set of genes and the down-regulated set of genes separately. How do these results compare to using the whole list (i.e., all differentially expressed genes together vs. the up-regulated and down-regulated differentially expressed genes separately)?
- Present your results with the use of tables and screenshots. All figures should have appropriate figure legends.
- If using figures, create a figures directory in your repo and make sure all references to the figures are relative in your Rmarkdown notebook.
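To make the idea of a thresholded analysis concrete, the sketch below runs a hypergeometric over-representation test in base R on the up-regulated list, the down-regulated list, and the combined list. Everything in it is hypothetical: the gene universe, the three gene sets, the hit lists, and the helper `ora()` are invented for illustration, and a real analysis would typically use a tool with versioned annotation sources rather than hand-built sets.

```r
# Toy example only: hypothetical gene universe, gene sets, and hit lists.
universe <- paste0("gene", 1:1000)
gene_sets <- list(
  pathway_A = universe[1:50],
  pathway_B = universe[51:150],
  pathway_C = universe[901:950]
)
up_genes <- universe[c(1:30, 400:419)]      # 50 hypothetical up-regulated genes
down_genes <- universe[c(901:925, 600:624)] # 50 hypothetical down-regulated genes

# Hypothetical helper: one-sided hypergeometric test per gene set,
# followed by Benjamini-Hochberg correction across the gene sets.
ora <- function(hits, gene_sets, universe, fdr_threshold = 0.05) {
  hits <- intersect(hits, universe)
  res <- do.call(rbind, lapply(names(gene_sets), function(name) {
    set <- intersect(gene_sets[[name]], universe)
    overlap <- length(intersect(hits, set))
    # P(X >= overlap) when drawing length(hits) genes from the universe.
    p <- phyper(overlap - 1, length(set), length(universe) - length(set),
                length(hits), lower.tail = FALSE)
    data.frame(gene_set = name, set_size = length(set),
               overlap = overlap, p_value = p)
  }))
  res$fdr <- p.adjust(res$p_value, method = "BH")
  res$significant <- res$fdr < fdr_threshold
  res[order(res$p_value), ]
}

up_res <- ora(up_genes, gene_sets, universe)
down_res <- ora(down_genes, gene_sets, universe)
all_res <- ora(c(up_genes, down_genes), gene_sets, universe)

print(up_res)
print(down_res)
print(all_res)
```

Comparing the three tables side by side shows which gene sets are driven by one direction of change only, which is the comparison the assignment asks you to discuss.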
The most important aspect of the analysis is relating your results back to the initial data and question.
- Do the over-representation results support the conclusions or mechanisms discussed in the original paper?
- Can you find evidence, i.e. publications, to support some of the results that you see? How does this evidence support your results?
Name your Rmarkdown file A2_<yourname>.Rmd and place it in the root directory of your repo.
This report should be well organized and easy to read. Make sure to include the following elements:
- Student name and title
- Data source used for the analysis
- Introduction to the data used. Include figures to describe the data. If using figures from the paper, make sure to cite them.
- Table of contents
- Headers
- References - reference all packages that you used!
All of your work should be written out as an R Notebook. Explain each step using rmarkdown in the notebook, interspersed with the code needed to perform the described task. Make sure all the questions listed in the interpretation section are addressed.
- Knit your R Notebook to HTML.
- Submit your HTML notebook to Quercus.
- Create a link to your HTML notebook in the README of your repository.
- Add the link to the student wiki.
Before handing in your assignment, test that it will compile with our Docker image by running the following command:
docker run --rm -it -v "$(pwd)":/home/rstudio/projects --user rstudio risserlin/bcb420-base-image /usr/local/bin/R -e "rmarkdown::render('/home/rstudio/projects/name_of_rmd.Rmd',output_file='/home/rstudio/projects/name_of_html.html')" > processing_output_filename
- Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6 [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.