
Journal Entry Assignment #2: Differential Gene Expression and Preliminary ORA

itsSabbir edited this page Apr 14, 2022 · 11 revisions


Objective

The objective of this assignment is to take the normalized expression data created in Assignment #1, perform differential expression analysis on it, and rank genes by their differential expression. With that ranked list, we then perform thresholded over-representation analysis (ORA) to highlight the dominant genes and themes in the top set of genes for the dataset. This assignment should also be a standalone piece of work: information and ideas from Assignment 1 are referenced, but they are not the primary focus here.

Time est.: 5 h
Time used: 0.5 h
Date started: 2022/03/10
Date completed: 2022/04/13

Activities & Tasks

The activities and tasks for the assignment can be found here.

The answers to these questions and requirements can be found under the matching headings of the HTML document for this assignment. Below is a general overview and walkthrough of my thought process, in the order in which I worked through the assignment.

Progress & Notes

I decided not to include my Assignment 1 file in my Assignment 2 file through the use of "child" documents, because not everything I did or used in Assignment 1 was correct or to my liking. After learning more about data analysis and R, my perspective on some of those choices changed. Additionally, based on feedback from both Professor Isserlin and TA Tamar Av-Shalom, there were some things I needed to change and do differently. One example is the MDS plot, which was incorrect because it did not use the intended values. The corrected MDS plot in Figure 1 shows clear clustering, which suggests a strong relationship between the genes and the effects being investigated. This is the same conclusion reached in Assignment 1, even after the correction. As in the previous assignment, we used edgeR to normalize and clean the data, and we made some corrections and formatting changes so that the data table header was more presentable.
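For reference, the kind of library-size normalization and filtering that precedes the differential expression step can be sketched in Python. This is a simplified stand-in, not edgeR: it computes plain counts per million (CPM) and drops low-count genes, whereas edgeR's `calcNormFactors` additionally applies TMM normalization factors. The gene names, counts, and the `min_cpm`/`min_samples` cutoffs are hypothetical.

```python
"""Sketch: counts-per-million (CPM) scaling and low-count filtering.

Simplified stand-in for edgeR's normalization (no TMM factors).
"""


def cpm(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """Divide each count by its sample's library size and scale to one million."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts assigned to genes in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_low_counts(
    cpms: dict[str, list[float]], min_cpm: float = 1.0, min_samples: int = 2
) -> dict[str, list[float]]:
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return {
        gene: row
        for gene, row in cpms.items()
        if sum(value > min_cpm for value in row) >= min_samples
    }


# Hypothetical raw counts: 3 genes x 4 samples.
raw = {
    "GENE_A": [100, 200, 150, 300],
    "GENE_B": [0, 1, 0, 0],
    "GENE_C": [900, 800, 850, 700],
}
kept = filter_low_counts(cpm(raw))
print(sorted(kept))  # ['GENE_A', 'GENE_C']
```

GENE_B is dropped because it passes the CPM cutoff in only one sample; in a real dataset the cutoffs would be chosen relative to the library sizes and group sizes.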

After downloading, cleaning, and analyzing the RNA-seq data, we used the Benjamini-Hochberg (BH) method to correct the p-values for multiple hypothesis testing. We chose BH because it controls the false discovery rate and is less stringent than the Bonferroni correction, which controls the family-wise error rate and would therefore give us fewer hits. A less stringent correction is preferable here because we want a reasonable number of results for our downstream analysis and do not need to worry about the cost of testing each differentially expressed gene.
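To make the stringency difference concrete, here is a minimal Python sketch of both corrections applied to the same ten hypothetical p-values. The BH procedure below (scale each sorted p-value by m/rank, then take a running minimum from the largest down) should agree with R's `p.adjust(..., method = "BH")`, but the function names and input values are illustrative assumptions.

```python
"""Sketch: Benjamini-Hochberg (FDR) vs. Bonferroni (FWER) p-value adjustment."""


def bonferroni(pvals: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from ten tests.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05
bh_hits = sum(p < alpha for p in benjamini_hochberg(raw_p))
bonf_hits = sum(p < alpha for p in bonferroni(raw_p))
print(bh_hits, bonf_hits)  # 2 1
```

With the same cutoff, BH keeps two tests and Bonferroni keeps one, which is the "fewer hits" trade-off described above.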

With this in mind, I created a volcano plot and an MA plot, which have a very similar spread of data points. In both, we can see that the expressed genes form a tight cluster. I then ordered the top hits and displayed them as a heatmap. The top hits were the genes below a threshold of p < 0.05, since that is the conventional significance cutoff.
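Both plots are simple transforms of per-gene statistics: an MA plot shows the log fold change (M) against the average log expression (A), and a volcano plot shows the log fold change against -log10(p). The sketch below computes both coordinates for one hypothetical gene; the function names and values are illustrative assumptions.

```python
"""Sketch: per-gene coordinates for an MA plot and a volcano plot."""
from math import log2, log10


def ma_point(mean_group1: float, mean_group2: float) -> tuple[float, float]:
    """Return (A, M): A = average log2 expression, M = log2 fold change (group 2 vs group 1)."""
    m = log2(mean_group2) - log2(mean_group1)
    a = (log2(mean_group1) + log2(mean_group2)) / 2
    return a, m


def volcano_point(log2_fc: float, pvalue: float) -> tuple[float, float]:
    """Return (x, y) = (log2 fold change, -log10 p-value)."""
    return log2_fc, -log10(pvalue)


# Hypothetical gene: mean expression 100 in group 1 and 400 in group 2, p = 0.001.
a, m = ma_point(100.0, 400.0)
x, y = volcano_point(m, 0.001)
print(round(m, 3), round(y, 3))  # 2.0 3.0
```

On a volcano plot, genes passing p < 0.05 lie above the horizontal line y = -log10(0.05), roughly 1.3.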

After correction, I had a total of 608 significant genes: 280 upregulated and 328 downregulated.
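The up/down split behind these counts amounts to filtering on the (assumed already adjusted) p-value and then checking the sign of the log fold change. A minimal sketch on toy results; the gene names, fold changes, and p-values are hypothetical, not my actual data. Sorting by p-value also gives a natural order for the top hits in a heatmap.

```python
"""Sketch: split significant genes into up- and downregulated lists, ordered by p-value."""

Result = tuple[str, float, float]  # (gene, log2 fold change, adjusted p-value)


def split_by_direction(
    results: list[Result], alpha: float = 0.05
) -> tuple[list[str], list[str]]:
    """Keep genes with p < alpha, most significant first, split by fold-change sign."""
    significant = sorted((r for r in results if r[2] < alpha), key=lambda r: r[2])
    up = [gene for gene, fc, _ in significant if fc > 0]
    down = [gene for gene, fc, _ in significant if fc < 0]
    return up, down


# Hypothetical toy results.
toy = [
    ("G1", 1.5, 0.001),
    ("G2", -2.0, 0.01),
    ("G3", 0.3, 0.2),
    ("G4", 0.8, 0.03),
    ("G5", -0.5, 0.04),
]
up, down = split_by_direction(toy)
print(up, down)  # ['G1', 'G4'] ['G2', 'G5']
```

The two list lengths add up to the number of significant genes, just as 280 + 328 = 608 above.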

Gene enrichment analysis was performed with g:Profiler, specifically the gprofiler2 R package. The authors of the paper instead used false discovery rate (FDR) correction via the fdrtool package: differential expression significance was set at an FDR of 25%, and pathway enrichment significance was set at p < 0.05. To identify biological pathways affected by major depressive disorder (MDD), they performed Gene Set Enrichment Analysis (GSEA) on gene lists ranked by the Wald statistic, using the EnrichmentMap gene-set database with default parameters.
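The authors' GSEA approach works on the full ranked list rather than a thresholded one. Its core is a running-sum enrichment score: walking down the list ranked by, for example, the Wald statistic, the score rises at each gene in the set (weighted by the absolute statistic) and falls at each gene outside it, and the enrichment score is the maximum deviation from zero. The following is a minimal sketch with a hypothetical five-gene list; it omits the permutation-based significance testing and score normalization that the GSEA software performs, so it is illustrative only.

```python
"""Sketch: GSEA-style running-sum enrichment score (weighted, exponent 1 by default)."""


def enrichment_score(
    ranked_genes: list[str],
    stats: list[float],
    gene_set: set[str],
    weight: float = 1.0,
) -> float:
    """Return the signed maximum deviation of the running sum from zero.

    ranked_genes must be sorted by stats in descending order.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the list without covering all of it")
    # Hits step up in proportion to |stat|**weight; misses step down uniformly.
    hit_norm = sum(abs(s) ** weight for s, hit in zip(stats, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a statistic of zero")
    miss_step = 1.0 / (n_total - n_hits)
    running, best = 0.0, 0.0
    for s, hit in zip(stats, in_set):
        running += abs(s) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical list ranked by a Wald-like statistic (largest first).
genes = ["G1", "G2", "G3", "G4", "G5"]
wald = [4.0, 3.0, 1.0, -2.0, -5.0]
print(round(enrichment_score(genes, wald, {"G1", "G2"}), 3))  # 1.0
print(round(enrichment_score(genes, wald, {"G4", "G5"}), 3))  # -1.0
```

A set concentrated at the top of the ranking scores positive and one concentrated at the bottom scores negative; no cutoff on the list is needed, which is the key difference from ORA.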

So the authors' approach to determining significant genes and performing enrichment analysis differed from mine: they ran GSEA on a ranked list, whereas I used thresholded over-representation analysis (ORA).
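ORA asks whether a gene-set term contains more of the thresholded genes than chance would predict, which is commonly scored with a one-sided hypergeometric test. The sketch below computes that tail probability using only the standard library. The background size, term size, and overlap are hypothetical (only the 608-gene query size comes from my results), and it does not reproduce g:Profiler's own multiple-testing correction across terms.

```python
"""Sketch: one-sided hypergeometric p-value for over-representation analysis."""
from math import comb


def ora_pvalue(background: int, term_size: int, query_size: int, overlap: int) -> float:
    """P(X >= overlap) when query_size genes are drawn from the background.

    X ~ Hypergeometric(background, term_size, query_size).
    """
    total = comb(background, query_size)
    upper_tail = sum(
        comb(term_size, i) * comb(background - term_size, query_size - i)
        for i in range(overlap, min(term_size, query_size) + 1)
    )
    return upper_tail / total


# Hypothetical: 20,000 background genes, a 100-gene term, 608 query genes, 15 shared.
expected = 608 * 100 / 20000  # about 3 genes expected by chance
p = ora_pvalue(20000, 100, 608, 15)
print(expected, p < 0.05)  # 3.04 True
```

Observing 15 term genes when about 3 are expected gives a very small p-value, so this hypothetical term would be reported as over-represented before multiple-testing correction.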

Conclusion, Outlook, & Discussion

I did not need to reduce the threshold any further, as the number of genes I had was reasonable. Furthermore, the authors of the paper (Hyunjung et al., 2022) reported similar differentially expressed genes. They identified 307 genes showing significant group differences (168 upregulated, 139 downregulated), and 528 gene sets that were significantly altered in those with MDD (267 upregulated, 261 downregulated). These counts are of the same order of magnitude as mine. The large difference between the 838 genes being differentially regulated and my 608 is most likely due to the fact that

References

  1. Note that this is written in Markdown because it makes it easier to CV the template to Mediawiki edited pages.