---
title: "Home"
output:
  html_document:
    toc: false
---

`truncash` (Truncated ASH) is an exploratory project with Matthew, built on [`ashr`].

* [Matthew's initial observation on null, correlated data](voom_null.html)

Matthew did a quick investigation of the $p$-values and $z$-scores obtained for null data simulated from real RNA-seq data of [GTEx](http://www.gtexportal.org/home/), using only the voom transform with no further correction.  Here is what he found.

"I found something that I hadn’t realized, although it is obvious in hindsight: although you sometimes see inflation under null of $p$-values/$z$-scores, the most extreme values are not inflated compared with expectations (and tend to be deflated). That is, the histograms of $p$-values that show inflation near $0$ (and deflation near $1$) actually hide something different going on in the very left hand side near $0$.  The qq-plots are clearer… showing most extreme values are deflated, or not inflated.  This is expected under positive correlation I think.  For example, if all $z$-scores were the same (complete correlation), then the most extreme of $n$ would just be $N(0,1)$. But if independent the most extreme of $n$ would have longer tails..."
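The two limiting cases in the quote can be made concrete. Assuming $n$ standard-normal null $z$-scores, under complete correlation (all $z_i$ equal) $P(\max_i |z_i| > t) = P(|Z| > t)$, while under independence it is $1 - (1 - P(|Z| > t))^n$. The sketch below computes both and also simulates $\max_i |z_i|$; the function names are illustrative only and are not part of `truncash` or `ashr`.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_max_abs_exceeds(t, n, correlated):
    """P(max_i |z_i| > t) for n N(0,1) z-scores that are either
    independent or perfectly correlated (all equal)."""
    p_single = 2.0 * (1.0 - normal_cdf(t))  # P(|Z| > t) for one z-score
    if correlated:
        return p_single  # all z_i are equal, so the max is a single draw
    return 1.0 - (1.0 - p_single) ** n


def simulate_max_abs(n, correlated, n_rep, seed=1):
    """Monte Carlo draws of max_i |z_i| under the two extreme cases."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_rep):
        if correlated:
            z = [rng.gauss(0.0, 1.0)] * n  # one shared value, repeated n times
        else:
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(abs(v) for v in z))
    return maxima
```

For $n = 1000$ and $t = 1.96$, the exact probability is about $0.05$ in the perfectly correlated case and essentially $1$ in the independent case, which is the "longer tails" of the most extreme value under independence.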

Matthew's initial observation inspired this project.  If, under positive correlation, the most extreme observations tend not to be inflated, maybe we can use them to control false discoveries.  Meanwhile, if the moderate observations are more prone to inflation due to correlation, it may be better to make only partial use of their information.

* [Occurrence of extreme observations](ExtremeOccurrence.html)

As [Prof. Michael Stein](https://galton.uchicago.edu/~stein/) pointed out during a conversation with [Matthew](http://stephenslab.uchicago.edu/), if the marginal distribution is correct, then the expected number of observations exceeding any threshold should also be correct.  So if the tail is "usually" deflated, then with some small probability there must be many large $z$-scores (even in the tail).  In other words, if "on average" we have the right number of large $z$-scores/small $p$-values, and "usually" we have too few, then "rarely" we should have too many.  A simulation is run to check this intuition.
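Stein's argument rests on linearity of expectation: $E[\#\{i : |z_i| > t\}] = \sum_i P(|z_i| > t)$, whatever the correlation. A minimal sketch, assuming a toy equicorrelated model $z_i = \sqrt{\rho}\,W + \sqrt{1-\rho}\,\varepsilon_i$ with a shared factor $W$ (a stand-in for, not a model of, GTEx gene-gene correlation), counts exceedances across replicates. The helpers `expected_exceedances` and `exceedance_counts` are hypothetical names for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_exceedances(n, t):
    """E[#{i : |z_i| > t}] = n * P(|Z| > t); depends only on the marginals."""
    return n * 2.0 * (1.0 - normal_cdf(t))


def exceedance_counts(n, rho, t, n_rep, seed=1):
    """Number of |z_i| > t in each of n_rep replicates of n equicorrelated
    N(0,1) z-scores, z_i = sqrt(rho) * W + sqrt(1 - rho) * e_i (0 <= rho < 1)."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    counts = []
    for _ in range(n_rep):
        w = rng.gauss(0.0, 1.0)  # shared factor that induces the correlation
        counts.append(sum(abs(a * w + b * rng.gauss(0.0, 1.0)) > t for _ in range(n)))
    return counts
```

With, say, `n = 500`, `rho = 0.5` and `t = 1.96`, the mean count should sit near `expected_exceedances(500, 1.96)` (about 25), while the median is well below it: the average is propped up by rare replicates, those with a large shared factor $|W|$, that have very many exceedances.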

* [Two FWER-controlling procedures on correlated null](StepDown.html)

To understand how the $p$-values of top-expressed, correlated genes behave under a global null simulated from GTEx data, we apply two FWER-controlling multiple comparison procedures: Holm's "step-down" procedure (Holm 1979) and Hochberg's "step-up" procedure (Hochberg 1988).
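A minimal sketch of the two procedures as they are usually stated, for sorted $p$-values $p_{(1)} \le \dots \le p_{(m)}$ at level $\alpha$. Holm rejects $H_{(1)}, \dots, H_{(k-1)}$, where $k$ is the first index with $p_{(k)} > \alpha/(m-k+1)$; Hochberg rejects $H_{(1)}, \dots, H_{(k)}$, where $k$ is the largest index with $p_{(k)} \le \alpha/(m-k+1)$. The function names are illustrative and are not taken from the analysis code.

```python
def holm(pvals, alpha=0.05):
    """Holm (1979) step-down: walk up from the smallest p-value and stop at
    the first p_(k) > alpha / (m - k + 1). Returns indices of rejections."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for k, i in enumerate(order, start=1):
        if pvals[i] > alpha / (m - k + 1):
            break
        rejected.add(i)
    return rejected


def hochberg(pvals, alpha=0.05):
    """Hochberg (1988) step-up: walk down from the largest p-value and, at the
    first p_(k) <= alpha / (m - k + 1), reject H_(1), ..., H_(k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    for k in range(m, 0, -1):
        if pvals[order[k - 1]] <= alpha / (m - k + 1):
            return set(order[:k])
    return set()
```

On `p = [0.01, 0.04, 0.03, 0.005]` with `alpha = 0.05`, Holm stops at the third-smallest $p$-value ($0.03 > 0.05/2$) and rejects two hypotheses, while Hochberg starts from the largest ($0.04 \le 0.05/1$) and rejects all four. Hochberg never rejects fewer hypotheses than Holm, but, unlike Holm's, its FWER guarantee depends on the dependence among the $p$-values (it holds under independence and certain forms of positive dependence).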

* [`truncash` Model and first simulations](truncash.html)

* [Pipeline for simulating null data](nullpipeline.html)

A toy model is used to examine and document each step of the pipeline that simulates null summary statistics, including `edgeR::calcNormFactors`, `limma::voom`, `limma::lmFit`, and `limma::eBayes`.
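For readers less familiar with `limma::voom`, the sketch below mirrors only its first step, the log2 counts-per-million transform, as we understand it from the limma source: an offset of $0.5$ on each count and $1$ on each library size, with library sizes optionally scaled by normalization factors such as those returned by `edgeR::calcNormFactors`. It omits voom's mean-variance trend and precision weights, and `log_cpm` is a hypothetical helper, not a limma function.

```python
import math


def log_cpm(counts, norm_factors=None):
    """log2 counts-per-million for a genes x samples count matrix.

    counts[g][j] is the count of gene g in sample j. Library sizes are the
    column totals, optionally multiplied by per-sample normalization factors.
    Offsets of 0.5 (counts) and 1 (library size) keep zero counts finite.
    """
    n_samples = len(counts[0])
    lib_size = [sum(row[j] for row in counts) for j in range(n_samples)]
    if norm_factors is not None:
        lib_size = [s * f for s, f in zip(lib_size, norm_factors)]
    return [
        [math.log2((row[j] + 0.5) / (lib_size[j] + 1.0) * 1e6) for j in range(n_samples)]
        for row in counts
    ]
```

The $0.5$ and $1$ offsets are what make a gene with zero counts produce a finite (very negative) log-CPM rather than $-\infty$.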

* [FDR on Null, Part 1](FDR_Null.html)
* [FDR on Null, Part 2](FDR_null_betahat.html)

We apply two FDR-controlling procedures, BH and BY, as well as two $s$-value models, `ash` and `truncash`, to the simulated, correlated null data, and compare the numbers of false discoveries they produce (since the data are null, every discovery is false by definition).  Part 1 uses $z$-scores only; Part 2 uses $\hat\beta$ and the moderated $\hat s$.
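A minimal sketch of the two FDR-controlling procedures as usually stated. With sorted $p$-values $p_{(1)} \le \dots \le p_{(m)}$, BH rejects $H_{(1)}, \dots, H_{(k)}$ for the largest $k$ with $p_{(k)} \le k\alpha/m$; BY uses the same rule with $\alpha$ replaced by $\alpha / \sum_{i=1}^{m} 1/i$, which guarantees FDR control under arbitrary dependence. The function `fdr_rejections` is a hypothetical name, not code from these analyses.

```python
def fdr_rejections(pvals, alpha=0.05, by=False):
    """Benjamini-Hochberg (by=False) or Benjamini-Yekutieli (by=True).
    Returns the set of indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # BY shrinks the level by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / i for i in range(1, m + 1)) if by else 1.0
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / (m * c_m):
            k_max = k  # step-up: keep the largest k that passes
    return set(order[:k_max])
```

On `p = [0.01, 0.04, 0.03, 0.005]` with `alpha = 0.05`, BH rejects all four hypotheses and BY rejects two. On global null data such as the simulated GTEx sets, `len(fdr_rejections(p))` is directly the number of false discoveries.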

* [$\hat\pi_0$ estimated in correlated global null](pihat0_null.html)

We compare $\hat\pi_0$ as estimated by `ash` and by `truncash` with $T = 1.96$ on correlated global null data simulated from GTEx/Liver.  Since every gene is null, both estimates should ideally be close to $1$.

[`ashr`]: https://github.com/stephens999/ashr
