ekatsevi/Focused-BH

Focused BH

A novel FDR-controlling method for structured hypotheses that takes any pre-specified filter as input and outputs a rejection set that is non-redundant with respect to that filter.

Overview

Focused BH is a multiple testing procedure designed for a broad range of applications in which structured hypotheses arise. It is motivated mainly by problems with hierarchical structure, such as (1) phenome-wide association studies, where diseases are tree-structured according to the International Classification of Diseases and the outer nodes filter is used, and (2) Gene Ontology enrichment analysis, where biological processes form a directed acyclic graph defined by the Gene Ontology and the REVIGO filter is used. Focused BH also extends beyond hierarchical structure to spatially structured applications such as genome-wide association studies and neuroimaging analysis.

This repository provides software implementing Focused BH, as well as code to reproduce all numerical simulations and data analysis in the paper.
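To make the idea of "filter in, non-redundant rejection set out" concrete, here is a minimal sketch in Python. It is not the R implementation in src/. It assumes the procedure picks the largest threshold t among 0 and the observed p-values such that m * t divided by the number of rejections surviving the filter is at most q; that rule, the function names `focused_bh` and `make_outer_nodes_filter`, and the toy tree are all illustrative assumptions. The outer nodes filter is assumed here to keep only rejected nodes that have no rejected descendants.

```python
from typing import Callable, Dict, Sequence, Set

# A filter maps (rejected index set, p-values) to a subset of the rejected set.
Filter = Callable[[Set[int], Sequence[float]], Set[int]]


def focused_bh(pvals: Sequence[float], filter_fn: Filter, q: float) -> Set[int]:
    """Sketch of Focused BH under the assumed thresholding rule (see lead-in)."""
    m = len(pvals)
    # Candidate thresholds: 0 and every observed p-value, largest first.
    for t in sorted(set(pvals) | {0.0}, reverse=True):
        rejected = {j for j, p in enumerate(pvals) if p <= t}
        kept = filter_fn(rejected, pvals)
        # Estimated FDP is m * t / |kept|; written without division so 0/0 counts as 0.
        if m * t <= q * len(kept):
            return kept
    return set()  # not reached in practice: t = 0 always passes the check


def trivial_filter(rejected: Set[int], pvals: Sequence[float]) -> Set[int]:
    """Keeps everything; with this filter the sketch reduces to ordinary BH."""
    return set(rejected)


def make_outer_nodes_filter(parent: Dict[int, int]) -> Filter:
    """Builds a filter for a tree given as a child -> parent map (hypothetical encoding)."""

    def outer_nodes(rejected: Set[int], pvals: Sequence[float]) -> Set[int]:
        has_rejected_descendant: Set[int] = set()
        for node in rejected:
            ancestor = parent.get(node)
            while ancestor is not None:  # walk up to the root
                has_rejected_descendant.add(ancestor)
                ancestor = parent.get(ancestor)
        return rejected - has_rejected_descendant

    return outer_nodes


# Toy tree: node 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
toy_parent = {1: 0, 2: 0, 3: 1, 4: 1}
toy_pvals = [0.001, 0.002, 0.8, 0.004, 0.5]

print(sorted(focused_bh(toy_pvals, trivial_filter, q=0.1)))  # [0, 1, 3]
print(sorted(focused_bh(toy_pvals, make_outer_nodes_filter(toy_parent), q=0.1)))  # [3]
```

In the toy example, plain BH reports nodes 0, 1, and 3, which all lie on one root-to-leaf path. With the outer nodes filter only the most specific node, 3, is reported, and the error estimate is computed against that filtered count rather than the raw number of rejections.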

Repository structure

  • data/: contains raw and processed data from the PheWAS analysis of the UK Biobank data [1] and the GO enrichment analysis of the breast cancer outcome data.
  • precomp/: contains precomputation results for our numerical simulations.
  • results/: contains final results for our numerical simulations.
  • src/: contains all source code for methodology, numerical simulations, data analysis, and plotting.

[1] Note that our analysis of the UK Biobank data is based on data fields 41202 (Diagnoses - main ICD10) and 22182 (HLA imputation values), which are not publicly accessible. If you do not have access to these fields, we have provided summary statistics in data/processed/biobank so that the multiple testing portion of our analysis can be reproduced. If you do have access to these fields, place them into a comma-separated file called "ukb25261.csv" in the directory data/raw/biobank to reproduce our full data analysis.

Dependencies

The recommended operating system is Linux; the code has not been tested on other operating systems.

The following R (version 3.6.2) packages are required:

  • readxl 1.3.1
  • ontologyIndex 2.5
  • qvalue 2.18.0
  • cherry 0.6.13
  • structSSI 1.1.1
  • igraph 1.2.4.1
  • ape 5.3
  • R.utils 2.9.0
  • reshape2 1.4.3
  • Matrix 1.2.18
  • multtest 2.42.0
  • ggraph 2.0.0
  • kableExtra 1.1.0
  • janitor 1.2.0
  • VennDiagram 1.6.20
  • gridExtra 2.3
  • tidyverse 1.2.1

The code was tested using the R and package versions above, though later versions should be compatible as well.
