
EXpression PREdiction with Summary Statistics Only


LidaWangPSU/EXPRESSO


EXPRESSO

EXpression PREdiction with Summary Statistics Only paper link


Introduction

EXPRESSO (EXpression PREdiction with Summary Statistics Only) builds gene expression prediction models using only eQTL summary statistics and a reference panel. It also integrates 3D genomic data to define cis-regulatory regions and uses epigenetic annotations to prioritize causal variants. It is developed and maintained by Lida Wang in Dajiang Liu's group.

Installation

The package is hosted on GitHub, which makes installation and updates easy. First, make sure you have the MASS, data.table, BEDMatrix and caret packages installed, along with devtools:

install.packages(c("MASS", "data.table", "BEDMatrix", "caret", "devtools"))
library(devtools)

You also need the latest versions of fast.lasso and rareGWAMA installed:

devtools::install_github("zhanxw/fast.lasso")
devtools::install_github("dajiangliu/rareGWAMA")

Then install EXPRESSO from this repository:

devtools::install_github("LidaWangPSU/EXPRESSO/EXPRESSO")
library(EXPRESSO)

Here we go.

Quick tutorial

Build a gene expression prediction model

res.tmp <- EXPRESSO(sumstatFile, annoFile, windowFile, refFile, out_path, minMaf, maxIter, gene.vec, append = FALSE)

The inputs are:

  • sumstatFile: summary statistics file
  • annoFile: epigenetic annotation file
  • windowFile: 3D genomic windows file
  • refFile: reference panel file
  • out_path: pre-specified output path
  • minMaf: minimum minor allele frequency (MAF) used to filter variants
  • maxIter: maximum number of iterations of the gradient descent algorithm
  • gene.vec: a vector of input gene IDs
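The minMaf argument filters variants by minor allele frequency, i.e. the frequency of the less common allele, min(p, 1 - p). The sketch below only illustrates that definition; the helper maf_filter, the toy frequencies and the choice of keeping variants with MAF >= minMaf are assumptions, not EXPRESSO's internal code.

```r
# Hypothetical illustration of a minor allele frequency (MAF) filter.
# `freq` holds allele frequencies in the reference panel; the MAF is the
# frequency of the less common allele, so it never exceeds 0.5.
maf_filter <- function(freq, min_maf) {
  maf <- pmin(freq, 1 - freq)
  # Assumption: variants with MAF >= min_maf are kept (the exact
  # inequality EXPRESSO applies is not documented in this README).
  maf >= min_maf
}

freq <- c(rs1 = 0.50, rs2 = 0.98, rs3 = 0.03, rs4 = 0.20)
keep <- maf_filter(freq, min_maf = 0.05)
names(freq)[keep]  # "rs1" "rs4"
```

Note that rs2 is dropped even though its allele frequency is 0.98: its minor allele frequency is only 0.02.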

Output results

We run EXPRESSO with three different tuning-parameter selection methods, including pseudo variable selection and MSE.

The weight output includes:

  • gene: gene id
  • snp: snp id
  • weight: corresponding non-zero weight
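Each gene's predicted expression is the weighted sum of its SNP genotypes, as in other weight-based expression prediction models. The helper below is a minimal, hypothetical sketch of applying a weight table in this gene/snp/weight layout; it assumes a samples x SNPs dosage matrix (0/1/2) whose column names match the snp IDs and whose dosages count the same effect allele the weights refer to. The toy gene and SNP names are made up.

```r
# Hypothetical helper (not part of the EXPRESSO package): turn a weight table
# with columns gene, snp and weight into predicted expression.
# Assumes `geno` is a samples x SNPs dosage matrix whose column names are
# SNP IDs matching the `snp` column, coded on the same effect allele.
predict_expression <- function(weights, geno) {
  genes <- unique(as.character(weights$gene))
  pred <- matrix(NA_real_, nrow = nrow(geno), ncol = length(genes),
                 dimnames = list(rownames(geno), genes))
  for (g in genes) {
    # Keep this gene's SNPs that are present in the genotype matrix
    w <- weights[weights$gene == g & weights$snp %in% colnames(geno), ]
    # Genes with no available SNPs stay NA
    if (nrow(w) > 0) {
      pred[, g] <- geno[, as.character(w$snp), drop = FALSE] %*% w$weight
    }
  }
  pred
}

# Toy example with made-up genes, SNPs and dosages
weights <- data.frame(
  gene   = c("GENE_A", "GENE_A", "GENE_B"),
  snp    = c("rs1", "rs2", "rs3"),
  weight = c(0.5, -0.2, 1.0)
)
geno <- matrix(c(0, 1, 2,
                 1, 1, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("s1", "s2"), c("rs1", "rs2", "rs3")))
pred <- predict_expression(weights, geno)
```

Here sample s1 gets 0 * 0.5 + 1 * (-0.2) = -0.2 for GENE_A and 2 * 1.0 = 2 for GENE_B.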

The cv output includes:

  • gene: gene id
  • window: the chosen 3D window
  • phi: the chosen penalty factor for epigenetic annotated variants
  • r2: prediction r2 estimated by summary-statistics-based cross-validation
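Because the cv output reports a cross-validated r2 per gene, one can keep only the models with reasonable predictive accuracy before using their weights. This is a minimal sketch of such a step; the function select_models and the 0.01 cutoff are illustrative assumptions, not part of EXPRESSO, and the toy tables include only the columns the function uses.

```r
# Hypothetical post-processing step: keep only genes whose
# cross-validated r2 exceeds a user-chosen cutoff. The 0.01 default is an
# illustrative assumption, not a recommendation from the EXPRESSO authors.
select_models <- function(cv, weights, min_r2 = 0.01) {
  keep <- cv$gene[!is.na(cv$r2) & cv$r2 > min_r2]
  weights[weights$gene %in% keep, , drop = FALSE]
}

# Toy tables; only the columns used here are included.
cv <- data.frame(gene = c("GENE_A", "GENE_B"), r2 = c(0.12, 0.004))
weights <- data.frame(
  gene   = c("GENE_A", "GENE_A", "GENE_B"),
  snp    = c("rs1", "rs2", "rs3"),
  weight = c(0.5, -0.2, 1.0)
)
kept <- select_models(cv, weights)
```

With these toy values only GENE_A passes the cutoff, so its two weight rows are kept.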

Usage

We provide example input data here.

The example data were subset from GTEx whole blood tissue for running the script.

An example R script for running EXPRESSO can be found here.

Example EXPRESSO output can be found here.

Contact

Lida Wang lida.wang.96@gmail.com
