SWAM - Smartly weighted averaging across multiple tissues

Overview

SWAM is an gene expression imputation method which combines information from multiple sources to boost accuracy of imputed gene expression levels

Getting Started

Prerequisites

To use SWAM, you need to have the following tools installed in your system.

perl (version 5 or later recommended)
python (version 2.7 is recommended. numpy package is also required)
R (version 3.4 or later recommended)
htslib : The binary tabix should be include in your $PATH. Type tabix in your command line to check.
sqlite3 : (3.22 or later recommended)

(The software prediXcan, which is currently deprecated, was originally downloaded from https://github.com/hakyimlab/PrediXcan and copied in this repository)

Cloning the repository

Next, clone the repository using the following command:

git clone https://github.com/aeyliu/SWAM.git

Preparing Input Files

genotypes : the genotype file should be dosages with the following columns - chromosome, RSID, position, reference allele, alternate allele, minor allele frequency, dosages

#example of genotypes file (tab separated)
#chromosome   rsid          position  ref alt   maf   dosage_individual1  dosage_individual2    ...
chr22         rs141944226   16286442  C    G    0.08       0.5                1  
chr22         rs150703810   16286465  G    C    0.07       0.5                2
...

samples : a text file with sample IDs should be included corresponding to the columns of the genotype file

#example of samples file (tab separated)
ID1     ID1
ID2     ID2
...

single-tissue database files : these .db files are compatible with the format generated by prediXcan (for example, GTEx derived tissues can be obtained from http://predictdb.org/). If using your own expression/genotype data, refer to prediXcan pipeline on how to generate predictDB-style models (https://github.com/hakyimlab/PrediXcan)
expression file for target tissue : measured expression file with following columns - gene ensemble ID, expression values (for each sample)

#example of expression file
Id                    ID1                    ID2                    ID3                 ...
ENSG00000177663.9     -0.120170828099355     -0.905468215149992     1.62413112667095 
ENSG00000069998.8     -1.35973738393861      0.640666889919105      -1.00885646145564
...

Running SWAM (example)

An example to run SWAM is included in the /SWAM/examples folder
Example of input files can be examined in the /SWAM/examples/sample folder
To run the example, simply replace {} with the directory where you cloned this repository
Also remember to specify the correct prediXcan installation path

## Modify these environment variables to conform your settings
export SWAMDIR=/path/to/SWAM
## Run this command to run example code
${SWAMDIR}/scripts/swam \
--directory ${SWAMDIR}/examples/sample/GTEx-V6p-1KG-2016-11-16 \
--name TW_Cells_EBV-transformed_lymphocytes_0.5_1KG \
--expr ${SWAMDIR}/examples/sample/Cells_EBV-transformed_lymphocytes_Analysis.chr22.expr.txt \
--geno ${SWAMDIR}/examples/sample/genotypes \
--PrediXcan-path ${SWAMDIR}/scripts/PrediXcan.py \
--num-cpu 4 \
--out ${SWAMDIR}/examples/lcl

To run SWAM, either --directory or --index file must be specified (either are fine)
The file format for the index file can be examined in the output {}/SWAM/examples/lcl/index.txt, and contains two columns: first column is the name of each tissue, and second column is the file path of its corresponding prediction model

Commands

Current list of commands for SWAM:

--index

Index file containing the list of tissue-specific training models. Each line should have two columns: 1) Tissue name and 2) file path for prediction model
--directory

Directory containing all prediction models to be used by SWAM. If index file is not specified, will be generated automatically
--name

Name of the target tissue. Must be included in the index file
--expr

Measured expression data for the target tissue, see input files for further details
--geno

Genotype files in gzipped dosage format, see input files for further details
--out

Prefix of output files

Additional options:

--num-cpu

Assign number of CPUs for parallelization (this will be helpful when calculating covariance file)
--PrediXcan-path

Path to PrediXcan software tool
--sqlite3-path

Path to sqlite3 software tool
--Rscript-path

Path to Rscript software tool
--tabix-path

Path to tabix software tool
--keep-files

Option to keep intermediate files
--keep-files

Calculate covariate matrix, which is needed for metaXcan
--cal-cov

Use cross-validation to determine tuning parameter
--cv

Command-line documentation of SWAM

A brief usage on how to use SWAM can be obtained by running it without any arguments.

${SWAMDIR}/scripts/swam

ERROR: Missing required arguments. Please see the usage below
Usage:
    /path/to/SWAM/scripts/swam [options]

     General Options:
      -help             Print out brief help message [OFF]
      -man              Print the full documentation in man page style [OFF]

     Required Options:
      -index STR        Index file containing the list of tissue-specific training models. Each line should have [TISSUE_NAME] [Path to PredictDB-formatted file []
      -directory STR    Directory containing all prediction models to be used  by SWAM. If index file is not specified, will be generated from directory []
      -name STR         Name of the target tissue. Must be included in the index file []
      -expr STR         Measured expression data for the target tissue (in PrediXcan format). First line has sample IDs, and from the second line [GENE_NAME] [EXPR_FOR_SAMPLE_1] [EXPR_FOR_SAMPLE_2] ... []
      -geno STR         Genotype files in gzipped dosage format in PrediXcan format []
      -out STR          Prefix of output files []

     Additional Options:
      -num-cpu STR      Assign number of CPUs for parallelization [1]
      -PrediXcan-path STRPath to PrediXcan software tool [PrediXcan.py]
      -Rscript-path STR Path to Rscript tool [Rscript]
      -sqlite3-path STR Path to sqlite3 tool [sqlite3]
      -tabix-path STR   Path to tabix tool [tabix]
      -keep-files       Option to keep intermediate files [OFF]
      -cal-cov          Calculate covariate matrix [OFF]
      -cv               Use cross-validation to determine tuning parameter [OFF]

The full command line documentation of SWAM can be obtained using --help option as follows

${SWAMDIR}/scripts/swam --help

Usage:
    /path/to/SWAM/scripts/swam [options]

     General Options:
      -help             Print out brief help message [ON]
      -man              Print the full documentation in man page style [OFF]

     Required Options:
      -index STR        Index file containing the list of tissue-specific training models. Each line should have [TISSUE_NAME] [Path to PredictDB-formatted file []
      -directory STR    Directory containing all prediction models to be used  by SWAM. If index file is not specified, will be generated from directory []
      -name STR         Name of the target tissue. Must be included in the index file []
      -expr STR         Measured expression data for the target tissue (in PrediXcan format). First line has sample IDs, and from the second line [GENE_NAME] [EXPR_FOR_SAMPLE_1] [EXPR_FOR_SAMPLE_2] ... []
      -geno STR         Genotype files in gzipped dosage format in PrediXcan format []
      -out STR          Prefix of output files []

     Additional Options:
      -num-cpu STR      Assign number of CPUs for parallelization [1]
      -PrediXcan-path STRPath to PrediXcan software tool [PrediXcan.py]
      -Rscript-path STR Path to Rscript tool [Rscript]
      -sqlite3-path STR Path to sqlite3 tool [sqlite3]
      -tabix-path STR   Path to tabix tool [tabix]
      -keep-files       Option to keep intermediate files [OFF]
      -cal-cov          Calculate covariate matrix [OFF]
      -cv               Use cross-validation to determine tuning parameter [OFF]

Options:
    -help   Print a brief help message and exits

    -man    Prints the manual page and exits

    --help [ON]
            Print a help message and exits

    --man [OFF]
            Prints a manual page and exits upon typing 'q'

    --index STR []
            Index file containing the list of tissue-specific training
            models. Each line should have [TISSUE_NAME] [Path to
            PredictDB-formatted file

    --directory STR []
            Directory containing all prediction models to be used by SWAM.
            If index file is not specified, will be generated from directory

    --name STR []
            Name of the target tissue. Must be included in the index file

    --expr STR []
            Measured expression data for the target tissue (in PrediXcan
            format). First line has sample IDs, and from the second line
            [GENE_NAME] [EXPR_FOR_SAMPLE_1] [EXPR_FOR_SAMPLE_2] ...

    --geno STR []
            Genotype files in gzipped dosage format in PrediXcan format

    --out STR []
            Prefix of output files

    --num-cpu STR [1]
            Assign number of CPUs for parallelization

    --PrediXcan-path STR [PrediXcan.py]
            Path to PrediXcan software tool

    --Rscript-path STR [Rscript]
            Path to Rscript tool

    --sqlite3-path STR [sqlite3]
            Path to sqlite3 tool

    --tabix-path STR [tabix]
            Path to tabix tool

    --keep-files [OFF]
            Option to keep intermediate files

    --cal-cov [OFF]
            Calculate covariate matrix
     
    --cv [OFF]
             Use cross-validation to determine tuning parameter

Download SWAM model trained from GTEx data

You can download SWAM models trained from GTEx v6 and v8 data at https://doi.org/10.5281/zenodo.5866500

Citing SWAM

Our SWAM paper is published at PLoS Genetics. Please use the following citation:

Liu AE, Kang HM. Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data. PLoS Genetics. 2022 Jan 31;18(1):e1009571. https://doi.org/10.1371/journal.pgen.1009571

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
examples		examples
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SWAM - Smartly weighted averaging across multiple tissues

Overview

Getting Started

Prerequisites

Cloning the repository

Preparing Input Files

Running SWAM (example)

Commands

Command-line documentation of SWAM

Download SWAM model trained from GTEx data

Citing SWAM

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SWAM - Smartly weighted averaging across multiple tissues

Overview

Getting Started

Prerequisites

Cloning the repository

Preparing Input Files

Running SWAM (example)

Commands

Command-line documentation of SWAM

Download SWAM model trained from GTEx data

Citing SWAM

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages