immuneMR

Contact

4 December 2020
Christiane Gasperi
Chris Cotsapas

Front matter

This README file documents the results of our study Shared associations identify causal relationships between gene expression and immune cell phenotypes(submitted). Here we provide an overview of the code provided in this distribution, the software dependencies, and the sources of genotype, gene expression and immune cell phenotype data required.

Disclaimers

Throughout, we use the SLURM job scheduling system to distribute some of the more compute-intensive analyses on a cluster. We have tagged all instances with #SLURM in MasterScript.sh for ease of reference.
The code here is provided as-is. We have validated that this runs without errors on our own systems, but do not guarantee that this will be the case in different computing environments.

Data sets

We used gene expression data and genotype data from the BLUEPRINT consortium and immune cell phenotype and genotype data from the Milieu Intérieur project.

BLUEPRINT data set

Gene expression data: We downloaded FASTQ files for the RNAseq data for naive CD4+ T cells (169 individuals), CD14+ monocytes (193 individuals) and CD16+ neutrophils (196 individuals) from the BLUEPRINT consortium [Chen et al., Cell, 2016] (https://pubmed.ncbi.nlm.nih.gov/27863251). We used the GTEx pipeline for RNA-seq alignment, quantification and quality control (https://www.gtexportal.org/, Analysis Methods for V8). Scripts/GenExp_phenodata.sh gives an overview on how we applied the GTEx pipeline and can be used for orientation, but is not directly applicable to other data sets or computing environment needs user input and adaptations at various stages.
Genotype data: We obtained genotype data for all individuals for 7,008,524 variants acquired by whole genome sequencing (file name: EGAZ00001235598_release_vcf_06092016_All_chr.BPWP10_13_12_15.vcf).

Milieu Intérieur project data set

Phenotype data: We obtained flow cytometry measurements for 166 immune phenotypes of 816 individuals (file name: LabExMI_rawfacs.txt) and additional information including age, smoking status and CMV infection (file name: LabExMI_covariates.txt).
Genotype data: We obtained quality controlled and imputed genotype data from 816 individuals (file name: LabExMI_imputation_816x5699237.bim/.bed/.fam).

Running our analysis pipeline

Here, we provide the code required to re-analyze the data and replicate our published results. For the seq alignment, quantification and quality control of the BLUEPRINT gene expression data please refer to the GTex pipeline (https://www.gtexportal.org/, Analysis Methods for V8).

Pipeline overview

Our pipeline is split into fourteen distinct steps. These are:

Preparation of the Milieu Intérieur project phenotype data (quality control (QC) and normalization).
QC of the Milieu Intérieur project genotype data
Preparation of the covariates for the Milieu Intérieur project data set
Alignment, quantification and quality control of the BLUEPRINT gene expression data
QC of the BLUEPRINT genotype data
Preparation of the covariates for the BLUEPRIN data set
GWAS analyses on the Milieu Intérieur project immune cell phenotypes
Selection of genetic loci association with immune cell phenotypes
Perform conditional association analyses on the Milieu Intérieur project immune cell phenotypes
Perform unconditional and conditional association analyses on the BLUEPRINT gene expression data (eQTL analysis)
JLIM analysis to identify shared association between gene expression and immune cell phenotypes
Polygenic score (PGS) analysis to identify overall genetic correlation between gene expression and immune cell phenotypes
Transcriptome-wide summary statistics-based Mendelian Randomization (TWMR) analysis (see Porcu et al., Nature Communications, 2019, https://www.nature.com/articles/s41467-019-10936-0)
Two sample MR (TSMR) analysis

Software dependencies

PLINK v1.9
Pandoc 2.0.4
tabix 0.2.6
VCFtools 0.1.14
PRSice-2 v2.2.10 (PRSice.R and PRSice_linux need to be put into Scripts/PRS)
JLIM 2.0 (https://github.com/cotsapaslab/jlim, the jlim-master directory needs to be put into Scripts/JLIM)
R v3.4.1 or later (including packages ggplot2 v3.3.3, plyr v1.8.6, tidyr v1.0.2, gridExtra v.3, knitr v1.28, ggsci v2.9, gCMAP v1.22.1, readxl v1.3.1, png v0.1-7, biomaRt v2.34.2)
R v3.5.0 (used with the package TwosampleMR v0.4.25)
For software dependencies needed for the Gtex pipeline please see https://www.gtexportal.org/, Analysis Methods for V8.

Running the pipeline

Except for the alignment, quantification and quality control of the BLUEPRINT gene expression data (for this see Scripts/GenExp_phenodata.sh and the Gtex pipeline) the entire pipeline can be run from MasterScript.sh, which calls the scripts included in this distribution to execute all fourteen steps of our pipeline. The script is not intended as a stand-alone, unsupervised piece of code - rather, it steps through the various procedures. User input is required at various stages.

Description of files and directories in this distribution

File/directory	Description
Scripts/	Contains all scripts required in by the pipeline.
MasterScript.sh	Main script with fourteen QC and analysis steps of our pipeline.
Scripts/GenExp_phenodata.sh	Overview on how we applied the Gtex pipeline on the BLUEPRINT gene expression data.
README.md	This file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scripts

Scripts

MasterScript.sh

MasterScript.sh

README.md

README.md

Repository files navigation

immuneMR

Contact

Front matter

Disclaimers

Data sets

BLUEPRINT data set

Milieu Intérieur project data set

Running our analysis pipeline

Pipeline overview

Software dependencies

Running the pipeline

Description of files and directories in this distribution

About

Releases 1

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
Scripts		Scripts
MasterScript.sh		MasterScript.sh
README.md		README.md

cotsapaslab/immuneMR

Folders and files

Latest commit

History

Repository files navigation

immuneMR

Contact

Front matter

Disclaimers

Data sets

BLUEPRINT data set

Milieu Intérieur project data set

Running our analysis pipeline

Pipeline overview

Software dependencies

Running the pipeline

Description of files and directories in this distribution

About

Resources

Stars

Watchers

Forks

Languages