# Illustration of xQTL protocol

This notebook illustrates the computational protocols available from this repository for the detection and analysis of molecular QTLs (xQTLs). A minimal toy dataset consisting of 49 de-identified samples is used for the analysis.

## Analysis

Please visit [the homepage of the protocol website](https://cumc.github.io/xqtl-pipeline/) for the general background on this resource, in particular the [Getting Started](https://cumc.github.io/xqtl-pipeline/README.html#getting-started) section. To perform a complete analysis from molecular phenotype quantification to xQTL discovery, please conduct your analysis in the order listed below. Each link contains a mini-protocol for a specific task. All commands documented in each mini-protocol should be executed in a command line environment.

### Bulk RNA-seq molecular phenotype quantification

1. [Reference data munging & QC](https://cumc.github.io/xqtl-pipeline/code/data_preprocessing/reference_data.html)
2. [Quantification of gene expression](https://cumc.github.io/xqtl-pipeline/code/molecular_phenotypes/bulk_expression.html)
3. [Quantification of alternative splicing events](https://cumc.github.io/xqtl-pipeline/code/molecular_phenotypes/splicing.html)
4. [Quantification of DNA methylation](https://cumc.github.io/xqtl-pipeline/code/molecular_phenotypes/calling/methylation_calling.html)

### xQTL association analysis

1. [Phenotype data munging & QC](https://cumc.github.io/xqtl-pipeline/code/data_preprocessing/phenotype_preprocessing.html)
2. [Genotype data munging & QC](https://cumc.github.io/xqtl-pipeline/code/data_preprocessing/genotype_preprocessing.html)
3. [Covariates data munging & QC](https://cumc.github.io/xqtl-pipeline/code/data_preprocessing/covariate_preprocessing.html)
4. [cis-QTL association testing](https://cumc.github.io/xqtl-pipeline/code/association_scan/cisQTL_scan.html)
5. [trans-QTL association testing](https://cumc.github.io/xqtl-pipeline/code/association_scan/transQTL_scan.html)

### Multi-omics data integration

## Data

For record keeping: preparation of the demo dataset is documented [on this page](https://github.com/gaow/lab-wiki/blob/master/private/data/xQTL_Protocol.md). This is a private repository accessible to Gao Wang's group members.

For the protocols listed on this page, download the required input data from [Synapse](https://www.synapse.org/#!Synapse:syn36416559/files/).
* To download the data, first create a user account on [Synapse Login](https://www.synapse.org/). Your username and password will be required when downloading.
* Downloading requires the Synapse API clients. Type `pip install synapseclient` in a terminal or Command Prompt to install them. Details are listed [on this page](https://help.synapse.org/docs/Installing-Synapse-API-Clients.1985249668.html).
* To download a folder, type `synapse get -r SynapseID#` in a terminal or Command Prompt, replacing `SynapseID#` with the actual Synapse ID. Synapse IDs can be found on [Synapse](https://www.synapse.org/#!Synapse:syn36416559/files/). Each folder at every level has a unique ID, which allows you to download only selected folders or files rather than the entire folder. For example, to download all data under the `protocol_data` folder, type `synapse get -r syn36416601` in a terminal or Command Prompt.
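
The download step above can be wrapped in a small shell sketch that checks the ID before calling `synapse get -r`. The function names `is_synapse_id` and `download_synapse_folder` and the `DRY_RUN` variable are hypothetical, introduced only for illustration; they are not part of the Synapse client or of this protocol. The sketch assumes that Synapse IDs have the form `syn` followed by digits, that `synapseclient` is already installed, and that you have authenticated with your Synapse account.

```shell
# Hypothetical helper: accept only IDs of the form "syn" followed by digits.
is_synapse_id() {
  printf '%s\n' "$1" | grep -Eq '^syn[0-9]+$'
}

# Hypothetical wrapper around the `synapse get -r` command quoted above.
# With DRY_RUN=1 it only prints the command instead of downloading.
download_synapse_folder() {
  if ! is_synapse_id "$1"; then
    echo "Not a valid Synapse ID: $1" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "synapse get -r $1"
  else
    synapse get -r "$1"
  fi
}

# Preview the command for the protocol_data folder without downloading.
DRY_RUN=1 download_synapse_folder syn36416601
```

Dropping `DRY_RUN=1` runs the actual download. Passing the unedited placeholder `SynapseID#` is rejected before any network call is made.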

## Software environment: Singularity vs Docker

The example analyses documented on this website are performed using `singularity`, via the `--container` option pointing to a `sif` Singularity image file. For example, `--container TensorQTL.sif` uses the `TensorQTL.sif` image to perform QTL association mapping with the software `TensorQTL`. If you use Docker, you need to replace

```
--container TensorQTL.sif
```

with

```
--container gaow/tensorqtl
```

where:

1. [`gaow` is the Docker Hub account](https://hub.docker.com/u/gaow) under which all Docker images are saved. You do not have to download these Docker images manually. Simply pass the Docker Hub repository name to the `--container` option as above, and SoS will download the image the first time you run the analysis.
2. Please use all lowercase letters and drop the `.sif` extension when you modify the command from using Singularity to Docker, e.g., `TensorQTL.sif` becomes `tensorqtl`.
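
As a minimal sketch of the two renaming rules above, the following shell function derives the Docker image name from a Singularity image file name. The function name `sif_to_docker` is hypothetical and not part of the pipeline. It assumes that every image follows the same convention as the `TensorQTL.sif` example, which should be checked against the `gaow` Docker Hub account for other images.

```shell
# Hypothetical helper (not part of the pipeline): derive the Docker image
# name from a Singularity image file name, following the two rules above.
sif_to_docker() {
  name="${1##*/}"        # drop any leading directory path
  name="${name%.sif}"    # drop the .sif extension
  # lower-case the name and prefix the gaow Docker Hub account
  printf 'gaow/%s\n' "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
}

sif_to_docker TensorQTL.sif
```

The call prints `gaow/tensorqtl`, the value passed to `--container` in the Docker example above.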

## Analyses on High Performance Computing clusters

The protocol example shown above performs the analysis on a desktop workstation as a demonstration. Typically, the analyses should be performed in HPC cluster environments. This can be achieved via [SoS Remote Tasks](https://vatlab.github.io/sos-docs/doc/user_guide/task_statement.html) on [configured host computers](https://vatlab.github.io/sos-docs/doc/user_guide/host_setup.html). We provide this [toy example for running an SoS pipeline on a typical HPC cluster environment](https://github.com/cumc/xqtl-pipeline/blob/main/code/misc/Job_Example.ipynb). First-time users are encouraged to try it out to help set up the computational environment necessary to run the analyses in this protocol.