# cisQTL analysis workflows

This section documents the output of the Association Scan section of the command generator MWE and explains the purpose of each command. The files used on this page can be found [here](https://drive.google.com/drive/folders/16ZUsciZHqCeeEWwZQR46Hvh5OtS8lFtA?usp=sharing).

Two options are available: TensorQTL and APEX. For a given gene-SNP pair, they produce the same nominal cis association p-value.

The default cis window is defined as 1 million bp upstream and downstream of the transcription start site (TSS).
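As a rough illustration of this window definition, the sketch below computes the default window around a TSS and checks whether a variant position falls inside it. The function names, the inclusive bounds, and the clamping of the lower bound at zero are assumptions made for illustration; they are not taken from the pipeline code.

```python
CIS_WINDOW_BP = 1_000_000  # default window size described in the text


def cis_window(tss, window=CIS_WINDOW_BP):
    """Return the (start, end) positions of the cis window around a TSS."""
    # Assumption: clamp at 0 so a TSS near the chromosome start
    # does not yield a negative lower bound.
    return max(0, tss - window), tss + window


def is_cis(variant_pos, tss, window=CIS_WINDOW_BP):
    """Return True if the variant lies within the cis window (inclusive bounds assumed)."""
    start, end = cis_window(tss, window)
    return start <= variant_pos <= end


print(cis_window(1_500_000))
print(is_cis(2_400_000, 1_500_000))
```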

If you use BED files that were not generated by our workflow, note that the end column should be TSS + 1 rather than the actual end of the gene region.
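One way to check an externally produced BED file against this requirement is sketched below. It assumes a tab-delimited file whose second column already holds the TSS and whose header line starts with `#`; the function name and the example rows are hypothetical.

```python
def find_non_tss_rows(bed_lines):
    """Return (line_number, line) pairs whose end column is not start + 1."""
    bad = []
    for number, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        if end != start + 1:
            bad.append((number, line))
    return bad


# Hypothetical rows: geneB still carries the full gene span in its end column.
example = [
    "#chr\tstart\tend\tID\tsample1",
    "chr1\t11868\t11869\tgeneA\t0.12",
    "chr1\t29553\t31109\tgeneB\t0.34",
]
for number, line in find_non_tss_rows(example):
    print(f"line {number} needs its end column fixed: {line}")
```

Rows flagged this way would need their end column rewritten to start + 1 before being passed to the association scan.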

**Each command in the Molecular Phenotype Processing tutorials is generated once per theme. The MWE is treated as a single-theme analysis.**


## TensorQTL

[TensorQTL](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1836-7) is a new implementation of [Matrix QTL](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3348564/). It produces two gene-level p-values for the association: one estimated using a beta-distribution approximation and one estimated by permutation.
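To make the permutation-based gene-level p-value more concrete, here is a minimal sketch of the standard empirical permutation p-value. It is not TensorQTL's own code: the function name and the toy numbers are hypothetical, and the statistic is assumed to be one where larger values mean stronger association (for example, the best squared correlation across the cis window).

```python
def empirical_permutation_pvalue(observed_stat, permuted_stats):
    """Empirical p-value: share of permutations at least as extreme as the observation.

    Adding 1 to numerator and denominator keeps the estimate above zero,
    so its smallest possible value is 1 / (n_permutations + 1).
    """
    n_as_extreme = sum(1 for s in permuted_stats if s >= observed_stat)
    return (n_as_extreme + 1) / (len(permuted_stats) + 1)


# Toy example with hypothetical statistics from four permutations.
p_perm = empirical_permutation_pvalue(0.5, [0.1, 0.2, 0.3, 0.9])
print(p_perm)
```

Because this estimate cannot go below 1 / (n_permutations + 1), a beta-distribution approximation fitted to the permutation results is a common way to extrapolate to smaller p-values without running more permutations.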

In [None]:
sos run pipeline/TensorQTL.ipynb cis \
    --genotype-list plink_files_list.txt   \
    --phenotype-list output/data_preprocessing/MWE/phenotype_data/MWE.log2cpm.bed.processed_phenotype.per_chrom.recipe \
    --covariate-file output/data_preprocessing/MWE/covariates/MWE.log2cpm.MWE.covariate.cov.MWE.MWE.related.filtered.extracted.pca.projected.resid.bed.PEER.MWE.covariate.cov.MWE.MWE.related.filtered.extracted.pca.projected.cov.gz    \
    --cwd output/association_scan/MWE/TensorQTL  \
    --container containers/TensorQTL.sif

## APEX

[APEX](https://www.biorxiv.org/content/10.1101/2020.12.18.423490v1) can use either a linear model or a linear mixed model for association testing. The linear mixed model is potentially useful for analyses that include related individuals. By default, our workflow uses the linear model for better comparability with TensorQTL.

For gene-level p-values, APEX uses the recently developed [aggregated Cauchy association test (ACAT), which combines p-values under arbitrary dependence structures](https://arxiv.org/abs/1808.09011). The result should be comparable to beta-approximated permutation p-values.
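The core of the Cauchy combination idea can be sketched in a few lines. This is a simplified illustration of the published formula, not APEX's implementation: the function name is hypothetical, equal weights are assumed by default, and p-values of exactly 0 or 1 are rejected rather than handled with the numerical approximations a production implementation would use.

```python
import math


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test (simplified sketch)."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(pvalues)  # assumption: equal weights
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total_weight = sum(weights)
    # Each p-value is mapped to a standard Cauchy variate; the weighted mean
    # of such variates is again standard Cauchy under the null hypothesis.
    stat = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total_weight
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical per-variant p-values for one gene.
print(acat([0.01, 0.8, 0.3]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the transform.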

Note that APEX requires indexed vcf.gz files as input. In our pipeline, these are converted from the per-chromosome PLINK binary files (bed/bim/fam) that serve as input for TensorQTL.
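For readers preparing such input by hand, the sketch below builds the two commands typically used for this kind of conversion: a PLINK 2 export to bgzipped VCF followed by tabix indexing. This is an assumption about one reasonable way to do it, not a description of what the pipeline runs internally; the helper name and the file prefixes are hypothetical.

```python
import subprocess


def plink_to_indexed_vcf_commands(bfile_prefix, out_prefix):
    """Return the commands that convert a PLINK bed/bim/fam set to an indexed vcf.gz."""
    # plink2 writes <out_prefix>.vcf.gz when exporting with the 'bgz' modifier.
    export = ["plink2", "--bfile", bfile_prefix,
              "--export", "vcf", "bgz", "--out", out_prefix]
    # tabix creates the <out_prefix>.vcf.gz.tbi index next to the VCF.
    index = ["tabix", "-p", "vcf", f"{out_prefix}.vcf.gz"]
    return [export, index]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical per-chromosome prefixes; printing only, nothing is executed here.
for command in plink_to_indexed_vcf_commands("genotype/MWE.chr1", "vcf/MWE.chr1"):
    print(" ".join(command))
```

Calling `run_commands` on the returned list would execute the conversion, provided `plink2` and `tabix` are available on the path.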

There are several caveats to using APEX; please see the [module page](https://github.com/cumc/xqtl-pipeline/blob/main/code/association_scan/APEX/APEX.ipynb) for details.

In [None]:
sos run pipeline/APEX.ipynb cis \
    --genotype-list vcf_files_list.txt   \
    --phenotype-list output/data_preprocessing/MWE/phenotype_data/MWE.log2cpm.bed.processed_phenotype.per_chrom.recipe \
    --covariate-file output/data_preprocessing/MWE/covariates/MWE.log2cpm.MWE.covariate.cov.MWE.MWE.related.filtered.extracted.pca.projected.resid.bed.PEER.MWE.covariate.cov.MWE.MWE.related.filtered.extracted.pca.projected.cov.gz    \
    --cwd output/association_scan/MWE/APEX  \
    --container containers/APEX.sif