# Factor analysis using Bi-Cross-Validation

## Overview

This module uses an implementation of the method described in the following paper:
> Owen, Art & Wang, Jingshu. (2015). Bi-Cross-Validation for Factor Analysis. Statistical Science. 31. 10.1214/15-STS539. 

The software used is APEX:
> Corbin Quick, Li Guan, Zilin Li, Xihao Li, Rounak Dey, Yaowu Liu, Laura Scott, Xihong Lin. A versatile toolkit for molecular QTL mapping and meta-analysis at scale. bioRxiv 2020.12.18.423490; doi: https://doi.org/10.1101/2020.12.18.423490
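
As a rough illustration of the idea behind bi-cross-validation, the sketch below holds out a random block of rows and columns, fits a rank-`k` approximation to the remaining block, and scores how well it predicts the held-out block. It is a simplified stand-in that uses a truncated SVD rather than the factor analysis estimator from the paper, and it is not the APEX implementation. The function names (`bcv_error`, `choose_rank`) and the simulated data are assumptions made for this example.

```python
import numpy as np


def bcv_error(X, k, row_mask, col_mask):
    """Squared error when predicting the held-out block from a rank-k fit.

    Simplified SVD-based variant, used here for illustration only.
    """
    A = X[np.ix_(row_mask, col_mask)]      # held-out block
    B = X[np.ix_(row_mask, ~col_mask)]     # held-out rows, retained columns
    C = X[np.ix_(~row_mask, col_mask)]     # retained rows, held-out columns
    D = X[np.ix_(~row_mask, ~col_mask)]    # retained block used for fitting
    if k == 0:
        return float(np.sum(A ** 2))       # rank 0 predicts all zeros
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Pseudo-inverse of the rank-k truncation of D
    D_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
    A_hat = B @ D_k_pinv @ C
    return float(np.sum((A - A_hat) ** 2))


def choose_rank(X, max_k, n_rep=10, seed=0):
    """Average the held-out error over random half/half splits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    total = np.zeros(max_k + 1)
    for _ in range(n_rep):
        row_mask = np.zeros(n, dtype=bool)
        col_mask = np.zeros(p, dtype=bool)
        row_mask[rng.choice(n, size=n // 2, replace=False)] = True
        col_mask[rng.choice(p, size=p // 2, replace=False)] = True
        for k in range(max_k + 1):
            total[k] += bcv_error(X, k, row_mask, col_mask)
    mean_err = total / n_rep
    return int(np.argmin(mean_err)), mean_err


# Simulated data (not real measurements): 3 factors plus small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))
X += 0.1 * rng.normal(size=(60, 40))

best_k, mean_err = choose_rank(X, max_k=6)
print(best_k, mean_err)
```

Here `best_k` is the candidate rank with the lowest average held-out error on the simulated matrix.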

## Input

1. An indexed bed.gz file with the same format as [PEER factor analysis](PEER_factor.html).
2. A cov file with the same format as [PEER factor analysis](PEER_factor.html).
3. An indexed vcf.gz file.
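
Because the phenotype and genotype files are expected to be indexed, a quick pre-flight check can catch a missing index before the job is submitted. The sketch below assumes the index sits next to the data file with a `.tbi` or `.csi` suffix, as produced by tools such as `tabix`. The helper names `has_index` and `unindexed` are hypothetical and not part of this pipeline.

```python
from pathlib import Path


def has_index(data_file, suffixes=(".tbi", ".csi")):
    """Return True if an index file sits next to data_file (assumed naming)."""
    return any(Path(f"{data_file}{s}").exists() for s in suffixes)


def unindexed(files):
    """Return the subset of files that have no index next to them."""
    return [str(f) for f in files if not has_index(f)]


# File names taken from the minimal working example
missing = unindexed(["AC.mol_phe.annotated.bed.gz", "demo_chr1.vcf.gz"])
print(missing)
```

An empty list means every file has an index under the assumed naming convention.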

## Output 
1. A cov.gz file with the same format as [PEER factor analysis](PEER_factor.html).

## Minimal working example

A minimal working example (MWE) is available on [Google Drive](https://drive.google.com/drive/folders/1yjTwoO0DYGi-J9ouMsh9fHKfDmsXJ_4I?usp=sharing).


In [None]:
sos run BiCV_factor.ipynb BiCV \
--phenoFile AC.mol_phe.annotated.bed.gz \
--vcfFile demo_chr1.vcf.gz \
--covFile AC.APEX.cov --cwd ./ \
--container apex.sif --name "demo" &

In [None]:
[global]
# The output directory for generated files. MUST BE FULL PATH
parameter: cwd = path
# The molecular phenotype matrix
parameter: phenoFile = path
# The covariate file
parameter: covFile = path
# The VCF file, required by APEX
# https://github.com/hsun3163/neuro-apex/issues/1
# https://github.com/cumc/xqtl-pipeline/issues/138
parameter: vcfFile = path
# For cluster jobs, number of commands to run per job
parameter: job_size = 1
# Wall clock time expected
parameter: walltime = "5h"
# Memory expected
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 8
# Software container option
parameter: container = ""
# Prefix for the output file name
parameter: name = ""
# Number of factors to estimate
parameter: N = 30

# Default values from PEER:
## Maximum number of iterations

parameter: iteration = 10

In [3]:
[BiCV]
input: phenoFile, covFile, vcfFile
output: f'{cwd:a}/{name}.BiCV.cov.gz'
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'
bash: container=container, expand= "$[ ]", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout'
    apex factor \
        --out $[_output[0]:nn] \
        --iter $[iteration] \
        --factors $[N] \
        --bed $[_input[0]] \
        --vcf $[_input[2]] \
        --cov $[_input[1]] \
        --threads $[numThreads]
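
The `--out` argument above receives the output path with its last two extensions removed, which is what the `:nn` format specifier in `$[_output[0]:nn]` is understood to do in SoS. The sketch below reproduces that stripping with `pathlib` for a hypothetical output path. The helper `strip_two_extensions` and the directory `/home/user/output` are illustrative only.

```python
from pathlib import PurePosixPath


def strip_two_extensions(path):
    """Drop the last two suffixes, e.g. '.cov.gz', from a path."""
    p = PurePosixPath(path)
    return str(p.with_suffix("").with_suffix(""))


# Hypothetical absolute output path built from cwd and name = "demo"
prefix = strip_two_extensions("/home/user/output/demo.BiCV.cov.gz")
print(prefix)  # /home/user/output/demo.BiCV
```

Under this assumption, the prefix passed to `apex factor` for the example run would end in `demo.BiCV`.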