# Factor analysis using Bi-Cross validation

Implemented in APEX

## Main code for the pipeline: APEX
Input:
1. A BED file containing the molecular phenotype matrix.

If a covariate file is required:
1. A cov file with sample IDs as column names and the covariate/PCA/factor names listed in the `#id` column.
2. A VCF file recording the genotypes of the same samples, for any genetic region.

Output:

1. A covariate file with sample IDs as column names and the covariate/PCA/factor names listed in the `#id` column.
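
To make the covariate layout described above concrete, here is a minimal sketch in Python with pandas. The sample names, covariate names, and values are all hypothetical, and the tab delimiter is an assumption (the source does not state the delimiter of the cov file).

```python
import io

import pandas as pd

# Hypothetical covariate table: one row per covariate/factor,
# one column per sample, with row names held in the "#id" column.
cov = pd.DataFrame(
    {
        "#id": ["age", "sex", "factor1"],
        "sample_A": [61.0, 1.0, 0.12],
        "sample_B": [58.0, 0.0, -0.40],
        "sample_C": [70.0, 1.0, 0.05],
    }
)

# Write to an in-memory buffer as tab-separated text (assumed delimiter).
buffer = io.StringIO()
cov.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()

# Read it back and recover the sample IDs from the header.
reloaded = pd.read_csv(io.StringIO(text), sep="\t")
sample_ids = [col for col in reloaded.columns if col != "#id"]
print(text)
```

Every column other than `#id` is treated as a sample here, which is why the sample IDs can be recovered from the header alone.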

In [None]:
[global]
# The output directory for generated files. MUST BE FULL PATH
parameter: wd = path
cwd = wd
# For cluster jobs, number commands to run per job
parameter: job_size = 1
# Wall clock time expected
parameter: walltime = "5h"
# Memory expected
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 8
# Software container option
parameter: container = str
parameter: name = str

# Number of PEER factors (N). If unspecified or set to 0, the default values
# suggested by UCSC (based on sample size) will be used
parameter: N = 4
n_of_factor = N

# Default values from PEER:
## Maximum number of iterations

parameter: iteration = 30

# The molecular phenotype matrix, in bed, after annotation
parameter: molecular_pheno = path
# The covariate file
parameter: covariate = "None"

# List of VCF genotype files, so that `apex factor` can run with a covariate file as input.
# Requirement: the VCF must contain the same samples as the covariate and molecular phenotype files
parameter: genotype_list = path
import pandas as pd
import os
vcf_file = pd.read_csv(genotype_list,sep = "\t")["dir"][0]

In [3]:
[BiCV]
input:  molecular_pheno
output: f'{wd:a}/{name}.BiCV.cov.gz',
        f'{wd:a}/{name}.BiCV.cov'
task: trunk_workers = 1, trunk_size = 1, walltime = '4h',  mem = '20G', tags = f'{step_name}_{_output[0]:bn}'
bash: expand = "$[ ]", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout',container = container,volumes = [f'{wd:ad}:{wd:ad}']
    apex factor \
    --out $[_output[0]:nn] \
    --iter $[iteration] \
    --factors $[n_of_factor] \
    --bed $[_input] \
    --vcf $[vcf_file] $[f'--cov {covariate}' if os.path.exists(covariate) else f'']

    gunzip -f -k $[_output[0]]
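
The `[global]` section requires the genotype, covariate, and molecular phenotype files to share the same samples. The sketch below shows one way to check this for the phenotype and covariate tables before running the step. It is not part of the pipeline: the helper names `sample_ids` and `compare_samples` are hypothetical, the data are made up, and it assumes the BED file has four leading annotation columns (`#chr`, `start`, `end`, `ID`) followed by one column per sample, which the source does not confirm.

```python
import io

import pandas as pd


def sample_ids(table, n_leading):
    """Return the column names that follow the first n_leading annotation columns."""
    return list(table.columns[n_leading:])


def compare_samples(pheno_samples, cov_samples):
    """Report samples shared by both inputs and samples unique to each."""
    pheno, cov = set(pheno_samples), set(cov_samples)
    return {
        "shared": sorted(pheno & cov),
        "only_in_pheno": sorted(pheno - cov),
        "only_in_cov": sorted(cov - pheno),
    }


# Hypothetical inputs; real files would be passed to pd.read_csv by path.
bed_text = (
    "#chr\tstart\tend\tID\ts1\ts2\ts3\n"
    "chr1\t100\t101\tgeneA\t0.1\t0.2\t0.3\n"
)
cov_text = "#id\ts1\ts2\ts4\nage\t61\t58\t70\n"

bed = pd.read_csv(io.StringIO(bed_text), sep="\t")
cov = pd.read_csv(io.StringIO(cov_text), sep="\t")

# Assumed layout: 4 annotation columns in the BED file, 1 ("#id") in the cov file.
report = compare_samples(sample_ids(bed, 4), sample_ids(cov, 1))
print(report)
```

Non-empty `only_in_pheno` or `only_in_cov` lists point to samples that would need to be dropped or added before the inputs satisfy the stated requirement. The VCF sample list is not checked in this sketch.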