# Analyses of the lipids and other blood traits from UKBB

This notebook uses `Get_Job_Script.ipynb` to automatically generate sbatch scripts for Yale's cluster. The goal is to apply [various LMM workflows](https://github.com/statgenetics/UKBB_GWAS_dev/tree/master/workflow) to run association analyses of several lipid traits (cholesterol, HDL, LDL, triglycerides), perform LD clumping, and extract the associated regions.

## File paths on Yale cluster

- Genotype files in PLINK format:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants (BOLT-LMM):
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants (FastGWA):
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis`
- Relationship file:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`
- Other traits to be analyzed:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_CAUC_lipidsforanalysis_apolipoproteinAandB,Hba1c_continuousandcategorical,egfrbyCKDEPI,serumcreatinine,UACR_inverseranknorm_110320`
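Before generating any sbatch scripts, it can help to confirm that the inputs listed above are visible from the node you are working on. The sketch below is a hypothetical helper (`check_paths` is not part of the bioworkflows repository). Two assumptions to keep in mind: PLINK genotype paths are prefixes, so check `${prefix}.bed` rather than the bare prefix, and paths containing commas (like the other-traits file) should be quoted.

```shell
# Hypothetical helper: report which of the given paths exist.
# The return status is the number of missing paths (0 means all present).
check_paths() {
    local missing=0 p
    for p in "$@"; do
        if [[ -e "$p" ]]; then
            echo "OK      $p"
        else
            echo "MISSING $p"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, `check_paths "$phenoFile" "$covarFile" "$sampleFile"` after the configuration cell below has been run.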


## 07/28/20 analysis

On the cluster, open this notebook in the JupyterLab server you set up through your SSH connection, then run the following cells.

## Important analysis notes

Medication-corrected measures are used for LDL (LDL/0.7) and total cholesterol (TC/0.8). The correction is applied only to individuals on cholesterol medication.

Triglycerides and HDL are used without correction.
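The correction rule above can be illustrated with a minimal awk sketch. It assumes a hypothetical whitespace-delimited table with a header and the columns IID, medication status (1 = on cholesterol medication, 0 = not), raw LDL and raw total cholesterol; the real phenotype file already contains the corrected columns, so this is only an illustration of the arithmetic, and `medcorrect` is a hypothetical name.

```shell
# Hypothetical layout: IID  med  LDL  TC  (header on the first line).
# Appends LDL_medcorrected (LDL/0.7) and TC_medcorrected (TC/0.8),
# dividing only for rows where med == 1.
medcorrect() {
    awk 'NR == 1 { print $0, "LDL_medcorrected", "TC_medcorrected"; next }
         {
             ldl = $3; tc = $4
             if ($2 == 1) { ldl = ldl / 0.7; tc = tc / 0.8 }
             print $0, ldl, tc
         }' "$1"
}
```

Triglycerides and HDL would simply be passed through unchanged.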

Covariates are age at recruitment, genetic sex, medication status, smoking (smoking1, smoking2) and alcohol (alcohol1 to alcohol4). Smoking and alcohol are dummy coded.

* Alcohol: f.1558 (the never and special occasion categories are collapsed)
* Smoking: f.20116 (never/former/current). The plan is to replace it with f.20161 (pack years) once that variable is available.
* Medication status: cholesterolmedbyf6153or6177or20003 (the same variable used to adjust the phenotypes)
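Dummy coding turns a categorical variable with K levels into K-1 indicator columns, with one level acting as the reference. Three smoking levels give smoking1 and smoking2, and five alcohol levels (after collapsing never/special occasion) would give alcohol1 to alcohol4, which is consistent with the covariate names used in the cells below. This is a minimal sketch: the integer codes 0..K-1 and the choice of level 0 as the reference are assumptions, not necessarily how the phenotype file was built.

```shell
# Hypothetical helper: print K-1 indicator values for a level coded 0..K-1.
# Level 0 is the reference and maps to all zeros.
dummy_code() {
    local level=$1 k=$2 i
    local -a out=()
    for ((i = 1; i < k; i++)); do
        if (( level == i )); then out+=(1); else out+=(0); fi
    done
    echo "${out[*]}"
}

dummy_code 2 3   # smoking, third level -> "0 1"
dummy_code 0 5   # alcohol, reference level -> "0 0 0 0"
```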

### Inverse rank normalized traits

`/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKBCauc_cholesterolandbloodpressurefields_inverseranknorm_covariatesage,sex,alcohol,smokingpackyears_foranalysis`

The variables for analysis are:

1. HDL_inverseranknorm
2. LDL_medcorrected_inverseranknorm
3. triglycerides_inverseranknorm
4. cholesterol.f30690_medcorrected_inverseranknorm
5. diastolic_plus10inmedicated_inverserank
6. systolic_plus15inmedicated_inverserank

Potential covariates:

* age
* sex
* smoking_packyears (the file also contains smoking_dummy1 and smoking_dummy2, which predate the availability of pack years)
* alcohol1, alcohol2, alcohol3, alcohol4

**The phenotypes in the other-traits file (the continuous ones have already been inverse rank normalized, as indicated by the `_int` suffix) are:**

1. ApoA_int (Apolipoprotein A)
2. ApoB_int (Apolipoprotein B)
3. Creatinine_int (Serum Creatinine)
4. Hba1c_int (continuous Hba1c)
5. Hba1c_categorical (categorical Hba1c, with values 1-3: 1 is Hba1c <= 42, 2 is Hba1c > 42 & < 48, and 3 is Hba1c >= 48)
6. eGFR_int (calculated with the CKD-EPI equation from serum creatinine in µmol/l)
7. UACR_int (urinary albumin-to-creatinine ratio (UACR), calculated with urinary albumin (mg/dL) and urinary creatinine (g/dL))
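The three-level coding of Hba1c_categorical listed above can be written as a small threshold function. This is a minimal sketch of the rule as stated (1 if ≤ 42, 2 if > 42 and < 48, 3 if ≥ 48); `hba1c_category` is a hypothetical name, and missing values are not handled.

```shell
# Hypothetical helper: map a continuous Hba1c value to categories 1-3.
hba1c_category() {
    awk -v x="$1" 'BEGIN {
        v = x + 0                # force a numeric comparison
        if (v <= 42)     print 1
        else if (v < 48) print 2
        else             print 3
    }'
}
```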

Covariates in the file are:
* Age
* Genetic Sex
* Smoking (pack years)
* Alcohol (dummy coded, Alcohol1, Alcohol2, Alcohol3, Alcohol4)

## Bash variables for workflow configuration

In [1]:
# Common variables
tpl_file=../farnam.yml
bfile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sampleFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
unrelated_samples=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
formatFile_fastgwa=~/project/UKBB_GWAS_dev/data/fastGWA_template.yml
formatFile_bolt=~/project/UKBB_GWAS_dev/data/boltlmm_template.yml
formatFile_saige=~/project/UKBB_GWAS_dev/data/saige_template.yml
# Container
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
container_marp=/gpfs/gibbs/pi/dewan/data/UKBiobank/marp.sif
# LMM directories
lmm_dir_fastgwa=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/
lmm_dir_bolt=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/
lmm_dir_saige=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/SAIGE_results/results_imputed_data/
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_imp-fastgwa.sbatch
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_imp-bolt.sbatch
lmm_sbatch_saige=../output/$(date +"%Y-%m-%d")_imp-saige.sbatch
#Phenotype file
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
## LMM variables 
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
LDscoresFile=~/software/BOLT-LMM_v2.3.4/tables/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=~/software/BOLT-LMM_v2.3.4/tables/genetic_map_hg19_withX.txt.gz
phenoCol="cholesterol.f30690_medcorrected"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
covarMaxLevels=10
qCovarCol=AGE
numThreads=20
bgenMinMAF=0.001
bgenMinINFO=0.8
lmm_job_size=1
ylim=0
### Specific to FastGWA
grmFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.grm.sp
### Specific to SAIGE
bgenMinMAC=4
trait_type=quantitative
loco=TRUE
sampleCol=IID
# LD clumping directories
clumping_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/LD_clumping/
clumping_sos=~/project/bioworkflows/GWAS/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_HDL_imp_ldclumping.sbatch
## LD clumping variables
# For sumstatsFiles, if there is more than one file, provide each path
bfile_ref=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/LD_clumping/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.1210.ref_geno.bed
# Changes depending on which traits are analyzed
sumstatsFiles=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/HDL/*.snp_stats.gz
ld_sample_size=1210
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1
#clumpFile= 
#clumregionFile=
# Region extraction directories
extract_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/region_extraction/HDL
extract_sos=~/project/bioworkflows/GWAS/Region_Extraction.ipynb
extract_sbatch=../output/$(date +"%Y-%m-%d")_HDL_imp-region.sbatch
## Region extraction variables
region_file=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/LD_clumping/HDL/*.clumped_region
geno_path=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/UKBB_bgenfilepath.txt
sumstats_path=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/HDL/*.snp_stats.gz
extract_job_size=10
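The clumping variables above correspond closely to PLINK 1.9's `--clump` options: index variants need P < `clump_p1` (5e-08), other variants within `clump_kb` (2000 kb) and with r² > `clump_r2` (0.2) are assigned to that index, and `clump_p2=1` lets any p-value join a clump. Whether `LD_Clumping.ipynb` calls PLINK exactly this way is an assumption; the sketch below only assembles and prints such a command, using the cell's variables when set and the cell's values as defaults. `build_clump_cmd` is a hypothetical name.

```shell
# Hypothetical helper: print a PLINK 1.9 clumping command built from the
# notebook variables (falls back to the values set in the cell above).
build_clump_cmd() {
    local ref=$1 sumstats=$2
    local -a cmd
    # PLINK's --bfile takes the fileset prefix, so strip a trailing .bed
    cmd=(plink --bfile "${ref%.bed}" --clump "$sumstats"
         --clump-field "${clump_field:-P}"
         --clump-p1 "${clump_p1:-5e-08}"
         --clump-p2 "${clump_p2:-1}"
         --clump-r2 "${clump_r2:-0.2}"
         --clump-kb "${clump_kb:-2000}"
         --clump-annotate "${clump_annotate:-BP}")
    # Print only; to execute, run "${cmd[@]}" instead
    echo "${cmd[*]}"
}

# Example with placeholder file names
build_clump_cmd /path/to/ref_geno.bed /path/to/HDL.snp_stats.gz
```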




# BOLT-LMM jobs

## HDL

In [2]:
lmm_dir_bolt=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/HDL
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_HDL_imp-bolt.sbatch
phenoCol="HDL"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
qCovarCol=AGE

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-13_HDL_imp-bolt.sbatch
INFO: Workflow farnam (ID=39b24b87b30a8a06) is executed successfully with 1 completed step.
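Two bash details in these cells are easy to misread. `bgenFile` is built with brace expansion, so `{1..22}` yields one path per autosome. The `"""..."""` around `lmm_args` is not a special multi-line string as in Python: bash reads it as an empty string, an ordinary double-quoted string and another empty string, concatenated, so `$variables` inside are expanded and the newlines are kept. The sketch below demonstrates both with placeholder values (`/data` and the `demo_` names are hypothetical).

```shell
# Brace expansion: one path per autosome (/data is a placeholder directory)
bgen_files=(/data/ukb_imp_chr{1..22}_v3.bgen)
echo "${#bgen_files[@]}"    # 22
echo "${bgen_files[0]}"     # /data/ukb_imp_chr1_v3.bgen

# """...""" is "" + "..." + "": variables are expanded, newlines are kept
demo_pheno=HDL
demo_args="""boltlmm
    --phenoCol $demo_pheno
"""
printf '%s' "$demo_args"
```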



## Cholesterol

In [3]:
lmm_dir_bolt=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/cholesterol
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_cholesterol_imp-bolt.sbatch
phenoCol="cholesterol.f30690_medcorrected"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
qCovarCol=AGE

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-15_cholesterol_imp-bolt.sbatch
INFO: Workflow farnam (ID=b178fdaaac231b03) is executed successfully with 1 completed step.



## LDL

In [2]:
lmm_dir_bolt=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/LDL
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_LDL_imp-bolt.sbatch
phenoCol="LDL_medcorrected"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
qCovarCol=AGE

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-15_LDL_imp-bolt.sbatch
INFO: Workflow farnam (ID=26fd06bd806de5ef) is executed successfully with 1 completed step.



## Triglycerides

In [None]:
lmm_dir_bolt=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/triglycerides
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_triglycerides_imp-bolt.sbatch
phenoCol="triglycerides"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
qCovarCol=AGE

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"
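The four BOLT-LMM cells differ only in the trait label (used for the output directory and sbatch name) and in `phenoCol`. A minimal sketch of capturing that mapping once with a bash 4 associative array follows; `bolt_pheno` and `bolt_job_vars` are hypothetical names, and the function only prints the per-trait settings. In practice you would assign these variables inside the loop and then call `sos run` with the same arguments as in the cells above.

```shell
# Trait label -> phenotype column, as used in the BOLT-LMM cells above
declare -A bolt_pheno=(
    [HDL]="HDL"
    [cholesterol]="cholesterol.f30690_medcorrected"
    [LDL]="LDL_medcorrected"
    [triglycerides]="triglycerides"
)
bolt_root=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data

# Hypothetical helper: print the per-trait settings for one trait
bolt_job_vars() {
    local trait=$1 stamp
    stamp=$(date +"%Y-%m-%d")
    echo "lmm_dir_bolt=$bolt_root/$trait"
    echo "lmm_sbatch_bolt=../output/${stamp}_${trait}_imp-bolt.sbatch"
    echo "phenoCol=${bolt_pheno[$trait]}"
}

for trait in HDL cholesterol LDL triglycerides; do
    bolt_job_vars "$trait"
done
```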

# FastGWA jobs

## HDL

In [3]:
lmm_dir_fastgwa=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/HDL
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_HDL_imp-fastgwa.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
phenoCol="HDL"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking1 smoking2 alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-09-15_HDL_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=18b620039e084b7d) is executed successfully with 1 completed step.
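The BOLT-LMM cells pass covariates with BOLT-LMM's array shorthand (`smoking{1:2} alcohol{1:4}`), while the FastGWA cells spell out every column name. If you want to derive one from the other, a minimal sketch of expanding the `NAME{i:j}` shorthand is below; `expand_covars` is a hypothetical helper and only handles that one pattern.

```shell
# Hypothetical helper: expand NAME{i:j} tokens into NAMEi ... NAMEj
expand_covars() {
    local re='^(.+)\{([0-9]+):([0-9]+)\}$' tok name lo hi i
    local -a toks out=()
    read -ra toks <<< "$1"
    for tok in "${toks[@]}"; do
        if [[ $tok =~ $re ]]; then
            name=${BASH_REMATCH[1]} lo=${BASH_REMATCH[2]} hi=${BASH_REMATCH[3]}
            # 10# forces base 10 even if a bound has a leading zero
            for ((i = 10#$lo; i <= 10#$hi; i++)); do
                out+=("$name$i")
            done
        else
            out+=("$tok")
        fi
    done
    echo "${out[*]}"
}

expand_covars "SEX cholesterolmedbyf6153or6177or20003 smoking{1:2} alcohol{1:4}"
```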



## Cholesterol

Ran: please finish creating the fastGWA scripts for the remaining traits.

In [2]:
lmm_dir_fastgwa=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/cholesterol
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_cholesterol_imp-fastgwa.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
phenoCol="cholesterol.f30690_medcorrected"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking1 smoking2 alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-09-10_cholesterol_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=7cf2c7310ac1d222) is executed successfully with 1 completed step.



## LDL

In [3]:
lmm_dir_fastgwa=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/LDL
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_LDL_imp-fastgwa.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
phenoCol="LDL_medcorrected"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking1 smoking2 alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-09-10_LDL_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=74f114d1a8a102d3) is executed successfully with 1 completed step.



## Triglycerides

In [4]:
lmm_dir_fastgwa=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/triglycerides
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_triglycerides_imp-fastgwa.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis
phenoCol="triglycerides"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking1 smoking2 alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-09-10_triglycerides_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=e14e0004ea8849e5) is executed successfully with 1 completed step.



# REGENIE int-blood

In [8]:
tpl_file=/home/dc2325/project/bioworkflows/GWAS/farnam.yml
bfile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sampleFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
unrelated_samples=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
#Container
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
container_marp=/gpfs/gibbs/pi/dewan/data/UKBiobank/marp.sif
# Directories
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
lmm_dir_regenie=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/REGENIE_results/results_imputed_data/int_ApoA_ApoB_Creatinine_Hba1c_eGFR_UACR
lmm_sbatch_regenie=../output/$(date +%Y-%m-%d)_lipids_imp-regenie.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_CAUC_lipidsforanalysis_apolipoproteinAandB,Hba1c_continuousandcategorical,egfrbyCKDEPI,serumcreatinine,UACR_inverseranknorm_110320
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_CAUC_lipidsforanalysis_apolipoproteinAandB,Hba1c_continuousandcategorical,egfrbyCKDEPI,serumcreatinine,UACR_inverseranknorm_110320
phenoCol="ApoA_int ApoB_int Creatinine_int Hba1c_int eGFR_int UACR_int"
covarCol="SEX Packyrs_smoking alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE
maf_filter=0.001
geno_filter=0.0
hwe_filter=0.0
mind_filter=0.0
trait=
minMAC=4
bsize=1000
ylim=0
job_size=1
numThreads=22
reverse_log_p=True
lowmem_prefix=~/scratch60

lmm_args="""regenie
    --cwd $lmm_dir_regenie
    --bfile $bfile
    --sampleFile $sampleFile
    --bgenFile $bgenFile
    --phenoFile $phenoFile
    --formatFile $formatFile_regenie
    --covarFile $covarFile
    --phenoCol $phenoCol
    --covarCol $covarCol 
    --qCovarCol $qCovarCol
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --trait $trait
    --minMAC $minMAC
    --bsize $bsize
    --numThreads $numThreads
    --job_size $job_size
    --ylim $ylim
    --p-filter 1
    --reverse_log_p $reverse_log_p
    --lowmem_prefix $lowmem_prefix
    --container_lmm $container_lmm
    --container_marp $container_marp
"""
    
sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
        --template-file $tpl_file \
        --workflow-file $lmm_sos \
        --to-script $lmm_sbatch_regenie \
        --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-11-18_lipids_imp-regenie.sbatch
INFO: Workflow farnam (ID=24d29053e6048d83) is executed successfully with 1 completed step.
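The REGENIE cells analyse several phenotypes in one run by passing a space-separated `phenoCol` to `LMM.ipynb`. REGENIE's own `--phenoColList` and `--covarColList` options take comma-separated names. Whether `LMM.ipynb` performs exactly this conversion is an assumption; the sketch shows the conversion itself, and `to_regenie_list` is a hypothetical name.

```shell
# Hypothetical helper: turn a space-separated list into a comma-separated one
to_regenie_list() {
    local -a names
    read -ra names <<< "$1"
    local IFS=,
    echo "${names[*]}"
}

to_regenie_list "ApoA_int ApoB_int Creatinine_int Hba1c_int eGFR_int UACR_int"
```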



# REGENIE int-lipids

In [6]:
tpl_file=/home/dc2325/project/bioworkflows/GWAS/farnam.yml
bfile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sampleFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
unrelated_samples=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
#Container
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
container_marp=/gpfs/gibbs/pi/dewan/data/UKBiobank/marp.sif
# Directories
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
lmm_dir_regenie=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/REGENIE_results/results_imputed_data/int_HDL_LDL_Triglycerides_Cholesterol
lmm_sbatch_regenie=../output/$(date +%Y-%m-%d)_int_lipids_imp-regenie.sbatch
phenoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKBCauc_cholesterolandbloodpressurefields_inverseranknorm_covariatesage,sex,alcohol,smokingpackyears_foranalysis
covarFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKBCauc_cholesterolandbloodpressurefields_inverseranknorm_covariatesage,sex,alcohol,smokingpackyears_foranalysis
phenoCol="HDL_inverseranknorm LDL_medcorrected_inverseranknorm triglycerides_inverseranknorm cholesterol.f30690_medcorrected_inverseranknorm"
covarCol="SEX cholesterolmedbyf6153or6177or20003 smoking_dummy1 smoking_dummy2 alcohol1 alcohol2 alcohol3 alcohol4" 
qCovarCol=AGE
maf_filter=0.001
geno_filter=0.0
hwe_filter=0.0
mind_filter=0.0
trait=
minMAC=4
bsize=1000
ylim=0
job_size=1
numThreads=22
reverse_log_p=True
lowmem_prefix=~/scratch60

lmm_args="""regenie
    --cwd $lmm_dir_regenie
    --bfile $bfile
    --sampleFile $sampleFile
    --bgenFile $bgenFile
    --phenoFile $phenoFile
    --formatFile $formatFile_regenie
    --covarFile $covarFile
    --phenoCol $phenoCol
    --covarCol $covarCol 
    --qCovarCol $qCovarCol
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --trait $trait
    --minMAC $minMAC
    --bsize $bsize
    --numThreads $numThreads
    --job_size $job_size
    --ylim $ylim
    --p-filter 1
    --reverse_log_p $reverse_log_p
    --lowmem_prefix $lowmem_prefix
    --container_lmm $container_lmm
    --container_marp $container_marp
"""
    
sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
        --template-file $tpl_file \
        --workflow-file $lmm_sos \
        --to-script $lmm_sbatch_regenie \
        --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-15_int_lipids_imp-regenie.sbatch
INFO: Workflow farnam (ID=69ec25f8d6c257bf) is executed successfully with 1 completed step.
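REGENIE reports association strength as `LOG10P`, that is -log10(P). The `reverse_log_p=True` setting presumably converts this back to a plain p-value (an assumption; the workflow code is not shown here). A minimal sketch of that conversion, assuming a whitespace-delimited table whose header contains a `LOG10P` column, follows; `log10p_to_p` is a hypothetical name.

```shell
# Hypothetical helper: append a P column computed as 10^(-LOG10P)
log10p_to_p() {
    awk 'NR == 1 {
             for (i = 1; i <= NF; i++) if ($i == "LOG10P") col = i
             if (!col) { print "no LOG10P column" > "/dev/stderr"; exit 1 }
             print $0, "P"; next
         }
         { print $0, 10 ^ (-$col) }' "$1"
}
```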



# Hudson plots: method comparison

In [5]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/GWAS/Hudson_plot.ipynb
hudson_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy/HDL_fastGWA_boltlmm
hudson_sbatch=../output/$(date +"%Y-%m-%d")_HDL_fastGWA_bolt_hudson.sbatch
sumstats_1=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/HDL/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis_HDL.fastGWA.snp_stats.gz
sumstats_2=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/HDL/UKB_Caucasiansubset_cholesterolfields_adjbymedstatus_062420_foranalysis_HDL.boltlmm.snp_stats.gz
toptitle="HDL_fastGWA"
bottomtitle="HDL_boltlmm"
highlight_p_top=0
highlight_p_bottom=0
pval_filter=5e-08
job_size=1
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
#highlight_snp=
annotate_snp=0
phenocol1='HDL'
phenocol2='HDL'

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --annotate_snp $annotate_snp
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/GWAS/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-15_HDL_fastGWA_bolt_hudson.sbatch
INFO: Workflow farnam (ID=18f078a1bf16c978) is executed successfully with 1 completed step.
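Before comparing the two methods visually, it can be useful to count how many variants pass the genome-wide threshold (`pval_filter=5e-08`) in each summary statistics file. This is a minimal sketch, assuming gzipped, whitespace-delimited files whose header contains a column named `P` (an assumption about the `*.snp_stats.gz` format); how `Hudson_plot.ipynb` itself uses `pval_filter` is not shown here. `count_hits` is a hypothetical name, and it prints 0 if no `P` column is found.

```shell
# Hypothetical helper: count rows with P below a threshold in a gzipped table
count_hits() {
    local file=$1 threshold=$2
    gzip -dc "$file" | awk -v t="$threshold" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") col = i; next }
        col && ($col + 0) < (t + 0) { n++ }
        END { print n + 0 }'
}
```

For example: `for f in "$sumstats_1" "$sumstats_2"; do echo "$(basename "$f"): $(count_hits "$f" "$pval_filter")"; done`.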

