**AUTHOR:** <br>
Vasilis Raptis

**DATE:** <br>
09.07.2024 

**PURPOSE:** <br>
This notebook: 
- updates the .id, pheno, and regenie step1 files needed to run regenie step2 on ***.bgen*** files. The array genotypes have FID==0, while the srWGS .bgen files have FID==IID, so FID is set equal to IID in every file.
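
The mismatch described above can be illustrated with a small toy example. This is only a sketch: the IDs are made up, and it uses base R so that it runs without the notebook's files. It assumes that samples are matched on both FID and IID.

```r
# Toy illustration with hypothetical IDs (not real participants).
# Array-style IDs carry FID == 0; the srWGS .sample convention has FID == IID.
array_ids <- data.frame(FID = c(0L, 0L, 0L),
                        IID = c(1000001L, 1000002L, 1000003L))
bgen_ids  <- data.frame(FID = c(1000001L, 1000002L, 1000003L),
                        IID = c(1000001L, 1000002L, 1000003L))

# Matching on both FID and IID finds no shared samples
n_before <- nrow(merge(array_ids, bgen_ids, by = c("FID", "IID")))

# After setting FID to IID, every sample matches
array_ids$FID <- array_ids$IID
n_after <- nrow(merge(array_ids, bgen_ids, by = c("FID", "IID")))

print(c(before = n_before, after = n_after))
```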

**NOTES:** <br>
- originally created in 03_part2b_run_regenie_step2_clinvar notebook


**Setup:**

In [26]:
# libraries
library(data.table)
library(tidyverse)

## Get my bucket name
my_bucket  <- Sys.getenv("WORKSPACE_BUCKET")
## Google project name
GOOGLE_PROJECT <- Sys.getenv("GOOGLE_PROJECT")

**Update .id files:** 

In [31]:
# set FID equal to IID (column 2 is printed twice) to match the .sample files
system("mkdir -p ./files_for_bgen_step2", intern=TRUE)

system("awk '{print $2,$2}' microarray/plink_v7.1/arrays_qc_afr_clean.id > files_for_bgen_step2/arrays_qc_afr_clean_for_bgen.id", intern=TRUE)
system("awk '{print $2,$2}' microarray/plink_v7.1/arrays_qc_amr_clean.id > files_for_bgen_step2/arrays_qc_amr_clean_for_bgen.id", intern=TRUE)
system("awk '{print $2,$2}' microarray/plink_v7.1/arrays_qc_eur_clean.id > files_for_bgen_step2/arrays_qc_eur_clean_for_bgen.id", intern=TRUE)

# list the new files to confirm they were written
system("ls ./files_for_bgen_step2/*", intern=TRUE)
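
A quick sanity check on the rewritten .id files could look like the sketch below. The helper `check_id_file()` is hypothetical (it is not part of the original notebook), and the demo runs on temporary files with made-up IDs rather than on the real outputs.

```r
# Hypothetical helper: TRUE if a two-column ID file has identical columns.
check_id_file <- function(path) {
  # read as character so that IDs are compared exactly as written
  ids <- read.table(path, header = FALSE, colClasses = "character")
  stopifnot(ncol(ids) == 2)
  all(ids[[1]] == ids[[2]])
}

# Demo on temporary files with made-up IDs
good <- tempfile(fileext = ".id")
writeLines(c("1000001 1000001", "1000002 1000002"), good)

bad <- tempfile(fileext = ".id")
writeLines(c("0 1000001", "0 1000002"), bad)

ok_good <- check_id_file(good)  # columns identical
ok_bad  <- check_id_file(bad)   # FID still 0
print(c(good = ok_good, bad = ok_bad))
```

On the real outputs the same helper would be called on each path in `files_for_bgen_step2/`. Note that awk rewrites every line the same way, so if a source .id file starts with a header line, that line is duplicated like any other row.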

**Update .pheno files:**

In [32]:
# set FID equal to IID to match the .sample files, then write space-delimited files with a header

eur_pheno <- fread("pheno/eur_pheno_clean.txt") %>% mutate(FID = IID)
write.table(eur_pheno, "files_for_bgen_step2/eur_pheno_clean_for_bgen.txt", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)

afr_pheno <- fread("pheno/afr_pheno_clean.txt") %>% mutate(FID = IID)
write.table(afr_pheno, "files_for_bgen_step2/afr_pheno_clean_for_bgen.txt", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)

amr_pheno <- fread("pheno/amr_pheno_clean.txt") %>% mutate(FID = IID)
write.table(amr_pheno, "files_for_bgen_step2/amr_pheno_clean_for_bgen.txt", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)
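
The pheno update can be verified with a write and read-back round trip. The sketch below uses a toy table in base R; the `delirium` column and the IDs are made up for illustration and do not reflect the real pheno files.

```r
# Toy phenotype table (hypothetical values); IDs are stored as integers
pheno <- data.frame(FID = c(0L, 0L),
                    IID = c(1000001L, 1000002L),
                    delirium = c(1L, 0L))

# Same transformation as above: FID takes the value of IID
pheno$FID <- pheno$IID

# Write with the same options used in the notebook, then read back
tmp <- tempfile(fileext = ".txt")
write.table(pheno, tmp, sep = " ", row.names = FALSE, col.names = TRUE, quote = FALSE)
back <- read.table(tmp, header = TRUE)

# The first two columns should be FID and IID, with identical values
stopifnot(identical(names(back)[1:2], c("FID", "IID")))
stopifnot(all(back$FID == back$IID))
```

One detail worth keeping in mind: `write.table` can print large round values stored as doubles in scientific notation (for example 1e+06), so it helps if the ID columns are integer or character before writing.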


**Update step1 output files:**

In [33]:
## change sample headers in .loco files from 0_IID to IID_IID (first column, FID_IID, is left as is)
# eur
eur_loco  <- fread("regenie_out/step1/del_eur_clean_step1_1.loco")
old_names <- names(eur_loco)[-1]
# drop the leading "0_" (anchored, so only the FID prefix is removed), then build IID_IID
new_names <- str_remove(old_names, "^0_") %>% paste0(., "_", .)
names(eur_loco)[-1] <- new_names
write.table(eur_loco, "files_for_bgen_step2/del_eur_clean_step1_1.loco", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)

# afr
afr_loco  <- fread("regenie_out/step1/del_afr_clean_step1_1.loco")
old_names <- names(afr_loco)[-1]
new_names <- str_remove(old_names, "^0_") %>% paste0(., "_", .)
names(afr_loco)[-1] <- new_names
write.table(afr_loco, "files_for_bgen_step2/del_afr_clean_step1_1.loco", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)

# amr
amr_loco  <- fread("regenie_out/step1/del_amr_clean_step1_1.loco")
old_names <- names(amr_loco)[-1]
new_names <- str_remove(old_names, "^0_") %>% paste0(., "_", .)
names(amr_loco)[-1] <- new_names
write.table(amr_loco, "files_for_bgen_step2/del_amr_clean_step1_1.loco", sep=" ", row.names=FALSE, col.names=TRUE, quote=FALSE)
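
The header change applied to the .loco files can be seen on a toy vector of column names. This is a base R sketch with made-up IDs; `sub()` with an anchored pattern stands in for `str_remove()` so that the example has no package dependencies.

```r
# Toy .loco header: first entry is the FID_IID label, the rest are 0_IID sample columns
loco_names <- c("FID_IID", "0_1000001", "0_1000002", "0_1000010")

# Strip the leading "0_" from the sample columns to recover the IID
iids <- sub("^0_", "", loco_names[-1])

# Rebuild each sample column name as IID_IID; the first entry is left untouched
loco_names[-1] <- paste0(iids, "_", iids)

print(loco_names)
# values: "FID_IID" "1000001_1000001" "1000002_1000002" "1000010_1000010"
```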



In [23]:
## change pred.list files

####### NOTE
# Not needed here: new pred.list files are created for the dsub runs,
# see the 00_GWAS_pipeline_03.3_regenie_step2_ACAF_srWGS notebook.
#######

# eur
# eur_pred <- fread("regenie_out/step1/del_eur_clean_step1_pred.list", header =F)
# eur_pred$V2 <- paste0("/home/jupyter/workspaces/geneticriskfactorsfordelirium/","files_for_bgen_step2/del_eur_clean_step1_1.loco")
# write.table(eur_pred, "files_for_bgen_step2/del_eur_clean_step1_pred.list", sep=" ", row.names=F, col.names=F, quote=F)

# afr 
# afr_pred <- fread("regenie_out/step1/del_afr_clean_step1_pred.list", header =F)
# afr_pred$V2 <- paste0("/home/jupyter/workspaces/geneticriskfactorsfordelirium/","files_for_bgen_step2/del_afr_clean_step1_1.loco")
# write.table(afr_pred, "files_for_bgen_step2/del_afr_clean_step1_pred.list", sep=" ", row.names=F, col.names=F, quote=F)

# amr
# amr_pred <- fread("regenie_out/step1/del_amr_clean_step1_pred.list", header =F)
# amr_pred$V2 <- paste0("/home/jupyter/workspaces/geneticriskfactorsfordelirium/","files_for_bgen_step2/del_amr_clean_step1_1.loco")
# write.table(amr_pred, "files_for_bgen_step2/del_amr_clean_step1_pred.list", sep=" ", row.names=F, col.names=F, quote=F)


**Save to bucket:**

In [34]:
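# copy all updated files to the workspace bucket (trailing slash marks the destination as a folder)
system(paste0("gsutil cp files_for_bgen_step2/* ", my_bucket, "/data/files_for_bgen_step2_all/"), intern=TRUE)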
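
As a sketch of how the copy could be followed by a listing of the destination, the commands can be built as strings first and inspected before running. The bucket name below is a placeholder, not the real workspace bucket, and nothing is executed in this example.

```r
# Placeholder bucket for illustration only; in the notebook this comes from WORKSPACE_BUCKET
my_bucket <- "gs://example-bucket"

# Destination folder, with a trailing slash so it is treated as a folder
dest <- paste0(my_bucket, "/data/files_for_bgen_step2_all/")

# Build the copy and list commands (-m runs the copy in parallel)
cp_cmd <- paste("gsutil -m cp", "files_for_bgen_step2/*", dest)
ls_cmd <- paste("gsutil ls", dest)

print(cp_cmd)
print(ls_cmd)

# To run them in the workspace: system(cp_cmd, intern = TRUE); system(ls_cmd, intern = TRUE)
```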
