# folders for data list

## Overview of the datasets

The cohorts and their GWAS/QTL datasets are:
    
- GWAS: Bellenguez, Jansen, Kunkle, Wightman
- DIAN: mQTL
- EFIGA: pQTL, metaQTL 
- Knight Brain: eQTL, metaQTL, mQTL, pQTL, sQTL
- Knight CSF: metaQTL, pQTL
- MEGENTA AA: eQTL
- MEGENTA NHW: eQTL
- MIGA: eQTL
- MSBB: eQTL, mQTL, pQTL, sQTL
- PART: mQTL
- ROSMAP AC: eQTL, sQTL
- ROSMAP DLPFC: eQTL, mQTL, pQTL, sQTL, haQTL
- ROSMAP PCC: eQTL, sQTL
- ROSMAP microglia: eQTL
- ROSMAP monocyte: eQTL
- ROSMAP snRNA: eQTL (De Jager, Kellis); the Kellis data include additional subtypes: Ast10, Mic12, and Mic13
- STARNET: eQTL
- WHICAP: metaQTL

Create the folders using the following layout:
- `cohort/genotype/original`: genotype data before QC
- `cohort/genotype/analysis_ready`: genotype data after QC
- `xQTL/cohort/condition/original`: phenotype data before phenotype QC and imputation, plus the original covariates
- `xQTL/cohort/condition/analysis_ready`: phenotype and covariate data used as input to fine-mapping and tensorQTL, including genotype and hidden-covariate PCs
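
As a minimal sketch of this layout, the helpers below assemble folder paths from their parts. The function names `xqtl_folder` and `genotype_folder` are hypothetical and not part of the original script; the rule that cohorts without a separate condition drop that path level mirrors what the folder-creation script in this notebook does.

```bash
#!/bin/bash

# Hypothetical helper: build an xQTL folder path following the layout above.
# Usage: xqtl_folder <xQTL> <cohort> <condition or ""> <original|analysis_ready>
xqtl_folder() {
    local xqtl="$1" cohort="$2" condition="$3" stage="$4"
    if [[ -n "$condition" ]]; then
        printf '%s/%s/%s/%s\n' "$xqtl" "$cohort" "$condition" "$stage"
    else
        # Cohorts without a separate condition drop that path level
        printf '%s/%s/%s\n' "$xqtl" "$cohort" "$stage"
    fi
}

# Hypothetical helper: genotype folders are keyed by cohort only.
# Usage: genotype_folder <cohort> <original|analysis_ready>
genotype_folder() {
    printf '%s/genotype/%s\n' "$1" "$2"
}

xqtl_folder eQTL ROSMAP DLPFC analysis_ready   # eQTL/ROSMAP/DLPFC/analysis_ready
xqtl_folder mQTL DIAN "" original              # mQTL/DIAN/original
genotype_folder MSBB analysis_ready            # MSBB/genotype/analysis_ready
```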

## The script to make folders

```bash
#!/bin/bash
# Requires bash 4+ for associative arrays (declare -A)

# Specify your S3 bucket name
bucket_name="statfungen"
base_folder="ftp_fgc_xqtl/"

# Define cohorts
cohorts=("DIAN" "EFIGA" "Knight" "MEGENTA" "MIGA" "MSBB" "PART" "ROSMAP" "STARNET" "WHICAP")

# Define conditions for specific cohorts, if applicable
declare -A cohort_conditions
cohort_conditions=(
    ["MEGENTA"]="AA NHW"
    ["Knight"]="Brain CSF"
    ["ROSMAP"]="DLPFC AC PCC microglia monocyte DLPFC_Mic DLPFC_Exc DLPFC_Inh DLPFC_OPC DLPFC_Oli DLPFC_Ast"
)

# Define xQTL types for each combination of cohort and condition
declare -A condition_xQTLs
condition_xQTLs=(
    ["MEGENTA_AA"]="eQTL sQTL"
    ["MEGENTA_NHW"]="eQTL sQTL"
    ["Knight_Brain"]="eQTL metaQTL mQTL pQTL sQTL"
    ["Knight_CSF"]="metaQTL pQTL"
    ["ROSMAP_DLPFC"]="eQTL mQTL pQTL sQTL haQTL"
    ["ROSMAP_AC"]="eQTL sQTL"
    ["ROSMAP_PCC"]="eQTL sQTL"
    ["ROSMAP_microglia"]="eQTL"
    ["ROSMAP_monocyte"]="eQTL"
    ["ROSMAP_DLPFC_Mic"]="eQTL"
    ["ROSMAP_DLPFC_Exc"]="eQTL"
    ["ROSMAP_DLPFC_Inh"]="eQTL"
    ["ROSMAP_DLPFC_OPC"]="eQTL"
    ["ROSMAP_DLPFC_Oli"]="eQTL"
    ["ROSMAP_DLPFC_Ast"]="eQTL"
    ["DIAN"]="mQTL"
    ["EFIGA"]="pQTL metaQTL"
    ["MIGA"]="eQTL"
    ["MSBB"]="eQTL mQTL pQTL sQTL"
    ["PART"]="mQTL"
    ["STARNET"]="eQTL"
    ["WHICAP"]="metaQTL"
)

# Function to create a folder in S3
# $1: folder path relative to base_folder; the trailing "/" in the key makes S3 show it as a folder
create_s3_folder() {
    /mnt/vast/hpc/homes/rf2872/software/aws/dist/aws s3api put-object --bucket "$bucket_name" --key "${base_folder}$1/"
}

# Create directories
for cohort in "${cohorts[@]}"; do
    create_s3_folder "$cohort/genotype/original"
    create_s3_folder "$cohort/genotype/analysis_ready"

    # Check if there are specific conditions for this cohort
    if [[ -n "${cohort_conditions[$cohort]}" ]]; then
        IFS=' ' read -ra conditions <<< "${cohort_conditions[$cohort]}"
        for condition in "${conditions[@]}"; do
            cohort_condition_key="${cohort}_${condition}"
            if [[ -n "${condition_xQTLs[$cohort_condition_key]}" ]]; then
                IFS=' ' read -ra xQTLs <<< "${condition_xQTLs[$cohort_condition_key]}"
                for xQTL in "${xQTLs[@]}"; do
                    create_s3_folder "$xQTL/$cohort/$condition/original"
                    create_s3_folder "$xQTL/$cohort/$condition/analysis_ready"
                done
            fi
        done
    elif [[ -n "${condition_xQTLs[$cohort]}" ]]; then
        # For cohorts without specific conditions, use cohort directly
        IFS=' ' read -ra xQTLs <<< "${condition_xQTLs[$cohort]}"
        for xQTL in "${xQTLs[@]}"; do
            create_s3_folder "$xQTL/$cohort/original"
            create_s3_folder "$xQTL/$cohort/analysis_ready"
        done
    fi
done
```
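
Before creating anything in the bucket, it can help to preview the keys the script would write. The sketch below is one possible dry-run variant of `create_s3_folder`. The `DRY_RUN` switch is a hypothetical addition, and it assumes `aws` is on the `PATH` rather than at the absolute path used in the script above.

```bash
#!/bin/bash
bucket_name="statfungen"
base_folder="ftp_fgc_xqtl/"

# Dry-run variant of create_s3_folder.
# DRY_RUN is a hypothetical switch; it defaults to 1, so nothing is written unless DRY_RUN=0.
create_s3_folder() {
    local key="${base_folder}$1/"
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        # Only report the key that would be created
        echo "s3://${bucket_name}/${key}"
    else
        # Assumes the AWS CLI is available as `aws` on the PATH
        aws s3api put-object --bucket "$bucket_name" --key "$key"
    fi
}

create_s3_folder "ROSMAP/genotype/original"
create_s3_folder "eQTL/ROSMAP/DLPFC/analysis_ready"
```

With `DRY_RUN` unset or set to `1`, the two calls only print the S3 locations; setting `DRY_RUN=0` performs the actual `put-object` calls.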


## Analysis_ready on CU or FTP

**GWAS (hg38)**:

- Bellenguez
    - sumstats: `/mnt/vast/hpc/csg/data_public/GWAS_sumstats/GCST90027158_buildGRCh38.tsv.gz`
    
- Jansen
    - sumstats: `/mnt/vast/hpc/csg/data_public/GWAS_sumstats/20230620_Jansen/AD_sumstats_Jansenetal_2019sept.bed.lifted.passed`
    
- Kunkle (2 stages)
    - sumstats: `/mnt/vast/hpc/csg/data_public/GWAS_sumstats/20231011_Kunkle/*passed`  
    
- Wightman (4 subsets)
    - sumstats: `/mnt/vast/hpc/csg/data_public/GWAS_sumstats/20230530_Wightman/*/*passed`


**ROSMAP**:

- Bulk_DLPFC
    - eQTL:
        - pheno: `/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_DLPFC/eQTL/phenotype_preprocessing`
        - cov: `/mnt/vast/hpc/csg/ftp_lisanwanglab_sync/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_DLPFC/eQTL/covariate_file`
    - sQTL: Frank (in progress)
    - mQTL:  
        - pheno: `/ftp_fgc_xqtl/projects/methylation/BU/preprocessed_data/ROSMAP`
        - cov: `/ftp_fgc_xqtl/projects/methylation/BU/preprocessed_data/ROSMAP`
    - haQTL: **TBD**
        - original: `/mnt/mfs/ctcn/datasets/rosmap/h3k9ac/dlpfcTissue/batch1/values/counts/H3K9ac*`
    - pQTL:
        - pheno: `/mnt/vast/hpc/csg/zq2209/output6/pheno/flash`
        - cov: `/mnt/vast/hpc/csg/zq2209/output6/cov/flash_new/test.resid.Marchenko_PC.gz`
    
- Bulk_PCC
    - eQTL: 
        - pheno: `/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_PCC/eQTL/molecular_phenotype`
        - cov: `/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_PCC/covariate_file`
    - sQTL: Frank (in progress)
    
- Bulk_AC
    - eQTL: 
        - pheno: `/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_AC/eQTL/molecular_phenotype`
        - cov: `/ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_AC/covariate_file`
    - sQTL: Frank (in progress)
    
- PseudoBulk
    - De Jager
        - Ast eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/Ast/output/data_preprocessing`
        - Mic eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/Mic/output/data_preprocessing`
        - Exc eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/Exc/output/data_preprocessing`
        - Inh eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/Inh/output/data_preprocessing`
        - OPC eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/OPC/output/data_preprocessing`
        - Oli eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl/Oli/output/data_preprocessing`
    - Kellis
        - Ast eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Ast`
        - Mic eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Mic`
        - Exc eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Exc`
        - Inh eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Inh`
        - OPC eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/OPC`
        - Oli eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Oli`
        - Cell subtype Ast10 eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Ast.10`
        - Cell subtype Mic12 eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Mic.12`
        - Cell subtype Mic13 eQTL: `/mnt/vast/hpc/csg/wanggroup/fungen-xqtl-analysis/analysis/Wang_Columbia/ROSMAP/pseudo_bulk_eqtl_kelli/Mic.13`
- microglia 
    - eQTL: Travyse    
    
- monocyte 
    - eQTL: Travyse

**DIAN**:

- mQTL: **TBD**

**EFIGA**:

- pQTL: Zining (**TBD**)
- metaQTL: Zining (**TBD**)

**Knight**:
- Brain:
    - eQTL: 
      - pheno: `/ftp_fgc_xqtl/projects/knight/PHENOTYPE`
      - cov: `/ftp_fgc_xqtl/projects/knight/COVARIATE`
    - metaQTL: NAN
    - mQTL: Alexandre 
    - pQTL:  
        - pheno: `/mnt/vast/hpc/csg/zq2209/imputation/data/knight/flashier/`
        - cov: `/mnt/vast/hpc/csg/zq2209/imputation/data/knight/flashier_cov/knight_flashier.WashU_cov.MAP_Brain-xQTL_Gwas_geno_0.1_maf_0.0005.pQTL.related.filtered.extracted.pca.projected.Marchenko_PC.gz`
    - sQTL: **TBD**
    
- CSF:
    - metaQTL: NAN
    - pQTL: Zining

**MSBB**:   

- MSBB
    - eQTL:  
        - pheno: Minghui
        - cov: `/ftp_fgc_xqtl/projects/rna-seq/MSBB/eQTLs/cis_association/covariate`   
    - mQTL:  
        - pheno: `/ftp_fgc_xqtl/projects/methylation/BU/preprocessed_data/MSBB`
        - cov: `/ftp_fgc_xqtl/projects/methylation/BU/preprocessed_data/MSBB`   
    - pQTL: Minghui    
    - sQTL:     
        - leafcutter:
            - pheno: `/ftp_fgc_xqtl/projects/rna-seq/MSBB/leaf_cutter`
            - cov: Minghui
        - psichomics: Minghui
            - pheno: Minghui
            - cov: Minghui

**PART**:

- mQTL: Alexandre

**STARNET**:

- eQTL: Travyse

**WHICAP**:

- metaQTL: Zining (**TBD**)

### association results only

**MEGENTA**:


FIXME (note from Makaela Mews): "Individual-level data needs to be shared by our Miami collaborators, or there needs to be a data use agreement. I'll contact my PI now and see if he was able to reach out to Gao."

- AA:
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MAGENTA/summary_statistics`
    - sQTL: Makaela Mews (in progress; will follow up in week 5 of 2024)
- NHW:
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MAGENTA/summary_statistics`
    - sQTL: Makaela Mews (in progress; will follow up in week 5 of 2024)
    

**MIGA**:


Travyse has been asked to upload the pheno and cov data.

- MIGA
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MSSM/MiGA/TensorQTL_Summary_Statistics/10-02-2022`

## Analysis_ready on cloud
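
The xQTL locations in this section follow the `xQTL/cohort/condition/analysis_ready` layout under `s3://statfungen/ftp_fgc_xqtl/`. As a hedged sketch of how a CU or FTP folder could be matched to its S3 prefix, the hypothetical helper `sync_cmd` below only prints an `aws s3 sync ... --dryrun` command for review. Whether the data were actually transferred this way is not stated in this notebook, and the helper assumes `aws` is on the `PATH`.

```bash
#!/bin/bash
bucket_name="statfungen"
base_folder="ftp_fgc_xqtl"

# Hypothetical helper: print (not run) a dry-run sync command for one data folder.
# Usage: sync_cmd <local_dir> <xQTL> <cohort> <condition> <subfolder>
sync_cmd() {
    local src="$1" xqtl="$2" cohort="$3" condition="$4" sub="$5"
    local dest="s3://${bucket_name}/${base_folder}/${xqtl}/${cohort}/${condition}/analysis_ready/${sub}/"
    # --dryrun makes `aws s3 sync` list the planned transfers without copying anything
    printf 'aws s3 sync "%s" "%s" --dryrun\n' "$src" "$dest"
}

sync_cmd /ftp_fgc_xqtl/projects/rna-seq/BU/ROSMAP_DLPFC/eQTL/phenotype_preprocessing \
    eQTL ROSMAP DLPFC phenotype_preprocessing
```

Printing the command rather than executing it keeps the sketch free of side effects; the printed line can be run by hand once the source and destination look right.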

**GWAS (hg38)**:

- Bellenguez
    - sumstats: `s3://statfungen/GWAS/Bellenguez/sumstats/`
    
- Jansen
    - sumstats: `s3://statfungen/GWAS/Jansen/sumstats/`
    
- Kunkle (2 stages)
    - sumstats: `s3://statfungen/GWAS/Kunkle/sumstats/`  
    
- Wightman (4 subsets)
    - sumstats: `s3://statfungen/GWAS/Wightman/sumstats/`


**ROSMAP**:

- Bulk_DLPFC
    - eQTL:
        - pheno: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC/analysis_ready/covariate_preprocessing/`
    - sQTL: Frank (in progress)
    - mQTL:  
        - pheno: `s3://statfungen/ftp_fgc_xqtl/mQTL/ROSMAP/DLPFC/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/mQTL/ROSMAP/DLPFC/analysis_ready/covariate_preprocessing/`
    - haQTL: **TBD**
        - original: `s3://statfungen/ftp_fgc_xqtl/haQTL/ROSMAP/DLPFC/original/`
    - pQTL:
        - pheno: `s3://statfungen/ftp_fgc_xqtl/pQTL/ROSMAP/DLPFC/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/pQTL/ROSMAP/DLPFC/analysis_ready/covariate_preprocessing/`
    
- Bulk_PCC
    - eQTL: 
        - pheno: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/PCC/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/PCC/analysis_ready/covariate_preprocessing/`
    - sQTL: Frank (in progress)
    
- Bulk_AC
    - eQTL: 
        - pheno: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/AC/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/AC/analysis_ready/covariate_preprocessing/`
    - sQTL: Frank (in progress)
    
- PseudoBulk
    - De Jager
        - Ast eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_Ast/analysis_ready/`
        - Mic eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_Mic/analysis_ready/`
        - Exc eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_Exc/analysis_ready/`
        - Inh eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_Inh/analysis_ready/`
        - OPC eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_OPC/analysis_ready/`
        - Oli eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/DLPFC_Oli/analysis_ready/`
    - Kellis
        - Ast eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Ast_Kelli/analysis_ready/`
        - Mic eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Mic_Kelli/analysis_ready/`
        - Exc eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Exc_Kelli/analysis_ready/`
        - Inh eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Inh_Kelli/analysis_ready/`
        - OPC eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/OPC_Kelli/analysis_ready/`
        - Oli eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Oli_Kelli/analysis_ready/`
        - Cell subtype Ast10 eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Ast_10_Kelli/analysis_ready/`
        - Cell subtype Mic12 eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Mic_12_Kelli/analysis_ready/`
        - Cell subtype Mic13 eQTL: `s3://statfungen/ftp_fgc_xqtl/eQTL/ROSMAP/Mic_13_Kelli/analysis_ready/`
- microglia 
    - eQTL: Travyse    
    
- monocyte 
    - eQTL: Travyse

**DIAN**:

- mQTL: TBD

**EFIGA**:

- pQTL: Zining (TBD)
- metaQTL: Zining (TBD)

**Knight**:
- Brain:
    - eQTL: 
      - pheno: `s3://statfungen/ftp_fgc_xqtl/eQTL/Knight/Brain/analysis_ready/phenotype_preprocessing/`
      - cov: `s3://statfungen/ftp_fgc_xqtl/eQTL/Knight/Brain/analysis_ready/covariate_preprocessing/`
    - metaQTL: **TBD**
    - mQTL: Alexandre (TBD)
    - pQTL:  
        - pheno: `s3://statfungen/ftp_fgc_xqtl/pQTL/Knight/Brain/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/pQTL/Knight/Brain/analysis_ready/covariate_preprocessing/`
    - sQTL: **TBD**
    
- CSF:
    - metaQTL: **TBD**
    - pQTL: Zining (TBD)

**MSBB**:   

- MSBB
    - eQTL:  
        - pheno: Minghui
        - cov: Minghui 
    - mQTL:  
        - pheno: `s3://statfungen/ftp_fgc_xqtl/mQTL/MSBB/analysis_ready/phenotype_preprocessing/`
        - cov: `s3://statfungen/ftp_fgc_xqtl/mQTL/MSBB/analysis_ready/covariate_preprocessing/`   
    - pQTL: Minghui    
    - sQTL:     
        - leafcutter: 
            - pheno: Minghui
            - cov: Minghui
        - psichomics: Minghui
            - pheno: Minghui
            - cov: Minghui

**PART**:

- mQTL: Alexandre

**STARNET**:

- eQTL: Travyse

**WHICAP**:

- metaQTL: Zining (TBD)

### association results only

**MEGENTA**:


Makaela Mews:
- AA:
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MAGENTA/summary_statistics`
    - sQTL: Makaela Mews (in progress; will follow up in week 5 of 2024)
- NHW:
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MAGENTA/summary_statistics`
    - sQTL: Makaela Mews (in progress; will follow up in week 5 of 2024)
    

**MIGA**:


Travyse:
- MIGA
    - eQTL: `/ftp_fgc_xqtl/projects/rna-seq/MSSM/MiGA/TensorQTL_Summary_Statistics/10-02-2022`