ABCD-brain-network-SCCA

This repository contains code and synthetic data for the research paper titled "Multivariate patterns between brain network properties, polygenic scores, phenotypes, and environment in preadolescents" by Seo et al. (2023). The paper is available as a medRxiv preprint at: https://www.medrxiv.org/content/10.1101/2023.07.24.23293075v1

SCCA Code

This section describes the codes for Sparse Canonical Correlation Analysis (Witten et al., 2009) used in this paper and and its visualization. The code is organized into different scripts:

scca_pipeline_example.sh: A pipeline for SCCA computation and visualization.

Codes for computation:

regress_out_train_test_split.py: Code for regressing out potential confounding effects of covariates before SCCA.
SCCA_hyper_param.py: Code for hyper-parameter tuning in SCCA.
SCCA_permutation_test.py: Code for performing permutation tests.
SCCA_bootstrap.py: Code for performing SCCA with bootstrap samples.
SCCA_bootstrap_result_summary.py: Code for summarizing bootstrap results.

Codes for visualization:

hyper_param_result_visualizer.py: Visualization code for hyper-parameter tuning results.
non_BNM_vis.py: Visualization code for SCCA result (loading) of polygenic scores, phenotypic outcomes, and environmental factors.
brain_visualize.R: Visualization code for SCCA result of nodal brain network measures.
bootstrap_vis_global_BNM.py: Visualization code for SCCA result of global brain network measures.

Data

DTI_node_coordinate.csv, ENV_Pheno_separate.csv, ROI_list.csv : Data required to visualize SCCA results
cov_synthetic.csv : Covariates data
pgs_synthetic.csv, env_synthetic.csv, pheno_synthetic.csv, BNM_1-4_synthetic.csv : Main data

Since the NDA prohibits the distribution of ABCD data, we are sharing synthetic data in place of real data. The original data is EXCLUSIVELY accessible through the NDA (https://nda.nih.gov/abcd/). This synthetic data was produced using conditional GAN for tabular data, or CTGAN, which learns the distribution of real data and generates synthetic data that emulates the probabilistic distribution (Xu et al., 2019). The synthetic data showed an overall quality score of 91.49%, indicating an overall similarity of column and column pair shapes between real and synthetic data is 0.9149. However, it is important to note that this synthetic data does not guarantee identical analysis results. It may lead to different statistical significance and effect sizes. Hyperparameters were optimized using Optuna (Akiba et al., 2019).

Reference

Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2623-2631).

Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534. https://doi.org/10.1093/biostatistics/kxp008

Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling tabular data using conditional gan. Advances in Neural Information Processing Systems, 32.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
BNM_1_synthetic.csv		BNM_1_synthetic.csv
BNM_2_synthetic.csv		BNM_2_synthetic.csv
BNM_3_synthetic.csv		BNM_3_synthetic.csv
BNM_4_synthetic.csv		BNM_4_synthetic.csv
DTI_node_coordinate.csv		DTI_node_coordinate.csv
ENV_Pheno_separate.csv		ENV_Pheno_separate.csv
README.md		README.md
ROI_list.csv		ROI_list.csv
SCCA_bootstrap.py		SCCA_bootstrap.py
SCCA_bootstrap_result_summary.py		SCCA_bootstrap_result_summary.py
SCCA_hyper_param.py		SCCA_hyper_param.py
SCCA_permutation_test.py		SCCA_permutation_test.py
bootstrap_vis_global_BNM.py		bootstrap_vis_global_BNM.py
brain_visualize.R		brain_visualize.R
cov_synthetic.csv		cov_synthetic.csv
env_synthetic.csv		env_synthetic.csv
hyper_param_result_visualizer.py		hyper_param_result_visualizer.py
non_BNM_vis.py		non_BNM_vis.py
pgs_synthetic.csv		pgs_synthetic.csv
pheno_synthetic.csv		pheno_synthetic.csv
regress_out_train_test_split.py		regress_out_train_test_split.py
scca_pipeline_example.sh		scca_pipeline_example.sh

Transconnectome/ABCD-brain-network-SCCA

Folders and files

Latest commit

History

Repository files navigation

ABCD-brain-network-SCCA

SCCA Code

Data

Reference

About

Resources

Stars

Watchers

Forks

Languages