dclarkboucher/mediation_DNAm
This repository contains code for implementing the analysis from our manuscript "Methods for Mediation Analysis with High-Dimensional DNA Methylation Data: Possible Choices and Comparison". R version should be >= 2.1.0. Details on the required packages are located in the relevant scripts. When running R code, make sure that your working directory is the directory where this ReadMe is stored. In RStudio, you can do this by loading the "mediation_DNAm.Rproj" R project. If you have questions about the code in this repository or issues implementing the analysis, contact me at dclarkboucher@fas.harvard.edu.

SIMULATION STUDY:

The first step in the simulation study is to generate the necessary data with the R script "simulation_scripts/generate_data.R". When using this script, set the "ndat" parameter to 100 to replicate the entire study (extremely computationally costly), or to 1 to produce results for just a single simulated dataset in each setting (24 datasets in total).

The second step is to run the R scripts "simulation_scripts/pathway_lasso.R", "simulation_scripts/hima_hdma_medfix_pcma_hilma.R", "simulation_scripts/one-at-a-time.R", and "simulation_scripts/bslmm.R". These scripts require additional R packages from CRAN and GitHub. Because the methods vary in run time and there are many simulated datasets, we strongly recommend using parallel computing on a remote cluster, for which our scripts can be easily adapted.

The third step is to run "simulation_scripts/true_positive_rate_mse.R", which calculates the true positive rates for detecting active mediators and the MSE for estimating mediation contributions, and "simulation_scripts/percent_relative_bias.R", which calculates the percent relative bias in estimating the total indirect effect. The resulting datasets were used directly to make manuscript tables 3-6 and supplementary tables 1-4.
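As a rough illustration of the three summary metrics named in the third step, the sketch below computes them on made-up inputs. The function names, the toy values, and the exact form of the percent relative bias (100 * (mean estimate - truth) / truth) are assumptions for illustration; the repository's scripts may define these quantities differently.

```r
# Illustrative helpers only; these names are hypothetical and are not taken
# from the repository's scripts.

# Share of the truly active mediators that a method selected.
true_positive_rate <- function(selected, active) {
  sum(selected & active) / sum(active)
}

# Mean squared error of estimated mediation contributions.
mean_squared_error <- function(estimate, truth) {
  mean((estimate - truth)^2)
}

# Percent relative bias of total indirect effect estimates across datasets
# (assumed definition: 100 * (mean estimate - truth) / truth).
percent_relative_bias <- function(estimates, truth) {
  100 * (mean(estimates) - truth) / truth
}

# Toy inputs, made up for illustration.
active   <- c(TRUE, TRUE, FALSE, FALSE)   # which mediators are truly active
selected <- c(TRUE, FALSE, TRUE, FALSE)   # which mediators a method flagged

tpr <- true_positive_rate(selected, active)
mse <- mean_squared_error(c(0.5, 0, 0.25, 0), c(0.5, 0.5, 0, 0))
prb <- percent_relative_bias(c(0.9, 1.1, 1.3), truth = 1)
```

In the toy example, one of the two active mediators is selected, so the true positive rate is one half.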
OBSERVED DNAm DATA ANALYSIS:

Data used in this analysis can be obtained through the MESA Data Coordinating Center (https://www.mesanhlbi.org/). Since we cannot make MESA's data publicly available, we instead use a simulated dataset made to resemble the observed methylation data.

The first step is to run "dnam_scripts/generate_fake_dnam.R" to create this toy DNAm dataset.

The second step is to run "dnam_scripts/fit_single_mediator_models.R", which fits linear mixed models for screening the CpG sites down to the subset of 2,000 that were used in the analysis. Then run "dnam_scripts/process_single_med_results.R" to process the output files.

The third step is to run "dnam_scripts/regress_out_random_effects.R" to regress the random effect covariates out of the mediators. This is necessary because none of the high-dimensional mediation methods can directly handle random effects as covariates, whereas a few of them can handle fixed effects.

The fourth step is to run our files for implementing the methods. This can be done with the master script "dnam_scripts/implement_methods_master.R", which runs the many needed subscripts located in the folder "dnam_scripts/implement_methods". Alternatively, the subscripts can be run one at a time, which may be more practical since running them all at once would be quite slow.

Once all the methods have been run, the fifth and final step is to run the scripts "dnam_scripts/identify_noteworthy_cpgs.R", "dnam_scripts/estimate_mediation_effect.R", and "dnam_scripts/read_hdmm_spcma.R", which produce, respectively, manuscript table 1 and supplementary file 1; manuscript table 2; and the results necessary for interpreting SPCMA and HDMM.
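For the fourth step, one possible way to run the subscripts individually while keeping track of failures is sketched below. The helper "run_subscripts" is hypothetical and not part of the repository. It assumes each subscript can be sourced on its own in a fresh environment, which may not hold if the subscripts rely on objects created by the master script.

```r
# Hypothetical helper: source every R script in a folder, one at a time,
# and record which ones failed instead of stopping at the first error.
run_subscripts <- function(dir) {
  scripts <- sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
  ok <- logical(length(scripts))
  names(ok) <- basename(scripts)
  for (i in seq_along(scripts)) {
    message("Running ", scripts[i])
    ok[i] <- tryCatch({
      # Evaluate each script in its own environment (an assumption; the
      # actual subscripts may expect to share objects).
      source(scripts[i], local = new.env())
      TRUE
    }, error = function(e) {
      message("  failed: ", conditionMessage(e))
      FALSE
    })
  }
  ok
}

# Usage, from the directory where the ReadMe is stored:
# status <- run_subscripts("dnam_scripts/implement_methods")
# names(status)[!status]  # subscripts that need to be rerun
```

Recording failures in a named logical vector makes it easy to rerun only the subscripts that did not finish, which matters when individual methods are slow.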