Scripts for producing the figures of the analysis of the phenotypic and molecular data of the TRANSLATE-NAMSE project. The individual R-scripts (located in the folder workflow/scripts/) were integrated in an automated snakemake pipeline to facilitate regeneration of analyses during data cleaning. Individual R-scripts are named according to the figures for which they yield the data.
You can either run the whole pipeline, or you can run scripts individually. If you are looking for our tool YieldPred please visit the following website: https://translate-namse.de/ . The scripts for conducting the PEDIA analyses can be found in the following separate repository: https://github.com/igsb/PEDIA-TNAMSE
Clone the repository, then install the tnamse conda environment defined in workflow/envs/tnamse.yaml . Note: If you are using macOS on an Apple Silicon system, see the comment in the code below.
git clone https://github.com/Ax-Sch/TNAMSE_geno_pheno.git
cd TNAMSE_geno_pheno
conda create -n tnamse --yes
conda activate tnamse
# if you are using macOS on an Apple Silicon system, uncomment the following line:
#conda config --env --set subdir osx-64
conda env update --file workflow/envs/tnamse.yaml
The R packages webr and moonBook have to be installed manually, as they are not available via conda. Install them in R as follows:
- Run R:
conda activate tnamse
R
- Then run the following code within R:
if(!require(devtools)) install.packages("devtools", repos='http://cran.us.r-project.org')
devtools::install_github("cardiomoon/moonBook", upgrade="never")
devtools::install_github("cardiomoon/webr", upgrade="never")
quit(save="no")
To run the code "figure1_part2.R" / snakemake rule "figure1_part2" as well as snakemake rule "supporting_ccds_length" / code "supp_CCDS_process_files.R" and "supp_CCDS_plot.R", files from OMIM are required. You therefore need to acquire a license for data download from OMIM (see https://omim.org/downloads ) if you would like to run this code. Once you have obtained the license, use the third word of the heading of OMIM's download page (the heading starts with "Download / ") as the password to access the following link: https://uni-bonn.sciebo.de/s/EZcIsUYItdBSiJw . If this is unclear or does not work, please get in touch.
You should then save the following files to "resources/":
- genemap2_15_07_2021.txt
- genes_to_phenotype.txt
- mimTitles.txt
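Before starting the pipeline, it can help to confirm that the three files listed above are in place. The following is a minimal sketch, not part of the repository: the function name `check_omim_files` is hypothetical, and it assumes the files keep exactly the names given above.

```shell
# Hypothetical helper (not part of the repository): report which of the
# three OMIM-derived files are missing from a folder (default: resources).
check_omim_files() {
  omim_dir="${1:-resources}"
  omim_missing=0
  for omim_file in genemap2_15_07_2021.txt genes_to_phenotype.txt mimTitles.txt; do
    if [ ! -f "$omim_dir/$omim_file" ]; then
      echo "missing: $omim_dir/$omim_file"
      omim_missing=1
    fi
  done
  # Exit status 0 if all files are present, 1 otherwise
  return "$omim_missing"
}
```

Run from the repository root, `check_omim_files resources && echo "all OMIM files present"` prints the confirmation only if nothing is missing.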
Then modify the file config.yaml to point to the locations where the TRANSLATE NAMSE data are stored.
Now you should be able to run snakemake by executing:
conda activate tnamse
snakemake -c1 --use-conda # you can increase the 1 behind -c to the number of threads you want to use
The naming within the workflow is related to the accompanying manuscript (figures/suppl figures). Here you can see the graph of the rules and dependencies:
The output will be placed into the folder "results/".
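To preview which rules would be executed without running anything, snakemake offers a dry run. The sketch below wraps it in a small function; the name `dry_run` is an assumption made for illustration, while `-n` (dry run) and `-p` (print the shell commands) are standard snakemake options.

```shell
# Hypothetical wrapper (not part of the repository): show what snakemake
# would do, without executing any rule.
dry_run() {
  # -n: dry run, -p: print the shell commands of each rule,
  # -c: number of cores (defaults to 1 if no argument is given)
  snakemake -n -p -c"${1:-1}" --use-conda
}
```

With the tnamse environment activated, call `dry_run` from the repository root; `dry_run 4` does the same with four cores.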
If you would like to play around with the data you can do the following:
Go to the directory "workflow/scripts/". Here you can run individual scripts.
However, in every script you have to skip the line "args = commandArgs(trailingOnly=TRUE)".
Most scripts require the output of the two scripts "parse_table.Rmd" and "supp_solved_cases.Rmd" - it is therefore reasonable to run these first.
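To find the "commandArgs" line in each script before running it interactively, a recursive grep over the scripts folder is one option. This is only a sketch: the function name `find_args_lines` is hypothetical, and it assumes that the line contains the literal text `commandArgs(`.

```shell
# Hypothetical helper (not part of the repository): list every line in a
# folder (default: workflow/scripts) that reads command-line arguments.
find_args_lines() {
  # -r: search the folder recursively, -n: print line numbers,
  # -F: treat the pattern as a fixed string rather than a regex
  grep -rnF 'commandArgs(' "${1:-workflow/scripts}"
}
```

Each output line has the form `path:line_number:matching line`, which shows where to skip in the corresponding script.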
You can find additional information on the files in the resources folder in this additional README file.
If you run into issues, please raise a GitHub issue. If you have questions, just get in touch.
This code accompanies the manuscript "Next-generation phenotyping integrated in a national framework for patients with ultra-rare disorders improves genetic diagnostics and yields new molecular findings", which should be referenced when using this data.