ND-GC Machine Learning Project

This project contains the code used in our ND-GC variable selection machine learning paper: preprint

The data used in the study is available via the IMAGINE ID study

The Main Scripts folder contains the R scripts that do the analysis:

1-Data-Preparation.R contains code that takes the raw data from a master spreadsheet and applies a process of data cleaning: only numeric data are selected, some variables are recoded, missing data codes are harmonised, variables with > 90% of responses the same are removed, variables and participants with > 25% missing data are removed (in that order) and highly correlated (>0.8) variables are removed. The cleaned raw data are saved.
2-Descriptives.R contains code that makes the demographic details table
3-Data-Split.R contains code that performs the initial split of the dataset into training (80%) and test (20%) data, stratified by group, gender and age
4-PLS.R performs principal components analysis and (sparse) Partial Least Squares Discriminant Analysis
5-ML-All-Variables.R fits machine learning models to the full set of variables using nested cross validation
6-Variable-Importance.R determines variable importance with permutation testing and selects the most important variables from the models
7-ML-Selected-Variables.R re-fits ML models with the reduced variable sets
8-Model-Evaluation.R fits the best performing models with the final sets of variables and final model hyperparameters to the held-out test data
9-Variable-Dimensions.R uses exploratory graph analysis to investigate the underlying dimensional structure of the variables selected to be most important to ND-GC classification.
10-ML-Final.R takes the minimal set of variables from the dimensions identified by the EGA analysis and fits the ML classification models using only these variables, and measures performance
11-Model-Deployment.R contains code for making models for the accompanying shiny app

The cnv_ml_app folder contains the app (app.R) and the accompanying model files

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Main Scripts		Main Scripts
cnv_ml_app		cnv_ml_app
.gitignore		.gitignore
CNV_ML.Rproj		CNV_ML.Rproj
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ND-GC Machine Learning Project

About

Releases

Packages

Languages

NADonnelly/nd_cnv_ml

Folders and files

Latest commit

History

Repository files navigation

ND-GC Machine Learning Project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages