Code for the Bailey Thomas ND-CNV Machine Learning Project

NADonnelly/nd_cnv_ml

ND-GC Machine Learning Project

This project contains the code used in our ND-GC variable selection machine learning paper: preprint

The data used in the study are available via the IMAGINE ID study

The Main Scripts folder contains the R scripts that do the analysis:

  • 1-Data-Preparation.R takes the raw data from a master spreadsheet and cleans them: only numeric data are selected, some variables are recoded, missing data codes are harmonised, variables where more than 90% of responses are identical are removed, variables and then participants with more than 25% missing data are removed (in that order), and highly correlated variables (correlation > 0.8) are removed. The cleaned data are saved.

  • 2-Descriptives.R contains code that makes the demographic details table

  • 3-Data-Split.R contains code that performs the initial split of the dataset into training (80%) and test (20%) data, stratified by group, gender and age

  • 4-PLS.R performs principal components analysis and (sparse) Partial Least Squares Discriminant Analysis

  • 5-ML-All-Variables.R fits machine learning models to the full set of variables using nested cross validation

  • 6-Variable-Importance.R determines variable importance with permutation testing and selects the most important variables from the models

  • 7-ML-Selected-Variables.R re-fits ML models with the reduced variable sets

  • 8-Model-Evaluation.R fits the best-performing models, with the final sets of variables and final model hyperparameters, and evaluates them on the held-out test data

  • 9-Variable-Dimensions.R uses exploratory graph analysis (EGA) to investigate the underlying dimensional structure of the variables selected as most important for ND-GC classification.

  • 10-ML-Final.R takes the minimal set of variables from the dimensions identified by EGA, fits the ML classification models using only these variables, and measures their performance

  • 11-Model-Deployment.R contains code for making models for the accompanying shiny app
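
The cleaning filters described for 1-Data-Preparation.R can be sketched in base R roughly as follows. This is a minimal illustration and not the repository's code: the function name `clean_data`, its argument names, and the toy data are hypothetical, and two details are assumptions (the correlation threshold is applied to the absolute Pearson correlation, and of each highly correlated pair the later column is the one dropped).

```r
# Hypothetical helper, not taken from the repository: applies the filters
# described for 1-Data-Preparation.R to a data frame `df`.
clean_data <- function(df, max_same = 0.90, max_missing = 0.25, max_cor = 0.80) {
  # Keep numeric columns only
  df <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]

  # Drop variables where a single value makes up > 90% of non-missing responses
  top_share <- vapply(df, function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) return(1)
    max(table(x)) / length(x)
  }, numeric(1))
  df <- df[, top_share <= max_same, drop = FALSE]

  # Drop variables first, then participants, with > 25% missing values
  df <- df[, colMeans(is.na(df)) <= max_missing, drop = FALSE]
  df <- df[rowMeans(is.na(df)) <= max_missing, , drop = FALSE]

  # Assumption: drop any column whose absolute correlation with an
  # earlier column exceeds the threshold
  r <- abs(cor(df, use = "pairwise.complete.obs"))
  r[lower.tri(r, diag = TRUE)] <- 0
  drop <- apply(r, 2, function(col) any(col > max_cor, na.rm = TRUE))
  df[, !drop, drop = FALSE]
}

# Toy data (invented for illustration only)
toy <- data.frame(
  a     = as.numeric(1:50),
  b     = rep(c(1, 2, 3, 4, 5), 10),
  same  = c(rep(0, 48), 1, 2),        # 96% identical responses
  gaps  = c(rep(NA, 20), seq(1, 30)), # 40% missing
  label = rep("x", 50)                # non-numeric
)
toy$a_copy <- toy$a + rep(c(0.1, -0.1), 25)  # near-duplicate of `a`

cleaned <- clean_data(toy)
names(cleaned)
```

On the toy data only `a` and `b` survive: `label` is non-numeric, `same` is nearly constant, `gaps` has too many missing values, and `a_copy` is almost perfectly correlated with `a`.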

The cnv_ml_app folder contains the app (app.R) and the accompanying model files
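
As a rough illustration of the 80/20 stratified split described for 3-Data-Split.R, the sketch below samples 80% of rows within each combination of the stratification variables. It is not the repository's code: `stratified_split`, the column names, the group labels, and the use of a pre-binned `age_band` column (age has to be discretised before it can define strata) are all assumptions made for the example.

```r
# Hypothetical helper, not from the repository: splits `df` into training and
# test sets, sampling `prop` of the rows within each stratum.
stratified_split <- function(df, strata_cols, prop = 0.8, seed = 42) {
  set.seed(seed)
  # One stratum per observed combination of the stratification columns
  strata <- interaction(df[strata_cols], drop = TRUE)
  train_idx <- unlist(lapply(split(seq_len(nrow(df)), strata), function(idx) {
    n_train <- round(prop * length(idx))
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  test_idx <- setdiff(seq_len(nrow(df)), train_idx)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Toy data (invented): 8 strata with 10 participants each
toy_split <- expand.grid(
  group    = c("g1", "g2"),
  gender   = c("F", "M"),
  age_band = c("younger", "older"),
  rep      = 1:10
)

parts <- stratified_split(toy_split, c("group", "gender", "age_band"))
nrow(parts$train)  # 64 rows: 8 from each of the 8 strata
nrow(parts$test)   # 16 rows: 2 from each of the 8 strata
```

Sampling within strata keeps the proportions of group, gender and age band the same in the training and test sets, which a simple random split would not guarantee for small subgroups.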
