Automation and Localisation

This is an outline of how I automate R programs for clinical trial analysis pipelines (streamlining report generation), and of my current solution for "localising" projects stored on a network location to a local drive and syncing output back to the network.

Automation

Problem: reports that need regular updating, e.g. monitoring of events and data issues.

  1. All analyses are performed within RStudio Project folders within "pgm".
  • UpdateData.bat: batch file to copy the source data files to the project "data" folder
xcopy "T:\DREAM3R\SOURCE\*.sas7bdat"  ".\SOURCE\" /y /S /d
xcopy "T:\DREAM3R\CLEAN\*.sas7bdat"   ".\CLEAN\" /y /S /d
copy "T:\DREAM3R\_Documentation\ALS.xls"  "ALS.xls" /d
  2. A program which loads the source data, creates most derived datasets, and builds some objects for easy reuse within the report files (e.g. graphs, complex tables).

SetupData.R

library(tidyverse)
library(readxl)
library(haven)
library(lubridate)
library(dplyr)
library(tidyr)
library(Hmisc)


# Load raw data  ----------------------------------------------------------

rm(list = ls())              # start from a clean workspace
data = '../../data/source'   # location of the raw SAS exports

source('load_queries.r')

files = list.files(data, pattern = "\\.sas7bdat$", all.files = FALSE, full.names = FALSE)
files

metadata_fields = openxlsx::read.xlsx("../../data/ALS.xlsx",sheet="Fields")
metadata_forms = openxlsx::read.xlsx("../../data/ALS.xlsx",sheet="Forms")

excluded_dates = c("DRNOW","TANOW","TLNEWNOW","TRANSDAT","LBCOLDATTIM","LBPBMCDATFRZ","MinCreated","MaxUpdated",
                   "SaveTS","RecordDate",
                   "HUDRNOW","HUDRNOW_INT",
                   "DRNOW_INT","TANOW_INT",
                   "IECSPICVDAT","IECSPICVDAT_INT",
                   "RIREVDAT","RIREVDAT_INT",
                   "RISUBDAT","RISUBDAT_INT","BRTSSDAT","BRTSRDAT","LBSSDAT","RISCANDAT","RIORDAT","RIREVAVDAT")

# First source dataset
f = read_sas(paste(data, files[1], sep = "/"))

# Latest date/time recorded for each subject, across all date-time fields
temp = f %>% select(Subject, which(sapply(., is.POSIXct))) %>%
  gather(variable, date, -Subject) %>%
  group_by(Subject) %>%
  summarise(maxdate = max(date, na.rm = TRUE), .groups = "keep")

# Modification time of the source extract
datasetTimeStamp = file.info(paste(data, files[1], sep = "/"))$mtime

This is followed by my processing and data derivation steps. Finally, the datasets are saved with a date / git version stamp.

save.image("../../Data/derived/DREAM3R_datasets.RData")
save.image(paste0("../../Data/derived/DREAM3R_datasets_",as.character(today()),".RData"))
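To include the git version alongside the date, one option (a sketch only; it assumes git is on the PATH and the program folder sits inside the repository) is to capture the current commit hash and append it to the file name:

# Sketch: record the short hash of the current commit so derived datasets can be
# traced back to the program version that produced them (assumes git is on the
# PATH and this folder is inside the repository; falls back to "nogit" otherwise)
git_hash = tryCatch(system("git rev-parse --short HEAD", intern = TRUE),
                    error = function(e) "nogit")

save.image(paste0("../../Data/derived/DREAM3R_datasets_",
                  as.character(today()), "_", git_hash, ".RData"))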
  3. Knit.R: program to run SetupData.R and render all R Markdown report documents to the output "out" directory.
# Run data computation programs
source('_setupData.r')


# Generate reports
rmarkdown::render("DREAM3R_dashboard.rmd",
                  output_format = "all",
                  output_dir="../../out/"
)
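The dashboard above is just one of several reports; a minimal sketch for rendering every R Markdown file in the project folder in one pass (assuming each .rmd in the folder is a standalone report) could look like:

# Sketch: render every .rmd in this folder to the shared output directory
# (assumes each .rmd here is a standalone report)
reports = list.files(".", pattern = "\\.rmd$", ignore.case = TRUE)
for (rpt in reports) {
  rmarkdown::render(rpt, output_format = "all", output_dir = "../../out/")
}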
  4. Distribute_output.bat: batch file (saved in the output folder) to copy reports to shared network locations and cloud storage, for easy access to the latest results when in meetings.
copy "DREAM3R_dashboard.html" "X:\Oncology\ALTG\DREAM3R\11 Statistics\11.04 Report\11.04.01 Interim Statistical Reports\DREAM3R_Monitoring.html" 
copy "DREAM3R_DataIssues.html" "X:\Oncology\ALTG\DREAM3R\11 Statistics\11.04 Report\11.04.01 Interim Statistical Reports\DREAM3R_DataIssues.html" 
copy "DREAM3R_RECIST.html" "X:\Oncology\ALTG\DREAM3R\11 Statistics\11.04 Report\11.04.01 Interim Statistical Reports\DREAM3R_RECIST.html" 

xcopy "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\out" "C:\Users\%USERNAME%\OneDrive - The University of Sydney (Staff)\TOGA\DREAM3R\" /y /S 
  5. Batch file to enable scheduling of programs. Note: this needs to be saved on a local drive (C:/D:) as Task Scheduler often can't access files on a network drive.
  • Update.bat: batch file to run all of the above steps.
cd "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\Data" 
call _updateData.bat

cd "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\pgm"
call update.bat

cd "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\out"
call _copyToSMP.bat
  6. Configure "Task Scheduler" to run the report process on a regular schedule (I usually update everything daily, just after the data has been refreshed).

Localisation

Problem: opening data and running programs stored on a network location can be slow. RStudio projects perform better when run from a local directory.

We were spoiled with a fast local network for years, but in the era of cloud storage our network drives have been moved offsite.

Solution: sync data and supporting files to the local drive via a custom batch file, with a similar batch file to return updated data / output to the network location.

Project skeleton

  • Batch file to create local copy of project folder structure
if not exist "C:\TRIAL_DATA\DREAM3R" mkdir "C:\TRIAL_DATA\DREAM3R" 
if not exist "C:\TRIAL_DATA\DREAM3R\data" mkdir "C:\TRIAL_DATA\DREAM3R\data" 
if not exist "C:\TRIAL_DATA\DREAM3R\out" mkdir "C:\TRIAL_DATA\DREAM3R\out" 
if not exist "C:\TRIAL_DATA\DREAM3R\pgm" mkdir "C:\TRIAL_DATA\DREAM3R\pgm" 
if not exist "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_Monitoring" mkdir "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_Monitoring" 
if not exist "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_Patients" mkdir "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_Patients" 
if not exist "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_StudyReport" mkdir "C:\TRIAL_DATA\DREAM3R\pgm\DREAM3R_StudyReport" 

xcopy "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\data" "C:\TRIAL_DATA\DREAM3R\data"  /s /d /y
copy "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\DREAM3R_Y_drive.lnk" "C:\TRIAL_DATA\DREAM3R\DREAM3R_Y_drive.lnk" /y
copy "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\DREAM3R_createLocal.bat" "C:\TRIAL_DATA\EnRICH\Data\DREAM3R_createLocal.bat" /y
copy "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\DREAM3R_copyDataOutput_toYdrive.bat" "C:\TRIAL_DATA\EnRICH\Data\DREAM3R_copyDataOutput_toYdrive.bat" /y
  • TIP: create a shortcut to the network location (in the local folder) and a shortcut to the local location (on the network) to make it easy to move between the two.

Programs

  • All programs are version controlled and pushed to an enterprise GitHub server.
  • Use GitHub Desktop to simplify syncing of the repository to the local directory.
    -- It's not easy (or necessarily safe) to update the repo from within a batch file, but I'm in the habit of checking for updates when I start work and when I've finished.

Data + files

  1. Local version of the batch file to copy source data files to the project "data" folder (not needed if the Y-drive version used relative paths).

UpdateData.bat

xcopy "T:\DREAM3R\SOURCE\*.sas7bdat"  ".\SOURCE\" /y /S /d
xcopy "T:\DREAM3R\CLEAN\*.sas7bdat"   ".\CLEAN\" /y /S /d
copy "T:\DREAM3R\_Documentation\ALS.xls"  "ALS.xls" /d
  2. Batch file to copy all derived data + output files to the network folder (only if they have been updated more recently than the network copies).

copyDataOutput_toYdrive.bat

xcopy  "C:\TRIAL_DATA\DREAM3R\out" "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\out" /s /d /y
xcopy  "C:\TRIAL_DATA\DREAM3R\data" "Y:\Statistics\1 STUDIES\Oncology\ALTG\DREAM3R\8.5 Analysis\data" /s /d /y
xcopy  "C:\TRIAL_DATA\SHERLOCK\data" "Y:\Statistics\1 STUDIES\Oncology\ALTG\SHERLOCK\8.5 Analysis\data" /s /d /y
  • Sometimes reports also generate files that are saved in the program folder, so I copy them across manually, e.g.
copy "C:\TRIAL_DATA\SHERLOCK\pgm\SHERLOCK_Monitoring\SHERLOCK_interim.png" "Y:\Statistics\1 STUDIES\Oncology\ALTG\SHERLOCK\8.5 Analysis\pgm\SHERLOCK_Monitoring\SHERLOCK_interim.png" 
copy "C:\TRIAL_DATA\SHERLOCK\pgm\SHERLOCK_Monitoring\Recruitment.png" "Y:\Statistics\1 STUDIES\Oncology\ALTG\SHERLOCK\8.5 Analysis\pgm\SHERLOCK_Monitoring\Recruitment.png" 

-- The pgm folder could be copied like the others, but that could confuse git if the repository isn't in sync.

  3. Local version of the program to run all components: UpdateLocal.bat
cd "C:\TRIAL_DATA\DREAM3R\data" 
call _updateData.bat

cd "C:\TRIAL_DATA\DREAM3R\pgm"
call update.bat

cd "C:\TRIAL_DATA\DREAM3R"
call copyDataOutput_toYdrive.bat

Improvements

  • Ideally all programs would use relative paths; relative path calls can be troublesome when programs are triggered via scheduled tasks (see the sketch after this list).
  • All of these tasks could be run from Python/R, removing the need for most of the batch files, though it is nice to have batch files that can easily be run manually when needed.
  • Use a generator / template to streamline creation of files / links etc.
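
As a rough sketch of the first two points, R alone could handle both the path handling and the file syncing. The example below is illustrative only: it assumes the here package and a project-root marker (an .Rproj or .here file), and the paths are placeholders rather than the actual pipeline code.

library(here)   # resolves paths from the project root, wherever the job is launched from

data_dir = here("data", "source")
out_dir  = here("out")

# Copy files from one folder to another only when the source copy is newer,
# mimicking the behaviour of xcopy /d (illustrative helper, not production code)
sync_newer = function(from, to) {
  files = list.files(from, recursive = TRUE)
  for (f in files) {
    src  = file.path(from, f)
    dest = file.path(to, f)
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    if (!file.exists(dest) || file.mtime(src) > file.mtime(dest)) {
      file.copy(src, dest, overwrite = TRUE)
    }
  }
}

sync_newer(out_dir, "Y:/Statistics/1 STUDIES/Oncology/ALTG/DREAM3R/8.5 Analysis/out")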
