Merge pull request #1 from IBIC/master
update my fork
jflournoy committed Oct 17, 2018
2 parents 9897265 + 4b4bde8 commit e961d91
Showing 19 changed files with 193 additions and 47 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -17,15 +17,13 @@ The `neuropointillist` package has functions to combine multiple
sets of neuroimaging data, run arbitrary R code (a "model") on each
voxel in parallel, output results, and reassemble the data. Included
are three standalone programs. `npoint` and `npointrun` use the
`neuropointillist` package, and `npointmerge` uses FSL commands to
reassemble results.
`neuropointillist` package, and `npointmerge` reassembles results.


There are some examples included in this package that use data that we
cannot release. These are useful only for looking at modeling code or
for inspiration. However, we have simulated two timepoints of fMRI
data and have a complete example and a worked vignette.




Please direct all comments and complaints to Tara Madhyastha (madhyt@uw.edu).

42 changes: 39 additions & 3 deletions docs/usage.md
@@ -1,7 +1,7 @@
# npoint
## Usage
`npoint --set1 listoffiles1.txt --setlabels1 file1.csv --set2 listoffiles2.txt --setlabels2 file2.csv`
`--covariates covariatefile.csv --mask mask.nii.gz --model code.R [ -p N | --sgeN N] --output output`
`--covariates covariatefile.csv --mask mask.nii.gz --model code.R [ -p N | --sgeN N | --slurmN N ] --output output`
`--debugfile outputfile `

If a file called `readargs.R` exists that sets a vector called `cmdargs`, this file will be read to obtain the arguments for `npoint` instead of taking them from the command line. This is intended to make it a little easier to remember the long lists of arguments.
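
For illustration, a minimal `readargs.R` might look like the sketch below. The file names and the `--sgeN` split count are placeholders chosen to match the usage line above, not files shipped with the package.

```R
# Hypothetical readargs.R: npoint sources this file and takes its arguments
# from the cmdargs character vector instead of the command line.
cmdargs <- c("--set1", "listoffiles1.txt", "--setlabels1", "file1.csv",
             "--set2", "listoffiles2.txt", "--setlabels2", "file2.csv",
             "--covariates", "covariatefile.csv",
             "--mask", "mask.nii.gz",
             "--model", "code.R",
             "--sgeN", "10",
             "--output", "sgetest/n.")
```
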
@@ -30,7 +30,9 @@ The setlabel files are csv files that specify variables that correspond to the f

`-p x` The `-p` argument specifies that multicore parallelism will be implemented using `x` processors. A warning is given if the number of processors specified exceeds the number of cores. **See notes below on running a model using multicore parallelism.**

`--sgeN N` Alternatively, the `--sge` argument specifies to read the data and divide it into `N` jobs that can be submitted to the SGE (using a script that is generated called, suggestively, `runme`) or divided among machines by hand and run using GNU make. If SGE parallelism is used, we assume that the directory that the program is called from is read/writeable from all cluster nodes. **See notes below on running a model using SGE parallelism.**
`--sgeN N` The `--sgeN` argument specifies to read the data and divide it into `N` jobs that can be submitted to the SGE (using a generated script called, suggestively, `runme.sge`) or divided among machines by hand and run using GNU make. If SGE parallelism is used, we assume that the directory that the program is called from is read/writeable from all cluster nodes. **See notes below on running a model using SGE parallelism.**

`--slurmN N` The `--slurmN` argument specifies to read the data and divide it into `N` jobs that can be submitted to a Slurm scheduler (using a generated script called, suggestively, `runme.slurm`) or divided among machines by hand and run using GNU make. If Slurm is used, the template file **slurmjob.bash** must be edited! Unlike SGE, Slurm works best if you give good estimates of the time your program will take to run and the amount of memory it needs, and if you choose the number of jobs so that each one is not too small. The file that is written is currently a template based on Harvard's cluster configuration. As with SGE, we assume that the directory that the program is called from is read/writeable from all cluster nodes. At the risk of oversharing, Slurm's name derives from Simple Linux Utility for Resource Management, but I find it rather funny to sound it out in my head as I have been adding this feature. **See notes below on running a model using the Slurm Workload Manager.**

`--output` Specify an output prefix that is prepended to output files. This is useful for organizing output for SGE runs; you can specify something like `--output model-stressXtime/mod1` to organize all the output files and execution scripts into a subdirectory. In addition, the model that you used and the calling arguments will be copied with this prefix so that you can remember what you ran. This is modeled off of how FSL FEAT copies the .fsf file into the FEAT directory (so simple and so handy)! (**required**)

@@ -60,12 +62,46 @@ different machines.

The `readargs.R` file in `example.rawfmri` is configured so that it will create a directory called `sgetest` with the assembled design matrix file (in rds format), the split up fMRI data (also in rds format), and files to run the job. These files are:

`Makefile` This file contains the rules for running each subjob and assembling the results. Note that the executables `npointrun` and `npointmerge` must be in your path environment. You can run your job by typing `make -j <ncores>` at the command line in the `sgetest` directory, or by calling the script `runme.local`. You can also type `make mostlyclean` to remove all the intermediate files once your job has completed and you have reassembled your output (by any method). If instead you type `make clean`, you can remove all the rds files also.
`Makefile` This file contains the rules for running each subjob and assembling the results. Note that the executables `npointrun` and `npointmerge` must be in your path environment. You can run your job by typing `make -j <ncores>` at the command line in the `sgetest` directory, or by calling the script `runme.local`, which will use 4 cores by default. You can also type `make mostlyclean` to remove all the intermediate files once your job has completed and you have reassembled your output (by any method). If instead you type `make clean`, you can remove all the rds files also.

`sgejob.bash` This is the job submission script for processing the data using SGE. Note that `npointrun` needs to be in your path. The commands in the job submission script are bash commands.

`runme.sge` This script will submit the job to the SGE and call Make to merge the resulting files when the job has completed. It is an SGE/Make hybrid.

## Running a model using the Slurm Workload Manager


`Makefile` This file contains the rules for running each subjob and
assembling the results. Note that the executables `npointrun` and
`npointmerge` must be in your path environment. You can run your job
by typing `make -j <ncores>` at the command line, or by calling the script `runme.local`, which will use 4 cores by default. You can also type
`make mostlyclean` to remove all the intermediate files once your job
has completed and you have reassembled your output (by any method). If
instead you type `make clean`, you can remove all the rds files also.

`slurmjob.bash` This is the job submission script for submitting the
job to the Slurm Workload Manager. **Note that you must edit this file
before submitting the job.** The defaults that are written here
probably won't work for you; they are modeled after Harvard's NCF
cluster and should be thought of as placeholders. The first thing to
change is the partition, which is set to `ncf_holy` by default. You
will need to change this to a partition that you have access to on
your Slurm system. Next, you need to give a good estimate of the
amount of memory, in MB, each job will use (`--mem`). You can get a
reasonable estimate by running `make` on your local machine to run one
job sequentially; the GNU `time` command (for example, `/usr/bin/time -v make`)
will report the maximum memory the job uses. Assuming your jobs are
approximately the same size, double that figure and use it as your
estimate; for example, if one job peaks at about 1800 MB, request
roughly `--mem 3600`. You also need to provide an estimate of the time
you expect each job to take (`--time`); a job will be terminated if it
does not complete within that limit.

`runme.slurm` This script will submit the job to the Slurm Workload
Manager. The job is an array job that includes as many tasks as you
specified. You will get an email when your job has completed. At that
point, you can come back to this directory and type `make` to merge the
output files.

## Running a model using multicore parallelism

The `readargs.R` file in the `example.flournoy` directory is configured so that it will use 24 cores to compare two models. You should change this number to be lower if your machine does not have 24 cores. Note that data are not included for `example.flournoy`.
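
As a rough sketch only (the actual contents of `example.flournoy` are not reproduced here, and the file names below are placeholders), a multicore `readargs.R` differs from the SGE/Slurm variants mainly in using `-p`:

```R
# Hypothetical readargs.R fragment for multicore execution; lower "24" if your
# machine has fewer cores.
cmdargs <- c("--set1", "setfilenames1.txt", "--setlabels1", "setlabels1.csv",
             "--covariates", "covariates.csv",
             "--mask", "mask.nii.gz",
             "--model", "model.R",
             "-p", "24",
             "--output", "comparemodels/mod.")
```
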
2 changes: 1 addition & 1 deletion neuropointillist/DESCRIPTION
@@ -10,4 +10,4 @@ Depends:
License: GPL(>=2), doParallel,argparse,Rniftilib
Encoding: UTF-8
LazyData: true
RoxygenNote: 5.0.1
RoxygenNote: 6.1.0.9000
1 change: 1 addition & 0 deletions neuropointillist/NAMESPACE
@@ -8,3 +8,4 @@ export(npointWriteCallingInfo)
export(npointWriteMakefile)
export(npointWriteOutputFiles)
export(npointWriteSGEsubmitscript)
export(npointWriteSlurmsubmitscript)
3 changes: 3 additions & 0 deletions neuropointillist/R/npointWriteMakefile.R
@@ -23,6 +23,7 @@ npointWriteMakefile <- function(prefix, resultnames, modelfile, designmat, makef
fileConn <- file(localscript)
writeLines(c("make -j 4\n"), fileConn)
Sys.chmod(localscript, "775")
close(fileConn)

fileConn <- file(makefile)
alltarget <- "all: $(outputs) "
@@ -60,4 +61,6 @@ npointWriteMakefile <- function(prefix, resultnames, modelfile, designmat, makef
paste(mostlyclean,collapse=""),
clean),
fileConn)
close(fileConn)

}
50 changes: 50 additions & 0 deletions neuropointillist/R/npointWriteSlurmsubmitscript.R
@@ -0,0 +1,50 @@
#' Write an output Slurm submit script
#'
#' Generate a Slurm submit script for the given workflow
#' @param prefix Prefix for output, to be prepended to outputs
#' @param resultnames List of names for the expected outputs
#' @param modelfile Name of the model file that contains the processVoxel command
#' @param designmat Design matrix
#' @param masterscript Name of the master submit script
#' @param jobscript Name of the job submission script
#' @param njobs Number of jobs to submit
#' @export
#' @examples
#'
#' npointWriteSlurmsubmitscript()
npointWriteSlurmsubmitscript <- function(prefix, resultnames, modelfile, designmat,masterscript,jobscript,njobs) {
dir <- dirname(prefix)
if (!dir.exists(dir)) {
dir.create(dir, recursive=TRUE)
}
# the name of one of the outputfiles that is created
outputfile <- paste(resultnames[1], ".nii.gz",sep="")
fileConnMaster <- file(masterscript)
fileConnJob <- file(jobscript)
writeLines(c("#!/bin/bash",
"# This script will submit jobs to Slurm. You can also run this job locally by typing make.",
paste("sbatch --array=1-", njobs, " ", basename(jobscript), sep=""),
"echo When you get mail from slurm that your job has completed, cd to this directory and type:",
"echo make"),
fileConnMaster)


writeLines(c("#!/bin/bash",
"\n",
"#Slurm submission options",
"#LOOK AT THESE AND EDIT TO OVERRIDE FOR YOUR JOB",
"#SBATCH -p ncf_holy",
"#SBATCH --mem 4000",
"#SBATCH --time 0-6:00",
"#SBATCH --mail-type=END",
"#SBATCH -o npoint_%A_%a.out",
"#SBATCH -o npoint_%A_%a.err", "export OMP_NUM_THREADS=1",
paste("MODEL=",modelfile,sep=""),
paste("DESIGNMAT=",designmat,sep=""),
"num=$(printf \"%04d\" $SLURM_ARRAY_TASK_ID)",
paste("npointrun -m ", basename(prefix), "${num}.nii.gz --model ${MODEL} -d ${DESIGNMAT}",sep=""),
"\n"),
fileConnJob)
Sys.chmod(masterscript, "775")
}
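
For orientation, here is a hedged sketch of how this function might be called; the argument values are made up for illustration, since in normal use `npoint` constructs them from its own parsed arguments.

```R
# Hypothetical call (all paths and names are illustrative only).
npointWriteSlurmsubmitscript(prefix       = "slurmtest/n.",
                             resultnames  = "tstat-age",
                             modelfile    = "slurmtest/model.R",
                             designmat    = "slurmtest/n.designmat.rds",
                             masterscript = "slurmtest/runme.slurm",
                             jobscript    = "slurmtest/slurmjob.bash",
                             njobs        = 10)
# This writes runme.slurm (the sbatch array submission wrapper) and
# slurmjob.bash (the per-task script that calls npointrun).
```
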

1 change: 0 additions & 1 deletion neuropointillist/man/npointCheckArguments.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointCheckSetLabels.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointMergeDesignmatWithCovariates.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointReadDataSets.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointReadSetFiles.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointSplitDataSize.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointWarnIfNiiFileExists.Rd

4 changes: 2 additions & 2 deletions neuropointillist/man/npointWriteFile.Rd

6 changes: 4 additions & 2 deletions neuropointillist/man/npointWriteMakefile.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointWriteOutputFiles.Rd

1 change: 0 additions & 1 deletion neuropointillist/man/npointWriteSGEsubmitscript.Rd

31 changes: 31 additions & 0 deletions neuropointillist/man/npointWriteSlurmsubmitscript.Rd
