# Preprocessing

## Overview

### Usage

To use the preprocessing command line tool, navigate to the preprocessing directory at /mnt/nfs/lss/lss_kahwang_hpc/scripts/preprocessing and run the command below in your terminal to display the documentation for the tool.

```
$ python3 main.py --help
```
```
usage: [DATASET_DIR] [SUBCOMMANDS [OPTIONS]]

Run pre-processing on whole dataset or selected subjects

Required Arguments:
  dataset_dir           Base directory of dataset.
  -h, --help            show this help message and exit

Subcommands:
  {heudiconv,mriqc,fmriprep,3dDeconvolve,regressors,3dmema,FD_stats}
    heudiconv           Convert raw data files to BIDS format. Conversion script filepath is required.
    mriqc               Run mriqc on dataset to analyze quality of data.
    fmriprep            Preprocess data with fmriprep pipeline.
    3dDeconvolve        Parse regressor files, censor motion, create stimfiles, and run 3dDeconvolve.
    regressors          Parse regressor files to extract columns and censor motion.
    3dmema              Runs 3dmema.
    FD_stats            Calculates FD statistics for dataset. Outputs csv with % of points over FD threshold anbd FD mean for each run and subject.
```

As you can see, the program takes the dataset directory as input, followed by a subcommand and its options. We can look at each subcommand's options by running python3 main.py dataset_dir/ {subcommand} --help  
Let's try that below with fmriprep and see what happens.

```
$ python3 main.py dataset_dir fmriprep --help
```

```
usage: [OPTIONS]

optional arguments:
  -h, --help            show this help message and exit
  --fmriprep_opt FMRIPREP_OPT
                        Options to add to fmriprep. Write between '' and replace - with * as shown: '**[OPTION1] arg1 ** [OPTION2] ...'

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.

General Optional Arguments:
  --rerun_mem           Rerun subjects that failed due to memory constraints
  --slots SLOTS         Set number of slots/threads per subject. Default is 4.

Argon HPC Optional Arguments:
  --email               Receive email notifications from HPC
  --no_qsub             Does not submit generated bash scripts.
  --hold_jid HOLD_JID   Jobs will be placed on hold until specified job completes. [JOB_ID]
  --no_resubmit         Enable to not resubmit tasks after migration. Default is to resubmit.
  --mem MEM             Set memory for HPC
  -q QUEUE, --queue QUEUE
                        Set queue for HPC
  --stack STACK STACK   Queue jobs in dependent stacks. When all jobs complete, next will start. Two required integer arguments [# of stacks][# of jobs per stack]. Use 'split' in second argument to split remaining jobs
                        evenly amongst number of stacks.
```

Sweetness. We can see what options are available for the fmriprep subcommand, and there are a bunch of optional arguments. If you ever forget how to run a command or what options are available, use the --help flag.
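
To make the "dataset directory, then subcommand, then options" layout concrete, here is a minimal argparse sketch of that structure. It is not the tool's actual source: the `build_parser` helper is hypothetical, and only a few of the fmriprep flags from the help text are reproduced.

```python
import argparse


def build_parser():
    """Build a parser with a required dataset_dir followed by one subcommand."""
    parser = argparse.ArgumentParser(
        description="Run pre-processing on whole dataset or selected subjects"
    )
    parser.add_argument("dataset_dir", help="Base directory of dataset.")
    subparsers = parser.add_subparsers(dest="subcommand")

    # Only one subcommand is sketched here; the real tool defines several.
    fmriprep = subparsers.add_parser(
        "fmriprep", help="Preprocess data with fmriprep pipeline."
    )
    fmriprep.add_argument("-s", "--subjects", nargs="*", default=[])
    fmriprep.add_argument("-n", "--numsub", type=int, default=None)
    fmriprep.add_argument("--slots", type=int, default=4)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(
    ["dataset_dir/", "fmriprep", "--subjects", "10001", "10002"]
)
print(args.subcommand, args.subjects, args.slots)
```

Each subcommand gets its own sub-parser, which is why --help after a subcommand shows only that subcommand's options.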

### How it works

The preprocessing Python program automatically creates a bash script and then runs it (on thalamege) or submits it (on argon). The program starts from base bash scripts (thalamege: preprocessing/thalamege, argon: preprocessing/argon) and fills in data based on user inputs. The new bash script will be written to either preprocessing/thalamege/dataset_dir_name or preprocessing/argon/jobs/dataset_dir_name.
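
The fill-in step can be pictured with a small template sketch. Everything here is an assumption for illustration: the real base scripts and their placeholder names are not shown in this document, so `BASE_SCRIPT` and `render_script` are hypothetical stand-ins.

```python
from string import Template

# Hypothetical base script. The real base scripts live in
# preprocessing/thalamege and preprocessing/argon.
BASE_SCRIPT = Template(
    "#!/bin/bash\n"
    "subjects=($subjects)\n"
    "slots=$slots\n"
    "dataset_dir=$dataset_dir\n"
)


def render_script(dataset_dir, subjects, slots=4):
    """Fill the placeholders of the base script with user inputs."""
    return BASE_SCRIPT.substitute(
        dataset_dir=dataset_dir,
        subjects=" ".join(subjects),
        slots=slots,
    )


print(render_script("/path/to/dataset", ["10001", "10002"]))
```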

The program records output info in the logs/ directory of each command directory (e.g. fmriprep/logs/). If you are running into issues or errors, you should check out the log files.  
Additionally, the preprocessing pipeline automatically keeps track of completed subjects in the completed_subjects.txt file and failed subjects in the failed_subjects.txt file. This is useful for datasets where subjects are continually being added, such as ThalHi. You simply run the command normally without specifying subjects and it will only run subjects that have not been completed.
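
As a rough sketch of the bookkeeping described above, the snippet below filters out subjects already listed in completed_subjects.txt. It assumes the file holds one subject ID per line, which this document does not confirm; `read_subject_file` and `pending_subjects` are hypothetical helper names.

```python
from pathlib import Path


def read_subject_file(path):
    """Return the set of subject IDs listed one per line; empty if the file is missing."""
    path = Path(path)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def pending_subjects(all_subjects, logs_dir):
    """Keep only subjects not yet recorded in completed_subjects.txt, preserving order."""
    completed = read_subject_file(Path(logs_dir) / "completed_subjects.txt")
    return [sub for sub in all_subjects if sub not in completed]
```

For example, with 10001 already in the completed file, `pending_subjects(["10001", "10002"], "fmriprep/logs")` would return only 10002.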

### Common Flags

#### General



```
General Optional Arguments:
  --rerun_mem           Rerun subjects that failed due to memory constraints
  --slots SLOTS         Set number of slots/threads per subject. Default is 4.
```

The --rerun_mem option reruns all subjects listed in the failed_subjects_mem.txt file in the logs/ directory.  
The --slots option sets the number of slots to use per subject. This is equivalent to the number of cores.



#### Subjects

```
Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset (minues completed subjects) by default or on number of subjects given via the --numsub flag
```

These might be the options you use the most, and they are pretty self-explanatory.  
For subjects, this is what the flag would look like:  
--subjects 10001 10002
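
The precedence between the two flags, as the help text describes it, can be sketched like this. `select_subjects` is a hypothetical helper, and the choice to take the first N available subjects for --numsub is an assumption.

```python
def select_subjects(available, subjects=None, numsub=None):
    """Resolve which subjects to run from the --subjects and --numsub values."""
    if subjects:
        # Explicitly listed subjects take priority over everything else.
        return list(subjects)
    if numsub is not None:
        # Assumption: --numsub takes the first N subjects still available.
        return list(available[:numsub])
    # Neither flag given: run the whole dataset.
    return list(available)


print(select_subjects(["10001", "10002", "10003"], numsub=2))
```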


#### Paths

```
Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.
```

Use the --bids_dir flag when the BIDS directory is not in your root dataset directory. For example, the HCP developmental dataset is stored in a shared directory on argon, so I would use --bids_dir /Dedicated/inc_data/bigdata/hcpd  
The --work_dir flag changes which working directory the pipeline uses. It is mostly useful on argon for switching between localscratch, nfscratch, and lss.


#### Argon

```
Argon HPC Optional Arguments:
  --email               Receive email notifications from HPC
  --no_qsub             Does not submit generated bash scripts to be run.
  --hold_jid HOLD_JID   Jobs will be placed on hold until specified job completes. [JOB_ID]
  --no_resubmit         Enable to not resubmit tasks after job migration. Default is to resubmit. This should be enabled when running on all.q
  --mem MEM             Set memory for HPC
  -q QUEUE, --queue QUEUE
                        Set queue for HPC. Default is our queue: SEASHORE
  --stack STACKS JOBS_PER_STACK   Queue jobs in dependent stacks. When all jobs complete, next will start. Two required integer arguments [# of stacks][# of jobs per stack]. Use 'split' in second argument to split remaining jobs evenly amongst number of stacks.
```
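
One way to read the --stack arguments is sketched below. This is an interpretation of the help text, not the tool's code: `plan_stacks` is hypothetical, and the even-split rule (sizes differing by at most one) is an assumption.

```python
def plan_stacks(jobs, n_stacks, per_stack):
    """Group jobs into stacks that would run one after another.

    per_stack is either an integer or the string 'split', which spreads
    all jobs as evenly as possible across n_stacks.
    """
    jobs = list(jobs)
    if per_stack == "split":
        base, extra = divmod(len(jobs), n_stacks)
        sizes = [base + (1 if i < extra else 0) for i in range(n_stacks)]
    else:
        # Fixed size: jobs beyond n_stacks * per_stack are left unassigned here.
        sizes = [int(per_stack)] * n_stacks
    stacks, start = [], 0
    for size in sizes:
        stacks.append(jobs[start:start + size])
        start += size
    return stacks


print(plan_stacks(range(7), 3, "split"))
```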

### Submitting on Argon vs Thalamege

The program behaves slightly differently on argon and thalamege. On argon, jobs are submitted to the SGE scheduler, generally as task arrays split up by subject. On thalamege, the jobs run in parallel, again split up by subject.  
The program automatically knows which host you are on, so don't worry about having to tell it anything about the host.
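
Host detection of this kind is often done from the machine's hostname. The sketch below shows the idea only; how the tool actually decides is not documented here, so `detect_host` and the "hostname contains thalamege" rule are assumptions.

```python
import socket


def detect_host(hostname=None):
    """Return 'thalamege' or 'argon' based on the machine's hostname."""
    hostname = (hostname or socket.gethostname()).lower()
    # Assumption: thalamege's hostname contains "thalamege";
    # anything else is treated as an argon node.
    return "thalamege" if "thalamege" in hostname else "argon"
```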

On argon, for some jobs, especially fmriprep, it makes sense to first copy the data into localscratch to make the job run faster. The localscratch is local to the compute node and is accessed much faster than our network lss drive. Upon the job finishing, any output data is copied over to the target output directory. Localscratch is used as the working directory by default on argon, and it will be used automatically for fmriprep. The localscratch storage is much smaller, so for subjects with lots of sessions you may run out of file storage. Simply change the working directory to one on the lss if this happens.  
Check out the base fmriprep script to see an example of how that's done: preprocessing/argon/fmriprep_base.sh
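
The stage-in, run, copy-back pattern can also be sketched in Python. This is an illustration of the idea rather than a translation of fmriprep_base.sh: `run_with_scratch` and the `job` callback are hypothetical, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(input_dir, output_dir, job, scratch_root=None):
    """Stage input on fast scratch storage, run job there, then copy results back."""
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        staged_in = Path(scratch) / "input"
        staged_out = Path(scratch) / "output"
        shutil.copytree(input_dir, staged_in)  # copy data onto scratch
        staged_out.mkdir()
        job(staged_in, staged_out)  # run the work against the local copies
        # Copy results to the target directory once the job has finished.
        shutil.copytree(staged_out, output_dir, dirs_exist_ok=True)
    # The scratch directory is removed automatically when the block exits.
```

Passing `scratch_root` plays the role of --work_dir here: pointing it at a larger disk trades speed for space.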

### Arguments

Let's check out fmriprep's options by using the --help flag.

```
$ python3 main.py dataset_dir/ fmriprep --help
usage: [OPTIONS]

optional arguments:
  -h, --help            show this help message and exit
  --fmriprep_opt FMRIPREP_OPT
                        Options to add to fmriprep. Write between '' and replace - with * as shown: '**[OPTION1] arg1 **[OPTION2] ...'

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset (minus completed subjects)
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset (minus completed subjects) by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.

General Optional Arguments:
  --rerun_mem           Rerun subjects that failed due to memory constraints
  --slots SLOTS         Set number of slots/threads per subject. Default is 4.

Argon HPC Optional Arguments:
  --email               Receive email notifications from HPC
  --no_qsub             Does not submit generated bash scripts.
  --hold_jid HOLD_JID   Jobs will be placed on hold until specified job completes. [JOB_ID]
  --no_resubmit         Enable to not resubmit tasks after migration. Default is to resubmit.
  --mem MEM             Set memory for HPC
  -q QUEUE, --queue QUEUE
                        Set queue for HPC
  --stack STACK STACK   Queue jobs in dependent stacks. When all jobs complete, next will start. Two required integer arguments [# of stacks][# of jobs per stack]. Use 'split' in second argument to split remaining jobs evenly amongst number of stacks.
```

As you can see from the documentation, fmriprep has all the standard options and only one unique optional argument, --fmriprep_opt. This flag passes extra options through to the fmriprep pipeline itself. An example call to run fmriprep in our generated bash script is below. The fmriprep options would be added after the last line.

```
singularity run --cleanenv -B $working_dataset_dir $singularity_path \
$working_bids_dir \
$working_dataset_dir \
participant --participant_label $subject \
--nthreads $slots --omp-nthreads $slots \
-w $working_dir \
--fs-license-file ${freesurfer_lic} \
--mem $10 \
--skip_bids_validation \
```

We have to use a special syntax for writing these options. A value that starts with dashes (-) would likely be read by the argument parser as another flag for main.py, so we replace each dash with a star (*) and wrap the whole string in single quotes ('). Entering the command:  
```
python3 main.py dataset_dir/ fmriprep --fmriprep_opt '**stop*on*first*crash'
```

would produce this (only the last line is changed):
```
singularity run --cleanenv -B $working_dataset_dir $singularity_path \
$working_bids_dir \
$working_dataset_dir \
participant --participant_label $subject \
--nthreads $slots --omp-nthreads $slots \
-w $working_dir \
--fs-license-file ${freesurfer_lic} \
--mem $10 \
--skip_bids_validation \
--stop-on-first-crash
```
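
The star-to-dash substitution shown above amounts to a single string replacement. A minimal sketch follows; `translate_opts` is a hypothetical name, not necessarily what the tool calls it.

```python
def translate_opts(opt_string):
    """Turn the star-escaped option string back into regular dashed options."""
    return opt_string.replace("*", "-")


print(translate_opts("**stop*on*first*crash"))  # --stop-on-first-crash
```

One consequence of a plain replacement like this is that an option value containing a literal star could not be passed through unchanged.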

Alternatively, you can generate the bash script with the --no_qsub flag so that it isn't submitted, add the options manually, change anything else you want, and then run the generated script yourself.

### Example

A typical fmriprep command will look like this:
```
python3 main.py dataset_dir/ fmriprep
```

Easy peasy. The program will automatically run any subjects that aren't completed yet (completed subjects are listed in logs/completed_subjects.txt) or that have failed previously. And remember, you can also run specific subjects using --subjects, or a set number of subjects using --numsub:
```
python3 main.py dataset_dir/ fmriprep --subjects 10001
python3 main.py dataset_dir/ fmriprep --subjects 10001 10002
python3 main.py dataset_dir/ fmriprep --numsub 5
```

## fmriprep

## mriqc

### Arguments

Let's look at the options for mriqc using the --help flag. (I hope you're noticing a pattern. Documentation is your best friend)

```
$ python3 main.py dataset_dir/ mriqc --help
usage: [OPTIONS]

optional arguments:
  -h, --help            show this help message and exit
  --group               Run group analysis for mriqc instead of default participant level
  --mriqc_opt MRIQC_OPT
                        Options to add to mriqc. Write between '' as shown: '--[OPTION1] --[OPTION2] ...'

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset (minus completed subjects)
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset (minus completed subjects) by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.

General Optional Arguments:
  --rerun_mem           Rerun subjects that failed due to memory constraints
  --slots SLOTS         Set number of slots/threads per subject. Default is 4.

Argon HPC Optional Arguments:
  --email               Receive email notifications from HPC
  --no_qsub             Does not submit generated bash scripts.
  --hold_jid HOLD_JID   Jobs will be placed on hold until specified job completes. [JOB_ID]
  --no_resubmit         Enable to not resubmit tasks after migration. Default is to resubmit.
  --mem MEM             Set memory for HPC
  -q QUEUE, --queue QUEUE
                        Set queue for HPC
  --stack STACK STACK   Queue jobs in dependent stacks. When all jobs complete, next will start. Two required integer arguments [# of stacks][# of jobs per stack]. Use 'split' in second argument to split remaining jobs evenly amongst number of stacks.
```

As you can see, mriqc is very similar to fmriprep, the only addition being the --group flag. Use this flag when you want to run group analysis for mriqc instead of the standard participant-level analysis.  
Similar to fmriprep you have the options flag, --mriqc_opt, to add options to running mriqc. The special syntax is the same.

### Example

Most of your calls are going to look like this.
```
python3 main.py dataset_dir/ mriqc
```

## regressors

The regressors command parses columns from regressors.tsv files and censors motion. The outputs will be nuisance.1D and censor.1D files in the 3dDeconvolve directory.

### Arguments

```
$ python3 preprocessing/main.py dataset_dir regressors --help
usage: [OPTIONS]

optional arguments:
  -h, --help            show this help message and exit
  --regressors_wc REGRESSORS_WC
                        Wildcard used to find regressors files using glob. Must have * at beggining. Default is *regressors.tsv.
  -c [COLUMNS [COLUMNS ...]], --columns [COLUMNS [COLUMNS ...]]
                        Enter columns to parse from regressors file into nuisance.1D file for usage in 3dDeconvolve. Default columns will be added automatically.
  --no_default          Enter flag to not use default columns. If not entered, default columns will be parsed. Default columns are: ['csf', 'white_matter', 'trans_x', 'trans_y', 'trans_z', 'rot_x', 'rot_y', 'rot_z']
  --threshold THRESHOLD
                        Threshold for censoring. Default is 0.2

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset (minus completed subjects)
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset (minus completed subjects) by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.
```

Optional arguments:
- You can add columns to parse from the regressors files via the --columns flag.
- If you don't want the default columns, use the --no_default flag.
- The --threshold flag sets the threshold for censoring motion based on framewise displacement. The default is 0.2. You may have to raise this if too much data is being removed. 
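
To show how the threshold affects censoring, here is a minimal sketch that builds a censor vector from a confounds table. It assumes the fMRIPrep `framewise_displacement` column and the AFNI convention that 1 keeps a volume and 0 censors it. `censor_vector` is a hypothetical helper, and the tool's own censoring rule may differ in detail.

```python
import csv
import io


def censor_vector(tsv_text, threshold=0.2, fd_column="framewise_displacement"):
    """Return 1 (keep) or 0 (censor) per volume based on framewise displacement."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = []
    for row in rows:
        value = row[fd_column]
        # fMRIPrep writes "n/a" for the first volume, which has no previous frame.
        fd = 0.0 if value == "n/a" else float(value)
        keep.append(0 if fd > threshold else 1)
    return keep


confounds = "framewise_displacement\tcsf\nn/a\t1\n0.1\t2\n0.35\t3\n"
print(censor_vector(confounds))
print(censor_vector(confounds, threshold=0.5))
```

With the default threshold the third volume is censored; at 0.5 all three are kept, which is why a higher threshold removes less data.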

## 3dDeconvolve

The 3dDeconvolve command will parse regressor files for default or given columns, censor motion, create stimulus timing files, and generate a 3dDeconvolve bash script. Outputs will be in the 3dDeconvolve/ folder.

### Arguments

```
$ python3 main.py dataset_dir/ 3dDeconvolve --help
usage: [stimulus_col][timing_col][OPTIONS]

positional arguments:
  stimulus_col          Column name for stimulus type in run timing file.
  timing_col            Column name for time of stimulus presentation in run timing file.

optional arguments:
  -h, --help            show this help message and exit
  --bold_wc BOLD_WC     Wildcard used to find bold files using glob. Must have * at beggining. Default is *
  --timing_file_dir TIMING_FILE_DIR
                        Directory holding run timing files. Default is dataset BIDS directory.
  --run_timing_wc RUN_TIMING_WC
                        Wildcard used to find run timing files using glob. Must have * at beggining. Default is *
  --regressors_wc REGRESSORS_WC
                        Wildcard used to find regressors files using glob. Must have * at beggining. Default is *regressors.tsv.
  --use_stimfiles       Use stimfiles instead of stim config for setting up 3dDeconvolve script.
  -c [COLUMNS [COLUMNS ...]], --columns [COLUMNS [COLUMNS ...]]
                        Enter columns to parse from regressors file into nuisance.1D file for usage in 3dDeconvolve. Default columns will be added automatically.
  --no_default          Enter flag to not use default columns. If not entered, default columns will be parsed. Default columns are: ['csf', 'white_matter', 'trans_x', 'trans_y', 'trans_z', 'rot_x', 'rot_y', 'rot_z']
  --sessions [SESSIONS [SESSIONS ...]]
                        Set the sessions to be analyzed in order. Default will be all sessions in alphabetical order
  --threshold THRESHOLD
                        Threshold for censoring. Default is 0.2

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset (minus completed subjects)
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset (minus completed subjects) by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.

General Optional Arguments:
  --rerun_mem           Rerun subjects that failed due to memory constraints
  --slots SLOTS         Set number of slots/threads per subject. Default is 4.

Argon HPC Optional Arguments:
  --email               Receive email notifications from HPC
  --no_qsub             Does not submit generated bash scripts.
  --hold_jid HOLD_JID   Jobs will be placed on hold until specified job completes. [JOB_ID]
  --no_resubmit         Enable to not resubmit tasks after migration. Default is to resubmit.
  --mem MEM             Set memory for HPC
  -q QUEUE, --queue QUEUE
                        Set queue for HPC
  --stack STACK STACK   Queue jobs in dependent stacks. When all jobs complete, next will start. Two required integer arguments [# of stacks][# of jobs per stack]. Use 'split' in second argument to split remaining jobs evenly amongst number of stacks.
```

3dDeconvolve differs from our other commands in that it has required positional arguments. You must enter the stimulus_col and timing_col for this command to work. These are the column names for your stimuli and their timings, generally found in the events.tsv file or behavioral data.  
These two columns are used to generate stimfiles, which are required for 3dDeconvolve.  

There are many optional arguments that are also specific and important to running 3dDeconvolve.  
- The wildcards are useful when trying to pull out specific files, such as only a certain task.  
- The --timing_file_dir flag is used to find behavioral data files storing stimulus and timing info. By default, the program looks for *events.tsv in the fmriprep directory. We sometimes keep this info on the rdss, and this flag can be used to find those timing files.
- The --sessions flag is useful when you have various sessions such as task and resting-state. You generally aren't going to want to mix the two, so you can use this flag to pull out only task data or only resting-state.  
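
As a quick illustration of how a wildcard narrows the files picked up, the sketch below applies a glob-style pattern to a list of filenames. The help text says the tool uses glob; `fnmatchcase` follows the same pattern rules without touching the filesystem. The filenames and the `filter_files` helper are made up for the example.

```python
from fnmatch import fnmatchcase


def filter_files(filenames, wildcard="*"):
    """Return the filenames matching a glob-style wildcard, sorted by name."""
    return sorted(name for name in filenames if fnmatchcase(name, wildcard))


files = [
    "sub-10003_task-ThalHi_run-1_desc-confounds_regressors.tsv",
    "sub-10003_task-rest_run-1_desc-confounds_regressors.tsv",
]
# Keep only the ThalHi task files.
print(filter_files(files, "*task-ThalHi*regressors.tsv"))
```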

The other optional arguments, --columns, --no_default, and --threshold, are explained in the regressors portion of this notebook. They specify which columns to parse from the regressors file for denoising and set the threshold for censoring motion.  
All others are common flags that I explained at the beginning of the notebook.
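
To make the role of stimulus_col and timing_col concrete, here is a sketch of turning per-run timing tables into AFNI-style stimulus timing rows, with one row per run and one set of rows per stimulus. It assumes comma-separated input and writes "*" for a run with no events of a given stimulus, as AFNI's timing format allows. Whether the tool writes exactly this layout is an assumption, and `stim_times` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def stim_times(run_tables, stimulus_col, timing_col):
    """Map each stimulus name to a list of timing rows, one row per run."""
    per_run = []
    for text in run_tables:
        onsets = defaultdict(list)
        for row in csv.DictReader(io.StringIO(text)):
            onsets[row[stimulus_col]].append(float(row[timing_col]))
        per_run.append(onsets)
    stimuli = sorted({name for onsets in per_run for name in onsets})
    # A run with no events for a stimulus is written as "*".
    return {
        name: [
            " ".join(f"{t:g}" for t in onsets[name]) if name in onsets else "*"
            for onsets in per_run
        ]
        for name in stimuli
    }


run1 = "Trial_type,Time_Since_Run_Cue_Prez\nStay,1.5\nEDS,4\nStay,9.25\n"
run2 = "Trial_type,Time_Since_Run_Cue_Prez\nIDS,2\n"
print(stim_times([run1, run2], "Trial_type", "Time_Since_Run_Cue_Prez"))
```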

### Example

I will show an example of 3dDeconvolve on our ThalHi dataset. Below is the correct call.
```
python main.py /data/backed_up/shared/ThalHi_MRI_2020 3dDeconvolve Trial_type Time_Since_Run_Cue_Prez --timing_file_dir /mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data
```

Let's break it down. The first argument after main.py is our dataset directory, which points to the ThalHi 2020 MRI data on thalamege. Next is our subcommand, 3dDeconvolve. After this we get our two positional arguments: stimulus_col and timing_col.  
In this case, we want to run 3dDeconvolve on each cue in the ThalHi task (Stay, EDS, IDS), so we look in our behavioral data and find the corresponding column name to be Trial_type. Next is the timing_col, which represents when the stimuli occurred. This column name is Time_Since_Run_Cue_Prez. 
Finally, the events/behavioral data is on the rdss instead of the 'default' fmriprep directory, so we need to specify that directory by using the --timing_file_dir flag.

Below is a snapshot of the output you will see. The 3dDeconvolve command will parse regressor files for default or given columns, censor motion, and create stimulus timing files. All outputs will be in the 3dDeconvolve/ folder. Like fmriprep and mriqc, it will also generate a bash script. You can use/edit this or just use your own. This program is most useful for generating the stimfiles, parsing regressors, and censoring motion.
```
Prepping 3dDeconvolve on subject 10003


Parsing regressor files for subject 10003 in /data/backed_up/shared/ThalHi_MRI_2020/fmriprep/sub-10003/
Parsing: sub-10003_task-ThalHi_run-1_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-2_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-3_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-4_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-5_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-6_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-7_desc-confounds_regressors.tsv
Parsing: sub-10003_task-ThalHi_run-8_desc-confounds_regressors.tsv
Writing regressor file to /data/backed_up/shared/ThalHi_MRI_2020/3dDeconvolve/sub-10003/nuisance.1D
Writing censor file to /data/backed_up/shared/ThalHi_MRI_2020/3dDeconvolve/sub-10003/censor.1D


Successfully extracted columns ['csf', 'white_matter', 'trans_x', 'trans_y', 'trans_z', 'rot_x', 'rot_y', 'rot_z'] from regressor files and censored motion
['/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_001_Task_THHS_2020_Aug_21_0817.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_002_Task_THHS_2020_Aug_21_0826.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_003_Task_THHS_2020_Aug_21_0833.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_004_Task_THHS_2020_Aug_21_0841.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_005_Task_THHS_2020_Aug_21_0851.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_006_Task_THHS_2020_Aug_21_0900.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_007_Task_THHS_2020_Aug_21_0907.csv', '/mnt/cifs/rdss/rdss_kahwang/ThalHi_data/MRI_data/Behavioral_data/10003_008_Task_THHS_2020_Aug_21_0915.csv']

Creating stimulus files for subject 10003
Writing stimulus file: Stay
Writing stimulus file: EDS
Writing stimulus file: IDS
```

## heudiconv

Heudiconv is part of the preprocessing command line tool. Let's look at its input options by adding the --help flag after the heudiconv command.  
Example shown below.


```
python main.py dataset_dir/ heudiconv --help

usage: [SCRIPT_PATH][OPTIONS]

positional arguments:
  script_path           Filename of script. Script must be located in following directory: /data/backed_up/shared/bin/heudiconv/heuristics/

optional arguments:
  -h, --help            show this help message and exit
  --post_conv_script POST_CONV_SCRIPT
                        Filepath of post-heudiconv Conversion script. Ocassionally needed to make further changes after running heudiconv.

Subject arguments:
  -n NUMSUB, --numsub NUMSUB
                        The number of subjects being analyzed. If none listed, default will be whole dataset
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        The subjects being analyzed. Do not include sub- prefix. If subjects are not included, pre-processing will be run on whole dataset by default or on number of subjects given via the --numsub flag

Path arguments:
  --bids_dir BIDS_DIR   Path for bids directory if not located in dataset directory.
  --work_dir WORK_DIR   The working dir for programs. Default for argon is user dir in localscratch. Default for thalamege is work directory in dataset directory.
```


We can see from the documentation that the heudiconv command takes one required input (SCRIPT_PATH) and some optional flags.  
The script path refers to the Python conversion (heuristic) script used to run heudiconv.  
```
python main.py dataset_dir/ heudiconv /data/backed_up/shared/bin/heudiconv/heuristics/{Script Name}.py
```


Sometimes you need to make changes to the data after running heudiconv. You can specify a post-conversion script with the --post_conv_script flag.  
For example, for ThalHi our post-conversion script can be found at /mnt/nfs/lss/lss_kahwang_hpc/scripts/thalhi/heudiconv_post.py. The heudiconv command would then be:
```
python main.py dataset_dir/ heudiconv /data/backed_up/shared/bin/heudiconv/heuristics/{Script Name}.py --post_conv_script /mnt/nfs/lss/lss_kahwang_hpc/scripts/thalhi/heudiconv_post.py
```

Like our other preprocessing commands, we can specify which subjects to run or how many to run using the --subjects and --numsub flags.
```
python main.py dataset_dir/ heudiconv /data/backed_up/shared/bin/heudiconv/heuristics/{Script Name}.py --subjects 10001 10002
python main.py dataset_dir/ heudiconv /data/backed_up/shared/bin/heudiconv/heuristics/{Script Name}.py --numsub 2
```

The script will run in parallel on each subject.
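
Running one conversion per subject in parallel can be sketched with the standard library. This only illustrates the pattern. How the tool parallelizes internally is not shown in this document, and `run_all` and the `convert` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(subjects, convert, max_workers=4):
    """Run convert(subject) for every subject concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with subjects.
        return dict(zip(subjects, pool.map(convert, subjects)))


# Stand-in for the real per-subject conversion step.
print(run_all(["10001", "10002"], lambda sub: f"sub-{sub} converted"))
```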

## FD_stats