Analyze Halo data
This package generates various statistics and plots for Halo data. Currently in its beginning stages, it has been written specifically for melanoma data but as it evolves it ultimately will support all data sets.
Step 1: Install
On MSK server, with R
Step 2: Create manifest with all project parameters including input files
cd [WORKING_DIRECTORY] Rscript scripts/configure_halo_pipeline.R \ --rawDataDir /home/byrne/halo/dev/halodev/tests/data \ --studyName halodevTest \ --dataDir $PWD/objectAnalysisData \ --metaDir /ifs/tcga/socci/Multiomyx/HaloData/Melanoma_IL2__Final/Cohort2/MetaData \ --driftDir /ifs/tcga/socci/Multiomyx/Cell_drift_loss_masks/melanoma_drift_result/drift_summary \ --markerConfigFile /home/byrne/halo/data/template_configs/template_marker_config.yaml \ --plotConfigFile /home/byrne/halo/data/template_configs/plot_config.yaml \ --annotationsDirs /ifs/tcga/socci/Multiomyx/HaloData/Melanoma_IL2__Final/Cohort2/HaloCoordinates \ --setDefaultDirectoryStructure
Above are the minimum required arguments to start a new pipeline run FROM SCRATCH. For a full list of options, run
Rscript scripts/configure_halo_pipeline.R -h
Use '--setDefaultDirectoryStructure' to do just that. This will create default folders and subfolders for all possible analyses.
The result of this script is a YAML file with all parameters needed to run entire pipeline. Default location for this file is studyName/config/study_config.yaml. This file can then be manually edited as needed or used as a template for future pipeline runs.
Step 3: Run pipeline
Move to study directory, e.g.,
Run from scratch, including marking exclusions
Rscript scripts/final_pipeline.R -m config/study_config.yaml --markExclusions
Step 4: Rerun failed or additional pipeline steps
To rerun a step because of failure or dependency changes, manually modify study_config.yaml by removing files to be recreated. Alternatively, if file does exist, delete it. Then rerun pipeline.
Rscript scripts/final_pipeline.R -m config/study_config.yaml
Currently the pipeline runs the following steps:
- parse and store all halo boundaries
- mark exclusions & generate debug plots
- generate cell type marker combination counts spreadsheet with cell type interpretations
- calculate total FOV area and marker densities
- plot total FOV marker densities
- calculate infiltration area and marker densities (by distance intervals from tumor interfaces)
- plot infiltration marker densities
Exclusions need to be marked only once. Once run, study_config.yaml will be updated to include data_dir, which will point to the directory containing *.rda files that include EXCLUDE columns. For any subsequent pipeline runs, do NOT use --markExclusions unless meta data, drift data or Halo boundary data has changed.
Similarly, if starting pipeline using *.rda files that already have EXCLUDE columns, fill in data_dir or data_files in study_config.yaml and do NOT use --markExclusions.
If not provided during configuration, study_config.yaml will be automatically generated with default values. IMPORTANT: Validation of config files is not yet implemented. Make sure to manually review and modify files as necessary.
study_config.yaml will be updated as the pipeline runs to point to any new files generated during the run. This will allow steps to be skipped in future runs if their dependencies already exist and remain unchanged during current run. To turn this default behavior off, use --noConfigOverwrite.
If an existing cellTypeConfigFile configuration file is not provided during a configuration, one will be generated based on *CellTypes.xlsx file in meta data directory, and can be manually modified if necessary before running pipeline.
markerConfigFile is used for plotting density values. A template exists, but there is currently no auto-generation of this file. See documentation (TO BE CREATED) for details on the format of this file.