Analyze Halo data

This package generates statistics and plots for Halo data. It is at an early stage of development and was written specifically for melanoma data. As it evolves, the goal is to support other data sets as well.

Step 1: Install

On the MSK server, start R:

/home/byrne/R/R-3.4.3/bin/R

Run

library(devtools)
install_github("caitlinjones/halo")

Step 2: Create a manifest with all project parameters, including input files

Run

cd [WORKING_DIRECTORY]
Rscript scripts/configure_halo_pipeline.R \
     --rawDataDir /home/byrne/halo/dev/halodev/tests/data \
     --studyName halodevTest \
     --dataDir $PWD/objectAnalysisData \
     --metaDir /ifs/tcga/socci/Multiomyx/HaloData/Melanoma_IL2__Final/Cohort2/MetaData \
     --driftDir /ifs/tcga/socci/Multiomyx/Cell_drift_loss_masks/melanoma_drift_result/drift_summary \
     --markerConfigFile /home/byrne/halo/data/template_configs/template_marker_config.yaml \
     --plotConfigFile /home/byrne/halo/data/template_configs/plot_config.yaml \
     --annotationsDirs /ifs/tcga/socci/Multiomyx/HaloData/Melanoma_IL2__Final/Cohort2/HaloCoordinates \
     --setDefaultDirectoryStructure

The arguments above are the minimum required to start a new pipeline run FROM SCRATCH. For a full list of options, run

Rscript scripts/configure_halo_pipeline.R -h

The '--setDefaultDirectoryStructure' flag creates default folders and subfolders for all possible analyses.

The script produces a YAML file containing all parameters needed to run the entire pipeline. By default this file is written to {studyName}/config/study_config.yaml. You can then edit it manually as needed or use it as a template for future pipeline runs.

Step 3: Run pipeline

Move to the study directory, e.g.,

cd {studyName}

To run from scratch, including marking exclusions:

Rscript scripts/final_pipeline.R -m config/study_config.yaml --markExclusions

Step 4: Rerun failed or additional pipeline steps

To rerun a step after a failure or a change in its dependencies, edit study_config.yaml and remove the entries for the files to be recreated. Alternatively, if a file exists on disk, delete it. Then rerun the pipeline:

Rscript scripts/final_pipeline.R -m config/study_config.yaml
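Put another way, a step is rerun when its output is no longer listed in study_config.yaml or is no longer present on disk. The base-R sketch below illustrates only that decision rule, under stated assumptions. `step_needs_run()` and the config keys (`counts_file`, `density_file`, `infiltration_file`) are hypothetical and not part of the halo package. The config is assumed to have already been parsed into a named list. The real pipeline may apply further checks, for example for changed dependencies.

```r
# Hypothetical sketch of the Step 4 rerun rule (not the halo package API).
# `config` is assumed to be study_config.yaml parsed into a named list,
# with each step's output file stored under its own (hypothetical) key.
step_needs_run <- function(config, output_key) {
  path <- config[[output_key]]
  # Entry removed from (or never added to) the config: the step must run.
  if (is.null(path) || length(path) == 0 || !nzchar(path[1])) {
    return(TRUE)
  }
  # Entry present, but the file was deleted from disk: the step must run.
  !file.exists(path[1])
}

# Example with a hypothetical config list.
existing <- tempfile(fileext = ".xlsx")
invisible(file.create(existing))
config <- list(
  counts_file  = existing,                  # present on disk -> skipped
  density_file = "deleted/densities.xlsx"   # deleted from disk -> rerun
)
# infiltration_file is absent from the config, so it is treated as removed.
sapply(c("counts_file", "density_file", "infiltration_file"),
       step_needs_run, config = config)
```

Deleting either the config entry or the file triggers a rerun, matching the two options described in Step 4.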

NOTES:

  • Currently the pipeline runs the following steps:

    • parse and store all Halo boundaries
    • mark exclusions & generate debug plots
    • generate cell type marker combination counts spreadsheet with cell type interpretations
    • calculate total FOV area and marker densities
    • plot total FOV marker densities
    • calculate infiltration area and marker densities (by distance intervals from tumor interfaces)
    • plot infiltration marker densities
  • Exclusions need to be marked only once. After they are marked, study_config.yaml is updated to include data_dir, which points to the directory containing *.rda files with EXCLUDE columns. In subsequent pipeline runs, do NOT use --markExclusions unless the meta data, drift data, or Halo boundary data have changed.

  • Similarly, if you start the pipeline from *.rda files that already have EXCLUDE columns, fill in data_dir or data_files in study_config.yaml and do NOT use --markExclusions.

  • Any parameters not provided during configuration are set to default values in the automatically generated study_config.yaml. IMPORTANT: Validation of config files is not yet implemented. Review these files and modify them manually as necessary.

  • As the pipeline runs, study_config.yaml is updated to point to any new files generated. This allows steps to be skipped in future runs if their dependencies already exist and remain unchanged during the current run. To turn off this default behavior, use --noConfigOverwrite.

  • If an existing cellTypeConfigFile is not provided during configuration, one is generated from the *CellTypes.xlsx file in the meta data directory. You can modify it manually, if necessary, before running the pipeline.

  • markerConfigFile is used when plotting density values. A template exists (template_marker_config.yaml), but this file is not yet generated automatically. See documentation (TO BE CREATED) for details on its format.
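The notes above say to skip --markExclusions when the *.rda files already contain EXCLUDE columns. You can check that before a run. The following is a minimal base-R sketch that assumes each *.rda file stores one or more data frames of cell data. `rda_has_exclusions()` and `needs_mark_exclusions()` are hypothetical helpers, not functions provided by the pipeline. The sketch cannot detect the other reason to re-mark exclusions, which is changed meta, drift, or boundary data.

```r
# Hypothetical helper (not part of halo): TRUE if every data frame stored
# in an .rda file already has an EXCLUDE column.
rda_has_exclusions <- function(rda_file) {
  env <- new.env()
  obj_names <- load(rda_file, envir = env)  # load() returns the object names
  frames <- Filter(is.data.frame, mget(obj_names, envir = env))
  length(frames) > 0 &&
    all(vapply(frames, function(df) "EXCLUDE" %in% names(df), logical(1)))
}

# Hypothetical helper: exclusions need marking unless every *.rda file in
# data_dir already carries EXCLUDE columns (an empty directory needs marking).
needs_mark_exclusions <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.rda$", full.names = TRUE)
  length(files) == 0 || !all(vapply(files, rda_has_exclusions, logical(1)))
}

# Example with temporary files; `cell_id` is a made-up column name.
dir <- tempfile("halo_rda_")
dir.create(dir)
marked <- data.frame(cell_id = 1:3, EXCLUDE = c(FALSE, TRUE, FALSE))
save(marked, file = file.path(dir, "sample1.rda"))
needs_mark_exclusions(dir)  # FALSE: the only file has EXCLUDE
unmarked <- data.frame(cell_id = 1:3)
save(unmarked, file = file.path(dir, "sample2.rda"))
needs_mark_exclusions(dir)  # TRUE: sample2.rda lacks EXCLUDE
```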
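The infiltration steps compute marker densities within distance intervals from tumor interfaces. Below is a minimal sketch of that arithmetic, where density is the number of positive cells in a band divided by the area of that band. It assumes each cell already has a distance to the nearest interface and a logical marker call. It also assumes band areas are supplied as input rather than calculated by the pipeline from the Halo boundaries. The function name `band_densities()`, the columns `distance` and `positive`, the band edges, and the units are all hypothetical. Cells marked EXCLUDE = TRUE are dropped before counting.

```r
# Hypothetical sketch: marker density per distance band (not the halo API).
# cells:      data frame with `distance` (to nearest tumor interface),
#             logical `positive` (marker call), optional logical `EXCLUDE`.
# band_edges: increasing band boundaries; bands are [lower, upper).
# band_areas: area of each band; density is positive cells per unit area.
band_densities <- function(cells, band_edges, band_areas) {
  stopifnot(length(band_areas) == length(band_edges) - 1)
  if ("EXCLUDE" %in% names(cells)) {
    cells <- cells[!(cells$EXCLUDE %in% TRUE), , drop = FALSE]
  }
  band <- cut(cells$distance, breaks = band_edges, right = FALSE)
  # Count positive cells per band; cells outside every band (NA) are ignored.
  counts <- as.vector(table(band[cells$positive %in% TRUE]))
  data.frame(band = levels(band), count = counts, area = band_areas,
             density = counts / band_areas, stringsAsFactors = FALSE)
}

# Example with made-up cells; the cell at distance 150 is excluded.
cells <- data.frame(
  distance = c(10, 50, 120, 150, 180, 250, 260, 400),
  positive = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
  EXCLUDE  = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)
band_densities(cells, band_edges = c(0, 100, 200, 300),
               band_areas = c(0.5, 0.4, 0.3))
# counts 1, 1, 2; densities 2, 2.5, 6.67
```

You can compute a total FOV density the same way by passing a single band that spans all distances, together with the total FOV area.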