Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix consensus clustering so it assigns hclust values to each pixel #407

Merged
merged 4 commits into from
Mar 11, 2021

Conversation

alex-l-kong
Copy link
Contributor

What is the purpose of this PR?

Currently, we're only saving the consensus cluster to SOM cluster mapping table. We actually want to use that mapping to assign the consensus clusters to each pixel in our dataset.

How did you implement your changes

We pass in more arguments to consensus_cluster.R and write the results to a new directory: pixel_mat_consensus.

@alex-l-kong alex-l-kong self-assigned this Mar 9, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

ark/phenotyping/som_utils.py Outdated Show resolved Hide resolved
@ngreenwald ngreenwald merged commit 732f520 into new_som Mar 11, 2021
@ngreenwald ngreenwald deleted the consensus_fix branch March 11, 2021 20:33
ngreenwald pushed a commit that referenced this pull request Mar 9, 2022
* Initial commit of new SOM branch

* R SOM integration (#375)

* Initial commit of new cluster boost step

* Add RST files

* Try explicitly downloading cytolib

* Try directly installing cytolib

* See if changes to cytolib library made it in

* Keep only the TIFs we absolutely need for testing

* Remove old preprocessing.py and cluster.py files: everything now in som_utils.py and som_runner.R

* Decrease size of FlowSOM images to speed up testing

* Reduced size of images for segmentation testing too, and adjusted function arguments accordingly

* Update to testing: only unit test example_flowsom_clustering.ipynb, and mock up create_pixels instead of directly running it

* Add new testing framework

* Change the directory structure of the channel data and the segmentation labels, which also requires tests to be updated

* Fix a typo in segmentation_dir for testing example_flowsom_clustering.ipynb

* Change path in som_utils.py

* Remove comments

* Adjust path name parameter passing for cluster_pixels and som_runner.R

* Add coverage by checking path names in cluster_pixels

* Basic path modifications

* Add framework for subsetting SOM (#379)

* Initial commit of new subsetting strategy

* Finalize subsetting method

* Fix notebook testing for subsetted som

* Add a ton of testing

* Nuke references to point

* Add tables library to fix import errors, and fix PYCODESTYLE

* Remove MIBItiff tests for flowsom clustering and mock up cluster_pixels in notebook test to write to HDF5 file

* PYCODESTYLE

* Reorganize SOM

* Remove old functions and adjust notebook tests for new structure

* Correct typo

* Add next version of SOM

* Need to add feather to autodoc

* Documentation fix for cluster_pixels

* Documentation fix for train_som

* Add seed and num_passes default arguments

* Remove extraneous comments

* Address code review comments outside of the normalization

* Fix indentation in conf.py

* Address code review comments

* Nuke another h5py

* Add consensus clustering process (#394)

* Initial commit of consensus clustering

* Update notebooks testing to include dummy consensus cluster results creation

* Address code review comments

* Add 99.9% normalization using subsetted data to run_trained_som.R (#396)

* Batch the FlowSOM process (#399)

* Batch preprocessing step

* Adjust notebook to handle new preprocessing

* Add implementation with multiple fovs batched in run_trained_som.R

* Remove old cluster batching code

* Batch segmentation label loading

* Change segmentation labels loading to numpy

* Remove old commented code, change create_fov_pixel_mat to return matrices and create_pixel_matrix to save matrices

* Fix consensus clustering so it assigns hclust values to each pixel (#407)

* Fix hierarchical clustering to assign to every element

* Add testing for new consensus clustering method

* Remove extra comma

* Add heatmap visualizations in for SOM results (#410)

* Initial results for visualizing the SOM heatmap

* Adjust tests to account for new directory setting in example_flowsom_clustering

* Add random seeds

* Add seed call to ConsensusClusterPlus

* Add numPasses back

* Remove channels argument from create_pixel_matrix

* Add save directories based on datetime to separate runs

* Address code review comments (part 1)

* Add numeric random seed

* Code review comments part 1 (no directory structure fix yet)

* Add new directory setup and notebook testing

* Fix typo

* Fix directory creation bugs in notebook tests

* Change cell ordering

* Add metadata columns to SOM cluster and consensus cluster data (#418)

* Add Candace's suggestions

* Update tests to reflect new method of creating and reading .feathers

* Change cluster assignment step from cbind to $ for minor speedup

* Fix notebooks_test

* Add _feature_1 generation

* Re-add notebook tests

* Add cell clustering steps in (#420)

* Initial setup of train_cell_som

* Add initial cell clustering results

* Forgot to update tests with new function names

* More function fixes

* PYCODESTYLE

* Fix documentation of cell_consensus_cluster

* Another docstring fix

* Add parameters to set learning rate and normalize in run_cell_som.R

* Remove unnecessary subset

* Fix 99.9% normalization in run_cell_som.R

* Typo

* Add row sum normalization and removal of zero sum rows to R script files

* Confirm cell clustering

* Make comments clearer

* Remove redundant fov from fov_pixel_data

* Fix parameters in example_flowsom_clustering.ipynb

* Remove subtracting by 1 error

* Clarify comments for cell clustering

* Almost all code review comments addressed

* Add efficiencies to pixel clustering and cell table merging

* Fix comments and print outs

* PYCODESTYLE

* Tests, lot's of tests...

* Add check to verify_same_elements that tests for order, and make sure ordering is ensured before run_pixel_som and run_cell_som

* Fix col_index -> column_index

* More index fixing

* Index fixes for clustering cells

* Add updated FlowSOM notebook with results

* Add metadata for cluster_cells test

* Move row sum normalization to Python, keep column percentile normalization in R

* Update FlowSOM notebook, and update notebook tests (hold off on cell clustering tests until we see what Brian discovers about Travis testing with R)

* Update error message printed by verify_same_elements if enforce_order=True

* PYCODESTYLE

* Remove _norm from file paths

* Add more descriptive notes to the pixel/cell clustering notebook (#428)

* Add descriptive notes to the FlowSOM notebook, add more argument setting parameters, and update notebook tests accordingly

* Fix call to execute in notebooks_test_utils

* Rename example_flowsom_clustering to example_pixel_cell_clustering

* Address code review comments

* Fix notebook merge

* Create pixel and cell cluster overlays (#431)

* Initial commit

* Add overlay visualizations

* Add tests for pixel overlay and fix segmentation mask generation

* Add cluster_col and column_prefix checks where needed

* Add code to save pixel and cell masks as .tiff files in example_pixel_cell_clustering

* Create datasets that integrate with remapping GUI (#447)

* Add functions to generate data formats for Zak's visualization

* Add option to keep the count column when aggregating the pixel cluster data

* Address more code review comments

* PYCODESTYLE

* Separate pixel and cell clustering processes, add code to generate data files for Zak

* Reassign pixel_consensus_dir back to None in example_cell_clustering.ipynb

* Add more descriptive visualizations for pixel and cell clustering (#460)

* Visualizations 1-4 in progress

* Fix docstring

* Fix a few tests

* Typo in docstring

* Docstring fix

* Docfix in right location

* Visualizations 5-6 added in

* Adjust tests to account for cell_size normalization

* Visualizations 7-8 done

* Docstring fix in compute_cell_cluster_counts

* Version commit of cell clustering

* Address a lot of Candace requests

* Update pipeline to include weighted cell channel expression heatmaps

* Notebook documentation link sweetener

* Initial addressing of code review comments

* More code review comments addressed

* Remove comments

* Update notebooks to include changes made in addressing code review comments

* Remove stray line in data_utils.py that clipped all segmentation label 1's from Candace's dataset

* Remove viz arguments to former viz functions that now just return the data

* Convert visualization into pure computational functions and add more testing

* Update cell clustering notebook to separate computation from visualization

* Update pixel clustering notebook

* Update cell clustering notebook

* Address a TON of code review comments

* Docstring

* Add img_sub_folder argument to create_pixel_matrix

* Suffix fix in data_utils

* Add img_sub_folder argument into example_pixel_clustering

* Replace cluster_prefix with None

* Initial work on .csv saving (likely to be retconned)

* Address final code review comment (saving to CSV)

* Doc fix

* Add new notebooks

* Clarify comment in cell clustering notebook

* Add FINAL final correction (normalize by cell size prior to running R)

* Update notebook documentation: saving to .csv, not .feather

* First Candace comments addressed (everything not in the notebook, and NOTE in pixel clustering

* Doc fix

* Add prefixes to all variables in the notebooks

* Address a hella ton of code review comments

* Doc fix in data_utils

* Adjust data_utils tests for new column names in pixel and cell cluster mask generation

* Fix last code review comment (do not save two versions of weighted channel average per cell cluster)

* Tested cell and pixel clustering on various cluster column pairs

* Address code review comments except for no segmentation label support

* Doc fix

* Doc fix (round 2)

* Doc fix (round 3)

* Fix notebook test for pixel clustering

* Support pixel clustering without segmentation images

* Typo in doc

* Adjust pipeline to accommodate segmentation_dir = None

* Doc fix

* More doc fixes

* Fix notebook test

* Explicitly specify max_k and cap in notebook

* Documentation fixes

* Fix a comment

* Remove preprocessed_dir and subsetted_dir from documentation at the top

* Remove TODO from pixel clustering

* Update documentation in cell clustering

* Update weighted cell channel average computation to take file path to weighted cell table

* Adjust cell pipeline for weighted cell channel avg

* Remove cell outputs from cell clustering notebook

* Fix column subset

* Condense cells in pixel and cell clustering process

* Update notebook utils

* Address one final comment

* Address Candace's review comments

* Address next set of Candace review comments (biggest change: z-score bug fixed)

* Minor change to documentation of JSON params saved in pixel clustering

* Add lower-bound capping using pmax

* clusterCountsNorm to clusterCountsNormSub for consistency

* Include both cell SOM and meta visualizations for weighted channel average, and nuke more mentions of hCluster_cap

* Add option to specify dtype when creating the pixel matrix

* Doc fixes

* Add more detail for cell_table_name in cell clustering notebook

* Fix comments in cell clustering

* Code review comments addressed

* Move z-scoring back to R, inputs to Zak's visualization now un-z-scored

* Add remapping scheme to som_utils.py

* Add re-mapping scheme

* Fix cell clustering name

* Need to add cell size normalization for weighted cell channel table

* Fix tick_range None

* Change default plot back to 'tab20'

* Add the cmap arg back to the pixel and cell clustering notebooks

* Nuke z-score in notebook

* Add a test to ensure the cell table creation fails if the user provides a cell table without the required columns

* Doc fixes in the cell clustering notebook

* Update to take into account user renaming of meta clusters, and ensure the colors for each metacluster are consistent across every visualization

* Add tests for remapping cell clusters

* Fix notebook typo

* Notebook update

* Add test to ensure columns and SOM clusters are valid for remapping purposes

* Add a test for the pixel and cell cluster overlay

* Add tests to ensure the pixel meta clusters are in the right order when remapped

* Add tests to ensure the cell meta clusters are in the right order

* Clarify comments

* Fix some docstrings related to the pixel SOM/meta cluster counts per cell

* One more minor fix

* More doc fixes

* More notebook documentation fixes

* More doc fixes

* Reduce the time window for automatically rebuilding Docker on new requirements.txt to 30 minutes

* Integrate interactive remapping workflow into the pixel and cell clustering pipeline (#497)

* Initial commit of remapping addition to clustering pipeline

* Add heatmap testing

* Rework tests for file reading in metacluster_remap_gui

* Add sample notebooks with re-visualization scheme

* Relabel barchart, fix off-by-one meta cluster colormap error, ensure heatmap meta cluster colors are grouped together

* Don't divide pixel and cell cluster counts by 1000 when displaying the bar chart

* Add sample notebooks with new updated visualizations

* Clarifying comment

* Remap doc fixes

* Remove duplicate comment about referring to documentation for cell weighting mechanism

* Mega OCD

* Prevent loading of .DS_Store files when generating pixel cluster masks

* Bug fix in io_utils.list_files

* Address all code review comments except for column names in interactive remapping

* Add notebooks with seg_suffix in JSON params and cell heatmap without pixel_meta_cluster_rename_ prefix

* Add test for optional prefix trimming of column names in metacluster_from_files

* Patch file loading

* Need to specify loading .tif and .tiff files only

* Fix path

* Fix path to pixel clustering

* Fix notebook tests and deploy new image to Docker

* Use travis_wait to force 20 minute timeout

* Don't deploy on PR check anymore

* Add specific pixel_output_dir and cell_output_dir folders to store intermediate files

* Remove extraneous files

* Remove mibi

* Remove QC metric notebook test

* Change function calls in notebooks_test.py

* Add cell clustering test pipeline

* Adjust pixel cluster test pipeline to be similar to cell cluster test pipeline

* Add check_contrast = False

* Remove commented notebook tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants