Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SOM implementation #374

Merged
merged 59 commits into from
Mar 9, 2022
Merged

Add SOM implementation #374

merged 59 commits into from
Mar 9, 2022

Conversation

alex-l-kong
Copy link
Contributor

@alex-l-kong alex-l-kong commented Jan 18, 2021

What is the purpose of this PR?

Had to reopen this PR because the previous one's commit history got massively messed up.

Addresses and closes #325. Candace's pixel clustering algorithm needs to be integrated with ark-analysis now. This PR begins to add this functionality in, starting with a basic SOM implementation. We need to verify that this PR produces the same or at least very similar results.

How did you implement your changes

We piggyback off of MiniSOM (https://github.com/JustGlowing/minisom), a Python-based SOM implementation that should work for our purposes. It shouldn't be hard to translate most of the code into our project, but optimizing the process may be difficult.

Once we've verified our outputs, we then need to implement downsampling. A SOM would take a very long time to train on hundreds of thousands of pixel data from various images. We don't need all of these pixel entries.

We also need to implement

Remaining issues

Preprocessing takes a very long time. Unfortunately, it seems like apply_along_axis doesn't offer much of a speedup. Alternatives would be greatly welcomed!

@alex-l-kong alex-l-kong self-assigned this Jan 18, 2021
alex-l-kong and others added 16 commits January 26, 2021 09:31
* Initial commit of new cluster boost step

* Add RST files

* Try explicitly downloading cytolib

* Try directly installing cytolib

* See if changes to cytolib library made it in

* Keep only the TIFs we absolutely need for testing

* Remove old preprocessing.py and cluster.py files: everything now in som_utils.py and som_runner.R

* Decrease size of FlowSOM images to speed up testing

* Reduced size of images for segmentation testing too, and adjusted function arguments accordingly

* Update to testing: only unit test example_flowsom_clustering.ipynb, and mock up create_pixels instead of directly running it

* Add new testing framework

* Change the directory structure of the channel data and the segmentation labels, which also requires tests to be updated

* Fix a typo in segmentation_dir for testing example_flowsom_clustering.ipynb

* Change path in som_utils.py

* Remove comments

* Adjust path name parameter passing for cluster_pixels and som_runner.R

* Add coverage by checking path names in cluster_pixels

* Basic path modifications
* Initial commit of new subsetting strategy

* Finalize subsetting method

* Fix notebook testing for subsetted som

* Add a ton of testing

* Nuke references to point

* Add tables library to fix import errors, and fix PYCODESTYLE

* Remove MIBItiff tests for flowsom clustering and mock up cluster_pixels in notebook test to write to HDF5 file

* PYCODESTYLE

* Reorganize SOM

* Remove old functions and adjust notebook tests for new structure

* Correct typo

* Add next version of SOM

* Need to add feather to autodoc

* Documentation fix for cluster_pixels

* Documentation fix for train_som

* Add seed and num_passes default arguments

* Remove extraneous comments

* Address code review comments outside of the normalization

* Fix indentation in conf.py

* Address code review comments

* Nuke another h5py
* Initial commit of consensus clustering

* Update notebooks testing to include dummy consensus cluster results creation

* Address code review comments
* Batch preprocessing step

* Adjust notebook to handle new preprocessing

* Add implementation with multiple fovs batched in run_trained_som.R

* Remove old cluster batching code

* Batch segmentation label loading

* Change segmentation labels loading to numpy

* Remove old commented code, change create_fov_pixel_mat to return matrices and create_pixel_matrix to save matrices
)

* Fix hierarchical clustering to assign to every element

* Add testing for new consensus clustering method

* Remove extra comma
* Initial results for visualizing the SOM heatmap

* Adjust tests to account for new directory setting in example_flowsom_clustering

* Add random seeds

* Add seed call to ConsensusClusterPlus

* Add numPasses back

* Remove channels argument from create_pixel_matrix

* Add save directories based on datetime to separate runs

* Address code review comments (part 1)

* Add numeric random seed

* Code review comments part 1 (no directory structure fix yet)

* Add new directory setup and notebook testing

* Fix typo

* Fix directory creation bugs in notebook tests

* Change cell ordering
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

alex-l-kong and others added 11 commits April 22, 2021 12:30
* Add Candace's suggestions

* Update tests to reflect new method of creating and reading .feathers

* Change cluster assignment step from cbind to $ for minor speedup
* Initial setup of train_cell_som

* Add initial cell clustering results

* Forgot to update tests with new function names

* More function fixes

* PYCODESTYLE

* Fix documentation of cell_consensus_cluster

* Another docstring fix

* Add parameters to set learning rate and normalize in run_cell_som.R

* Remove unnecessary subset

* Fix 99.9% normalization in run_cell_som.R

* Typo

* Add row sum normalization and removal of zero sum rows to R script files

* Confirm cell clustering

* Make comments clearer

* Remove redundant fov from fov_pixel_data

* Fix parameters in example_flowsom_clustering.ipynb

* Remove subtracting by 1 error

* Clarify comments for cell clustering

* Almost all code review comments addressed

* Add efficiencies to pixel clustering and cell table merging

* Fix comments and print outs

* PYCODESTYLE

* Tests, lot's of tests...

* Add check to verify_same_elements that tests for order, and make sure ordering is ensured before run_pixel_som and run_cell_som

* Fix col_index -> column_index

* More index fixing

* Index fixes for clustering cells

* Add updated FlowSOM notebook with results

* Add metadata for cluster_cells test

* Move row sum normalization to Python, keep column percentile normalization in R

* Update FlowSOM notebook, and update notebook tests (hold off on cell clustering tests until we see what Brian discovers about Travis testing with R)

* Update error message printed by verify_same_elements if enforce_order=True

* PYCODESTYLE
* Add descriptive notes to the FlowSOM notebook, add more argument setting parameters, and update notebook tests accordingly

* Fix call to execute in notebooks_test_utils

* Rename example_flowsom_clustering to example_pixel_cell_clustering

* Address code review comments
alex-l-kong and others added 7 commits October 27, 2021 11:03
* Visualizations 1-4 in progress

* Fix docstring

* Fix a few tests

* Typo in docstring

* Docstring fix

* Docfix in right location

* Visualizations 5-6 added in

* Adjust tests to account for cell_size normalization

* Visualizations 7-8 done

* Docstring fix in compute_cell_cluster_counts

* Version commit of cell clustering

* Address a lot of Candace requests

* Update pipeline to include weighted cell channel expression heatmaps

* Notebook documentation link sweetener

* Initial addressing of code review comments

* More code review comments addressed

* Remove comments

* Update notebooks to include changes made in addressing code review comments

* Remove stray line in data_utils.py that clipped all segmentation label 1's from Candace's dataset

* Remove viz arguments to former viz functions that now just return the data

* Convert visualization into pure computational functions and add more testing

* Update cell clustering notebook to separate computation from visualization

* Update pixel clustering notebook

* Update cell clustering notebook

* Address a TON of code review comments

* Docstring

* Add img_sub_folder argument to create_pixel_matrix

* Suffix fix in data_utils

* Add img_sub_folder argument into example_pixel_clustering

* Replace cluster_prefix with None

* Initial work on .csv saving (likely to be retconned)

* Address final code review comment (saving to CSV)

* Doc fix

* Add new notebooks

* Clarify comment in cell clustering notebook

* Add FINAL final correction (normalize by cell size prior to running R)

* Update notebook documentation: saving to .csv, not .feather

* First Candace comments addressed (everything not in the notebook, and NOTE in pixel clustering

* Doc fix

* Add prefixes to all variables in the notebooks

* Address a hella ton of code review comments

* Doc fix in data_utils

* Adjust data_utils tests for new column names in pixel and cell cluster mask generation

* Fix last code review comment (do not save two versions of weighted channel average per cell cluster)

* Tested cell and pixel clustering on various cluster column pairs

* Address code review comments except for no segmentation label support

* Doc fix

* Doc fix (round 2)

* Doc fix (round 3)

* Fix notebook test for pixel clustering

* Support pixel clustering without segmentation images

* Typo in doc

* Adjust pipeline to accommodate segmentation_dir = None

* Doc fix

* More doc fixes

* Fix notebook test

* Explicitly specify max_k and cap in notebook

* Documentation fixes

* Fix a comment

* Remove preprocessed_dir and subsetted_dir from documentation at the top

* Remove TODO from pixel clustering

* Update documentation in cell clustering

* Update weighted cell channel average computation to take file path to weighted cell table

* Adjust cell pipeline for weighted cell channel avg

* Remove cell outputs from cell clustering notebook

* Fix column subset

* Condense cells in pixel and cell clustering process

* Update notebook utils

* Address one final comment

* Address Candace's review comments

* Address next set of Candace review comments (biggest change: z-score bug fixed)

* Minor change to documentation of JSON params saved in pixel clustering

* Add lower-bound capping using pmax

* clusterCountsNorm to clusterCountsNormSub for consistency

* Include both cell SOM and meta visualizations for weighted channel average, and nuke more mentions of hCluster_cap

* Add option to specify dtype when creating the pixel matrix

* Doc fixes

* Add more detail for cell_table_name in cell clustering notebook

* Fix comments in cell clustering

* Code review comments addressed

* Move z-scoring back to R, inputs to Zak's visualization now un-z-scored

* Add remapping scheme to som_utils.py

* Add re-mapping scheme

* Fix cell clustering name

* Need to add cell size normalization for weighted cell channel table

* Fix tick_range None

* Change default plot back to 'tab20'

* Add the cmap arg back to the pixel and cell clustering notebooks

* Nuke z-score in notebook

* Add a test to ensure the cell table creation fails if the user provides a cell table without the required columns

* Doc fixes in the cell clustering notebook

* Update to take into account user renaming of meta clusters, and ensure the colors for each metacluster are consistent across every visualization

* Add tests for remapping cell clusters

* Fix notebook typo

* Notebook update

* Add test to ensure columns and SOM clusters are valid for remapping purposes

* Add a test for the pixel and cell cluster overlay

* Add tests to ensure the pixel meta clusters are in the right order when remapped

* Add tests to ensure the cell meta clusters are in the right order

* Clarify comments

* Fix some docstrings related to the pixel SOM/meta cluster counts per cell

* One more minor fix

* More doc fixes

* More notebook documentation fixes

* More doc fixes

* Reduce the time window for automatically rebuilding Docker on new requirements.txt to 30 minutes
…tering pipeline (#497)

* Initial commit of remapping addition to clustering pipeline

* Add heatmap testing

* Rework tests for file reading in metacluster_remap_gui

* Add sample notebooks with re-visualization scheme

* Relabel barchart, fix off-by-one meta cluster colormap error, ensure heatmap meta cluster colors are grouped together

* Don't divide pixel and cell cluster counts by 1000 when displaying the bar chart

* Add sample notebooks with new updated visualizations

* Clarifying comment

* Remap doc fixes

* Remove duplicate comment about referring to documentation for cell weighting mechanism

* Mega OCD

* Prevent loading of .DS_Store files when generating pixel cluster masks

* Bug fix in io_utils.list_files

* Address all code review comments except for column names in interactive remapping

* Add notebooks with seg_suffix in JSON params and cell heatmap without pixel_meta_cluster_rename_ prefix

* Add test for optional prefix trimming of column names in metacluster_from_files

* Patch file loading

* Need to specify loading .tif and .tiff files only

* Fix path
Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking nice! There's some random files that look like they got added in here too:
data/example_dataset/flowsom_data/deepcell_output, and 'flowsom_data/input_data`. Are those here for a particular reason?

ark/utils/notebooks_test.py Outdated Show resolved Hide resolved
@alex-l-kong
Copy link
Contributor Author

The notebook tests now run the visualization code, mocking only the necessary files and directories needed.

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but some commented code still in here

ark/utils/notebooks_test.py Outdated Show resolved Hide resolved
@ngreenwald
Copy link
Member

Awesome, looks good. Will you make a new release and bump to v0.3.0

@ngreenwald ngreenwald merged commit de1e702 into master Mar 9, 2022
@ngreenwald ngreenwald deleted the new_som branch March 9, 2022 19:35
@srivarra srivarra added the enhancement New feature or request label Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create basic SOM implementation
3 participants