Separate dim reduction and KMeans into independent operations #24
Merged
Redesign the precalculated app to decouple projection and clustering:
- Split "Run Clustering" into "Project to 2D" + optional "Run KMeans"
- Add "Color by" dropdown: color points by any metadata column or KMeans result
- Support multiple KMeans runs (all persist in dropdown, sorted by k)
- Add standalone ClusteringService methods: run_dim_reduction_safe, run_kmeans_only_safe
- Add split backend controls: render_projection_controls, render_kmeans_controls
- Rework taxonomy tree with "Group by" selector for multiple KMeans runs
- Chart zoom auto-resets on re-projection (key includes data_version)
- Fix duplicate log output (propagate=False on app loggers)
- Show unique value counts in Color by dropdown
- Warning when column exceeds 20 unique values (tableau20 palette limit)

embed_explore app is unaffected (uses existing run_clustering_safe path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Redesign of the precalculated embeddings app that separates two core operations, dimensionality reduction (projection to 2D) and KMeans clustering, into independent steps.
Previously these were bundled behind a single "Run Clustering" button, which suggested that one depended on the other. That was misleading: both operate independently on the same full-size embedding vectors from the parquet input.
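To illustrate the independence described above, here is a minimal sketch rather than the app's actual code. It assumes scikit-learn, uses PCA as a stand-in for whichever projection method the app offers, and uses random vectors in place of the parquet embeddings. All variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Stand-in for the full-size embedding vectors loaded from parquet (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))

# Projection: produces only the 2D layout. PCA is an assumed example method.
coords_2d = PCA(n_components=2, random_state=0).fit_transform(embeddings)

# KMeans: fitted on the full-size vectors, never on coords_2d.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)

# Re-projecting with a different method changes the layout only.
coords_alt = embeddings @ rng.normal(size=(64, 2))
labels_again = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)

# The cluster labels do not depend on which projection was drawn.
same_labels = bool(np.array_equal(labels, labels_again))
```

Neither call takes the other's output as input, so either step can be rerun without invalidating the other.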
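The new workflow:
- Click "Project to 2D" to compute the 2D layout
- Optionally click "Run KMeans" (repeatable with different k; every run persists)
- Pick any metadata column or KMeans result in the "Color by" dropdown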
This makes it clear that KMeans is just another label (like kingdom or phylum), not something that affects the 2D layout. You can visually compare KMeans cluster assignments against any metadata attribute just by switching the Color by dropdown.
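One way to picture "KMeans is just another label" is to treat each KMeans run as one more column next to the metadata columns. The sketch below is a hypothetical illustration, not the PR's implementation: `color_by_options`, the option label format, and the input shapes are all assumptions. It only mirrors the behaviors listed in the description (runs sorted by k, unique value counts, a warning above 20 unique values).

```python
PALETTE_LIMIT = 20  # tableau20 palette size, per the PR description


def color_by_options(metadata, kmeans_runs):
    """Build "Color by" entries (hypothetical helper).

    metadata:    dict mapping column name -> list of values, one per point
    kmeans_runs: dict mapping k -> list of cluster labels, one per point
    """
    # Metadata columns first, then every stored KMeans run sorted by k.
    candidates = list(metadata.items())
    candidates += [(f"KMeans (k={k})", kmeans_runs[k]) for k in sorted(kmeans_runs)]

    options = []
    for name, values in candidates:
        n_unique = len(set(values))
        options.append({
            "label": f"{name} ({n_unique} unique)",  # show unique value count
            "values": values,
            "warn": n_unique > PALETTE_LIMIT,  # palette cannot cover all values
        })
    return options


# Example: one metadata column and two KMeans runs stored out of order.
example = color_by_options(
    {"kingdom": ["Animalia", "Plantae", "Animalia"]},
    {8: [0, 1, 2], 3: [0, 1, 1]},
)
example_labels = [opt["label"] for opt in example]
```

Because metadata columns and KMeans runs pass through the same code path, switching the dropdown only changes which list of values colors the points; the 2D coordinates are untouched.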