
Separate dim reduction and KMeans into independent operations #24

Merged

NetZissou merged 2 commits into main from feature/separated-projection-kmeans on Apr 2, 2026



@NetZissou
Collaborator

@NetZissou NetZissou commented Apr 1, 2026

Summary

Redesign of the precalculated embeddings app that separates two core operations into independent steps:

  • dim reduction
  • KMeans clustering

Previously, these were bundled behind a single "Run Clustering" button. That suggested one step depended on the other, which is misleading: both operate independently on the same full-size embedding vectors from the parquet input.
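A minimal sketch of the decoupled design, assuming the embeddings arrive as a NumPy array of shape (n_points, dim) loaded from the parquet file. `project_to_2d` and `run_kmeans` are hypothetical names, not the PR's `run_dim_reduction_safe` / `run_kmeans_only_safe`; PCA (via SVD) stands in for the t-SNE/UMAP options, and KMeans is written out as plain Lloyd's iterations so the example depends only on NumPy. The point it illustrates: both functions take the same full-size matrix, and neither reads the other's output.

```python
import numpy as np


def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings to 2D with PCA (a stand-in for t-SNE or UMAP)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def run_kmeans(embeddings: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Lloyd's KMeans on the full-size embeddings; returns one cluster id per row."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct points sampled from the data.
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n_points, k).
        dists = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters in place.
        new_centers = np.array([
            embeddings[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Both steps read the same full-size matrix; neither consumes the other's output.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 64))
coords = project_to_2d(embeddings)    # drives the scatter-plot layout only
labels = run_kmeans(embeddings, k=5)  # becomes one more per-point label
```

Because KMeans never sees `coords`, changing the projection method cannot change the cluster assignments, and vice versa.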

The new workflow:

  1. Project to 2D: reduces the embeddings to 2D using t-SNE, PCA, or UMAP and renders the scatter plot.
  2. Color by: a dropdown above the plot lets you color points by any metadata column (kingdom, species, img_type, etc.).
  3. Run KMeans (optional): runs KMeans on the full-size embeddings and adds the result as another option in the Color by dropdown. You can run KMeans multiple times with different k values; all results are kept and available for comparison.
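One way to model steps 2 and 3 is a single store that holds metadata columns and KMeans runs side by side. This is a hypothetical sketch, not the app's actual state handling: `ColorByState`, the `"KMeans (k=…)"` label format, and listing metadata before KMeans runs are all assumptions; keeping every run and sorting runs by k follows the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class ColorByState:
    """Hypothetical store for every option the Color by dropdown can offer."""

    metadata: dict  # column name -> list of per-point values
    kmeans_runs: dict = field(default_factory=dict)  # k -> list of per-point cluster ids

    def add_kmeans_run(self, k: int, labels: list) -> None:
        # Every run is kept; re-running the same k replaces only that entry.
        self.kmeans_runs[k] = list(labels)

    def options(self) -> list:
        # Metadata columns first, then KMeans runs in ascending order of k.
        return list(self.metadata) + [f"KMeans (k={k})" for k in sorted(self.kmeans_runs)]

    def labels_for(self, option: str) -> list:
        # Resolve a dropdown choice back to the per-point labels used for coloring.
        if option.startswith("KMeans (k="):
            k = int(option[len("KMeans (k="):-1])
            return [f"cluster {c}" for c in self.kmeans_runs[k]]
        return self.metadata[option]


state = ColorByState(metadata={"kingdom": ["Animalia", "Plantae", "Animalia"]})
state.add_kmeans_run(8, [0, 1, 0])
state.add_kmeans_run(3, [2, 0, 2])
print(state.options())  # ['kingdom', 'KMeans (k=3)', 'KMeans (k=8)']
```

Treating a KMeans result as just another column keeps the plotting code unaware of where a label came from.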

This makes it clear that KMeans is just another label (like kingdom or phylum), not something that affects the 2D layout. You can visually compare KMeans cluster assignments against any metadata attribute by switching the Color by dropdown.
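The same comparison can also be tabulated. A minimal sketch, assuming cluster ids and a metadata column are aligned per point; `cross_tab` and `purity` are hypothetical helpers, not part of the app. Purity is the standard fraction of points that carry their cluster's majority metadata value.

```python
from collections import Counter


def cross_tab(clusters: list, values: list) -> dict:
    """Count how many points with each metadata value land in each KMeans cluster."""
    if len(clusters) != len(values):
        raise ValueError("clusters and values must have one entry per point")
    table = {}
    for cluster, value in zip(clusters, values):
        table.setdefault(cluster, Counter())[value] += 1
    return table


def purity(table: dict) -> float:
    """Fraction of points whose metadata value is the majority value of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(counts.most_common(1)[0][1] for counts in table.values())
    return majority / total


clusters = [0, 0, 0, 1, 1, 2]
kingdom = ["Animalia", "Animalia", "Plantae", "Plantae", "Plantae", "Fungi"]
table = cross_tab(clusters, kingdom)
print(table[0])       # Counter({'Animalia': 2, 'Plantae': 1})
print(purity(table))  # 0.8333333333333334
```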

NetZissou and others added 2 commits April 1, 2026 14:17
Redesign the precalculated app to decouple projection and clustering:

- Split "Run Clustering" into "Project to 2D" + optional "Run KMeans"
- Add "Color by" dropdown: color points by any metadata column or KMeans result
- Support multiple KMeans runs (all persist in dropdown, sorted by k)
- Add standalone ClusteringService methods: run_dim_reduction_safe, run_kmeans_only_safe
- Add split backend controls: render_projection_controls, render_kmeans_controls
- Rework taxonomy tree with "Group by" selector for multiple KMeans runs
- Chart zoom auto-resets on re-projection (key includes data_version)
- Fix duplicate log output (propagate=False on app loggers)
- Show unique value counts in Color by dropdown
- Show a warning when a column has more than 20 unique values (the tableau20 palette limit)
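The last two bullets (unique-value counts in the dropdown and the 20-value warning) can be sketched as one helper. This is hypothetical: `color_by_option`, the `"(N unique)"` label format, and the warning text are assumptions; only the count display and the threshold of 20 (the size of the tableau20 palette) come from the commit message.

```python
from typing import Optional

PALETTE_SIZE = 20  # the tableau20 scheme has 20 distinct colors


def color_by_option(column: str, values: list) -> tuple:
    """Return a dropdown label with the unique-value count and an optional warning."""
    n_unique = len(set(values))
    label = f"{column} ({n_unique} unique)"
    warning: Optional[str] = None
    if n_unique > PALETTE_SIZE:
        warning = (
            f"'{column}' has {n_unique} unique values but the palette has only "
            f"{PALETTE_SIZE} colors, so some categories may share a color."
        )
    return label, warning


print(color_by_option("kingdom", ["Animalia", "Plantae", "Fungi", "Animalia"]))
# ('kingdom (3 unique)', None)
label, warning = color_by_option("species", [f"sp{i}" for i in range(25)])
print(label)  # species (25 unique)
```

Exactly 20 unique values produces no warning, since the palette can still give each category its own color.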

embed_explore app is unaffected (uses existing run_clustering_safe path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member

@egrace479 egrace479 left a comment


🚀

@NetZissou NetZissou self-assigned this Apr 2, 2026
@NetZissou NetZissou merged commit 66948e0 into main Apr 2, 2026
@NetZissou NetZissou deleted the feature/separated-projection-kmeans branch April 7, 2026 16:07
2 participants