From 5ca1d46f276c3839da92e98471dfa0c96084dbe9 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 26 Dec 2025 19:28:33 +0000
Subject: [PATCH] spec: add silhouette-basic specification

Created from issue #2334
---
 plots/silhouette-basic/specification.md   | 29 +++++++++++++++++++++++
 plots/silhouette-basic/specification.yaml | 28 ++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 plots/silhouette-basic/specification.md
 create mode 100644 plots/silhouette-basic/specification.yaml

diff --git a/plots/silhouette-basic/specification.md b/plots/silhouette-basic/specification.md
new file mode 100644
index 0000000000..1099774af9
--- /dev/null
+++ b/plots/silhouette-basic/specification.md
@@ -0,0 +1,29 @@
+# silhouette-basic: Silhouette Plot
+
+## Description
+
+A silhouette plot visualizes the quality of clustering results by showing the silhouette coefficient for each sample, grouped by cluster assignment. Each horizontal bar represents a sample's silhouette score (-1 to 1), where positive values indicate good cluster membership and negative values suggest potential misclassification. This visualization helps evaluate cluster cohesion (how similar samples are to their own cluster) and separation (how distinct they are from neighboring clusters).
+
+## Applications
+
+- Evaluating K-means, hierarchical, or other clustering algorithm results
+- Comparing different numbers of clusters to find optimal k value
+- Identifying poorly clustered or potentially misclassified samples
+- Validating cluster assignments before downstream analysis
+
+## Data
+
+- `samples` (numeric) - feature vectors for each data point to be clustered
+- `cluster_labels` (integer) - cluster assignment for each sample (0 to k-1)
+- `silhouette_values` (numeric) - silhouette coefficient per sample (-1 to 1)
+- Size: 50-500 samples with 2-10 clusters for readable visualization
+- Example: clustering iris dataset into 3 species groups
+
+## Notes
+
+- Display horizontal bars for each sample's silhouette score, sorted within each cluster
+- Group samples by cluster with distinct colors per cluster
+- Include vertical line at average silhouette score for reference
+- Annotate each cluster section with its average silhouette score
+- Use sklearn.metrics.silhouette_samples for computing individual scores
+- Clusters with consistently high scores (close to 1) indicate well-separated groups
diff --git a/plots/silhouette-basic/specification.yaml b/plots/silhouette-basic/specification.yaml
new file mode 100644
index 0000000000..24095118a7
--- /dev/null
+++ b/plots/silhouette-basic/specification.yaml
@@ -0,0 +1,28 @@
+# Specification-level metadata for silhouette-basic
+# Auto-synced to PostgreSQL on push to main
+
+spec_id: silhouette-basic
+title: Silhouette Plot
+
+# Specification tracking
+created: 2025-12-26T19:28:09Z
+updated: null
+issue: 2334
+suggested: MarkusNeusinger
+
+# Classification tags (applies to all library implementations)
+# See docs/concepts/tagging-system.md for detailed guidelines
+tags:
+  plot_type:
+    - silhouette
+    - bar
+  data_type:
+    - numeric
+    - categorical
+  domain:
+    - statistics
+    - machine-learning
+  features:
+    - basic
+    - clustering
+    - evaluation
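
Note for implementers (not part of the patch): the layout rules in the Notes section of specification.md (bars sorted within each cluster, clusters grouped together, a reference line at the overall average, a per-cluster average for annotation) could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `silhouette_layout` and the `gap` parameter are hypothetical and appear nowhere in the spec, and the scores are assumed to have been computed beforehand (the spec suggests sklearn.metrics.silhouette_samples). Rendering is left to whichever plotting library an implementation targets.

```python
from collections import defaultdict
from statistics import mean


def silhouette_layout(silhouette_values, cluster_labels, gap=10):
    """Compute bar positions for a silhouette plot (hypothetical helper).

    Returns (bars, cluster_means, overall_mean), where bars is a list of
    (y_position, score, cluster_label) tuples, one per sample.
    """
    if len(silhouette_values) != len(cluster_labels):
        raise ValueError("silhouette_values and cluster_labels must have equal length")

    # Group the per-sample scores by cluster assignment.
    by_cluster = defaultdict(list)
    for score, label in zip(silhouette_values, cluster_labels):
        by_cluster[label].append(score)

    bars = []
    cluster_means = {}
    y = gap  # margin below the first cluster
    for label in sorted(by_cluster):
        # Sort ascending within the cluster so each block has a smooth profile.
        scores = sorted(by_cluster[label])
        for score in scores:
            bars.append((y, score, label))
            y += 1
        cluster_means[label] = mean(scores)  # value for the cluster annotation
        y += gap  # empty rows separating adjacent clusters

    # Position of the vertical reference line.
    overall_mean = mean(silhouette_values)
    return bars, cluster_means, overall_mean


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    values = [0.8, 0.6, -0.1, 0.4, 0.5, 0.7]
    labels = [0, 0, 0, 1, 1, 1]
    bars, cluster_means, overall = silhouette_layout(values, labels, gap=2)
    for y, score, label in bars:
        print(y, label, score)
```

Each `(y_position, score, cluster_label)` tuple maps to one horizontal bar, colored by its cluster label. Keeping the layout separate from the drawing code is a design choice of this sketch, not something the spec requires; it lets the same positions be reused across library implementations.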