# HIERARCHICAL_CLUSTER

## Overview
This function performs hierarchical (agglomerative) clustering on numeric data and returns the resulting dendrogram as a base64-encoded PNG image, formatted as a `data:image/png;base64,...` URL. It is designed for use in Excel, where you can pass a 2D range or a single column of numbers. Ward's method is used by default. You can select other linkage methods with the optional `method` argument.

Ward's method minimizes the total within-cluster variance. At each step, it merges the pair of clusters whose union causes the smallest increase in the total within-cluster sum of squares. For clusters $A$ and $B$, that increase $\Delta E$ is:

```math
\Delta E = \frac{|A| \cdot |B|}{|A| + |B|} \|\bar{x}_A - \bar{x}_B\|^2
```

where $|A|$ and $|B|$ are the numbers of points in clusters $A$ and $B$, $\bar{x}_A$ and $\bar{x}_B$ are their centroids, and $\|\cdot\|$ is the Euclidean norm.
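The following brute-force sketch checks the formula numerically. The helpers `ward_delta` and `naive_ward` are illustrative names and are not part of this function. The sketch greedily merges the pair of clusters with the smallest $\Delta E$, then compares the result with SciPy's `linkage`. SciPy's Ward update should report each merge height as $\sqrt{2\,\Delta E}$. For two single points, that value reduces to their Euclidean distance. The "Distance" axis of the dendrogram is therefore on that scale.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def ward_delta(a: np.ndarray, b: np.ndarray) -> float:
    """Increase in within-cluster sum of squares when clusters a and b (rows = points) merge."""
    na, nb = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)


def naive_ward(X: np.ndarray) -> list:
    """Greedy agglomeration: always merge the pair of clusters with the smallest Delta E."""
    clusters = [[i] for i in range(len(X))]  # start with one cluster per point
    deltas = []
    while len(clusters) > 1:
        # Brute force: evaluate Delta E for every pair of current clusters
        best_delta, best_i, best_j = min(
            (ward_delta(X[clusters[i]], X[clusters[j]]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        deltas.append(best_delta)
        merged = clusters[best_i] + clusters[best_j]
        clusters = [c for k, c in enumerate(clusters) if k not in (best_i, best_j)]
        clusters.append(merged)
    return deltas


X_demo = np.array([[0.0], [1.0], [3.0]])
deltas = np.array(naive_ward(X_demo))  # approximately [0.5, 4.1667]
Z = linkage(X_demo, method="ward")
print(np.sqrt(2 * deltas), Z[:, 2])  # both approximately [1.0, 2.8868]
```

The brute-force loop is only practical for a handful of points. Its purpose is to make the merge criterion explicit, not to replace `linkage`.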

## Usage
To use the `HIERARCHICAL_CLUSTER` function in Excel, enter it as a formula in a cell, specifying your data as a range. Optionally, specify the linkage method:

```excel
=HIERARCHICAL_CLUSTER(data, [method])
```
Replace `data` with the range containing your numbers. Optionally, replace `[method]` with a linkage method name as a quoted string (e.g., "complete"). The function returns a base64-encoded PNG image of the dendrogram. To display the image in Excel, open the Function Dialog and select the `Insert result, not formula` option. If you call the custom function directly from a cell, it returns only the base64 string.
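Outside Excel, for example when you call the Python function directly, the returned string is a data URL rather than raw image bytes. The following is a minimal sketch for turning it into a `.png` file. It uses a hypothetical helper, `data_url_to_png_bytes`, which treats any string without the PNG data-URL prefix (such as an `Error: ...` message) as a failure.

```python
import base64

DATA_URL_PREFIX = "data:image/png;base64,"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def data_url_to_png_bytes(result: str) -> bytes:
    """Decode a PNG data URL; raise ValueError for error strings or non-PNG payloads."""
    if not result.startswith(DATA_URL_PREFIX):
        raise ValueError(f"not a PNG data URL: {result[:40]!r}")
    png = base64.b64decode(result[len(DATA_URL_PREFIX):], validate=True)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    return png


# Hypothetical usage with the function's return value:
# result = hierarchical_cluster([[9.6], [9.8], [10], [12], [13]])
# with open("dendrogram.png", "wb") as f:
#     f.write(data_url_to_png_bytes(result))
```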

## Arguments
| Argument | Type     | Required | Description                                 | Example |
|----------|----------|----------|---------------------------------------------|---------|
| data     | 2D list  | Yes      | Numeric data for clustering. Each row is one observation; each column is one variable. | [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]] |
| method   | string   | No       | Linkage method: 'single', 'complete', 'average', 'weighted', 'centroid', 'median', or 'ward' (default: 'ward'). | "ward" |

## Returns
| Returns     | Type   | Description                                  | Example |
|-------------|--------|----------------------------------------------|---------|
| Dendrogram  | string | Base64-encoded PNG image of the dendrogram, as a data URL. | "data:image/png;base64,iVBORw0KGgoAAA..." |
| Error       | string | Error message if the input cannot be parsed or clustering fails. | "Error: Clustering failed." |

## Linkage Methods
- **single**: Nearest point (minimum distance)
- **complete**: Farthest point (maximum distance)
- **average**: Unweighted pair group method with arithmetic mean (UPGMA)
- **weighted**: Weighted pair group method with arithmetic mean (WPGMA)
- **centroid**: Centroid linkage (UPGMC)
- **median**: Median linkage (WPGMC)
- **ward**: Ward's minimum variance method (default). Like centroid and median, it requires Euclidean distances, which this function always uses.
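The single, complete, average, and centroid rules can be written directly in terms of the member points of two clusters. The following sketch uses a hypothetical helper, `cluster_distance`, built on SciPy's `cdist`. It omits `weighted` and `median` because their distances are defined recursively from the merge history rather than from the points alone.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist


def cluster_distance(A: np.ndarray, B: np.ndarray, method: str) -> float:
    """Distance between clusters A and B (rows are points) under a point-based linkage rule."""
    D = cdist(A, B)  # Euclidean distance for every (a, b) pair
    if method == "single":  # nearest pair
        return float(D.min())
    if method == "complete":  # farthest pair
        return float(D.max())
    if method == "average":  # UPGMA: unweighted mean over all cross-cluster pairs
        return float(D.mean())
    if method == "centroid":  # UPGMC: distance between the cluster means
        return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))
    raise ValueError(f"{method!r} depends on merge history; not computable from points alone")


A = np.array([[9.6], [9.8], [10.0]])
B = np.array([[12.0], [13.0], [14.0]])
X = np.vstack([A, B])
for m in ("single", "average", "complete"):
    # For these two well-separated groups, SciPy's final merge joins exactly A and B
    print(m, cluster_distance(A, B, m), linkage(X, method=m)[-1, 2])
```

Because the two groups are far apart, the last row of each SciPy linkage matrix joins `A` and `B`. The reported height should therefore match the corresponding rule: 2.0 for single, 3.2 for average, and 4.4 for complete.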

## Limitations
- Only numeric data is supported. Any row that contains a non-numeric, empty, or non-finite (NaN or infinite) value is dropped entirely.
- For large datasets, the dendrogram may be difficult to read.
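- If an unrecognized method is provided, the function falls back to 'ward'.
- If fewer than two valid rows remain, the function returns a placeholder image reading "Not enough data" instead of an error string.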
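The following small sketch summarizes the input rules above. `clean_rows` and `resolve_method` are hypothetical helpers. Two details are assumptions rather than documented behavior: rows must all have the same width (as in a rectangular Excel range), and method names are matched case-insensitively.

```python
import math

import numpy as np

VALID_METHODS = {"single", "complete", "average", "weighted", "centroid", "median", "ward"}


def clean_rows(data: list) -> np.ndarray:
    """Keep rows whose every entry converts to a finite float; drop the rest."""
    kept = []
    for row in data:
        row = row if isinstance(row, (list, tuple)) else [row]  # flat list -> one column
        try:
            values = [float(x) for x in row]
        except (TypeError, ValueError):  # None, "", or text such as "n/a"
            continue
        if values and all(math.isfinite(v) for v in values):  # drop NaN / infinity
            kept.append(values)
    # Assumes equal-width rows; returns an empty (0, 1) array when nothing survives
    return np.array(kept, dtype=float) if kept else np.empty((0, 1))


def resolve_method(method) -> str:
    """Map an unknown or missing method name to the default 'ward'."""
    name = str(method).strip().lower() if method else "ward"
    return name if name in VALID_METHODS else "ward"


print(clean_rows([[9.6], [None], ["n/a"], [10.0], [float("nan")], [12]]))
print(resolve_method("Complete"), resolve_method("not_a_method"))
```

With this input, only the rows 9.6, 10.0 and 12 survive. Fewer than two surviving rows corresponds to the "Not enough data" case.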

## Benefits
- Visualizes hierarchical relationships in your data directly in Excel.
- Useful for exploratory data analysis, segmentation, and pattern discovery.

## Examples

### Cluster a List of Values (Default: Ward)
**Sample Input Data (Range `A1:A10`):**
| Value |
|-------|
| 9.6   |
| 9.8   |
| 10    |
| 10.4  |
| 10.8  |
| 11    |
| 11.2  |
| 12    |
| 13    |
| 14    |

**Sample Call:**
```excel
=HIERARCHICAL_CLUSTER(A1:A10)
```
**Sample Output:**
"data:image/png;base64,iVBORw0KGgoAAA..." (truncated)

### Cluster with Complete Linkage
```excel
=HIERARCHICAL_CLUSTER(A1:A10, "complete")
```
**Sample Output:**
"data:image/png;base64,iVBORw0KGgoAAA..." (truncated)

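The examples above return only an image. To inspect the same clustering as numbers, you can call SciPy's `linkage` directly, which this function uses internally. The following sketch assumes SciPy is installed. In it, `fcluster` cuts the tree into flat clusters, and `dendrogram(..., no_plot=True)` returns the left-to-right leaf order without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Same values as range A1:A10 in the examples, as an (n_samples, 1) observation matrix
values = [9.6, 9.8, 10, 10.4, 10.8, 11, 11.2, 12, 13, 14]
X = np.array(values, dtype=float).reshape(-1, 1)

results = {}
for method in ("ward", "complete"):
    Z = linkage(X, method=method)
    # Each row of Z is one merge: [cluster_i, cluster_j, height, size of new cluster]
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into at most 2 flat clusters
    leaves = dendrogram(Z, no_plot=True)["ivl"]  # leaf order along the "Sample Index" axis
    results[method] = (Z, labels, leaves)
    print(method, labels.tolist(), leaves)
```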
In [None]:
options = {"insert_only": True}  # insert the image as a value instead of a live formula

import io
import base64

import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend: render to an in-memory buffer
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

VALID_METHODS = ("single", "complete", "average", "weighted", "centroid", "median", "ward")


def _figure_to_data_url(fig) -> str:
    """Save a Matplotlib figure as PNG, close it, and return it as a base64 data URL."""
    buf = io.BytesIO()
    fig.tight_layout()
    fig.savefig(buf, format='png')
    plt.close(fig)  # release the figure explicitly
    img_b64 = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f"data:image/png;base64,{img_b64}"


def hierarchical_cluster(data: list, method: str = "ward") -> str:
    """
    Performs hierarchical clustering on numeric data and returns a dendrogram as a base64-encoded PNG image or an error message.

    Args:
        data (list[list[float]]): Numeric data for clustering (Excel range or list). Each row is one
            observation; rows containing non-numeric, empty, or non-finite values are dropped.
        method (str): Linkage method for clustering. One of 'single', 'complete', 'average', 'weighted',
            'centroid', 'median', 'ward'. Unrecognized names fall back to 'ward'. Default is 'ward'.
    Returns:
        str: Base64-encoded PNG image of the dendrogram (as a data URL), or an error message if the calculation fails.
    """
    # Normalize the input into a list of rows: a scalar becomes [[x]], a flat list becomes one column
    if isinstance(data, np.ndarray):
        data = data.tolist()
    if not isinstance(data, (list, tuple)):
        data = [[data]]
    rows = [row if isinstance(row, (list, tuple)) else [row] for row in data]

    # Keep only rows whose every entry converts to float (skips None, "", and text)
    clean = []
    for row in rows:
        try:
            clean.append([float(x) for x in row])
        except (TypeError, ValueError):
            continue
    if any(len(row) != len(clean[0]) for row in clean):
        return "Error: Invalid input data."  # ragged rows cannot form a matrix
    arr = np.array(clean, dtype=float) if clean else np.empty((0, 1))

    # Drop rows containing NaN or infinite values
    arr = arr[np.isfinite(arr).all(axis=1)]
    if arr.shape[0] < 2:
        # Fewer than two observations cannot be clustered: return a placeholder image
        fig, ax = plt.subplots(figsize=(4, 2))
        ax.text(0.5, 0.5, "Not enough data", ha='center', va='center', fontsize=12,
                transform=ax.transAxes)
        ax.axis('off')
        return _figure_to_data_url(fig)

    # Fall back to Ward for unrecognized or missing method names
    method = str(method).strip().lower() if method else "ward"
    if method not in VALID_METHODS:
        method = "ward"

    # Perform hierarchical clustering (Euclidean distance between rows)
    try:
        linkage_matrix = linkage(arr, method=method)
    except Exception:
        return "Error: Clustering failed."

    # Plot the dendrogram; the title shows the method actually used
    fig, ax = plt.subplots(figsize=(8, 4))
    dendrogram(linkage_matrix, ax=ax)
    ax.set_title(f"Hierarchical Clustering Dendrogram ({method})")
    ax.set_xlabel("Sample Index")
    ax.set_ylabel("Distance")
    return _figure_to_data_url(fig)

In [None]:
import ipytest
ipytest.autoconfig()

def test_demo_ward():
    data = [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]]
    result = hierarchical_cluster(data)
    assert isinstance(result, str)
    assert result.startswith("data:image/png;base64,")
    assert len(result) > 100

def test_demo_complete():
    data = [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]]
    result = hierarchical_cluster(data, method="complete")
    assert isinstance(result, str)
    assert result.startswith("data:image/png;base64,")
    assert len(result) > 100

def test_invalid_method():
    data = [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]]
    result = hierarchical_cluster(data, method="not_a_method")
    assert isinstance(result, str)
    assert result.startswith("data:image/png;base64,")  # unrecognized names fall back to Ward
    assert len(result) > 10

def test_not_enough_data():
    data = [[9.6]]
    result = hierarchical_cluster(data)
    assert isinstance(result, str)
    assert result.startswith("data:image/png;base64,")  # a single row yields the placeholder image
    assert len(result) > 10

ipytest.run()

In [None]:
# Interactive Demo
import gradio as gr

examples = [
    [
        [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]],
        "ward"
    ],
    [
        [[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]],
        "complete"
    ]
]

def render_hierarchical_cluster_html(data, method="ward"):
    result = hierarchical_cluster(data, method)
    if isinstance(result, str) and result.startswith("data:image/png;base64,"):
        # Return an HTML <img> tag with the data URL
        return f'<img src="{result}" alt="Dendrogram" style="max-width:100%;height:auto;" />'
    return f'<div style="color:red;">{result}</div>'

demo = gr.Interface(
    fn=render_hierarchical_cluster_html,
    inputs=[
        gr.Dataframe(headers=["Value"], label="Data", row_count=10, col_count=1, type="array", value=[[9.6], [9.8], [10], [10.4], [10.8], [11], [11.2], [12], [13], [14]]),
        gr.Dropdown(["ward", "single", "complete", "average", "weighted", "centroid", "median"], value="ward", label="Linkage Method")
    ],
    outputs=gr.HTML(label="Dendrogram Image"),
    examples=examples,
    flagging_mode="never",
    fill_width=True
)
demo.launch()