Proposal: Hierarchical Dimensionality Reduction module 

**Author of Proposal: Dylan Stewart**

## Reason or Problem
A common issue with multi-dimensional raster image processing (at the extremes, hyperspectral imagery with hundreds of features) is significant redundancy within the feature space. Some datasets have tens or hundreds of bands when only a handful might be necessary for downstream use (e.g., classification, segmentation, clustering).

## Proposal
This module takes high-dimensional data and either a desired number of output channels or a threshold, compares the distributions of the features within the data, and returns a grouping of the features in which the resulting groups are as dissimilar as possible.

**Design:**
1. Given a dataset containing $N$ pixels and $F$ features, produce an $F \times F$ pairwise-distance matrix
$$C \in \mathbb{R}^{F \times F}, \qquad C_{ij} = d(f_i, f_j),$$
where $f_i$ denotes the distribution of pixel values in feature $i$ and $d$ can be any of several metrics (e.g., *Jensen-Shannon* divergence, a *symmetric* [Kullback-Leibler](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html) divergence, Mahalanobis distance #114, Euclidean distance).
2. Select the most similar pair of features (or spectra): the pair with the minimum entry of $C$ for a distance/divergence measure, or the maximum entry for a similarity measure (e.g., mutual information or cosine similarity). Merge the pair using a specified aggregation (e.g., mean, median, max, min).
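3. Update $C$ with the distances between the merged feature and the remaining features, then repeat step 2 until the stopping criterion (the desired number of output channels or the threshold) is met. Return the dataset with reduced dimensionality.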
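
A minimal Python sketch of steps 1 to 3, assuming NumPy, an `(N, F)` array with no NaNs, per-feature histograms over a shared value range, and Jensen-Shannon divergence as the metric $d$. The function `hierarchical_reduce` and its parameters (`n_out`, `threshold`, `bins`, `agg`) are hypothetical names chosen for illustration, not an existing API.

```python
import numpy as np


def _histograms(data, bins, value_range):
    # One normalized histogram per column (feature) of `data`, shape (N, F).
    h = np.stack([np.histogram(data[:, j], bins=bins, range=value_range)[0]
                  for j in range(data.shape[1])]).astype(float)
    return h / h.sum(axis=1, keepdims=True)


def _js_divergence(p, q):
    # Jensen-Shannon divergence (natural log) between two discrete distributions.
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # wherever a > 0, b = m > 0 as well
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return max(0.5 * (kl(p, m) + kl(q, m)), 0.0)


def hierarchical_reduce(data, n_out=None, threshold=None, bins=64, agg=np.mean):
    """Greedy agglomerative feature merging (hypothetical API).

    Stops when `n_out` features remain, or when the closest pair's divergence
    exceeds `threshold`. Returns (reduced (N, K) array, list of index groups).
    """
    if n_out is None and threshold is None:
        raise ValueError("provide n_out and/or threshold")
    data = np.asarray(data, dtype=float)
    value_range = (data.min(), data.max())
    n_feat = data.shape[1]
    groups = [[j] for j in range(n_feat)]
    feats = [data[:, j] for j in range(n_feat)]
    hists = list(_histograms(data, bins, value_range))

    # Step 1: pairwise divergence matrix C; the diagonal is inf so it is never picked.
    C = np.full((n_feat, n_feat), np.inf)
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            C[i, j] = C[j, i] = _js_divergence(hists[i], hists[j])

    while len(groups) > max(n_out or 1, 1):
        # Step 2: the most similar pair has the smallest divergence.
        i, j = sorted(int(x) for x in np.unravel_index(np.argmin(C), C.shape))
        if threshold is not None and C[i, j] > threshold:
            break
        groups[i] = groups[i] + groups[j]
        # Aggregate the original member bands into the merged feature.
        feats[i] = agg(data[:, groups[i]], axis=1)
        hists[i] = _histograms(feats[i][:, None], bins, value_range)[0]
        # Step 3: drop row/column j (j > i, so index i stays valid), refresh row/column i.
        for lst in (groups, feats, hists):
            del lst[j]
        C = np.delete(np.delete(C, j, axis=0), j, axis=1)
        for k in range(len(groups)):
            if k != i:
                C[i, k] = C[k, i] = _js_divergence(hists[i], hists[k])

    return np.column_stack(feats), groups
```

In this sketch a merged feature is recomputed from all of its original member bands rather than from previously merged features, so `agg=np.mean` gives an unweighted average of every band in the group; `np.median`, `np.max`, or `np.min` can be passed in the same way.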

**Usage: reducing the dimensionality of an input by finding correlated features within it and removing redundancy.**

**Value: supports high-dimensional raster processing applications (e.g., data fusion, hyperspectral, multispectral).**

## Additional Notes or Context
Some distance metrics already available to build from:
- [cupy KL divergence function](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.special.kl_div.html)
- [scipy KL divergence](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html)
- Other distance/similarity metrics are easy to implement (Euclidean and cosine)
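
As a sketch of how these pieces could be assembled, assuming NumPy and SciPy: a symmetric KL divergence built on `scipy.special.kl_div`, Euclidean distance and cosine similarity, and a helper that fills an $F \times F$ matrix. The helper names and the `eps` smoothing constant are illustrative assumptions, not part of any existing module.

```python
import numpy as np
from scipy.special import kl_div


def symmetric_kl(p, q, eps=1e-10):
    """KL(p||q) + KL(q||p) between two histograms (hypothetical helper).

    `eps` is an assumed smoothing constant that avoids infinities when a
    bin is empty in one histogram but not in the other.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    # kl_div(x, y) = x*log(x/y) - x + y elementwise; the -x + y terms
    # cancel when summed over two normalized distributions.
    return float(np.sum(kl_div(p, q)) + np.sum(kl_div(q, p)))


def euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))


def cosine_similarity(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pairwise_matrix(features, metric):
    """F x F matrix with C[i, j] = metric(features[i], features[j]).

    Assumes `metric` is symmetric, so only the upper triangle is evaluated.
    """
    n = len(features)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            C[i, j] = C[j, i] = metric(features[i], features[j])
    return C
```

For a distance or divergence (`symmetric_kl`, `euclidean`) the most similar pair is the smallest off-diagonal entry; for `cosine_similarity` it is the largest.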

Proposal: Hierarchical Dimensionality Reduction module #729

Reason or Problem

Proposal

Additional Notes or Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: Hierarchical Dimensionality Reduction module #729

Description

Reason or Problem

Proposal

Additional Notes or Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions