Reason or Problem
A common issue with multi-dimensional raster image processing (at the extremes, hyperspectral imagery with hundreds of features) is significant redundancy within the feature space. Some datasets have tens or hundreds of bands when only a handful might be necessary for downstream use (e.g., classification, segmentation, clustering).
Proposal
This module takes high-dimensional data and either a desired number of output channels or a threshold, compares the distributions of the features within the data, and returns the most dissimilar grouping.
Design:
1. Given a dataset containing pixels and features, produce a pairwise-distance matrix over the features. Each entry is the distance between two features and can be computed using various metrics (e.g., Jensen-Shannon divergence, a symmetric Kullback-Leibler divergence, Mahalanobis distance as in "Add Mahalanobis Distance Metric" #114, or Euclidean distance) evaluated over the distribution of pixels within the dataset.
2. Select the most similar pair of features (or spectra) by finding the minimum of the matrix (for a distance/divergence measure) or the maximum (for a similarity measure, e.g., mutual information or cosine similarity), and merge the pair with a specified aggregation (e.g., mean, median, max, min).
3. Update the matrix based on step 2 and repeat until the stopping criterion is met (the desired number of output channels or the threshold). Return the dataset with reduced dimensionality.
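The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the proposed implementation: it uses NumPy alone, compares bands by the Jensen-Shannon divergence of their histograms over shared bin edges, and recomputes the whole distance matrix after each merge instead of updating it incrementally. The names `js_divergence`, `pairwise_distances`, `reduce_bands`, `bins`, and `aggregate` are hypothetical and do not refer to any existing API.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) of two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only bins where a > 0 contribute; b (the mixture) is > 0 there too.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_distances(data, bins=32):
    """Step 1: distance matrix between the columns (features) of `data`."""
    # Shared bin edges so that histograms of different bands are comparable.
    edges = np.linspace(data.min(), data.max(), bins + 1)
    hists = [np.histogram(data[:, k], bins=edges)[0].astype(float)
             for k in range(data.shape[1])]
    n = len(hists)
    # Infinity on the diagonal so a band is never paired with itself.
    dist = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = js_divergence(hists[i], hists[j])
    return dist


def reduce_bands(data, n_out=None, threshold=None, bins=32, aggregate=np.mean):
    """Steps 2 and 3: greedily merge the closest pair of bands.

    `data` has shape (n_pixels, n_features). Merging stops when `n_out`
    bands remain or when the closest pair is farther apart than `threshold`.
    """
    if n_out is None and threshold is None:
        raise ValueError("give n_out, threshold, or both")
    data = np.asarray(data, dtype=float)
    min_bands = max(n_out or 1, 1)
    while data.shape[1] > min_bands:
        dist = pairwise_distances(data, bins)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if threshold is not None and dist[i, j] > threshold:
            break
        merged = aggregate(data[:, [i, j]], axis=1)
        keep = [k for k in range(data.shape[1]) if k not in (i, j)]
        # The merged band is appended as the last column.
        data = np.column_stack([data[:, keep], merged])
    return data
```

For a raster stored as (bands, rows, cols), the pixels would first need to be flattened to (n_pixels, n_features), for example with `cube.reshape(cube.shape[0], -1).T`. Comparing histograms only looks at each band's marginal distribution, so a metric that uses per-pixel co-variation (such as the Mahalanobis or Euclidean options listed above) may group bands differently.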
Usage: reduce the dimensionality of an input by finding correlated features within it and removing the redundancy.
Value: supports high-dimensional raster processing applications (e.g., data fusion, hyperspectral and multispectral imagery).
Additional Notes or Context
Some distance metrics already available to build from:
Author of Proposal: Dylan Stewart