
pyDeepInsight

This package provides a Python implementation of alok-ai-lab/DeepInsight, as originally described in DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture [1]. It is not guaranteed to give the same results as the published MATLAB code and should be considered experimental.

Installation

python3 -m pip -q install git+https://github.com/alok-ai-lab/pyDeepInsight.git#egg=pyDeepInsight

Overview

DeepInsight is a methodology for transforming non-image data into an image format suitable for analysis by image-based machine learning models, such as convolutional neural networks (CNNs). pyDeepInsight provides access to these methodologies via classes.

Image Transformation

DeepInsight maps high-dimensional biological data onto a two-dimensional grid, facilitating the identification of patterns and relationships within the data through machine learning.

Feature Topology

Feature transformation can be done using any dimensionality reduction algorithm that provides 2D output. Commonly used methods include the following. The example data are 172 randomly selected genes from TCGA RNA-seq, processed into a 12x12 pixel image.

PCA

sklearn.decomposition.PCA

Left: First two PCs, with the convex hull in red and the minimum bounding box in green. Middle: Density matrix for the number of features mapped to a given pixel. Right: Generated images for a selection of samples.
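As a sketch of how an extractor instance might be supplied (X_train is a placeholder samples-by-features matrix; the 12x12 size mirrors the example above):

from sklearn.decomposition import PCA
from pyDeepInsight import ImageTransformer

# Use the first two principal components as the 2D feature embedding
it = ImageTransformer(feature_extractor=PCA(n_components=2), pixels=(12, 12))
X_img = it.fit_transform(X_train)  # X_train: placeholder (n_samples, n_features) array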

t-SNE

sklearn.manifold.TSNE

Left: t-SNE embedding vectors, with the convex hull in red and the minimum bounding box in green. Middle: Density matrix for the number of features mapped to a given pixel. Right: Generated images for a selection of samples.
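A custom t-SNE instance can be passed in the same way, which allows tuning parameters such as perplexity (the values below are illustrative, and X_train is again a placeholder):

from sklearn.manifold import TSNE
from pyDeepInsight import ImageTransformer

# Tune perplexity on the t-SNE instance rather than relying on the string default
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
it = ImageTransformer(feature_extractor=tsne, pixels=(12, 12))
X_img = it.fit_transform(X_train)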

UMAP

umap.UMAP

Left: UMAP embedding, with the convex hull in red and the minimum bounding box in green. Middle: Density matrix for the number of features mapped to a given pixel. Right: Generated images for a selection of samples.
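UMAP is not part of scikit-learn, but any instance exposing fit_transform can serve as the extractor. A sketch assuming the umap-learn package is installed:

from umap import UMAP  # provided by the umap-learn package
from pyDeepInsight import ImageTransformer

it = ImageTransformer(feature_extractor=UMAP(n_components=2, random_state=42),
                      pixels=(12, 12))
X_img = it.fit_transform(X_train)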

Pixel Mapping

To reduce feature-to-pixel mapping collisions, alternative discretization methods are available, such as the linear sum assignment algorithm.

Upper-Left: Default 'bin' discretization of features from the UMAP-embedding space to the pixel space, represented by red arrows. The number of features mapped to each pixel is shown. Upper-Middle: Red arrows indicating the pixel mapping of each feature using the 'lsa' discretization method. Upper-Right: The number of features in each pixel after 'lsa' discretization. Bottom-Left: Generated images for a selection of samples using 'bin' discretization. Bottom-Right: Generated images for the same selection of samples using 'lsa' discretization.
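A minimal sketch of selecting the discretization method at construction time (all other arguments left at their defaults):

from pyDeepInsight import ImageTransformer

# Default binning: several features may collide in a single pixel
it_bin = ImageTransformer(feature_extractor='tsne', discretization='bin')

# Linear sum assignment: features are spread so that pixels are not shared
it_lsa = ImageTransformer(feature_extractor='tsne', discretization='lsa')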

Classes

The pyDeepInsight package provides classes that aid in the transformation of non-image data into image matrices that can be used to train a CNN.

ImageTransformer Class

Transforms features to an image matrix using dimensionality reduction and discretization.

class pyDeepInsight.ImageTransformer(feature_extractor='tsne', 
discretization='bin', pixels=(224, 224))

Parameters

  • feature_extractor: {'tsne', 'pca', 'kpca'} or a class instance with a 'fit_transform' method
    The string values use the same parameters as those described in the original paper. Providing a class instance is preferred and allows for greater customization of the feature extractor, such as modifying the perplexity of t-SNE, or the use of alternative feature extractors, such as UMAP (see the sketches under Feature Topology above).
  • discretization: {'bin', 'lsa', 'ags'}, default='bin'
    Defines the method for discretizing the dimensionally reduced data to pixel coordinates.
    The default, 'bin', is the method implemented in the original paper and maps features to pixels by directly scaling the extracted features to the pixel space.
    The 'lsa' method applies SciPy's solution to the linear sum assignment problem to the exponent of the Euclidean distance between the extracted features and the pixels, assigning features to pixels with no overlap. In cases where the number of features exceeds the number of pixels, Bisecting K-Means clustering is applied to the features prior to discretization, with k equal to the number of pixels.
    In cases where 'lsa' takes too long or does not complete, the heuristic Asymmetric Greedy Search method can be applied with the 'ags' option. In cases where the number of features exceeds the number of pixels, Bisecting K-Means clustering is applied to the features prior to discretization, with k equal to one less than the number of pixels.
  • pixels: int or tuple of ints, default=(224, 224)
    The size of the image matrix. The default of 224 × 224 is used because it is the common minimum input size expected by torchvision and timm pre-trained models. A constructor sketch follows this list.
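Putting the parameters together, a constructor sketch with illustrative option values drawn from those documented above:

from pyDeepInsight import ImageTransformer

# Kernel PCA extractor selected by name, greedy-search discretization,
# and a square 64 x 64 image given as a single int
it = ImageTransformer(feature_extractor='kpca',
                      discretization='ags',
                      pixels=64)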

Methods

  • fit(X[, y=None, plot=False]): Compute the mapping of the feature space to the image space.
  • transform(X[, y=None, img_format='rgb']): Perform feature space to image space mapping.
  • fit_transform(X[, y=None]): Fit to data, then transform it.
  • pixel([pixels]): Get or set the image dimensions.
  • inverse_transform(img): Transform from the image space back to the feature space. A usage sketch follows.
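A usage sketch of the fit/transform workflow (X_train and X_test are placeholder feature matrices; the exact array layout returned by transform is an assumption, not verified here):

from pyDeepInsight import ImageTransformer

it = ImageTransformer(feature_extractor='tsne', pixels=(224, 224))

it.fit(X_train, plot=True)                    # learn the feature-to-pixel mapping
X_train_img = it.transform(X_train)           # image matrices for the training samples
X_test_img = it.transform(X_test)             # reuse the same mapping on held-out data
X_approx = it.inverse_transform(X_train_img)  # map images back toward feature space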

CAMFeatureSelector Class

Extracts important features from a trained PyTorch model using class activation mapping (CAM) as proposed in DeepFeature: feature selection in nonimage data using convolutional neural network [2].

class pyDeepInsight.CAMFeatureSelector(model, it, target_layer, cam_method="GradCAM")

Parameters

  • model: torch.nn.Module
    The CNN model should be trained on the output from an ImageTransformer instance. The torchvision.models subpackage provides many common CNN architectures.
  • it: ImageTransformer
    The ImageTransformer instance used to generate the images on which the model was trained.
  • target_layer: str or torch.nn.Module
    The target layer of the model on which the CAM is computed. It can be specified by the name given in nn.Module.named_modules or by providing a reference to the layer directly. If no layer is specified, the last non-reduced convolutional layer is selected, as determined by the 'locate_candidate_layer' method of the TorchCAM [3] package by François-Guillaume Fernandez.
  • cam_method: str, default="GradCAM"
    The name of a CAM method class provided by the pytorch_grad_cam [4] package by Jacob Gildenblat.

Methods

  • calculate_class_activations(X, y[, batch_size=1, flatten_method='mean']): Calculate a CAM for each input, then flatten the CAMs into one map per class.
  • select_class_features(cams[, threshold=0.6]): Select features for each class using the class-specific CAMs. Input feature coordinates are filtered based on the activation at the same coordinates. A usage sketch follows this list.
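A sketch of the feature-selection workflow, assuming model is a CNN trained on images produced by the ImageTransformer it, and X_img and y are the corresponding training images and labels (all placeholders); target_layer is left to the automatic selection described above:

from pyDeepInsight import CAMFeatureSelector

fs = CAMFeatureSelector(model, it, cam_method="GradCAM")

# One flattened activation map per class, averaged over that class's samples
class_cams = fs.calculate_class_activations(X_img, y, batch_size=64,
                                            flatten_method='mean')

# Keep the features whose pixel activation meets the threshold in each class CAM
class_feats = fs.select_class_features(class_cams, threshold=0.6)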

Example Jupyter Notebooks

References

[1] Sharma A, Vans E, Shigemizu D, Boroevich KA, & Tsunoda T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci Rep 9, 11399 (2019). https://doi.org/10.1038/s41598-019-47765-6

[2] Sharma A, Lysenko A, Boroevich KA, Vans E, & Tsunoda T. DeepFeature: feature selection in nonimage data using convolutional neural network, Briefings in Bioinformatics, Volume 22, Issue 6, November 2021, bbab297. https://doi.org/10.1093/bib/bbab297

[3] François-Guillaume Fernandez. (2020). TorchCAM: class activation explorer. https://github.com/frgfm/torch-cam

[4] Jacob Gildenblat, & contributors. (2021). PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam

Citation

@ARTICLE{Sharma2019-rs,
  title     = "{DeepInsight}: A methodology to transform a non-image data to an
               image for convolution neural network architecture",
  author    = "Sharma, Alok and Vans, Edwin and Shigemizu, Daichi and
               Boroevich, Keith A and Tsunoda, Tatsuhiko",
  journal   = "Sci. Rep.",
  volume    =  9,
  number    =  1,
  pages     = "11399",
  year      =  2019,
}
