
HIsomap

This is a Python implementation of a homology-preserving dimensionality reduction (DR) algorithm. The implementation is described in "Homology-Preserving Dimensionality Reduction via Manifold Landmarking and Tearing".

A demo of both homology-preserving manifold landmarking and tearing is also included.

Installation

Tested with Python 2.7 and 3.7 on macOS and Linux.

Dependencies

HIsomap requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy
  • scikit-learn (sklearn)

Running examples requires:

  • matplotlib

Installation

Python2

$ git clone https://github.com/LynneYan/HIsomap.git
$ cd HIsomap
$ sudo python setup.py install

Python3

$ git clone https://github.com/LynneYan/HIsomap.git
$ cd HIsomap
$ sudo python3 setup.py install

Checking your HIsomap Installation

If all the above steps succeeded, "HIsomap X.XX" will appear in the output of pip list:

Python2

$ pip list

Python3

$ pip3 list

Run example

Python2

$ python example.py

Python3

$ python3 example.py

Features

class HIsomap(n_components=2, filter_function="base_point_geodesic_distance", BP='EP', nr_cubes=20, 
              overlap_perc=0.2, auto_tuning="off", n_neighbors=8, eigen_solver='auto', n_jobs=1, 
              clusterer=sklearn.cluster.DBSCAN(eps=0.6, min_samples=5))

Parameters

  • n_components, int, optional, default: 2

    • Number of dimensions in which to immerse the dissimilarities.
  • filter_function, string, optional, default: "base_point_geodesic_distance"

    • A string from ["sum", "mean", "median", "max", "min", "std", "dist_mean", "l2norm", "knn_distance_n", "height", "width", "base_point_geodesic_distance", "eccentricity", "Guass_density", "density_estimator", "integral_geodesic_distance", "graph_Laplacian", "Guass_density_auto"].
    • If using knn_distance_n, replace n with the desired number of neighbors: knn_distance_5 sums the distances to the 5 nearest neighbors.
    • If using base_point_geodesic_distance, the parameter BP controls how the base point is located.
  • BP, string, optional, default: "EP"

    • A string from ["EP", "BC", "DR"].
    • EP means extremal point, BC means barycenter, and DR means densest region.
  • nr_cubes, int, optional, default: 20

    • The number of intervals/hypercubes to create.
  • overlap_perc, float, optional, default: 0.2

    • The percentage of overlap between the intervals/hypercubes.
  • auto_tuning, string, optional, default: "off"

    • A string from ["off", "on"].
    • If "off", the input data is divided into nr_cubes intervals of fixed, equal length.
    • If "on", the input data is divided into nr_cubes intervals that each contain roughly the same number of points. The interval lengths then vary, so dense regions are covered by more, shorter intervals.
  • n_neighbors, int, optional, default: 8

    • Number of neighbors to consider for each point in Isomap.
  • eigen_solver, string, optional, default: "auto"

    • A string from ["auto", "arpack", "dense"].
    • auto: Attempt to choose the most efficient solver for the given problem.
    • arpack: Use Arnoldi decomposition to find the eigenvalues and eigenvectors.
    • dense: Use a direct solver (i.e. LAPACK) for the eigenvalue decomposition.
  • n_jobs, int or None, optional, default: 1

    • The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
  • clusterer, object, optional, default: sklearn.cluster.DBSCAN(eps=0.6, min_samples=5)

    • Scikit-learn API compatible clustering algorithm. Must provide fit and predict.
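
As a rough illustration of how nr_cubes, overlap_perc, and auto_tuning interact, the sketch below builds a cover of overlapping 1-D intervals over a filter function's values. This is a hedged sketch of the idea only, not the library's internals; the cover_intervals helper is hypothetical.

```python
import numpy as np

def cover_intervals(lens, nr_cubes=20, overlap_perc=0.2, auto_tuning="off"):
    """Hypothetical sketch: split 1-D filter values `lens` into
    nr_cubes overlapping intervals."""
    if auto_tuning == "off":
        # Fixed, equal-length intervals over the filter range.
        edges = np.linspace(lens.min(), lens.max(), nr_cubes + 1)
    else:
        # Quantile-based intervals: each holds roughly the same number
        # of points, so dense regions get shorter intervals.
        edges = np.quantile(lens, np.linspace(0.0, 1.0, nr_cubes + 1))
    lo, hi = edges[:-1], edges[1:]
    pad = (hi - lo) * overlap_perc / 2.0  # widen each interval symmetrically
    return np.column_stack([lo - pad, hi + pad])

lens = np.random.default_rng(0).normal(size=1000)
fixed = cover_intervals(lens, auto_tuning="off")     # (nr_cubes, 2) interval bounds
adaptive = cover_intervals(lens, auto_tuning="on")   # varying-length intervals
```

With auto_tuning="on", the intervals near the mode of the data become shorter, matching the behavior described above.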
__init__(self, n_components=2, filter_function="base_point_geodesic_distance", BP='EP', nr_cubes=20, 
         overlap_perc=0.2, auto_tuning="off", n_neighbors=8, eigen_solver='auto', n_jobs=1, 
         clusterer=sklearn.cluster.DBSCAN(eps=0.6, min_samples=5))

Initialize self.

fit_transform(self, X, y=None, init=None)

Fits the model to X and returns the embedded coordinates.

Parameters

  • X, array, shape (n_samples, n_features).

    • Input data.
  • y, Ignored

  • init, ndarray, shape (n_samples, n_components), optional, default: None

    • Starting configuration of the embedding to initialize the SMACOF algorithm. By default, the algorithm is initialized with a randomly chosen array.

Returns

  • Y, array, shape (n_samples, n_components)
    • Projected output.
get_landmark_index(self)

Returns

  • landmarks_indexes, int list
    • The indices of the landmarks in the input data.
get_skeleton_nodes(self)

Returns

  • landmarks, ndarray, shape (n_landmarks, n_features).
    • Nodes of mapper graph in original domain.
get_skeleton_links(self)

Returns

  • skeleton, ndarray, shape (n_links, 2).
    • Edges of mapper graph.
get_scalar_value(self)

Returns

  • lens, ndarray, shape (n_samples,)
    • Scalar filter values of the input data: a lower-dimensional representation of the data.
get_base_point(self)

Returns

  • basePoint, ndarray, shape (n_features,).
    • Base point in original domain.
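
The accessor outputs compose naturally: each row of get_skeleton_links() indexes two rows of get_skeleton_nodes(). A minimal sketch, using hypothetical stand-in arrays in place of a fitted model:

```python
import numpy as np

# Hypothetical stand-ins for proj.get_skeleton_nodes() and
# proj.get_skeleton_links() -- not output from a fitted model.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # (n_landmarks, n_features)
links = np.array([[0, 1], [1, 2]])                       # (n_links, 2) index pairs

# Each link picks out two node rows; e.g. the total edge length
# of the mapper skeleton in the original domain:
edge_vecs = nodes[links[:, 0]] - nodes[links[:, 1]]
total_length = np.linalg.norm(edge_vecs, axis=1).sum()  # 1.0 + 1.0 = 2.0
```

The same indexing pattern is what a plotting routine would use to draw the skeleton on top of the embedding.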

Usage

Python2 code

# Import dependencies
import numpy as np

# Import the class
from HIsomap import HIsomap

# Sample data "Swiss hole"
file_name = './data/SwissHole.txt'
X = np.loadtxt(file_name)

# Initialize. In this example, the number of cubes is 25 and auto_tuning is enabled; other parameters keep their defaults.
proj = HIsomap(nr_cubes=25, auto_tuning="on")

# Fit and transform the data. Y is the projected result in 2-dimensional space.
Y = proj.fit_transform(X)

# You can also get the 'mapper graph' with nodes and edges.
proj.get_skeleton_nodes()
proj.get_skeleton_links()

Python3 code

# Import dependencies
import numpy as np
import sklearn

# Import the class
from HIsomap import HIsomap

# Sample data "octa"
file_name = './data/octa.txt'
X = np.loadtxt(file_name)

# Initialize with auto_tuning turned off, using the parameters from our paper.
proj = HIsomap(nr_cubes=20, overlap_perc=0.2, clusterer=sklearn.cluster.DBSCAN(eps=150, min_samples=5), filter_function="base_point_geodesic_distance", BP="BC", auto_tuning="off")

# Fit and transform the data. Y is the projected result in 2-dimensional space.
Y = proj.fit_transform(X)

# You can also get the 'mapper graph' with nodes and edges.
proj.get_skeleton_nodes()
proj.get_skeleton_links()

Citation

License

Standard MIT disclaimer applies, see LICENSE for full text.
