Theory

Diffusion maps is a dimension reduction technique that can be used to discover low-dimensional structure in high-dimensional data. It assumes that the data points, given as points in a high-dimensional metric space, actually lie on a lower-dimensional structure. To uncover this structure, diffusion maps builds a neighborhood graph on the data from the distances between nearby points, and then constructs a graph Laplacian L on that graph. Many variants exist, each approximating a different differential operator. For example, standard diffusion maps approximates the differential operator

\mathcal{L}f = \Delta f - 2(1-\alpha)\nabla f \cdot \frac{\nabla q}{q}

where \Delta is the Laplace-Beltrami operator, \nabla is the gradient operator and q is the sampling density. The normalization parameter \alpha, which is typically between 0.0 and 1.0, determines how much q is allowed to bias the operator \mathcal{L}: with \alpha = 1 the drift term vanishes and \mathcal{L} reduces to \Delta. Standard diffusion maps on a dataset X, which has to be given as a numpy array with different rows corresponding to different observations, is implemented in pydiffmap as:

from pydiffmap import diffusion_map
mydmap = diffusion_map.DiffusionMap.from_sklearn(epsilon=my_epsilon, alpha=my_alpha)
mydmap.fit(X)

Here epsilon is a scale parameter used to rescale distances between data points. It can also be chosen automatically, using an algorithm by Berry, Harlim and Giannakis:

mydmap = diffusion_map.DiffusionMap.from_sklearn(alpha=my_alpha, epsilon='bgh')
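As a rough illustration of what such an automatic choice can look like, the sketch below scans a grid of candidate values and keeps the one where the kernel sum S(epsilon) grows fastest on a log-log scale. The function name choose_epsilon_bgh, the grid 2**k and the 4*epsilon kernel scaling are assumptions made for this sketch; it is not pydiffmap's own implementation.

```python
import math


def choose_epsilon_bgh(points, exponents=range(-20, 21)):
    """Pick epsilon on the grid 2**k where log S(eps) grows fastest in log eps."""
    # Squared Euclidean distances between all ordered pairs of points.
    sq_dists = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p in points
        for q in points
    ]
    epsilons = [2.0 ** k for k in exponents]
    # S(eps) = sum_ij exp(-d_ij^2 / (4 eps)); the 4*eps scaling is an assumption.
    log_s = [
        math.log(sum(math.exp(-d2 / (4.0 * eps)) for d2 in sq_dists))
        for eps in epsilons
    ]
    # Finite-difference slope of log S against log eps between grid neighbours.
    slopes = [
        (log_s[i + 1] - log_s[i])
        / (math.log(epsilons[i + 1]) - math.log(epsilons[i]))
        for i in range(len(epsilons) - 1)
    ]
    best = max(range(len(slopes)), key=slopes.__getitem__)
    # Return the upper end of the steepest interval; twice the slope is a
    # rough estimate of the intrinsic dimension of the data.
    return epsilons[best + 1], 2.0 * slopes[best]


# Toy data: 30 evenly spaced points on a line segment embedded in 3D.
X = [[0.1 * i, 0.0, 0.0] for i in range(30)]
epsilon, dim_estimate = choose_epsilon_bgh(X)
```

The double loop over all pairs is quadratic in the number of points, so this form is only meant for small toy datasets.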

For additional optional arguments of the DiffusionMap class, see usage and documentation.
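To make the roles of epsilon and alpha concrete, the following is a minimal plain-Python sketch of one common way to build the alpha-normalized graph Laplacian from a dense Gaussian kernel. The function name diffusion_laplacian, the dense kernel (no nearest-neighbor truncation) and the 4*epsilon scaling are assumptions for illustration and need not match what pydiffmap does internally.

```python
import math


def diffusion_laplacian(points, epsilon, alpha):
    """Return the Markov matrix P and the graph Laplacian L = (P - I) / epsilon."""
    n = len(points)
    # Gaussian kernel on squared distances; the 4*epsilon scaling is an assumption.
    K = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (4.0 * epsilon))
            for q in points
        ]
        for p in points
    ]
    # Kernel density estimate q_i (up to a constant): the row sums of K.
    q = [sum(row) for row in K]
    # Divide out q**alpha on both sides; alpha = 1 removes the density bias.
    K_alpha = [
        [K[i][j] / (q[i] ** alpha * q[j] ** alpha) for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize to a Markov matrix, then subtract the identity and rescale.
    row_sums = [sum(row) for row in K_alpha]
    P = [[K_alpha[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
    L = [
        [(P[i][j] - (1.0 if i == j else 0.0)) / epsilon for j in range(n)]
        for i in range(n)
    ]
    return P, L


# Four points in the plane, one row per observation.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [1.0, 1.0]]
P, L = diffusion_laplacian(X, epsilon=0.5, alpha=0.5)
```

Each row of P sums to one, so each row of L sums to zero, which is the discrete counterpart of a differential operator that annihilates constant functions.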

A variant of diffusion maps, 'TMDmap', removes the bias due to the sampling density q and approximates the differential operator

\mathcal{L}f = \Delta f + \nabla (\log\pi) \cdot \nabla f

where \pi is a 'target distribution' that defines the drift term and has to be known up to a normalization constant. TMDmap is implemented in pydiffmap as:

mydmap = diffusion_map.TMDmap(epsilon=my_epsilon, alpha=1.0, change_of_measure=com_fxn)
mydmap.fit(X)

where com_fxn is a function that takes in a coordinate and outputs the value of the target distribution \pi.
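As an example of such a function, the sketch below evaluates an unnormalized Boltzmann density \pi(x) = exp(-beta V(x)) for a hypothetical double-well potential V. The potential, the beta parameter and the assumption that the function receives one coordinate at a time are illustrative choices; the pydiffmap documentation should be consulted for the exact calling convention of change_of_measure.

```python
import math


def com_fxn(x, beta=1.0):
    """Unnormalized target density pi(x) = exp(-beta * V(x)) at one coordinate x."""
    # Hypothetical potential: double well in the first coordinate,
    # harmonic in any remaining coordinates.
    potential = (x[0] ** 2 - 1.0) ** 2 + sum(xi ** 2 for xi in x[1:])
    return math.exp(-beta * potential)


# The two wells at x[0] = -1 and x[0] = +1 have equal density under pi,
# and both have higher density than the barrier at the origin.
left = com_fxn([-1.0, 0.0])
right = com_fxn([1.0, 0.0])
barrier = com_fxn([0.0, 0.0])
```

Because \pi only needs to be known up to a normalization constant, the function does not have to integrate to one.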