
rw normalization non-negotiable for DiffusionMap? #33

Open
SimonEnsemble opened this issue Feb 10, 2023 · 6 comments

@SimonEnsemble (Contributor)

For the diffusion map, the default is α::Real=0.0, so this code here that normalizes the kernel matrix is never called.

I'm pretty sure the normalize!(L, α=α, norm=:rw) line is non-negotiable. Otherwise, the rows of the kernel matrix will not sum to one, which breaks the theory behind the diffusion map: this normalization is what turns the matrix into a Markov transition matrix. See Sec. 2.1 here. Agree? Disagree?
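For concreteness, here is a minimal sketch of what the random-walk step does, written in plain LinearAlgebra (a toy illustration, not the package's normalize!):

# minimal sketch of the random-walk normalization: M = D⁻¹*K, so that each row
# of M becomes a probability distribution over the data points
using LinearAlgebra

K = [1.0 0.5 0.2;
     0.5 1.0 0.4;
     0.2 0.4 1.0]                    # toy symmetric kernel (affinity) matrix

D = Diagonal(vec(sum(K, dims=2)))    # degree matrix
M = D \ K                            # M = D⁻¹*K, a Markov transition matrix

@assert all(sum(M, dims=2) .≈ 1.0)   # each row now sums to one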

@SimonEnsemble (Contributor, Author)

I don't think the normalization below is correct, anyway.

        normalize!(L, α=α, norm=:sym)     # Lᵅ = D⁻ᵅ*L*D⁻ᵅ
        normalize!(L, α=α, norm=:rw)      # M =(Dᵅ)⁻¹*Lᵅ

See Wikipedia.
normalize!(L, α=α, norm=:sym) is correct, but then it should be followed by normalize!(L, α=1, norm=:rw). I think this and the above issue can both be solved by putting normalize!(L, α=1, norm=:rw) after the if α > 0 (...) end block.

The normalize!(L, α=α, norm=:sym) line takes the kernel matrix and transforms it into Lᵅ = D⁻ᵅ*L*D⁻ᵅ; yes, that part is correct. But then we should make sure the rows sum to one by doing normalize!(L, α=1, norm=:rw). On Wikipedia, D^(α) denotes the degree matrix of Lᵅ, not D raised to the power α, so this line should use α=1 for the :rw normalization. Agree? Disagree?
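To spell out what I mean, here is a sketch of the two-step normalization (following the Wikipedia diffusion-map article) in plain LinearAlgebra; this is an illustration, not the package's normalize! internals:

# sketch of the two-step normalization; not ManifoldLearning.jl internals
using LinearAlgebra

function diffusion_operator(K::AbstractMatrix, α::Real)
    # step 1 (the :sym step): Lᵅ = D⁻ᵅ * K * D⁻ᵅ, with D the degree matrix of K
    d = vec(sum(K, dims=2))
    Lᵅ = Diagonal(d .^ (-α)) * K * Diagonal(d .^ (-α))

    # step 2 (the :rw step, with α = 1): M = D̃⁻¹ * Lᵅ, where D̃ is the degree
    # matrix of Lᵅ itself (a fresh degree matrix, not D raised to the power α)
    d̃ = vec(sum(Lᵅ, dims=2))
    return Diagonal(d̃) \ Lᵅ    # rows of M sum to one
end

With α = 0, step 1 is a no-op and this reduces to the plain random-walk normalization D⁻¹*K.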

@SimonEnsemble (Contributor, Author)

E.g. on ManifoldLearning v0.9.0, I am working with the simple toy swiss roll in 2D, and it cannot recover even that with the many different Gaussian kernel bandwidths I'm trying.

[screenshot: the resulting swiss roll embeddings]

@SimonEnsemble (Contributor, Author)

Another bug I found:

decompose assumes that L is symmetric, but it is not necessarily. This introduces another error.

Here is my code, which does not assume L is symmetric:


# source: https://www.crcv.ucf.edu/gauss/info/project/ccs/diffusion-maps-finite.pdf
using LinearAlgebra
using StatsBase: pairwise    # generic pairwise(f, x; symmetric) over collections

function cory_diffusion_map(X::Matrix{Float64}, kernel::Function, d::Int; t::Int=1)
    # compute the kernel (affinity) matrix between the columns (data points) of X
    K = pairwise(kernel, eachcol(X), symmetric=true)

    # normalize, to be a (right-stochastic) Markov chain transition matrix
    D = diagm(vec(sum(K, dims=1)))
    D⁻¹ = diagm(1 ./ diag(D))
    L = D⁻¹ * K
    @assert all(sum.(eachrow(L)) .≈ 1.0)

    # eigen-decomposition of the (generally non-symmetric) transition matrix
    decomp = eigen(L)
    # skip the first (trivial, λ = 1) eigenvalue; sort the rest highest to lowest
    idx = sortperm(decomp.values, rev=true)[2:end]
    λs = decomp.values[idx][1:d]
    Vs = decomp.vectors[:, idx][:, 1:d] * diagm(λs .^ t)
    return Vs
end
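For what it's worth, here is how I call it; the Gaussian kernel form and the bandwidth ε below are placeholder assumptions, not tuned values:

# hypothetical usage of cory_diffusion_map; the kernel and ε are assumptions
using LinearAlgebra

ε = 0.5                                              # assumed bandwidth
gauss_kernel(xᵢ, xⱼ) = exp(-norm(xᵢ - xⱼ)^2 / ε)     # Gaussian (heat) kernel

X = rand(2, 250)                                     # placeholder data: 250 points in 2D, one per column
embedding = cory_diffusion_map(X, gauss_kernel, 2)   # 250 × 2 matrix of diffusion coordinates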

@SimonEnsemble (Contributor, Author)

I now see that the current setup of fit(DiffMap, ...) would only pass a symmetric matrix.

Anyway, would a PR be welcome with these three changes:

  1. Make sure L is a right-stochastic matrix:
    # normalize it
    if α > 0
        normalize!(L, α=α, norm=:sym)     # L̃ = D⁻ᵅ*L*D⁻ᵅ
    end
    # random walk
    normalize!(L, norm=:rw)      # M = (D̃)⁻¹ * L̃
  2. Change the decompose call to use skipfirst=true. We need to skip the first, constant eigenvector; I'm pretty sure the current setup of skipfirst=false is incorrect.
  3. Change the very dangerous decompose function, which converts any matrix into a symmetric one (even if it is not) for its eigendecomposition, so that it handles the non-symmetric L (a sketch of one possible approach is below this list).
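On the third point, one standard way to keep a symmetric eigensolver is to use the fact that M = D̃⁻¹*L̃ is similar to the symmetric matrix S = D̃^(-1/2)*L̃*D̃^(-1/2). A sketch under the assumption that L̃ has positive row sums, not a finished implementation (the helper name rw_eigen is just for illustration):

# sketch: eigendecompose the non-symmetric M = D̃⁻¹*L̃ via its symmetric conjugate
using LinearAlgebra

function rw_eigen(L̃::AbstractMatrix)
    d̃ = vec(sum(L̃, dims=2))              # degrees of L̃ (assumed positive)
    invsqrtD = Diagonal(1 ./ sqrt.(d̃))    # D̃^(-1/2)
    S = Symmetric(invsqrtD * L̃ * invsqrtD)
    λ, V = eigen(S)                        # real eigenvalues, orthonormal eigenvectors of S
    return λ, invsqrtD * V                 # same eigenvalues; right eigenvectors of M
end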

I'm reasonably confident the current set-up is broken, since it cannot even handle the 2D S-shaped curve; my diffusion map code recovers the S-shaped curve beautifully.

thanks.

@iabraham

Just ran into the same issue and came to check if someone noticed the skipfirst=false in the source. Good catches @SimonEnsemble

@SimonEnsemble (Contributor, Author)

Here is our implementation of diffusion maps, which we're using for a project:
https://github.com/SimonEnsemble/DiffusionMap.jl
