In [7]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


##Spectral Clustering

Without getting too deep into the details: to split a graph into two clusters, we use the eigenvector that belongs to the second-smallest eigenvalue of the graph's Laplacian.

As we saw in the Laplacian and its eigenvalues notebook, the smallest eigenvalue of the Laplacian is always zero. Its eigenvector is proportional to the constant vector (all entries equal).

The eigenvector of the second-smallest eigenvalue roughly splits the vertices of the graph into two groups: the vertices whose entries in the eigenvector are positive and the vertices whose entries are negative. This makes it easy to form two clusters.
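As a minimal sketch of this sign-based split, the snippet below uses a library eigensolver on the Laplacian of a small graph. The graph (two triangles joined by a single edge) and the variable names are illustrative choices, and the code imports NumPy explicitly as `np` so that it runs outside the `%pylab` namespace.

```python
import numpy as np

# Laplacian of an illustrative 6-node graph: triangles {0,1,2} and {3,4,5}
# joined by the single edge 2-3 (graph chosen here for illustration).
L = np.array([
    [ 2, -1, -1,  0,  0,  0],
    [-1,  2, -1,  0,  0,  0],
    [-1, -1,  3, -1,  0,  0],
    [ 0,  0, -1,  3, -1, -1],
    [ 0,  0,  0, -1,  2, -1],
    [ 0,  0,  0, -1, -1,  2],
], dtype=float)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
v2 = eigenvectors[:, 1]   # eigenvector of the second-smallest eigenvalue

# The sign of each entry assigns the vertex to one of the two clusters.
# Which cluster gets True is arbitrary, because an eigenvector's overall sign is arbitrary.
labels = v2 > 0
print(eigenvalues[:2])
print(labels)
```

The three vertices of each triangle should end up with the same label, and the two triangles with opposite labels.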

Since we are only looking for one eigenvalue (the second-smallest one) and its corresponding eigenvector, the QR method presented before is overkill. In this notebook, we will examine how to find single eigenvectors and their eigenvalues.

This can be accomplished with various versions of what has been called the "power method". As before, because I drew on several sources, I present several algorithms, each suited to different situations.

###Hoffman's Power Method


1. Assume a trial value $\mathbf{x}^{(0)}$. Choose one component to be unity. Designate that the unity component.

2. Perform the matrix multiplication: $\mathbf{A}\mathbf{x}^{(0)} = \mathbf{y}^{(1)}$

3. Scale $\mathbf{y}^{(1)}$ so that the unity component remains unity:

$$\mathbf{y}^{(1)} = \lambda^{(1)} \mathbf{x}^{(1)}$$

4. Repeat: 

$$\mathbf{A}\mathbf{x}^{(k)} = \mathbf{y}^{(k+1)} = \lambda^{(k+1)} \mathbf{x}^{(k+1)}$$


In [17]:
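def hoffman_power(A):
    """Power method: return (dominant eigenvalue, eigenvector, last change in the estimate)."""
    n = A.shape[0]
    x = ones(n)  # note: this starting vector fails on the Laplacian, see below
    lmda = 0
    for i in range(30):  # fixed number of iterations, no convergence test
        last = lmda
        y = dot(A, x)
        lmda = y[0]  # component 0 is the unity component
        x = y/lmda   # rescale so that x[0] == 1
    error = last - lmda
    return (lmda, x, abs(error))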
    

A = array([[8,-2,-2],[-2,4,-2],[-2,-2,13]])
hoffman_power(A)

(13.870603889183611,
 array([ 1.        ,  0.49178084, -3.42707923]),
 1.1429145457597656e-05)
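The fixed iteration count above can be replaced by a stopping tolerance. The sketch below is a variation rather than Hoffman's formulation: it normalizes by the Euclidean norm instead of keeping a unity component, and it estimates the eigenvalue with the Rayleigh quotient $\mathbf{x}^T\mathbf{A}\mathbf{x}$. The function name `power_iteration` and the parameters `tol` and `max_iter` are hypothetical names chosen for this example.

```python
import numpy as np

def power_iteration(A, tol=1e-10, max_iter=500):
    """Dominant eigenpair of A by power iteration (hypothetical helper)."""
    x = np.ones(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        x = y / np.linalg.norm(y)   # keep the iterate at unit length
        new_lam = x @ A @ x         # Rayleigh quotient estimate of the eigenvalue
        if abs(new_lam - lam) < tol:
            break                   # successive estimates agree to within tol
        lam = new_lam
    return new_lam, x

A = np.array([[8, -2, -2], [-2, 4, -2], [-2, -2, 13]], dtype=float)
lam, x = power_iteration(A)

# Cross-check against a library solver (eigvalsh returns ascending eigenvalues).
print(lam, np.linalg.eigvalsh(A)[-1])
```

Like the version above, this needs a starting vector that is not orthogonal to the dominant eigenvector.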

The power method above finds the largest eigenvalue (largest in magnitude). Often the eigenvalue of interest is the smallest one (not for clustering, but it is included here for completeness). We find it with the inverse power method:

###Inverse Power Method

1. Solve for $\mathbf{L}$ and $\mathbf U$ such that $\mathbf{LU} = \mathbf{A}$

2. Assume $\mathbf{x}^{(0)}$. Designate a component of $\mathbf{x}$ to be unity.

3. Solve for $\mathbf{x}'$ by forward substitution: $\mathbf{L} \mathbf{x}' = \mathbf{x}^{(0)}$

4. Solve for $\mathbf{y}^{(1)}$ by back substitution: $\mathbf{U}\mathbf{y}^{(1)} =  \mathbf{x}' $

5. Scale $\mathbf{y}^{(1)}$ so that the unity component is unity. Thus, $\mathbf{y}^{(1)} = \lambda_{\text{inverse}}^{(1)} \mathbf{x}^{(1)} $

6. Repeat

$$\mathbf{L} \mathbf{x}' = \mathbf{x}^{(k)}$$

$$ \mathbf{U}\mathbf{y}^{(k+1)} =  \mathbf{x}'$$

$$ \mathbf{y}^{(k+1)} = \lambda_{\text{inverse}}^{(k+1)} \mathbf{x}^{(k+1)} $$

In [25]:
from scipy.linalg import lu


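def hoffman_inverse_power(A):
    # A = P L U, with L unit lower triangular and U upper triangular
    p, l, u = lu(A)
    n = A.shape[0]
    x = ones(n)
    x_prime = ones(n)
    y = ones(n)
    lmda = 0

    for i in range(10):
        last = lmda
        b = dot(p.T, x)  # apply the row permutation: L U y = P^T x

        # forward substitution: L x' = b
        for j in range(n):
            x_prime[j] = b[j]/l[j][j]
            for k in range(j):
                x_prime[j] -= x_prime[k]*l[j][k]/l[j][j]

        # back substitution: U y = x'
        for j in reversed(range(n)):
            y[j] = x_prime[j]/u[j][j]
            for k in range(j+1, n):
                y[j] -= y[k]*u[j][k]/u[j][j]

        lmda = y[0]  # estimate of 1/lambda (component 0 is the unity component)
        x = y/lmda

    return (1/lmda, x, abs(lmda-last))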
    
A = array([[8,-2,-2],[-2,4,-2],[-2,-2,13]])
hoffman_inverse_power(A)    

(2.5090028139770366,
 array([ 1.        ,  2.14578754,  0.59971106]),
 8.4845877590389307e-06)
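The hand-written substitution loops can also be delegated to SciPy. The sketch below follows the same steps (factor once, then solve repeatedly) using `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve`. The function name `inverse_power_iteration` and the parameter `n_iter` are hypothetical, and component 0 is assumed to be a valid unity component (nonzero in the target eigenvector).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power_iteration(A, n_iter=50):
    """Smallest-magnitude eigenpair of A (hypothetical helper)."""
    factors = lu_factor(A)          # factor once, reuse in every iteration
    x = np.ones(A.shape[0])
    for _ in range(n_iter):
        y = lu_solve(factors, x)    # solves A y = x using the stored factors
        inv_lam = y[0]              # estimate of 1/lambda; component 0 is the unity component
        x = y / inv_lam             # rescale so that x[0] == 1
    return 1.0 / inv_lam, x

A = np.array([[8, -2, -2], [-2, 4, -2], [-2, -2, 13]], dtype=float)
lam, x = inverse_power_iteration(A)

# Cross-check against a library solver (eigvalsh returns ascending eigenvalues).
print(lam, np.linalg.eigvalsh(A)[0])
```

Reusing the factorization is the design point here: each iteration costs only two triangular solves rather than a fresh factorization.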

###Shifted Power Method

The eigenvalues of a matrix $\mathbf{A}$ may be shifted by a scalar $s$ by subtracting $s\mathbb{I} \mathbf{x} = s \mathbf{x}$ from both sides of the standard eigenproblem, $\mathbf{A} \mathbf{x} = \lambda \mathbf{x} $. Thus,

$$\mathbf{A} \mathbf{x}-s\mathbb{I}\mathbf{x}  = \lambda \mathbf{x} -s \mathbf{x}$$

$$(\mathbf{A} - s\mathbb{I})\mathbf{x}  = (\lambda -s )\mathbf{x}$$

$$\mathbf{A}_{\text{shifted}} \mathbf{x} = \lambda_{\text{shifted}} \mathbf{x}$$

Shifting the eigenvalues of a matrix can be used to:

1. Find the opposite extreme eigenvalue.

2. Find intermediate eigenvalues.

3. Accelerate convergence.
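To see why a shift can expose the opposite extreme eigenvalue, here is a short worked argument. It assumes that $\mathbf{A}$ is symmetric with real eigenvalues and that its dominant eigenvalue is the largest one, $\lambda_{\max}$; the subscripts $\min$, $\max$ and $i$ are notation introduced for this example. Choosing the shift $s = \lambda_{\max}$ makes every shifted eigenvalue non-positive:

$$\lambda_{\text{shifted},i} = \lambda_i - \lambda_{\max} \le 0$$

The shifted eigenvalue of largest magnitude then belongs to the smallest eigenvalue of $\mathbf{A}$:

$$\max_i |\lambda_{\text{shifted},i}| = \lambda_{\max} - \lambda_{\min}$$

The power method applied to $\mathbf{A}_{\text{shifted}}$ therefore converges to $\lambda_{\min} - \lambda_{\max}$, and adding $s$ back recovers $\lambda_{\min}$. For the $3 \times 3$ example matrix used in this notebook, $\lambda_{\max} \approx 13.8706$ and $\lambda_{\min} \approx 2.5090$, so the dominant shifted eigenvalue is approximately $-11.3616$.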

In [26]:
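A = array([[8,-2,-2],[-2,4,-2],[-2,-2,13]])
# largest eigenvalue of A
value, vector, error = hoffman_power(A)
# shift by the largest eigenvalue, so that the smallest eigenvalue of A
# becomes the largest in magnitude for the shifted matrix
A_shifted = A - eye(3)*value
value2, vector, error = hoffman_power(A_shifted)
# undo the shift to recover the smallest eigenvalue of A
value2 + value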

2.508980800011047

##Direct Method

The characteristic equation is obtained from

$$\text{det}(\mathbf{A} - \lambda \mathbb{I}) = 0$$

Rather than solving for the roots of the characteristic equation directly, we solve the characteristic equation iteratively (that sentence is my own wording; the rest is quoted). "This can be accomplished by applying the secant method. Two initial approximations of $\lambda$ are assumed, $\lambda_0$ and $\lambda_1$, the corresponding values of the characteristic determinant are computed, and these results are used to construct a linear relationship between $\lambda$ and the value of the characteristic determinant. The solution of that linear relationship is taken as the next approximation to $\lambda$, and the procedure is repeated to convergence."
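The secant update described above can be written out explicitly. The function name $f$ and the iteration index $k$ are notation introduced here:

$$f(\lambda) = \text{det}(\mathbf{A} - \lambda \mathbb{I})$$

$$\lambda_{k+1} = \lambda_k - f(\lambda_k)\,\frac{\lambda_k - \lambda_{k-1}}{f(\lambda_k) - f(\lambda_{k-1})}$$

The fraction is the reciprocal of the slope of the line through the two most recent points, so each step moves to the point where that line crosses zero.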

In [27]:
def direct_method(A, guess_1, guess_2):
    if(abs(guess_2 - guess_1)< 1e-5):
        return guess_2
    else:
        n = A.shape[0]
        f_1 = det(A - eye(n)*guess_1)
        f_2 = det(A - eye(n)*guess_2)
        slope = (f_2 - f_1)/(guess_2 - guess_1)
        guess_3 = guess_2 - f_2/slope
        return direct_method(A, guess_2, guess_3)
    
    
direct_method(A,15, 13)

13.870585123309459

From http://www.math.tamu.edu/~dallen/linear_algebra/chpt5.pdf

###Finding Nondominant Eigenvalues

Once the dominant eigenpair $(\lambda_1, V_1)$ of $A$ has been computed, we may wish to compute $\lambda_2$. If $A$ is symmetric, it can be proved that if $U_1 = V_1 / |V_1|$, then

$$A^{(2)} = A - \lambda_1 U_1 U_1^T$$

has eigenvalues $0, \lambda_2, \lambda_3, \ldots, \lambda_n$.
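This deflation property can be checked numerically. The sketch below is only a verification, not part of the power method: it takes the dominant eigenpair from a library solver (`numpy.linalg.eigh`) and uses the symmetric $3 \times 3$ example matrix; the variable names are chosen for illustration.

```python
import numpy as np

A = np.array([[8, -2, -2], [-2, 4, -2], [-2, -2, 13]], dtype=float)

# Reference eigenpairs from a library solver (ascending eigenvalues).
eigenvalues, eigenvectors = np.linalg.eigh(A)
lam1 = eigenvalues[-1]       # dominant eigenvalue
u1 = eigenvectors[:, -1]     # its eigenvector, already normalized to unit length

# Deflate: subtract the rank-one contribution of the dominant eigenpair.
A2 = A - lam1 * np.outer(u1, u1)

# The spectrum of A2 should be 0 together with the remaining eigenvalues of A.
deflated = np.sort(np.linalg.eigvalsh(A2))
expected = np.sort(np.append(eigenvalues[:-1], 0.0))
print(deflated)
print(expected)
```

The eigenvectors are unchanged by the deflation; only the dominant eigenvalue is replaced by zero, so the power method applied to the deflated matrix converges to the next eigenvalue.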

In [36]:
A = array([[2,-1,-1,0,0,0],[-1,2,-1,0,0,0],[-1,-1,3,-1,0,0],[0,0,-1,3,-1,-1],[0,0,0,-1,2,-1],[0,0,0,-1,-1,2]])
print(sort(eig(A)[0]))

def hoffman_power(A):
    n = A.shape[0]
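    # start from a unit vector rather than ones(n): ones(n) is the Laplacian's
    # zero-eigenvalue eigenvector, so dot(A, x) would be all zeros
    x = zeros(n)
    x[0] = 1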
    lmda = 0
    for i in range(40):
        last = lmda
        y = dot(A,x)
        lmda = y[0]
        x = y/lmda
    error = last - lmda    
    return (lmda, x, abs(error))    

max_e_value = hoffman_power(A)[0]
print(max_e_value)

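# Shift so that the smallest eigenvalues of A become the largest in magnitude
A_1 = A - max_e_value*eye(6)

# A_1 has the eigenvalue -max_e_value with the all-ones eigenvector;
# deflate it using the normalized vector ones(6)/sqrt(6)
A_2 = A_1 + max_e_value*outer(ones(6),ones(6))/6

# The dominant eigenvalue of A_2 is now (second-smallest eigenvalue of A) - max_e_value,
# so undoing the shift recovers the second-smallest eigenvalue
hoffman_power(A_2)[0]+max_e_value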

[  4.66449861e-17   4.38447187e-01   3.00000000e+00   3.00000000e+00
   3.00000000e+00   4.56155281e+00]
4.56155067632


0.43844718719117015

In [40]:
e,v = eig(A)
print(e[3])
v[:,3]

0.438447187191


array([ 0.46470513,  0.46470513,  0.26095647, -0.26095647, -0.46470513,
       -0.46470513])

In [45]:
hoffman_power(A_2)[1]*0.4647

array([ 0.4647    ,  0.4647    ,  0.26095359, -0.26095359, -0.4647    ,
       -0.4647    ])

Finally, we see that the signs of the eigenvector entries cluster the nodes: the first three nodes (positive entries) form one cluster, and the last three nodes (negative entries) form the other.