# spectral graph convolutional network (GCN)

Spectral Graph Convolutional Networks (Spectral GCNs) are a class of Graph Neural Networks (GNNs) that perform graph convolutions in the spectral domain. 

They rely on graph spectral theory, which uses the graph Laplacian and its eigenvalue decomposition to analyze graphs.

Spectral GCNs define a graph convolution as the multiplication of the node feature matrix, expressed in the graph Fourier basis, with a **spectral filter**.

# vertex-domain localization vs. frequency-domain smoothness

The smoothness of a graph signal is determined by how its energy is distributed over the graph Laplacian eigenvalues, which play the role of frequencies.

If a graph signal is smooth (slowly varying) in the vertex (spatial) domain, it is associated with the lower graph Laplacian eigenvalues and is localized in the frequency (spectral) domain: most of its energy is concentrated in the low frequencies.

Conversely, if a signal is localized in the vertex domain (i.e., it has sharp changes or rapid variations), it is associated with the higher graph Laplacian eigenvalues and tends to be smooth (spread out) in the spectral domain, with its energy distributed across a wider range of frequencies.
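
As a rough illustration of this trade-off, the sketch below compares a slowly varying signal with a rapidly alternating one on a toy path graph. The graph, the two signals, and the helper names `variation` and `low_freq_energy_fraction` are assumptions made for the example. Smoothness is measured with the quadratic form of the combinatorial Laplacian $L = D - A$, which equals $\sum_{(i,j) \in E} (x_i - x_j)^2$ on an unweighted graph.

```python
import numpy as np

# Path graph on 6 nodes: 0-1-2-3-4-5 (hypothetical toy example).
n = 6
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A  # combinatorial Laplacian L = D - A

smooth = np.linspace(0.0, 1.0, n)         # slowly varying signal
rough = np.array([1.0, -1.0] * (n // 2))  # flips sign at every node


def variation(x):
    """Laplacian quadratic form x^T L x = sum over edges of (x_i - x_j)^2."""
    return float(x @ L @ x)


# Columns of U are Laplacian eigenvectors, ordered from low to high frequency.
eigvals, U = np.linalg.eigh(L)


def low_freq_energy_fraction(x, k=3):
    """Share of the signal energy carried by the k lowest graph frequencies."""
    coeffs = U.T @ x
    return float(np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2))


print(variation(smooth), variation(rough))
print(low_freq_energy_fraction(smooth), low_freq_energy_fraction(rough))
```

For these two signals the smooth one has a much smaller quadratic form and keeps nearly all of its energy on the three lowest graph frequencies, while the alternating signal puts most of its energy on the higher ones.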

# signal processing

## classical signal processing

### Fourier transform

The Fourier domain, also known as the frequency domain or spectral domain, is a representation of a signal or function in terms of its frequency components.

It is obtained by applying the Fourier Transform (for continuous signals) or the Discrete Fourier Transform (for discrete signals) to the time-domain signal.

The Fourier Transform decomposes a signal into a sum of sinusoidal functions (sines and cosines) with different frequencies, amplitudes, and phases.

It provides an alternative view of a signal's behavior, highlighting its global frequency characteristics instead of its time or spatial characteristics.

The Fourier domain representation is essential for:

- analyzing properties of signals or functions

- designing signal processing techniques such as filtering, compression, and denoising
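
A minimal sketch of Fourier-domain filtering, assuming a synthetic one-second signal sampled at 64 Hz with two sinusoidal components (5 Hz and 12 Hz). The sampling rate, frequencies, and 8 Hz cutoff are illustrative choices, not values from these notes.

```python
import numpy as np

fs = 64                     # samples per second (assumed)
t = np.arange(fs) / fs      # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Discrete Fourier Transform of a real signal and the matching frequency axis.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
magnitude = np.abs(spectrum) / (len(signal) / 2)  # amplitude of each sinusoid

# Low-pass filtering in the Fourier domain: drop every component above 8 Hz.
filtered_spectrum = np.where(freqs <= 8, spectrum, 0)
filtered = np.fft.irfft(filtered_spectrum, n=len(signal))

# Only the 5 Hz component should survive the filter.
print(np.allclose(filtered, np.sin(2 * np.pi * 5 * t)))
```

Both components fall exactly on DFT bins here, which keeps the example clean; with arbitrary frequencies the energy would leak into neighboring bins.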

### wavelet transform

- wavelet transform is an operation that decomposes a signal into a set of wavelet coefficients using wavelet basis functions

- wavelet coefficients: components of the wavelet-transformed signal that represent the signal's local time-frequency characteristics at different scales. 

- A wavelet is a mathematical function used as a basis function in wavelet analysis.

- the main advantage of the wavelet transform is its ability to provide multi-scale, localized analysis, which makes it particularly useful for processing signals with non-uniform or transient features.
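
To make the localization property concrete, here is a small sketch of one level of the Haar wavelet transform, chosen only because it is the simplest wavelet to write down. The function names `haar_step` and `haar_step_inverse` and the test signal are hypothetical.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length signal."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # local averages (coarse scale)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # local differences (fine scale)
    return approx, detail


def haar_step_inverse(approx, detail):
    """Rebuild the signal from its approximation and detail coefficients."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    return np.stack([even, odd], axis=1).reshape(-1)


# Flat signal with a single transient spike at position 5.
signal = np.array([4.0, 4.0, 4.0, 4.0, 4.0, 9.0, 4.0, 4.0])
approx, detail = haar_step(signal)

print(detail)  # only the coefficient covering the spike is non-zero
print(np.allclose(haar_step_inverse(approx, detail), signal))
```

The spike shows up in a single detail coefficient, whereas a Fourier transform of the same signal would spread it over all frequencies.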

## graph signal processing

### Graph Fourier transform

Fourier domain: spectral representation of graph signals, which are functions defined on the vertices of a graph. 

The Fourier domain is obtained by applying the Graph Fourier Transform (GFT) to graph signals in the vertex (spatial) domain.

The GFT relies on the graph Laplacian's eigenvalue decomposition: the eigenvectors form the graph Fourier basis, and the eigenvalues (the graph's spectrum) correspond to graph frequencies.

$$
X' = g_{\theta} (L)X = g_{\theta} (U\Lambda U^T)X = U g_{\theta} (\Lambda) U^T X
$$

1. Apply the Graph Fourier Transform (GFT) to the node feature matrix in the spatial domain:
   
   $$X_{\text{spectral}} = U^T X_{\text{spatial}}$$

2. Apply GFT filter in the spectral domain:
   
   $$X_{\text{filtered}} = g_\theta(\Lambda) \odot X_{\text{spectral}}$$

3. Apply the Inverse Graph Fourier Transform (Inverse GFT) to obtain the transformed node features in the spatial domain:

   $$X'_{\text{spatial}} = U X_{\text{filtered}}$$

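   $X$ is the node feature matrix in the spatial domain

   $L = D - A$ is the graph Laplacian matrix, where $D$ is the degree matrix and $A$ is the adjacency matrix

   $U$ is the eigenvector matrix of the graph Laplacian (one eigenvector per column)

   $\Lambda$ is the diagonal eigenvalue matrix of the graph Laplacian

   $g_\theta(\Lambda)$ is the GFT filter function, applied to each eigenvalue on the diagonal of $\Lambda$

   $\odot$ denotes element-wise multiplication: row $i$ of $X_{\text{spectral}}$ is scaled by $g_\theta(\lambda_i)$, which matches the matrix product $g_\theta(\Lambda) X_{\text{spectral}}$ because $g_\theta(\Lambda)$ is diagonal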
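
The three steps above can be sketched in NumPy as follows. The 4-cycle graph, the random features, and the low-pass response `g` are assumptions for illustration; in a Spectral GCN the filter would carry trainable parameters instead.

```python
import numpy as np

# Toy undirected graph: the 4-cycle 0-1-2-3-0 (hypothetical example).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

# L is symmetric, so eigh returns real eigenvalues in ascending order
# and an orthonormal set of eigenvectors as the columns of U.
eigvals, U = np.linalg.eigh(L)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))  # 4 nodes, 2 features per node


def g(lam, theta=1.0):
    """Hypothetical low-pass response exp(-theta * lambda)."""
    return np.exp(-theta * lam)


X_spectral = U.T @ X                           # step 1: GFT
X_filtered = g(eigvals)[:, None] * X_spectral  # step 2: scale row i by g(lambda_i)
X_out = U @ X_filtered                         # step 3: inverse GFT

# One-shot form: X' = U g(Lambda) U^T X
X_direct = U @ np.diag(g(eigvals)) @ U.T @ X
print(np.allclose(X_out, X_direct))
```

Storing the filter as a vector of per-eigenvalue responses and broadcasting it over the rows avoids building the diagonal matrix explicitly.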

### graph wavelet transform

$$
X' = \psi_s (L)X = \psi_s (U\Lambda U^T)X = U \psi_s (\Lambda) U^T X
$$

1. Define the graph wavelet function in the spectral domain, which captures the desired vertex-frequency characteristics; $s$ is the scale parameter.

    $$\psi_s(x) = e^{-\frac{x^2}{2s^2}}$$

2. Compute the graph Laplacian, where $D$ is the degree matrix and $A$ is the adjacency matrix.

    $$L = D - A$$

3. Compute the eigendecomposition of the graph Laplacian: $$L = U \Lambda U^T$$ where $U$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.

4. Wavelet Transform: $$X' = U \psi_s(\Lambda) U^T X$$

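    where $X$ is the node feature matrix in the spatial domain.

    $X'$ holds the wavelet coefficients at scale $s$; because of the final multiplication by $U$, it is indexed by node (vertex domain), not by frequency.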

    $\psi_s(\Lambda)$ is the diagonal matrix formed by applying the wavelet function element-wise to the eigenvalues in $\Lambda$.

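5. Inverse Wavelet Transform: $$X = U \psi_s^{-1}(\Lambda) U^T X'$$ where $\psi_s^{-1}(\Lambda)$ is the diagonal matrix of reciprocals $1/\psi_s(\lambda_i)$. This requires $\psi_s(\lambda_i) \neq 0$ for every eigenvalue, which holds for the Gaussian function in step 1.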
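
A minimal sketch of steps 1 to 5, assuming a 5-node path graph and scale $s = 2$. The helper names `psi`, `wavelet_transform`, and `inverse_wavelet_transform` are hypothetical.

```python
import numpy as np

# Path graph on 5 nodes (hypothetical toy example).
n = 5
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A      # step 2: L = D - A
eigvals, U = np.linalg.eigh(L)      # step 3: L = U Lambda U^T


def psi(lam, s):
    """Step 1: spectral function exp(-lambda^2 / (2 s^2)) evaluated per eigenvalue."""
    return np.exp(-lam ** 2 / (2 * s ** 2))


def wavelet_transform(X, s):
    """Step 4: X' = U psi_s(Lambda) U^T X."""
    return U @ (psi(eigvals, s)[:, None] * (U.T @ X))


def inverse_wavelet_transform(X_prime, s):
    """Step 5: X = U psi_s(Lambda)^{-1} U^T X'; valid because psi is never zero."""
    return U @ ((1.0 / psi(eigvals, s))[:, None] * (U.T @ X_prime))


X = np.arange(10, dtype=float).reshape(n, 2)  # 5 nodes, 2 features
X_prime = wavelet_transform(X, s=2.0)
X_rec = inverse_wavelet_transform(X_prime, s=2.0)
print(np.allclose(X_rec, X))
```

The inverse divides by $\psi_s(\lambda_i)$, so it becomes numerically fragile when $s$ is small and the response at high eigenvalues is close to zero.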


# spectral filter

A spectral filter is a function of the graph Laplacian's eigenvalues, denoted as $g_\theta(\Lambda)$, where $\theta$ are the trainable parameters and $\Lambda$ is the eigenvalue matrix of the graph Laplacian.

The spectral filter operates on the graph Fourier basis, and different types of filters can be used to capture various graph properties. 

- GFT filter: computationally inefficient and global

- polynomial filters: computationally efficient and localized

- wavelet filter: computationally inefficient and localized

- diffusion filter: computationally efficient

## GFT filter

**Directly applying GFT filters in GCN can be computationally expensive** due to several factors:

- Eigenvalue decomposition of the graph Laplacian: time complexity of $O(n^3)$ for an $n \times n$ matrix, which can be quite expensive for large graphs.

- Multiplication with eigenvectors: multiplying the node feature matrix with the eigenvector matrix $U$ and its transpose $U^T$ involves two matrix multiplications, each with a time complexity of $O(n^2f)$ for an $n \times f$ feature matrix, where $f$ is the number of features.

    This can also become computationally expensive for large graphs with many nodes and features.

**The GFT filter is a global operation** and cannot capture local structure in signals:

- the filter is applied in the spectral domain based on global frequency characteristics of the graph, which represent the entire graph structure, not just the local neighborhood of each vertex.

## polynomial filter

Polynomial filters are called approximations because they approximate general spectral filters, which can be computationally expensive or difficult to work with due to their global, non-localized nature.

Polynomial filters are more computationally efficient and more localized, while preserving the key characteristics of the original filter.


Polynomial filters are constructed as a linear combination of basis functions, here the powers of the eigenvalue matrix:

$$
g_{\theta} (\Lambda)  = \sum_{k=0}^{K-1}\theta_k \Lambda ^k
$$
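
Because $U \Lambda^k U^T = L^k$, the same filter can be applied as $\sum_k \theta_k L^k X$ without ever computing the eigendecomposition. The sketch below checks this on a small graph; the graph, the coefficients `theta`, and the function name `polynomial_filter` are assumptions for the example, and the eigendecomposition appears only to verify the result.

```python
import numpy as np


def polynomial_filter(L, X, theta):
    """Compute sum_k theta[k] * L^k @ X using repeated matrix products only."""
    out = np.zeros_like(X, dtype=float)
    Lk_X = X.astype(float)      # starts as L^0 X = X
    for coeff in theta:
        out += coeff * Lk_X
        Lk_X = L @ Lk_X         # advance to L^(k+1) X
    return out


# Small undirected graph given as an edge list (hypothetical example).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 3))
theta = [0.5, -0.2, 0.1]        # K = 3 coefficients (illustrative values)

X_poly = polynomial_filter(L, X, theta)

# Reference: the same filter applied in the spectral domain.
eigvals, U = np.linalg.eigh(L)
response = sum(t * eigvals ** k for k, t in enumerate(theta))
X_spec = U @ (response[:, None] * (U.T @ X))
print(np.allclose(X_poly, X_spec))
```

When $L$ is stored as a sparse matrix, each product `L @ Lk_X` costs time proportional to the number of edges times the number of features, which is where the efficiency gain over the $O(n^3)$ eigendecomposition comes from.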

### localization and smoothness of filter

The localization of the filter can be controlled by the order ($K$) of the polynomial:

- the order determines the number of coefficients and, consequently, the degree of localization. With $K$ coefficients the polynomial has degree $K-1$, and since $(L^k)_{ij} = 0$ whenever nodes $i$ and $j$ are more than $k$ hops apart, the filter output at a node depends only on nodes at most $K-1$ hops away.

- A lower-order polynomial filter will be smoother in the frequency domain and more localized in the vertex domain. This means that the filter will focus more on the local structure of the signal, capturing patterns and features present in the immediate neighborhood of each vertex. However, a lower-order filter may not be able to capture more complex or global patterns in the graph.

- A higher-order polynomial filter will be less smooth in the frequency domain and less localized in the vertex domain. This filter can capture more global patterns and structures in the graph but may lose some of the local information.

The smoothness of the filter can be controlled by the coefficients of the polynomial filter:

- coefficients determine the weights of the different polynomial terms in the filter.

- coefficients are parameters learned during training to best capture the graph's underlying structure and patterns.
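
The link between polynomial order and vertex-domain localization can be checked numerically. In this sketch (path graph, coefficient values, and variable names are all illustrative assumptions) a filter with 3 coefficients, i.e. polynomial degree 2, is applied to an impulse at the middle node of a 7-node path.

```python
import numpy as np

# Path graph on 7 nodes: 0-1-2-3-4-5-6 (hypothetical toy example).
n = 7
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

theta = [0.5, -0.3, 0.2]  # 3 coefficients -> polynomial of degree 2
filter_matrix = sum(t * np.linalg.matrix_power(L, k) for k, t in enumerate(theta))

# Impulse (delta) signal located at the middle node.
delta = np.zeros(n)
delta[3] = 1.0
response = filter_matrix @ delta

# Nodes where the filtered impulse is non-zero.
support = np.flatnonzero(np.abs(response) > 1e-12)
print(support)  # nodes within 2 hops of node 3
```

The response vanishes at nodes 0 and 6, which are 3 hops from the impulse, because no power of $L$ up to degree 2 connects them to node 3.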

### advantage

Computational efficiency: polynomial filters need no explicit eigendecomposition; they operate directly on the graph Laplacian or the normalized adjacency matrix, making them suitable for large-scale graphs and real-time processing.

Vertex-domain localization: Polynomial filters can capture local structure in the graph while preserving essential frequency characteristics, enabling GCN to learn both local and global graph patterns.

Flexibility: **smoothness and localization of polynomial filters can be controlled by adjusting order and coefficients**, allowing for a trade-off between accuracy and computational complexity.

### Chebyshev filter

A Chebyshev filter uses Chebyshev polynomials as basis functions to approximate the ideal spectral filter.

ChebNet, a popular Spectral GCN, applies Chebyshev filters to the graph Laplacian.

$$
g_{\theta} (\Lambda)  = \sum_{k=0}^{K-1}\theta_k T_k (\tilde \Lambda)
$$

$g_\theta(\Lambda)$ represents the Chebyshev filter

$T_k$ is the Chebyshev polynomial of order $k$

$\tilde{\Lambda} = 2\Lambda / \lambda_{\max} - I$ is the scaled eigenvalue matrix of the graph Laplacian, whose entries lie in $[-1, 1]$

$\theta_k$ are the trainable parameters

$K$: number of coefficients, so the polynomial has degree $K-1$ and the filter is localized to $K-1$ hops in the vertex domain

Chebyshev polynomials (Chebyshev basis)

- defined recursively

$$
T_k(X):= \left\{\begin{matrix}
1 & k=0  \\
X & k=1 \\
2XT_{k-1}(X) - T_{k-2}(X) & k \geq 2  \\
\end{matrix}\right.
$$

- a set of orthogonal polynomials that form a basis for approximating continuous functions on a specific interval, usually $[-1, 1]$.

- have unique properties, making them useful for approximating functions, solving differential equations, and performing spectral analysis on graphs.

    Orthogonality: Chebyshev polynomials are orthogonal with respect to the weight function $w(x) = \frac{1}{\sqrt{1-x^2}}$ on the interval [-1, 1]:

    $$\int_{-1}^{1} w(x)T_m(x)T_n(x) dx = \begin{cases}
                                         0 & \text{if } m \neq n \\
                                         \pi & \text{if } m = n = 0 \\
                                         \frac{\pi}{2} & \text{if } m = n \neq 0
                                     \end{cases}$$

    Trigonometric relation: Chebyshev polynomials can be expressed using trigonometric functions:

    $$T_n(\cos(\theta)) = \cos(n\theta)$$
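
A minimal sketch of a Chebyshev filter applied through the three-term recurrence, so that only products with the rescaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I$ are needed. The graph, the coefficients, and the function name `chebyshev_filter` are assumptions; $\lambda_{\max}$ is computed exactly here for simplicity, which a large-scale implementation would likely avoid.

```python
import numpy as np


def chebyshev_filter(L, X, theta, lam_max=None):
    """Apply sum_k theta[k] * T_k(L_tilde) @ X via the Chebyshev recurrence."""
    n = L.shape[0]
    if lam_max is None:
        lam_max = np.linalg.eigvalsh(L).max()  # exact value, fine for a toy graph
    L_tilde = 2.0 * L / lam_max - np.eye(n)    # rescale the spectrum into [-1, 1]

    T_prev = X.astype(float)                   # T_0(L_tilde) X = X
    out = theta[0] * T_prev
    if len(theta) > 1:
        T_curr = L_tilde @ X                   # T_1(L_tilde) X = L_tilde X
        out = out + theta[1] * T_curr
        for k in range(2, len(theta)):
            T_next = 2.0 * (L_tilde @ T_curr) - T_prev
            out = out + theta[k] * T_next
            T_prev, T_curr = T_curr, T_next
    return out


# Small undirected graph given as an edge list (hypothetical example).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 2))
theta = [0.4, -0.3, 0.2, 0.1]  # K = 4 coefficients (illustrative values)

X_cheb = chebyshev_filter(L, X, theta)

# Reference: evaluate the same Chebyshev series on the rescaled eigenvalues.
eigvals, U = np.linalg.eigh(L)
lam_tilde = 2.0 * eigvals / eigvals.max() - 1.0
response = np.polynomial.chebyshev.chebval(lam_tilde, theta)
X_spec = U @ (response[:, None] * (U.T @ X))
print(np.allclose(X_cheb, X_spec))
```

The recurrence keeps only two previous terms in memory, so the cost grows linearly with the number of coefficients.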

## wavelet filter

Pros:

- Multi-scale analysis: Wavelet filters capture information at different scales, enabling GNNs to analyze and learn hierarchical features.

- Localization: Wavelet filters provide localized analysis in both vertex and frequency domains, effectively capturing local structure and patterns.

- Adaptability: Graph wavelets can be designed to adapt to graph-specific characteristics, making them suitable for various problems and datasets.

Cons:

- Complexity: they require the eigendecomposition of the graph Laplacian, which limits their applicability in large-scale or real-time graph processing scenarios.

## diffusion filter

Diffusion wavelets are a wavelet-based approach designed to work with the diffusion process on graphs. 

They do not involve eigendecomposition because they are built on the diffusion operator, which directly captures the graph's connectivity structure and allows for localized and multiscale analysis of the graph signal.

1. Define diffusion operator: $P = \frac{1}{2}(I + AD^{-1})$, where $A$ is the adjacency matrix, $D$ is the degree matrix, and $I$ is the identity matrix.

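2. Compute the wavelet operators from dyadic powers of the diffusion operator: $\Psi_0 = I - P$ and $\Psi_j = P^{2^{j-1}} - P^{2^{j}}$ for $j \geq 1$, where $j$ is the scale level.

3. Diffusion Wavelet Transform: $W_{j} = \Psi_{j} X$, where $X$ is the node feature matrix.

4. Inverse Diffusion Wavelet Transform: the operators telescope, $\sum_{j=0}^{J-1} \Psi_{j} = I - P^{2^{J-1}}$, so the signal is recovered as $X = \sum_{j=0}^{J-1} W_{j} + P^{2^{J-1}} X$, where $J$ is the number of scale levels and $P^{2^{J-1}} X$ is the low-pass residual. The individual $\Psi_j$ are singular (because $P$ has eigenvalue 1), so they cannot be inverted one by one.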
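
A minimal sketch of the diffusion wavelet steps, under two assumptions that should be read as choices made for this example: the finest scale is taken as $\Psi_0 = I - P$, and reconstruction uses the telescoping sum of the wavelet coefficients plus the low-pass residual $P^{2^{J-1}} X$ instead of inverting each operator. The graph and the function name `diffusion_wavelets` are hypothetical.

```python
import numpy as np


def diffusion_wavelets(P, J):
    """Return ([Psi_0, ..., Psi_{J-1}], P^(2^(J-1))) for a diffusion operator P."""
    n = P.shape[0]
    wavelets = [np.eye(n) - P]       # Psi_0 = I - P (assumed finest scale)
    P_pow = P                        # holds P^(2^(j-1)) at the start of step j
    for _ in range(1, J):
        P_next = P_pow @ P_pow       # squaring doubles the exponent
        wavelets.append(P_pow - P_next)
        P_pow = P_next
    return wavelets, P_pow


# Small connected undirected graph (hypothetical example, no isolated nodes).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)

# Step 1: lazy diffusion operator P = (I + A D^{-1}) / 2.
P = 0.5 * (np.eye(n) + A @ np.diag(1.0 / deg))

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 2))

J = 4
wavelets, low_pass = diffusion_wavelets(P, J)   # step 2
W = [Psi @ X for Psi in wavelets]               # step 3: W_j = Psi_j X

# Reconstruction: wavelet coefficients plus the low-pass residual.
X_rec = sum(W) + low_pass @ X
print(np.allclose(X_rec, X))
```

Only matrix products with $P$ are involved, so no eigendecomposition is needed; for large graphs the dense powers used here would have to be replaced by repeated sparse products.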
