# Semi-Supervised Classification via Hypergraph Convolutional Extreme Learning Machine


### Zhewei Liu, Zijia Zhang, Yaoming Cai, Yilin Miao and Zhikun Chen

#### Presented by Phiphat Chomchit

# Hypergraph Convolutional Extreme Learning Machine (HGCELM)

- Proposed by Zhewei Liu et al., 2021
- Semi-Supervised Learning + Graph Convolutional Network + Extreme Learning Machine
- Graph Convolutional Extreme Learning Machine
- Hypergraph vs. Pairwise Graph

#  Extreme Learning Machine

- Huang et al., 2004
- Single-hidden-layer feedforward neural networks
- Supervised -> classification and regression problems
- SVM (learns very quickly), random mapping

# Supervised Learning Era.

- **Classic ELM** : Poor robustness because of the random mapping.
- **Multi-objective evolutionary ELM** : heuristic search -> often time-consuming.
- **Kernel ELM** : like SVM, the random hidden mapping is replaced by a kernel mapping in Hilbert space. (Analogizer)
- **Other** : ...

# Semi-Supervised Learning Era.
- **SS-ELM** : Graph Laplacian regularization; pairwise relationships between nodes.
- **GCN** : message passing, gradient descent -> over-smoothing problem.
- **GCELM** : SS-ELM + GCN

# Objective of this paper
- Hypergraph.
- ...

# Classic ELM

- **Activation function matrix** (random mapping)

Let $h(x) = g(x,w,c)$ be an activation function, for example the sigmoid function

$$g(x, w, c) = \frac{1}{1 + e^{-(w\cdot x + c)}}$$

where $c$ and the entries of $w$ are drawn from $\mathcal{N}(0,\,1)$.

Let $H$ be the activation function matrix.

$$H = \begin{bmatrix}
h(x_1) \\
\vdots \\
h(x_N)
\end{bmatrix} =
\begin{bmatrix}
h_1(x_1) & \cdots & h_L(x_1) \\
\vdots & \ddots & \vdots \\
h_1(x_N) & \cdots & h_L(x_N)
\end{bmatrix}$$

$$H =
\begin{bmatrix}
g(w_1\cdot x_1 + c_1) & \cdots & g(w_L\cdot x_1 + c_L) \\
\vdots & \ddots & \vdots \\
g(w_1\cdot x_N + c_1) & \cdots & g(w_L\cdot x_N + c_L)
\end{bmatrix}_{N\times L}$$


where $N$ is the number of **input samples**

and $L$ is the number of **hidden nodes**.
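The random mapping above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a sigmoid activation with standard-normal weights and biases; the helper name `random_hidden_matrix` and the toy sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def random_hidden_matrix(X, n_hidden, rng):
    """Return H (N x L) together with the random weights W (M x L) and biases c (L,).

    Column j of W plays the role of w_j, and c[j] the role of c_j.
    """
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # w_j ~ N(0, 1), never trained
    c = rng.standard_normal(n_hidden)                # c_j ~ N(0, 1), never trained
    H = sigmoid(X @ W + c)                           # H[i, j] = g(w_j . x_i + c_j)
    return H, W, c

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # toy data: N = 5 samples, M = 3 features
H, W, c = random_hidden_matrix(X, n_hidden=4, rng=rng)
print(H.shape)
```

Each row of `H` is $h(x_i)$, so the matrix has one row per sample and one column per hidden node.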

* **Beta matrix**

Let the **input data** be $X = [x_1, x_2, \cdots, x_N]^T$ with $x_i\in \mathbb{R}^M$,

and let

$$\beta \in \mathbb{R}^{L\times D},\quad \beta = [\beta_1, \beta_2, \cdots, \beta_L]^T$$

be the **weights** between the hidden nodes and the outputs (the beta matrix),

where $M$ is the number of **input features** and $D$ is the number of **outputs**, so the target matrix $Y$ is $N\times D$.

### Objective
$$\underset{\beta}{\mathrm{min}} \|H\beta - Y\|^2$$ 

So,

$$\beta = H^\dagger Y$$

where $H^\dagger$ is the pseudoinverse (**Moore–Penrose inverse**) of $H$. When $H$ has full column rank,
$$H^\dagger = (H^TH)^{-1}H^T$$
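As a hedged illustration of the closed-form solution, the sketch below fits a classic ELM on random toy data and compares $\beta = H^\dagger Y$ with the normal-equation form $(H^TH)^{-1}H^TY$. The sizes, the random data, and the variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L, D = 100, 3, 5, 2              # samples, features, hidden nodes, outputs (toy sizes)

X = rng.standard_normal((N, M))        # toy inputs
Y = rng.standard_normal((N, D))        # toy targets

# Random hidden layer: sigmoid(w_j . x_i + c_j); W and c stay fixed.
W = rng.standard_normal((M, L))
c = rng.standard_normal(L)
H = 1.0 / (1.0 + np.exp(-(X @ W + c)))

# Output weights via the Moore-Penrose pseudoinverse.
beta = np.linalg.pinv(H) @ Y

# Same solution from the normal equations; valid because H has full column rank here.
beta_ne = np.linalg.solve(H.T @ H, H.T @ Y)

print("max |beta - beta_ne| =", np.abs(beta - beta_ne).max())
print("training residual    =", np.linalg.norm(H @ beta - Y))
```

Only `beta` is learned, and it comes from a single linear solve rather than gradient descent, which is why training is fast.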

# Kernel ELM

$$\underset{\beta,\xi}{\mathrm{min}}\ \frac{1}{2}\|\beta\|^2 + c\frac{1}{2}\sum_{i=1}^N\|\xi_i\|^2$$

s.t.

$h(x_i)\beta = y_i^T - \xi_i^T,\quad i = 1, ..., N$

* **Lagrange Multiplier Method**

$$L = \frac{1}{2}\|\beta\|^2 + c\frac{1}{2}\sum_{i=1}^N\|\xi_i\|^2 - \sum_{i=1}^N\sum_{j=1}^D\alpha_{i,j}(h(x_i)\beta_j - y_{i,j} + \xi_{i,j})$$

where $\beta_j$ is the $j$-th column of $\beta$, $\xi_i = [\xi_{i,1}, ..., \xi_{i,D}]^T$ is the training error vector of the $D$ output nodes

and $\quad\alpha_i = [\alpha_{i,1}, ..., \alpha_{i,D}]^T$ is the vector of Lagrange multipliers.

* **Critical points**

$$
\begin{align}
\frac{\partial{L}}{\partial{\beta_j}} &= 0 \quad\rightarrow\quad \beta_j = \sum_{i=1}^N \alpha_{i,j}h(x_i)^T \quad\rightarrow\quad \beta = H^T\alpha \quad (1)\\
\frac{\partial{L}}{\partial{\xi_i}} &= 0 \quad\rightarrow\quad \alpha_i = c\xi_i,\quad i=1,...,N\quad (2)\\
\frac{\partial{L}}{\partial{\alpha_i}} &= 0 \quad\rightarrow\quad h(x_i)\beta - y^T_i + \xi^T_i = 0,\quad i=1,...,N\quad (3)
\end{align}
$$

* **Solve Equation**

Equation $(2)$ implies $\xi_i = \frac{\alpha_i}{c}$.

Substitute this into $(3)$ and stack the rows for $i = 1, ..., N$:

$$
\begin{align}
h(x_i)\beta - y^T_i + \xi^T_i &= 0 \\
h(x_i)\beta - y^T_i + \frac{\alpha_i^T}{c} &= 0\\
H\beta - Y + \frac{\alpha}{c} &= 0\\
Y &= H\beta + \frac{\alpha}{c}\\
Y &= (HH^T\alpha + \frac{\alpha}{c})\quad \text{by}\,(1)\\
Y &= (HH^T + \frac{I}{c})\alpha\\
\alpha &= (HH^T + \frac{I}{c})^{-1}Y\\
\beta &= H^T(HH^T + \frac{I}{c})^{-1}Y\quad \text{by}\,(1)
\end{align}
$$
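The final expression can be checked numerically. The sketch below is an illustration on random toy matrices (the function name `regularized_elm_beta` and all sizes are hypothetical): it computes $\beta = H^T(HH^T + I/c)^{-1}Y$ and compares it with the equivalent ridge-style form $(H^TH + I/c)^{-1}H^TY$, which follows from the push-through matrix identity.

```python
import numpy as np

def regularized_elm_beta(H, Y, c):
    """Return (beta, alpha) with alpha = (H H^T + I/c)^{-1} Y and beta = H^T alpha."""
    n_samples = H.shape[0]
    alpha = np.linalg.solve(H @ H.T + np.eye(n_samples) / c, Y)
    return H.T @ alpha, alpha

rng = np.random.default_rng(1)
H = rng.random((20, 6))               # toy hidden-layer outputs: N = 20, L = 6
Y = rng.standard_normal((20, 2))      # toy targets: D = 2
c = 10.0                              # regularization trade-off (assumed value)

beta_dual, alpha = regularized_elm_beta(H, Y, c)

# Equivalent form that inverts an L x L matrix instead of an N x N one.
beta_primal = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / c, H.T @ Y)

print(np.allclose(beta_dual, beta_primal))
# Stationarity condition (3) in stacked form: H beta - Y + alpha / c = 0.
print(np.abs(H @ beta_dual - Y + alpha / c).max())
```

The $N\times N$ form is the one that generalizes to kernels, since $HH^T$ can be replaced by a kernel matrix; the $L\times L$ form is cheaper when $L \ll N$.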

# GCN

* Adjacency matrix (graph representation): $A$

example:
<img src='A.png' width=500>

* Let $\tilde{A} = I_N + A$ be the augmented adjacency matrix (the adjacency matrix with self-loops added).

$$\tilde{A} = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1\\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1\\
\end{bmatrix}$$

* Define the degree matrix $\tilde{D}$ by $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.

$$\tilde{D} = \begin{bmatrix}
3 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 3 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 2 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2\\
\end{bmatrix}$$

* The matrix of latent representations, where $W$ is a weight matrix and $h$ is an activation function

$$H = h(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}XW)$$
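The propagation rule can be tried on the example $\tilde{A}$ above. This is a minimal sketch: the node features `X` and weights `W` are random placeholders, and a sigmoid is assumed for $h$ to match the earlier slides.

```python
import numpy as np

# Augmented adjacency matrix from the example (self-loops already included).
A_tilde = np.array([
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
], dtype=float)

deg = A_tilde.sum(axis=1)                  # diagonal of D-tilde: row sums of A-tilde
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))            # placeholder node features (N = 8, M = 3)
W = rng.standard_normal((3, 4))            # placeholder layer weights (M = 3 -> 4 units)

H = 1.0 / (1.0 + np.exp(-(A_hat @ X @ W)))  # h assumed to be a sigmoid
print(deg)
print(H.shape)
```

Multiplying by `A_hat` replaces each node's features with a degree-weighted average over the node and its neighbours, which is the message-passing step.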

# HGCELM

* Hypergraph

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{W})$ be a **hypergraph** composed of

a **vertex set** $\mathcal{V}$ of size $N$, with vertices $v\in \mathcal{V}$,

a **hyperedge set** $\mathcal{E}$ of size $|\mathcal{E}|$, with hyperedges $e\in \mathcal{E}$,

and a **set of hyperedge weights** $\mathcal{W}$, where
the weight of hyperedge $e$ is denoted $w(e)$.



*  Incidence matrix: $Z$

<img src='B.png' width = 500>

Mathematically, the incidence matrix is defined by

$$z(v, e) = \begin{cases}
1 & v\in e\\
0 & v\not\in e
\end{cases}
$$
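To make the definition concrete, the sketch below builds $Z$ from a list of hyperedges. The five-vertex hypergraph and the helper name `incidence_matrix` are made up for illustration and are not the hypergraph shown in the figure.

```python
import numpy as np

def incidence_matrix(n_vertices, hyperedges):
    """Return Z of shape (n_vertices, n_hyperedges) with Z[v, e] = 1 iff v is in hyperedge e."""
    Z = np.zeros((n_vertices, len(hyperedges)), dtype=int)
    for e, members in enumerate(hyperedges):
        Z[list(members), e] = 1   # mark every vertex that belongs to hyperedge e
    return Z

# Hypothetical hypergraph: 5 vertices, 3 hyperedges of sizes 2, 3 and 2.
hyperedges = [{0, 1}, {1, 2, 3}, {3, 4}]
Z = incidence_matrix(5, hyperedges)
print(Z)
```

Unlike an adjacency matrix, a column of `Z` can contain more than two ones, so a single hyperedge can connect any number of vertices.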

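* The normalized hypergraph Laplacian matrix, where $W$ is the diagonal matrix of hyperedge weights and $D_v$, $D_e$ are the diagonal matrices of vertex degrees and hyperedge degrees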

$$L = I - D_v^{-1/2}ZWD_e^{-1}Z^TD_v^{-1/2}$$

* The degree of a vertex
$$d(v) = \sum_{e\in \mathcal{E}}w(e)z(v, e)$$

* The degree of a hyperedge
$$\delta(e) = \sum_{v\in \mathcal{V}}z(v, e)$$
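Putting the degree definitions and the Laplacian formula together, a minimal NumPy sketch could look like this. The small incidence matrix, the unit hyperedge weights, and the function name `hypergraph_laplacian` are assumptions for illustration; the sketch also assumes every vertex lies in at least one hyperedge and no hyperedge is empty, so both degree matrices are invertible.

```python
import numpy as np

def hypergraph_laplacian(Z, w):
    """Return L = I - Dv^{-1/2} Z W De^{-1} Z^T Dv^{-1/2} for incidence Z and weights w."""
    Z = np.asarray(Z, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = Z @ w                      # d(v) = sum_e w(e) z(v, e)
    d_e = Z.sum(axis=0)              # delta(e) = sum_v z(v, e)
    Dv_inv_sqrt = np.diag(d_v ** -0.5)
    theta = Dv_inv_sqrt @ Z @ np.diag(w / d_e) @ Z.T @ Dv_inv_sqrt  # W De^{-1} is diagonal
    return np.eye(Z.shape[0]) - theta

# Hypothetical hypergraph: 5 vertices (rows), 3 hyperedges (columns).
Z = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])
w = np.ones(3)                       # unit hyperedge weights (assumed)

L_hyper = hypergraph_laplacian(Z, w)
print(np.round(L_hyper, 3))
```

Two quick sanity checks on the result: `L_hyper` is symmetric, and $D_v^{1/2}\mathbf{1}$ lies in its null space, mirroring the behaviour of the normalized Laplacian of an ordinary graph.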