>>> Work in Progress

### Overview
- Node level prediction
  - Example - 
- Link level prediction
  - Example - 
- Graph level prediction
  - Example - 

### Traditional ML Pipeline
- We design features for nodes/links/graphs as vectors $x \in \mathbb R^{D}$
- In the traditional ML pipeline:
  - we train an ML model (e.g. Random Forest, SVM, NN) on the hand-designed features
  - and then apply the model to make predictions
- In this lecture (using undirected graphs for simplicity), features are designed for:
  - Node level prediction
  - Link level prediction
  - Graph level prediction

### ML in graphs
- Goal: make predictions for a set of objects
- Design choices:
  - Features: $d$-dimensional vectors
  - Objects: nodes, edges, graphs
  - Objective function: 
> Given: $G = (V, E)$  
> Learn: $f: V \rightarrow \mathbb R$  
- How do we learn the function $f$?

### Node-level tasks
- Node classification
  - Identify missing node colors, given other node colors
- Features that characterize the structure and position of a node
  - Node degree
  - Node centrality
  - Clustering coefficient
  - Graphlets

#### Node Degree
- Degree $k_{\nu}$ of node $\nu$
  - number of edges (neighboring nodes) the node has
  - treats all neighboring nodes equally
  - does not capture how important the neighbors are

<img src="./images/02_nodeDegree.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
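A minimal sketch of computing node degrees, assuming the undirected graph is given as a list of `(u, v)` edge pairs (the input format and the function name `node_degrees` are illustrative assumptions):

```python
from collections import Counter

def node_degrees(edges):
    """Degree k_v of every node in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        # each undirected edge adds one neighbor to both endpoints
        deg[u] += 1
        deg[v] += 1
    return deg

# star graph: node 0 is connected to nodes 1, 2 and 3
degrees = node_degrees([(0, 1), (0, 2), (0, 3)])
print(degrees[0], degrees[1])
```

Every neighbor contributes exactly 1 to the count, which is why degree cannot tell an important neighbor from an unimportant one.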

#### Node centrality
- captures the node importance in a graph

##### Eigenvector centrality
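- a node is important if it is surrounded by important neighboring nodes
- its centrality is the (scaled) sum of the centralities of its neighbors
- this is a recursive problem
> $c_{\nu} = \frac{1}{\lambda}\sum\limits_{u \in N(\nu)}c_{u}$  
> $\Rightarrow \lambda c = A c$  
  - where $A$ is the adjacency matrix ($A_{u\nu} = 1$ if $u \in N(\nu)$)
  - $c$ is the centrality vector
  - $\lambda$ is a positive normalization constant
- hence the centrality vector $c$ is an eigenvector of $A$ with eigenvalue $\lambda$
- for a connected graph, the largest eigenvalue $\lambda_{max}$ is always positive and unique (Perron-Frobenius theorem)
- the leading eigenvector $c_{max}$ (belonging to $\lambda_{max}$) is used as the centrality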
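A minimal sketch of eigenvector centrality for an undirected graph. It assumes a dense symmetric adjacency matrix and uses NumPy's symmetric eigensolver `np.linalg.eigh`; the function name is hypothetical, and power iteration would be the usual alternative for large graphs.

```python
import numpy as np

def eigenvector_centrality(A):
    """Return (lambda_max, leading eigenvector) of a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(A)
    lam_max = eigvals[-1]
    c = eigvecs[:, -1]
    # eigenvectors are only defined up to sign; Perron-Frobenius lets us pick the non-negative one
    if c.sum() < 0:
        c = -c
    return lam_max, c

# star graph: node 0 is the hub
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
lam, c = eigenvector_centrality(A)
print(lam, c)
```

The hub gets the highest score because all of its neighbors point back at it, and `A @ c` equals `lam * c` up to floating-point error.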

##### Betweenness centrality
- a node is important if it lies on many shortest paths between other nodes

<img src="./images/02_nodeCentralityBetweenness.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
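Written out, betweenness centrality is $c_{\nu} = \sum_{s \neq \nu \neq t} \frac{\#(\text{shortest paths between } s \text{ and } t \text{ that contain } \nu)}{\#(\text{shortest paths between } s \text{ and } t)}$. The sketch below computes it with Brandes' algorithm. This is a standard technique swapped in here, not something the notes prescribe. It assumes an unweighted, undirected graph stored as an adjacency dict and sums over unordered pairs $\{s, t\}$. The function name is hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Betweenness of every node in an unweighted, undirected graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors on them
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate pair dependencies in order of decreasing distance from s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair {s, t} was counted once from s and once from t
    return {v: b / 2 for v, b in bc.items()}

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness_centrality(star))
```

In the star, the hub lies on the only shortest path of each of the 3 leaf pairs, while the leaves lie on none.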

##### Closeness centrality
- a node is important if it has small shortest path lengths to all other nodes

<img src="./images/02_nodeCentralityCloseness.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
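One common form of closeness centrality is $c_{\nu} = \frac{1}{\sum_{u \neq \nu} \text{shortest path length between } u \text{ and } \nu}$. A minimal sketch follows. It uses breadth-first search on an unweighted graph stored as an adjacency dict, and it assumes the graph is connected so every node is reachable. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    # assumes a connected graph; dist[v] = 0 for the node itself does not change the sum
    return {v: 1 / sum(bfs_distances(adj, v).values()) for v in adj}

path = {0: [1], 1: [0, 2], 2: [1]}   # path 0 - 1 - 2
print(closeness_centrality(path))
```

The middle node of the path is 1 hop from both ends, so it gets the highest score.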


#### Clustering coefficient
- measures how connected a node's neighboring nodes are to each other
- computed by counting the triangles the node is part of: $e_{\nu} = \frac{\#(\text{edges among neighboring nodes})}{\binom{k_{\nu}}{2}} \in [0, 1]$
- in the middle example below, node $\nu$ has 4 neighbors, so there are $\binom{4}{2} = 6$ possible edges (node triplets) among them
  - 3 of these edges exist, i.e. $\nu$ forms 3 triangles, so the coefficient is $3/6 = 0.5$

<img src="./images/02_nodeCentralityClusteringCoeff.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
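A minimal sketch of $e_{\nu}$ that checks every pair of neighbors for an edge. It assumes an adjacency dict of sets. Returning 0 for nodes with fewer than 2 neighbors is an assumed convention, since the formula is undefined there. The example reproduces the $3/6 = 0.5$ case described above, on a hypothetical graph.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined for k < 2, reported as 0
    # each connected neighbor pair closes one triangle through v
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

# node 0 has 4 neighbors; 3 of the 6 neighbor pairs (1-2, 2-3, 3-4) are linked
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {0, 3}}
print(clustering_coefficient(adj, 0))
```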


#### Graphlets
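- this is an extension of the clustering coefficient
- instead of counting triangles, it counts pre-specified subgraphs called graphlets
- the Graphlet Degree Vector (GDV) of a node counts the graphlets the node touches at each position, describing the topology of the node's neighborhood
- comparing the GDVs of two nodes gives a more detailed measure of local topological similarity than node degrees or clustering coefficients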
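A minimal sketch of a GDV restricted to graphlets on 2 and 3 nodes. For node $\nu$ it counts how often $\nu$ appears as (a) an endpoint of an edge, (b) the end of an induced 2-edge path, (c) the middle of an induced 2-edge path, and (d) a corner of a triangle. This orbit ordering and the function name are chosen for illustration. Real GDVs usually cover graphlets up to 5 nodes, with many more positions.

```python
from itertools import combinations

def graphlet_degree_vector(adj, v):
    """[edge, path end, path middle, triangle] counts for node v (graphlets of 2-3 nodes)."""
    nbrs = adj[v]
    k = len(nbrs)
    # (d) triangles: neighbor pairs that are themselves connected
    tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # (c) v in the middle of an induced path a - v - b (a and b not connected)
    mid = k * (k - 1) // 2 - tri
    # (b) v at the end of an induced path v - u - w (w not adjacent to v)
    end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    # (a) the 2-node graphlet (a single edge): v touches one per neighbor
    return [k, end, mid, tri]

# triangle 0-1-2 with an extra pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(graphlet_degree_vector(adj, 2), graphlet_degree_vector(adj, 3))
```

Nodes 2 and 3 in this example could not be told apart by the triangle count of node 3 alone, but their GDVs differ in every position.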





#### Summary - Node level feature
- Importance based features
  - Node degree 
    - count neighboring nodes
  - Node centrality
    - models the importance of neighboring nodes; depends on the choice of centrality measure (eigenvector, betweenness, closeness)
  - Example
    - predict celebrity users in social network
- Structure based features
  - capture topological properties of local neighborhood around node
  - Node degree
  - clustering coeff
  - Graphlet degree vector (GDV)
  - Example
    - predict protein functionality in protein-protein interaction networks
- node features help distinguish nodes by their structure
- but they do not necessarily allow distinguishing arbitrary node labelings

### Link prediction task and features
- predict new links based on existing links
- at test time, node pairs (with no existing link) are ranked
  - and the top K node pairs are predicted as new links
- key: design features for a pair of nodes
- links over time
  - given the graph edges at time $t_0$, output a ranked list of links (not present at $t_0$) that are predicted to appear at time $t_1$
  - Methodology - Proximity
    - for each node pair, compute a score, e.g. the number of common neighbors
    - sort pairs by decreasing score
    - predict the top n pairs as new links
    - check which of these links actually appear at time $t_1$
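The proximity methodology above can be sketched using common neighbors as the score. The sketch assumes an adjacency dict of sets and breaks ties by node pair to keep the ranking deterministic. `predict_links` is a hypothetical name.

```python
from itertools import combinations

def predict_links(adj, n):
    """Rank unlinked node pairs by number of common neighbors and return the top n."""
    scores = []
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already linked, nothing to predict
        scores.append(((x, y), len(adj[x] & adj[y])))
    # sort by decreasing score; ties broken by the node pair for determinism
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:n]

adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
print(predict_links(adj, 2))
```

The predicted pairs would then be compared against the links that actually appear at time $t_1$.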

#### Link level features
- Distance based feature
- Local neighborhood overlap
- Global neighborhood overlap

##### Distance based feature
- shortest path distance between two nodes
- does not capture degree of neighborhood overlap  
<img src="./images/02_linkPredictionShortestPath.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
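A minimal sketch of the distance-based link feature. It runs breadth-first search from one endpoint on an unweighted graph stored as an adjacency dict. Returning `None` for unreachable pairs is an assumed convention, and the function name is hypothetical.

```python
from collections import deque

def shortest_path_distance(adj, u, v):
    """Number of hops on a shortest path from u to v, or None if v is unreachable."""
    if u == v:
        return 0
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]  # BFS reaches v first along a shortest path
                queue.append(y)
    return None

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}  # path 0-1-2-3 plus isolated node 4
print(shortest_path_distance(adj, 0, 3), shortest_path_distance(adj, 0, 4))
```

Two node pairs can have the same distance while sharing very different numbers of neighbors, which is the limitation the overlap features address.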

##### Local Neighborhood overlap

<img src="./images/02_linkPredictionLocalNe.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
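Common local neighborhood overlap measures for nodes $v_1, v_2$ are:
- common neighbors: $|N(v_1) \cap N(v_2)|$
- Jaccard coefficient: $\frac{|N(v_1) \cap N(v_2)|}{|N(v_1) \cup N(v_2)|}$
- Adamic-Adar index: $\sum_{u \in N(v_1) \cap N(v_2)} \frac{1}{\log(k_u)}$

All three are 0 when the two nodes share no neighbor. A minimal sketch, assuming an adjacency dict of sets; the function names are hypothetical.

```python
import math

def common_neighbors(adj, a, b):
    return len(adj[a] & adj[b])

def jaccard(adj, a, b):
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0

def adamic_adar(adj, a, b):
    # a neighbor shared by two distinct nodes has degree >= 2, so log(k_u) > 0
    return sum(1 / math.log(len(adj[u])) for u in adj[a] & adj[b])

adj = {0: {2, 3}, 1: {2, 3, 4}, 2: {0, 1, 5}, 3: {0, 1}, 4: {1}, 5: {2}}
print(common_neighbors(adj, 0, 1), jaccard(adj, 0, 1), adamic_adar(adj, 0, 1))
```

Adamic-Adar down-weights shared neighbors with high degree: here the shared neighbor 2 (degree 3) counts less than the shared neighbor 3 (degree 2).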

##### Global Neighborhood overlap
- Katz index: counts the number of paths of all lengths between a given pair of nodes
- to count paths between 2 nodes
  - use powers of the adjacency matrix
  - $(A^{l})_{uv}$ is the number of paths (strictly, walks) of length $l$ between nodes $u$ and $v$

<img src="./images/02_linkPredictionAdjacencyMatrix1.png" width=400 height=400>  

----
<img src="./images/02_linkPredictionAdjacencyMatrix2.png" width=400 height=400>  

----
<img src="./images/02_linkPredictionAdjacencyMatrix3.png" width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
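The Katz index between $v_1$ and $v_2$ is $S_{v_1 v_2} = \sum_{l=1}^{\infty} \beta^{l} (A^{l})_{v_1 v_2}$, with a discount factor $0 < \beta < 1$ for longer paths. When $\beta < 1/\lambda_{max}$ the geometric series converges to the closed form $S = (I - \beta A)^{-1} - I$. A minimal NumPy sketch of that closed form for an undirected graph; the function name is hypothetical.

```python
import numpy as np

def katz_index(A, beta):
    """Katz scores for all node pairs via S = (I - beta*A)^-1 - I."""
    A = np.asarray(A, dtype=float)
    # spectral radius of a symmetric matrix = largest absolute eigenvalue
    lam_max = np.max(np.abs(np.linalg.eigvalsh(A)))
    if beta * lam_max >= 1:
        raise ValueError("beta must be < 1 / lambda_max for the series to converge")
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path 0 - 1 - 2
# (A^2)[0, 2] counts the length-2 paths between nodes 0 and 2
print(np.linalg.matrix_power(A, 2)[0, 2])
S = katz_index(A, beta=0.1)
print(S)
```

The closed form avoids summing powers explicitly. The result matches the truncated series $\sum_{l=1}^{L} \beta^{l} A^{l}$ for large $L$.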

### Graph-Level features and Graph Kernels
- How to design graph level features
  - these characterize the structure of an entire graph
  - Approach
    - Kernel methods: measure similarity between data points
    - Graph kernels: measure similarity between graphs

#### Kernel method
- widely used in traditional ML for graph level prediction
- design kernels instead of feature vectors
- key idea
  - kernel $K(G, G') \in \mathbb R$ measures the similarity between graphs $G$ and $G'$
  - kernel matrix $K = (K(G, G'))_{G, G'}$
    - must be positive semidefinite
    - i.e. it has non-negative eigenvalues
  - there exists a feature representation $\phi(\cdot)$ such that $K(G, G') = \phi(G)^{T}\phi(G')$
  - once the kernel is defined, existing kernel methods such as kernel SVM can be used to make predictions
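A minimal sketch of why $K(G, G') = \phi(G)^{T}\phi(G')$ yields a positive semidefinite kernel matrix. Stacking the feature vectors as rows of $\Phi$ gives $K = \Phi\Phi^{T}$, whose eigenvalues are all non-negative. The feature vectors below are hypothetical. A matrix like this is what a kernel SVM consumes, for example scikit-learn's `SVC(kernel="precomputed")`.

```python
import numpy as np

def kernel_matrix(features):
    """Gram matrix K[i, j] = phi(G_i)^T phi(G_j); each row of `features` is phi(G_i)."""
    phi = np.asarray(features, dtype=float)
    return phi @ phi.T

# hypothetical feature vectors phi(G) for three graphs
K = kernel_matrix([[1, 0, 3, 0], [0, 2, 2, 0], [0, 0, 0, 1]])
# K is symmetric, so eigvalsh applies; a Gram matrix has no negative eigenvalues
print(np.linalg.eigvalsh(K))
```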


#### Graph kernels
- Graphlet Kernel 
  - represented as Bag-of-graphlets
  - computationally expensive
- WL Kernel
  - color refinement (iterative hashing of neighborhood colors)
  - represented as Bag-of-colors
  - computationally efficient
  - closely related to GNN

##### Graphlet Kernel
- Goal: design a graph feature vector $\phi(G)$
- Key idea: Bag-of-Words (BoW) for a graph
  - BoW uses word counts as features for documents, ignoring ordering
  - naive extension to graphs: regard nodes as words
  - then the following two different graphs have the same feature vector  

<img src="./images/02_graphFeatureBoW.png" width=100 height=100>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$

- What if a Bag-of-node-degrees is used instead? Graphs with the same number of nodes but different degree distributions then get different feature vectors  

<img src="./images/02_graphFeatureBoNode.png" width=200 height=200>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
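A minimal sketch of the Bag-of-node-degrees feature, computed as a degree histogram. It assumes an adjacency-dict input, and the function name is hypothetical. A 4-node path and a 4-node star look identical to a bag of nodes, but their degree histograms differ.

```python
from collections import Counter

def bag_of_node_degrees(adj):
    """Histogram {degree: number of nodes with that degree}."""
    return Counter(len(nbrs) for nbrs in adj.values())

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path on 4 nodes
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # star on 4 nodes
# a bag of nodes (every node is the same "word") only sees 4 nodes in each graph
print(len(path) == len(star))
print(bag_of_node_degrees(path), bag_of_node_degrees(star))
```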

##### Graphlet features
- count the number of different graphlets in a graph
- difference from node-level graphlet features
  - the list of graphlets of size $k$ is defined over the whole graph
  - at graph level
    - graphlets do not need to be connected
    - isolated nodes are allowed
    > $G_{k} = (g_{1}, g_{2}, ..., g_{n_{k}})$
  - the graphlet count vector is defined as
    > $(f_{G})_{i} = \#(g_{i} \subseteq G)$ 
    > - for $i = 1, 2, ..., n_{k}$
    > - $f_{G} \in \mathbb R^{n_{k}}$
    
<img src="./images/02_graphFeatureGraphlet.png" width=200 height=200>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$  
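A minimal sketch of $f_G$ for $k = 3$. There are 4 graphlets on 3 nodes (no edge, one edge, a 2-edge path, a triangle), and they are distinguished by their edge count alone. The sketch therefore enumerates every 3-node subset and buckets it by the number of edges in its induced subgraph. It assumes an adjacency dict of sets, and the function name is hypothetical. Enumerating all subsets takes $O(n^{3})$ time, matching the $n^{k}$ cost of counting by enumeration.

```python
from itertools import combinations

def graphlet_count_vector_k3(adj):
    """f_G for k = 3: [#empty, #single-edge, #path, #triangle] over all 3-node subsets."""
    f = [0, 0, 0, 0]
    for a, b, c in combinations(adj, 3):
        # number of edges in the subgraph induced by {a, b, c} (True counts as 1)
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        f[edges] += 1
    return f

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(graphlet_count_vector_k3(star), graphlet_count_vector_k3(path))
```

The disconnected graphlets (empty, single edge) are counted too, since graph-level graphlets need not be connected.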

##### Graphlet kernel
> $K(G, G') = f_{G}^{T}f_{G'}$
- but this results in skewed values when $G$ and $G'$ have different sizes
- so normalize each feature vector
> $h_{G} = \frac{f_{G}}{sum(f_{G})}$  
> $K(G, G') = h_{G}^{T}h_{G'}$  
- counting graphlets is expensive
- counting size-$k$ graphlets in a graph of size $n$ by enumeration takes $O(n^{k})$ time
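A minimal sketch of the normalized graphlet kernel, taking precomputed count vectors $f_G$ as input. The count vectors below are hypothetical examples, and the function names are illustrative.

```python
def normalize(f):
    """h_G = f_G / sum(f_G)."""
    total = sum(f)
    return [x / total for x in f]

def graphlet_kernel(f_g, f_h):
    """K(G, G') = h_G^T h_G' on normalized graphlet count vectors."""
    return sum(a * b for a, b in zip(normalize(f_g), normalize(f_h)))

f_small = [1, 0, 3, 0]
f_big = [2, 0, 6, 0]     # same graphlet proportions, twice the counts
f_other = [0, 2, 2, 0]
print(graphlet_kernel(f_small, f_other), graphlet_kernel(f_big, f_other))
```

Without normalization, the raw inner products would be 6 and 12, so the larger graph would look twice as similar purely because of its size. After normalization both give the same value.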

##### Weisfeiler-Lehman(WL) Kernel
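- generalizes Bag-of-node-degrees: node degree captures one-hop neighborhood information, while WL iteratively uses larger neighborhoods
- color refinement
  - assign an initial color $c^{(0)}(\nu)$ to each node $\nu$
  - iteratively refine node colors using a hash that maps different inputs to different colors
  > $c^{(k+1)}(\nu) = \text{hash}(\{c^{(k)}(\nu),\{c^{(k)}(u)\}_{u \in N(\nu)}\})$
  - after $K$ steps of color refinement, $c^{(K)}(\nu)$ summarizes the structure of the $K$-hop neighborhood
- WL kernel value is the inner product of the color count vectors (colors from all refinement steps are counted)
  - computationally efficient
  - time complexity is linear in the number of edges per refinement step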
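A minimal sketch of color refinement and the WL kernel.

- Hash function: replaced by a lookup table that assigns a fresh integer color to each new input, an input being a node's color plus the sorted multiset of its neighbors' colors.
- Shared table: the table is shared by both graphs, so identical inputs get identical colors in each.
- Iteration index: part of every key, so colors from different steps never collide.

All nodes start with color 0. The function names and adjacency-dict input are assumptions for illustration. Sorting the neighbor colors adds a log factor, so this sketch is not strictly linear time.

```python
from collections import Counter

def wl_color_counts(graphs, num_iters):
    """One Counter of colors per graph, accumulated over all refinement steps."""
    table = {}  # (iteration, signature) -> new integer color, shared by all graphs
    colors = [{v: 0 for v in adj} for adj in graphs]  # c^(0): every node gets color 0
    counts = [Counter(col.values()) for col in colors]
    for it in range(1, num_iters + 1):
        new_colors = []
        for adj, col in zip(graphs, colors):
            new = {}
            for v in adj:
                # signature = own color + multiset of neighbor colors (as a sorted tuple)
                key = (it, col[v], tuple(sorted(col[u] for u in adj[v])))
                if key not in table:
                    table[key] = len(table) + 1  # ids start at 1, never reusing color 0
                new[v] = table[key]
            new_colors.append(new)
        colors = new_colors
        for cnt, col in zip(counts, colors):
            cnt.update(col.values())
    return counts

def wl_kernel(adj_a, adj_b, num_iters=3):
    """Inner product of the two graphs' color count vectors."""
    ca, cb = wl_color_counts([adj_a, adj_b], num_iters)
    return sum(ca[c] * cb[c] for c in ca.keys() & cb.keys())

path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(wl_kernel(path, triangle, num_iters=1), wl_kernel(path, path, num_iters=1))
```

After one step the path's middle node and every triangle node share the color for "own color 0, two neighbors of color 0". The path's end nodes get a color the triangle never produces, which lowers the cross-graph kernel value.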