# Graph Attention Networks (GAT) as an extension of Graph Convolutional Networks (GCN)
## Intro

General background on graphs & ML is assumed:
1. What are graphs, and what kind of data can be represented by graphs?
    * A graph captures interactions between many objects (nodes/vertices). The number of objects is arbitrary;
    * Relations between nodes are described by edges. The number of edges is arbitrary;
    * There is no unique graph representation for a given dataset;
    * A graph analysis framework is well developed. Notes: [graph_stuff.ipynb](data_processing/graphs/graph_stuff.ipynb)
    
1. Why use GNNs (vs. traditional NNs)?
    * They can process graph data, which is unstructured and unordered;
    * Connectivity is complex;
    * They leverage node connectivity: each node distills information from its neighborhood;
1. What should a trained model be capable of?
    * Processing data independently of the number and ordering of nodes and edges.<br>
    Order invariance = permutation invariance.


## Convolution $\leftrightarrow$ connectivity
Notes: [GNN_Convolution__notes.ipynb](data_processing/neural_networks/Graph_neural_networks/GNN_Convolution__notes.ipynb)

Local connectivity allows information to be passed between nodes.


* The convolution/cross-correlation operation is the closest analogue:
    * Works on structured data;
    * Number of neighbors is limited (3 to 8 in the case of a 2D image);
    * An image can be padded so that each pixel has 8 neighbors;
    * Each pixel can gather info from its neighborhood based on:
        * kernel size
        * number of operation iterations

* Graphs:
    * Node neighbor information cannot be collected into one fixed-size matrix.<br>
    Some nodes might have 0 neighbors, others 1000;
    * A kernel is not easily defined, especially for a kernel radius > 1 neighbor. Notes:  [graph_random_walk.ipynb](data_processing/graphs/graph_random_walk.ipynb)
    * The adjacency matrix $A_{i,j}$ ($N \times N$) encodes connectivity between each node pair $(n_i,n_j)$, $i,j \in [1,\dots,N]$.
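
To make the adjacency matrix concrete, here is a minimal NumPy sketch. The edge list and node count are made up for illustration, the graph is assumed to be undirected, and the code uses 0-based indices where the notes use $1,\dots,N$.

```python
import numpy as np

# Hypothetical undirected graph: 5 nodes, node 4 has no neighbors at all.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
N = 5

# Build the N x N adjacency matrix: A[i, j] = 1 if nodes i and j are connected.
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so A is symmetric

# Neighbor counts differ per node, which is why neighborhoods
# do not fit into one fixed-size "patch" the way image pixels do.
degrees = A.sum(axis=1)
neighbors = [np.flatnonzero(A[i]).tolist() for i in range(N)]

print(degrees)
print(neighbors)
```

The ragged `neighbors` list is the graph counterpart of an image patch; its rows have different lengths, while `A` always stays $N \times N$.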

## Useful linear algebra tricks/interpretations
Notes: [linear_algebra_for_GCN_GAT](educational/linear_algebra/linear_algebra_for_GCN_GAT.ipynb)

* Matrix as a container of vectors;
* 'Broadcasting' one transformation over multiple vectors;
* 'Broadcasting' multiple transformations over a single vector.
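
The three interpretations above can be sketched in NumPy. All shapes and names (`H`, `W`, `Ws`, `K`) are illustrative assumptions, not taken from the linked notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, F_out = 4, 3, 2

# Matrix as a container of vectors: row i of H is the feature vector of node i.
H = rng.normal(size=(N, F))

# One transformation applied to many vectors: a single matrix product
# transforms every row of H at once.
W = rng.normal(size=(F, F_out))
HW = H @ W
row_by_row = np.stack([H[i] @ W for i in range(N)])

# Many transformations applied to a single vector: stack K matrices
# and contract each of them with the same vector h.
K = 5
Ws = rng.normal(size=(K, F, F_out))
h = H[0]
many = np.einsum("f,kfo->ko", h, Ws)  # shape (K, F_out)

print(np.allclose(HW, row_by_row))
print(many.shape)
```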

## Graph Convolutional Network as base for GAT
Notes: [GNN_Convolution__notes.ipynb](data_processing/neural_networks/Graph_neural_networks/GNN_Convolution__notes.ipynb)

* Aggregation / Message Passing = gather information from a node's neighbors into the node itself.
    * Permutation-invariant functions:
        * summation, mean, max, ...
    * Count-invariant functions:
        * mean, max, ...
    * The adjacency matrix is a perfect candidate for performing the aggregation.
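
A small sketch of these properties, assuming "count invariance" means that the result does not change when the same neighbors appear more times. The feature values and the 4-node graph are invented for illustration.

```python
import numpy as np

# Feature vectors of three hypothetical neighbors of one node.
neigh = np.array([[1.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
shuffled = neigh[[2, 0, 1]]          # same neighbors, different order
doubled = np.vstack([neigh, neigh])  # same neighbors, each counted twice

# Permutation invariance: sum, mean and max ignore neighbor order.
order_invariant = all(
    np.allclose(agg(neigh, axis=0), agg(shuffled, axis=0))
    for agg in (np.sum, np.mean, np.max)
)

# Count invariance (as assumed above): mean and max survive duplication, sum does not.
sum_same = np.allclose(neigh.sum(axis=0), doubled.sum(axis=0))
mean_same = np.allclose(neigh.mean(axis=0), doubled.mean(axis=0))
max_same = np.allclose(neigh.max(axis=0), doubled.max(axis=0))

# The adjacency matrix does the aggregation for all nodes at once:
# row i of A @ H is the sum of the feature vectors of node i's neighbors.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]])
H_sum = A @ H
H_mean = H_sum / A.sum(axis=1, keepdims=True)  # every node here has >= 1 neighbor

print(order_invariant, sum_same, mean_same, max_same)
```

Dividing by the row sums only works when no node is isolated; adding self-loops is one common way around that.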

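How is this approach permutation invariant if it depends on the adjacency matrix?

It is, as long as we work with input feature vectors of length $F$:
* Aggregation keeps the feature dimension $F$;
* A linear layer transforms each aggregated feature vector into a vector of length $F^\prime$.

The matrix of trainable parameters has shape $F\times F^\prime$ and is independent of the number, or order, of nodes. Reordering the nodes only reorders the rows of the adjacency and feature matrices, so the output rows are reordered in the same way.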

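$$
\vec{h}_{new}^T = \sigma( \vec{h}_{aggregated}^T W_h + \vec{b}_h^T)
$$

$$
H_{new} = \sigma( H_{aggregated} W_h + \vec{b}_h^T) = \sigma( \tilde A H W_h + \vec{b}_h^T)
$$

Here $\tilde A$ denotes the (normalized) adjacency matrix used for aggregation, so that $H_{aggregated} = \tilde A H$, and the bias $\vec{b}_h^T$ is added to every row.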
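
A minimal NumPy sketch of such a layer follows. It rests on several assumptions that the notes do not fix: $\tilde A$ is taken to be the row-normalized adjacency matrix with self-loops (mean aggregation, one common choice), $\sigma$ is ReLU, and the helper names `normalize_adjacency`, `gcn_layer` and `random_graph` are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Assumed A-tilde: add self-loops, then divide each row by its sum."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def gcn_layer(A_tilde, H, W, b):
    """H_new = sigma(A_tilde @ H @ W + b), with sigma assumed to be ReLU."""
    return np.maximum(0.0, A_tilde @ H @ W + b)

rng = np.random.default_rng(0)
F, F_out = 3, 2
W = rng.normal(size=(F, F_out))  # trainable weights: shape F x F', no dependence on N
b = rng.normal(size=F_out)       # bias, broadcast over all nodes

def random_graph(n):
    """Random symmetric 0/1 adjacency matrix without self-loops."""
    upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    return (upper + upper.T).astype(float)

# The same W and b work for graphs with different numbers of nodes.
A4, H4 = random_graph(4), rng.normal(size=(4, F))
A9, H9 = random_graph(9), rng.normal(size=(9, F))
out_small = gcn_layer(normalize_adjacency(A4), H4, W, b)
out_large = gcn_layer(normalize_adjacency(A9), H9, W, b)

# Reordering the nodes reorders the output rows in exactly the same way.
P = np.eye(4)[rng.permutation(4)]
out_perm = gcn_layer(normalize_adjacency(P @ A4 @ P.T), P @ H4, W, b)
rows_follow_permutation = np.allclose(out_perm, P @ out_small)

print(out_small.shape, out_large.shape, rows_follow_permutation)
```

Strictly speaking, the last check shows that node-level outputs are permutation equivariant; applying a permutation-invariant readout (sum, mean, max) over the rows would then give a graph-level result that does not depend on node order.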

    

## Graph Attention Network (GAT)
Notes: [GNN_Attention_notes.ipynb](data_processing/neural_networks/Graph_neural_networks/GNN_Attention_notes.ipynb)