# ML Graphs Training

**Graphs** (or networks) are composed of **nodes** (or vertices) connected by **edges** (or links).

A **subgraph** is a part of the original graph; a **community** is a subgraph whose nodes are more densely connected to each other than to the rest of the graph.

The relationship between two connected nodes can be:
* **directed** (e.g. A follows B does not imply that B follows A): phone calls, or following on Twitter.
* **undirected** (e.g. A is connected to B implies that B is connected to A): collaborations, or friendships on Facebook.
<br><br>
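As a minimal sketch of the difference, the same edge list can be stored as a directed or an undirected adjacency structure. The helper name `build_adjacency` and the sample edges are illustrative assumptions, not part of the original notes.

```python
from collections import defaultdict


def build_adjacency(edges, directed):
    """Return {node: set of neighbours} from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so isolated targets still appear
        if not directed:
            # An undirected edge is stored in both directions.
            adjacency[target].add(source)
    return dict(adjacency)


follows = [("A", "B"), ("B", "C")]  # hypothetical "follows" relations
directed_graph = build_adjacency(follows, directed=True)
undirected_graph = build_adjacency(follows, directed=False)

print(directed_graph["B"])    # B only points to C
print(undirected_graph["B"])  # B is linked to both A and C
```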

A node's **centrality** measures the importance of the node in the graph.

The node **degree** is the number of direct neighbours it has.

The **clustering coefficient** measures how connected a node's neighbours are to each other.
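Both quantities can be computed directly from an adjacency dictionary. The sketch below assumes an undirected graph stored as `{node: set of neighbours}`; the function names and the toy graph are illustrative.

```python
from itertools import combinations


def degree(adjacency, node):
    """Number of direct neighbours of `node`."""
    return len(adjacency[node])


def clustering_coefficient(adjacency, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours exists
    linked_pairs = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return linked_pairs / (k * (k - 1) / 2)


# Triangle A-B-C with an extra node D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(degree(graph, "A"))                  # 3
print(clustering_coefficient(graph, "A"))  # 1 linked pair out of 3 possible
print(clustering_coefficient(graph, "B"))  # 1.0
```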

**Graphlet degree vectors** count how many graphlets of each type are rooted at a given node.

**Edge-level** features complement the representation with more detailed information about how connected two nodes are (including their shortest-path distance and their Katz index).
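The Katz index of a node pair sums the number of walks of every length between the two nodes, discounted by a factor `beta` per step. The sketch below is a truncated version that stops at `max_length` (an assumption made to keep it simple; the full index sums over all lengths). The function names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def truncated_katz(adjacency_matrix, beta, max_length):
    """Sum of beta**l * (A**l) for l = 1..max_length."""
    n = len(adjacency_matrix)
    katz = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A**0
    for length in range(1, max_length + 1):
        # Entry (i, j) of A**length counts the walks of that length from i to j.
        power = matmul(power, adjacency_matrix)
        for i in range(n):
            for j in range(n):
                katz[i][j] += (beta ** length) * power[i][j]
    return katz


# Path graph 0 - 1 - 2
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = truncated_katz(path, beta=0.5, max_length=2)
print(scores[0][1])  # 0.5: one walk of length 1
print(scores[0][2])  # 0.25: one walk of length 2
```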

The **ego-network** of a node is the subgraph formed by the node, its direct (1-hop) neighbours, and the edges among them. Graphlets such as triangles can be counted within it.
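A minimal sketch of extracting an ego-network, assuming an undirected graph stored as `{node: set of neighbours}` (the function name and the toy graph are illustrative):

```python
def ego_network(adjacency, ego):
    """Subgraph induced by `ego` and its direct neighbours."""
    members = {ego} | adjacency[ego]
    # Keep only the edges whose two endpoints are both members.
    return {node: adjacency[node] & members for node in members}


# Triangle A-B-C with an extra node D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(ego_network(graph, "D"))  # only D and A
print(ego_network(graph, "B"))  # the triangle A-B-C
```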

**Graph level features** contain high-level information about graph similarity and specificities.

**Kernel methods** measure the similarity between graphs, for example through "bag of nodes" approaches (analogous to bag of words).

**Bipartite graphs** are graphs whose nodes can be divided into two disjoint sets U and V such that every edge connects a node in U to a node in V, with no edges inside U or inside V (e.g. authors-to-papers).
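One common way to check this property is to try to colour the graph with two colours using a breadth-first search. This is a sketch under the assumption of an undirected graph stored as `{node: set of neighbours}`; the names and sample data are hypothetical.

```python
from collections import deque


def two_colouring(adjacency):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    colour = {}
    for start in adjacency:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return None  # an edge inside one set: not bipartite
    return colour


authorship = {
    "alice": {"paper1", "paper2"},
    "bob": {"paper2"},
    "paper1": {"alice"},
    "paper2": {"alice", "bob"},
}
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

print(two_colouring(authorship))  # authors get one colour, papers the other
print(two_colouring(triangle))    # None
```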

Also see **self-edges (self-loop) graphs** and **multigraphs**.

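A **strongly connected directed graph** is one in which every node can be reached from every other node by following a directed path (not necessarily a single edge). From this derive the **strongly connected components** (SCCs), the maximal strongly connected subgraphs, which can be seen as communities.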
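SCCs can be found with Kosaraju's algorithm, one of several standard choices. The sketch below assumes a directed graph stored as `{node: set of successors}` in which every node appears as a key. The recursive first pass is only suitable for small graphs.

```python
def strongly_connected_components(adjacency):
    """Kosaraju's algorithm: returns a list of node sets."""
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        for successor in adjacency[node]:
            if successor not in visited:
                visit(successor)
        order.append(node)  # post-order: after all descendants

    for node in adjacency:
        if node not in visited:
            visit(node)

    # Build the reversed graph.
    reverse = {node: set() for node in adjacency}
    for node, successors in adjacency.items():
        for successor in successors:
            reverse[successor].add(node)

    # Explore the reversed graph in decreasing finishing order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        component, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            component.add(node)
            for predecessor in reverse[node]:
                if predecessor not in assigned:
                    assigned.add(predecessor)
                    stack.append(predecessor)
        components.append(component)
    return components


# A -> B -> C -> A is a cycle; D is only reachable from C.
graph = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": set()}
components = strongly_connected_components(graph)
print(components)
```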

Most real-world networks are very sparse, so their adjacency matrix contains mostly 0s.

This is why adjacency lists are used when networks are large and sparse.
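To make the storage difference concrete, the following sketch builds both representations for a small undirected path graph and counts what each one stores. The function names and the example graph are illustrative.

```python
def to_adjacency_matrix(num_nodes, edges):
    """Dense num_nodes x num_nodes matrix of 0/1 entries."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def to_adjacency_list(num_nodes, edges):
    """One list of neighbours per node."""
    neighbours = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return neighbours


num_nodes = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # a path of 6 nodes

matrix = to_adjacency_matrix(num_nodes, edges)
adjacency_list = to_adjacency_list(num_nodes, edges)

matrix_cells = num_nodes * num_nodes
zero_cells = sum(row.count(0) for row in matrix)
list_entries = sum(len(n) for n in adjacency_list)

print(matrix_cells, zero_cells)  # 36 cells, 26 of them zero
print(list_entries)              # 10 stored neighbours (2 per edge)
```

The matrix grows with the square of the number of nodes, while the list grows with the number of edges.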

## Graph Neural Networks

There are 3 types of tasks:
* Node-level prediction
* Link-level prediction
* Graph-level prediction

Typical neural networks are not directly compatible with graphs: nodes have no canonical order, and these networks are not permutation invariant, so relabelling the nodes changes their output.
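A toy illustration of the issue: a readout with one weight per input position (as in a dense layer over concatenated node features) changes when the nodes are reordered, while a sum over nodes does not. The functions and values below are hypothetical.

```python
def weighted_readout(features, weights):
    """Order-sensitive: each position has its own weight."""
    return sum(w * x for w, x in zip(weights, features))


def sum_readout(features):
    """Permutation invariant: the node order does not matter."""
    return sum(features)


features = [1.0, 2.0, 3.0]  # one scalar feature per node
permuted = [2.0, 1.0, 3.0]  # same nodes, first two relabelled
weights = [0.5, -1.0, 2.0]

print(weighted_readout(features, weights))  # 4.5
print(weighted_readout(permuted, weights))  # 6.0
print(sum_readout(features), sum_readout(permuted))  # 6.0 6.0
```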

A GNN layer represents a node by combining (**aggregation**) the previous-layer representations of its neighbours and of itself (**message passing**), usually followed by an activation function to add nonlinearity.
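The layer described above can be sketched in plain Python. This version assumes mean aggregation over the node and its neighbours, a single shared weight matrix, and a ReLU activation; real designs differ in each of these choices, and all names and values here are illustrative.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def linear(weight, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def gnn_layer(adjacency, features, weight):
    """One message-passing step: mean over self and neighbours, then linear + ReLU."""
    updated = {}
    for node, neighbours in adjacency.items():
        messages = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        mean = [sum(vec[d] for vec in messages) / len(messages) for d in range(dim)]
        updated[node] = relu(linear(weight, mean))
    return updated


adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
weight = [[1.0, 0.0], [0.0, -1.0]]  # hypothetical parameters; learned in practice

hidden = gnn_layer(adjacency, features, weight)
print(hidden["A"])  # [0.5, 0.0]
```

Stacking several such layers lets information travel one hop farther per layer.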

Several network designs exist:

* ***Graph Convolutional Networks*** average the normalised representations of a node's neighbours (one of the most widely used GNN designs).
* ***Graph Attention Networks*** learn to weigh the different neighbours based on their importance (like transformers).
* ***GraphSAGE*** samples neighbours at different hops before aggregating their information in several steps, for example with max pooling.
* ***Graph Isomorphism Networks*** aggregate representations by applying an MLP to the sum of the neighbours' node representations.

Mean, min and max pooling can fail to distinguish different neighbourhoods. For example, the multisets {1, 1, -1, -1} and {-1, 0, 1} both have mean 0, max 1 and min -1, so these aggregators cannot tell the two representations apart.
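The failure case can be checked directly. The second pair of multisets is an added illustration (not from the notes above) of a case where mean and max agree but a sum, as used by Graph Isomorphism Networks, does not.

```python
from statistics import mean

a = [1, 1, -1, -1]
b = [-1, 0, 1]

# All three poolings give identical results for two different multisets.
print(mean(a), mean(b))  # 0 0
print(max(a), max(b))    # 1 1
print(min(a), min(b))    # -1 -1

# Added example: one neighbour with value 1 versus two such neighbours.
c = [1, 1]
d = [1]
print(mean(c) == mean(d), max(c) == max(d))  # True True
print(sum(c), sum(d))                        # 2 1
```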

## GNN shape and the over-smoothing problem

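Each additional layer widens the neighbourhood a node's representation draws on: after k layers it includes information from nodes up to k hops away. With too many layers, the neighbourhoods of different nodes overlap so much that their representations become nearly indistinguishable; this is the over-smoothing problem.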
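A toy sketch of over-smoothing, assuming layers that only average a node's scalar value with its neighbours' values (no weights or activation, which is a simplification). The gap between the largest and smallest node value shrinks as layers are stacked.

```python
def mean_aggregate(adjacency, values):
    """Replace each node value by the mean over itself and its neighbours."""
    return {
        node: (values[node] + sum(values[n] for n in neighbours)) / (1 + len(neighbours))
        for node, neighbours in adjacency.items()
    }


def spread_after(adjacency, values, num_layers):
    """Difference between the largest and smallest value after num_layers steps."""
    for _ in range(num_layers):
        values = mean_aggregate(adjacency, values)
    return max(values.values()) - min(values.values())


# Path graph A - B - C with hypothetical initial features.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
values = {"A": 0.0, "B": 0.0, "C": 3.0}

for layers in (0, 1, 2, 10):
    print(layers, spread_after(adjacency, values, layers))
```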

Graph ML makes it possible to predict node labels and new links, and to generate graphs and subgraphs.

Because representations are learned, little or no manual feature engineering is required.

We map nodes to d-dimensional embedding vectors.
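In its simplest form, such a mapping is a lookup table from node to vector. The sketch below uses hypothetical hand-written values with d = 2 (in practice they are learned) and assumes a dot product as the similarity between two embeddings, which is a common but not the only choice.

```python
# Hypothetical 2-dimensional embeddings; real values would be learned.
embeddings = {
    "A": [0.9, 0.1],
    "B": [0.8, 0.2],
    "C": [-0.7, 0.6],
}


def dot_similarity(embeddings, u, v):
    """Dot product of the embedding vectors of nodes u and v."""
    return sum(x * y for x, y in zip(embeddings[u], embeddings[v]))


print(dot_similarity(embeddings, "A", "B"))  # close embeddings: high score
print(dot_similarity(embeddings, "A", "C"))  # distant embeddings: low score
```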

# Applications of Graph ML

Classic Tasks:

* Node classification: predict a property of a node (e.g. categorising online users or items in a recommender system)
* Link prediction: predict whether there are missing links between nodes
* Graph classification: Categorize different graphs (molecule property prediction)
* Clustering: Detect if nodes form a community
* Graph generation: Drug discovery
* Graph evolution: Physical simulation


Examples include the AlphaFold models, which made major progress on protein structure prediction, and the GNN that Google Maps uses to predict traffic.

# Traditional methods: Graphlets, Graph Kernels

# Methods for node embeddings: DeepWalk, Node2Vec

# Graph Neural Networks: GCN, GraphSAGE, GAT, GNN

# Deep generative models for graphs

# Applications