In [1]:
print('Day 1')

Day 1


Beginning with [this article](https://medium.com/octavian-ai/how-to-get-started-with-machine-learning-on-graphs-7f0795c83763) on how to get started with ML on graphs.

Some insights important for us:

> In this article I’m not going to cover “traditional” graph analysis — that’s the well known algorithmic techniques like PageRank, clique identification, shortest path, etc. These are very powerful and should be considered a first port-of-call due to their well understood nature and plentiful implementation in public libraries.

Our Dark Web data analysis problems are too broad to fit neatly into these classical formalisms.

> In mainstream areas of ML the community has discovered widely applicable techniques (e.g. transfer learning using ResNet for images or BERT for text) and made them accessible to developers (e.g. TensorFlow, PyTorch, FastAI). However, there are no equivalently easy, universal techniques, nor do any of the popular ML libraries have support for graph data.

Challenges faced while applying ML on graphs:

- the flexibility of graphs doesn't fit the fixed computation graph that deep learning libraries assume
- so we do not get fixed-size tensors, the paradigm that deep learning libraries and GPU manufacturers are built around
- hence, it is customary (as of Q1 2019) to build our own high-level systems

While sampling from a graph to feed to an ML model, two cases might arise:

- for small graphs, we can use the adjacency matrix (or a similar dense representation) to build tensors of minibatches
- for large graphs, we will need a method to sample subgraphs from the graph and train on them. An extreme case is to take individual node-edge pairs.
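
A minimal sketch of the two cases above, assuming NumPy is available. The toy edge list and the helper names `adjacency_matrix` and `sample_subgraph` are made up for illustration; the sampling rule (seed nodes plus a few random neighbours each) is just one simple option, not the method from the article.

```python
import numpy as np

def adjacency_matrix(num_nodes, edges):
    """Dense adjacency matrix of an undirected graph (fine for small graphs)."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0
    return adj

def sample_subgraph(adj, seed_nodes, num_neighbors, rng):
    """Keep the seed nodes plus up to `num_neighbors` random neighbours of each."""
    kept = set(int(s) for s in seed_nodes)
    for s in seed_nodes:
        neighbors = np.flatnonzero(adj[s])
        if len(neighbors) > 0:
            k = min(num_neighbors, len(neighbors))
            chosen = rng.choice(neighbors, size=k, replace=False)
            kept.update(int(n) for n in chosen)
    nodes = np.array(sorted(kept))
    # Induced subgraph: rows and columns restricted to the kept nodes.
    return nodes, adj[np.ix_(nodes, nodes)]

# Toy graph (hypothetical data): a 6-node path with one extra chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
adj = adjacency_matrix(6, edges)

rng = np.random.default_rng(0)
nodes, sub_adj = sample_subgraph(adj, seed_nodes=[1], num_neighbors=2, rng=rng)
print(nodes, sub_adj.shape)
```

For a small graph, `adj` itself can be fed to the model; for a large one, each call to `sample_subgraph` yields one small training example.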

> Note that some approaches tabularize the data before it reaches the machine learning library. Node2Vec is a good example of this, where random walks are used to transform each node into a vector. These vectors are then fed into the machine learning model as a list.

Common types of graph ML problems:

- scoring of nodes
- scoring of edges
- scoring of graphs
- link prediction

A very common way to vectorize graphs is through embeddings, for example ones learned from random walks. We should definitely try Node2Vec on our problem (a team member will be working on this).
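
To make the random-walk idea concrete, here is a rough sketch of the walk-generation step only. It uses uniform random walks (as in DeepWalk), not Node2Vec's biased second-order walks with its `p` and `q` parameters; the toy graph and function names are hypothetical.

```python
import random

def random_walk(neighbors, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        options = neighbors[walk[-1]]
        if not options:  # dead end: stop early
            break
        walk.append(rng.choice(options))
    return walk

def generate_walks(neighbors, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in neighbors:
            walks.append(random_walk(neighbors, node, length, rng))
    return walks

# Toy adjacency list (hypothetical data).
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = generate_walks(neighbors, walks_per_node=2, length=5)
print(walks[0])
```

The walks play the role of "sentences": feeding them to a skip-gram model (e.g. Word2Vec) would give one vector per node, which is the tabular form a standard ML model can consume.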

Graph Convolutional Networks, or simply Graph Networks, are a very broad area of research in graph ML, and I'll be focusing on them from the next episode.

Finally, recent approaches use techniques like attention and reinforcement learning to solve graph ML problems, and this might be the next thing to try after GCNs.

TBC with these:
- https://arxiv.org/abs/1806.01261
- https://tkipf.github.io/graph-convolutional-networks/
- https://arxiv.org/pdf/1812.08434v1.pdf
- https://nlp.stanford.edu/pubs/SocherChenManningNg_NIPS2013.pdf

And look at the latest work on GCNs.

Bye!

In [2]:
print('Day 2')

Day 2


Before diving into the papers, let's look at the current research landscape on graph learning and DL on graphs, as of July 2020.

Edit: Many of these results should be very insightful for us once we're clear on how vanilla GCNs work.

- Around 10% of all accepted papers at tier-1 conferences are on Graph Networks
- The product of the network's width w and depth d should be proportional to the size of the graph n, i.e. dw = O(n), if we want a GNN to be able to compute a solution to popular graph problems (like cycle detection, diameter estimation, vertex cover, etc.). This is currently not the case.
- This year there are applications in fixing bugs in JavaScript, game playing, answering IQ-like tests, optimization of TensorFlow computational graphs, molecule generation, and question generation in dialogue systems.
- There are quite a few papers this year on reasoning in knowledge graphs. Essentially, a knowledge graph is a structured way to represent facts, where nodes and edges carry meaningful information.

[This](https://arxiv.org/pdf/1912.12693.pdf) review paper is a very recent (Jun, 2020) study on the theory of various graph DL approaches.

[This](https://towardsdatascience.com/graph-deep-learning/home) is an ongoing blog series on various topics in graph NNs; it might be a useful read in the future.

From now on, I'll refer to all of this under the umbrella term "Geometric Deep Learning" (GDL).

> A major difference compared to classical deep neural networks dealing with grid-structured data is that on graphs such operations are permutation-invariant, i.e. independent of the order of neighbour nodes, as there is usually no canonical way of ordering them.
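
A tiny illustration of the permutation-invariance point in the quote above, with made-up neighbour features: a sum over neighbours does not care about their order, while concatenation does. The function names are hypothetical.

```python
def sum_aggregate(neighbor_features):
    """Element-wise sum of neighbour feature vectors (order-independent)."""
    return [sum(values) for values in zip(*neighbor_features)]

def concat_aggregate(neighbor_features):
    """Concatenation of neighbour feature vectors (order-dependent)."""
    return [x for features in neighbor_features for x in features]

# Hypothetical features of the three neighbours of some node.
neighbor_features = [[1, 0], [0, 2], [3, 3]]
reordered = neighbor_features[::-1]  # same neighbours, listed in reverse

print(sum_aggregate(neighbor_features) == sum_aggregate(reordered))        # True
print(concat_aggregate(neighbor_features) == concat_aggregate(reordered))  # False
```

This is why graph layers usually aggregate neighbours with operations such as sum, mean, or max.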

As of 2020, we have two professionally maintained libraries that can be used to build GDL models, namely [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/) and [DGL](https://www.dgl.ai/). 

[This](http://geometricdeeplearning.com/) website has a collection of tutorials, papers and code on GDL.

Some nice recent talks on GDL:
- [Microsoft](https://www.youtube.com/watch?v=zCEYiCxrL_0)
- [Prof. Xavier Bresson](https://www.youtube.com/watch?v=VXNjCAmb6Zw)
- [Institute for Advanced Study](https://www.youtube.com/watch?v=USfNJNePDKQ)
- [Stanford](https://www.youtube.com/watch?v=YrhBZUtgG4E)

So, the path ahead is clear: watch these lectures, read the basic papers, get the theory clear, look up code implementations, and finally switch to our dataset. A number of recent works on GDL involve other fields like NLP, CV, RL, Encoders, GANs etc., and we might need to study those later.

Starting with [this](https://www.youtube.com/watch?v=zCEYiCxrL_0) video lecture. Summary: 
- Distributed Embeddings
- Graph representation of the problem -> representation of each node using distributed embeddings
- Neural Message Passing
- Invariance through unification operation
- Two types - Gated GNNs and GCNs
- Applications - molecular chemistry, finding bugs in code
- Special cases - CNNs, Deep Sets
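
The neural message passing item from the summary above can be sketched roughly as follows, assuming NumPy. This is a simplified stand-in, not the lecture's exact formulation: Gated GNNs use a GRU-style update, whereas this sketch uses a plain `tanh` update, and the weights `w_msg` and `w_upd` are random placeholders rather than trained parameters.

```python
import numpy as np

def message_passing_step(adj, h, w_msg, w_upd):
    """One round: every node sums its neighbours' messages, then updates its state."""
    messages = h @ w_msg         # message each node sends, shape (n, d)
    aggregated = adj @ messages  # sum of messages arriving from neighbours
    return np.tanh(h @ w_upd + aggregated)

def run_rounds(adj, h, w_msg, w_upd, rounds):
    """Apply the same message-passing step `rounds` times."""
    for _ in range(rounds):
        h = message_passing_step(adj, h, w_msg, w_upd)
    return h

# Path graph 0 - 1 - 2 - 3 (hypothetical data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=np.float64)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))            # initial node embeddings
w_msg = 0.5 * rng.normal(size=(3, 3))  # placeholder message weights
w_upd = 0.5 * rng.normal(size=(3, 3))  # placeholder update weights

# Change only node 3's input and see when node 0 notices.
h_changed = h.copy()
h_changed[3] += 1.0
for rounds in (1, 2, 3):
    before = run_rounds(adj, h, w_msg, w_upd, rounds)[0]
    after = run_rounds(adj, h_changed, w_msg, w_upd, rounds)[0]
    print(rounds, np.abs(before - after).max())
```

Node 0 is three hops from node 3, so its state can only react to the change from the third round on: k rounds of message passing let each node see its k-hop neighbourhood.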

In [3]:
print('Day 3')

Day 3


[This](https://arxiv.org/abs/1609.02907) is the seminal paper on GCNs by Kipf and Welling (ICLR 2017).

`../papers/GCN-Kipf-2017.pdf` has reading highlights, notes, etc. After reading it, we can move on to look at implementations in PyTorch. Notes:
- Spectral graph convolutions, Chebyshev polynomials on Laplacians, etc.; most of the math isn't clear yet.
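
Even before the spectral derivation is clear, the layer the paper ends up with is short. Below is a NumPy sketch of a single GCN layer as I understand the paper's propagation rule, H' = ReLU(D̃^(-1/2) Ã D̃^(-1/2) H W) with Ã = A + I and D̃ the degree matrix of Ã. The toy graph and random weights are placeholders, and this is not the authors' reference implementation.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """Single GCN layer: symmetric-normalised neighbourhood averaging, then ReLU."""
    a_tilde = adj + np.eye(adj.shape[0])  # add self-loops: A + I
    deg = a_tilde.sum(axis=1)             # degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^-1/2 (A + I) D^-1/2, written with broadcasting instead of diagonal matrices.
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ h @ w, 0.0)

# Path graph 0 - 1 - 2 (hypothetical data).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=np.float64)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))  # 3 nodes, 4 input features each
w = rng.normal(size=(4, 2))  # placeholder weights: 4 inputs -> 2 outputs

out = gcn_layer(adj, h, w)
print(out.shape)  # (3, 2)
```

Stacking two such layers, with a softmax instead of ReLU on the last one, would give the kind of two-layer model the paper uses for node classification.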
