>>> Work in Progress

### Links
- [Lectures](http://snap.stanford.edu/class/cs224w-2020/)
- [Course TextBook](https://www.cs.mcgill.ca/~wlh/grl_book/files/GRL_Book.pdf)
- Graph ML tools
  - [PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric)
  - [DeepSNAP]()
  - [GraphGym]()
  - [SNAP.PY]()
  - [NetworkX]()

### PyTorch Geometric (PyG)

PyG documentation:
_'PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data._

_It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, from a variety of published papers. In addition, it consists of easy-to-use mini-batch loaders for operating on many small and single giant graphs, multi GPU-support, distributed graph learning via Quiver, a large number of common benchmark datasets (based on simple interfaces to create your own), the GraphGym experiment manager, and helpful transforms, both for learning on arbitrary graphs as well as on 3D meshes or point clouds.'_

### Introduction
- many types of data can be represented as graphs
- graph relational data
- complex domains have rich relational structure - relational graph
- deep learning toolbox
  - graphs are frontier of deep learning
- graphs have more complex topology than images or text
  - no notion of spatial locality like grids
  - graphs are often dynamic and have multimodal node features
- how to design neural networks so that no human feature engineering is needed
  - instead of feature engineering - representation learning is used
  - automatically learn the features
  - and predict the downstream tasks
    - map nodes to d-dimensional embeddings, such that similar nodes in the network are embedded close together

### Course outline
- Traditional methods: Graphlets, Graph Kernels
- Methods for node embeddings: DeepWalk, Node2Vec
- Graph Neural Networks: GCN, GraphSAGE, GAT, Theory of GNNs
- Knowledge graphs and reasoning: TransE, BetaE
- Deep generative models for graphs
- Applications to Biomedicine, Science, Industry

### Syllabus
- Introduction; Machine Learning for Graphs 
- Traditional Methods for ML on Graphs 
- Node Embeddings 
- Link Analysis: PageRank 
- Label Propagation for Node Classification 
- Graph Neural Networks 1: GNN Model 
- Graph Neural Networks 2: Design Space 
- Applications of Graph Neural Networks 
- Theory of Graph Neural Networks 
- Knowledge Graph Embeddings 
- Reasoning over Knowledge Graphs
- Frequent Subgraph Mining with GNNs
- Community Structure in Networks
- Traditional Generative Models for Graphs
- Deep Generative Models for Graphs
- Scaling Up GNNs
- Learning on Dynamic Graphs
- GNNs for Computational Biology
- GNNs for Science
- Industrial Applications of GNNs


### Different types of tasks
- Node classification
  - predict property of node
- Link prediction
- Graph classification
- Clustering
- Other
  - Graph generation
  - Graph evolution

#### Node-level ML tasks
- Protein folding
  - Medicines bind to proteins
  - Proteins are made up of amino acids
  - Given a sequence of amino acids, can you predict the 3D structure of the underlying protein?
    - DeepMind's AlphaFold comes close to solving this problem
      - the amino acids are modeled as nodes of a __spatial graph__
      - edges represent proximity between amino acids


#### Edge level ML tasks
- Link prediction
- Recommender systems
  - Watch movie
  - listen to music
    - make predictions using
      - Graph representation learning
      - GNN
    - used in 
      - Pinterest, LinkedIn, Facebook
    - nodes that are related are embedded closer together than nodes that are not
    - Predict
      - use feature information (e.g., images) and transform it across the underlying graph to produce robust embeddings
      - images + graph lead to much better recommendations than the graph alone
      - learn the relationship between pairs of nodes/images, so that related nodes are embedded closer together
- Drug side effects
  - patients may simultaneously take 5-6 drugs
  - these drugs interact with each other
  - side effects
    - all combinations of drugs cannot be tested experimentally to see which side effects they cause
    - build a prediction engine that takes an arbitrary pair of drugs and predicts how they will interact and which side effects they will cause
  - design a two-level heterogeneous network
    - triangles represent drugs
    - circles represent proteins
    - we have a protein-protein interaction network
    - many connections are missing because it is unknown how the drugs interact
    - can we predict the missing connections?
    - this is a link prediction problem

#### Subgraph-level ML tasks
- Traffic prediction
  - nodes represent road segments
  - edges represent connectivity between road segments
  - prediction using GNN
  

#### Graph-level ML tasks
- Drug discovery
  - Molecules can be represented as graphs
    - Atoms as nodes
    - Chemical bonds as edges
  - a graph neural network was used to classify different molecules
  - which molecules can have a therapeutic effect
  - a team at MIT used deep learning for antibiotic discovery to classify different molecules and predict promising molecules from a pool of candidates
- Molecule generation
  - generate molecules that are non-toxic
  - generate molecules that have high solubility
  - generate molecules that have high drug likeness
  - optimize existing molecules to have desirable properties
- Physics simulation
  - a material is represented as a set of particles; how do the particles interact?
  - the ML task is to predict how the graph will evolve in the future
  - how will this material deform in the future
    - generate proximity graph
    - how will the particles evolve to their new positions
    - iterate over how the particles will move and based on this predict their future
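
The "generate proximity graph" step above can be sketched as follows. This is a minimal illustration, not the method used in the lecture: the function name `proximity_graph`, the `radius` threshold, and the sample positions are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def proximity_graph(positions, radius):
    """Connect every pair of particles that lie within `radius` of each other.

    positions: list of coordinate tuples, one per particle (hypothetical input format).
    Returns a list of undirected edges (i, j) with i < j, indexed by particle position.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if dist(p, q) <= radius:
            edges.append((i, j))
    return edges


# Three illustrative particles: the first two are close, the third is far away.
particles = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
edges = proximity_graph(particles, radius=1.5)
```

Rebuilding this graph after every predicted movement step corresponds to the "iterate" bullet: the positions change, so the set of nearby particle pairs changes too.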

### Choice of graph representation

#### Components of network
- Objects - nodes, vertices - N
- Interactions - links, edges - E
- System - network, graph - G(N,E)
- Underlying infrastructure might be same
- choosing proper representation


#### How to build a graph
  - what are nodes
  - what are edges


#### Choice of proper network representation

#### Directed vs Undirected graphs
  - Undirected
    - symmetrical or reciprocal 
    - For example - collaborations, friendships, interactions between proteins
  - Directed
    - every link has a direction, denoted by an arrow
    - each link goes from a source node to a destination node
    - For example - phone calls, financial transactions

#### Node degrees
  - Undirected
    - number of edges adjacent to a given node
    - each edge gets counted twice when summing the degrees over all nodes, so the average degree is $2|E|/|N|$
    - a self-edge/self-loop adds a degree of 2 to the node
  - Directed
    - in-degree - number of edges pointing towards the node
    - out-degree - number of edges pointing outwards from the node
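
As a rough sketch of the degree definitions above, the snippet below counts degrees directly from an edge list. The function names and the sample edges are illustrative assumptions, not part of the course material.

```python
from collections import Counter


def undirected_degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        # Each edge adds 1 to both endpoints, so a self-loop (u == v) adds 2.
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def directed_degrees(edges):
    """Return ({node: in_degree}, {node: out_degree}) for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # edge points outwards from src
        in_deg[dst] += 1   # edge points towards dst
    return dict(in_deg), dict(out_deg)


edges = [(1, 2), (2, 3), (3, 3)]  # (3, 3) is a self-loop

deg = undirected_degrees(edges)
avg_degree = sum(deg.values()) / len(deg)  # same as 2 * len(edges) / number of nodes

in_deg, out_deg = directed_degrees(edges)
```

Nodes without any incident edge do not appear in the returned dictionaries; a fuller version would take the node set as an extra argument.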


#### Bipartite graph
  - a graph whose nodes can be divided into 2 disjoint sets/partitions
  - edges only connect a node in one partition to a node in the other
  - never two nodes inside the same partition
  - the two sets are independent
  - examples
    - Authors to papers (they authored)
    - Actors to movies (they appeared in)
    - Users to movies (they rated)
    - Folded networks
      - author collaboration networks
      - Movie co-rating networks  
<img src = 'images/01_bipartite.png' width=200 height=200>

$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$

#### Folded/Projected Bipartite graphs
- If we have a bipartite graph, we can project it onto either the left or the right side
- the projection graph uses only the nodes from one side
- how the nodes are connected
  - create a connection between a pair of nodes
  - if they have at least one neighbor in common
  - 1, 2, 3 co-authored a paper
  - 3 and 4 did not co-author a paper
  - 2 and 5 co-authored a paper
- a projection onto the right side is created in the same way

<img src = 'images/01_bipartite_projection.png' width=400 height=400>

$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
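
The folding rule (connect two nodes if they share at least one neighbor on the other side) can be sketched as below. The author and paper labels are made up to mirror the bullets above; they are not read from the figure, and `project_left` is a hypothetical helper name.

```python
from itertools import combinations


def project_left(bipartite_edges):
    """Fold a bipartite graph onto its left-hand node set.

    bipartite_edges: iterable of (left_node, right_node) pairs.
    Returns a set of (u, v) pairs with u < v, one for every pair of left
    nodes that share at least one right-hand neighbor.
    """
    # Group left nodes by the right node they are attached to.
    members = {}
    for left, right in bipartite_edges:
        members.setdefault(right, set()).add(left)

    projected = set()
    for lefts in members.values():
        for u, v in combinations(sorted(lefts), 2):
            projected.add((u, v))
    return projected


# Hypothetical authorship data: authors 1-5 on the left, papers A-C on the right.
authorship = [
    (1, "A"), (2, "A"), (3, "A"),  # 1, 2, 3 co-authored paper A
    (2, "B"), (5, "B"),            # 2 and 5 co-authored paper B
    (4, "C"), (5, "C"),            # 4 and 5 co-authored paper C
]
coauthors = project_left(authorship)
```

Swapping the two elements of each pair before calling the function gives the projection onto the other side (papers that share an author).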

#### Representing Graphs - Adjacency Matrix
- matrix entries are 0 or 1 (binary)
- matrix element $A_{ij}$ is set to 1 if there is a link from node i to node j, and 0 otherwise
- undirected graph
  - the matrix is symmetric
- directed graph
  - the matrix is not necessarily symmetric
- node degrees are computed differently for directed and undirected graphs
- adjacency matrices of real-world networks are extremely sparse
  
<img src = 'images/01_adjacencyMatrix.png' width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
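
A small sketch of the adjacency matrix and how degrees fall out of it, using plain Python lists. The edge set is an illustrative assumption (it is not taken from the figure), and the example has no self-loops, so a row sum equals the node degree.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Build a binary adjacency matrix for nodes numbered 1..num_nodes."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i - 1][j - 1] = 1      # link from node i to node j
        if not directed:
            A[j - 1][i - 1] = 1  # undirected links are reciprocal
    return A


def is_symmetric(A):
    """True if A[i][j] == A[j][i] for all i, j."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


edges = [(1, 2), (2, 3), (2, 4), (3, 4)]

# Undirected: symmetric matrix, degree = row sum (or column sum).
A_undirected = adjacency_matrix(4, edges)
degrees = [sum(row) for row in A_undirected]

# Directed: out-degree = row sum, in-degree = column sum.
A_directed = adjacency_matrix(4, edges, directed=True)
out_degrees = [sum(row) for row in A_directed]
in_degrees = [sum(col) for col in zip(*A_directed)]
```

For large sparse graphs a dense matrix like this wastes memory on zeros, which is one reason edge lists and adjacency lists are used in practice.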

#### Edge list
- Edge list
  - quite popular
  - hard to use for graph manipulation and analysis
  - represents the graph as a list of node pairs (a two-column matrix)
  - Example
    - (2,3)
    - (2,4)
    - (3,2)

#### Adjacency list
- Adjacency list
  - easier to work with when the network is large and sparse
  - simply store the neighbors of each node
  - for undirected graphs - one neighbor list per node
  - for directed graphs - store both in-going and out-going neighbors
  - Example
    - 1:
    - 2: 3,4
    - 3: 2,4
    - 4: 5
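
The two representations can be converted into each other. The sketch below turns a directed edge list into an adjacency list of out-going neighbors and derives the in-going neighbors from it. The edges `(3, 4)` and `(4, 5)` are an assumption, added so that the result matches the adjacency list example above; the function names are illustrative.

```python
def to_adjacency_list(nodes, edges):
    """Map each node to the list of its out-going neighbors."""
    adj = {node: [] for node in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def in_neighbors(adj):
    """Invert an out-going adjacency list to get in-going neighbors."""
    incoming = {node: [] for node in adj}
    for src, dsts in adj.items():
        for dst in dsts:
            incoming[dst].append(src)
    return incoming


nodes = [1, 2, 3, 4, 5]
edge_list = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5)]

out_adj = to_adjacency_list(nodes, edge_list)
in_adj = in_neighbors(out_adj)
```

Looking up the neighbors of a node is a single dictionary access here, whereas the edge list has to be scanned in full for the same question.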

#### Node and Edge attributes
  - how to attach attributes and properties to nodes and edges
  - an edge can have a weight, e.g., how strong a friendship is
  - an edge can have a rank or a type, e.g., the type of friend
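
One simple way to attach attributes is a dictionary per node and per edge. This is only a sketch: the names, the attribute keys (`weight`, `type`, `age`), and the values are invented for illustration.

```python
# Node attributes: one dict of properties per node (hypothetical people).
node_attrs = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 41},
}

# Edge attributes keyed by the node pair: a weight (friendship strength)
# and a type (kind of friend).
edge_attrs = {
    ("alice", "bob"): {"weight": 0.9, "type": "family"},
    ("bob", "carol"): {"weight": 0.2, "type": "coworker"},
}


def strongest_edge(edge_attrs):
    """Return the node pair whose edge has the largest weight."""
    return max(edge_attrs, key=lambda edge: edge_attrs[edge]["weight"])


closest_pair = strongest_edge(edge_attrs)
coworker_edges = [e for e, a in edge_attrs.items() if a["type"] == "coworker"]
```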


#### More types of graphs
- unweighted - undirected
- weighted - undirected
- self-edges (self-loops) - undirected
- multigraph - undirected  

<img src = 'images/01_weightedMatrix.png' width=400 height=400>
<img src = 'images/01_selfedgeGraph.png' width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$

#### Connectivity of undirected graph
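  - connected: any two nodes can be joined by a path
  - block matrices
    - the adjacency matrix tells us whether the graph is connected or not
    - an isolated node forms a component of its own
    - a graph with several components has a block-diagonal adjacency matrix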
    
<img src = 'images/01_connectivityUnDirected.png' width=400 height=400>  
$\tiny{\text{YouTube-Stanford-CS224W-Jure Leskovec}}$
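
Connected components can be found with a breadth-first search, as in the sketch below. The graph is an illustrative assumption (not the one in the figure): two groups of nodes plus one isolated node.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5)]  # node 6 is isolated

components = connected_components(nodes, edges)
is_connected = len(components) == 1
```

Ordering the rows and columns of the adjacency matrix component by component is what produces the block-diagonal structure: there are no edges between components, so all entries outside the blocks are zero.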

#### Connectivity of directed graph
  - strong or weak connectivity
  - strongly connected
    - has a directed path from every node to every other node, and vice versa
  - weakly connected
    - connected if we disregard the edge directions
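
A minimal sketch of both checks, assuming a non-empty node list. It relies on a standard shortcut: a directed graph is strongly connected exactly when one start node can reach every node along the edges and also along the reversed edges. Function names and sample graphs are illustrative.

```python
def reachable(adj, start):
    """Set of nodes reachable from `start` by following edges in `adj`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def is_strongly_connected(nodes, edges):
    """True if every node can reach every other node along directed edges."""
    forward = {node: [] for node in nodes}
    backward = {node: [] for node in nodes}
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)  # reversed edge
    everything = set(nodes)
    start = nodes[0]
    return reachable(forward, start) == everything == reachable(backward, start)


def is_weakly_connected(nodes, edges):
    """True if the graph is connected once edge directions are disregarded."""
    undirected = {node: [] for node in nodes}
    for src, dst in edges:
        undirected[src].append(dst)
        undirected[dst].append(src)
    return reachable(undirected, nodes[0]) == set(nodes)


nodes = [1, 2, 3]
cycle = [(1, 2), (2, 3), (3, 1)]  # every node reaches every other node
chain = [(1, 2), (2, 3)]          # no path back from 3 to 1
```

Every strongly connected graph is also weakly connected, but not the other way round, as the `chain` example shows.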

### Summary

- ML with graphs
  - Applications and use cases
- Different types of tasks
  - Node level
  - Edge level
  - Graph level
- Choice of graph representation
  - Directed
  - Undirected
  - bipartite
  - weighted
  - adjacency matrix