# Graph Embeddings

## Introduction
I recently learned that people are using deep learning to tackle problems from graph theory, after a friend sent me the paper [Learning Combinatorial Optimization Algorithms over Graphs](https://arxiv.org/pdf/1704.01665.pdf). I find this subject very interesting and want to explore it further. In this notebook, I will try to build a simple graph representation from vertex embeddings and a neural network. The result is essentially a lookup table learned with deep learning: given a pair of vertices, the network should predict whether an edge joins them.

This is also a great exercise for me to learn how to use PyTorch.


## Data
To create the graph, I will use a simple dictionary that maps each vertex to the set of its neighbors. I will also define a `__next__` method so we can draw batches of data to train the neural network.

In [1]:
from random import sample, randint, choice

In [2]:
import numpy as np

In [3]:
class Graph:
    def __init__(self, n_vert = 10, n_neigh = 3, bs=64):
        """
        Initialize a graph object. Each vertex gets n_neigh randomly chosen
        out-neighbors, so edges are directed and the graph is not guaranteed
        to be connected
        """
        self.bs = bs
        self.vertices = range(n_vert)
        self.graph = {}
        for v in self.vertices:
            possible = set(self.vertices).difference({v}) # remove loops
            # sample() needs a sequence: passing a set raises TypeError on Python 3.11+
            edges = set(sample(sorted(possible), n_neigh)) # add n neighbors to each vertex
            self.graph[v] = edges
            
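    def pos_sample(self):
        """Return a pair [v, u] where u is a neighbor of a random vertex v"""
        v = choice(self.vertices)
        x = [v, choice(tuple(self.graph[v]))]
        return x
    
    def neg_sample(self):
        """Return a pair [v, u] where u is neither v nor a neighbor of v"""
        v = choice(self.vertices)
        possible = set(self.vertices).difference(self.graph[v])
        possible = possible.difference({v})
        x = [v, choice(tuple(possible))]
        return x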
    
    def __iter__(self):
        """
        Return the graph itself as an endless iterator of batches. The batch
        size is set in __init__, so it is not reset to a default here
        """
        return self
        
    def __next__(self):
        """
        Returns a batch of data for use in training. The vertices are randomly chosen,
        so it is not guaranteed that every vertex appears in each batch, or that
        every possible neighbor will be chosen. Labels are one-hot encoded for
        simplicity: [1, 0] for a positive pair (an edge), [0, 1] for a negative pair
        """
        n_pos = self.bs // 2
        n_neg = self.bs - n_pos
        
        pos = [[self.pos_sample(), [1, 0]] for _ in range(n_pos)]
        neg = [[self.neg_sample(), [0, 1]] for _ in range(n_neg)]
        X, Y = zip(*(pos + neg))
        return X, Y

In [4]:
n_vert = 10
n_neigh = 3
bs = 64
graph = Graph(n_vert=n_vert, n_neigh=n_neigh, bs=bs)

In [5]:
graph.graph

{0: {1, 7, 9},
 1: {5, 6, 7},
 2: {0, 1, 8},
 3: {0, 2, 9},
 4: {2, 6, 7},
 5: {0, 4, 8},
 6: {1, 3, 7},
 7: {1, 3, 5},
 8: {0, 3, 6},
 9: {0, 2, 3}}
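
To make the two kinds of samples concrete, here is a small sketch that works directly on the dictionary printed above. The helper name `negative_candidates` is my own for illustration and is not part of the `Graph` class. The adjacency data is copied from this particular run, so a fresh random graph will give different sets.

```python
# Adjacency dictionary copied from the graph.graph output above
# (one particular random run; other runs will differ).
adjacency = {
    0: {1, 7, 9}, 1: {5, 6, 7}, 2: {0, 1, 8}, 3: {0, 2, 9}, 4: {2, 6, 7},
    5: {0, 4, 8}, 6: {1, 3, 7}, 7: {1, 3, 5}, 8: {0, 3, 6}, 9: {0, 2, 3},
}
vertices = set(adjacency)


def negative_candidates(v):
    """Hypothetical helper: vertices that are neither v nor a neighbor of v."""
    return vertices - adjacency[v] - {v}


# Positive pairs for vertex 0 end in one of its neighbors,
# negative pairs end in one of the remaining vertices.
print(sorted(adjacency[0]))            # [1, 7, 9]
print(sorted(negative_candidates(0)))  # [2, 3, 4, 5, 6, 8]

# Size of each population over the whole graph.
n_pos_pairs = sum(len(neigh) for neigh in adjacency.values())
n_neg_pairs = sum(len(negative_candidates(v)) for v in adjacency)
print(n_pos_pairs, n_neg_pairs)        # 30 60
```

With 10 vertices and 3 neighbors each, there are twice as many negative pairs as positive ones, while each batch is split evenly between the two. Positive pairs are therefore oversampled relative to how often they occur in the graph.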

In [6]:
X, Y = next(graph)
X[:3], Y[:3]

(([0, 9], [5, 0], [6, 3]), ([1, 0], [1, 0], [1, 0]))
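
As a sanity check, the labels in a batch can be compared against the adjacency dictionary. The sketch below does this for the three pairs shown above. The helpers `label_for` and `batch_is_consistent` are hypothetical names of my own, and the data is copied from the outputs of this run.

```python
# Adjacency dictionary and the first three samples, copied from the outputs above.
adjacency = {
    0: {1, 7, 9}, 1: {5, 6, 7}, 2: {0, 1, 8}, 3: {0, 2, 9}, 4: {2, 6, 7},
    5: {0, 4, 8}, 6: {1, 3, 7}, 7: {1, 3, 5}, 8: {0, 3, 6}, 9: {0, 2, 3},
}
X = ([0, 9], [5, 0], [6, 3])
Y = ([1, 0], [1, 0], [1, 0])


def label_for(pair, adjacency):
    """Hypothetical helper: the one-hot label a pair [v, u] should carry."""
    v, u = pair
    return [1, 0] if u in adjacency[v] else [0, 1]


def batch_is_consistent(X, Y, adjacency):
    """True if every label in Y matches the adjacency dictionary."""
    return all(label_for(x, adjacency) == list(y) for x, y in zip(X, Y))


print(batch_is_consistent(X, Y, adjacency))  # True

# The pairs are ordered: 1 is a neighbor of 0, but 0 is not a neighbor of 1.
print(label_for([0, 1], adjacency))  # [1, 0]
print(label_for([1, 0], adjacency))  # [0, 1]
```

The last two lines show that the order of a pair matters here: the dictionary stores directed edges, so `[0, 1]` and `[1, 0]` can carry different labels.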

In [None]:
%timeit next(graph) # data is produced at 