<a href="https://colab.research.google.com/github/dsevero/Random-Edge-Coding/blob/main/Random_Edge_Coding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs

- https://github.com/dsevero/Random-Edge-Coding
- https://arxiv.org/abs/2305.09705


# Setup (run this first)

## Clone repo

In [None]:
!git clone https://github.com/dsevero/Random-Edge-Coding.git
%cd Random-Edge-Coding/

## Install dependencies and Random Edge Coding

In [None]:
!./install_dependencies.sh
!ln -s craystack_repo/craystack craystack

## Download datasets

In [None]:
%env DATA_DIR=/content/data
!mkdir -p $DATA_DIR
!./download_datasets.sh

# Run Experiments

The cell below runs a single experiment.
Its optional arguments are:

- For model flags see files in `experiments/encode-decode-graph/configs/`.
- Use `--eval_likelihood_only` to only compute the likelihood of the graph, skipping over both encoding and decoding.
- Removing `--decode` will skip over decoding.
- For a list of available datasets see `AVAILABLE_DATASETS` in `rec/datasets.py`.

In [None]:
%%shell
python -B experiments/encode-decode-graph/run_experiment.py \
    --config=experiments/encode-decode-graph/configs/config_polya.py \
    --config.dataset_name=youtube \
    --decode \
    --config.max_num_edges=10_000 # set this to -1 to encode/decode the full graph.

To reproduce all experiments in the paper, run the following cell.

In [None]:
%%shell
for DATASET_NAME in youtube foursquare digg gowalla skitter dblp; do
    python -B experiments/encode-decode-graph/run_experiment.py \
        --config=experiments/encode-decode-graph/configs/config_polya.py \
        --config.dataset_name=$DATASET_NAME \
        --decode
done

# How to use Random Edge Coding in your own code

***DISCLAIMER***: In Colab, NumPy might raise a runtime error when running the example below.

To fix it, uncomment the pip command in the next cell, run the cell, and restart the kernel.

In [None]:
# pip install numpy --upgrade --ignore-installed

In [None]:
import craystack as cs
import numpy as np

from rec.definitions import Graph
from rec.models import PolyasUrnModel

def sample_erdos_renyi_graph(num_nodes, p, seed=0):
    np.random.seed(seed)
    adjacency_matrix = np.triu(np.random.rand(num_nodes, num_nodes) < p, k=1)
    edge_array = np.stack(np.nonzero(adjacency_matrix)).T
    return Graph(
        edge_array=edge_array,
        num_nodes=num_nodes,
        num_edges=edge_array.shape[0],
    )

# Sample a graph from the G(n, p) model of Erdős and Rényi.
num_nodes = 200
p_erdos_renyi = 1/2
graph = sample_erdos_renyi_graph(num_nodes, p_erdos_renyi)

# Compute the information content of the vertex-sequence and of the graph
# under Pólya's Urn model, in bits per edge (BPE), i.e. normalized by the
# number of observed edges. The information content equals the negative
# log-likelihood: the code length, in bits, that minimizes the expected
# message length under the model.
#
# The information content of the vertex-sequence is significantly larger
# than that of the graph, because the sequence also encodes the order in
# which edges were added to the graph. Random Edge Coding removes this
# redundancy, which yields a substantial bit saving.
model = PolyasUrnModel(graph.num_nodes, graph.num_edges, bias=1)
seq_bpe, graph_bpe = model.compute_bpe(graph)

# Initialize the ANS state, encode the graph, and compute the final message
# length in bits per edge. The flattened ANS state is an array of 32-bit
# words, and we add 32 extra bits for the integer that stores the number of
# observed edges.
ans_state = cs.rans.base_message(shape=(1,))
ans_state = model.push(ans_state, graph)
rec_bpe = (32 + 32*len(cs.flatten(ans_state)))/graph.num_edges

# Decode the graph and assert the graph is recovered losslessly.
ans_state, graph_decoded = model.pop(ans_state)
assert graph_decoded == graph

print("\n\n------------------RESULTS------------------")
print("BPE of the vertex-sequence under Pólya's Urn (theoretical):", seq_bpe)
print("BPE of the graph under Pólya's Urn (theoretical):", graph_bpe)
print("Random Edge Coding message length in BPE:", rec_bpe)
print(f"Savings due to Random Edge Coding: {1 - rec_bpe/seq_bpe: .2%}")
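
The gap between `seq_bpe` and `graph_bpe` in the example above can be estimated by counting how many vertex-sequences describe the same graph. The sketch below is not part of the `rec` package, and the function names are hypothetical. It assumes a simple undirected graph (no self-loops or repeated edges, as in the Erdős–Rényi sample above) and a model that assigns the same probability to every equivalent sequence. Under those assumptions, a graph with `m` edges corresponds to `m! * 2**m` sequences: `m!` orderings of the edges, times 2 orderings of the endpoints of each edge.

```python
import math


def log2_factorial(m: int) -> float:
    """Return log2(m!), computed with log-gamma to avoid huge integers."""
    return math.lgamma(m + 1) / math.log(2)


def order_redundancy_bpe(num_edges: int) -> float:
    """Bits per edge that a vertex-sequence spends on ordering information.

    Assumption: the graph is simple and undirected, so its edges can be
    listed in num_edges! orders and each edge's endpoints in 2 orders.
    """
    if num_edges <= 0:
        raise ValueError("num_edges must be positive")
    return (log2_factorial(num_edges) + num_edges) / num_edges


# Illustrative edge counts only, not results from the paper.
for m in (100, 10_000, 1_000_000):
    print(f"{m:>9} edges: about {order_redundancy_bpe(m):.2f} bits per edge")
```

If these assumptions hold for the model in use, `seq_bpe - graph_bpe` should be close to `order_redundancy_bpe(graph.num_edges)`. The redundancy grows roughly like `log2(m)` per edge, so the saving from discarding the order increases with the size of the graph.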