Fast sampling of the G(n, m)
-model of Erdős–Rényi random graphs.
The time and space complexity of the algorithm is in O(m*samples)
. The result can be returned either as an integer np.ndarray
of size (2, m, samples)
or a list
of integer sp.sparse.coo_matrix
.
erdospy
can be installed directly from source with:
pip install git+https://github.com/NiMlr/erdospy
Test your installation with:
import erdospy
erdospy.testall()
Generate five samples of the G(n, m)
-model for n, m = 4, 3
with:
n = 4
m = 3
samples = 5
erdospy.sample_erdos_renyi_gnm(n, m, samples, random_state=42)
# array([[[1, 3, 3, 1, 2],
# [2, 1, 2, 3, 3],
# [3, 2, 1, 3, 3]],
# [[0, 0, 1, 0, 1],
# [0, 0, 0, 2, 2],
# [2, 0, 0, 0, 1]]])
Thanks to the powerful implementation of sklearn.utils.random.sample_without_replacement
,
it turns out that the method in many cases is not only faster than that of other packages
from erdospy import sample_erdos_renyi_gnm
import networkx as nx
n = 50
m = 600
samples = 2000
%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="edge_array")
# 116 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 359 ms ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 4.66 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
but it opens up the use-case n >> m
n = 1_000_000
m = 600
samples = 1
%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 3.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 931 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# running nx.gnm_random_graph for even larger n will soon break down
The simple algorithm is based on the fast sampling of m
edge indices in 0, 1, ..., n*(n-1)//2
using sklearn.utils.random.sample_without_replacement
and the subsequent sample-wise constant time mapping onto the indices of the adjacency matrix A_{n, ij} -> (i, j)
, where the edge indices are associated with the corresponding matrix entry, due to undirectedness the cases j < i
are sufficient, and
The mapping can be done without explicit representation of A_n
by solving a vectorized quadratic equation. Per sample of the random graphs, this results in m
edge tuples of form (i, j)
and represented as a np.ndarray
.