erdospy

Fast sampling of the G(n, m)-model of Erdős–Rényi random graphs.

The time and space complexity of the algorithm is in O(m*samples). The result can be returned either as an integer np.ndarray of size (2, m, samples) or a listof integer sp.sparse.coo_matrix.

Installation

erdospy can be installed directly from source with:

pip install git+https://github.com/NiMlr/erdospy

Test your installation with:

import erdospy
erdospy.testall()

Generate five samples of the G(n, m)-model for n, m = 4, 3 with:

n = 4
m = 3
samples = 5
erdospy.sample_erdos_renyi_gnm(n, m, samples, random_state=42)
# array([[[1, 3, 3, 1, 2],
#         [2, 1, 2, 3, 3],
#         [3, 2, 1, 3, 3]],

#        [[0, 0, 1, 0, 1],
#         [0, 0, 0, 2, 2],
#         [2, 0, 0, 0, 1]]])

Empirical performance

Thanks to the powerful implementation of sklearn.utils.random.sample_without_replacement, it turns out that the method in many cases is not only faster than that of other packages

from erdospy import sample_erdos_renyi_gnm
import networkx as nx

n = 50
m = 600
samples = 2000

%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="edge_array")
# 116 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 359 ms ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 4.66 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

but it opens up the use-case n >> m

n = 1_000_000
m = 600
samples = 1

%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 3.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 931 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# running nx.gnm_random_graph for even larger n will soon break down

Implementation details

The simple algorithm is based on the fast sampling of m edge indices in 0, 1, ..., n*(n-1)//2 using sklearn.utils.random.sample_without_replacement and the subsequent sample-wise constant time mapping onto the indices of the adjacency matrix A_{n, ij} -> (i, j), where the edge indices are associated with the corresponding matrix entry, due to undirectedness the cases j < i are sufficient, and

$A_n = \begin{pmatrix} * & *\phantom{*} & \phantom{*}* & \phantom{*}*\phantom{*} & *\phantom{*} & *\\ 0 & *\phantom{*} & \phantom{*}* & \phantom{*}*\phantom{*} & *\phantom{*} & *\\ 1 & 2\phantom{*} & \phantom{*}* & \phantom{*}*\phantom{*} & *\phantom{*} & *\\ 3 & 4\phantom{*} & \phantom{*}5 & \phantom{*}*\phantom{*} & *\phantom{*} & *\\ \vdots & & & \ddots & \ddots &\vdots\vspace*{2mm}\\ \tfrac{n(n-3)}{2} +2 & &\phantom{*}\dots & & \tfrac{n(n-1)}{2}\phantom{*} & * \end{pmatrix} \,.$

The mapping can be done without explicit representation of A_n by solving a vectorized quadratic equation. Per sample of the random graphs, this results in m edge tuples of form (i, j) and represented as a np.ndarray.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

erdospy

Installation

Empirical performance

Implementation details

Files

README.md

Latest commit

History

README.md

File metadata and controls

erdospy

Installation

Empirical performance

Implementation details