Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random_walks() seems slow #20

Open
hananell opened this issue Aug 15, 2022 · 1 comment
Open

random_walks() seems slow #20

hananell opened this issue Aug 15, 2022 · 1 comment

Comments

@hananell
Copy link

hananell commented Aug 15, 2022

Hi
I am searching for a quick way to do random walks and saw this package that claims to be good, but when comparing it with naive python approach, the naive approach was much faster... what am I missing?

import networkx as nx
import numpy as np
import csrgraph as cg
from time import time
import random

walk_len = 20
G = nx.fast_gnp_random_graph(100, 0.3, directed=True)    # just a random graph
GG = cg.csrgraph(G)

t1 = time()

walks = GG.random_walks(walklen=walk_len)

t2 = time()

walks = np.zeros((len(G.nodes()), walk_len))
for i, node in enumerate(G.nodes()):
    cur_node = node
    for j in range(walk_len):
        neighbor = random.choice(list(G.neighbors(cur_node)))   # choose one random neighbor
        walks[i][j] = neighbor   # add it to walk
        cur_node = neighbor   # save it to cur_node to continue from it next iteration

t3 = time()

print('cg ', t2-t1)
print('naive ', t3-t2)

output:
cg 5.419240713119507
naive 0.010957479476928711

@YLTsai0609
Copy link

YLTsai0609 commented Mar 1, 2023

Hi @hananell

some concern here

And I tried another test below:

import networkx as nx
import numpy as np
from time import time
import random

walk_len = 20
trials = 100
warm_start = 15
warm_start_csr_runtime = []
csr_runtime = []

G = nx.fast_gnp_random_graph(100, 0.3, directed=True)    # just a random graph
GG = cg.csrgraph(G)


for t in range(warm_start):
    tic = time()

    walks = GG.random_walks(walklen=walk_len)

    toc = time()
    warm_start_csr_runtime.append(toc-tic)


print(
    warm_start_csr_runtime,
    np.mean(warm_start_csr_runtime),
    sep='\n\n'
)

gives

[1.511685848236084, 0.0009768009185791016, 0.0008192062377929688, 0.0009591579437255859, 0.001013040542602539, 0.0010869503021240234, 0.0008718967437744141, 0.0007932186126708984, 0.0009889602661132812, 0.0010161399841308594, 0.0010478496551513672, 0.0009562969207763672, 0.001085042953491211, 0.00102996826171875, 0.001010894775390625]

0.10168941815694173

As you see, the first trail took more time.

it seems like there is a compilation issue when executing _random_walk()

Since numba need to compile the data type to static one. so it cause some time when first run.

just do a warm_start to solve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants