GraphSAGENodeGenerator in GraphSAGE #2063

kakaroa · 2022-11-08T00:32:13Z

A question about the weighted parameter of GraphSAGENodeGenerator: if weighted=True, does the sampling process pick the first n nodes with the highest weights? And can a weighted undirected graph be used as the original input data?

cloudmadeofcandy · 2023-05-03T10:33:27Z

I traced through the chain of function calls, specifically:

GraphSAGENodeGenerator -> _samplers = SeededPerWalk(SampledBreadthFirstWalk) -> _sample_neighbours_untyped() -> naive_weighted_choices()

and reached the definition of "naive_weighted_choices":

def naive_weighted_choices(rs, weights, size=None):
    """
    Select indices at random, weighted by the iterator `weights` of
    arbitrary (non-negative) floats. That is, `x` will be returned
    with probability `weights[x]/sum(weights)`.

    For doing a single sample with arbitrary weights, this is much (5x
    or more) faster than numpy.random.choice, because the latter
    requires a lot of preprocessing (normalized probabilities), and
    does a lot of conversions/checks/preprocessing internally.
    """
    probs = np.cumsum(weights)
    total = probs[-1]
    if total == 0:
        # all weights were zero (probably), so we shouldn't choose anything
        return None

    thresholds = rs.random() if size is None else rs.random(size)
    idx = np.searchsorted(probs, thresholds * total, side="left")

    return idx

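So the weights are effectively normalized by their sum and treated as probabilities: index x is drawn at random with probability weights[x]/sum(weights). In other words, the sampling is random and weight-proportional; it does not simply take the n highest-weight neighbours.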
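To make the behaviour concrete, here is a minimal standalone sketch of the same cumulative-sum technique, run many times so the empirical frequencies can be compared with weights[x]/sum(weights). The function name weighted_choices_sketch, the example weights, and the use of np.random.default_rng are my own assumptions for illustration; this is not StellarGraph's code path, only a re-creation of the logic quoted above.

```python
import numpy as np


def weighted_choices_sketch(rs, weights, size=None):
    # Hypothetical re-creation of the quoted function, for illustration only.
    cumulative = np.cumsum(weights)
    total = cumulative[-1]
    if total == 0:
        # All weights are zero, so there is nothing to choose.
        return None
    # Uniform draws in [0, 1), scaled onto [0, total).
    thresholds = rs.random() if size is None else rs.random(size)
    # The first index whose cumulative weight reaches the threshold.
    return np.searchsorted(cumulative, thresholds * total, side="left")


# Assumed example: four neighbours with edge weights 1, 3, 0 and 6.
rs = np.random.default_rng(0)
weights = [1.0, 3.0, 0.0, 6.0]

draws = weighted_choices_sketch(rs, weights, size=100_000)
empirical = np.bincount(draws, minlength=len(weights)) / draws.size
expected = np.array(weights) / np.sum(weights)

print("empirical:", np.round(empirical, 3))
print("expected: ", expected)
```

The empirical frequencies should land close to 0.1, 0.3, 0.0 and 0.6. The heaviest neighbour is picked most often but not always, and the zero-weight neighbour is effectively never picked.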

Hope this helps!

Quân Trương
