Network effects
---

"Network effects" is a generic name given to the effect that one user of a product has on the value of the product as perceived by others. If none of my friends are using Myspace anymore, then its value drops in my eyes, even though the product itself hasn't changed.

We've seen a long succession of social networks rise and fall -- Bebo, Myspace, Facebook, Orkut, Twitter. They don't fade gradually -- instead, there is a kind of mass exodus, as people realise that all their friends are leaving. But there may be pockets of users who all stay together. Because they are densely connected to each other (perhaps in a clique or near-clique) and only sparsely connected to other users, they don't perceive a huge drop in the product's value, and so they may stay for a long time.

It's interesting to use simulation methods, like the information flow model above, to study these effects in different types of graphs. Let's try to create a model.




Social network join/leave model
---

Let's suppose there is one dominant network, FriendFace (F), and a
startup one, MyMates (M). Every day, all users may post some messages to 
some or all of their friends. Suppose cross-posting is enabled, so that anyone
can post to their friends, regardless of which network they're on. But the source
of the message is visible to the receiver, so I always know whether my friends are posting
from F or from M.

Let's suppose everyone is currently using F, and M starts up by 
convincing a small number $i$ of users to switch over.
From then on, users may switch from F to M probabilistically: if, on a given day,
more of the messages they receive come from M than from F, then they may switch to posting from M.
Let's simplify by assuming that users never switch back.

In what scenarios will M take over, and in what scenarios will M fail?
It will depend on the graph connections, edge density, and the relationship between
the number of users observed and the probability of switching.
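Before wiring these rules into Pregel, it may help to see them in plain Python. The following is only a minimal sketch, not the notebook's implementation: the helper names `ring_lattice` and `simulate` are made up for illustration, the graph is a plain ring lattice with no rewiring, and messages are delivered within the same day rather than one superstep later.

```python
import random

def ring_lattice(n, k):
    """Adjacency lists for a ring where each node links to k // 2 neighbours per side."""
    return {u: sorted({(u + d) % n for d in range(-(k // 2), k // 2 + 1) if d != 0})
            for u in range(n)}

def simulate(adj, seeds, a, sp, nsteps, rng):
    """Return a list of (step, share on F, share on M), one row per day."""
    on_m = set(seeds)
    n = len(adj)
    history = [(0, 1 - len(on_m) / n, len(on_m) / n)]
    for step in range(1, nsteps + 1):
        # Each user messages each friend with probability a; the receiver
        # tallies how many messages arrived from F and from M.
        seen_f = {u: 0 for u in adj}
        seen_m = {u: 0 for u in adj}
        for u, friends in adj.items():
            for v in friends:
                if rng.random() < a:
                    if u in on_m:
                        seen_m[v] += 1
                    else:
                        seen_f[v] += 1
        # F users who saw more M than F messages switch with probability sp.
        switched = {u for u in adj
                    if u not in on_m and seen_m[u] > seen_f[u] and rng.random() < sp}
        on_m |= switched  # users never switch back
        history.append((step, 1 - len(on_m) / n, len(on_m) / n))
    return history

rng = random.Random(0)                      # fixed seed so runs are repeatable
adj = ring_lattice(200, 4)                  # assumed toy graph: 200 users, 4 friends each
seeds = rng.sample(range(200), 5)           # the initial M users
history = simulate(adj, seeds, a=0.4, sp=0.9, nsteps=30, rng=rng)
print(history[0], history[-1])
```

Because users never switch back, the share on M can only grow or stall in this sketch; whether it takes over depends on `a`, `sp`, and how the seeds are placed.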

In [3]:
from pregel import Vertex, Pregel
import random
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

We set up the usual items we need for Pregel: a stats function and a Vertex subclass. The Vertex subclass encapsulates all the behaviour of the individual vertices.

In [88]:
def social_stats(vertices):
    # return a tuple: (superstep, proportion on F, proportion on M)
    superstep = vertices[0].superstep
    n = len(vertices)
    pF = sum(v.value == "F" for v in vertices) / float(n)
    pM = sum(v.value == "M" for v in vertices) / float(n)
    return superstep, pF, pM

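class SocialVertex(Vertex):
    # Note: update() reads the globals `nsteps` and `a`, and calls `switch`,
    # all of which are defined in later cells.
    def update(self):
        if self.superstep < nsteps: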
            
            # First, check whether to switch
            if self.value == "M":
                # we have already switched
                pass
            else:
                
                # let's see what our friends are doing
                f = 0
                m = 0
                for v, w, msg in self.incoming_messages:
                    if msg == "F":
                        f += 1
                    elif msg == "M":
                        m += 1
                    else:
                        raise ValueError("unexpected message: %r" % (msg,))
                if switch(f, m):
                    self.value = "M"
                    
            # Next, send out messages to some of our friends
            self.outgoing_messages = [(v,1,self.value) for v in self.out_vertices
                                     if random.random() < a]


        else:
            self.active = False

Our main function is going to accept a NetworkX graph, convert it to Pregel's format, and run the simulation.

In [91]:
def social_main(G, i):
    # G is a NetworkX graph whose nodes are labelled 0..n-1
    # i is the number of nodes which should be randomly assigned to M at the start
    vertices = [SocialVertex(j, "F", [], [])
                for j in range(G.order())]
    # random.sample needs a sequence, so copy the node view into a list first
    initial_M = set(random.sample(list(G.nodes()), i))

    for node, v in zip(G.nodes(), vertices):
        v.out_vertices = [vertices[j] for j in G.neighbors(node)]
        if node in initial_M:
            v.value = "M"

    p = Pregel(vertices, stats_fn=social_stats)
    p.run()
    return p.data

In this simulation, we need a probabilistic function that decides whether to switch, given the current observation of what our friends are using.

In [95]:
def switch(f, m):
    # f and m are the *numbers* of our friends we have perceived
    # using F and M respectively in the current time-step.
    # Return True if we switch from F to M, False otherwise.

    # In this implementation, we check that m is larger than f.
    # If so, we switch with probability sp (a global set in the parameter cell).
    if m > f:
        if random.random() < sp:
            return True
    return False
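The `switch` function above is a hard threshold: nothing happens until M messages outnumber F messages. One alternative worth trying is to let the switching probability grow with the share of M messages. The sketch below is an assumption for experimentation, not part of the original model; the name `switch_proportional` and its `sp` and `rng` keyword arguments are hypothetical.

```python
import random

def switch_proportional(f, m, sp=0.9, rng=random):
    """Switch with probability sp * m / (f + m).

    f and m are the numbers of friends seen on F and M this time-step.
    If no messages arrived at all, we have no evidence and never switch.
    """
    total = f + m
    if total == 0:
        return False
    return rng.random() < sp * m / total
```

Since it can still be called with just two arguments, assigning `switch = switch_proportional` before running the simulation would be enough to swap it in.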

All the interesting details go here. The following parameters affect the behaviour of each vertex, and affect the graph connectivity too. We'll print out two interesting properties of the graph, which strongly affect the behaviour in a simulation like this: the edge density, and the clustering.

In [109]:
n = 1000 # number of nodes
m = 5 # ring neighbours per node; NetworkX links m // 2 on each side, so m = 5 gives a mean degree of 4
rp = 0.3 # rewiring probability in Watts-Strogatz

i = 10 # number of nodes initially on M
a = 0.4 # activity: 1.0 means we hit all our friends with messages every day
sp = 0.9 # switch probability (when we perceive a majority have switched)
nsteps = 30 # how many time-steps to run

G = nx.watts_strogatz_graph(n, m, rp)

print(nx.density(G))
print(nx.average_clustering(G))
social_main(G, i)

0.004004004004004004
0.19647142857142808


[(0, 0.99, 0.01),
 (1, 0.988, 0.012),
 (2, 0.985, 0.015),
 (3, 0.977, 0.023),
 (4, 0.971, 0.029),
 (5, 0.966, 0.034),
 (6, 0.957, 0.043),
 (7, 0.948, 0.052),
 (8, 0.939, 0.061),
 (9, 0.933, 0.067),
 (10, 0.927, 0.073),
 (11, 0.92, 0.08),
 (12, 0.912, 0.088),
 (13, 0.902, 0.098),
 (14, 0.89, 0.11),
 (15, 0.88, 0.12),
 (16, 0.866, 0.134),
 (17, 0.849, 0.151),
 (18, 0.821, 0.179),
 (19, 0.8, 0.2),
 (20, 0.775, 0.225),
 (21, 0.745, 0.255),
 (22, 0.715, 0.285),
 (23, 0.684, 0.316),
 (24, 0.652, 0.348),
 (25, 0.615, 0.385),
 (26, 0.573, 0.427),
 (27, 0.527, 0.473),
 (28, 0.471, 0.529),
 (29, 0.423, 0.577),
 (30, 0.423, 0.577)]
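The printed density can be checked by hand. As far as I can tell, the Watts-Strogatz generator joins each node to `m // 2` neighbours on each side of the ring, and rewiring moves edges without adding or removing any, so the edge count is fixed by `n` and `m`. The following sketch redoes the arithmetic under that assumption, using `k` for the neighbour parameter:

```python
n, k = 1000, 5

# k // 2 = 2 neighbours on each side, so each node contributes 2 new edges
edges = n * (k // 2)

# density of an undirected graph: edges present / edges possible
density = 2 * edges / (n * (n - 1))

print(edges, round(density, 6))
```

This gives 2000 edges and a density of about 0.004004, which agrees with the value printed above and suggests that `m = 5` behaves like `m = 4` here.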

Exercises
---

1. Try changing `m`. What is the relationship between `m`, density, and success or failure of M?
2. Try changing `rp`. What is the relationship between `rp`, clustering, and the success or failure of M?
3. Try choosing the initial $i$ nodes of M from a group of closely connected nodes (for example, a node and its neighbours), instead of from random nodes. Does it make a difference?
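For exercise 3, one possible way to pick "related" nodes is a breadth-first search outward from a random starting node. This is only a sketch: the helper name `connected_seed_set` is hypothetical, and it works on a plain adjacency dictionary, which could be built from a NetworkX graph with something like `{u: list(G.neighbors(u)) for u in G}`.

```python
import random
from collections import deque

def connected_seed_set(adj, size, rng=random):
    """Pick up to `size` nodes by breadth-first search from a random start node.

    adj maps each node to a list of its neighbours. If the start node's
    connected component has fewer than `size` nodes, fewer are returned.
    """
    start = rng.choice(list(adj))
    chosen = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(chosen) < size:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                chosen.append(v)
                queue.append(v)
                if len(chosen) == size:
                    break
    return chosen

# Toy example: a ring of 10 nodes, each linked to its two neighbours
ring = {u: [(u - 1) % 10, (u + 1) % 10] for u in range(10)}
print(connected_seed_set(ring, 4, random.Random(1)))
```

The returned nodes could then replace the randomly sampled initial M nodes in `social_main`.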