### MODELS OF INFLUENCE & DIFFUSION ON PRODUCT ADOPTION
This notebook will explain about the problem of finding influencer to promote new product adoptions to the target customers.
We will use a facebook datasest available on http://snap.stanford.edu/data/ego-Facebook.html to compute the expected number of facebook users that will adopt the pruduct due to the influence of their freinds.

The information diffusion process about the product adoption will be computed using Independent Cascade Model(ICM) and Linear Threshold Model. Different metrics such as degree centrality, closeness centrality will be used to find the source nodes (facebook users) that will be treated as the influencer for the product difusssion.


In [189]:
#import libraries
%matplotlib inline
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from random import randint
import ndlib.models.ModelConfig as mc
import ndlib.models.epidemics as ep
from bokeh.io import output_notebook, show
from ndlib.viz.bokeh.DiffusionTrend import DiffusionTrend

In [190]:
#import datasets
#read the datasets
facebook = pd.read_csv(
    "facebook_combined.txt.gz",
    compression="gzip",
    sep=" ",
    names=["start_node", "end_node"],
)
facebook

Unnamed: 0,start_node,end_node
0,0,1
1,0,2
2,0,3
3,0,4
4,0,5
...,...,...
88229,4026,4030
88230,4027,4031
88231,4027,4032
88232,4027,4038


In [191]:
#creating the graph
G = nx.from_pandas_edgelist(facebook, "start_node", "end_node")

In [192]:
G = nx.from_pandas_edgelist(facebook, 
                            source='start_node',
                            target='end_node',
                            create_using=nx.DiGraph())
                            
#nx.draw_networkx(G)

In [193]:
#total number of facebook users available in datasets
G.number_of_nodes()

4039

###  Information Diffusion Models

In this section we will implement two popular basic model using NDlib which is a python libraries for the describing, simulate, and study diffusion processes on complex networks


1. Linear Threshold Model(LT): 
It defines a threshold-based behavior, where the influence accumulates from multiple neighbors of the node, which becomes activated only if the cumulative influence passed a certain threshold. In our problem, this behaviour might eventually convince some facebook users to adopt the new product after hearing a lot about if from their freinds.

2. Independent Cascade model(IC):
In this model the active neighbors has a probabilistic and independent chance to activate the node. This resembles a viral virus spread, such as in Covid-19, where each of the social interactions might trigger the infection.


### Linear Threshold (LT)
The model works as follows; each node has its own threshold; during a generic iteration every node is observed: if the percentage of its infected neighbors is greater than its threshold it becomes infected as well. 

During the simulation a node can have one among the two statuses code; O means Susceptible and 1 means infected

In [194]:
def linear_threshold(g,s):
    # Model selection
    model = ep.ThresholdModel(g)
    
    # Model Configuration
    config = mc.Configuration()
    
    # Setting node parameters
    config.add_model_initial_configuration("Infected", s)
    threshold = 0.25
    for i in g.nodes():
        config.add_node_configuration("threshold", i, threshold)

    model.set_initial_status(config)

    # Simulation execution

    iterations = model.iteration_bunch(20)
    trends = model.build_trends(iterations)
    
    #visualizing the results

    viz = DiffusionTrend(model, trends)
    p = viz.plot(width=400, height=400)
    output_notebook()
    show(p)
    
    return iterations

### Independent Cascade (IC)
This model starts with an initial set of active nodes A0, the diffusive process work as follows:
1. When node v becomes active in step t, it is given a single chance to activate each currently inactive neighbor w; it succeeds with a probability p(v,w).
2. If w has multiple newly activated neighbors, their attempts are sequenced in an arbitrary order.
3. If v succeeds, then w will become active in step t + 1; but whether or not v succeeds, it cannot make any further attempts to activate w in subsequent rounds.
4. The process runs until no more activations are possible.

During the simulation a node can have one among the three statuses code; O means Susceptible and 1 means infected and 2 means removed


In [214]:
def independent_cascade(g,s):
    # Model selection
    model = ep.IndependentCascadesModel(g)
    
    # Model Configuration
    config = mc.Configuration()
    
    # Setting node parameters
    config.add_model_initial_configuration("Infected", s)
    threshold = 0.25
    for i in g.nodes():
        config.add_node_configuration("threshold", i, threshold)

    model.set_initial_status(config)

    # Simulation execution
    iterations = model.iteration_bunch(20)
    #trends = model.build_trends(iterations)
    
    #visualizing the results

    viz = DiffusionTrend(model, trends)
    p = viz.plot(width=400, height=400)
    output_notebook()
    show(p)
    
    return iterations

In [210]:
#this fuction will output the infected node by each iteration

def getting_infected_node(d):
    data = {'iteration':[],'infected_nodes': []}
    n=len(d)
    
    for i in range(1,n):
        infected_nodes=[]

        for key,value in d[i]['status'].items():
            if value == 1:
                infected_nodes.append(key)
        data['iteration'].append(i)
        data['infected_nodes'].append(len(infected_nodes))
            

    df = pd.DataFrame(data,index=None)       
    return df


In [211]:
data= getting_infected_node(lc_threshold)

In [205]:
#this fuction will visualize the results of the model, number of infected nodes with respect to time

def visualizer(d):
    model_results = {}
    
    n=len(d)
    infected_nodes=[]
    for i in range(1,n):
        count=0
        print("Iteration ",i)
        for key,value in d[i]['status'].items():
            infected_nodes = [key for key,value in d[i]['status'].items() if value ==1 ]
            #model_results['iteration']= i
            #model_results['no_infected']= len(infected_nodes)
            #print(infected_nodes)
    print(len(infected_nodes))
    print("=================================================================")        
        #print(model_results['no_infected'])
        #count+=len(infected_nodes)
        #return count
    

In [125]:
visualizer(lc_threshold)

Iteration  1
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
21

2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103
2103


In [212]:
def visualizer(data):
    x = np.linspace(data)
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.plot(x, x, label='linear')  # Plot some data on the axes.
    ax.plot(x, x**2, label='quadratic')  # Plot more data on the axes...
    ax.plot(x, x**3, label='cubic')  # ... and some more.
    ax.set_xlabel('x label')  # Add an x-label to the axes.
    ax.set_ylabel('y label')  # Add a y-label to the axes.
    ax.set_title("Simple Plot")  # Add a title to the axes.
    ax.legend();  # Add a legend.

In [213]:
visualizer(data)

TypeError: _linspace_dispatcher() missing 1 required positional argument: 'stop'

In [None]:
#this function will compare the results of different model with respect to the Initial infected nodes
def plot(d):  # bar plot of final results

    names = list(d.keys())
    values = list(d.values())

    plt.bar(range(len(d)), values, tick_label=names)
    plt.xticks(rotation=45)
    plt.show()

### Choosing The Initial Infected Nodes
In both model(LT,IC) the initial infection status can be defined in two ways:

1. fraction_infected: Model Parameter, float in [0, 1], this set the initial set of infected nodes as 1% of the overall population, and assign a threshold of 0.1 to all the edges as a default values in not given.To set the initial infected node add this line in model configuration

         config.add_model_parameter('fraction_infected', 0.1)
    
2. Infected: Status Parameter, set of nodes, you can choose some of the node in your datasets randomly, or by using some metrics such as degree centrality, betweeness centrality, closeness centrality. Choosing between these measures depend on the nature of your datasets and the problem you want to solve. To set initial nodes in your model add the follwoing line in node parameters setting, where s represent the list of selected node. 

         config.add_model_initial_configuration("Infected", s)

Since we are using facebook datasets which is social network we will use the following measures:

1. Degree Centrality
2. Closeness Centrality
3. Betweeness Centrality

With respect to our problem we want to find the facebook users that might become influencer for the product adoption





In [27]:
#finding initial active nodes
def initial_infected_node(measure,no_nodes):
    return list(dict(sorted(measure.items(), key=lambda item: item[1],reverse=True)).keys())[:no_nodes]

In [28]:
#dictionary to store the iterations among the degree centrality,closeness and betweeness
results_dictionary = {}

#### Choosing The Source Node by  Degree Centrality
Degree Centrality is measured as the number of direct links that involve a given node. Since our object is to promote the product adoption, we will find a 400 facebook users with the highest degree centrality among all. In this case we will focus more in out- degreee centrality, because the propagation of information need to be focused from an influencer to the community.

In [116]:
S_out_degree = initial_infected_node(nx.out_degree_centrality(G),400)

In [117]:
lc_threshold=linear_threshold(G,S_out_degree)

In [120]:
lc_threshold

[{'iteration': 0,
  'status': {0: 1,
   1: 0,
   2: 0,
   3: 0,
   4: 0,
   5: 0,
   6: 0,
   7: 0,
   8: 0,
   9: 0,
   10: 0,
   11: 0,
   12: 0,
   13: 0,
   14: 0,
   15: 0,
   16: 0,
   17: 0,
   18: 0,
   19: 0,
   20: 0,
   21: 1,
   22: 0,
   23: 0,
   24: 0,
   25: 1,
   26: 1,
   27: 0,
   28: 0,
   29: 0,
   30: 0,
   31: 0,
   32: 0,
   33: 0,
   34: 0,
   35: 0,
   36: 0,
   37: 0,
   38: 0,
   39: 0,
   40: 0,
   41: 0,
   42: 0,
   43: 0,
   44: 0,
   45: 0,
   46: 0,
   47: 0,
   48: 0,
   49: 0,
   50: 0,
   51: 0,
   52: 0,
   53: 0,
   54: 0,
   55: 0,
   56: 1,
   57: 0,
   58: 0,
   59: 0,
   60: 0,
   61: 0,
   62: 0,
   63: 0,
   64: 0,
   65: 0,
   66: 0,
   67: 1,
   68: 0,
   69: 0,
   70: 0,
   71: 0,
   72: 0,
   73: 0,
   74: 0,
   75: 0,
   76: 0,
   77: 0,
   78: 0,
   79: 0,
   80: 0,
   81: 0,
   82: 0,
   83: 0,
   84: 0,
   85: 0,
   86: 0,
   87: 0,
   88: 0,
   89: 0,
   90: 0,
   91: 0,
   92: 0,
   93: 0,
   94: 0,
   95: 0,
   96: 0,
   97: 0,
  

In [148]:
#visualizer(threashold_iterations)

In [46]:
independent_cascade(G,S_out_degree)



#### Choosing The Source Node by Closeness Centrality
Is the shortest path between a node and all other reacheable nodes. It measure of how long it will take for information to spread from given node to other nodes in the network. Here we want to select the facebook user with the highest closeness centrality among all the connected node.


In [47]:
S_closeness = initial_infected_node(nx.closeness_centrality(G),400)

In [48]:
threashold_iterations = linear_threshold(G,S_closeness)
#results_dictionary['Closeness Centrality'] = threashold_iterations['node_count']



In [49]:
ic_iterations = independent_cascade(G,S_closeness)



#### Choosing The Source Node by  Betweeness Centrality
Betweenness centrality measures the extent to which a node plays this bridging role in a network. Specifically, betweenness centrality measures the extent that the user falls on the shortest path between other pairs of users in the network. The more people depend on a user to make connections with other people, the higher that user's betweenness centrality becomes. Our goal is to propagate the product adoption among different community, therefore getting facebook users with the highest betweeness centrality will speed up the process of information diffusion easily.

In [50]:
S_betweeness = initial_infected_node(nx.betweenness_centrality(G),400)

In [51]:
threashold_iterations = linear_threshold(G,S_betweeness)



In [52]:
ic_iterations = independent_cascade(G,S_betweeness)

